Document control

Area: EGI Federation Operations
Procedure status: FINAL
Owner: Alessandro Paolini
Approvers: Operations Management Board
Approval status: APPROVED
Approved version and date: v.4,
Statement: This document specifies the procedure for adding new probes to the ARGO Monitoring service.
Next procedure review: upon request

Procedure reviews

The following table is updated after every review of this procedure.

Date | Review by | Summary of results | Follow-up actions / Comments
 | Alessandro Paolini | Copy from PROC07_Adding_new_probes_to_ARGO in EGI Wiki |




Overview

The purpose of this document is to clearly describe the procedure for adding new Nagios probes to the ARGO Monitoring service.

Definitions

Please refer to the EGI Glossary for the definitions of the terms used in this procedure.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

The key words Metric, Probe are defined in the following way:

  • Metric: a metric instance is a tuple of flavour, metric name and, optionally, FQAN. "Metric" is the term used in the development documentation and is a synonym for "test". In operations documents, "test" is the reference term.
  • Probe: code that implements one or more tests.
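
To illustrate the two definitions, the following is a minimal sketch rather than code taken from ARGO. It assumes the standard Nagios plugin convention for status codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The names `MetricInstance`, `run_probe`, `example.Reachable` and `example.Latency`, as well as the 5-second threshold, are hypothetical and chosen only for the example.

```python
from typing import Callable, Dict, NamedTuple, Optional, Tuple

# Status codes of the standard Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

Result = Tuple[int, str]  # (status code, human-readable message)


class MetricInstance(NamedTuple):
    """A metric instance: flavour, metric name and, optionally, an FQAN."""
    flavour: str
    name: str
    fqan: Optional[str] = None


def check_reachable(response_time_s: Optional[float]) -> Result:
    # Hypothetical test: the service is reachable if it answered at all.
    if response_time_s is None:
        return CRITICAL, "service did not answer"
    return OK, "service answered"


def check_latency(response_time_s: Optional[float]) -> Result:
    # Hypothetical test: warn when the answer took longer than 5 seconds.
    if response_time_s is None:
        return UNKNOWN, "no measurement available"
    if response_time_s > 5.0:
        return WARNING, f"slow answer: {response_time_s:.1f}s"
    return OK, f"answered in {response_time_s:.1f}s"


# One probe implementing several tests, selected by metric name.
TESTS: Dict[str, Callable[[Optional[float]], Result]] = {
    "example.Reachable": check_reachable,
    "example.Latency": check_latency,
}


def run_probe(metric: MetricInstance, response_time_s: Optional[float]) -> Result:
    """Run the test named by the metric instance and return its result."""
    test = TESTS.get(metric.name)
    if test is None:
        return UNKNOWN, f"unknown test: {metric.name}"
    return test(response_time_s)
```

A real probe would typically print the message and exit with the status code; the sketch returns both instead so that the logic can be exercised directly.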

Guidelines for monitoring probes

The following document describes the policy to develop, package and integrate new probes into the ARGO Monitoring Engine:

Scope

This procedure applies only to probes that run under the OPS VO and whose scope is global. Any change requested through this procedure therefore affects all Operations Centres in EGI.

This procedure does not apply to SAM internal probes which perform monitoring of individual components on the ARGO instance (e.g. process monitoring, ActiveMQ connections, etc.).

Entities involved in the procedure

  • Applicant. The Applicant submits a request for adding a new probe. Anyone in the operations community - Resource Centre administrators, Operations Centre staff, Resource Infrastructure Operations Managers - is allowed to submit such a request. The Applicant is responsible for the development and maintenance of the proposed Nagios probe. The Nagios probe use case needs to be well documented.
  • COO. The COO is the OMB meeting chair, responsible for processing the request and for accepting or refusing it with the consensus of the Resource Infrastructure Providers.
  • ARGO Product Team. The ARGO Product Team is responsible for scheduling, integrating and releasing the accepted probes.

Steps

Step | Responsible | Action
1 | Applicant | A request is submitted through an EGI GGUS ticket (https://ggus.eu) assigned to the Operations SU.
Subject: Request for adding new probe XXX to ARGO-SAM

We would like to request adding new probe XXX to ARGO-SAM release

Prerequisite data:
* name of the Nagios probe:
* name of service on which the test runs:
* link to documentation page:
* motivation (which part of the infrastructure will be improved with the new probe,
 or description of users' problems which will be avoided in future; provide a list
 of GGUS tickets if possible):
2 | COO / Operations | Reviews the submitted request, sends an email to the NGI Managers to make them aware of it, and:

  • if it is a completely new probe for monitoring a new technology, also schedules a presentation of the new probe at the next possible OMB meeting, with the Applicant as speaker;
  • if it is a new version of an existing probe, goes to step 5.
3 | Applicant | Presents the new probe at the OMB meeting.
4 | OMB | Decides whether the new probe will be included in ARGO. The next steps are performed only if the probe is accepted.
5 | COO / Operations | Reassigns the ticket to "Monitoring (ARGO)" so that the probe can be deployed on the test instance.
6 | ARGO Product Team | Deploys the new probe on the test instance and reports any issues and the test outcomes back to Operations.
7 | ARGO Product Team | Agrees with Operations on when to deploy the new probe on the production instance and include it in the ARGO_MON profile.
8 | ARGO Product Team / Operations | Closes the initial GGUS ticket after the release of the probe.
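
The branching in the steps above can be summarised with a short sketch. It is only an illustration of the routing described in the table, not part of any ARGO or GGUS tooling. The names `RequestKind` and `steps_to_perform` are hypothetical.

```python
from enum import Enum
from typing import List


class RequestKind(Enum):
    NEW_PROBE = "completely new probe"
    NEW_VERSION = "new version of an existing probe"


def steps_to_perform(kind: RequestKind, accepted_by_omb: bool = True) -> List[int]:
    """Return the step numbers of the procedure that apply to a request."""
    steps = [1, 2]  # ticket submission, review by COO / Operations
    if kind is RequestKind.NEW_PROBE:
        steps += [3, 4]  # presentation at the OMB meeting, OMB decision
        if not accepted_by_omb:
            return steps  # later steps are performed only if accepted
    # A new version of an existing probe goes straight from step 2 to step 5.
    steps += [5, 6, 7, 8]
    return steps
```

For example, `steps_to_perform(RequestKind.NEW_VERSION)` skips the presentation and the OMB decision, while a new probe that the OMB does not accept stops after step 4.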