
Document control

Area: EGI Federation Operations

Procedure status: FINAL

Owner: Alessandro Paolini

Approvers: Operations Management Board

Approval status: APPROVED

Approved version and date: v7,

Statement

The purpose of this document is to describe the actions and related steps required to include Nagios tests in the ARGO_MON_OPERATORS profile, so that the operations dashboard displays an alarm when a test fails.

Next procedure review: upon request

Procedure reviews

The following table is updated after every review of this procedure.

Date | Review by | Summary of results | Follow-up actions / Comments
 | Alessandro Paolini | copy from PROC06_Setting_Nagios_test_status_to_operations in EGI Wiki |





Overview

The purpose of this document is to describe the actions and related steps required to include Nagios tests in the ARGO_MON_OPERATORS profile, so that the operations dashboard displays an alarm when a test fails.

This procedure applies only to tests run under the OPS VO. Its scope is global: it applies to all Operations Centres in the EGI project.

Definitions

Please refer to the EGI Glossary for the definitions of the terms used in this procedure.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Entities involved in the procedure

Applicant: the person who makes the request.

Operations: the team provided by the EGI Foundation, responsible for validating the request and following up the process.

NGI operators: follow up with their RCs on any problem associated with the given monitoring probe.

Triggers

The need to raise alarms on the ROD dashboard when a given monitoring probe fails.

Prerequisites

The ARGO test needs to meet the following requirements.

  1. It meets quality criteria consistent with the UMD operational capabilities quality criteria: https://documents.egi.eu/document/240.
  2. It is properly documented.
  3. It must be part of an official Nagios release.
  4. It must have been deployed in production for at least one month without problems.
  5. It must be available for validation by Operations.
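
The time-based requirement (item 4) can be checked mechanically. The following is a minimal sketch, not part of the procedure: it assumes "at least one month" can be approximated as 30 days, and the function and constant names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: "at least one month" is approximated as 30 days.
MIN_PRODUCTION_PERIOD = timedelta(days=30)


def deployed_long_enough(deployed_on: date, today: date) -> bool:
    """Return True if the probe has been in production for the minimum period."""
    return today - deployed_on >= MIN_PRODUCTION_PERIOD


print(deployed_long_enough(date(2024, 1, 1), date(2024, 2, 15)))  # True (45 days)
print(deployed_long_enough(date(2024, 1, 1), date(2024, 1, 20)))  # False (19 days)
```

The "without problems" part of the requirement still needs a human judgement by Operations; the sketch covers only the elapsed time.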

Sending a request

  • Anybody can submit the request for making the test an operations test.
  • The request should be submitted to Operations via a GGUS ticket.

Steps

Step # | Responsible | Action
1 | Applicant | Opens a GGUS ticket to Operations to start the process.

Subject: Request to make the XXX test an operations test

Dear Operations,

We would like to request that the XXX test be made an operations test.

Prerequisite data:
* name of nagios probe:
* name of service on which the test runs: 
* link to documentation page:
* motivation (which part of the infrastructure will be improved by making XXX an operations test,
 or description of users' problems which will be avoided in future; provide a list 
 of GGUS tickets if possible)

Best Regards
XXX

2 | Operations | Checks the status of the Nagios probe to see if it meets the specified quality criteria.
3 | Operations | Contacts the OMB to request approval of the new operations test. A date is specified (at least 1 month in the future).
4 | NGIs | Ask the ROD teams to work on getting the test to OK. A total of 75% OK across the entire EGI infrastructure is the threshold for passing to the next step. If it is not possible to proceed, report the problems to the OMB.
5 | Operations | Reassigns the ticket to "Monitoring (ARGO)", agreeing on the date for the inclusion of the test in the operations profile.
6 | Operations | Announces the new operations test in the monthly broadcast. This broadcast should be sent to site managers, NGI managers and ROD teams. See the template below for an indication of the message content.

Subject: XXX has been added to the EGI Operations Profile on XXX 

Dear All,

We would like to announce that test XXX will become operational on XXX

Short description of the test:

The documentation can be found:

Best regards,

7 | Operations | Final check. Closes the parent ticket.
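
The 75% threshold in step 4 amounts to a simple ratio check. The sketch below is illustrative only: the procedure does not say how the percentage is computed, so it assumes one latest result per monitored service instance across EGI, the standard Nagios status names, and an inclusive threshold. The function names are hypothetical.

```python
# Assumption: 75% is an inclusive threshold over the whole of EGI (step 4).
OK_THRESHOLD = 0.75


def ok_fraction(results: list[str]) -> float:
    """Fraction of results equal to "OK"; 0.0 when there are no results."""
    if not results:
        return 0.0
    return sum(1 for status in results if status == "OK") / len(results)


def passes_threshold(results: list[str]) -> bool:
    """True if enough results are OK to move on to the next step."""
    return ok_fraction(results) >= OK_THRESHOLD


# Hypothetical sample: 8 of 10 service instances report OK.
sample = ["OK"] * 8 + ["CRITICAL", "WARNING"]
print(ok_fraction(sample))       # 0.8
print(passes_threshold(sample))  # True
```

If the ratio stays below the threshold, the procedure calls for reporting the problems to the OMB rather than proceeding.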