Document control


Area: EGI Federation Operations
Procedure status:
Owner: Alessandro Paolini
Approvers: Operations Management Board
Approval status:
Approved version and date: v5,
Statement: This document specifies the procedure for modifying the EGI OPS Availability and Reliability profile.
Next procedure review: upon request


Procedure reviews

The following table is updated after every review of this procedure.


Date | Review by | Summary of results | Follow-up actions / Comments
 | Alessandro Paolini | Copy from PROC08_Management_of_the_EGI_OPS_Availability_and_Reliability_Profile in EGI Wiki |






Overview

The profile must be changed whenever a Nagios test has to be added to or removed from it, so that the test's results are included in, or excluded from, the monthly Availability and Reliability statistics. A change in the OPS Availability and Reliability profile affects the computation of the monthly Availability and Reliability statistics of all EGI Resource Infrastructures and Resource Centres.

Definitions

Please refer to the EGI Glossary for the definitions of the terms used in this procedure.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Scope

This procedure applies to the EGI OPS Availability and Reliability profile. Any change is global, because it affects all EGI Resource Centres. The ARGO compute engine (CE) uses profiles to generate the monthly Availability and Reliability reports.

This procedure is NOT applicable to VO-specific Availability and Reliability profiles used by non-OPS VOs (e.g. user communities, national operations VOs, etc.).

Entities involved in the procedure

Pre-requirements

Steps

Step | Action on | Action
1 | Applicant | Sends a change request to their own Operations Centre by submitting a GGUS ticket.

Use the "Affected ROC/NGI" field to address the ticket to the appropriate Operations Centre. Template:

Subject: Request for adding/removing XXX(,YYY,...) test(s) to/from the EGI OPS A/R Profile

We would like to request adding/removing XXX(,YYY,...) test(s) to/from the EGI OPS Profile

Prerequisite data:
* name of ARGO test(s):
* name of service on which the test runs:
* link to documentation page:
* motivation (which part of the infrastructure will be improved by the new test,
 or a description of users' problems that will be avoided in the future; provide a list
 of GGUS tickets if possible)
2 | Operations Centre | The Operations Centre processes the request specified in the GGUS ticket and accepts or rejects it.

The reasons for a rejection need to be stated in the GGUS ticket.

If the request is accepted, the Operations Centre opens a GGUS ticket to the EGI Operations Support Unit, forwarding the request for discussion at the OMB. Template:

Subject: Request for adding/removing XXX(,YYY,...) test(s) to/from the EGI OPS Profile

We would like to request adding/removing XXX(,YYY,...) test(s) to/from the EGI OPS Profile. 
Please see details in GGUS ticket _link to Applicant's GGUS ticket_.
3 | EGI Operations team and Resource Infrastructure Operations Manager | The EGI Operations team schedules a presentation of the requested change at the next available OMB meeting. The relevant Resource Infrastructure Operations Manager presents the request during the meeting, and the Applicant is invited to attend. Only one request is processed at a time, because the impact of each change needs to be assessed. Requests are processed in order of priority, as agreed by the OMB.
4 | EGI Operations team | Opens a GGUS ticket to the "Monitoring (ARGO)" support unit requesting the addition of the test(s) to (or their removal from) the ARGO_MON_CRITICAL profile.
5 | ARGO team | Implements the change on the agreed date (generally the first day of the month).
6 | EGI Operations team | Before closing the ticket, verifies that the new A/R reports show no anomalies and reports back to the NGI Managers by email or at the next OMB meeting.
7 | EGI Operations team | Broadcasts the modification to all relevant parties (i.e. Operations Centres and Resource Centres) through the next Monthly Broadcast, then closes the GGUS ticket.
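The request template used in step 1 can also be filled in with a small script, which keeps the subject line and the prerequisite-data fields consistent between requests. The sketch below is illustrative only: the `ProfileChangeRequest` class and its fields are hypothetical, nothing in it talks to GGUS (the Applicant still pastes the text into a new ticket by hand), and the test names shown are the template's own XXX/YYY placeholders.

```python
"""Hypothetical helper that fills in the step 1 request template.

It only builds the subject and body text; it does not submit anything to GGUS.
"""
from dataclasses import dataclass, field


@dataclass
class ProfileChangeRequest:
    action: str          # "adding" or "removing" (hypothetical field)
    tests: list[str]     # ARGO test names, e.g. the XXX, YYY placeholders
    service: str         # service on which the test(s) run
    doc_link: str        # link to the test documentation page
    motivation: str      # why the change improves the infrastructure
    related_tickets: list[str] = field(default_factory=list)  # optional GGUS ticket IDs

    def __post_init__(self) -> None:
        # Reject requests the template cannot express.
        if self.action not in ("adding", "removing"):
            raise ValueError("action must be 'adding' or 'removing'")
        if not self.tests:
            raise ValueError("at least one test name is required")

    def _direction(self) -> str:
        # The template reads "to" when adding and "from" when removing.
        return "to" if self.action == "adding" else "from"

    def _test_list(self) -> str:
        return ", ".join(self.tests)

    def subject(self) -> str:
        return (f"Request for {self.action} {self._test_list()} test(s) "
                f"{self._direction()} the EGI OPS A/R Profile")

    def body(self) -> str:
        lines = [
            f"We would like to request {self.action} {self._test_list()} test(s) "
            f"{self._direction()} the EGI OPS Profile",
            "",
            "Prerequisite data:",
            f"* name of ARGO test(s): {self._test_list()}",
            f"* name of service on which the test runs: {self.service}",
            f"* link to documentation page: {self.doc_link}",
            f"* motivation: {self.motivation}",
        ]
        if self.related_tickets:
            lines.append(f"  related GGUS tickets: {', '.join(self.related_tickets)}")
        return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values only; replace them with the real request details.
    req = ProfileChangeRequest(
        action="adding",
        tests=["XXX", "YYY"],
        service="example-service",
        doc_link="https://example.org/docs/xxx",
        motivation="describe the improvement or the user problems avoided",
    )
    print(req.subject())
    print()
    print(req.body())
```

Keeping the add/remove choice as a single validated field avoids the "adding/removing ... to/from" ambiguity of the free-text template, so the Operations Centre always receives one unambiguous request per ticket.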