Document control

Area: EGI Federation Operations
Procedure status: FINAL
Owner: Matthew Viljoen
Approvers: Operations Management Board
Approval status: APPROVED
Approved version and date: v3,

Statement

A procedure describing the steps to decommission Resource Centres in the EGI infrastructure.

Next procedure review: on demand

Procedure reviews

The following table is updated after every review of this procedure.

Date | Review by | Summary of results | Follow-up actions / Comments

 | Alessandro Paolini | Copy from PROC11_Resource_Centre_Decommissioning in EGI Wiki | 





Overview

This procedure defines the good practices between a Resource Centre (aka site) and its users when the Resource Centre is being decommissioned.

Decommissioning a Resource Centre in an orderly manner takes up to four months. Note: the site hardware decommissioning can start after one month.

Note: A separate document provides the process for Resource Centre Registration and Certification.

Definitions

Please refer to the EGI Glossary for the definitions of the terms used in this procedure.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Entities involved in the procedure

  • Resource Centre Operations Manager: person who is responsible for initiating the decommissioning procedure by contacting the Resource Infrastructure Operations Manager.
  • Resource Infrastructure Operations Manager (aka NGI operations manager): person who is responsible for reaching an agreement with the Resource Centre on the timeline, in order to minimize the impact on the user communities and the infrastructure. The Resource Infrastructure Operations Manager is responsible for ensuring that this procedure and related procedures are properly followed.
  • Virtual Organizations (VOs): data and other stateful objects of the supported VOs may be stored at the Resource Centre.
  • Virtual Organization (VO) managers: persons who are responsible for retrieving this data from the Resource Centre in due time. Tracking is done through their support unit in GGUS. If no such support unit is available, the VOs should be contacted directly using the contact information in the VO ID card.
  • Operations Centre: entity which is technically responsible for carrying out the main ticket and database updates.

The Resource Infrastructure Operations Manager can determine the level of involvement of other actors together with the Resource Centre Operations Manager.

Contact information

  • EGI Operations: operations (at) mailman.egi.eu
  • EGI CSIRT: egi-csirt-team (at) mailman.egi.eu
  • A list of EGI Operations Centres and Resource Centres with their respective contact information is available in GOCDB.
  • The list of VOs served by a specific Resource Centre can be retrieved from the BDII and VAPOR.
  • The VO managers and their contact information for a specific VO can be retrieved from the VO ID Cards on the Operations Portal.

Actions and responsibilities

Resource Centre Operations Manager

  1. A Resource Centre Operations Manager is responsible for all Resource Centres (RCs) within their respective domain.
  2. When an RC is to be decommissioned, the Resource Centre Operations Manager is REQUIRED to inform the following party of the Resource Centre's intention to cease operation:
    • the respective NGI, if the Resource Centre is located in Europe;
    • the respective Resource Infrastructure Provider active in the relevant geographical area, if the Resource Centre is outside Europe.
  3. The Resource Centre Operations Manager is REQUIRED to provide the Resource Centre information needed to complete the decommissioning process, and is responsible for its accuracy and maintenance.
  4. The Resource Centre Operations Manager MUST handle Resource Centre decommissioning requests and MUST provide timely feedback to the requesting partners, accepting or rejecting the requests received.

Resource Infrastructure Operations Manager

  1. A Resource Infrastructure Provider is REQUIRED to be responsible for all Resource Centres within its respective jurisdiction. The Resource Infrastructure Provider is therefore responsible for ensuring that all its Resource Centres follow this procedure when they are decommissioned.

VOs and VO managers

  1. Give the users the relevant information about the decommissioning (deadlines, affected resources and files, and how to handle them).
  2. Follow up and support users in their file migration procedures until the deadline.
  3. Inform the Resource Centre about the status of the migration(s).

Operations Centre

  1. The Operations Centre is responsible for decommissioning the Resource Centre.
  2. The Operations Centre is responsible for updating the corresponding entries in the EGI configuration repository, GOCDB.
  3. The Operations Centre MUST keep Resource Centre information up to date in all operations tools as needed, such as the local NAGIOS server used to monitor certified Resource Centres and the local helpdesk (if available) where the Resource Centre support staff are registered.

Workflow

The steps required of the Resource Infrastructure Operations Manager, the Resource Centre Operations Manager and the other actors are described in the table below. The procedure covers the transition from the Certified to the Closed status; the transition from the Suspended to the Closed status can be derived analogously.

The general status flow that a Resource Centre is allowed to follow is illustrated by the following diagram. Information on Resource Centre status and on how to manipulate it is available from GOCDB Documentation.

A Resource Centre cannot remain in Candidate state for more than two months, or in Suspended state for more than four months. After this period the Resource Centre SHOULD be closed.
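The status rules above can be expressed as a small check. The following Python sketch is illustrative only: the status names come from this procedure, but the function names are hypothetical, only the decommissioning transitions described here are modelled, and "months" are assumed to be calendar months. It is not part of GOCDB or any EGI tool.

```python
import calendar
from datetime import date

# Status names as used in this procedure (GOCDB defines further states not modelled here).
CANDIDATE = "Candidate"
CERTIFIED = "Certified"
SUSPENDED = "Suspended"
CLOSED = "Closed"

# Decommissioning transitions covered by this procedure:
# Certified -> Suspended -> Closed, and Suspended -> Closed when starting from Suspended.
DECOMMISSION_TRANSITIONS = {
    CERTIFIED: {SUSPENDED},
    SUSPENDED: {CLOSED},
}

# Maximum time in a state before the site SHOULD be closed (assumed: calendar months).
MAX_MONTHS_IN_STATE = {CANDIDATE: 2, SUSPENDED: 4}


def add_months(d: date, months: int) -> date:
    """Add calendar months to a date, clamping the day to the end of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def is_decommission_transition(current: str, new: str) -> bool:
    """True if current -> new is one of the status changes this procedure covers."""
    return new in DECOMMISSION_TRANSITIONS.get(current, set())


def should_be_closed(status: str, entered_on: date, today: date) -> bool:
    """True once a Candidate or Suspended Resource Centre has exceeded its time limit."""
    limit = MAX_MONTHS_IN_STATE.get(status)
    if limit is None:
        return False  # no time limit defined for this status
    return today > add_months(entered_on, limit)
```

For example, `should_be_closed(SUSPENDED, date(2024, 1, 31), date(2024, 6, 1))` returns `True`, because the four-month limit ends on 31 May 2024, while `is_decommission_transition(CERTIFIED, CLOSED)` returns `False`, since this procedure passes through Suspended first.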

Steps

  • Actions tagged RC are the responsibility of the Resource Centre Operations Manager.
  • Actions tagged RP are the responsibility of the Resource Infrastructure Operations Manager.
  • Actions tagged OC are the responsibility of the Operations Centre.
# | Responsible | Action
1 | RC
  1. The Resource Centre Operations Manager informs their Resource Infrastructure Operations Manager that the Resource Centre is going to be decommissioned, and together they agree on a decommissioning plan.
    • The Resource Centre Operations Manager opens a GGUS ticket to the Support Unit of the Operations Centre the Resource Centre belongs to; this Parent ticket is used to track the whole process. The ticket must remain open until the site is closed in GOCDB. It can also serve as the parent ticket for the decommissioning procedures of the Resource Centre's services (see PROC12, step 1).
2 | RC
  1. The Resource Centre Operations Manager should use the broadcast tool (login required) to announce to the VO managers and VO users of all VOs supported by the RC (excluding the Ops and dteam VOs) that the Resource Centre is starting the decommissioning procedure:
    • Announce a detailed (agreed) timeline for the decommissioning, and state that the Resource Centre will schedule downtimes of its resources, or a site downtime, to prevent any further usage. The timeline must clearly list the deadlines for the VO managers' actions.
    • The ticket should also list all the Resource Centre's services to be decommissioned and their scheduled decommissioning dates (this supersedes PROC12 step 2).
    • The timeline is recorded in the Parent ticket (including the timelines of all the services).
    • The broadcast link is recorded in the Parent ticket.
    • The downtime should start no earlier than 15 days and no later than one month after the broadcast.
    • State that the aim is to change the status to “suspended” in GOCDB within 6 (or 8) weeks of the broadcast date.
3 | RC, VO, RP
  1. The Resource Centre starts the Service Decommissioning Procedure (PROC12) for every production service of the site.
    • The procedures for the individual services can be run in parallel.
    • Each service decommissioning procedure can start from step 3, using this procedure's Parent ticket as its parent ticket.
4 | OC
  1. Once PROC12 step 7 (end of the scheduled downtime) has been completed for all services of the site:
    • The Resource Centre's status is changed to suspended.
    • This action must be recorded in the parent ticket.
  2. At this point the Resource Centre is no longer listed in the topBDIIs of EGI and cannot be reached simply by submitting a job. Members of the VOs the Resource Centre supported might still be able to access it directly. If the hardware is shut down, the Resource Centre needs to address this, for example by informing these users that their data could be at risk.
5 | RC
  1. Logs must be kept at the Resource Centre and remain available for the period required by the Security Traceability and Logging Policy.
6 | OC
  1. At the end of the 90-day period, the Resource Infrastructure Operations Manager should email the EGI operations team (operations 'at' egi.eu) and EGI CSIRT (see Contact information) to report the end of the log retention period and that the site is going to be closed. Revoke the roles of the Resource Centre Administrator and of other people relevant to this Resource Centre in GOCDB and, if appropriate, with the relevant CA. The Resource Infrastructure Operations Manager cleans up the VOMRS dteam server accordingly. If no users relevant to this Resource Centre remain, the Resource Infrastructure Operations Manager has to inform his/her CA so that the entity is closed officially and no “ghost entities” are kept.
  2. The site is closed in GOCDB at the end of the log retention period.
    • This action must be recorded in the parent ticket
  • NOTE: Any mailing-list subscriptions that were initiated by the Resource Centre Administrator, rather than triggered by contact definitions in GOCDB, have to be handled separately by the people concerned.
7 | OC
  1. Parent ticket is closed.
    • This operation can be performed only once all the service decommissioning procedures are completed.
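The deadlines announced in step 2 all follow from the broadcast date, so they can be derived mechanically when preparing the broadcast and the Parent ticket. This is a minimal Python sketch, assuming that "one month" means one calendar month and that every deadline is counted from the broadcast date; the names `plan_timeline` and `downtime_start_ok` are hypothetical and do not correspond to any EGI tool.

```python
import calendar
from dataclasses import dataclass
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Add calendar months to a date, clamping the day to the end of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


@dataclass(frozen=True)
class BroadcastTimeline:
    broadcast: date
    downtime_earliest: date  # downtime starts no earlier than 15 days after the broadcast
    downtime_latest: date  # ...and no later than one month after the broadcast
    suspend_target: date  # aim: status "suspended" in GOCDB within 6 weeks
    suspend_target_extended: date  # ...or within 8 weeks


def plan_timeline(broadcast: date) -> BroadcastTimeline:
    """Derive the step 2 deadlines from the broadcast date (hypothetical helper)."""
    return BroadcastTimeline(
        broadcast=broadcast,
        downtime_earliest=broadcast + timedelta(days=15),
        downtime_latest=add_months(broadcast, 1),
        suspend_target=broadcast + timedelta(weeks=6),
        suspend_target_extended=broadcast + timedelta(weeks=8),
    )


def downtime_start_ok(timeline: BroadcastTimeline, start: date) -> bool:
    """True if a proposed downtime start date falls inside the allowed window."""
    return timeline.downtime_earliest <= start <= timeline.downtime_latest
```

For a broadcast on 1 March 2024, this gives a downtime start window from 16 March to 1 April 2024 and a suspension target of 12 April (or 26 April) 2024. Because a calendar month is always longer than 15 days, the window is never empty.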