Document control

Area: CHM
Procedure status: FINALIZED
Owner: Matthew Viljoen
Approval status: APPROVED
Approved version and date: v. 13
Statement: Procedure describing how a change should be registered, approved, and reviewed after implementation but before deployment.
Dissemination Level: TLP:WHITE - Public

Overview and Scope

This procedure describes the lifecycle of all changes affecting (either directly or indirectly) EGI-branded services listed within the EGI Service Catalogue, as well as the transition of all major changes and new services coming from SPM. It covers registering, assessing, approving and reviewing Change Requests (CRs), managing pre-approved or 'standard' CRs, and managing emergency changes.

The tool for managing the lifecycle of CRs is Jira. It supports the entire lifecycle of change requests, from registration to searching historical CRs.

The list of pre-approved or 'standard' CRs for a service shall be created and maintained in EGI CHM - Standard Changes. Its existence shall be made known to all operational staff for the service.

Some changes brought to Change Management for approval will become a release in Release and Deployment Management (RDM), with CHM1 triggering the appropriate RDM procedure available under RDM Procedures. The relevant procedure is selected according to an assessment of which release procedure is appropriate for the specific change.

Federated Change Management

The EGI Change Management is a centralized process for the EGI Federation. If EGI Resource Provider organizations are delivering EGI-branded services and are already running their own internal Change Management process, they may either choose to use the EGI Change Management process or continue to use their own existing process. In the latter case, their existing process needs to meet the minimum requirements of ISO/IEC 20000-1:2018 clause 9.2 (Change Management). In addition, EGI Resource Providers should agree to a lightweight audit run by EGI Foundation to verify that these minimum requirements are being met. Finally, if EGI Resource Providers are running their own Change Management process and are planning a change that has the potential to impact other EGI-branded services, then the EGI Change Management process should be informed in advance by submitting a ticket in Jira.

Definitions

Please refer to the EGI Glossary for the definitions of the terms used in this procedure.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Emergency change

A Change that must be introduced as soon as possible to resolve a Major Incident or to implement a security patch.

Standard change 

A recurrent, well-known Change that has been proceduralized to follow a pre-defined, relatively risk-free path and is the accepted response to a specific requirement or set of circumstances, where authorization by the CAB is effectively given in advance of implementation.

Normal change

Any other type of change that is neither an Emergency change nor a Standard change.
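The three definitions above can be read as a simple decision order. The following is a minimal sketch, not part of the procedure itself: the function name, its parameters, and the idea of passing the conditions as booleans are assumptions made for illustration only.

```python
from enum import Enum


class ChangeType(Enum):
    EMERGENCY = "Emergency change"
    STANDARD = "Standard change"
    NORMAL = "Normal change"


def classify_change(resolves_major_incident: bool,
                    urgent_security_patch: bool,
                    on_standard_list: bool) -> ChangeType:
    """Hypothetical helper mirroring the definitions in this section."""
    # Emergency: must be introduced as soon as possible to resolve a
    # Major Incident or to implement a security patch.
    if resolves_major_incident or urgent_security_patch:
        return ChangeType.EMERGENCY
    # Standard: recurrent, well-known change whose CAB authorization was
    # effectively given in advance (assumed here to mean it is on the
    # list of standard changes).
    if on_standard_list:
        return ChangeType.STANDARD
    # Normal: anything that is neither Emergency nor Standard.
    return ChangeType.NORMAL
```

For example, `classify_change(False, False, True)` yields `ChangeType.STANDARD`, while a change that matches none of the conditions falls through to `ChangeType.NORMAL`.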

Entities involved in the procedure

  • Change Requester: The person who requests the change, wants it to happen, and is following it through from initial planning to implementation and review.
  • Change Owner: The person in charge of the change, following it through from initial planning to implementation and review. Initiates the Release procedure by marking a CHM ticket as ready to be released, assigning it to RDM.
  • CHM Staff: Support the Change Owner throughout the process.
  • Release Owner: The person in charge of the release, who controls and coordinates the activities in the lifecycle of a specific release.
  • RDM Staff: Support the Release Owner throughout the process and may provide further people for testing the service or service component. Usually, it is the SDIS team.
  • Service Supplier: Team responsible for the actual development, release, and deployment of the service or service component.
  • CAB: The Change Advisory Board is a group of technical and strategic experts (membership decided by SSB) who are tasked with reviewing proposed change requests and approving or rejecting them.

Triggers

The process is triggered when a new change is determined to be high risk or otherwise would benefit from the CHM process.  At this point, the CR is created within Jira.

Normal change workflow

Step 1
Responsible: Change Requester
Action: Creation of a Change Request (CR) ticket in Jira.
Comment: A new ticket is created in Jira. The CR ticket asks standard questions about which service is affected, the type of change, the testing that has been carried out, the potential impact if the change is unsuccessful, and rollback plans (if possible). This helps the CAB review the change.

Step 2
Responsible: Change Requester
Action: The risk level is calculated and added to the CR ticket.
Comment: Risk results from the Impact and Likelihood of the change going wrong. These values are defined in the CHM Risks. Risk is calculated in the Jira ticket as the potential Impact (value of 1-4) multiplied by the Likelihood (value of 1-4) of the change going wrong.

LOW or MEDIUM risk changes (score <= 4) do not need to be submitted and approved unless the Change Owner feels that there is a benefit in doing so, or if the change has the possibility to directly affect other services under the scope of Change Management.

While considering risks, the impact of each change should be considered on service delivery, customers, users and other interested parties, policies and plans, capacity, service availability and continuity, information security, other services, and current requests for change, as well as the releases and plans for deployment.

Any change with a high risk of impacting the business outcome of the EGI Federation should be additionally approved by SPM via the procedure SPM1 Add, Change, Retire a service in the service portfolio (access currently limited).

Step 2a
Responsible: CHM Staff
Action: The ticket is assigned to the Change Owner, who checks the change and validates the risk assessment.

Step 3
Responsible: CAB (including the Change Owner)
Action: Changes with risk level HIGH or EXTREME (score > 4), candidates for standard changes (see CHM2 Maintain the list, descriptions and step-by-step workflows for well-known and recurring changes), and lower-risk changes with the potential to affect other services are reviewed and approved.

An assessment according to the Change Management Policy is conducted to decide whether the change should lead to an update to the service portfolio.
Comment: The CAB meets, either regularly or on an ad-hoc basis in response to an important change, to review the CR. The Change Requester attends the CAB meeting to answer any questions or provide clarification about the change. If the CAB is satisfied that the CR has been adequately prepared, approval is granted and recorded in Jira.

The Change Owner, with the agreement of the Change Requester and CAB, decides whether the change will be implemented as a regular release (a release of a higher-risk change or part of multiple changes) or a lightweight release (a release of a lower-impact change).

For changes to the service portfolio, e.g. major new features or changes to the status of a service (e.g. from beta to production), SPM1 Add, Change, Retire a service in the service portfolio (access currently limited) must be triggered by CHM Staff.

Step 4
Responsible: Service Supplier and RDM Staff
Action: Implement the change by following the appropriate RDM process: RDM2 Regular release process or RDM3 Lightweight release process.
Comment: Updates are made to the same Jira CR ticket.

The Jira workflow makes it clear how these steps proceed.

Step 5
Responsible: CAB
Action: The CAB meets and reviews pending changes, updating and closing the corresponding Jira CR tickets.
Comment: The CAB meeting should be recorded in Indico, with minutes recording the attendees, the tickets updated and any other discussions.

Once the change is implemented, after a suitable period of time, the change shall undergo a post-implementation review (by adding a comment to the Jira ticket) and be closed by the CAB. This review should be done using input provided by the Change Owner and includes assigning the quality of the change to the Jira ticket (see Quality of Change definition).

The implementation date of the change should be verified, and updated if it was different from the planned date. Finally, the Jira ticket corresponding to the change is closed but remains searchable for future reference.
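The risk calculation in step 2 and the review threshold in step 3 can be sketched as follows. This is an illustrative sketch only: the function and parameter names are assumptions, and the boundaries between LOW and MEDIUM and between HIGH and EXTREME are deliberately left out because they are defined in the CHM Risks, not here.

```python
def risk_score(impact: int, likelihood: int) -> int:
    """Impact (1-4) multiplied by Likelihood (1-4), as described in step 2."""
    for name, value in (("impact", impact), ("likelihood", likelihood)):
        if not 1 <= value <= 4:
            raise ValueError(f"{name} must be between 1 and 4, got {value}")
    return impact * likelihood


def needs_cab_review(impact: int,
                     likelihood: int,
                     affects_other_services: bool = False,
                     standard_change_candidate: bool = False,
                     owner_requests_review: bool = False) -> bool:
    """Hypothetical helper: True if the CR should go to the CAB (step 3)."""
    # Scores above 4 correspond to HIGH or EXTREME risk.
    if risk_score(impact, likelihood) > 4:
        return True
    # Lower-risk changes are still reviewed if they may affect other
    # services, are candidates for standard changes, or the Change Owner
    # sees a benefit in submitting them.
    return (affects_other_services
            or standard_change_candidate
            or owner_requests_review)
```

With these assumptions, an Impact of 2 and a Likelihood of 2 give a score of 4, which does not require CAB review on its own, whereas an Impact of 2 and a Likelihood of 3 give a score of 6, which does.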

Standard change workflow

A Standard Change is a recurrent, well-known change that has previously been submitted to and approved by the CAB as a normal change (see the procedure above). Managing the list of standard changes (and further information about suitable changes that may be considered as candidates for standard changes) is described in CHM2 Maintain the list, descriptions and step-by-step workflows for well-known and recurring changes. The workflow when implementing standard changes is as follows:

Step 1
Responsible: Change Requester
Action: Creates a Change Request (CR) ticket in Jira and marks it as a Standard Change, referring to the name of the change as listed in the wiki.
Comment: A new ticket is created in the Jira queue corresponding to the correct service, with the planned date of the change. A completed CR document is not required for Standard Changes.

Responsible: CHM Staff
Action: CHM Staff assigns the ticket to the Change Owner, who checks the change.

Step 2
Responsible: Change Owner
Action: The change is considered and either approved or rejected. If it is rejected, the procedure terminates.
Comment: This can be done through discussions between the Service Supplier and the Change Owner. If necessary, details of the discussion can be added to the Jira ticket.

Step 3
Responsible: Service Supplier and RDM Staff
Action: The change is implemented, updating the same Jira CR ticket. Follow the RDM3 Lightweight release process.
Comment: The Jira workflow makes it clear how these steps proceed.

Step 5
Responsible: CAB
Action: The change is reviewed, and the Jira CR ticket is updated and closed.
Comment: The CAB meeting should be recorded in Indico, with minutes recording the attendees, the tickets updated and any other discussions.

Once the change is implemented, after a suitable period of time, the change shall undergo a post-implementation review (by adding a comment to the Jira ticket) and be closed by the CAB. This review should be done using input provided by the Change Owner and includes assigning the quality of the change to the Jira ticket (see Quality of Change definition).

The implementation date of the change should be verified, and updated if it was different from the planned date. Finally, the Jira ticket corresponding to the change is closed but remains searchable for future reference.

If the change was not successful, the change should be removed from the list of Standard Changes, and any subsequent change similar to it should be submitted as a Normal Change in the usual way (see procedure above).
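The rule that an unsuccessful standard change loses its pre-approved status can be sketched as follows. This is a minimal illustration that assumes the list of Standard Changes is held as a set of names; the function names and the example change names are hypothetical, not entries from the real list.

```python
from __future__ import annotations


def review_standard_change(standard_changes: set[str],
                           change_name: str,
                           successful: bool) -> set[str]:
    """Return the list of Standard Changes after a post-implementation review."""
    if successful:
        return set(standard_changes)
    # An unsuccessful change is removed from the list of Standard Changes.
    return set(standard_changes) - {change_name}


def submission_route(standard_changes: set[str], change_name: str) -> str:
    """Similar future changes follow the Normal workflow once removed."""
    return "Standard Change" if change_name in standard_changes else "Normal Change"


# Hypothetical example names, for illustration only.
current = {"example-certificate-renewal", "example-minor-config-update"}
updated = review_standard_change(current, "example-minor-config-update", successful=False)
```

After the failed review in this example, `submission_route(updated, "example-minor-config-update")` returns "Normal Change", while the other entry is still handled as a Standard Change.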

Emergency change workflow

An Emergency Change is one that needs to be done to address a critical situation.  In such circumstances, it may not be practical to follow the Change Management procedure above.  For example, there may not be time to get sign-off from Change Stakeholders or convene the CAB to discuss and approve the change.  However, it is still important for the change to be recorded.  Such information will be used in a post-implementation review or a serious incident review.

Core services

Step 1
Responsible: Change Requester
Action: Creation of a change ticket in Jira.
Comment: A new ticket is created in Jira. The CR ticket asks standard questions about which service is affected, the type of change, the testing that has been carried out (if possible), the potential impact if the change is unsuccessful, and rollback plans (if possible). The Emergency type of change should be selected.

Step 2
Responsible: CHM Staff
Action: Assign the ticket to the Change Owner, who validates the change.

Step 3
Responsible: Change Owner
Action: The change is considered and either approved or rejected. If it is rejected, the procedure terminates.
Comment: The assessment mainly consists of acknowledging that there is a need for an emergency release, accepting the ticket as an Emergency Change, and triggering the RDM1 Emergency release process.

Step 4
Responsible: Service Supplier and RDM Staff
Action: The change is implemented, updating the same Jira CR ticket.

Follow the RDM1 Emergency release process.
Comment: The change is implemented after as much consideration of the risks and rollback scenarios as is possible given the emergency situation. Ideally this should be done by the Change Implementer consulting with another member of staff with knowledge of the service.

Step 5
Responsible: Change Owner
Action: Register a Major Incident by following ISRM4 Classifying and managing major incident (access currently limited) if necessary.
Comment: This applies if there is a possibility of the change becoming a Major Incident.

Step 6
Responsible: CAB
Action: The change is reviewed, and the Jira CR ticket is updated and closed.
Comment: The CAB meeting should be recorded in Indico, with minutes recording the attendees, the tickets updated and any other discussions.

Once the change is implemented, after a suitable period of time, the change shall undergo a post-implementation review (by adding a comment to the Jira ticket) and be closed by the CAB. This review should be done using input provided by the Change Owner and includes assigning the quality of the change to the Jira ticket (see Quality of Change definition).

The implementation date of the change should be verified, and updated if it was different from the planned date. Finally, the Jira ticket corresponding to the change is closed but remains searchable for future reference.
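The closing step shared by the workflows above (review comment, quality of the change, verified implementation date) can be expressed as a small checklist. The sketch below is illustrative only: the `ChangeRecord` class and its field names are assumptions and do not correspond to actual Jira fields.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class ChangeRecord:
    """Hypothetical in-memory view of a CR ticket; not a real Jira schema."""
    planned_date: date
    implemented_date: date | None = None
    review_comment: str | None = None
    quality_of_change: str | None = None


def outstanding_review_items(cr: ChangeRecord) -> list[str]:
    """List what is still missing before the CAB can close the ticket."""
    missing = []
    if cr.implemented_date is None:
        missing.append("verify the implementation date")
    if not cr.review_comment:
        missing.append("add the post-implementation review comment")
    if not cr.quality_of_change:
        missing.append("assign the quality of the change")
    return missing


def date_needs_update(cr: ChangeRecord) -> bool:
    """True if the verified implementation date differs from the planned date."""
    return cr.implemented_date is not None and cr.implemented_date != cr.planned_date
```

A ticket with an empty list of outstanding items is ready to be closed; `date_needs_update` flags the case where the recorded date has to be corrected first.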


Proposal to update the procedure to include middleware specifics.

Middleware

As a prerequisite, a GGUS ticket is expected to have been created to report a Major Incident (cf. ISRM4). The Product team records in that ticket the outcomes of its investigation and the proposed solution, i.e. a new product release. The Product team asks the UMD team to include the new product release in the next UMD release. The UMD team creates the new release, and affected sites are asked to update to the new product release (GGUS tickets to each site?? - TBC). Finally, the GGUS ticket is closed.

Step 1
Responsible: Change Requester (member of Product Team)
Action: Provides the outcomes of the Product team investigation and the proposed solution, i.e. a new product release.
Comment: The GGUS ticket is updated with the Product team findings. Ticket category is set to Incident. Priority is set to Top priority.
Prerequisites, if any: A GGUS ticket has been created to report a possible Major Incident due to a critical bug or critical security patch.

Step 2
Responsible: ChaRDM staff
Action: Liaise with the UMD team for inclusion of the new product release in the next UMD release.

Step 3
Responsible: UMD team
Action: Creates and publishes the new release.
Comment: A new release of the specific middleware is made available.

Step 4
Responsible: ChaRDM staff
Action: Liaise with affected sites and inform them about the availability of the new middleware release.
Comment: Affected sites are asked to update to the new product release. Depending on the magnitude of the problem, a GGUS ticket can be created to track updates at each affected site.

Step 5
Responsible: ChaRDM staff
Action: The GGUS ticket is closed.
Comment: Once the affected sites confirm that the new release fixed the reported problem, the GGUS ticket is closed and the matter is considered resolved.


Schedule of Changes

All changes that have a CR associated with them, both past and planned, are listed in Jira.  As such, it is possible to obtain a list of when past changes were carried out, as well as obtaining a list of future changes along with their planned dates, by inspecting the list of current changes in Jira.
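Deriving a schedule from the list of CRs amounts to splitting it by date. The sketch below assumes the CRs have already been exported from Jira as pairs of ticket key and date; the function name and the ticket keys are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from datetime import date


def split_schedule(changes: list[tuple[str, date]],
                   today: date) -> tuple[list[tuple[str, date]], list[tuple[str, date]]]:
    """Split CRs into past and planned changes, each ordered by date."""
    ordered = sorted(changes, key=lambda change: change[1])
    past = [change for change in ordered if change[1] < today]
    planned = [change for change in ordered if change[1] >= today]
    return past, planned


# Hypothetical ticket keys and dates, for illustration only.
example_changes = [
    ("EXAMPLE-3", date(2024, 6, 1)),
    ("EXAMPLE-1", date(2024, 1, 15)),
    ("EXAMPLE-2", date(2024, 3, 20)),
]
past, planned = split_schedule(example_changes, today=date(2024, 4, 1))
```

In this example, the two tickets dated before 1 April 2024 end up in `past` and the remaining one in `planned`.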
