Document control
Overview and scope
This procedure describes the lifecycle of emergency changes affecting (either directly or indirectly) the EGI-branded services listed within the EGI Service Catalogue. This procedure includes registering, assessing, approving and reviewing Change Requests (CRs), as well as planning and implementation of approved emergency changes into production.
The tool for managing the lifecycle of CRs is Jira. It supports the entire lifecycle of change requests from registering to the historical searching of CRs.
Definitions
Please refer to the EGI Glossary for the definitions of the terms used in this procedure.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
Emergency change
A change that must be introduced as soon as possible to resolve a Major Incident or to implement a security patch.
Standard change
A recurrent, well-known change that has been proceduralised to follow a pre-defined, relatively risk-free path, and is the accepted response to a specific requirement or set of circumstances, where authorisation by the CAB is effectively given in advance of implementation.
Normal change
Any other type of change that is neither an Emergency change nor a Standard change.
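The three definitions above amount to a small decision rule, which can be sketched as follows. This is only an illustration: the function name, the parameter names, and the choice to check for an Emergency change before a Standard change are assumptions, not part of the procedure.

```python
from enum import Enum


class ChangeType(Enum):
    EMERGENCY = "Emergency change"
    STANDARD = "Standard change"
    NORMAL = "Normal change"


def classify_change(resolves_major_incident: bool,
                    urgent_security_patch: bool,
                    preauthorised_by_cab: bool) -> ChangeType:
    """Map the three definitions onto one decision (hypothetical helper)."""
    # Emergency: must be introduced as soon as possible to resolve a
    # Major Incident or to implement a security patch.
    if resolves_major_incident or urgent_security_patch:
        return ChangeType.EMERGENCY
    # Standard: recurrent, well-known change for which CAB authorisation
    # is effectively given in advance.
    if preauthorised_by_cab:
        return ChangeType.STANDARD
    # Normal: anything that is neither Emergency nor Standard.
    return ChangeType.NORMAL
```

For example, `classify_change(False, False, True)` yields `ChangeType.STANDARD`, while a change with none of the three properties falls through to `ChangeType.NORMAL`.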
Entities involved in the procedure
- Change Requester (CR): The person who requests the change, wants it to happen, and is following it through from initial planning to implementation and review.
- Change and Release Owner (CRO): The person in charge of the change and release, following it through from initial planning to implementation and review. Initiates the Release procedure by marking a ChaRDM ticket as ready to be released, and controls and coordinates the activities in the lifecycle of a specific release.
- ChaRDM Staff: Support the Change and Release Owner throughout the process, and may provide further people for testing the service or service component. Usually, this is the SDIS team.
- CRM Staff: Are informed of the release so that they can interface with customers if need be.
- NOC-Managers: Are informed regarding the emergency release of the service or service component.
- Service Supplier: Team responsible for the actual development, release, and deployment of the service or service component.
- CAB: The Change Advisory Board is a group of technical and strategic experts (membership decided by the SSB) tasked with reviewing proposed change requests and approving or rejecting them.
Triggers
The process is triggered when a new change is determined to be critical in resolving a Major Incident or implementing a security patch. At this point, the Change Request is recorded in a Jira ticket for Core services, while for middleware the GGUS helpdesk is used instead.
Emergency change workflow
An Emergency Change is a change that needs to be made to address a critical situation. In such circumstances, it may not be practical to follow either the Normal or the Standard Change workflow (ChaRDM.PR.02 and ChaRDM.PR.03). For example, there may not be time to get sign-off from Change Stakeholders or to convene the CAB to discuss and approve the change. However, it is still important for the change to be recorded. Such information will be used in a post-implementation review or a serious incident review.
If the release has to be cancelled at any point, the workflow resumes at step 8.
Core services
Step | Responsible | Action | Comment | Prerequisites, if any |
1 | Change Requester | Creation of a Change Request (CR) ticket in Jira | A new ticket is created in Jira. The CR ticket asks standard questions about which service is affected, the type of change, the testing that has been carried out (if possible), the potential impact if the change is unsuccessful, and the rollback plans (if possible). The Emergency type of change should be selected. | |
2 | ChaRDM staff | The ticket is assigned to the Change and Release Owner who validates the change | ||
3 | Change and Release Owner | The change is considered and either approved or rejected. If it is rejected, the procedure terminates here. | The assessment consists mainly of acknowledging that there is a need for an emergency release and accepting the ticket as an Emergency Change. | |
4 | Change and Release Owner ChaRDM staff | Ensures that the Jira ticket contains the following information, and interacts with the Service Supplier to collect it:
This information is also meant to capture the CI baseline. Gives approval for implementation by updating the ticket. | The change is implemented after as much consideration of the risks and rollback scenarios as is possible given the emergency situation. Ideally, this is done by the Service Supplier (usually) in consultation with another member of staff who knows the service. | The change has been accepted for implementation as an emergency release |
5 | Service Supplier | Registers a downtime for the service in GOCDB, using 'at risk' if no downtime is expected. | ||
6 | Change and Release Owner ChaRDM staff |
| ||
7 | Service Supplier | Deploys the release during the time slot recorded in GOCDB. | ||
8 | Change and Release Owner ChaRDM staff | If the release took place, waits one week for confirmation that the new release fixed the critical problem. Comments on the Jira ticket to provide feedback about the release. | ||
9 | Change and Release Owner | Registers a Major Incident, if necessary, by following ISRM4 Classifying and managing major incident (access currently limited) | The change addresses a critical situation following a Major Incident | |
10 | CAB | The change is reviewed by updating the Jira CR ticket, and the ticket is then closed. The CAB meeting should be recorded in Indico, with minutes recording the attendees, the tickets updated and any other discussions. | Once the change is implemented, and after a suitable period of time, the change shall undergo a post-implementation review (by adding a comment to the Jira ticket) and be closed by the CAB. This review should use input provided by the Change and Release Owner and includes assigning the quality of the change to the Jira ticket (see Quality of Change). The implementation date of the change should be verified, and updated if it differs from the planned date. Finally, the Jira ticket corresponding to the change is closed, but remains searchable for future reference. |
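The control flow of the Core services workflow (termination on rejection at step 3, and a cancelled release resuming at step 8) can be sketched as a simple step sequencer. This is a minimal illustration only; the function `steps_followed` and its parameters are hypothetical and do not correspond to any Jira feature.

```python
from typing import List, Optional

LAST_STEP = 10
RESUME_STEP = 8  # per the procedure, a cancelled release resumes at step 8


def steps_followed(approved: bool = True,
                   cancelled_after: Optional[int] = None) -> List[int]:
    """Return the step numbers visited in one run of the workflow.

    approved        -- outcome of the assessment in step 3
    cancelled_after -- step after which the release is cancelled, if any
    """
    visited: List[int] = []
    step = 1
    while step <= LAST_STEP:
        visited.append(step)
        if step == 3 and not approved:
            break  # rejected in step 3: the procedure terminates here
        if cancelled_after == step and step < RESUME_STEP:
            step = RESUME_STEP  # skip the remaining steps before step 8
        else:
            step += 1
    return visited
```

With these assumptions, `steps_followed(cancelled_after=5)` visits steps 1 to 5 and then 8 to 10, whereas `steps_followed(approved=False)` stops after step 3.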
Middleware
As a prerequisite, a GGUS ticket is expected to have been created to report a Major Incident (cf. ISRM4). In this ticket, the Product team provides the outcomes of its investigation and the proposed solution, i.e. a new product release. The Product team asks the UMD team to include the new product release in the next UMD release. The UMD team creates the new release, and affected sites are asked to update to the new product release (GGUS tickets to each site: TBC). Finally, the GGUS ticket is closed.
Step | Responsible | Action | Comment | Prerequisites, if any |
1 | CR (member of Product Team) | Provides the outcomes of the Product team investigation and the proposed solution, i.e. a new product release | The GGUS ticket is updated with the Product team findings. The ticket category is set to Incident and the priority to Top priority. | A GGUS ticket has been created to report a possible Major Incident due to a critical bug or critical security patch |
2 | ChaRDM staff | Liaises with the UMD team for inclusion of the new product release in the next UMD release. | ||
3 | UMD team | Creates and publishes the new release | A new release of the specific middleware is made available | |
4 | ChaRDM staff | Liaises with affected sites and informs them about the availability of the new middleware release | Affected sites are asked to update to the new product release. Depending on the magnitude of the problem, a GGUS ticket can be created to track updates at each affected site. | |
5 | ChaRDM staff | Closes the GGUS ticket | Once the affected sites confirm that the new release fixed the reported problem, the GGUS ticket is closed and the matter is considered resolved. |
Schedule of changes
All changes that have a Change Request associated with them, both past and planned, are listed in Jira. As such, it is possible to obtain a list of when past changes were carried out, as well as obtaining a list of future changes along with their planned dates, by inspecting the list of current changes in Jira.
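As a rough illustration of how such a schedule could be derived once the change records have been exported from Jira, the sketch below splits a list of records into past and planned changes by date. The `ChangeRecord` fields and the ticket keys are hypothetical, and no Jira API is used or implied.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass
class ChangeRecord:
    key: str             # hypothetical ticket key
    summary: str
    planned_date: date   # planned (or verified) implementation date


def split_schedule(records: List[ChangeRecord],
                   today: date) -> Tuple[List[ChangeRecord], List[ChangeRecord]]:
    """Return (past, planned) changes, each sorted by date."""
    ordered = sorted(records, key=lambda r: r.planned_date)
    past = [r for r in ordered if r.planned_date < today]
    planned = [r for r in ordered if r.planned_date >= today]
    return past, planned


# Usage with made-up records
records = [
    ChangeRecord("EXAMPLE-2", "Security patch", date(2024, 6, 1)),
    ChangeRecord("EXAMPLE-1", "Fix after Major Incident", date(2024, 1, 15)),
    ChangeRecord("EXAMPLE-3", "Planned upgrade", date(2024, 9, 30)),
]
past, planned = split_schedule(records, today=date(2024, 6, 1))
```

In this sketch a change dated today counts as planned; that boundary is a design choice rather than something the procedure specifies.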