- Created by Valeria Ardizzone, last modified by Catalin Condurache on 2024 Jul 25
Document control
Area | CHM |
---|---|
Procedure status | FINALIZED |
Owner | Matthew Viljoen |
Approval status | APPROVED |
Approved version and date | v. 13 |
Statement | Procedure describing how a change should be registered, approved, and reviewed after implementation but before deployment. |
Dissemination Level | TLP:WHITE - Public |
Overview and Scope

This procedure describes the lifecycle of all changes affecting (either directly or indirectly) EGI-branded services listed within the EGI Service Catalogue, as well as the transition of all major changes and new services coming from SPM. It covers registering, assessing, approving and reviewing Change Requests (CRs), managing pre-approved or 'standard' CRs, and managing emergency changes.

The tool for managing the lifecycle of CRs is Jira, which supports the entire lifecycle of change requests from registration to the historical searching of CRs.

The list of pre-approved or 'standard' CRs for a service shall be created and maintained in the EGI CHM - Standard Changes. Its existence shall be made known to all operational staff for the service.

Some changes brought to Change Management for approval will become a release in Release and Deployment Management (RDM), with CHM1 triggering the appropriate RDM procedure available under RDM Procedures. The relevant procedure is selected according to the assessment of the appropriate release procedure for the specific change.

Federated Change Management

EGI Change Management is a centralized process for the EGI Federation. If EGI Resource Provider organizations are delivering EGI-branded services and already run their own internal Change Management process, they may either choose to use the EGI Change Management process or continue to use their own existing process. In the latter case, their existing process needs to meet the minimum requirements of ISO/IEC 20000-1:2018 clause 9.2 (Change Management). In addition, EGI Resource Providers should agree to a lightweight audit run by EGI Foundation to verify that these minimum requirements are being met.

Finally, if EGI Resource Providers run their own Change Management process and are planning a change that has the potential to impact other EGI-branded services, then the EGI Change Management process should be informed in advance via submission of a ticket to Jira.

Definitions

Please refer to the EGI Glossary for the definitions of the terms used in this procedure.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Emergency change: A Change that must be introduced as soon as possible to resolve a Major Incident or to implement a security patch.
Standard change: A recurrent, well-known Change that has been proceduralized to follow a pre-defined, relatively risk-free path and is the accepted response to a specific requirement or set of circumstances, where authorization by the CAB is effectively given in advance of implementation.
Normal change: Any change that is neither an Emergency change nor a Standard change.

Entities involved in the procedure

Service Supplier: Team responsible for the actual development, release, and deployment of the service or service component.

Triggers

The process is triggered when a new change is determined to be high risk or would otherwise benefit from the CHM process. At this point, the CR is created within Jira.

Normal change workflow

Step# | Responsible | Action | Comment |
---|---|---|---|
1 | Change Requester | Creation of a Change Request (CR) ticket in Jira | Creation of a new ticket in Jira. The CR ticket asks standard questions about which service is affected, the type of change, the testing that has been carried out and the potential impact if the change is unsuccessful, in addition to rollback plans (if possible). This helps with the review of the change by the CAB. |
2 | Change Requester | The risk level is calculated and added to the CR ticket | Risk results from the Impact and Likelihood of the change going wrong. These values are defined in the CHM Risks. The risk score is calculated in the Jira ticket as the potential Impact (value of 1-4) multiplied by the Likelihood (value of 1-4) of the change going wrong. LOW or MEDIUM risk changes (score <= 4) do not need to be submitted and approved unless the Change Owner feels that there is a benefit in doing so, or if the change has the possibility to directly affect other services under the scope of Change Management. While considering risks, the impact of each change should be considered on service delivery, customers, users and other interested parties, policies and plans, capacity, service availability and continuity, information security, other services, and current requests for change, as well as the releases and plans for deployment. Any change with a high risk of impacting the business outcome of the EGI Federation should additionally be approved by SPM via the procedure SPM1 Add, Change, Retire a service in the service portfolio (access currently limited). |
2a | CHM staff | The ticket is assigned to the Change Owner, who checks the change and validates the risk assessment. | |
3 | CAB (including the Change Owner) | Changes with risk level HIGH or EXTREME (score > 4), candidates for standard changes (see CHM2 Maintain the list, descriptions and step-by-step workflows for well-known and recurring changes), and lower risk changes having the potential to affect other services are reviewed and approved. | An assessment according to the Change Management Policy decides whether the change should lead to an update to the service portfolio. The CAB meets, either regularly or on an ad-hoc basis in response to an important change, to review the CR. The Change Requester attends the CAB meeting to answer any questions or provide clarification about the change. If the CAB is satisfied that the CR has been adequately prepared, approval is granted and recorded in Jira. The Change Owner, with the agreement of the Change Requester and CAB, decides whether the change will be implemented as a regular release (a release of a higher risk change or part of multiple changes) or a lightweight release (a release of a lower impact change). For changes to the service portfolio, e.g. major new features or changes to the status of a service (e.g. from beta to production), SPM1 Add, Change, Retire a service in the service portfolio (access currently limited) must be triggered by CHM Staff. |
4 | Service Supplier and RDM Staff | Implement the change by following the appropriate RDM process: the RDM2 Regular release process or the RDM3 Lightweight release process. | Updates are done to the same Jira CR ticket. The Jira workflow makes it clear how these steps proceed. |
5 | CAB | The CAB meets and reviews pending changes; the Jira CR ticket is updated and closed. | The CAB meeting should be recorded in Indico, with minutes recording the attendees, the tickets updated and any other discussions. Once the change is implemented, after a suitable period of time, the change shall undergo a post-implementation review (by adding a comment to the Jira ticket) and be closed by the CAB. This review should be done using input provided by the Change Owner and includes assigning the quality of the change to the Jira ticket (see Quality of Change definition). The implementation date of the change should be verified, and updated if it differed from the planned date. Finally, the Jira ticket corresponding to the change is closed, but remains searchable for future reference. |

Standard change workflow

A Standard Change is a recurrent, well-known Change that has been submitted and approved by the CAB as a normal change (see the procedure above). Managing the list of standard changes (and further information about suitable changes that may be considered as candidates for standard changes) is described in CHM2 Maintain the list, descriptions and step-by-step workflows for well-known and recurring changes.

The workflow when implementing standard changes is as follows:

Step# | Responsible | Action | Comment |
---|---|---|---|
1 | Change Requester | Creation of a Change Request (CR) ticket in Jira, marked as a Standard Change and referring to the name of the change as listed in the wiki | Creation of a new ticket in the Jira queue corresponding to the correct service, with the planned date of the change. A completed CR document is not required for Standard Changes. |
 | CHM Staff | CHM staff assign the ticket to the Change Owner, who checks the change | |
2 | Change Owner | The change is considered and either approved or rejected. If it is rejected, the procedure terminates. | This can be done by discussions between the Service Supplier and the Change Owner. If necessary, details of the discussion can be added to the Jira ticket. |
3 | Service Supplier and RDM Staff | The change is implemented, updating the same Jira CR ticket. | Follow the RDM3 Lightweight release process. The Jira workflow makes it clear how these steps proceed. |
5 | CAB | The change is reviewed, and the Jira CR ticket is updated and closed. | The CAB meeting should be recorded in Indico, with minutes recording the attendees, the tickets updated and any other discussions. Once the change is implemented, after a suitable period of time, the change shall undergo a post-implementation review (by adding a comment to the Jira ticket) and be closed by the CAB. This review should be done using input provided by the Change Owner and includes assigning the quality of the change to the Jira ticket (see Quality of Change definition). The implementation date of the change should be verified, and updated if it differed from the planned date. Finally, the Jira ticket corresponding to the change is closed, but remains searchable for future reference. |

If the change was not successful, it should be removed from the list of Standard Changes, and any subsequent similar change should be submitted as a Normal Change in the usual way (see procedure above).

Emergency change workflow

An Emergency Change is one that needs to be done to address a critical situation. In such circumstances, it may not be practical to follow the Change Management procedure above. For example, there may not be time to get sign-off from Change Stakeholders or to convene the CAB to discuss and approve the change. However, it is still important for the change to be recorded. Such information will be used in a post-implementation review or a serious incident review.

Core services

Step# | Responsible | Action | Comment |
---|---|---|---|
1 | Change Requester | Creation of a change ticket in Jira | Creation of a new ticket in Jira. The CR ticket asks standard questions about which service is affected, the type of change, the testing that has been carried out (if possible) and the potential impact if the change is unsuccessful, in addition to rollback plans (if possible). The Emergency type of change should be selected. |
2 | CHM Staff | Assign the ticket to the Change Owner, who validates the change | |
3 | Change Owner | The change is considered and either approved or rejected. If it is rejected, the procedure terminates. | The assessment mainly consists of acknowledging that there is a need for an emergency release, accepting the ticket as an Emergency Change, and triggering the RDM1 Emergency release process. |
4 | Service Supplier and RDM Staff | The change is implemented, updating the same Jira CR ticket. | Follow the RDM1 Emergency release process. The change is implemented after as much consideration of the risks and rollback scenarios as is possible given the emergency situation. Ideally this should be done by the Change Implementer consulting with another member of staff with knowledge of the service. |
5 | Change Owner | Register a Major Incident by following ISRM4 Classifying and managing major incident (access currently limited) if necessary | If there is a possibility of the change becoming a Major Incident |
6 | CAB | The change is reviewed, and the Jira CR ticket is updated and closed. | The CAB meeting should be recorded in Indico, with minutes recording the attendees, the tickets updated and any other discussions. Once the change is implemented, after a suitable period of time, the change shall undergo a post-implementation review (by adding a comment to the Jira ticket) and be closed by the CAB. This review should be done using input provided by the Change Owner and includes assigning the quality of the change to the Jira ticket (see Quality of Change definition). The implementation date of the change should be verified, and updated if it differed from the planned date. Finally, the Jira ticket corresponding to the change is closed, but remains searchable for future reference. |

Middleware

Proposal to update the procedure to include middleware specifics.

As a prerequisite, a GGUS ticket is expected to have been created to report a Major Incident (cf. ISRM4). The Product team records there the outcomes of its investigation and the proposed solution, i.e. a new product release. The Product team asks the UMD team to include the new product release in the next UMD release. The UMD team creates the new release, and affected sites are asked to update to the new product release (GGUS tickets to each site: TBC). Finally, the GGUS ticket is closed.

Step | Responsible | Action | Comment | Prerequisites, if any |
---|---|---|---|---|
1 | CR (member of Product Team) | Provides the outcomes of the Product team investigation and the proposed solution, i.e. a new product release | The GGUS ticket is updated with the Product team findings. Ticket category is set as Incident. Priority is set as Top priority. | A GGUS ticket is created to report a possible Major Incident due to a critical bug or critical security patch |
2 | ChaRDM staff | Liaise with the UMD team for inclusion of the new product release in the next UMD release. | | |
3 | UMD team | Creates and publishes the new release | A new release of the specific middleware is made available | |
4 | ChaRDM staff | Liaise with affected sites and inform them about the availability of the new middleware release | Affected sites are asked to update to the new product release. Depending on the magnitude of the problem, a GGUS ticket can be created to track updates at each affected site. | |
5 | ChaRDM staff | GGUS ticket is closed | Once the affected sites confirm that the new release fixed the reported problem, the GGUS ticket is closed and the matter is considered resolved. | |

Schedule of Changes

All changes that have a CR associated with them, both past and planned, are listed in Jira. It is therefore possible to obtain a list of when past changes were carried out, as well as a list of future changes along with their planned dates, by inspecting the list of current changes in Jira.
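
The risk calculation in step 2 of the normal change workflow can be sketched in a few lines of Python. This is a minimal illustration only, not part of the procedure or of the Jira configuration: the function names `risk_score` and `needs_cab_review` and the two boolean flags are hypothetical, and only the 1-4 ranges and the threshold of 4 are taken from the workflow description.

```python
def risk_score(impact: int, likelihood: int) -> int:
    """Return Impact x Likelihood, each rated 1-4 as in the workflow description."""
    for name, value in (("impact", impact), ("likelihood", likelihood)):
        if not 1 <= value <= 4:
            raise ValueError(f"{name} must be between 1 and 4, got {value}")
    return impact * likelihood


def needs_cab_review(
    impact: int,
    likelihood: int,
    affects_other_services: bool = False,  # hypothetical flag
    owner_requests_review: bool = False,   # hypothetical flag
) -> bool:
    """Hypothetical helper: True when the change should be submitted for approval.

    A score above 4 (HIGH or EXTREME) always needs review. Lower scores need it
    only if other services may be affected or the Change Owner sees a benefit.
    """
    high_or_extreme = risk_score(impact, likelihood) > 4
    return high_or_extreme or affects_other_services or owner_requests_review


if __name__ == "__main__":
    print(needs_cab_review(2, 2))                               # False (score 4)
    print(needs_cab_review(2, 3))                               # True (score 6)
    print(needs_cab_review(1, 2, affects_other_services=True))  # True (flag set)
```

For example, Impact 2 with Likelihood 2 gives a score of 4 and stays at the LOW or MEDIUM level, while Impact 2 with Likelihood 3 gives a score of 6 and must be reviewed.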