- Created by Matthew Viljoen, last modified by Catalin Condurache on 2024 Nov 22
The EGI Change Management (CHM) Process Introduction and Overview
This is the public homepage of the EGI Change Management Process. Change management within EGI's production IT environment is essential to ensuring high-quality delivery of IT services.
The purpose of the IT Change Management Policy is to manage higher-risk changes in a planned and predictable manner in order to assess risks, assign resources, and minimize any potential negative impact on services. To this end, change owners are required to prepare and submit a Jira ticket with information about the change, which is then considered by the Change Advisory Board (CAB). The CAB is a group of technical and strategic experts, with membership decided by the Services and Solution Board, who are tasked with reviewing proposed change requests and approving or rejecting them.
The CAB meets to assess and approve changes and is coordinated on the egi-cab@mailman.egi.eu mailing list.
Here is a brief introduction to the different change management procedures. More details may be found in the CHM Procedure Pages and CHM Risk Page.
Normal changes
The basic procedure for a normal change is as follows:
- For higher risk changes (score >4), the Change Requester (usually the Service Supplier - see below) opens a Jira ticket. Lower risk changes do not need to be recorded unless the change can affect other services under EGI Change Control, or unless the Change Requester feels that there is benefit from doing so.
- The risk of something going wrong with the change (risk = likelihood × impact) should be recorded in the ticket in preparation for the CAB review. Further details about evaluating risk may be found on the CHM Risk Page.
- If the change is urgent, the Change Requester should send an email to EGI-CAB to convene the CAB, which reviews the change with the Change Requester present. Once the change is approved, the decision is recorded on the ticket (along with the planned intervention date) and the change may proceed.
- The change should be implemented following Release and Deployment Management. After the change, the Change Owner should update the Jira ticket with the intervention date and a comment about the outcome of the change.
- The change is reviewed at the next CAB meeting and the ticket is closed, with the actual intervention date recorded if it differs from the planned intervention date.
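The risk rule in the steps above can be sketched in a few lines of Python. This is only an illustration, not part of the EGI tooling: the function names and the use of positive integer ratings for likelihood and impact are assumptions, and the actual rating scales are defined on the CHM Risk Page. Only the formula (risk = likelihood × impact) and the threshold (score >4) come from the procedure itself.

```python
# Threshold taken from the procedure: a score above 4 is a higher-risk change.
HIGH_RISK_THRESHOLD = 4


def risk_score(likelihood: int, impact: int) -> int:
    """Return risk = likelihood x impact.

    Assumption: both ratings are positive integers; the real scales
    are defined on the CHM Risk Page, not here.
    """
    if likelihood < 1 or impact < 1:
        raise ValueError("likelihood and impact must be positive integers")
    return likelihood * impact


def needs_jira_ticket(
    score: int,
    affects_other_services: bool = False,
    requester_wants_record: bool = False,
) -> bool:
    """Decide whether a change should be recorded in a Jira ticket.

    Higher-risk changes always need a ticket. Lower-risk changes need one
    only if they can affect other services under EGI Change Control or if
    the Change Requester sees benefit in recording them.
    """
    return (
        score > HIGH_RISK_THRESHOLD
        or affects_other_services
        or requester_wants_record
    )
```

For example, under these assumed ratings a change with likelihood 2 and impact 3 scores 6 and needs a ticket, while a change scoring exactly 4 does not, unless it can affect another service under EGI Change Control.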
Standard changes
In addition, repeated changes of a similar type may be approved by the CAB as a standard change. Once a change of that type has been registered as a normal change and executed without problems, subsequent changes do not require explicit approval (or review) by the CAB; it is sufficient for the Service Instance Owner to submit a Jira ticket recording the change and confirming that it is a standard change. After the change, the Service Instance Owner reviews it by adding a comment to the ticket stating whether the change was successful, and then closes the ticket. The list of standard changes is provided below.
Emergency changes
Sometimes changes must be made to address a critical situation (e.g. a patch to fix a newly discovered vulnerability) and there may be insufficient time to follow the normal change procedure. In that case:
- The Change Requester opens a Jira ticket.
- CHM staff approve the change.
- The change should be implemented following Release and Deployment Management (RDM1). After the change, the Change Owner should update the Jira ticket with the intervention date and a comment about the outcome of the change.
- The change is reviewed at the next CAB meeting and the ticket is closed, with the actual intervention date recorded if it differs from the planned intervention date.
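As a rough illustration of how the three procedures relate, the choice between them could be sketched as below. The names `ChangeType` and `select_procedure` are hypothetical, and the order of the checks (emergency first, then standard, otherwise normal) is an assumption; the process description does not define a precedence.

```python
from enum import Enum


class ChangeType(Enum):
    NORMAL = "normal"
    STANDARD = "standard"
    EMERGENCY = "emergency"


def select_procedure(
    is_critical: bool,
    enough_time_for_normal: bool,
    is_approved_standard_type: bool,
) -> ChangeType:
    """Pick a change procedure (illustrative sketch, assumed precedence).

    - Emergency: a critical situation with insufficient time to follow
      the normal procedure.
    - Standard: a type of change the CAB has already approved as standard.
    - Normal: everything else.
    """
    if is_critical and not enough_time_for_normal:
        return ChangeType.EMERGENCY
    if is_approved_standard_type:
        return ChangeType.STANDARD
    return ChangeType.NORMAL
```

A critical change for which there is still enough time falls through to the standard or normal procedure in this sketch, which matches the wording that the emergency route exists for cases where the normal procedure cannot be followed in time.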
Services that fall under EGI Change Control
This is the list of services that are under the scope of the central EGI Change Management process (note that federated EGI services are expected to be under the Change Management process of the service supplier's SMS):
Service | Service Supplier |
---|---|
Accounting repository (Computing and Grid) | UKRI |
Accounting Portal | CESGA |
Application Database (AppDB) | IASA |
Check-in | GRNET |
Collaboration Tools (Document Repository, Indico, Mailing lists, RT, SSO) | EGI Foundation |
Configuration Database (GOCDB) | UKRI |
DataHub | CYFRONET |
Helpdesk (GGUS) | KIT |
Infrastructure Manager | UPV-GRyCAP |
Messaging Service (AMS) | GRNET |
Notebooks | CESNET |
Replay | CESNET |
Operations Portal | CC-IN2P3 |
Service Monitoring (ARGO) | GRNET, CC-IN2P3 |
Software Distribution | UKRI |
Workload Manager | CC-IN2P3/CNRS |
Standard Changes
Service | Title | Description | Change Request Reference |
---|---|---|---|
Collaboration Tools | Reboot of a VM following a regular OS update | Rebooting Collaboration Tools VMs following regular OS updates. | IMSCHM-28 |
DataHub | Upgrade Onedata on the EGI DataHub | Upgrade of the EGI DataHub Onezone. | IMSCHM-50 |
DataHub | Upgrade Oneprovider on the EGI DataHub | Upgrade of the EGI DataHub Oneprovider | IMSCHM-277 |
Helpdesk (GGUS) | Add new VO | Add new VO name to 'Concerned VO' list in GGUS. | IMSCHM-248 |
Helpdesk (GGUS) | Remove VO | Remove VO name from 'Concerned VO' list in GGUS. | IMSCHM-252 |
Helpdesk (GGUS) | Add new support unit | Add new Support Unit to Helpdesk | IMSCHM-253 |
Notebooks | Enable a new Virtual Organization (VO) | Allowing access to the EGI Notebooks platform to all members of the specified new VO. | IMSCHM-64 |
Notebooks | Access to a new CVMFS repository | Enable access to a new CVMFS repository | IMSCHM-276 |
Confluence | Upgrade of Confluence version | Upgrade of Confluence version | IMSCHM-99 |
Confluence | Upgrade of kernel/os version | Upgrade of kernel/os version | IMSCHM-99 |
Jira (part of Collab Tools) | Minor and patch release updates | Minor and patch release updates of Jira instance | IMSCHM-244 |
Infrastructure Manager | Minor and patch release updates | Minor and patch release updates of Infrastructure Manager | IMSCHM-242 |
Quality of Change
Quality | Definition |
---|---|
Failed | The change did not complete successfully and had to be rolled back or worked around by following unplanned procedures. Details shall be recorded in the CR ticket. |
Problematic | Implementation of the change did not proceed entirely according to the plan, but the problems were overcome and the change was ultimately successful. Details of the problems shall be recorded in the CR ticket. |
Successful | Implementation of the change went according to the plan as described in the CR. |
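The three quality levels in the table can be expressed as a small classification helper. This is a minimal sketch: the names `ChangeQuality` and `classify_change` and the two boolean inputs are assumptions made for illustration, not part of any EGI tool.

```python
from enum import Enum


class ChangeQuality(Enum):
    FAILED = "Failed"
    PROBLEMATIC = "Problematic"
    SUCCESSFUL = "Successful"


def classify_change(completed_successfully: bool, followed_plan: bool) -> ChangeQuality:
    """Map the outcome of a change to a quality level from the table.

    - Failed: the change did not complete successfully (it was rolled back
      or worked around by unplanned procedures).
    - Problematic: the change succeeded in the end, but implementation
      deviated from the plan.
    - Successful: the change went according to the plan in the CR.
    """
    if not completed_successfully:
        return ChangeQuality.FAILED
    if not followed_plan:
        return ChangeQuality.PROBLEMATIC
    return ChangeQuality.SUCCESSFUL
```

For Failed and Problematic changes, the table also requires the details to be recorded in the CR ticket; the sketch covers only the classification.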
Change management operated by other organisations
The EGI Change Management is a centralised process for the EGI Federation. If organisations are providing EGI branded services and are already running their own internal Change Management process, they may continue to do so if their process meets the essential requirements of ISO20k with respect to change management:
- is there a systematic way of evaluating the risk for changes?
- is there a procedure within the organisation for approving high-risk changes?
- are high-risk changes recorded (who implemented the change, when was it implemented and what was its outcome)?
In addition to the above, if any changes are planned that have the potential to impact other EGI branded services, then the EGI Change Management process should be informed in advance by submission of a ticket to the Jira queue linked above.
EGI should keep track of organisations running their own internal Change Management process and should periodically run a lightweight audit to ensure that the above requirements are being met.
Contact
If you have any questions relating to EGI Change Management, please contact Matthew Viljoen (matthew DOT viljoen AT egi.eu).