Introduction
The EGI Staged Rollout is a procedure by which certified updates of the supported middleware are first released to and tested by Early Adopter sites before being made available to all sites through the production repositories. This procedure permits testing an update in a production environment that also is more heterogeneous than what is possible during the certification and verification phases. It allows for potential issues with an update to be discovered at a small scale and potential workarounds to be added to the release notes. In some cases an update may even be rejected. The Staged Rollout serves to increase the confidence in the quality of the updates that pass, such that the vast majority of the sites should experience smooth updates. Sites are invited to participate in the Staged Rollout for services that they have a particular interest in, with the proviso that they may need to debug issues with a particular update and in any case should report their findings. Such participation can be of great value to the production infrastructure and is appreciated!
Reminder: members of Early Adopter teams need an EGI SSO account.
Tools
Collaboration between EGI and WLCG
EGI received an expression of interest from WLCG to integrate the WLCG Middleware Readiness working group. The main goal of this group is not only to improve the quality of the middleware products available to the WLCG sites but also to increase the speed at which new products or releases become available to sites, while avoiding duplication of effort. This could be achieved by sharing experiences and improving the communication channels between both communities.
The work, discussions, meetings and technical information will be handled on this wiki page: [WLCG Middleware Readiness]
Staged Rollout process
Status of products in the SW provisioning process
The EGI RT dashboard shows the current status of every product that has entered the EGI SW provisioning process, as well as of products already released in the UMD repositories: https://rt.egi.eu/rt/Dashboards/260/SoftwareProvisioningDashboard
The UMD release schedule, which lists the candidate products for each release, can be found here: UMD_Release_Schedule
The remainder of this document describes the part of the workflow corresponding to the Staged Rollout phase.
Repository Configuration
This document assumes SL6 x86_64; for other architectures the actual repositories to use may differ.
Base, EPEL and EGI Trustanchors repositories
You should set up these repositories according to the description given in http://repository.egi.eu/category/umd_releases/distribution/umd-3/
Note that for the product repository you should use the one explained below.
Product Repository
The URL for the staged rollout repository is the following in the case of UMD3 components under test:
http://repository.egi.eu/sw/testing/umd/3/sl6/x86_64/
Generically, for UMD3 and other OSes it is:
http://repository.egi.eu/sw/testing/umd/3/<OS>/<ARCH>/
The yum configuration will look like:
# EGI Software Repository - UMD3 SL6
[UMD3_SL6_x86_64_Testing]
name=UMD3 SL6 x86_64 Testing
baseurl=http://repository.egi.eu/sw/testing/umd/3/sl6/x86_64/
gpgkey=http://emisoft.web.cern.ch/emisoft/dist/EMI/3/RPM-GPG-KEY-emi
gpgcheck=1
enabled=1
protect=1
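For sites on other platforms, the generic URL pattern can be filled in programmatically. The following is a minimal Python sketch, not an official EGI tool. The function name testing_repo_stanza, the section-name pattern for platforms other than SL6 x86_64, and the reuse of the same GPG key for every platform are assumptions made for illustration.

```python
# Sketch: render a yum stanza for the UMD3 staged rollout (testing) repository.
# Only the SL6 x86_64 values come from this page; other combinations are assumed
# to follow the generic <OS>/<ARCH> URL pattern.

BASE_URL = "http://repository.egi.eu/sw/testing/umd/3/{os}/{arch}/"
GPG_KEY = "http://emisoft.web.cern.ch/emisoft/dist/EMI/3/RPM-GPG-KEY-emi"


def testing_repo_stanza(os_name: str = "sl6", arch: str = "x86_64") -> str:
    """Return the text of a .repo stanza for the given OS and architecture."""
    os_label = os_name.upper()
    lines = [
        f"# EGI Software Repository - UMD3 {os_label}",
        f"[UMD3_{os_label}_{arch}_Testing]",  # section name pattern is an assumption
        f"name=UMD3 {os_label} {arch} Testing",
        f"baseurl={BASE_URL.format(os=os_name, arch=arch)}",
        f"gpgkey={GPG_KEY}",  # assumed to be the same key on every platform
        "gpgcheck=1",
        "enabled=1",
        "protect=1",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the default SL6 x86_64 stanza; redirect it to a file under
    # /etc/yum.repos.d/ after checking it against the repository page.
    print(testing_repo_stanza())
```

Called with no arguments, the function reproduces the SL6 x86_64 configuration shown above. Any other combination should be checked against the repository before use.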
Staged Rollout Workflow
State transition diagram of the RT custom field “RolloutProgress”. Transitions labelled “TR” trigger a repository transition of software packages.
The staged rollout starts when the RolloutProgress custom field of the RT ticket in the sw-rel queue changes from InVerification to StageRollOut. This means that the SW component has passed the verification process, i.e. it was ACCEPTED in the verification phase.
The staged rollout repository is fixed for any given major release of any SW; it is publicly available in the repository and on the EGI wiki for Early Adopters. This state change triggers the following actions:
- The SW packages are moved from the untested repository to the testing repository.
- The whole Staged Rollout process takes place in the sw-rel queue/ticket.
- It is advisable to switch to the Jumbo tab of the RT ticket.
- Change the ownership of the ticket to the Staged Rollout Manager responsible for that MW stack: ARC, gLite, UNICORE, Globus, Operational Tools.
- (Staged Rollout Manager) The staged rollout manager is notified and takes the actions below. It is advisable to switch to the Jumbo tab of the RT ticket.
- The ticket contains the links to:
- Release notes, installation/upgrade and configuration information.
- Bugs or issues fixed.
- Documentation
- (Staged Rollout Manager) The ticket has a custom field (drop-down) in which the staged rollout manager selects all the EA teams to be assigned the staged rollout test.
- (Staged Rollout Manager) When the "Save changes" button is pressed an automatic notification mail is sent to all EAs. All EA teams are added to the Administrative Cc field.
- (EA teams) Each EA team has to acknowledge receipt of the notification within 1 working day, either by replying to the mail sent by RT or directly in the ticket, with: <accept|reject> <NGI>-<Site-name>
- (Staged Rollout Manager) The staged rollout manager will check the ticket and, if no EA team has accepted the staged rollout test, will poll the early-adopters-XXX.mailman.egi.eu mailing lists, and any others (s)he sees fit, to find other EA sites.
- (EA teams) The EA team has the option to put the service node into downtime in the GOCDB for the update/re-configuration. A special beta tag may also be used for services that are included in the production infrastructure but only used for testing purposes.
- (EA teams) The EA teams do the staged rollout: install/upgrade, configure, and some tests as they see fit.
- (EA teams) If the EA team finds problems or issues, either they are clarified within the ticket by the verifiers, staged rollout managers or other EA teams, or a GGUS ticket should be opened. If a GGUS ticket is opened, its URL should be inserted in the RT ticket field RelatedGGUSTickets.
- (EA teams) When opening a GGUS ticket, the subject should start with Staged Rollout: <name of product> <version>. Let DMSU decide on the priority, but describe the criticality of the issue in the ticket body: whether or not it is a show stopper, whether there are possible workarounds, etc.
- Example: Subject: Staged Rollout: <product>-<version> plus a short summary/description. Note that the version is the one in the RT ticket, for example Staged Rollout: EMI.bdii-site.sl5.x86_64-1.0.1; just copy and paste it from the RT ticket header.
- (EA teams) The service should ideally be exposed to production load/environment and users. This period lasts between 5 and 7 days, but may be extended depending on the case or the components under test.
- (EA teams) Each EA team should fill in the staged rollout report after the last point, and send it to the Staged Rollout Manager (see table below):
- (EA teams) The report should contain as much information as possible, especially on the correctness of the release notes, the tests that have been performed, and possible metrics when the service is exposed to production (such as number of jobs per day, number of transfers, which VOs are configured for that service, etc.)
- (EA teams) The name of the file should be:
- ea-<NGI>-<Site-name>-<MW stack>-<component>-<version>.EXT (doc, docx, odt)
- (Staged Rollout Manager) The staged rollout managers:
- Create a document in the EGI Doc server with a given ID, which will contain all reports for the staged rollout of that component.
- Insert the docDB URL in the RT ticket field StageRolloutReport.
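Two of the text conventions in the workflow above, the acknowledgement line and the GGUS ticket subject, can be sketched in Python as follows. This is an illustration only and not part of the RT or GGUS tooling. The function names parse_ack and ggus_subject are hypothetical, as is the site "NGI_XX-Example-Site". The parser assumes that the NGI name contains no hyphen, so that the first hyphen separates it from the site name.

```python
import re

# "<accept|reject> <NGI>-<Site-name>"; the NGI part is assumed to be hyphen-free.
ACK_RE = re.compile(r"^\s*(accept|reject)\s+([^\s-]+)-(\S+)\s*$", re.IGNORECASE)


def parse_ack(line: str):
    """Return (decision, ngi, site) for an acknowledgement line, or None."""
    match = ACK_RE.match(line)
    if match is None:
        return None
    decision, ngi, site = match.groups()
    return decision.lower(), ngi, site


def ggus_subject(product_version: str, summary: str = "") -> str:
    """Build a GGUS subject starting with 'Staged Rollout:' as required above."""
    return f"Staged Rollout: {product_version} {summary}".strip()


if __name__ == "__main__":
    # Hypothetical reply from an EA team.
    print(parse_ack("accept NGI_XX-Example-Site"))
    # Product/version string copied from the example in the workflow.
    print(ggus_subject("EMI.bdii-site.sl5.x86_64-1.0.1"))
```

A reply that does not match the expected pattern yields None, which a staged rollout manager could treat as "not yet acknowledged".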
Resources
- The Staged Rollout process is supported by the RT ticketing system (see tickets in the staged-rollout queue).
- Report templates: each Early Adopter has to fill in a report (see the template on DocumentDB).
Naming conventions
- The EA team names are SSO groups containing the members of each team (see the full list):
- ea-< NGI >-< Site-name >
- The custom fields provided in the staged-rollout queue are:
- Drop-down box containing all EA teams defined in the EGI SSO: several teams can be selected, and the Save Changes button notifies those teams and adds them to the AdminCC field.
- Outcome of the staged rollout, drop-down box with: < ACCEPT | REJECT >.
- The title of the ticket is of the form:
- “Staged Rollout < SW stack-MajorVersion > < COMPONENT > < VERSION > < OS > < ARCH >”
- Examples:
- Staged Rollout CA 1.38
- Staged Rollout EMI-1.0 SE-DPM_mysql 1.8.0-1 SL5 x86_64
- The name of each EA report file should be:
- Filename: ea-< NGI >-< Site-name >-< MW stack >-< component >-< version >.doc(odt)
- The staged rollout manager will provide a summary and the name of the file is:
- Filename of summary: summary-< MW stack >-< component >-< version >.doc(odt)
All reports and the summary will be stored at https://documents.egi.eu under a given ID, which will be referenced in the respective RT ticket.
- The description in the document database should have the following naming convention:
- Staged rollout < MW stack > < component > < version >
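The naming conventions above are simple string templates, and a small helper can keep them consistent. The sketch below is a hypothetical illustration in Python. None of these function names exist in any EGI tool, and the default extension "odt" is just one of the formats listed above.

```python
# Sketch of the naming conventions as string templates (all names hypothetical).

def ea_team(ngi: str, site: str) -> str:
    """SSO group name of an EA team: ea-<NGI>-<Site-name>."""
    return f"ea-{ngi}-{site}"


def report_filename(ngi, site, stack, component, version, ext="odt"):
    """EA report file: ea-<NGI>-<Site-name>-<MW stack>-<component>-<version>.<ext>."""
    return f"{ea_team(ngi, site)}-{stack}-{component}-{version}.{ext}"


def summary_filename(stack, component, version, ext="odt"):
    """Summary file written by the staged rollout manager."""
    return f"summary-{stack}-{component}-{version}.{ext}"


def ticket_title(stack, component, version, os_name="", arch=""):
    """RT ticket title; empty fields are skipped (as in 'Staged Rollout CA 1.38')."""
    parts = ["Staged Rollout", stack, component, version, os_name, arch]
    return " ".join(part for part in parts if part)


def docdb_description(stack, component, version):
    """Description used in the document database."""
    return f"Staged rollout {stack} {component} {version}"


if __name__ == "__main__":
    # Values taken from the ticket title example above.
    print(ticket_title("EMI-1.0", "SE-DPM_mysql", "1.8.0-1", "SL5", "x86_64"))
    print(summary_filename("EMI-1.0", "SE-DPM_mysql", "1.8.0-1"))
```

Skipping empty fields in ticket_title is a design choice made here so that both title examples above can be produced by the same function.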