General information
Middleware
UMD
- CentOS Stream 8 now the recommended OS for new installations
- C8->CS8 migrations recommended
- CS9 will be supported by CERN and FNAL
- middleware: recommended path is C7->CS9 (we will probably skip CS8)
UMD 4.16.0 has been released (https://repository.egi.eu/UMD/4.16.0.html) and includes several updates for CentOS7:
- python-jess 0.2.37 https://repository.egi.eu/static/SW/python-jess0.2.37.html
- nagios-plugins-webdav 0.4.5 https://repository.egi.eu/static/SW/nagios-plugins-webdav0.4.5.html
- nagios-plugins-check-ssl-cert 1.84.0 https://repository.egi.eu/static/SW/nagios-plugins-check-ssl-cert1.84.0.html
- qcg-nagios-probes 4.0.0 https://repository.egi.eu/static/SW/qcg-nagios-probes4.0.0.html
- ARC6 6.14.0 https://repository.egi.eu/static/SW/ARC66.14.0.html
- xroot 5.3.4 https://repository.egi.eu/static/SW/xroot5.3.4.html
- nagios-plugins-xroot 0.0.1 https://repository.egi.eu/static/SW/nagios-plugins-xroot0.0.1.html
- python-nap 0.1.20 https://repository.egi.eu/static/SW/python-nap0.1.20.html
- nagios-plugins-srm 0.0.5 https://repository.egi.eu/static/SW/nagios-plugins-srm0.0.5.html
Operations
Crisis Ukraine-Russia
EGI stands with Ukraine and its people: see the message on the website https://www.egi.eu/news/egi-stands-with-ukraine-and-its-people/
New message circulated: https://www.egi.eu/blog/a-message-to-the-egi-community-and-its-scientists/
A crisis team was set up to deal with this difficult situation.
Operative matters:
- Broadcast sent to VO managers, VO users, and RC administrators to warn and remind them that EGI resources must not be used for illicit purposes. All sites are advised to monitor their network traffic.
- Single countries may decide to stop any interactions with Russian institutes: should this happen, we are working on a set of guidelines for the sites to implement such restrictions.
- Information on how to manage the access to compute and storage services: Access control to compute and storage infrastructure
Further news will be circulated after the extraordinary EGI EB and CERN Council meetings.
ARGO/SAM
- Integration of EOS storage element: GGUS 154335
- Monitoring probe status:
We are testing the monitoring probe for the EOS Storage endpoints (GGUS 156251) which uses the XRootD interface (see https://github.com/EGI-Federation/nagios-plugins-xrootd )
On GOCDB the EOS endpoints are registered as XrootD service endpoints.
In order to allow the proper execution of the probe, we would like you to:
enable the ops VO on your endpoints
for each EOS (Xrootd) service endpoint add the following Extension Property:
Name: XROOTD_URL
Value: XRootD base SURL to test (the path where ops VO has write access, for example: root://eospps.cern.ch:1094/eos/pps/ or similar)
Please do the same even if you provide an XRootD interface with a different type of storage element.
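Before registering the Extension Property, the value can be sanity-checked locally. The following is a minimal sketch, assuming the value follows the shape of the example above (root:// scheme, host, and a non-empty path); the function name check_xrootd_url is hypothetical, and the check does not contact the endpoint or confirm that the ops VO can actually write to the path.

```python
from urllib.parse import urlparse


def check_xrootd_url(value):
    """Return a list of problems found in an XROOTD_URL value (empty if none)."""
    problems = []
    parsed = urlparse(value.strip())
    # Assumption: only root:// is accepted here, as in the example value;
    # adjust if your endpoint is published with a different scheme.
    if parsed.scheme != "root":
        problems.append("scheme should be root://")
    if not parsed.hostname:
        problems.append("missing host name")
    # The probe needs a directory to test, not just the server root.
    if parsed.path in ("", "/"):
        problems.append("missing path where the ops VO has write access")
    return problems


if __name__ == "__main__":
    for candidate in ("root://eospps.cern.ch:1094/eos/pps/", "root://eospps.cern.ch:1094"):
        print(candidate, check_xrootd_url(candidate) or "looks plausible")
```

An empty list only means the string is well formed; the probe results on the devel instance remain the real test.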
- Test results on the devel instance
FedCloud
Feedback from DMSU
New Known Error Database (KEDB)
The KEDB has been moved to Jira+Confluence: https://confluence.egi.eu/display/EGIKEDB/EGI+Federation+KEDB+Home
- problems are tracked with Jira tickets to better follow up their evolution
- problems can be registered by DMSU staff and EGI Operations team
Monthly Availability/Reliability
- Under-performed sites in the past A/R reports with issues not yet fixed:
- NGI_BG: https://ggus.eu/index.php?mode=ticket_info&ticket_id=155859
- BG05-SUGrid: instability with webdav and HTCondorCE; some idle jobs prevent the correct execution of the tests. In downtime until 20th March. In April: batch system issues and an unscheduled downtime due to power supply maintenance.
Under-performed sites after 3 consecutive months, under-performed NGIs, QoS violations (March 2022):
- AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=156745
- TW-NCUHEP
- NGI_CH: https://ggus.eu/index.php?mode=ticket_info&ticket_id=156744
- T3_CH_PSI: SRM failures caused by site-bdii issues.
- NGI_IBERGRID: https://ggus.eu/index.php?mode=ticket_info&ticket_id=156741
- CETA-GRID: issues due to an increase of utilisation.
- NGI_UK: https://ggus.eu/index.php?mode=ticket_info&ticket_id=156742
- RAL-LCG2
sites suspended:
Documentation
- MediaWiki in read-only mode
- content to be moved to different locations (confluence and https://docs.egi.eu/)
- confluence space hosting policies and procedures: EGI Policies and Procedures
- EGI Federation Operations
- Change Management, Release and Deployment Management, Incident and Service Request Management, Problem Management, Information Security Management
- Manuals, How-Tos, Troubleshooting, FAQs:
- a large amount of material needs to be reviewed and, where necessary, updated when it is moved to its new location
- location will be https://docs.egi.eu/providers/operations-manuals/
IPv6 readiness plans
- please provide updates to the IPv6 assessment (ongoing) https://wiki.egi.eu/w/index.php?title=IPV6_Assessment
- any relevant information will be summarised at the OMB
Transition from X509 to federated identities (AARC profile token)
- WLCG is testing AAI tokens (WLCG profile) as the authz system for accessing the middleware, with Indigo IAM as a replacement for VOMS
- In Feb 2022 OSG will fully move to token-based AAI, abandoning X509 certificates
- HTCondorCE: replacement of Grid Community Toolkit
- The long-term support series (9.0.x) from the CHTC repositories will support X509/VOMS authentication through Jan 2023 (previously Sep 2022)
- Starting in 9.3.0 (released in October), the HTCondor feature releases do NOT contain this support
- EGI sites are recommended to stay with the long-term support series for the time being
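The version rule in the notes above can be written down as a small check. This is only a sketch of what the notes say, not an official compatibility matrix: the function name x509_voms_supported is hypothetical, and versions the notes do not mention are reported as unknown (None).

```python
def x509_voms_supported(version):
    """Classify an HTCondor version string according to the notes above.

    Returns True for the 9.0.x long-term support series, False for
    releases from 9.3.0 onwards, and None where the notes say nothing.
    """
    parts = version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    if (major, minor) == (9, 0):
        return True   # long-term support series keeps X509/VOMS
    if (major, minor) >= (9, 3):
        return False  # support dropped starting in 9.3.0
    return None       # not covered by the notes


if __name__ == "__main__":
    for v in ("9.0.11", "9.3.0", "9.1.2"):
        print(v, x509_voms_supported(v))
```

A site could run such a check against the installed package version before upgrading; the date limit on the 9.0.x support still applies and is not modelled here.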
What we need to know in preparation for the transition:
Checking the middleware compliance with the AARC Profile token:
- ARC-CE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154958
- So far focusing on the WLCG profile, which is built upon the AARC profile, so this should cover everything.
- Argus: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154959
- no clear plans yet
- dCache: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154960
- dCache does support authorisation statements, as described by WLCG AuthZ-WG's JWT profile.
- supporting AARC-style group membership statements is on the TODO list
- DPM: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154961
- The DPM is in maintenance mode to be phased out by ~2024. There is no effort for implementing new functionality, which furthermore would be short-lived.
- HTCondor-CE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154962
- HTCondor-CE supports WLCG tokens, so it should work also with the AARC profile token. Some tests are needed.
- STORM: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154963
- Only the StoRM-WebDAV component supports token-based authorization.
- At the moment only scopes and groups foreseen by the WLCG Token Profile are recognized by the authorization policy engine, but adding support for the AARC profile is planned.
- At the moment there is no plan to add token support to the SRM component.
- Finalizing a specification for a Tape REST API to replace the functionally-equivalent SRM one.
- That implementation will have token support.
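To make the difference between the two token styles discussed above more concrete, here is a minimal sketch that inspects the payload of a JWT and separates WLCG-style claims (scope, wlcg.groups) from AARC-style group membership carried in eduperson_entitlement. The payload, the VO name vo.example.org, and the helper names are made up for illustration. The signature is deliberately not verified, so this is for inspection only and must not be used for authorisation decisions.

```python
import base64
import json


def encode_segment(data):
    """Base64url-encode a dict as one JWT segment (only used to build the demo token)."""
    raw = json.dumps(data).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_payload(token):
    """Decode the JWT payload WITHOUT verifying the signature (inspection only)."""
    payload_b64 = token.split(".")[1]
    # JWT segments drop base64 padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def summarise_authz_claims(claims):
    """Group the authorisation-related claims by the style they follow."""
    entitlements = claims.get("eduperson_entitlement", [])
    if isinstance(entitlements, str):
        entitlements = [entitlements]
    return {
        "scopes": claims.get("scope", "").split(),
        "wlcg_groups": list(claims.get("wlcg.groups", [])),
        "aarc_entitlements": list(entitlements),
    }


# Hypothetical payload mixing both styles; all values are invented.
demo_claims = {
    "wlcg.ver": "1.0",
    "scope": "storage.read:/ compute.read",
    "wlcg.groups": ["/vo.example.org"],
    "eduperson_entitlement": [
        "urn:mace:egi.eu:group:vo.example.org:role=member#aai.egi.eu"
    ],
}
demo_token = ".".join([encode_segment({"alg": "none"}), encode_segment(demo_claims), ""])

if __name__ == "__main__":
    print(summarise_authz_claims(decode_payload(demo_token)))
```

A service that only understands the WLCG claims would ignore the entitlement list, which is the gap several of the tickets above describe.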
Need to check the awareness and readiness of user communities:
- which GRID services do they use
- Compute: ARC-CE
- Compute: HTCondorCE
- Storage: SRM
- Storage: webdav/http
- Storage: GridFTP
- do you interact directly with Compute and Storage services (e.g., through command line) or do you use a tool (e.g., DIRAC, data transfer tools, data management tools, etc.) available to your VO?
- do you own and need a personal X509 certificate to access the services or can you use a federated identity (e.g., institutional identity, social account, etc.)
- are they familiar with AAI identities
- are they ready for the switch
Broadcast sent to the VO on Jan 28th (it requires login): https://operations-portal.egi.eu/broadcast/archive/2896
- replies so far from:
- atlas
- biomed
- enea
- eiscat.se
- glast.org (srm, gfal-utils)
- ildg (srm, gridftp; direct access with x509)
- Km3Net
- lhcb
- project.nl
- vo.france-grilles.fr
- vo.grapevine.eu
- vo.hess-experiment.eu
- vo.complex-systems.eu
- VOCE
- DIRAC is used in general; a few VOs access the services directly
- training on federated identities for users (and sys-admins) could be useful
- VO frameworks are based on either X509 or AAI (because of the use of DIRAC)
Migration of the VOs from VOMS to Check-in
- transition period where both X509 and tokens can be used
- delays in updating the GRID elements to the latest version compliant with tokens
- not all of the middleware products can be compliant with tokens at the same time
- the same VO has to interact with elements supporting different authentication methods
New benchmark replacing HEP-SPEC06
The HEPSCORE benchmark is going to replace the old HEP-SPEC06
- preparing plans with WLCG and the EGI Accounting team for deploying the new benchmark
- transition period where both benchmarks will be published and used to normalise the data
- to allow comparison between the two kinds of data
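As an illustration of what publishing both benchmarks during the transition could look like, the sketch below normalises the same raw CPU time with two per-core benchmark factors. The record layout, field names, and all numbers are hypothetical and are not taken from the accounting system; the only assumption about the method is that normalised usage is raw time multiplied by a per-core benchmark value.

```python
def normalised_hours(raw_cpu_hours, benchmark_per_core):
    """Scale raw CPU hours by a per-core benchmark factor."""
    return raw_cpu_hours * benchmark_per_core


def publish_both(record):
    """Return the usage normalised with both benchmarks, side by side."""
    return {
        "site": record["site"],
        "hs06_hours": normalised_hours(record["cpu_hours"], record["hs06_per_core"]),
        "hepscore_hours": normalised_hours(record["cpu_hours"], record["hepscore_per_core"]),
    }


# Hypothetical usage record with made-up benchmark values.
example_record = {
    "site": "EXAMPLE-SITE",
    "cpu_hours": 1000.0,
    "hs06_per_core": 10.0,
    "hepscore_per_core": 12.0,
}

if __name__ == "__main__":
    print(publish_both(example_record))
```

Keeping both values in the same record is what makes the comparison between the two kinds of data straightforward during the transition period.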
AOB
- DPM migration
Next meeting
Apr