General information
Middleware
UMD
- Waiting for the release of UMD5 (EL9), with a list of products already included in EPEL9 and other repositories.
Migration to EL9
Following PROC16 Decommissioning of unsupported software
Broadcast circulated in June.
Requested to enable the metric to detect CentOS7 endpoints:
- GGUS 167352
The NGIs can open tickets against sites to track the migration
While UMD5 is not released yet:
- install the product versions that are already published in EPEL9
- use the WLCG repository for products like: APEL, BDII, LCMAPS, UI and WN metapackages
- other products might be added if needed
- use the repositories of the product teams
Operations
Accounting Repository
The Pub/Sync system was taken offline because of a security issue. Accounting Repository operation is unaffected, but the Repository test is provided via the pub/sync hosts.
ARGO/SAM
- EL9 servers moved to production over the past few weeks
- new version of some metrics that weren't deployed in the CentOS7 instances
- webdav and xroot metrics (in particular the read-only test for "eu.egi.readonly.xrootd" service endpoints)
- SRM metrics (in particular the support for SRM+HTTPS - INFN-T1 SRM tests are now successful)
- Waiting for the new version of the HTCondorCE probe
- for the moment the endpoints are tested with the host certificate validity metric
Comments during the meeting:
- NGI_IL: failures with the egi.webdav.readwrite metric, but the error message displayed is not helpful (egi.webdav.readwrite-Put)
- NGI_IL: it would be great to reintroduce the possibility to manually resubmit the tests, as was possible with Nagios.
FedCloud
Feedback from DMSU
From July 1st, second-level support is provided by UKIM:
- the partner representing the Macedonian Academic Research Grid Initiative (MARGI) in the EGI Council is now a full member of the EGI Federation
New Known Error Database (KEDB)
The KEDB has been moved to Jira+Confluence: https://confluence.egi.eu/display/EGIKEDB/EGI+Federation+KEDB+Home
- problems are tracked with Jira tickets to better follow up on their evolution
- problems can be registered by DMSU staff and EGI Operations team
Monthly Availability/Reliability
Under-performed sites in the past A/R reports with issues not yet fixed:
- AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=167466
- INDIACMS-TIFR: downtime for several structural upgrades in the infrastructure.
- NGI_CHINA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=167026
- CENI: new failures
- NGI_CH: https://ggus.eu/index.php?mode=ticket_info&ticket_id=167821
- UNIBE-ID: frequent job submission failures
- NGI_DE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=167820
- LRZ-LMU: IGTF failures fixed; SRM started to fail
- NGI_DE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=166695
- FZJ: SRM failures
- NGI_DE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=167470
- mainz: SRM overload due to the large amount of data transferred
- NGI_GRNET: https://ggus.eu/index.php?mode=ticket_info&ticket_id=166696
- GR-07-UOI-HEPLAB: SURL information is missing
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=166697
- INFN-BARI: job submission failures
- INFN-GENOVA: SRM and job submission failures
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=165200
- INFN-PISA: the webdav information in GOCDB needs to be fixed.
- NGI_IT:
- INFN-MILANO-ATLASC: https://ggus.eu/index.php?mode=ticket_info&ticket_id=167467
- internal error in StoRM's webdav server that couldn't be sorted out; plans to phase out StoRM and migrate to dCache.
- INFN-ROMA3: https://ggus.eu/index.php?mode=ticket_info&ticket_id=167468
- failures with the host certificate validity check
- NGI_UK: https://ggus.eu/index.php?mode=ticket_info&ticket_id=166699
- UKI-SOUTHGRID-BRIS-HEP: downtime for a major infrastructure overhaul
Under-performed sites after 3 consecutive months, under-performed NGIs, QoS violations (August 2024):
- NGI_BY: https://ggus.eu/index.php?mode=ticket_info&ticket_id=168014
- BY-NCPHEP:
- NGI_CH: https://ggus.eu/index.php?mode=ticket_info&ticket_id=168015
- CSCS-LCG2:
- NGI_DE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=168016
- DESY-HH:
- NGI_IBERGRID: https://ggus.eu/index.php?mode=ticket_info&ticket_id=168019
- NCG-INGRID-PT:
- NGI_IT:
- INFN-CATANIA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=168017
- INFN-ROMA1: https://ggus.eu/index.php?mode=ticket_info&ticket_id=168018
- Downtime for replacing the UPS
sites suspended:
IPv6 readiness plans
- please provide updates to the IPv6 assessment (ongoing) https://wiki.egi.eu/w/index.php?title=IPV6_Assessment
- any relevant information will be summarised at the OMB
VOMS upgrade campaign to EL9
- VOMS released on EL9:
- The sites can now upgrade their VOMS endpoints to EL9
- Packages available on the product team repository:
- Optionally, you could keep the current server working as the database (not exposed to the outside), while exposing the new server with voms and voms-admin externally
- This should shorten the downtime when doing the switch
Currently there are 28 VOMS endpoints in production. We are also starting to decommission about 100 inactive VOs, so the number of VOMS endpoints could also decrease.
Tickets to be tracked here: 2024 VOMS upgrade campaign
StoRM upgrade campaign to EL9
- INFN is working to release StoRM on EL9
- StoRM WebDAV v1.4.2 (the latest released on CentOS 7) is also available for EL9 in their stable repository
- The other components will be ready soon
- 31 StoRM endpoints published in the BDII
- We can track the migration in 2024 StoRM upgrade campaign
New benchmark HEPscore23
The HEPscore23 benchmark is replacing the old HEP-SPEC06
Recent activities:
- progress with testing and development of the new server and client
- merging HEPSCORE and EL8/9 compatible versions
- schema update script
- The new testing infrastructure for sites which would like to join the tests is ready.
- Please contact us if you'd like to run tests with the new benchmark
- Information for testing the publication of accounting records with the new benchmark:
- the twiki will be updated with the test UI endpoint.
- This infrastructure can be used both for HEPSCORE integration testing and new Python3 EL9 APEL client testing.
- APEL
- APEL client 2.1.0 released
- It needs to be added to UMD
HEPSCORE application:
- link to the gitlab page: https://gitlab.cern.ch/hep-benchmarks/hep-score
WLCG/HSF Workshop 2024
- APEL status and plans presentation on Tue May 14th afternoon
AOB
Question during the meeting:
- NGI_FRANCE: the Biomed VO was planning to move from VOMS to Check-in, asking for a feature to be added to Check-in to release VOMS credentials; no clear answer was provided to this.
Next meeting
October