General information
Middleware
UMD
- UMD5 released: https://repository.egi.eu/umd/distribution.html?id=5#5
- APEL 2.1.0, APEL SSM 3.4.1
- ARC 6.20.1
- BDII 6.0.3
- WN 5.1.0
- UI 7.0.0
- dCache 9.2.25
- Gfal2 2.23.0
- Frontier-squid 5.9.2
- VOMS 2.1.0, voms-api 3.3.3, voms-client-java 3.3.3, voms-client-cpp 2.1.0
- XRootD 5.7.1
- htcondor-ce 23.0
- cvmfs 2.11.5
- config-egi 2.6.1
- egi-cvmfs 6.7.28
- Davix 0.8.7
Migration to EL9
The migration follows PROC16, Decommissioning of unsupported software.
A broadcast was circulated in June.
A request was made to enable the metric that detects CentOS 7 endpoints:
- GGUS 167352
The NGIs can open tickets against sites to track the migration
Operations
Accounting Repository
The Pub/Sync system was taken offline because of a security issue. Accounting Repository operation is unaffected, but the Repository test is provided via the pub/sync hosts.
We receive weekly reports by email about the publication of the accounting records.
ARGO/SAM
- Waiting for the new version of the HTCondorCE probe
- for the moment the endpoints are tested with the host certificate validity metric
FedCloud
Feedback from DMSU
Since July 1st, second-level support has been provided by UKIM:
- UKIM, the partner representing the Macedonian Academic Research Grid Initiative (MARGI) in the EGI Council, is now a full member of the EGI Federation
New Known Error Database (KEDB)
The KEDB has been moved to Jira+Confluence: https://confluence.egi.eu/display/EGIKEDB/EGI+Federation+KEDB+Home
- problems are tracked with Jira tickets to better follow their evolution
- problems can be registered by DMSU staff and EGI Operations team
Monthly Availability/Reliability
Under-performed sites in the past A/R reports with issues not yet fixed:
- AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=167466
- INDIACMS-TIFR: downtime for several structural upgrades in the infrastructure.
- NGI_CHINA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=167026
- CENI: tests ok since mid September
- NGI_CH: https://ggus.eu/index.php?mode=ticket_info&ticket_id=168015
- CSCS-LCG2:
- NGI_DE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=167820
- LRZ-LMU: IGTF failures fixed; the SRM protocol is no longer supported, so the associated endpoint is to be removed from GOCDB.
- NGI_DE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=167470
- mainz: SRM overload due to the large amount of data transferred
- NGI_GRNET: https://ggus.eu/index.php?mode=ticket_info&ticket_id=166696
- GR-07-UOI-HEPLAB: SURL information is missing
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=166697
- INFN-BARI: job submission failures
- INFN-GENOVA: SRM and job submission failures
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=165200
- INFN-PISA: the webdav information in GOCDB needs to be fixed.
- NGI_IT:
- INFN-MILANO-ATLASC: https://ggus.eu/index.php?mode=ticket_info&ticket_id=167467
- internal error in StoRM's webdav server that couldn't be sorted out; plans to phase out StoRM and migrate to dCache.
- INFN-ROMA3: https://ggus.eu/index.php?mode=ticket_info&ticket_id=167468
- failures with the host certificate validity metric have been fixed; SRM started to fail.
- NGI_IT:
- INFN-CATANIA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=168017
- failures with the host certificate validity check
- INFN-ROMA1: https://ggus.eu/index.php?mode=ticket_info&ticket_id=168018
- Downtime for replacing the UPS
- NGI_UK: https://ggus.eu/index.php?mode=ticket_info&ticket_id=166699
- UKI-SOUTHGRID-BRIS-HEP: downtime for a major infrastructure overhaul
Under-performed sites after 3 consecutive months, under-performed NGIs, QoS violations (September 2024):
- AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=168531
- TW-FTT:
- NGI_France: https://ggus.eu/index.php?mode=ticket_info&ticket_id=168528
- IN2P3-LPC: the lsc file of the new ops iam server wasn't installed.
- NGI_HU: https://ggus.eu/index.php?mode=ticket_info&ticket_id=168527
- BUDAPEST: the lsc file of the new ops iam server wasn't installed.
- NGI_IBERGRID: https://ggus.eu/index.php?mode=ticket_info&ticket_id=168488
- BIFI:
- CESGA-CLOUD:
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=168529
- INFN-ROMA1-CMS: Downtime for replacing the UPS
- NGI_RO: https://ggus.eu/index.php?mode=ticket_info&ticket_id=168530
- RO-07-NIPNE: migration to AlmaLinux 9, issues with the UPS
- NGI_UK: https://ggus.eu/index.php?mode=ticket_info&ticket_id=168532
- UKI-LT2-QMUL: long downtime for data centre maintenance
- UKI-NORTHGRID-LIV-HEP: failures caused by the institute firewall
- UKI-SCOTGRID-ECDF: investigating some changes that created issues; relocation of the machines in the data centre
- UKI-SCOTGRID-GLASGOW: webdav failures, now resolved
- ROC_LA:
- AstrogridPUC: https://ggus.eu/index.php?mode=ticket_info&ticket_id=168486
- EELA-UTFSM: https://ggus.eu/index.php?mode=ticket_info&ticket_id=168487
- some issues with migration of dCache to EL9.
Sites suspended:
IPv6 readiness plans
- please provide updates to the IPv6 assessment (ongoing) https://wiki.egi.eu/w/index.php?title=IPV6_Assessment
- any relevant information will be summarised at the OMB
VOMS upgrade campaign to EL9
- VOMS released on EL9:
- The sites can now upgrade their VOMS endpoints to EL9
- Packages available on the product team repository:
- Optionally, you can keep the current server as the database host (not exposed to the outside) and expose the new server externally with voms and voms-admin
- This should shorten the downtime during the switch
Currently there are 28 VOMS endpoints in production. We are also starting to decommission about 100 inactive VOs, so the number of VOMS endpoints could also decrease.
Tickets to be tracked here: 2024 VOMS upgrade campaign
StoRM upgrade campaign to EL9
- INFN is working to release StoRM on EL9
- StoRM WebDAV v1.4.2 (the latest version released on CentOS 7) is also available for EL9 in their stable repository
- The other components will be ready soon
- 31 StoRM endpoints published in the BDII
- We can track the migration in 2024 StoRM upgrade campaign
New benchmark HEPscore23
The HEPscore23 benchmark is replacing the old HEP-SPEC06
Recent activities:
- APEL client 2.1.0 released and included in UMD 5
- Testing ongoing, with data sent from some sites to the accounting repository and published into the staging accounting portal
- Please contact us if you'd like to make tests with the new benchmark
- Information for testing the publication of accounting records with the new benchmark:
- plans to finalise the HEPscore deployment by the end of November
HEPSCORE application:
- link to the gitlab page: https://gitlab.cern.ch/hep-benchmarks/hep-score
WLCG Operations Coordination meeting (Oct 2024)
Verify configuration records
On a yearly basis, the information registered in GOCDB needs to be verified. NGIs and RCs have been asked to check it. In particular:
- NGI managers should review the people registered and the roles assigned to them, and in particular check the following information:
- ROD E-Mail
- Security E-Mail
NGI managers should also review the status of the "not certified" RCs, according to the RC Status Workflow;
- RC administrators should review the people registered and the roles assigned to them, and in particular check the following information:
- telephone numbers
- CSIRT E-Mail
RC administrators should also review the information related to the registered service endpoints.
The process should be completed by Oct 7th.
List of tickets in the GGUS search page
- 17 out of 31 tickets still open
AOB
Next meeting
November