General information
Middleware
UMD
- a UMD4 release was published.
- Expecting to release UMD5 (EL9) by the end of the month.
BDII packages for EL9 are currently available here:
https://github.com/EGI-Federation/bdii
https://github.com/EGI-Federation/bdii-config-site
https://github.com/EGI-Federation/bdii-config-top
https://github.com/EGI-Federation/glite-info-provider-ldap
https://github.com/EGI-Federation/glite-info-static
https://github.com/EGI-Federation/glite-info-update-endpoints
https://github.com/EGI-Federation/ginfo
Operations
Accounting Repository
The Pub/Sync system was taken offline because of a security issue. APEL Repository operation is unaffected, but the Repository test is provided via the pub/sync hosts.
ARGO/SAM
- changing the warning period of the host certificate validity metric
- https://ggus.eu/index.php?mode=ticket_info&ticket_id=161019
- currently the WARNING is raised 1 month before the certificate expiration
- most of the sites cannot fix the situation immediately
- agreed to have the WARNING status 2 weeks before the certificate expiration
- request to ARGO: https://ggus.eu/index.php?mode=ticket_info&ticket_id=166724
- Monitoring of xrootd endpoints
- some endpoints are exposed outside the site in read-only mode
- the new service type "eu.egi.readonly.xrootd" was created for this purpose (see GGUS 160848)
- new version of the xrootd probe executing only "read" tests: to be added in UMD and deployed in ARGO (GGUS 163071)
- New version of srm probe to be deployed (GGUS 162411) and to be included in UMD (GGUS 162424)
- support for py3 only
- support for SRM+HTTPS
- updated default Top-BDII endpoint
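The agreed change to the host certificate validity metric (WARNING 2 weeks before the certificate expiration instead of 1 month) can be illustrated with a minimal sketch. This is not the ARGO probe itself: the function name cert_status, the status names other than WARNING, and the choice to report CRITICAL once the certificate has expired are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Agreed threshold from the minutes: WARNING 2 weeks before expiration.
# (The previous behaviour corresponds to roughly timedelta(days=30).)
WARNING_WINDOW = timedelta(days=14)


def cert_status(not_after, now=None, warning_window=WARNING_WINDOW):
    """Classify a host certificate by its remaining validity.

    not_after: timezone-aware expiration time of the certificate.
    now: evaluation time; defaults to the current UTC time.
    Returns "OK", "WARNING" or "CRITICAL" (the latter two names follow
    common monitoring conventions and are an assumption here).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    remaining = not_after - now
    if remaining <= timedelta(0):
        return "CRITICAL"  # assumption: an expired certificate is critical
    if remaining <= warning_window:
        return "WARNING"
    return "OK"
```

With the 2-week window, a certificate expiring in 3 weeks is still reported as OK, whereas with the previous 1-month window it would already be in WARNING.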
FedCloud
- FedCloud sites need to perform a risk assessment to ensure that adequate measures are in place to mitigate the risk of user data loss.
Feedback from DMSU
New Known Error Database (KEDB)
The KEDB has been moved to Jira+Confluence: https://confluence.egi.eu/display/EGIKEDB/EGI+Federation+KEDB+Home
- problems are tracked with Jira tickets to better follow up on their evolution
- problems can be registered by DMSU staff and EGI Operations team
Monthly Availability/Reliability
Under-performed sites in the past A/R reports with issues not yet fixed:
- NGI_DE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=162630
- UNI-SIEGEN-HEP: SRM failures; the endpoint was disabled, A/R figures are now improving.
- NGI_FRANCE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=166172
- GRIF: a planned electrical intervention was extended; webdav failures were investigated with the EOS developers (the option to list directories wasn't enabled; RC4 and NULL ciphers were blacklisted at the level of the eos-xrootd application); SOLVED.
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=165200
- INFN-PISA: information on GOCDB about webdav to be fixed.
- NGI_IBERGRID: https://ggus.eu/index.php?mode=ticket_info&ticket_id=165489
- CESGA: openstack failures, fixed.
- NGI_PL: https://ggus.eu/index.php?mode=ticket_info&ticket_id=165490
- CYFRONET-CLOUD:
- ROC_LA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=165196
- ATLAND: information on GOCDB about webdav was fixed, A/R figures are improving
Under-performed sites after 3 consecutive months, under-performed NGIs, QoS violations (April 2024):
- NGI_DE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=166695
- FZJ: SRM failures
- NGI_GRNET: https://ggus.eu/index.php?mode=ticket_info&ticket_id=166696
- GR-07-UOI-HEPLAB: SURL information is missing
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=166697
- INFN-BARI:
- INFN-GENOVA:
- NGI_PL: https://ggus.eu/index.php?mode=ticket_info&ticket_id=166698
- PSNC: changes to the inter-site integrated management system, which includes grants, user database, user authorization/authentication, and resource management for PSNC CLOUD and HPC resources; during these processes, quite a large number of bugs were investigated and fixed.
- NGI_UK: https://ggus.eu/index.php?mode=ticket_info&ticket_id=166699
- UKI-SOUTHGRID-BRIS-HEP: downtime for a major infrastructure overhaul
sites suspended:
IPv6 readiness plans
- please provide updates to the IPv6 assessment (ongoing) https://wiki.egi.eu/w/index.php?title=IPV6_Assessment
- any relevant information will be summarised at the OMB
VOMS strategy and upgrade campaign
- Currently VOMS is included in UMD4 available on CentOS 7
- both reaching end of life in June 2024
- voms server packages for EL8 available on EPEL8 repository
- EL7 voms-admin packages work also on EL8
- Discussing with INFN the extension of the VOMS security support until December 2024
Scenario 1
- Upgrade VOMS endpoints to EL8 with:
- voms packages from EPEL8 repository
- voms-admin packages from UMD4/EL7
Scenario 2
- INFN may decide to release voms and voms-admin on EL9
- Upgrade VOMS endpoint to EL9
Currently there are 28 VOMS endpoints in production. We are also starting to decommission about 100 inactive VOs, so the number of VOMS endpoints could also decrease.
Campaign to upgrade HTCondor to version 10 with SSL authentication enabled
- The campaign to decommission HTCondor <= 9 was started
- Upgrade to HTCondor 10 (or 23) with SSL authentication enabled
- Tickets to sites created at the beginning of November 2023
- Details in this page.
Important for the sites:
- Please start collecting information from the VOs you support about the DNs that should be mapped on your endpoints
- Mapping for the ops VO - at least the following certificates:
- EGI Monitoring Service:
- "/DC=EU/DC=EGI/C=GR/O=Robots/O=Greek Research and Technology Network/CN=Robot:argo-egi@grnet.gr"
- "/DC=EU/DC=EGI/C=HR/O=Robots/O=SRCE/CN=Robot:argo-egi@cro-ngi.hr"
- EGI Security monitoring:
- "/DC=EU/DC=EGI/C=GR/O=Robots/O=Greek Research and Technology Network/CN=Robot:argo-secmon@grnet.gr"
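As a rough illustration of the DN collection step for the ops VO, the sketch below turns the robot certificate DNs listed above into mapping lines. The three-column layout (authentication method, quoted DN, local account) is modelled on the style of HTCondor map files but is an assumption here, as are the function name mapfile_lines and the account name "opsuser"; the exact syntax expected by a given CE version should be checked against its documentation.

```python
# DNs copied from the ops VO list in these minutes.
OPS_DNS = [
    "/DC=EU/DC=EGI/C=GR/O=Robots/O=Greek Research and Technology Network/CN=Robot:argo-egi@grnet.gr",
    "/DC=EU/DC=EGI/C=HR/O=Robots/O=SRCE/CN=Robot:argo-egi@cro-ngi.hr",
    "/DC=EU/DC=EGI/C=GR/O=Robots/O=Greek Research and Technology Network/CN=Robot:argo-secmon@grnet.gr",
]


def mapfile_lines(dns, account, method="SSL"):
    """Build one 'METHOD "DN" account' line per DN.

    The line layout is an assumed, illustrative format, not a verified
    HTCondor-CE configuration.
    """
    lines = []
    for dn in dns:
        if '"' in dn:
            # A double quote would break the simple quoting used below.
            raise ValueError(f"unsupported character in DN: {dn!r}")
        lines.append(f'{method} "{dn}" {account}')
    return lines


if __name__ == "__main__":
    # "opsuser" is a hypothetical local account name.
    print("\n".join(mapfile_lines(OPS_DNS, "opsuser")))
```

Keeping the DNs in one list makes it easier to extend the mapping once the supported VOs have provided their own DNs.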
Important for the VOs:
- update the condor-client as well in coordination with the sites
Monitoring:
- CE client updated also on ARGO (GGUS 163583)
- To be clarified with the developers if the current version of the probe can work also with Check-in tokens.
Accounting of HTC jobs using token-based authentication
- Transition period where the Computing Elements are supporting different authentication methods (X509 personal certificates + VOMS, and tokens) in order to allow the VOs an easier migration towards token-based authentication.
- There are already a few cases of VOs using only tokens, and it was noticed that our middleware is not able to gather the associated accounting information as it should.
- Need to find a solution (either temporary or for the long-term) valid for any kind of CE and any kind of token profile
- Involving CE developers, APEL Accounting team, AAI team
- GitHub issue and GGUS 155987
- Grand Unified Token (GUT) profile WG
- discussions on how the tokens should convey information about the VO a user belongs to
New benchmark HEPscore23
The HEPscore23 benchmark is replacing the old HEP-SPEC06.
Recent activities:
- progress with testing and development of the new server and client
- merging HEPSCORE and EL8/9 compatible versions
- schema update script
- The new testing infrastructure for sites which would like to join the tests is ready.
- Please contact us if you'd like to run tests with the new benchmark
- Information for testing the publication of accounting records with the new benchmark:
- the twiki will be updated with the test UI endpoint.
- This infrastructure can be used both for HEPSCORE integration testing and new Python3 EL9 APEL client testing.
- APEL
- APEL client 2.0.0 released
- It needs to be added to UMD
HEPSCORE application:
- link to the gitlab page: https://gitlab.cern.ch/hep-benchmarks/hep-score
WLCG/HSF Workshop 2024
- APEL status and plans presentation on Tue May 14th afternoon
AOB
Next meeting
June