General information
Middleware
UMD
- CentOS Stream 8 now the recommended OS for new installations
- C8->CS8 migrations recommended
- CS9 will be supported by CERN and FNAL
- middleware: recommended path is C7->CS9 (we will probably skip CS8)
- new release https://repository.egi.eu/UMD/4.15.1.html
- ARC-CE 6.13.0 bug fixes release
- Xrootd 5.3.1 bug fixes release
- CERN EOS 5.0.2 new release of EOS Open Storage, which provides a storage solution for large amounts of physics data and user files, with a focus on interactive and batch analysis.
- dCache 6.2.31 security vulnerability fix
- Infrastructure Manager Nagios probe 1.3.1
- GridFTP 13.21.1 minor bug fix of some Globus packages
- gfal2 2.19.2 regular update of the gfal clients
- gfal2-utils 1.6.0 regular update of the gfal2-utils clients
- EGI CVMFS 3.3.16 new release of the meta-package providing the default CVMFS configuration for EGI.
- CVMFS 2.8.2 patch release containing bug fixes for clients and new diagnostics commands for the client.
- HTCondor 9.0.1 New major release of HTCondor
- HTCondor-CE 5.1.3 New major release of the HTCondor-CE
Preview repository
- released on 2021-06-10
- Preview 2.34.0 (CentOS 7): ARC 6.12.0, CVMFS 2.8.1, xrootd 5.2.0
- released on 2021-08-11
- Preview 2.35.0 (CentOS 7): APEL SSM 3.2.1, DPM/DMLite 1.15.0 and 1.15.1, frontier-squid 4.15.2, xrootd 5.3.0
- We plan to stop releasing Preview, since it does not seem to be used very much and it is easier to get the latest versions of the products from EPEL or the product teams' repositories before they are released in UMD.
Operations
ARGO/SAM
- probe for checking the HTCondorCE host certificate validity deployed in production (GGUS 147386):
- checks on expiration date, CN, and CA:
- it is working fine (very few failures)
- to be included in the A/R profile
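The three checks listed above (expiration date, CN, and CA) can be illustrated with a minimal sketch. This is not the code of the deployed probe: the function name, the thresholds, the host name and the CA name are hypothetical, and the Nagios-style exit codes are an assumed convention. The certificate is expected as a dictionary in the shape returned by Python's `ssl.SSLSocket.getpeercert()`.

```python
import ssl
import time

# Nagios-style status codes (assumed convention, not taken from the real probe).
OK, WARNING, CRITICAL = 0, 1, 2


def _first_attr(name_field, key):
    """Return the first value of `key` in a getpeercert()-style name tuple."""
    for rdn in name_field:
        for k, v in rdn:
            if k == key:
                return v
    return None


def check_host_cert(cert, expected_cn, trusted_issuer_cns, warn_days=14, now=None):
    """Check expiry, subject CN and issuer CN of a certificate dictionary.

    Returns a (status, message) tuple. All parameters are illustrative.
    """
    now = time.time() if now is None else now
    # Convert the "notAfter" string (e.g. "Dec 31 23:59:59 2021 GMT") to epoch seconds.
    expires = ssl.cert_time_to_seconds(cert["notAfter"])
    days_left = (expires - now) / 86400

    subject_cn = _first_attr(cert.get("subject", ()), "commonName")
    issuer_cn = _first_attr(cert.get("issuer", ()), "commonName")

    if days_left < 0:
        return CRITICAL, "certificate has expired"
    if subject_cn != expected_cn:
        return CRITICAL, f"unexpected CN: {subject_cn}"
    if issuer_cn not in trusted_issuer_cns:
        return CRITICAL, f"untrusted CA: {issuer_cn}"
    if days_left < warn_days:
        return WARNING, f"certificate expires in {days_left:.0f} days"
    return OK, f"certificate valid for {days_left:.0f} more days"


# Hypothetical certificate data for illustration only.
example_cert = {
    "notAfter": "Dec 31 23:59:59 2021 GMT",
    "subject": ((("commonName", "ce.example.org"),),),
    "issuer": ((("commonName", "Example Grid CA"),),),
}
```

Calling `check_host_cert(example_cert, "ce.example.org", {"Example Grid CA"})` returns a status and a message that a monitoring framework could map to a metric result; a real probe would also have to retrieve the certificate from the CE endpoint and validate the full CA chain.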
FedCloud
Feedback from DMSU
New Known Error Database (KEDB)
The KEDB has been moved to Jira+Confluence: https://confluence.egi.eu/display/EGIKEDB/EGI+Federation+KEDB+Home
- problems are tracked with Jira tickets to better follow their evolution
- problems can be registered by DMSU staff and EGI Operations team
Monthly Availability/Reliability
- Under-performed sites in the past A/R reports with issues not yet fixed:
- AfricaArabia: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154295
- MA-01-CNRST: ARC-CE failures
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150818
- INFN-PISA: HTCondorCE failures fixed; SRM failures not yet
- NGI_PL: https://ggus.eu/index.php?mode=ticket_info&ticket_id=153659
- TASK: in the process of replacing QCG with ARC-CE
- NGI_UA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=152841
- UA-NSCMBR: problem during the DPM update: conflict between xrootd 5 and dmlite 1.13. Unscheduled downtime due to a power failure in the computing centre. An NFS configuration issue affected the ARC-CE. Accounting data were republished using the ARC accounting functionalities.
- NGI_UK: https://ggus.eu/index.php?mode=ticket_info&ticket_id=153660
- UKI-SOUTHGRID-SUSX: CE configuration issues; some other failures occurred.
- ROC_LA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=153658
- SUPERCOMPUTO-UNAM: some network issues
- Under-performed sites after 3 consecutive months, under-performed NGIs, QoS violations (October 2021):
- NGI_DE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154745
- GoeGrid: relocation of the cluster to a different building on the campus and subsequent network issues; handover to new staff; problems fixed.
- NGI_IBERGRID: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154750
- UAM-LCG2
- NGI_RO: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154746
- GRIDIFIN
- NGI_PL: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154747
- PSNC: storage backend issues affecting the HPC cluster and DPM, also causing ARC-CE instability; the DPM issues were fixed, work on the HPC cluster is ongoing
- Russia: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154748
- RU-SARFTI: ARC-CE failures, problem with hard drives, fixed
- NGI_UA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154749
- UA-KNU: failures with IGTF metric, now fixed.
- sites suspended:
Documentation
- plan to decommission MediaWiki
- content to be moved to different locations (Confluence and https://docs.egi.eu/)
- Confluence space hosting policies and procedures: EGI Policies and Procedures
- EGI Federation Operations
- Change Management, Release and Deployment Management, Incident and Service Request Management, Problem Management, Information Security Management
- Manuals, How-Tos, Troubleshooting, FAQs:
- a large amount of material needs to be reviewed and, where necessary, updated when it is moved to the new location
- location will be https://docs.egi.eu/providers/operations-manuals/
IPv6 readiness plans
- please provide updates to the IPv6 assessment (ongoing) https://wiki.egi.eu/w/index.php?title=IPV6_Assessment
- any relevant information will be summarised at the OMB
AOB
Next meeting
Dec