General information
Middleware
UMD
- CentOS8 discussion still ongoing
- migration of Software Provisioning infrastructure to IBERGRID still ongoing
- in particular, the administration portal used for release creation was migrated successfully
- February release planned (https://wiki.egi.eu/wiki/UMD_Release_Schedule), to be discussed at today's meeting
- problem: UMD-4 missing voms-clients-cpp-2.0.15: http://repository.egi.eu/sw/production/umd/4/centos7/x86_64/updates/
- to be fixed urgently
Preview repository
- released on 2020-11-30:
- Preview 1.30.0 AppDB info (last release on sl6): CVMFS 2.7.5 and egi-cvmfs-2-7.12, dCache 5.2.35, DMLite/DPM 1.14.2, Dynafed 1.6.0, STORM 1.11.19, VOMS 10-20 release, xrootd 4.12.5
- Preview 2.30.0 AppDB info (CentOS 7): APEL-SSM 3.0.1, CVMFS 2.7.5 and egi-cvmfs-2-7.12, dCache 5.2.35, DMLite/DPM 1.14.2, Dynafed 1.6.0, STORM 1.11.19, VOMS 10-20 release, xrootd 4.12.5 and 5.0.3
- collecting information for the next release
Operations
ARGO/SAM
- Migration to CentOS 7 completed
- some probes not yet ready for CentOS 7 are temporarily executed by https://egi-mon-old.argo.grnet.gr/nagios/
- HTCondor-CE probes
- working on the probe for the host certificate validity check: GGUS 147386
- integration with secmon and pakiti: GGUS 150006
- CREAM-CE metrics removed from ARGO_MON, ARGO_MON_OPERATIONS and ARGO_MON_CRITICAL (GGUS 149778)
- emi.cream.CREAMCE*
- eu.egi.CREAM*
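The two patterns above can be read as shell-style globs. A minimal sketch, assuming that interpretation, of how a list of metric names could be filtered against them; the sample metric names are illustrative only and are not an authoritative list of ARGO metrics:

```python
from fnmatch import fnmatchcase

# Glob patterns quoted in the notes above for the removed CREAM-CE metrics.
REMOVED_PATTERNS = ["emi.cream.CREAMCE*", "eu.egi.CREAM*"]


def is_removed_metric(name):
    """Return True if the metric name matches one of the removed patterns."""
    return any(fnmatchcase(name, pattern) for pattern in REMOVED_PATTERNS)


# Illustrative metric names only (assumption, not taken from the ARGO profiles).
sample_metrics = [
    "emi.cream.CREAMCE-JobSubmit",
    "eu.egi.CREAM-IGTF",
    "org.nordugrid.ARC-CE-submit",
]

# Keep only the metrics that are not covered by the removed patterns.
remaining = [m for m in sample_metrics if not is_removed_metric(m)]
print(remaining)
```

Note that under this reading the decommission probe eu.egi.sec.CREAMCE does not match either pattern, since matching is case-sensitive and anchored at the start of the name.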
FedCloud
Feedback from DMSU
Monthly Availability/Reliability
- Under-performed sites in the past A/R reports with issues not yet fixed:
- AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150109
- INDIACMS-TIFR failures with HTCondor-CE and webdav
- KR-KNU-T3: migration from CREAM-CE to HTCondor-CE
- CERN-PROD: https://ggus.eu/index.php?mode=ticket_info&ticket_id=149351
- webdav failures which required a fix in the EOS services https://its.cern.ch/jira/browse/EOS-4515 ; some instability with the site-bdii
- NGI_HR: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148518
- egee.irb.hr: in the process of a major upgrade from CentOS 6 to CentOS 7, some delays.
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=149352
- INFN-LECCE: authz failures on SRM; CREAM-CE to decommission
- TRIGRID-INFN-CATANIA: CREAM-CE to decommission
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=149798
- INFN-ROMA1-CMS: problems with ARC-CE solved; intermittent failures on SRM service, increased the storage to improve the stability
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150108
- GARR-01-DIR: the site will be decommissioned
- NGI_NDGF: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150111
- SE-SNIC-T2: network issues affecting the SE. Planned a meeting with the internet provider.
- ROC_LA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148515
- ATLAND: ARC-CE misconfiguration
- ROC_LA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148956
- CBPF: SRM failures due to information not properly published. Physical access to facilities restricted due to COVID measures; planned a DPM update.
- ROC_LA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=149355
- SUPERCOMPUTO-UNAM: scheduled a downtime for upgrading the site.
- Under-performed sites after 3 consecutive months, under-performed NGIs, QoS violations (January 2021):
- AfricaArabia: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150466
- MA-01-CNRST: migration from CREAM-CE to ARC-CE; job submission failures due to missing information (ApplicationEnvironment)
- NGI_DE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150467
- mainz: problems with upgrading the STORM SE, now solved
- RWTH-Aachen: xrootd port doesn't allow ops VO
- SCAI: replacement of the cloud cluster
- NGI_France: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150465
- IN2P3-CC-T2: SRM failures
- NGI_UA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150469
- UA-MHI
- NGI_UK: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150470
- UKI-SOUTHGRID-SUSX: failures with the IGTF test
- Russia: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150468
- RU-SARFTI: problems when migrating from CREAM-CE to ARC-CE
- sites suspended:
- HK-HKU-CC-01 (AsiaPacific)
IPv6 readiness plans
- please provide updates to the IPv6 assessment (ongoing) https://wiki.egi.eu/w/index.php?title=IPV6_Assessment
- any relevant information will be summarised at the OMB
Top-BDII problem affecting the publication of accounting records
- on 20th Dec 2020 the top-BDII at CERN (lcg-bdii.cern.ch) stopped working
- since then, it wasn't possible to publish the accounting data
- the SSM script couldn't find the Message Brokers queue to send the messages
- top-bdii fixed on 4th Jan 2021
- this problem affected all the sites because the APEL SSM config file points to CERN's top-BDII by default
- each site can instead set the top-BDII of its region:
- Top-BDIIs service group on GOCDB
- Top-BDII servers monitored by ARGO
- CERN's top-BDII is going to be retired
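As an illustration of the change suggested above, the following is a minimal sketch that checks whether an SSM-style configuration still points to CERN's top-BDII and produces a copy pointing to a regional one. The section name [broker], the options bdii and network, the port 2170 and the host top-bdii.example.org are assumptions made for the example; check the actual SSM configuration file on the site before changing anything.

```python
import configparser
import io

CERN_TOP_BDII = "lcg-bdii.cern.ch"  # hostname quoted in the notes above

# Assumed layout of the broker section; option names and port are assumptions.
SAMPLE_CONFIG = """\
[broker]
# top-BDII queried to find the message brokers
bdii: ldap://lcg-bdii.cern.ch:2170
network: PROD
"""


def _parse(config_text):
    """Parse INI-style text without value interpolation."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return parser


def uses_cern_top_bdii(config_text):
    """Return True if the [broker] bdii option points at CERN's top-BDII."""
    bdii_url = _parse(config_text).get("broker", "bdii", fallback="")
    return CERN_TOP_BDII in bdii_url


def with_top_bdii(config_text, new_url):
    """Return config text with the [broker] bdii option set to new_url.

    Comments in the original text are not preserved by configparser.
    """
    parser = _parse(config_text)
    parser.set("broker", "bdii", new_url)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if uses_cern_top_bdii(SAMPLE_CONFIG):
    # Hypothetical regional top-BDII; take the real one from GOCDB.
    updated = with_top_bdii(SAMPLE_CONFIG, "ldap://top-bdii.example.org:2170")
    print(updated)
```

The regional top-BDII hostname to use in place of the placeholder can be taken from the Top-BDIIs service group on GOCDB mentioned above.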
CREAM-CE Decommission
- End of Security Updates and Support: 31st Dec 2020
- Original broadcast: https://operations-portal.egi.eu/broadcast/archive/2293
- Decommissioning deadline: 31st Jan 2021
- PROC16 Decommission of unsupported software
- Decommissioning start date: Oct 1st 2020
- a probe detecting CREAM-CE endpoints will be run, returning WARNING status
- GGUS ticket: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148715
- eu.egi.sec.CREAMCE
- Nov 1st: probe returns CRITICAL status, alarms created on the ROD dashboard, ROD teams start to create tickets
- 1st Feb 2021: EGI Ops will start chasing the sites still providing CREAM-CE endpoints
- By this time service end-points which couldn't be upgraded should be put into downtime by site admin or ROD
- 1st March 2021: Sites still deploying unsupported service endpoints risk suspension, unless documented technical reasons prevent a Site Admin from updating these endpoints.
- Tickets opened: 49
- link to the list
- Please note that at least one CE endpoint should be associated to the APEL service type in order to monitor the publication of the accounting data, as explained here
- If the CE you are going to remove was also registered as APEL service type, do not forget to move the APEL service type to a different CE endpoint.
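The decommissioning timeline above can be sketched as a simple date lookup. This is only an illustration of the milestones listed in these notes: the stage labels are made up for the example, and "Nov 1st" is assumed to mean 1st Nov 2020.

```python
from datetime import date

# Milestones from the timeline above, in chronological order.
# Stage labels are illustrative names, not official PROC16 terms.
MILESTONES = [
    (date(2020, 10, 1), "WARNING"),           # probe returns WARNING
    (date(2020, 11, 1), "CRITICAL"),          # assumed 2020; ROD teams open tickets
    (date(2021, 2, 1), "EGI_OPS_FOLLOW_UP"),  # EGI Ops chase remaining sites
    (date(2021, 3, 1), "SUSPENSION_RISK"),    # sites risk suspension
]


def escalation_stage(day):
    """Return the latest stage whose start date is on or before the given day."""
    stage = "NOT_STARTED"
    for start, label in MILESTONES:
        if day >= start:
            stage = label
    return stage


# Example: the stage in force on the date of the next meeting (8th Mar 2021).
print(escalation_stage(date(2021, 3, 8)))
```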
VOMS upgrade to CentOS 7
- VOMS for CentOS 7 released Nov 23rd with UMD 4.12.13
- VOMS Admin 3.8.0, VOMS Server 2.0.15
- VOMS endpoints registered on GOCDB as production and monitored: 41
- Provided by 33 sites
- list of tickets opened: GGUS
- the VOMS servers need to be published in the BDII in order to easily collect the deployed version
AOB
Next meeting
8th Mar 2021