General information
Middleware
UMD
- plans on CentOS8 ONGOING
- UMD4 release in preparation
- StoRM, VOMS, BDII update, dCache
- VERY URGENT
- feedback on software automation from the EGI Conference
Preview repository
- released on 2020-10-09
- Preview 1.29.0 AppDB info (sl6): ARC 6.8.0 and 6.8.1, BDII 5.5.26, CVMFS 2.7.4, dCache 5.2.31, DMLite/DPM 1.14.0, frontier-squid 4.13.1, glite-info-update-endpoints 3.0.2, lcg-info 1.12.5, STORM 1.11.18
- Preview 2.29.0 AppDB info (CentOS 7): ARC 6.8.0 and 6.8.1, BDII 5.5.26, CVMFS 2.7.4, dCache 5.2.31, DMLite/DPM 1.14.0, frontier-squid 4.13.1, glite-info-update-endpoints 3.0.2, lcg-info 1.12.5, STORM 1.11.18
- included in the upcoming release: DPM, VOMS
Operations
ARGO/SAM
- HTCondor-CE probes included in the ARGO_MON_OPERATORS profile on May 13th: https://ggus.eu/index.php?mode=ticket_info&ticket_id=146949
- (14th Sept) 70 endpoints, 14 CRITICAL, success rate is about 80%
- Oct 1st: included in the ARGO_MON_CRITICAL profile (A/R computation)
- (Nov 16th) 76 endpoints, success rate (including WARNING) 84.2%
- working on the probe for the host certificate validity check: GGUS 147386
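The success rates quoted above can be reproduced with simple arithmetic. The sketch below is only an illustration: the helper name `success_rate` is hypothetical, and the figure of 12 non-passing endpoints for Nov 16th is inferred from the reported 84.2%, not stated in the notes.

```python
def success_rate(total: int, failing: int) -> float:
    """Return the percentage of endpoints that are not failing, to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= failing <= total:
        raise ValueError("failing must be between 0 and total")
    return round(100.0 * (total - failing) / total, 1)


# 14th Sept: 70 endpoints, 14 CRITICAL (figures from the notes)
sept = success_rate(70, 14)

# Nov 16th: 76 endpoints; 12 failing is an ASSUMPTION inferred from 84.2%
nov = success_rate(76, 12)

print(f"Sept: {sept}%  Nov: {nov}%")
```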
FedCloud
Feedback from DMSU
Upgrade of central argus node
Message sent to administrators of NGIs argus servers:
- A replacement of the central argus servers (lcgargus03.cern.ch & lcgargus04.cern.ch), which are behind the argus.cern.ch & lcgargus.cern.ch aliases, is planned for Tuesday 17th November 2020 between 10:00 and 12:00.
- This replacement should be transparent, requiring no change of configuration on your side. Please report any issue you have with your NGI argus server.
- The two new hosts, lcgargus21.cern.ch and lcgargus22.cern.ch, are ready for production; you can test them remotely if you wish. The operation next week is simply a change of alias.
Monthly Availability/Reliability
- Under-performed sites in the past A/R reports with issues not yet fixed:
- AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=147748
- HK-HKU-CC-01: migrating DPM from sl6 to CentOS 7
- TW-NCUHEP: ARC-CE failures due to outdated CAs package, performance is now good
- NGI_DE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148519
- LRZ-LMU: CE had problems due to the decommission of SharedFS; the other CE returns UNKNOWN in the IGTF test.
- NGI_HR: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148518
- egee.irb.hr: in the process of a major upgrade from CentOS 6 to CentOS 7, some delays.
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148957
- INFN-CATANIA: SRM problems
- INFN-CATANIA-STACK: recovered
- INFN-PADOVA: decommissioning process
- NGI_PL: https://ggus.eu/index.php?mode=ticket_info&ticket_id=147311
- WCSS64: failures on QCG and CREAM CEs
- NGI_UK:
- UKI-NORTHGRID-SHEF-HEP: https://ggus.eu/index.php?mode=ticket_info&ticket_id=146455 ARC-CE re-installed, some condor problems to fix, improving...
- UKI-SOUTHGRID-SUSX: https://ggus.eu/index.php?mode=ticket_info&ticket_id=144720 Migration from CREAM to ARC, WN migration to CentOS7; SRM to be decommissioned; ARC-CE was failing the IGTF test, since solved; site-bdii failures; new failures on ARC-CE.
- ROC_LA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148515
- ATLAND: downtime due to power cut and quarantine
- ROC_LA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148956
- CBPF: SRM failures due to information not properly published. Physical access to facilities is restricted due to COVID measures; a DPM update is planned for December.
- NGI_UA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148958
- UA-NSCMBR: IGTF outdated; improving...
- Under-performed sites after 3 consecutive months, under-performed NGIs, QoS violations (October 2020):
- AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=149353
- JP-KEK-CRC-02: migration from CREAM-CE to ARC-CE; some problems with the ARC-CE, which has since been marked as "not production"
- CERN-PROD: https://ggus.eu/index.php?mode=ticket_info&ticket_id=149351 webdav failures
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=149352
- INFN-LECCE
- TRIGRID-INFN-CATANIA
- NGI_UA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=149356
- UA_BITP_ARC: bdii freshness failures
- ROC_LA https://ggus.eu/index.php?mode=ticket_info&ticket_id=149355
- SUPERCOMPUTO-UNAM
- sites suspended:
IPv6 readiness plans
- please provide updates to the IPv6 assessment (ongoing) https://wiki.egi.eu/w/index.php?title=IPV6_Assessment
- any relevant information will be summarised at the OMB
CREAM-CE Decommission
- End of Security Updates and Support: 31st Dec 2020 (Decommissioning deadline)
- Original broadcast: https://operations-portal.egi.eu/broadcast/archive/2293
- PROC16 Decommission of unsupported software
- Decommissioning start date: Oct 1st 2020
- a probe detecting CREAM-CE endpoints will be run, returning WARNING status
- GGUS ticket: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148715
- eu.egi.sec.CREAMCE
- Nov 1st: probe returns CRITICAL status, alarms created on the ROD dashboard, ROD teams start to create tickets
- 1st Jan 2021: EGI Ops will start chasing the sites still providing CREAM-CE endpoints
- By this time, service end-points which couldn't be upgraded should be put into downtime by the site admin or ROD.
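The decommissioning timeline above can be sketched as a date-to-phase lookup. This is a minimal illustration, not the actual eu.egi.sec.CREAMCE probe: the function name `decommission_phase` and the phase labels `NOT_STARTED` and `CRITICAL_ESCALATED` are assumptions introduced here; only the dates and the WARNING/CRITICAL statuses come from the notes.

```python
from datetime import date

# Dates taken from the timeline in the notes
WARNING_FROM = date(2020, 10, 1)     # probe starts, returns WARNING
CRITICAL_FROM = date(2020, 11, 1)    # probe returns CRITICAL, ROD tickets
ESCALATION_FROM = date(2021, 1, 1)   # EGI Ops start chasing remaining sites


def decommission_phase(day: date) -> str:
    """Map a calendar day to the CREAM-CE decommissioning phase (hypothetical labels)."""
    if day < WARNING_FROM:
        return "NOT_STARTED"
    if day < CRITICAL_FROM:
        return "WARNING"
    if day < ESCALATION_FROM:
        return "CRITICAL"
    return "CRITICAL_ESCALATED"


for d in (date(2020, 10, 15), date(2020, 11, 16), date(2021, 1, 11)):
    print(d.isoformat(), decommission_phase(d))
```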
ARC Middleware 5 end of support, migration to ARC 6
- EGI Operations Broadcast
- PROC16 Decommission of unsupported software
- deadline: end of July
- Status
Date | Number of endpoints in BDII | Number of GGUS tickets | Issues |
---|---|---|---|
2020-06-08 | 75 | 42 | Some ARC endpoints publish a timestamp instead of a version like 5.X.Y; we can fairly assume they are ARC6 nightly builds, but we're going to close the corresponding tickets after explicit confirmation from the site admin. |
2020-07-13 | 53 | 29 | - |
2020-09-14 | 34 | 18 | - |
2020-10-12 | 32 | 19 | - |
2020-11-16 | 26 | 16 | - |
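
To put the status table in perspective, the following sketch computes the relative decrease in ARC 5 endpoints and open tickets between the first and last rows. The data is copied from the table; the variable and function names are illustrative only.

```python
# (date, endpoints in BDII, open GGUS tickets), copied from the status table
status = [
    ("2020-06-08", 75, 42),
    ("2020-07-13", 53, 29),
    ("2020-09-14", 34, 18),
    ("2020-10-12", 32, 19),
    ("2020-11-16", 26, 16),
]


def percent_decrease(start: int, end: int) -> float:
    """Relative decrease from start to end, as a percentage with one decimal."""
    return round(100.0 * (start - end) / start, 1)


first, last = status[0], status[-1]
endpoint_drop = percent_decrease(first[1], last[1])
ticket_drop = percent_decrease(first[2], last[2])

print(f"{first[0]} -> {last[0]}: endpoints -{endpoint_drop}%, tickets -{ticket_drop}%")
```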
Storage accounting
Many sites stopped publishing storage accounting records; 57 tickets were opened to fix that.
- 12 tickets not solved yet
- page for checking when the records were published: http://goc-accounting.grid-support.ac.uk/storagetest/storagesitesystems.html
- Accounting Portal Prototype view
AOB
Next meeting
In 2021: 2021-01-11 notes