General information
Middleware
UMD
- plans on CentOS8 STARTED
Preview repository
- released on 2020-05-08
- Preview 1.27.0 AppDB info (sl6): ARC 6.5.0 and 6.6.0, CVMFS 2.7.2, dCache 5.2.20, frontier-squid 4.11.2, gfal2 2.17.2, xrootd 4.11.3
- Preview 2.27.0 AppDB info (CentOS 7): ARC 6.5.0 and 6.6.0, CVMFS 2.7.2, dCache 5.2.20, frontier-squid 4.11.2, gfal2 2.17.2, xrootd 4.11.3
Operations
ARGO/SAM
- HTCondor-CE probes included in the ARGO_MON_OPERATORS profile on May 13th: https://ggus.eu/index.php?mode=ticket_info&ticket_id=146949
- 57 endpoints, 17 CRITICAL, success rate is about 70.2%
- on Sept 1st they will be included in the ARGO_MON_CRITICAL profile (A/R computation)
- please fix the failures by that date
- working on the probe for the host certificate validity check: GGUS 147386
- CREAM-CE metrics in the ARGO_MON_OPERATORS profile on May 27th: eu.egi.CREAMCE-JobSubmit, eu.egi.CREAMCE.WN-Csh, eu.egi.CREAMCE.WN-Softver
- results: 177 endpoints, 15 WARNING (Timeout occurred (900 sec)), 53 CRITICAL. Success rate 70% (61.6% if the WARNING results are also counted as failures)
- When eu.egi.CREAMCE.WN-Softver is successful:
CREAM JobOutput OK: retrieved outputSandbox: ['std.err', 'std.out'] **** std.err **** **** std.out **** egee01 has UMD 3.14.4
When it fails:
CREAM JobOutput ERROR [DONE-OK, exitCode=1 ]: retrieved outputSandbox: ['std.err', 'std.out'] **** std.err **** **** std.out **** ERROR: unable to find glite, EMI, LCG or UMD WN version on n1037-amd
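The success rates quoted above for the HTCondor-CE and CREAM-CE probes can be reproduced from the endpoint counts. The snippet below is a minimal sketch, assuming the rate is simply the share of endpoints not counted as failed; the function name `success_rate` is illustrative and is not part of ARGO.

```python
def success_rate(total, failed):
    """Return the percentage of endpoints that are not counted as failed.

    Assumption: rate = (total - failed) / total, which matches the
    figures quoted in these notes.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * (total - failed) / total


# HTCondor-CE: 57 endpoints, 17 CRITICAL
htcondor = success_rate(57, 17)

# CREAM-CE: 177 endpoints, 53 CRITICAL, 15 WARNING
cream_critical_only = success_rate(177, 53)
cream_with_warning = success_rate(177, 53 + 15)

print(f"HTCondor-CE: {htcondor:.1f}%")
print(f"CREAM-CE (CRITICAL only): {cream_critical_only:.1f}%")
print(f"CREAM-CE (CRITICAL + WARNING): {cream_with_warning:.1f}%")
```

Counting the WARNING (timeout) results as failures is what lowers the CREAM-CE figure from about 70% to 61.6%.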
FedCloud
Feedback from DMSU
Verify configuration records
On a yearly basis, the information registered in GOC-DB needs to be verified. NGIs and RCs have been asked to check it. In particular:
- NGI managers should review the people registered and the roles assigned to them, and in particular check the following information:
- ROD E-Mail
- Security E-Mail
NGI Managers should also review the status of the "not certified" RCs, in accordance with the RC Status Workflow;
- RC administrators should review the people registered and the roles assigned to them, and in particular check the following information:
- telephone numbers
- CSIRT E-Mail
RC administrators should also review the information related to the registered service endpoints.
The process should be completed by June 22nd.
- 30 tickets
- Not yet solved after 1 month: 16
Monthly Availability/Reliability
- Under-performed sites in the past A/R reports with issues not yet fixed:
- AfricaArabia: https://ggus.eu/index.php?mode=ticket_info&ticket_id=146877
- ZA-WITS-CORE: SE hardware problem, machine sent to the vendor; CREAM-CE failures due to a known issue with the classads library (GGUS 146979)
- AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=142591
- NGI_DE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=146871
- GoeGRID: CREAM-CE intermittent failures not affecting ATLAS; failures with ARC-CE
- NGI_DE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=147313
- mainz: some problems in March and April that could not be fixed easily; in May, the HPC infrastructure was attacked and the whole computer center was shut down; the site is in downtime.
- wuppertalprod: SRM failures due to a BDII issue, fixed
- NGI_UK:
- UKI-NORTHGRID-SHEF-HEP: https://ggus.eu/index.php?mode=ticket_info&ticket_id=146455 ARC-CE re-installed, some condor problems to fix
- UKI-SOUTHGRID-SUSX: https://ggus.eu/index.php?mode=ticket_info&ticket_id=144720 Migration from CREAM to ARC, WN migration to CentOS7; SRM to be decommissioned; ARC-CE was failing the IGTF test, then solved; site-bdii failures.
- ROC_CANADA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=146452
- CA-SFU-T2: SLURM problems caused failures of the site-BDII freshness check because some old jobs were not properly cancelled; recovered
- CA-WATERLOO-T2: SRM failures not involving production VOs, fixed; some unscheduled downtime affected the A/R figures; ARC-CE and Site-BDII back in production with a minimum set of resources; A/R figures are improving.
- Under-performed sites after 3 consecutive months, under-performed NGIs, QoS violations (June 2020):
- NGI_BG: https://ggus.eu/index.php?mode=ticket_info&ticket_id=147747
- BG01-IPP
- AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=147748
- HK-HKU-CC-01
- TW-NCUHEP
- NGI_NL: https://ggus.eu/index.php?mode=ticket_info&ticket_id=147749
- SARA-MATRIX: SRM not published in the BDII; a network change is planned to solve the problem.
- NGI_UA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=147750
- UA-ISMA: migration to ARC6 and other planned software updates
- sites suspended:
IPv6 readiness plans
- please provide updates to the IPv6 assessment (ongoing) https://wiki.egi.eu/w/index.php?title=IPV6_Assessment
- relevant information, if any, will be summarised at the OMB
ARC Middleware 5 end of support, migration to ARC 6
- EGI Operations Broadcast
- PROC16 Decommission of unsupported software
- deadline: end of July
- Catalin is in contact with the ARC team to arrange a webinar on ARC administration, scheduled (to be confirmed) for July 6th; please contact operations@ for information
- Status
Date | Number of endpoints in BDII | Number of GGUS tickets | Issues |
---|---|---|---|
2020-06-08 | 75 | 42 | Some ARC endpoints publish a timestamp instead of a version like 5.X.Y; we can fairly assume they are ARC6 nightly builds, but we're going to close the corresponding tickets after explicit confirmation from the site admin. |
2020-07-13 | 53 | 29 | - |
Storage accounting
Many sites have stopped publishing storage accounting records; 57 tickets have been opened to fix that.
- page for checking when the records were published: http://goc-accounting.grid-support.ac.uk/storagetest/storagesitesystems.html
- Accounting Portal Prototype view
SECMON failures
Several CEs are failing the job submission tests, preventing Pakiti from checking the vulnerability fixes on the WNs.
- original ticket: https://ggus.eu/index.php?mode=ticket_info&ticket_id=143837
- List of tickets to the sites
- https://ggus.eu/index.php?mode=ticket_info&ticket_id=144732
AOB
Next meeting
Sept 14th, 2020 https://indico.egi.eu/event/5098/