General information
Middleware
UMD
- CentOS8 discussion still ongoing
- UMD 4 June update
- ARC 6.12.0 will be included in the upcoming release (end of June)
- other products to be included: HTCondor, gfal2, lcmaps-plugins, xrootd 5.1.1, StoRM 1.11.21, DDNS probe
- repository frontend web pages restored as static pages
Preview repository
- released on 2021-05-20:
- Preview 2.33.0 (CentOS 7): ARC 6.11.0, StoRM 1.11.20 and 1.11.21, VOMS 04-21
- released on 2021-06-10
- Preview 2.34.0 (CentOS 7): ARC 6.12.0, CVMFS 2.8.1, xrootd 5.2.0
Operations
ARGO/SAM
- HTCondor-CE probes
- deployed on secmon and pakiti: GGUS 150006
- working on the probe for the host certificate validity check: GGUS 147386
- With HTCondor 8.9.12 installed (expected the week of Mar 15), you should be able to query remote HTCondor-CEs for their host certificate using the following:
$ python -c 'import htcondor; ad = htcondor.Collector("collector2.opensciencegrid.org:9619").locate(htcondor.DaemonTypes.Schedd, "hosted-ce10.opensciencegrid.org"); print(htcondor.SecMan().ping(ad, "READ")["ServerPublicCert"])' | openssl x509 -noout -subject -enddate
subject= /CN=hosted-ce10.opensciencegrid.org
notAfter=Apr 26 12:26:42 2021 GMT
- testing the new version of the probe available with HTCondor 9.0.0
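The validity check on the `notAfter` value could be sketched as follows. This is only an illustration, not the actual ARGO probe: the function names and the 30-day and 7-day thresholds are assumptions, and the parser expects the `openssl x509 -noout -enddate` output format shown above (with English month names).

```python
from datetime import datetime, timezone


def parse_not_after(openssl_output):
    """Extract the notAfter timestamp from `openssl x509 -noout -enddate` output."""
    for line in openssl_output.splitlines():
        line = line.strip()
        if line.startswith("notAfter="):
            value = line[len("notAfter="):]
            # openssl prints e.g. "Apr 26 12:26:42 2021 GMT"
            parsed = datetime.strptime(value, "%b %d %H:%M:%S %Y GMT")
            return parsed.replace(tzinfo=timezone.utc)
    raise ValueError("no notAfter line found in openssl output")


def days_until_expiry(not_after, now=None):
    """Whole days left before the certificate expires (negative if expired)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (not_after - now).days


def probe_status(days_left, warning_days=30, critical_days=7):
    """Map remaining days to a Nagios-style status (thresholds are hypothetical)."""
    if days_left < critical_days:
        return "CRITICAL"
    if days_left < warning_days:
        return "WARNING"
    return "OK"


# Example using the output quoted above and a fixed reference date.
sample = (
    "subject= /CN=hosted-ce10.opensciencegrid.org\n"
    "notAfter=Apr 26 12:26:42 2021 GMT\n"
)
reference = datetime(2021, 4, 1, tzinfo=timezone.utc)
left = days_until_expiry(parse_not_after(sample), now=reference)
print(left, probe_status(left))  # prints "25 WARNING"
```

Passing `now` explicitly keeps the check reproducible; a real probe would use the current time.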
FedCloud
Feedback from DMSU
New Known Error Database (KEDB)
The KEDB has been moved to Jira+Confluence: https://confluence.egi.eu/display/EGIKEDB/EGI+Federation+KEDB+Home
- problems are tracked with Jira tickets to better follow their evolution
- problems can be registered by DMSU staff and EGI Operations team
Verify configuration records
On a yearly basis, the information registered in GOCDB needs to be verified. NGIs and RCs have been asked to check it. In particular:
- NGI managers should review the people registered and the roles assigned to them, and in particular check the following information:
- ROD E-Mail
- Security E-Mail
NGI managers should also review the status of the "not certified" RCs, according to the RC Status Workflow.
- RC administrators should review the people registered and the roles assigned to them, and in particular check the following information:
- telephone numbers
- CSIRT E-Mail
RC administrators should also review the information related to the registered service endpoints.
The process should be completed by July 2nd.
Monthly Availability/Reliability
- Under-performed sites in the past A/R reports with issues not yet fixed:
- NGI_HR: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148518
- egee.irb.hr: major upgrade from CentOS 6 to CentOS 7; tests currently fail due to UNKNOWN status returned
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150818
- INFN-PISA: HTCondorCE and SRM failures
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=151844
- CNR-ILC-PISA: downtime for CREAM-CE decommissioning and migration to ARC-CE; issues in configuring torque/maui.
- INFN-BARI: CE and SRM failures, fixed
- INFN-ROMA3: failures occurred during migration from CREAM-CE to HTCondorCE, and with migration to CentOS7; fixed SRM configuration.
- INFN-TRIESTE: the site was in downtime for migration to HTCondorCE; new CE failures
- NGI_UA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=151847
- UA-IFBG: CE configuration issues, DNS misconfiguration; test jobs remain pending. Jobs were stuck in the ARC spool not being pushed to the LRMS, SOLVED.
- ROC_LA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148956
- CBPF: DPM updated; SRM failures due to information not properly published, fixed; other SRM failures due to available space
- ROC_LA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150817
- ICN-UNAM: replaced CREAM-CE; SE certificate expired; new failures with HTCondorCE; problems disappeared after re-installation.
- Under-performed sites after 3 consecutive months, under-performed NGIs, QoS violations (May 2021):
- NGI_BY: https://ggus.eu/index.php?mode=ticket_info&ticket_id=152255
- BY-NCPHEP: CE failures due to missing information; fixed.
- NGI_FRANCE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=152253
- AUVERGRID: Long downtime connected to IN2P3-LPC site
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=152256
- INFN-GENOVA
- INFN-MIB
- NGI_NL: https://ggus.eu/index.php?mode=ticket_info&ticket_id=152254
- BEgrid-ULB-VUB: CE failures with only one of the nagios servers, because of a wrong mapping of the user certificate used to submit jobs; solved.
- NGI_UA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=152258
- UA-BITP: authentication issues with one of the nagios servers, fixed.
- UA-KNU
- sites suspended:
IPv6 readiness plans
- please provide updates to the IPv6 assessment (ongoing) https://wiki.egi.eu/w/index.php?title=IPV6_Assessment
- any relevant information will be summarised at the OMB
APEL migration from ActiveMQ to ARGO Message Service (AMS)
- ActiveMQ is going to be decommissioned at the end of June: for security reasons it is no longer possible to maintain it.
- Migration instructions (HTCondorCE, Storage, and Cloud accounting): https://github.com/apel/ssm/blob/dev/migrating_to_ams.md
- ARC 6.12.0 released, instructions:
- http://www.nordugrid.org/arc/releases/6.12/release_notes_6.12.html
- all the sites with ARC-CE need to update to this version
- Recommended versions:
- APEL Client: 1.9.0
- APEL SSM: 3.2.0
- Cloud accounting campaign:
- HTCondorCE and Storage accounting campaign:
- ARC-CE and storage accounting campaign:
- Most common issues:
- mismatch between the host certificate subject registered in GOCDB and the real DN
- SAN field missing / wrongly defined in the host certificate
- DNS entries not completely defined
- same host used to send different types of accounting records
- a new version of ARGO Message Service mitigates the problems related to DNS entries and the SAN field:
Prerequisites for using AMS
- A valid host certificate from an IGTF Accredited CA.
- A GOCDB 'Site' entry flagged as 'Production'.
- A GOCDB 'Service' entry of the correct service type flagged as 'Production'. The following service types are used:
- For Grid accounting use 'gLite-APEL'.
- For Cloud accounting use 'eu.egi.cloud.accounting'.
- For Storage accounting use 'eu.egi.storage.accounting'.
- The 'Host DN' listed in the GOCDB 'Service' entry must exactly match the certificate DN of the host used for accounting. Make sure there are no leading or trailing spaces in the 'Host DN' field.
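As an illustration of the last prerequisite, the comparison between the GOCDB 'Host DN' and the certificate DN could be sketched as below. This is a minimal sketch rather than an official tool: the function name, the dictionary keys and the example DN are hypothetical, and both DNs are assumed to be already available as strings in the same format.

```python
# Service types listed above, keyed by accounting type (keys are illustrative).
SERVICE_TYPES = {
    "grid": "gLite-APEL",
    "cloud": "eu.egi.cloud.accounting",
    "storage": "eu.egi.storage.accounting",
}


def check_host_dn(gocdb_dn, cert_dn):
    """Return a list of problems; an empty list means the DNs match exactly."""
    problems = []
    if gocdb_dn != gocdb_dn.strip():
        problems.append("GOCDB Host DN has leading or trailing whitespace")
    if gocdb_dn.strip() != cert_dn.strip():
        problems.append("GOCDB Host DN differs from the certificate DN")
    return problems


# Hypothetical DN, used only for the example.
cert_dn = "/DC=org/DC=example/CN=apel.example.org"
print(check_host_dn(cert_dn, cert_dn))        # prints []
print(check_host_dn(cert_dn + " ", cert_dn))  # reports the trailing space
```

The whitespace check is reported separately from the mismatch check, so a stray space in the 'Host DN' field is not mistaken for a wrong DN.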
AOB
Next meeting
Jul or Aug