General information
Middleware
UMD/CMD
- CMD-OS 1 (Mitaka) update
- inclusion of cloudkeeper-os/cloudkeeper ongoing https://ggus.eu/?mode=ticket_info&ticket_id=129660
- planned inclusion of user id isolation patch for Mitaka
- APEL team asked to include cASO 1.1.1 -> in progress
- CMD-ONE 1 (ONE5/C7) first major release
- all set, release in preparation
- UMD 4.6 scheduled for mid November (UI, CREAM, ARGUS)
Preview repository
Released on 2017-07-07:
- Preview 1.13.0 AppDB info (sl6): ARC 15.03 u15, dCache 2.16.40, frontier-squid 3.5.24-3.1, LCGdm-dav 0.18.2, QCG Broker 4.2.0
- Preview 2.13.0 AppDB info (CentOS 7): ARC 15.03 u15, ARGUS 1.7.1, CREAM 1.16.5, dCache 3.1.9 & SRM client 3.0.11, frontier-squid 3.5.24-3.1, LCGdm-dav 0.18.2, QCG Broker 4.2.0
Operations
ARGO/SAM
Testing FedCloud sites
Feedback from Helpdesk
Monthly Availability/Reliability
- Underperformed sites in the past A/R reports with issues not yet fixed:
- AsiaPacific
- TW-NCUHEP: still underperforming because of frequent failures https://ggus.eu/index.php?mode=ticket_info&ticket_id=128083
- KR-UOS-SSCC: there were SRM problems, and now there are also CREAM failures; suspension has been proposed https://ggus.eu/index.php?mode=ticket_info&ticket_id=127024
- ROC_Canada: https://ggus.eu/index.php?mode=ticket_info&ticket_id=128097
- CA-MCGILL-CLUMEQ-T2: still some failures
- NGI_BG (BG01-IPP) https://ggus.eu/index.php?mode=ticket_info&ticket_id=129370 : suggested to mark the SE as not production
- NGI_IT https://ggus.eu/index.php?mode=ticket_info&ticket_id=129381
- HEPHY-UIBK: recovered
- INFN-ROMA1-CMS: still underperforming, but the bug in the nagios probes for CREAM (ticket GGUS 128151) has since disappeared
- Underperformed sites after 3 consecutive months, underperformed NGIs, QoS violations:
- ROC_CERN https://ggus.eu/index.php?mode=ticket_info&ticket_id=129957 QoS violation
- NGI_AEGIS https://ggus.eu/index.php?mode=ticket_info&ticket_id=129959
- NGI_CH https://ggus.eu/index.php?mode=ticket_info&ticket_id=129960
- T3_CH_PSI
- NGI_DE https://ggus.eu/index.php?mode=ticket_info&ticket_id=129961
- FZK-LCG2
- NGI_GRNET https://ggus.eu/index.php?mode=ticket_info&ticket_id=129962
- NGI_UA https://ggus.eu/index.php?mode=ticket_info&ticket_id=129963
- UA_IFBG
- NGI_UK https://ggus.eu/index.php?mode=ticket_info&ticket_id=129964 QoS violation (SOLVED)
Suspended sites: IFJ-PAN-BG, ZA-MERAKA, ZA-UJ
Decommissioning EMI WMS
As discussed at the February and April/May OMBs, we are making plans for decommissioning the WMS and moving to DIRAC.
NGIs provided WMS usage statistics: in general the usage is relatively low and mainly for local testing.
Moderate usage by a few VOs:
- NGI_CZ: eli-beams.eu
- NGI_GRNET: see
- NGI_IT: calet.org, compchem, theophys, virgo
- NGI_PL: gaussian, vo.plgrid.pl, vo.nedm.cyfronet
- NGI_UK: mice, t2k.org
EGI contacted these VOs to agree on a smooth migration of their activities to DIRAC; only some of them have replied so far:
- compchem is already testing DIRAC
- calet.org: discussing with the users the migration to DIRAC. Interested in a webinar on DIRAC.
- mice: enabled on the GridPP DIRAC server
We need the VOs' feedback to better define the technical details and the timeline:
- NGIs with VOs using WMS (not necessarily limited to the VOs above), please contact them to ensure that these VOs have a back-up plan.
WMS servers can be decommissioned as soon as the supported VOs do not need them any more. The proposal is:
- WMS will be removed from production starting from 1st January 2018.
- VOs have 4 months to find alternatives or migrate to DIRAC.
- Considering that this is not an update, the decommissioning can be performed in a few weeks.
2017-08-21 UPDATE: eli-beams.eu is interested in testing DIRAC; the process for enabling the VO on the DIRAC4EGI server has started.
IPv6 readiness plans
- Resource Centres: assess the IPv6 readiness of the site infrastructure (real machines, cloud managers)
- NGIs/ROCs please start discussing with sites and provide suggestions for the overall plan
Decommissioning of dCache 2.10 and 2.13
- support for dCache 2.10 ended in December 2016; tickets were opened by EGI Operations to track the decommissioning
- dCache 2.13 decommissioning procedure started: in June the probes will get CRITICAL, support from dCache ends in July, upgrades to be performed by August
- please upgrade to 2.16, whose support ends in May 2018, or to 3.0
- note that the dCache team does not support upgrading directly from 2.10 to 2.16; only the 2.10->2.13 and 2.13->2.16 transitions are supported.
- decommissioning campaign started by EGI Operations http://go.egi.eu/decommdcache213
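The supported upgrade chain mentioned above can be sketched as a small lookup. This is only an illustration: the `SUPPORTED_UPGRADES` table and the `upgrade_path` helper are hypothetical names, and the table encodes just the two transitions stated in these notes (2.10->2.13 and 2.13->2.16). The notes do not say which transitions lead to 3.0, so none is assumed here.

```python
# Hypothetical helper: encodes only the transitions stated in the notes.
SUPPORTED_UPGRADES = {"2.10": "2.13", "2.13": "2.16"}


def upgrade_path(current, target="2.16"):
    """Return the versions to pass through, starting from `current`."""
    path = [current]
    while path[-1] != target:
        next_version = SUPPORTED_UPGRADES.get(path[-1])
        if next_version is None:
            # No transition recorded in the table for this version.
            raise ValueError(
                f"no supported upgrade from {path[-1]} towards {target}"
            )
        path.append(next_version)
    return path


print(upgrade_path("2.10"))  # ['2.10', '2.13', '2.16']
```

A site still on 2.10 would therefore plan two upgrade steps rather than one.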
webdav probes in production
The webdav probes have been deployed in production. Some sites have already been contacted to enable the monitoring of their webdav endpoints:
Site | Host | GGUSID | note |
---|---|---|---|
CYFRONET-LCG2 | se01.grid.cyfronet.pl | https://ggus.eu/index.php?mode=ticket_info&ticket_id=128325 | SOLVED |
GRIF | node12.datagrid.cea.fr | https://ggus.eu/index.php?mode=ticket_info&ticket_id=128329 | |
IGI-BOLOGNA | darkstorm.cnaf.infn.it | https://ggus.eu/index.php?mode=ticket_info&ticket_id=127930 | SOLVED |
INFN-T1 | removed | https://ggus.eu/index.php?mode=ticket_info&ticket_id=128326 | SOLVED |
NCG-INGRID-PT | gftp01.ncg.ingrid.pt | https://ggus.eu/index.php?mode=ticket_info&ticket_id=128327 | SOLVED |
UKI-NORTHGRID-LIV-HEP | hepgrid11.ph.liv.ac.uk | https://ggus.eu/index.php?mode=ticket_info&ticket_id=128328 | SOLVED |
egee.irb.hr | lorienmaster.irb.hr | | |
link to nagios results: https://argo-mon.egi.eu/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_webdav&style=detail
Several sites are publishing webdav endpoints in the BDII:
- AsiaPacific: JP-KEK-CRC-02
- NGI_AEGIS: AEGIS01-IPB-SCL
- NGI_CH: UNIGE-DPNC, UNIBE-LHEP
- NGI_DE: UNI-SIEGEN-HEP
- NGI_GRNET: GR-01-AUTH, HG-03-AUTH
- NGI_HR: egee.irb.hr, egee.srce.hr
- NGI_IBERGRID: CETA-GRID, NCG-INGRID-PT
- NGI_FRANCE: GRIF-IPNO, GRIF-LAL, GRIF-LPNHE
- NGI_IL: IL-TAU-HEP, TECHNION-HEP, WEIZMANN-LCG2
- NGI_IT: IGI-BOLOGNA, INFN-GENOVA, INFN-MILANO-ATLASC, INFN-ROMA3, INFN-T1
- NGI_PL: CYFRONET-LCG2, WUT
- NGI_UK: UKI-NORTHGRID-LIV-HEP, UKI-NORTHGRID-MAN-HEP
- ROC_CANADA: CA-MCGILL-CLUMEQ-T2
Checked with:
$ ldapsearch -x -LLL -H ldap://egee-bdii.cnaf.infn.it:2170 -b "GLUE2GroupID=grid,o=glue" '(&(objectClass=GLUE2Endpoint)(GLUE2EndpointInterfaceName=webdav))' GLUE2EndpointImplementationName GLUE2EndpointURL
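To turn the output of a query like the one above into a per-endpoint list, a minimal parsing sketch follows. It assumes the usual `ldapsearch -LLL` layout: entries separated by blank lines, `attribute: value` lines, and long lines folded onto continuation lines that start with a single space. The `parse_ldif` helper and the sample host `se.example.org` are hypothetical and not taken from any real BDII output.

```python
def parse_ldif(text):
    """Parse `ldapsearch -LLL` style output into a list of dicts.

    Simplifications: base64 values (`attr:: ...`) are not decoded, and for
    multi-valued attributes only the last value is kept.
    """
    entries, current, last_key = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current entry.
            if current:
                entries.append(current)
            current, last_key = {}, None
        elif line.startswith(" ") and last_key:
            # Folded continuation of the previous attribute value.
            current[last_key] += line[1:]
        elif ":" in line:
            key, _, value = line.partition(":")
            current[key] = value.strip()
            last_key = key
    if current:
        entries.append(current)
    return entries


# Hypothetical sample data, shaped like the query result.
SAMPLE = """\
dn: GLUE2EndpointID=se.example.org/webdav,GLUE2ServiceID=se.example.org,GLUE2
 GroupID=resource,GLUE2DomainID=EXAMPLE-SITE,GLUE2GroupID=grid,o=glue
GLUE2EndpointImplementationName: DPM
GLUE2EndpointURL: https://se.example.org:443/dpm/example.org/home

dn: GLUE2EndpointID=se2.example.org/webdav,GLUE2GroupID=grid,o=glue
GLUE2EndpointImplementationName: StoRM
GLUE2EndpointURL: https://se2.example.org:8443/webdav
"""

for entry in parse_ldif(SAMPLE):
    print(entry["GLUE2EndpointImplementationName"], entry["GLUE2EndpointURL"])
```

In practice the text would come from the saved output of the `ldapsearch` command rather than from an inline sample.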
ACTIONS for NGIs and sites: The Operations Centres are asked to verify with their sites whether the webdav protocol is intentionally enabled on their storage elements (if not, the information should be removed from the BDII), and to report to EGI Operations.
- The webdav service endpoint should be registered in GOC-DB in order to be properly monitored: the nagios probes are executed using the VO ops, so please ensure that the protocol is enabled for the ops VO as well
- the webdav probes are harmless: they are not in any critical profile, they don't raise any alarm in the operations dashboard, and the A/R figures are not affected. We need time and more sites for gathering statistics on their results before making them critical.
To register the webdav service endpoint in GOC-DB, follow HOWTO21 to fill in the proper information. In particular:
- on GOC-DB fill in the webdav URL including the VO ops folder, for example: https://darkstorm.cnaf.infn.it:8443/webdav/ops or https://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/ops/
- the part before the VO folder corresponds to the value of the GLUE2 attribute GLUE2EndpointURL (which contains the port used but not the VO folder)
- verify that the webdav url (for example: https://darkstorm.cnaf.infn.it:8443/webdav ) is properly accessible
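As an illustration of the URL to enter in GOC-DB, the sketch below appends the VO folder to a published endpoint URL. It assumes the ops folder sits directly under the endpoint URL, as in the darkstorm example above; other storage layouts may differ. The `gocdb_webdav_url` helper is a hypothetical name, not part of any EGI tool.

```python
from urllib.parse import urlsplit


def gocdb_webdav_url(endpoint_url, vo="ops"):
    """Append the VO folder to an endpoint URL (assumed layout: <url>/<vo>)."""
    if not urlsplit(endpoint_url).hostname:
        raise ValueError(f"not a valid URL: {endpoint_url!r}")
    # Avoid a double slash when the published URL already ends with "/".
    return endpoint_url.rstrip("/") + "/" + vo


print(gocdb_webdav_url("https://darkstorm.cnaf.infn.it:8443/webdav"))
# https://darkstorm.cnaf.infn.it:8443/webdav/ops
```

The sketch only builds the string; checking that the endpoint is actually reachable still has to be done separately with valid credentials for the ops VO.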
Testing of the storage accounting
As discussed during the January OMB, the APEL team would need one site per NGI for testing the storage accounting. The eligible sites are the ones providing either dCache or DPM storage elements.
More information can be found in the following wiki: https://wiki.egi.eu/wiki/APEL/Storage
List of sites available for test.
2017-07-27 UPDATE (more details in the July OMB presentation):
- 23 sites have verified their numbers and 3 are in progress
- for the deployment in production we need to:
- Get sites to add new GOCDB service type
- Change broker queue name and get sites to swap
- Update documentation
- Add storage system scripts to UMD
- Migrate storage view to new development Portal
- by September we should be ready for a wide roll-out of storage accounting
- detailed instructions for the sites will be circulated
AOB
Next meeting
- Oct 9th, 2017 https://indico.egi.eu/indico/event/3353/