General information
Middleware
- EMI repository will be shut down on June 15th, broadcast sent on May 31st https://operations-portal.egi.eu/broadcast/archive/1715
CMD
CMD-OS 1.1.0 RC ready http://repository.egi.eu/sw/production/cmd-os/candidate/1/
UMD
Preview repository
Released on 2017-06-02:
- Preview 1.12.0 AppDB info (sl6): ARC 15.03 u14, davix 0.6.6, DMLite 0.8.6, dpm-dsi 1.9.13, FTS 3.6.8, XRootD 4.6.1
- Preview 2.12.0 AppDB info (CentOS 7): ARC 15.03 u14, davix 0.6.6, DMLite 0.8.6, dpm-dsi 1.9.13, FTS 3.6.8, XRootD 4.6.1, WN 4.0.5
Operations
ARGO/SAM
- ARC-CE probes have been updated to mitigate the issue with missing jobs (https://ggus.eu/index.php?mode=ticket_info&ticket_id=126724)
- FTS default port changed to 8446; the port is now extracted from the GOCDB service URL (https://ggus.eu/index.php?mode=ticket_info&ticket_id=128154)
- New probes:
- AAI CheckIn: HTTP checks of all URLs in GOCDB
- NGI Argus: https://sccsec-egi-git.scc.kit.edu/EGI-CSIRT/nagios-plugins-egi.argus-ngi
- WebDAV: https://gitlab.cern.ch/lcgdm/nagios-plugins-webdav
- Internal ARGO probes: API queries, Nagios & ARC-CE monitor test freshness, Consumer & connectors, AMS
- ARGO MON switched from UMD-3 to UMD-4
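The FTS port handling mentioned above can be pictured with a minimal sketch. It assumes the probe takes the port from the GOCDB service URL when one is given and otherwise falls back to 8446; the function name and the example hostnames are hypothetical, and the real probe code is not shown in these notes.

```python
from urllib.parse import urlparse

# Default named in the ARGO/SAM item above.
DEFAULT_FTS_PORT = 8446


def fts_port(service_url):
    """Return the port found in a GOCDB service URL, or the default if none is set.

    Hypothetical helper: the fallback behaviour is an assumption, not the probe's code.
    """
    if not service_url:
        return DEFAULT_FTS_PORT
    port = urlparse(service_url).port
    return port if port is not None else DEFAULT_FTS_PORT


if __name__ == "__main__":
    # Example hostnames are placeholders.
    print(fts_port("https://fts.example.org:8449/"))
    print(fts_port("https://fts.example.org/"))
```

Sites that run FTS on a non-default port would therefore need the port to be present in the service URL registered in GOCDB.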
Testing FedCloud sites
Feedback from Helpdesk
Yearly review of the information registered in GOC-DB
2017-04-07
On a yearly basis, the information registered in GOC-DB needs to be verified. NGIs and RCs have been asked to check it. In particular:
- NGI managers should review the people registered and the roles assigned to them, and in particular check the following information:
- ROD E-Mail
- Security E-Mail
NGI managers should also review the status of the "not certified" RCs, according to the RC Status Workflow;
- RC administrators should review the people registered and the roles assigned to them, and in particular check the following information:
- telephone numbers
- CSIRT E-Mail
RC administrators should also review the information related to the registered service endpoints.
The process should be completed by Apr 28th.
To track the process, a series of tickets has been opened.
2017-06-12 UPDATE:
- no feedback yet from: AfricaArabia, NGI_DE, NGI_FI, NGI_IL, NGI_NL, NGI_UA;
- still reviewing: NGI_IBERGRID, NGI_IT, ROC_LA.
Monthly Availability/Reliability
- Underperformed sites in the past A/R reports with issues not yet fixed:
- AfricaArabia: https://ggus.eu/index.php?mode=ticket_info&ticket_id=127502 ZA-UCT-ICTS no improvement, no feedback, will be suspended after the meeting
- AsiaPacific
- TW-NCUHEP: site-bdii unstable because of network issues with ARGO https://ggus.eu/index.php?mode=ticket_info&ticket_id=128083
- KR-UOS-SSCC: there were SRM problems, now also CREAM failures https://ggus.eu/index.php?mode=ticket_info&ticket_id=127024
- NGI_DE GGUS 125430
- LRZ https://ggus.eu/index.php?mode=ticket_info&ticket_id=128087 site-bdii unreachable, GRAM5 failures; improving
- UNI-SIEGEN-HEP: the fix for the CREAM probes solved the issues; waiting until the end of the month before closing the ticket
- wuppertalprod: https://ggus.eu/index.php?mode=ticket_info&ticket_id=127026 the patch to the ARC-CE probes has been applied, the situation is improving
- NGI_FI: https://ggus.eu/index.php?mode=ticket_info&ticket_id=127505 ARC-CE nagios probes bug
- NGI_UA: GGUS 125839
- UA-NSCMBR: bug in the ARC-CE probes
- ROC_Canada: https://ggus.eu/index.php?mode=ticket_info&ticket_id=128097
- CA-MCGILL-CLUMEQ-T2: new SSL problems on CREAM have been solved, the situation is improving
- Underperformed sites after 3 consecutive months and underperformed NGIs:
- AsiaPacific (MY-USM-GCL): https://ggus.eu/index.php?mode=ticket_info&ticket_id=128880
- NGI_CHINA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=128881 QoS violation (SOLVED)
- NGI_FI (CSC) https://ggus.eu/index.php?mode=ticket_info&ticket_id=128883 (SOLVED)
- NGI_FRANCE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=128884 QoS violation
- NGI_IBERGRID (UNICAN) https://ggus.eu/index.php?mode=ticket_info&ticket_id=128885 the site has just been decommissioned
- NGI_IL: https://ggus.eu/index.php?mode=ticket_info&ticket_id=128886 QoS violation
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=128887 QoS violation (SOLVED)
- NGI_PL (IFJ-PAN-BG) https://ggus.eu/index.php?mode=ticket_info&ticket_id=128889
- NGI_RO (RO-11-NIPNE, RO-14-ITIM) https://ggus.eu/index.php?mode=ticket_info&ticket_id=128890
- NGI_UA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=128891 QoS violation (SOLVED)
- ROC_LA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=128892 QoS violation (SOLVED)
Decommissioning EMI WMS
As discussed at the February and April/May OMBs, we are making plans for decommissioning the WMS and moving to DIRAC.
NGIs provided WMS usage statistics; in general the usage is relatively low and mainly for local testing.
Moderate usage by a few VOs:
- NGI_CZ: eli-beams.eu
- NGI_GRNET: see
- NGI_IT: calet.org, compchem, theophys, virgo
- NGI_PL: gaussian, vo.plgrid.pl, vo.nedm.cyfronet
- NGI_UK: mice, t2k.org
EGI contacted these VOs to agree on a smooth migration of their activities to DIRAC; only some of them have replied so far:
- compchem is already testing DIRAC
- calet.org: discussing with the users the migration to DIRAC. Interested in a webinar on DIRAC.
- mice: enabled on the GridPP DIRAC server
We need feedback from the VOs to better define the technical details and the timeline:
- NGIs with VOs using WMS (not necessarily limited to the VOs above), please contact them to ensure that these VOs have a back-up plan.
WMS servers can be decommissioned as soon as the supported VOs do not need them any more. The proposal is:
- WMS will be removed from production starting from 1st January 2018.
- VOs have 8 months to find alternatives or migrate to DIRAC
- Since this is a removal rather than an update, the decommissioning can be performed in a few weeks.
IPv6 readiness plans
- Resource Centres: assess the IPv6 readiness of the site infrastructure (real machines, cloud managers)
- NGIs/ROCs please start discussing with sites and provide suggestions for the overall plan
Decommissioning of dCache 2.10 and 2.13
- support for dCache 2.10 ended in December 2016, tickets opened by EGI Operations to track decommissioning
- dCache 2.13 decommissioning procedure started: in June the probes will return CRITICAL, support from dCache ends in July, upgrades are to be performed by August
- please upgrade to 2.16, whose support ends in May 2018, or to 3.0
- note that the dCache team does not support upgrading from 2.10 directly to 2.16; only the 2.10->2.13 and 2.13->2.16 transitions are supported.
- decommissioning campaign will be started by EGI Operations to monitor the upgrade of the dCache 2.13 instances and follow up with the NGIs/sites at the beginning of August
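The supported upgrade transitions listed above can be written down as a small lookup, which may help when planning the steps for an instance still on 2.10. This is only a sketch: it encodes just the two transitions named in these notes, the function name is hypothetical, and the supported path to 3.0 is not stated here, so it is deliberately left out.

```python
# Only the transitions named in the notes above (2.10->2.13 and 2.13->2.16).
# Transitions to 3.0 are not stated in the notes, so they are not encoded.
SUPPORTED_STEPS = {"2.10": "2.13", "2.13": "2.16"}


def upgrade_path(current, target="2.16"):
    """Return the list of versions to pass through, starting at `current`.

    Hypothetical helper: raises ValueError when the notes name no supported step.
    """
    path = [current]
    while path[-1] != target:
        next_version = SUPPORTED_STEPS.get(path[-1])
        if next_version is None:
            raise ValueError(
                "no supported transition listed from %s towards %s" % (path[-1], target)
            )
        path.append(next_version)
    return path


if __name__ == "__main__":
    print(" -> ".join(upgrade_path("2.10")))
```

For anything not covered by the two transitions above, the dCache documentation remains the reference.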
Testing the new webdav probes
Site | Host | GGUSID | note |
---|---|---|---|
CYFRONET-LCG2 | se01.grid.cyfronet.pl | https://ggus.eu/index.php?mode=ticket_info&ticket_id=128325 | SOLVED |
GRIF | node12.datagrid.cea.fr | https://ggus.eu/index.php?mode=ticket_info&ticket_id=128329 | |
IGI-BOLOGNA | darkstorm.cnaf.infn.it | https://ggus.eu/index.php?mode=ticket_info&ticket_id=127930 | SOLVED |
INFN-T1 | removed | https://ggus.eu/index.php?mode=ticket_info&ticket_id=128326 | SOLVED |
NCG-INGRID-PT | gftp01.ncg.ingrid.pt | https://ggus.eu/index.php?mode=ticket_info&ticket_id=128327 | SOLVED |
UKI-NORTHGRID-LIV-HEP | hepgrid11.ph.liv.ac.uk | https://ggus.eu/index.php?mode=ticket_info&ticket_id=128328 | SOLVED |
egee.irb.hr | lorienmaster.irb.hr | | |
Missing steps:
- on GOC-DB, fill in the webdav URL including the VO ops folder, for example: https://darkstorm.cnaf.infn.it:8443/webdav/ops or https://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/ops/
- apart from the VO folder, it corresponds to the value of the GLUE2 attribute GLUE2EndpointURL (which contains the port in use but not the VO folder)
- follow HOWTO21 for filling in the information on GOC-DB
- verify that the webdav URL (for example: https://darkstorm.cnaf.infn.it:8443/webdav ) is properly accessible
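As a rough illustration of the relation between the two URLs in the steps above, the sketch below checks that a GOC-DB webdav URL ends with the VO folder and returns the URL without it, which is the one to test for accessibility. The function name is hypothetical, and the check is purely string-based: it does not contact the endpoint or query the information system.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_vo_folder(gocdb_url, vo="ops"):
    """Return the webdav URL without the trailing VO folder.

    Hypothetical helper: raises ValueError if the URL does not end with the VO folder.
    """
    parts = urlsplit(gocdb_url)
    # Ignore empty segments so a trailing slash does not matter.
    segments = [segment for segment in parts.path.split("/") if segment]
    if not segments or segments[-1] != vo:
        raise ValueError("URL does not end with the %s folder: %s" % (vo, gocdb_url))
    base_path = "/" + "/".join(segments[:-1])
    return urlunsplit((parts.scheme, parts.netloc, base_path, "", ""))


if __name__ == "__main__":
    print(strip_vo_folder("https://darkstorm.cnaf.infn.it:8443/webdav/ops"))
```

The returned URL can then be opened with a browser or any HTTP client holding a valid certificate to confirm that the endpoint answers.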
Testing of the storage accounting
As discussed during the January OMB, the APEL team needs one site per NGI for testing the storage accounting. The eligible sites are those providing either dCache or DPM storage elements.
More information can be found in the following wiki: https://wiki.egi.eu/wiki/APEL/Storage
List of sites available for test.
2017-06-12 UPDATE:
- 26 sites are sending storage accounting data (only from dCache and DPM SEs). The data has to be verified before deploying the script in production.
- After the discussion at the March OMB, we are evaluating the creation of a new service type on GOC-DB that will be used for:
- authorising the site/SE to publish the accounting data
- making the site/SE appear in the portal
- monitoring that the accounting data are regularly published
Currently the accounting service types are:
- glite-APEL: for authorizing the sending of the messages
- APEL: to monitor the accounting data publication
The proposed name is "APEL-SE"
AOB
Next meeting
- June 12th, 2017 https://indico.egi.eu/indico/event/3144/