General information
Middleware
UMD/CMD
- UMD 4.6.0 regular release: RELEASED http://repository.egi.eu/2017/12/18/release-umd-4-6-0/
- UMD 4.6.0/CentOS7
- FTS3 3.6.8 - several new features and bug fixes http://fts3-service.web.cern.ch/documentation/releases#qt-release-ui-tabs3
- ARGUS 1.7.2 - update component Argus PAP service version 1.7.2 fixing PAP permissions
- CVMFS server 2.4.1 - http://cvmfs.readthedocs.io/en/2.4/cpt-releasenotes.html
- dmlite 0.8.8 - bug fixes http://lcgdm.web.cern.ch/dmlite-088-being-released-epel
- GFAL 2.14.2 - https://dmc.web.cern.ch/release/gfal2-2.14.2
- GFAL-utils 1.5.1 - http://dmc.web.cern.ch/release/gfal2-util-1.5.1
- GFAL-python 1.9.3 - http://dmc.web.cern.ch/release/gfal2-python-1.9.3
- Gridsite 2.3.4 - https://github.com/CESNET/gridsite/wiki/Gridsite-release-page#GridSite_234
- UI 4.0.3 - first release of the User Interface for CentOS 7 (unsupported clients removed) https://twiki.cern.ch/twiki/bin/view/LCG/EL7UIMiddleware
- CREAM 1.16.5 - first release of CREAM for CentOS 7, with draft support for accelerator devices (GPU, MIC); supported batch systems are Torque, Slurm, HTCondor, LSF; see details on https://wiki.italiangrid.it/twiki/bin/view/CREAM/CREAMReleaseUMD4_5_0
- dCache 3.2.10 - fixes https://www.dcache.org/downloads/1.9/release-notes-3.2.shtml#10
- UMD 4.6.0/SL6
- cvmfs-config-egi 2.0.1 - missing cvmfs config since last UMD release http://cvmfs.readthedocs.io/en/2.3/cpt-releasenotes.html
- VOMS admin 3.7.0 - http://italiangrid.github.io/voms/release-notes/voms-admin-server/3.7.0/
- CVMFS server 2.4.1 - http://cvmfs.readthedocs.io/en/2.4/cpt-releasenotes.html
- GFAL 2.14.2 - https://dmc.web.cern.ch/release/gfal2-2.14.2
- GFAL-utils 1.5.1 - http://dmc.web.cern.ch/release/gfal2-util-1.5.1
- GFAL-python 1.9.3 - http://dmc.web.cern.ch/release/gfal2-python-1.9.3
- ARGUS 1.7.2 - update component Argus PAP service version 1.7.2 fixing PAP permissions
- dmlite 0.8.8 - bug fixes http://lcgdm.web.cern.ch/dmlite-088-being-released-epel
- Gridsite 2.3.4 - https://github.com/CESNET/gridsite/wiki/Gridsite-release-page#GridSite_234
- dCache 3.2.10 - fixes https://www.dcache.org/downloads/1.9/release-notes-3.2.shtml#10
- Announced on the URT discuss list; it will also be announced through the EGI Monthly Broadcast in January
- UMD3 deprecation
- WMS decommissioning plan presented at the December OMB
- sites have started broadcasting WMS decommissioning deadlines
- in parallel, the UMD team will test upgrading the umd-release package from UMD3/SL6 to UMD4/SL6 to make sure everything works properly
- the plan will be arranged and agreed with the PTs in January/February (next meeting scheduled for next Monday, Jan 22nd)
- at some point UMD3 will be "frozen" (no more updates of any kind, not even security ones)
- we will probably establish a period of 2-4 weeks during which sites progressively become aware that the old repos won't work anymore and switch to UMD4/SL6
- if any security issue comes out during that period, we can always ask to shut down the repository
Preview repository
- released on 2017-11-15
- Preview 1.15.0 AppDB info (SL6): ARC 15.03 update 17, dCache 2.16.53, XRootD 4.7.1
- Preview 2.15.0 AppDB info (CentOS 7): ARC 15.03 update 17, dCache 3.1.21, XRootD 4.7.1
Operations
ARGO/SAM
FedCloud
Feedback from Helpdesk
Monthly Availability/Reliability
- Underperformed sites in the past A/R reports with issues not yet fixed:
- AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131661
- IN-DAE-VECC-02 (OK), PK-CIIT (site-BDII problems)
- NGI_RO: https://ggus.eu/index.php?mode=ticket_info&ticket_id=132345
- RO-11-NIPNE: SRM problems, now fixed
- Underperformed sites after 3 consecutive months, underperformed NGIs, QoS violations:
- AfricaArabia: https://ggus.eu/index.php?mode=ticket_info&ticket_id=132807
- DZ-01-ARN
- AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=132808
- TW-NTU-HEP: SRM failures
- NGI_AEGIS: https://ggus.eu/index.php?mode=ticket_info&ticket_id=132809
- AEGIS01-IPB-SCL: SRM issues
- NGI_HU: https://ggus.eu/index.php?mode=ticket_info&ticket_id=132810 (T3_HU_Debrecen, suspended)
- NGI_IL: https://ggus.eu/index.php?mode=ticket_info&ticket_id=132812: the sites haven't updated the CAs
- NGI_NDGF: https://ggus.eu/index.php?mode=ticket_info&ticket_id=132813:
- T2_Estonia
- NGI_PL: https://ggus.eu/index.php?mode=ticket_info&ticket_id=132815
- PSNC
suspended sites: T3_HU_Debrecen
New weights for the NGIs average A/R values, based on Computation Power
Since Dec 1st, the weights for the NGIs' average A/R values have been computed from the CEs' "computation power":
computation power = hep-spec * LogicalCPUs
This quantity can be summed over the CEs of a site (and over the sites). Until now, the hep-spec values of the CEs were simply added up to get a site-wide value, but this is not correct: the hep-spec value refers to a particular CE (to the cluster behind that particular CE) and is not additive. That is why we first asked VAPOR to implement the "computation power" as well as the site/NGI "average hep-spec". See for example the "figures" section: http://operations-portal.egi.eu/vapor/resources/GL2ResSummary
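As an illustration of the definition above, the following minimal Python sketch computes the computation power of each CE, sums it per site, and uses the per-site totals as weights for an average A/R value. The `CE` class, the site names, the numbers, and the exact weighted-average formula are assumptions made for the example; they are not taken from VAPOR or from the A/R reports.

```python
from dataclasses import dataclass


@dataclass
class CE:
    site: str          # hypothetical site name
    hep_spec: float    # HEP-SPEC06 value published for this CE
    logical_cpus: int  # logical CPUs of the cluster behind this CE


def computation_power(ce: CE) -> float:
    """computation power = hep-spec * LogicalCPUs (per CE)."""
    return ce.hep_spec * ce.logical_cpus


def site_power(ces):
    """Sum the computation power of all CEs belonging to each site."""
    totals = {}
    for ce in ces:
        totals[ce.site] = totals.get(ce.site, 0.0) + computation_power(ce)
    return totals


def weighted_average(values, weights):
    """Average per-site values, weighting each site by its computation power."""
    total = sum(weights[site] for site in values)
    if total == 0:
        raise ValueError("no computation power published for these sites")
    return sum(values[site] * weights[site] for site in values) / total


# Illustrative numbers only.
ces = [
    CE("SITE-A", 10.0, 100),
    CE("SITE-A", 12.0, 50),
    CE("SITE-B", 8.0, 50),
]
power = site_power(ces)                       # SITE-A: 1600.0, SITE-B: 400.0
availability = {"SITE-A": 90.0, "SITE-B": 100.0}
ngi_availability = weighted_average(availability, power)
print(ngi_availability)                       # 92.0
```

Note that simply adding the hep-spec values of SITE-A (10.0 + 12.0) would ignore the different sizes of the two clusters, which is the problem described above.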
Several sites are still missing the information needed to compute the weights: check on VAPOR the values published by your sites, and make sure that the number of logical CPUs and the HEP-SPEC06 benchmark are properly published in the GLUE2 schema.
- Example of ldap query for checking if a site is publishing the HepSpec-06 benchmark:
$ ldapsearch -x -LLL -H ldap://egee-bdii.cnaf.infn.it:2170 -b "GLUE2DomainID=pic,GLUE2GroupID=grid,o=glue" '(&(objectClass=GLUE2Benchmark)(GLUE2BenchmarkType=hep-spec06))'
dn: GLUE2BenchmarkID=ce07.pic.es_hep-spec06,GLUE2ResourceID=ce07.pic.es,GLUE2ServiceID=ce07.pic.es_ComputingElement,GLUE2GroupID=resource,GLUE2DomainID=pic,GLUE2GroupID=grid,o=glue
GLUE2BenchmarkExecutionEnvironmentForeignKey: ce07.pic.es
GLUE2BenchmarkID: ce07.pic.es_hep-spec06
GLUE2BenchmarkType: hep-spec06
objectClass: GLUE2Entity
objectClass: GLUE2Benchmark
GLUE2BenchmarkValue: 12.1205
GLUE2EntityOtherInfo: InfoProviderName=glite-ce-glue2-benchmark-static
GLUE2EntityOtherInfo: InfoProviderVersion=1.1
GLUE2EntityOtherInfo: InfoProviderHost=ce07.pic.es
GLUE2BenchmarkComputingManagerForeignKey: ce07.pic.es_ComputingElement_Manager
GLUE2EntityName: Benchmark hep-spec06
GLUE2EntityCreationTime: 2017-06-20T16:50:48Z

dn: GLUE2BenchmarkID=ce01.pic.es_hep-spec06,GLUE2ResourceID=ce01.pic.es,GLUE2ServiceID=ce01.pic.es_ComputingElement,GLUE2GroupID=resource,GLUE2DomainID=pic,GLUE2GroupID=grid,o=glue
GLUE2BenchmarkExecutionEnvironmentForeignKey: ce01.pic.es
GLUE2BenchmarkID: ce01.pic.es_hep-spec06
GLUE2BenchmarkType: hep-spec06
objectClass: GLUE2Entity
objectClass: GLUE2Benchmark
GLUE2BenchmarkValue: 13.4856
GLUE2EntityOtherInfo: InfoProviderName=glite-ce-glue2-benchmark-static
GLUE2EntityOtherInfo: InfoProviderVersion=1.1
GLUE2EntityOtherInfo: InfoProviderHost=ce01.pic.es
GLUE2BenchmarkComputingManagerForeignKey: ce01.pic.es_ComputingElement_Manager
GLUE2EntityName: Benchmark hep-spec06
GLUE2EntityCreationTime: 2017-09-05T07:34:26Z
- Example of ldap query for getting the number of LogicalCPUs published by an ARC-CE (due to a bug in the info-provider, CREAM-CEs publish the total number under the ExecutionEnvironment class):
$ ldapsearch -x -LLL -H ldap://egee-bdii.cnaf.infn.it:2170 -b "GLUE2DomainID=UA_ILTPE_ARC,GLUE2GroupID=grid,o=glue" 'objectClass=GLUE2ComputingManager' GLUE2ComputingManagerTotalLogicalCPUs
dn: GLUE2ManagerID=urn:ogf:ComputingManager:ds4.ilt.kharkov.ua:pbs,GLUE2ServiceID=urn:ogf:ComputingService:ds4.ilt.kharkov.ua:arex,GLUE2GroupID=services,GLUE2DomainID=UA_ILTPE_ARC,GLUE2GroupID=grid,o=glue
GLUE2ComputingManagerTotalLogicalCPUs: 168
- Example of ldap query for getting the number of LogicalCPUs published by a CREAM-CE:
$ ldapsearch -x -LLL -H ldap://egee-bdii.cnaf.infn.it:2170 -b "GLUE2DomainID=UKI-SOUTHGRID-SUSX,GLUE2GroupID=grid,o=glue" 'objectClass=GLUE2ExecutionEnvironment' GLUE2ExecutionEnvironmentLogicalCPUs GLUE2ExecutionEnvironmentPhysicalCPUs GLUE2ExecutionEnvironmentTotalInstances
dn: GLUE2ResourceID=grid-cream-02.hpc.susx.ac.uk,GLUE2ServiceID=grid-cream-02.hpc.susx.ac.uk_ComputingElement,GLUE2GroupID=resource,GLUE2DomainID=UKI-SOUTHGRID-SUSX,GLUE2GroupID=grid,o=glue
GLUE2ExecutionEnvironmentTotalInstances: 71
GLUE2ExecutionEnvironmentLogicalCPUs: 568
GLUE2ExecutionEnvironmentPhysicalCPUs: 71
- Manual for Hepspec06 benchmark.
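The output of the benchmark query shown earlier in this list can be post-processed to see at a glance which CEs of a site publish a HEP-SPEC06 value. The Python sketch below is only an illustration: it assumes plain-text LDIF as printed by `ldapsearch -LLL`, undoes the folding of long lines (continuation lines start with a single space), and does not handle base64-encoded values. The helper names `parse_ldif` and `hep_spec_by_ce` are made up for the example; the sample data are an abridged copy of the output above.

```python
def parse_ldif(text):
    """Parse simple LDIF text into a list of {attribute: [values]} dicts."""
    unfolded = text.replace("\n ", "")  # join folded continuation lines
    entries, current = [], {}
    for line in unfolded.splitlines():
        if not line.strip():            # a blank line ends the current entry
            if current:
                entries.append(current)
                current = {}
            continue
        key, sep, value = line.partition(": ")
        if sep:                         # skip anything that is not "attr: value"
            current.setdefault(key, []).append(value)
    if current:
        entries.append(current)
    return entries


def hep_spec_by_ce(entries):
    """Map each CE to the hep-spec06 value published in its benchmark entry."""
    result = {}
    for entry in entries:
        if "hep-spec06" in entry.get("GLUE2BenchmarkType", []):
            ce = entry["GLUE2BenchmarkExecutionEnvironmentForeignKey"][0]
            result[ce] = float(entry["GLUE2BenchmarkValue"][0])
    return result


# Abridged sample; the first dn is folded to exercise the unfolding step.
SAMPLE = """\
dn: GLUE2BenchmarkID=ce07.pic.es_hep-spec06,GLUE2ResourceID=ce07.pic.es,GLUE2S
 erviceID=ce07.pic.es_ComputingElement,GLUE2GroupID=resource,GLUE2DomainID=pic
 ,GLUE2GroupID=grid,o=glue
GLUE2BenchmarkExecutionEnvironmentForeignKey: ce07.pic.es
GLUE2BenchmarkType: hep-spec06
GLUE2BenchmarkValue: 12.1205

dn: GLUE2BenchmarkID=ce01.pic.es_hep-spec06,GLUE2ResourceID=ce01.pic.es,GLUE2ServiceID=ce01.pic.es_ComputingElement,GLUE2GroupID=resource,GLUE2DomainID=pic,GLUE2GroupID=grid,o=glue
GLUE2BenchmarkExecutionEnvironmentForeignKey: ce01.pic.es
GLUE2BenchmarkType: hep-spec06
GLUE2BenchmarkValue: 13.4856
"""

entries = parse_ldif(SAMPLE)
benchmarks = hep_spec_by_ce(entries)
for ce, value in sorted(benchmarks.items()):
    print(ce, value)
```

A CE that appears in the BDII but is missing from the resulting dictionary is a candidate for the missing-benchmark follow-up mentioned above.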
Next year we are going to open tickets asking the sites to either publish or fix the necessary information.
Decommissioning EMI WMS
WMS servers can be decommissioned. Please follow the procedure PROC12. The plan is:
- Starting from January 2018, put the WMS servers in draining: this will block the submission of new jobs and will allow previously submitted jobs to finish
- inform your users in advance that you are going to put the WMS servers in draining and then decommission them (as per PROC12)
- there might be several VOs enabled on your WMS servers: if only a few of them need to use the service for a few more weeks, you might disable the other VOs
- On Dec 14th EGI Operations sent a new broadcast to the VOs reminding users of the forthcoming WMS decommissioning
- After the end of February, EGI Operations will open a ticket to the sites that haven't started the decommission process yet
WMS servers in downtime on GOC-DB
VOs have to find alternatives or migrate to DIRAC:
- HOWTO22 explains how a VO can request access to DIRAC4EGI and how to interact with it through the CLI
IPv6 readiness plans
- assessment ongoing https://wiki.egi.eu/w/index.php?title=IPV6_Assessment
- still missing NGIs/ROCs
- added column in FedCloud wiki to monitor IPv6 readiness of cloud sites https://wiki.egi.eu/wiki/Federated_Cloud_infrastructure_status#Status_of_the_Federated_Cloud
webdav probes in production
The webdav probes have been deployed in production. Several sites publish the webdav protocol in the BDII: they have been asked to register the endpoint on GOC-DB and to enable the monitoring, if it wasn't already done.
- webdav endpoints registered in GOC-DB: https://goc.egi.eu/gocdbpi/public/?method=get_service&&service_type=webdav
- link to nagios results: https://argo-mon.egi.eu/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_webdav&style=detail
List of sites that have not completed the configuration yet:
- NGI_AEGIS: AEGIS01-IPB-SCL https://ggus.eu/index.php?mode=ticket_info&ticket_id=131033 (in progress...)
- NGI_DE: UNI-SIEGEN-HEP https://ggus.eu/index.php?mode=ticket_info&ticket_id=131036
- NGI_HR: egee.irb.hr, egee.srce.hr https://ggus.eu/index.php?mode=ticket_info&ticket_id=131041 (in progress...)
List of sites that disabled webdav: UNIGE-DPNC, GR-01-AUTH, HG-03-AUTH, CETA-GRID, WUT
To register the webdav service endpoint on GOC-DB, follow HOWTO21 to fill in the proper information. In particular:
- register a new service endpoint, separated from the SRM one;
- on GOC-DB fill in the webdav URL containing also the VO ops folder, for example: https://darkstorm.cnaf.infn.it:8443/webdav/ops or https://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/ops/
- the base of this URL corresponds to the value of the GLUE2 attribute GLUE2EndpointURL (which contains the port in use but not the VO folder);
- verify that the webdav url (for example: https://darkstorm.cnaf.infn.it:8443/webdav ) is properly accessible.
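The relation between the published endpoint URL and the URL to register can be sketched in Python as follows. This is only an illustration of the rule in the list above: the helper names are invented, and it assumes that the VO folder is simply appended as the last path segment, as in the darkstorm.cnaf.infn.it example.

```python
from urllib.parse import urlsplit


def gocdb_webdav_url(glue2_endpoint_url, vo="ops"):
    """Append the VO folder to the URL published as GLUE2EndpointURL."""
    return glue2_endpoint_url.rstrip("/") + "/" + vo


def ends_with_vo_folder(url, vo="ops"):
    """Check that the last path segment of a registered URL is the VO folder."""
    segments = urlsplit(url).path.strip("/").split("/")
    return segments[-1] == vo


registered = gocdb_webdav_url("https://darkstorm.cnaf.infn.it:8443/webdav")
print(registered)
print(ends_with_vo_folder("https://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/ops/"))
```

The check is purely syntactic; whether the URL is actually reachable still has to be verified against the storage element itself.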
During the January OMB we are going to discuss the inclusion of the probes in the operators and in the critical profile:
- after the OMB, the webdav probes will be added to the ARGO_MON_OPERATORS profile: in this way the failures will generate an alarm on the dashboard, and the ROD teams can open a ticket;
- after (at least) one month, if no particular issue occurs, and if at least 75% of the webdav endpoints are passing the tests, the probes will be added to the ARGO_MON_CRITICAL profile, so the results of these probes will be taken into account for the A/R figures.
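The 75% criterion above can be expressed as a small check. The Python sketch below assumes a hypothetical mapping from endpoint name to the latest probe result (True when the tests pass); it does not query ARGO or Nagios, and the endpoint names are placeholders.

```python
def ready_for_critical(results, threshold=0.75):
    """Return True when at least `threshold` of the endpoints pass the probes."""
    if not results:
        return False  # nothing monitored, nothing to promote
    passing = sum(1 for ok in results.values() if ok)
    return passing / len(results) >= threshold


# Placeholder endpoints and results, for illustration only.
probe_results = {
    "webdav-1.example.org": True,
    "webdav-2.example.org": True,
    "webdav-3.example.org": True,
    "webdav-4.example.org": False,
}
print(ready_for_critical(probe_results))  # True: 3 out of 4 is exactly 75%
```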
Storage accounting deployment
During the September meeting, the OMB approved the full-scale deployment of storage accounting. The APEL team has tested it with a group of early adopter sites, and the results show that storage accounting is now production-ready.
Storage accounting is currently supported only for the DPM and dCache storage elements; therefore only the resource centres deploying these kinds of storage elements are requested to publish storage accounting data.
In order to properly install and configure the storage accounting scripts, please follow the instructions reported in the wiki: https://wiki.egi.eu/wiki/APEL/Storage
IMPORTANT: be sure to have installed the star-accounting.py script v1.0.4 (http://svnweb.cern.ch/world/wsvn/lcgdm/lcg-dm/trunk/scripts/StAR-accounting/star-accounting.py)
After setting up a daily cron job and running the accounting software, look for your data in the Accounting Portal: http://goc-accounting.grid-support.ac.uk/storagetest/storagesitesystems.html. If it does not appear within 24 hours, or there are other errors, please open a GGUS ticket to APEL who will help debug the process.
List of sites already publishing and of tickets opened is reported here.
PROBLEM: several (DPM) sites are using an old version of the star-accounting.py script. This leads to records having an EndTime 30 days in the future. The star-accounting.py script version to use is v1.0.4 (http://svnweb.cern.ch/world/wsvn/lcgdm/lcg-dm/trunk/scripts/StAR-accounting/star-accounting.py).
The APEL team opened tickets for this issue:
- AEGIS02-RCUB: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131892 (SOLVED)
- AEGIS03-ELEF-LEDA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131893 (SOLVED)
- AUVERGRID: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131894 (SOLVED)
- CAMK: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131895 (SOLVED)
- CETA-GRID: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131896 (SOLVED)
- GARR-01-DIR: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131897 (SOLVED)
- IN2P3-LPC: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131917 (SOLVED)
- RO-02-NIPNE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131918 (SOLVED)
- RO-07-NIPNE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131920 (SOLVED)
- TOKYO-LCG2: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131921 (SOLVED)
- TW-NTU-HEP: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131923 (SOLVED)
- UA-ISMA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131925
- UKI-NORTHGRID-SHEF-HEP: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131926 (SOLVED)
- TASK: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131928
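A site can spot the EndTime problem described above before publishing by comparing each record's EndTime with the current time. The Python sketch below works on a hypothetical list of dictionaries holding an ISO 8601 UTC timestamp; real StAR records are XML documents, and extracting the EndTime from them is left out here. The field names, the one-hour tolerance, and the sample values are assumptions.

```python
from datetime import datetime, timedelta, timezone


def parse_utc(timestamp):
    """Parse a timestamp such as 2017-12-01T00:00:00Z into an aware datetime."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def records_ending_in_future(records, now=None, tolerance=timedelta(hours=1)):
    """Return the records whose EndTime lies beyond `now` plus a small tolerance."""
    if now is None:
        now = datetime.now(timezone.utc)
    return [r for r in records if parse_utc(r["EndTime"]) > now + tolerance]


# Hypothetical records, for illustration only.
sample_records = [
    {"id": "record-1", "EndTime": "2017-11-30T23:00:00Z"},
    {"id": "record-2", "EndTime": "2017-12-31T00:00:00Z"},  # 30 days ahead
]
reference = datetime(2017, 12, 1, tzinfo=timezone.utc)
suspicious = records_ending_in_future(sample_records, now=reference)
print([r["id"] for r in suspicious])
```

Any record flagged this way suggests that an old version of the star-accounting.py script is still in use.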
PROBLEM number 2: the APEL repository is receiving an increasing number of storage records that have been encrypted with something that isn’t the APEL certificate, so the records can’t be read (and so the sender is unknown). If your site isn’t successfully publishing, please comment out the “server_cert” variable in sender.cfg
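Commenting out the variable can be done by hand; for sites that manage the file with scripts, a minimal Python sketch is given below. It assumes that sender.cfg is an INI-style file where options are written as `name: value` or `name = value` and where `#` starts a comment. The section name and the paths in the sample text are illustrative, not copied from a real configuration.

```python
import re


def comment_out_option(cfg_text, option="server_cert"):
    """Prefix every active line that sets `option` with '#'."""
    pattern = re.compile(
        rf"^([ \t]*)({re.escape(option)}[ \t]*[:=])", re.MULTILINE
    )
    return pattern.sub(r"\1#\2", cfg_text)


# Illustrative configuration text (paths are placeholders).
sample_cfg = (
    "[certificates]\n"
    "certificate: /path/to/hostcert.pem\n"
    "server_cert: /path/to/server_cert.pem\n"
)
updated_cfg = comment_out_option(sample_cfg)
print(updated_cfg)
```

Lines that are already commented out are left untouched, so the function can be applied repeatedly without stacking `#` characters.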
AOB
Next meeting
- Feb 12th, 2018 https://indico.egi.eu/indico/event/3564/