General information

UMD/CMD

  • UMD 3.14.2 RC ready
    • problem with dependencies generated within EPEL: Package voms-clients is obsoleted by voms-clients-cpp, trying to install voms-clients-cpp-2.0.13-1.el6.x86_64 instead
    • the likely solution is to set repository priorities so that UMD takes precedence over EPEL (thanks Mattias)
  • UMD 4 next release in preparation, release scheduled by June
    • first update for SL6
    • adding several products, see products in verification
  • CMD
    • RT setup: IT support to configure CMD together with UMD, discussion in progress
    • Verification process
      • starting with BDII info provider
      • external infrastructure needed to perform the tests
    • Staged-Rollout: TBD

Staged rollout updates

Preview repository

Released on 2016-05-17:

  • Preview 1.2.0
    • LCMAPS-plugins-vo-ca-ap 0.0.1-1
    • StoRM 1.11.11
  • Preview 2.1.0
    • NorduGrid ARC 15.03 update 6
    • LCMAPS-plugins-vo-ca-ap 0.0.1-1

Generic information about Preview repository: https://wiki.egi.eu/wiki/Preview_Repository

Note: EGI provides the preview repository without any additional quality assurance process; the products are released exactly as provided by the product teams. EGI recommends the use of the UMD repositories, which contain software verified through the UMD quality assurance process.

Operations

Central monitoring

  • this has been postponed due to technical issues in setting up the central instance

RFC proxy will be default

  • moving to RFC proxy instead of legacy proxy
  • RFC proxies have been in production for a while and everybody is already using them
  • we will ask the VOMS product team to make a small modification to the VOMS client so that RFC becomes the default

EGI Operations Support activities stopped

  • The Operations Support core activity was not re-bid in phase 2 of the EGI core activities
  • all Operations Support activities have been moved to the EGI.eu Operations
  • all the operational procedures involving Operations Support have been updated to point to EGI Operations. Please let us know if we missed any documents.

  • The "Operations Support" support unit in GGUS has been decommissioned. Please use the "Operations" support unit from now on.

Monthly Availability/Reliability

A/R report on ARGO: http://argo.egi.eu/lavoisier/ngi_reports?accept=html

List of the underperforming RCs for (at least) 3 consecutive months:

Decommissioning SL5

Status and actions

NGI Argus servers not properly configured

Some time ago (more than a year, I think), EGI ran a campaign to have NGIs run an "NGI Argus" service. This campaign resulted in new services being added to GOCDB for each NGI.

Unfortunately, as explained in the OMB in February, our monitoring is currently unable to check the deployment of these services:

  • For 6 services, our monitoring cannot contact the NGI Argus
  • For 18 services, our monitoring is not authorized to get the right information from the NGI Argus
  • For 1 service, our monitoring indicates that the NGI Argus is not properly configured and does not pull the rules from argus.cern.ch

In the end, only 5 services are properly configured and monitored!

The changes are rather easy:

  • If we can't contact them, the site needs to make sure that there is no firewall blocking 195.251.55.111 from accessing the argus 'pap' port
  • If we are not authorized, the site needs to add the right ACE to their argus authorization
pap-admin add-ace 'CN=srv-111.afroditi.hellasgrid.gr,OU=afroditi.hellasgrid.gr,O=HellasGrid, C=GR' 'POLICY_READ_LOCAL|POLICY_READ_REMOTE|CONFIGURATION_READ'
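
As a rough aid for the firewall case, the sketch below tests whether a TCP connection to the Argus PAP port succeeds. It is only an illustration: the function name `pap_port_open` is hypothetical, and the port number 8150 is an assumption (commonly the PAP default) that should be checked against the local Argus configuration. A successful test from an arbitrary machine does not prove that 195.251.55.111 is allowed through, since a firewall rule may filter on the source address.

```python
import socket

# Assumption: 8150 is the PAP port in use; verify against your Argus setup.
ASSUMED_PAP_PORT = 8150


def pap_port_open(host, port=ASSUMED_PAP_PORT, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, `pap_port_open("argus.example.org")` (a placeholder hostname) returns `False` when the port is filtered or the service is down.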

The current status of the infrastructure can be found:

  • In the secmon nagios (not sure you have access to this):

https://secmon.egi.eu/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_ngi.ARGUS&style=detail&sorttype=1&sortoption=3

  • On the security dashboard:

https://operations-portal.egi.eu/csiDashboard/ngi/any/tab/list/filter/monitoring/page/list?tsid=4

On the security dashboard, each NGI should have an "argus-ban" result:

  • "Ok" means ok
  • "Unknown" means that we can't contact them
  • "High" means that we are not authorized
  • "Critical" means that Argus is not pulling rules from argus.cern.ch
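
The result values can be tied back to the fixes listed earlier with a small lookup table. This is only a sketch for people scripting their own checks: the names `ARGUS_BAN_RESULTS` and `explain` are hypothetical, and the suggested action for "Critical" is an assumption, since the notes do not spell out a fix for that case.

```python
# Meaning and suggested action for each "argus-ban" result on the dashboard.
ARGUS_BAN_RESULTS = {
    "Ok": ("ok", "nothing to do"),
    "Unknown": (
        "monitoring cannot contact the NGI Argus",
        "make sure no firewall blocks 195.251.55.111 from the Argus 'pap' port",
    ),
    "High": (
        "monitoring is not authorized",
        "add the monitoring ACE with pap-admin add-ace",
    ),
    "Critical": (
        "Argus is not pulling rules from argus.cern.ch",
        # Assumption: the notes give no explicit fix for this case.
        "review the NGI Argus configuration",
    ),
}


def explain(result):
    """Return a one-line explanation for a dashboard result value."""
    meaning, action = ARGUS_BAN_RESULTS.get(
        result, ("unrecognised result", "check the security dashboard")
    )
    return f"{result}: {meaning}; suggested action: {action}"
```

Calling `explain("High")` yields the authorization case together with a pointer to the `pap-admin add-ace` command shown above.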

The parent ticket is https://ggus.eu/?mode=ticket_info&ticket_id=120770

2016_06_13 UPDATE pending tickets:

FedCloud status

  • only GoeGrid (NGI_DE) is not publishing images
  • open tickets to sites where dteam is not working: MK-04-FINKICLOUD -> this can lead to suspension as per OLA!
  • cloud profiles are still under approval at the OMB; an email will be circulated by EGI Operations for approval. If the profiles are approved, the new profile will be used for A/R from July 1st, and suspensions will start from August 1st

A/R Profile | March | April | May
improvements | 2 | 6 | 5
unchanged | 11 | 7 | 5
worsening | 9 | 10 | 12

  • CYFRONET-CLOUD (+100%): in the old profile it fails the accounting test
  • GoeGRID (+80.7%): in the old profile it fails the cdmi test
  • TR-FC1-ULAKBIM (+47.59%): it was failing the accounting test in the old profile
  • HG-09-Okeanos-Cloud: https://ggus.eu/index.php?mode=ticket_info&ticket_id=122012 (SOLVED, updated the cert)
    • failures with the probes:
    • eu.egi.cloud.OCCI-Context-ops: CATEGORIES CRITICAL - SSL_connect returned=1 errno=0 state=error: certificate verify failed
    • eu.egi.cloud.OCCI-VM-ops: CRITICAL - SSL connection with "https://okeanos-occi2.hellasgrid.gr:9000/" could not be established! SSL_connect
  • MK-04-FINKICLOUD unreachable
  • NCG-INGRID-PT (+26.74%): https://ggus.eu/index.php?mode=ticket_info&ticket_id=122013 (a new server is going to be put into production and the old one decommissioned)
    • failures mainly with the cloud probes:
    • eu.egi.cloud.OCCI-VM-ops (sometimes warning, sometimes critical): WARNING - "http://aurora.ncg.ingrid.pt:8787" failed to instantiate a COMPUTE instance in the given timeframe! Timeout: 300s
    • eu.egi.cloud.OpenStack-VM-ops: Critical: could not fetch flavor ID, endpoint does not correctly exposes available flavors: 110 Connection timed out
  • SCAI (-21.61%) https://ggus.eu/index.php?mode=ticket_info&ticket_id=122015 (CAs not completely updated)
    • some repeated failures with the CA probes
    • also eu.egi.cloud.OCCI-VM-ops CRITICAL - Unexpected response from https://fc.scai.fraunhofer.de:8787/! Net::HTTP::Post failed! HTTP Response status: [500] Internal Server Error : The server has either erred or is incapable of performing the requested operation.
  • UPV-GRyCAP (-24.56%): https://ggus.eu/index.php?mode=ticket_info&ticket_id=122014 (SOLVED, CAs updated)
    • it is still failing the eu.egi.OCCI-IGTF probe
    • org.nagios.OCCI-TCP: 05-11-2016 17:56:27 Connection refused

AOB

Next meeting

