Introduction

To improve the effectiveness of our SMS processes, we reviewed the definition of CIs. Most of the services we offer are not under our direct control, because they are operated by suppliers according to several OLAs/UAs. It was therefore important for us to understand what we can control, what we have to control, and what we have to be made aware of without having to control it. The outcome was the definition of new CI types that better reflect our situation, and a redesign of the internal CMDB.

About the usage of the word "service"

For historical reasons the word "service" is used in different contexts with meanings very far from the definition given by FitSM:

  • A way to provide value to a User/Customer through bringing about results that they want to achieve.

So technically a service is an element of the Service Portfolio, even though in "normal IT life" it usually refers to:

  • a software stack installed and running on a machine, providing well defined capabilities to the users, accessible via a (service) endpoint;
  • a particular system / software daemon running on a machine;
  • infrastructure components that make it possible to use and monitor the HTC, cloud, and storage resources: information system, workload management system, authorisation and authentication system, file catalog, monitoring framework, accounting system, etc.;
  • the EGI Operational Tools

Every time the word "service" appears in a page of an operational tool or in some old procedure or policy (written before the adoption of the ITSM terminology), it doesn't refer to an element of the Service Portfolio, but rather to one of the cases listed above (often to a particular CI). In general, the context in which the word is used prevents any confusion.

Changing this usage is not a trivial task: we are focusing on using the terminology correctly in the new procedures, policies and operational tools.

CMDB structure

The Configuration Management Database comprises several components, each used for a different purpose:

  • GOC-DB: this is the "federated" part of our CMDB, which serves as the source of topology information for our infrastructure: services like monitoring, accounting, security, helpdesk, collaboration tools, etc., consume the information stored in GOCDB to perform their daily operations. GOCDB stores the information regarding the Operations Centres, the Resource Centres and the services they provide, and the people involved in their activities. In general, the OCs and the RCs are responsible for managing the services/service components registered in GOC-DB, following the guidelines given by EGI Operations and the EGI infrastructure policies and procedures. For this reason only a small portion of the services registered in GOC-DB is under the direct control of EGI Foundation (and in the scope of the CHM process): these are the services that are partly funded by EGI Foundation. We could call them "Supporting services" because they support the operations activities and other services of the whole infrastructure. They are registered in GOCDB under the EGI.eu NGI. GOC-DB was designed as a source of information in a federated environment, not to track software-related information such as the version: when necessary, the Information System can be queried to collect the version of the services provided by the RCs (the Information System doesn't store historic information). In general, we don't need to create baselines for the information stored in GOCDB when a change occurs;
  • Internal CMDB: it was created to collect the services, and the related information, that are in the scope of the CHM process. For this purpose, the several aspects of the services were analysed to define their level of control, which is reported as well. The internal CMDB mainly stores information under the direct control of EGI Foundation (either because it is related to other SMS processes or for CHM purposes). Each entry of the Internal CMDB is a CI, with most of the attributes stored in this same Confluence space. As the baseline of these CIs we consider the Jira tickets related to the Requests for Change, which are created any time a new version of the software is ready to be deployed. For each service there are also a number of "federated" CIs/attributes that are not in the scope of the CHM process and for which we don't need to create a baseline before the implementation of a change;
  • Information System (BDII): most of the technology products running on the service endpoints registered in GOC-DB and belonging to the RCs publish information in the BDII, including the version of the installed software/middleware. When needed, we query the BDII for this piece of information. The Information System doesn't store historic information;
  • Operations Portal: it is used to manage the information on the Virtual Organisations (VOs) supported in our infrastructure.

Level of control

For each service under CHM/RDM scope, the Internal CMDB lists the level of control for the several CIs (types) and attributes, according to the new definition:

  • Service components and CIs under direct / full control of EGI Foundation for which:
    • Up-to-date configuration information with enough detail is required and maintained by EGI F.
    • Change approval, coordination of change implementation and post-implementation review carried out by EGI F.
    • Release and deployment activities carried out by EGI F.
  • CIs controlled by suppliers, whose changes need EGI F. approval, for which:
    • Up-to-date configuration information with enough detail is required and maintained by suppliers
    • Change approval and post-implementation review carried out by EGI F.
    • Release and deployment activities carried out by suppliers
  • Service components and CIs with transparency for EGI F. (controlled by supplier / federation member) for which:
    • No detailed configuration information is required, only some essential information (e.g. “who owns and controls it?”) might be needed
    • Change approval, implementation and review as well as release and deployment activities performed by federation member or supplier, with EGI F. kept informed
    • Detailed requirements on the required level of transparency and EGI F.’s options for intervention or approval in certain situations or for specific types of changes to be defined in OLAs / UAs

For us transparency means either that the information is stored and maintained on GOCDB or that we are notified by email about any change.
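
The three levels of control can be summarised as a small lookup structure. The following is only an illustrative sketch: the names `ControlLevel`, `Responsibilities` and `needs_egi_approval` are hypothetical and do not correspond to any existing tool, and the sketch ignores the specific intervention options that may be defined in OLAs / UAs.

```python
from dataclasses import dataclass
from enum import Enum


class ControlLevel(Enum):
    """The three levels of control listed in this section (names are hypothetical)."""
    FULL = "direct / full control of EGI Foundation"
    SUPPLIER_WITH_APPROVAL = "controlled by supplier, changes approved by EGI Foundation"
    TRANSPARENCY = "controlled by supplier / federation member, EGI Foundation kept informed"


@dataclass(frozen=True)
class Responsibilities:
    config_info_maintained_by: str  # who keeps the configuration information
    change_approval_by: str         # who approves changes and reviews them afterwards
    release_deployment_by: str      # who carries out release and deployment


EGI = "EGI Foundation"
SUPPLIER = "supplier"
MEMBER = "supplier / federation member"

RESPONSIBILITIES = {
    ControlLevel.FULL: Responsibilities(EGI, EGI, EGI),
    ControlLevel.SUPPLIER_WITH_APPROVAL: Responsibilities(SUPPLIER, EGI, SUPPLIER),
    # With transparency only essential information is needed and
    # EGI Foundation is kept informed (via GOCDB or by email).
    ControlLevel.TRANSPARENCY: Responsibilities(MEMBER, MEMBER, MEMBER),
}


def needs_egi_approval(level: ControlLevel) -> bool:
    """True when a change to a CI at this level is approved by EGI Foundation."""
    return RESPONSIBILITIES[level].change_approval_by == EGI
```

With this mapping, `needs_egi_approval(ControlLevel.TRANSPARENCY)` is false, while it is true for the other two levels.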

CI Types

This table summarises the CI types defined in our CMDB and their location. For details, see the next sections.

CI type | Location
------- | --------
Services / Service Components in the Internal CMDB | Internal CMDB
Projects | GOCDB
Operations Centres | GOCDB
Resource Centres | GOCDB
Service Endpoints | GOCDB
Service Types | GOCDB
Service Groups | GOCDB
Downtime | GOCDB
People | GOCDB
Roles | GOCDB
Scope tags | GOCDB
Accounting metrics | Internal CMDB
Virtual Organisations (VOs) | Operations Portal

Internal CMDB structure

Each entry of the Internal CMDB is a CI and corresponds to one or more elements of the Service Portfolio. Inside each section, a first table collects the attributes under our direct control. The level of control of the several attributes is then reported, in particular:

  • if there is a specific "infrastructure" procedure controlling them
  • where they are stored
  • the type of change, which determines whether the CAB needs to be invoked
  • the approving body: many CIs are managed on a federated/infrastructure level (i.e. OMB, EGI Operations, etc.) as described in the related procedure(s), so that the "classic" EGI CAB is expected to be invoked only in a few cases

Since the federated attributes/CIs are not in CHM scope, there is no need to track their changes in the CHM process, nor to create baselines for them.

Attributes in the internal CMDB

This table collects the attributes under our direct control, which are stored in the EGI SMS Confluence space.

Attribute | Note
--------- | ----
Service Portfolio entry | An interface with the SPM process was created to ensure that changes in the life cycle of a service are properly propagated to the CONFM process
Federated information entry | The entry in GOCDB for the specific service/service component
Suppliers | The SUPPDB entry, which also links to the agreement in place for the delivery of the specific service/service component
AvCo plan | The Availability and Continuity plan created within the SACM process
Capacity plan | The Capacity Plan created within the CAPM process
Request for Change | The Jira tickets that track the RfC
Host certificates | The host certificates are released by EGI Foundation for the funded services upon request
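
As an illustration, an Internal CMDB entry with the attributes in the table above could be modelled as follows. This is a minimal sketch, not the actual implementation (the real entries live in Confluence pages): the class name, the field names and the ticket keys are hypothetical, and treating the most recent RfC ticket as the current baseline is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InternalCmdbEntry:
    """One CI of the Internal CMDB (hypothetical model of the attribute table)."""
    name: str
    service_portfolio_entry: str
    federated_information_entry: Optional[str] = None   # entry in GOCDB
    suppliers: List[str] = field(default_factory=list)  # SUPPDB entries
    avco_plan: Optional[str] = None                      # from the SACM process
    capacity_plan: Optional[str] = None                  # from the CAPM process
    host_certificates: List[str] = field(default_factory=list)
    rfc_tickets: List[str] = field(default_factory=list)  # Jira keys, oldest first

    def record_rfc(self, ticket_key: str) -> None:
        """Register the RfC ticket opened for a new software version."""
        self.rfc_tickets.append(ticket_key)

    def current_baseline(self) -> Optional[str]:
        """Assumption: the latest RfC ticket acts as the current baseline."""
        return self.rfc_tickets[-1] if self.rfc_tickets else None


# Hypothetical usage: two changes tracked for one service component.
entry = InternalCmdbEntry(name="Example service", service_portfolio_entry="SP-EXAMPLE")
entry.record_rfc("CHM-1")
entry.record_rfc("CHM-2")
```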

GOCDB structure

GOC-DB stores topology information that is consumed by several operational tools and is related to the resources belonging to the EGI Infrastructure. The information categories are:

  • Projects (currently, only the EGI one) that contain:
  • Operations Centres (contacts, people, resource centres, scope tags and project memberships) that contain:
  • Resource Centres (contacts, geographic location, parent NGI, certification status in the infrastructure, scope tags, people, services and downtimes) that provide:
  • Service endpoints (generic information, service type, status, endpoints, parent site, scope tags and downtimes): each represents the machine running a particular technology that provides certain capabilities, usually accessible via an endpoint through a particular protocol;
    • service types: a technology used to provide a capability. Each service endpoint in GOCDB is associated with a service type;
  • Service Groups: arbitrary groupings of existing service endpoints that can be distributed across different physical sites;
  • Downtimes (severity, classification; starting, ending, declaration and announcement dates; description and affected services): can be declared for one or more services of a resource centre;
  • People (generic contacts and roles): can have roles in either OCs or RCs;
  • Roles: allow people to perform any useful tasks;
  • Scopes: used to group entities such as Sites, Services and ServiceGroups into flexible categories.

A description of the information that can be stored into GOC-DB can be found in https://docs.egi.eu/internal/configuration-database/

In the BDII, the information about the service endpoints is published, grouped by Resource Centre.

Relationship types

The following relationships exist between the information categories in GOCDB:

  • Parent/children:
    • Projects/Operations Centres/Resource Centres/Service Endpoints
    • Projects/Service Groups
  • extra-site memberships:
    • a Service Endpoint can also belong to one or more Service Groups
    • Resource Centres, Service Endpoints, and Service Groups can be grouped into other categories by means of one or more scope tags
  • flavour association:
    • a Service Endpoint is associated with at least one Service Type, depending on the different technologies provided by it.
  • downtimes:
    • they apply to one or more Service Endpoints of a Resource Centre.
  • roles:
    • a registered user needs at least one role in order to perform any useful tasks
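
The parent/children chain and the memberships listed above can be sketched as a simple data model. This is only an illustration of the relationships, not the GOCDB schema: all class names, field names and the example hostnames are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ServiceEndpoint:
    hostname: str
    service_types: List[str]                       # flavour association
    scope_tags: Set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        if not self.service_types:
            raise ValueError("a Service Endpoint needs at least one Service Type")


@dataclass
class ResourceCentre:
    name: str
    endpoints: List[ServiceEndpoint] = field(default_factory=list)
    scope_tags: Set[str] = field(default_factory=set)


@dataclass
class OperationsCentre:
    name: str
    resource_centres: List[ResourceCentre] = field(default_factory=list)


@dataclass
class ServiceGroup:
    name: str
    # Extra-site membership: endpoints may belong to different RCs.
    endpoints: List[ServiceEndpoint] = field(default_factory=list)
    scope_tags: Set[str] = field(default_factory=set)


@dataclass
class Project:
    name: str
    operations_centres: List[OperationsCentre] = field(default_factory=list)
    service_groups: List[ServiceGroup] = field(default_factory=list)

    def endpoints_with_scope(self, tag: str) -> List[ServiceEndpoint]:
        """Walk Project -> OC -> RC -> Service Endpoint, keeping endpoints tagged `tag`."""
        return [
            ep
            for oc in self.operations_centres
            for rc in oc.resource_centres
            for ep in rc.endpoints
            if tag in ep.scope_tags
        ]


# Hypothetical topology with one OC, one RC and two endpoints.
ep1 = ServiceEndpoint("se01.example.org", ["SRM"], {"EGI"})
ep2 = ServiceEndpoint("ce01.example.org", ["CE"], {"Local"})
rc = ResourceCentre("EXAMPLE-RC", [ep1, ep2])
project = Project("EGI", [OperationsCentre("NGI_EXAMPLE", [rc])],
                  [ServiceGroup("EXAMPLE_SG", [ep1])])
```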

The bond among the service components, the Service Type, the UMD package, and the documentation is quite straightforward.

Information consumed by operational tools

This information is needed by the following operational tools to perform particular operations:

  • ARGO/SAM (monitoring)
    • the hosts detected are the ones belonging to RCs in "Certified" status, in the "Production" infrastructure and with the "EGI" scope tag, and the ones with the monitoring flag "on"
    • hostname, service type and endpoint: in order to successfully execute the right probe
    • downtimes: if the service is in downtime, the notifications about the monitoring status changes are disabled
    • people and roles: for regulating the access
    • Operations and Resource Centres: for grouping the results of Availability and Reliability computation
  • Operations Portal - ROD Dashboard
    • Operations Centres, Resource Centres, Services: there is a view for each OC displaying the RCs with failures and/or tickets already opened
    • downtimes: avoid opening tickets for services in downtime
  • Accounting portal
    • Operations and Resource Centres: for grouping the accounting data
  • GGUS
    • Operations and Resource Centres: for ticket assignment and notification
    • Downtimes: when a ticket is assigned to an RC, it is checked whether the RC has announced any downtime
  • VAPOR:
    • displays the information published in the BDII in a more structured way, using the topology information from GOC-DB
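
To make the monitoring case more concrete, the selection of hosts and the effect of downtimes can be sketched as a filter over topology records. This is not how ARGO/SAM is implemented: the record type and function names are hypothetical, and the sketch assumes that all four conditions (certified RC, production infrastructure, "EGI" scope tag, monitoring flag on) must hold together, which is one possible reading of the list above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class EndpointRecord:
    """Hypothetical flattened view of a service endpoint and its parent RC."""
    hostname: str
    service_type: str
    rc_certification_status: str
    rc_infrastructure: str
    scope_tags: FrozenSet[str]
    monitored: bool
    in_downtime: bool = False


def select_for_monitoring(records: Iterable[EndpointRecord]) -> List[EndpointRecord]:
    """Keep the endpoints that satisfy all the selection criteria (assumed AND)."""
    return [
        r for r in records
        if r.rc_certification_status == "Certified"
        and r.rc_infrastructure == "Production"
        and "EGI" in r.scope_tags
        and r.monitored
    ]


def should_notify(record: EndpointRecord) -> bool:
    """Notifications about status changes are disabled while in downtime."""
    return not record.in_downtime


# Hypothetical records: only "a" is monitored with notifications enabled.
EGI_TAG = frozenset({"EGI"})
records = [
    EndpointRecord("a.example.org", "SRM", "Certified", "Production", EGI_TAG, True),
    EndpointRecord("b.example.org", "SRM", "Suspended", "Production", EGI_TAG, True),
    EndpointRecord("c.example.org", "SRM", "Certified", "Production", EGI_TAG, False),
    EndpointRecord("d.example.org", "SRM", "Certified", "Production", EGI_TAG, True,
                   in_downtime=True),
]
selected = select_for_monitoring(records)
```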

Configuration items by type

The list of CIs for each type can be obtained either by browsing the GOCDB portal or by using its programmatic interface. The following table gives examples of the available API calls.

API method | What it returns
---------- | ---------------
Service types | Returns a list of valid service types and associated description
Operations Centres | Returns the list of OCs in the EGI scope
Resource Centres | Returns the list of production and certified RCs, in the EGI scope
Service Groups | Returns the list of service groups with the service endpoints under those groups.
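
As a rough sketch of how such calls could be scripted, the snippet below builds query URLs and parses an XML answer. Everything specific in it is an assumption that should be checked against the GOCDB documentation: the base URL, the method names (`get_service_types`, `get_site_list`), the query parameters and the XML element names. The sample document is invented for the example and is not real GOCDB output.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumption: base URL of the public GOCDB programmatic interface.
GOCDB_PI_BASE = "https://goc.egi.eu/gocdbpi/public/"


def build_query(method: str, **params: str) -> str:
    """Return the URL for a programmatic interface call (method name is assumed)."""
    query = {"method": method, **params}
    return GOCDB_PI_BASE + "?" + urlencode(query)


def parse_service_types(xml_text: str) -> dict:
    """Map service type name -> description, assuming the element names below."""
    root = ET.fromstring(xml_text)
    return {
        st.findtext("SERVICE_TYPE_NAME", default=""):
            st.findtext("SERVICE_TYPE_DESC", default="")
        for st in root.iter("SERVICE_TYPE")
    }


# Invented sample answer, used only to exercise the parser offline.
SAMPLE = """<results>
  <SERVICE_TYPE>
    <SERVICE_TYPE_NAME>example.type</SERVICE_TYPE_NAME>
    <SERVICE_TYPE_DESC>Hypothetical service type</SERVICE_TYPE_DESC>
  </SERVICE_TYPE>
</results>"""

service_types = parse_service_types(SAMPLE)
# Production and certified RCs in the EGI scope (parameter names are assumptions).
rc_url = build_query("get_site_list", certification_status="Certified",
                     production_status="Production", scope="EGI")
```

The URL can then be fetched with any HTTP client; the network call is left out here so that the sketch stays runnable offline.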



Software version run by "service endpoints"

The information about the service endpoints is published in the BDII, grouped by Resource Centre.

When needed, we query the Information System (BDII) to get the software version of the service endpoints that publish this piece of information.

You can either make an LDAP query to the BDII or browse the web tool VAPOR (Resource explorer section).

Some LDAP query examples:

To find the technology in general:

$ ldapsearch -x -LLL -H ldap://egee-bdii.cnaf.infn.it:2170 -b "GLUE2GroupID=grid,o=glue" '(objectClass=GLUE2Endpoint)' GLUE2EndpointImplementationName | grep GLUE2EndpointImplementationName | sort | uniq -c

To find the version of a product in general:

$ ldapsearch -x -LLL -H ldap://egee-bdii.cnaf.infn.it:2170 -b "GLUE2GroupID=grid,o=glue" '(&(objectClass=GLUE2Endpoint)(GLUE2EndpointImplementationName=CREAM))' GLUE2EndpointImplementationVersion GLUE2EntityOtherInfo

To find a version of a particular endpoint:

$ ldapsearch -x -LLL -H ldap://egee-bdii.cnaf.infn.it:2170 -b "GLUE2GroupID=grid,o=glue" '(&(objectClass=GLUE2Endpoint)(GLUE2EndpointImplementationName=StoRM)(GLUE2EndpointURL=httpg://stormfe1.pi.infn.it:8444/srm/managerv2))'

To find the storage element implementation:

$ ldapsearch -x -LLL -H ldap://egee-bdii.cnaf.infn.it:2170 -b "Mds-Vo-name=local,o=grid" 'objectClass=GlueSE' GlueSEName GlueSEImplementationName GlueSEUniqueID GlueSEImplementationVersion
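
When the result of the first query has to be processed further, the counting done by `grep | sort | uniq -c` can also be done on the LDIF text produced by `ldapsearch`. The following is a minimal sketch: the function name and the sample LDIF (including its DNs) are invented for the example, and folded or base64-encoded LDIF lines are deliberately not handled.

```python
from collections import Counter

ATTRIBUTE = "GLUE2EndpointImplementationName"


def count_implementations(ldif_text: str) -> Counter:
    """Count how many endpoints publish each implementation name."""
    counts: Counter = Counter()
    prefix = ATTRIBUTE + ":"
    for line in ldif_text.splitlines():
        # Only plain "attribute: value" lines are considered.
        if line.startswith(prefix):
            counts[line[len(prefix):].strip()] += 1
    return counts


# Invented LDIF fragment, shaped like ldapsearch -LLL output.
SAMPLE_LDIF = """dn: GLUE2EndpointID=ep1,GLUE2GroupID=grid,o=glue
GLUE2EndpointImplementationName: StoRM

dn: GLUE2EndpointID=ep2,GLUE2GroupID=grid,o=glue
GLUE2EndpointImplementationName: CREAM

dn: GLUE2EndpointID=ep3,GLUE2GroupID=grid,o=glue
GLUE2EndpointImplementationName: StoRM
"""

implementation_counts = count_implementations(SAMPLE_LDIF)
```

In practice the LDIF text would come from running the `ldapsearch` command (for example via `subprocess.run`) rather than from an embedded sample.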