Monitoring is the key service needed to gain insights into an infrastructure. It needs to be continuous and on-demand to quickly detect, correlate, and analyze data for a fast reaction to anomalous behavior. The challenge of this type of monitoring is how to quickly identify and correlate problems before they affect end-users and ultimately the productivity of the organization.
The ARGO Monitoring Service (https://argo.egi.eu/egi/documentation) provides a flexible and scalable framework for monitoring the status, availability and reliability of a wide range of services provided by infrastructures with medium to high complexity. ARGO generates reports using customer-defined profiles (e.g. for SLA management, operations, etc.). During report generation, ARGO takes into account custom factors such as the importance of a specific service endpoint and scheduled or unscheduled downtimes. The foundations of the ARGO Monitoring Service are:
Management teams can monitor the availability and reliability of the services from a high-level view down to individual system metrics, and monitor the conformance of multiple SLAs. The dashboard design enables easy access to and visualization of data for end-users. APIs are also supported to allow third parties to gather monitoring data from the system.
The key features of ARGO Monitoring Service are:
The ARGO Monitoring Service collects status and performance (metric) results from one or more monitoring engines and delivers daily and/or monthly availability (A) and reliability (R) results for distributed services. Both status results and A/R metrics are presented through a Web UI, with the ability for a user to drill down from the availability of a site to the individual test results that contributed to the computed figure.
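The A/R figures mentioned above can be illustrated with a small sketch. It assumes the commonly cited definitions, which this page does not spell out: availability is the fraction of known time during which a service was up, and reliability additionally excludes scheduled downtime. The function name, the hour-based inputs and the sample figures are hypothetical and are not taken from the ARGO code base.

```python
def availability_reliability(total_hours, up_hours,
                             unknown_hours=0.0,
                             scheduled_downtime_hours=0.0):
    """Return (availability, reliability) as fractions between 0 and 1.

    Assumed definitions (not stated in the text above):
      availability = up / (total - unknown)
      reliability  = up / (total - unknown - scheduled downtime)
    """
    known_hours = total_hours - unknown_hours
    if known_hours <= 0:
        raise ValueError("no known monitoring time in the period")

    availability = up_hours / known_hours

    # Scheduled downtime is not counted against reliability.
    accountable_hours = known_hours - scheduled_downtime_hours
    reliability = up_hours / accountable_hours if accountable_hours > 0 else 0.0
    return availability, reliability


# Hypothetical day: 18 h up, 4 h scheduled downtime, 2 h unscheduled outage.
a, r = availability_reliability(total_hours=24, up_hours=18,
                                scheduled_downtime_hours=4)
print(f"availability={a:.2%} reliability={r:.2%}")
```

Under these assumptions the scheduled maintenance window lowers availability but not reliability, which is why the two figures are reported separately.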
Monitoring Engine: This service executes the service checks against the infrastructure and delivers the metric data (probe check results) to the Messaging Service.
POEM: This service is used to define checks (probes) and associate them with service types. Each grouping of checks and service types forms a POEM profile.
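The relationship between checks, service types and profiles can be pictured as a simple mapping. The following is only a sketch of that idea: the class, its methods and all names such as `example.webserver` are hypothetical and do not reflect the actual POEM data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PoemProfile:
    """Hypothetical model: a named grouping of service types and their checks."""
    name: str
    checks_by_service_type: Dict[str, Set[str]] = field(default_factory=dict)

    def associate(self, service_type: str, check: str) -> None:
        # Attach a check (probe) to a service type within this profile.
        self.checks_by_service_type.setdefault(service_type, set()).add(check)

    def checks_for(self, service_type: str) -> List[str]:
        # Checks to run against endpoints of the given service type.
        return sorted(self.checks_by_service_type.get(service_type, set()))


profile = PoemProfile("EXAMPLE_PROFILE")
profile.associate("example.webserver", "example.http-check")
profile.associate("example.webserver", "example.cert-validity")
profile.associate("example.storage", "example.storage-ping")

print(profile.checks_for("example.webserver"))
```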
ARGO Analytics & Compute Engine: The ARGO Analytics & Compute Engine includes computational job definitions for ingesting data and calculating status and availability/reliability, and a management service to automatically configure, deploy and execute those jobs on an Apache Flink cluster and forward the results to the appropriate destinations (HDFS, ARGO Web API, Notifications).
ARGO WEB API: REST-like HTTP API service that provides access to status and availability/reliability results. Supports token-based authentication and authorization with established roles. Results are provided in JSON format.
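A client for a token-authenticated JSON API of this kind could be sketched as follows. The base URL, the path, the query parameters and the `x-api-key` header name are all placeholders chosen for illustration; the real endpoint layout and authentication header should be taken from the ARGO Web API documentation. The sketch only builds the request and parses a response body; it does not contact any server.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_results_request(base_url, path, token, params):
    """Build (but do not send) a GET request carrying an access token.

    The header name "x-api-key" is an assumption made for this sketch.
    """
    url = f"{base_url.rstrip('/')}/{path.lstrip('/')}?{urlencode(params)}"
    return Request(url, headers={"Accept": "application/json",
                                 "x-api-key": token})


def parse_results(body: bytes):
    """Decode a JSON response body into Python objects."""
    return json.loads(body.decode("utf-8"))


# Hypothetical host, path and report name.
req = build_results_request(
    "https://api.example.org",
    "/results/EXAMPLE_REPORT",
    "PLACEHOLDER_TOKEN",
    {"start_time": "2024-03-01T00:00:00Z", "end_time": "2024-03-31T23:59:59Z"},
)
print(req.full_url)
```

To actually send the request, `urllib.request.urlopen(req)` can be used and its response body passed to `parse_results`.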
ARGO Notifications: If there is a problem with a service, an alert notification should be sent. Alerting in the ARGO Monitoring Service is built on the real-time layer: real-time status events are the basis of alerts. Events are generated in the Analytics engine during computations, based on a set of rules. The alerts are customizable and contain detailed information about the various levels of groups (service endpoint, group of sites, site).
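The idea of deriving alerts from status events can be sketched in a few lines. This is an illustration under assumptions, not the Analytics engine's logic: it treats every status change of an endpoint as an event and applies a single hypothetical rule (alert when the new status is WARNING or CRITICAL). The status names, the class and the function names are all assumed for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class StatusEvent:
    group: str               # e.g. a site
    endpoint: str            # service endpoint within the group
    previous: Optional[str]  # None for the first observation
    current: str


# Hypothetical rule: only these statuses trigger an alert.
ALERT_ON = {"WARNING", "CRITICAL"}


def detect_events(samples: Iterable[Tuple[str, str, str]]) -> List[StatusEvent]:
    """Turn time-ordered (group, endpoint, status) samples into change events."""
    last_status = {}
    events = []
    for group, endpoint, status in samples:
        key = (group, endpoint)
        previous = last_status.get(key)
        if previous != status:  # emit an event only when the status changes
            events.append(StatusEvent(group, endpoint, previous, status))
        last_status[key] = status
    return events


def alerts(events: Iterable[StatusEvent]) -> List[str]:
    """Format an alert message for every event matching the rule."""
    return [f"[{e.current}] {e.endpoint} in {e.group} (was {e.previous})"
            for e in events if e.current in ALERT_ON]


samples = [
    ("SITE_A", "host1", "OK"),
    ("SITE_A", "host1", "OK"),
    ("SITE_A", "host1", "CRITICAL"),
    ("SITE_A", "host1", "CRITICAL"),
    ("SITE_A", "host1", "OK"),
]
events = detect_events(samples)
print(alerts(events))
```

Emitting events only on status changes keeps repeated identical results from producing duplicate alerts.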
WEB UI - Lavoisier: The Web UI is based on a data aggregation framework called Lavoisier. Lavoisier is the component used to store, consolidate and “feed” data into the web application. The global information from the primary, heterogeneous data sources is retrieved through the different plug-ins. The collected information is structured and organized within configuration files in Lavoisier and, finally, made available to the web application without the need for any further computation.
Follow the steps: