Please provide your feedback on this Science Demonstrator using the questionnaire at https://survey2.icos-cp.eu/ENVRIplus-evaluator!
The EuroArgo Data Subscription Service (DSS) allows researchers to subscribe to customized views on Argo data, selecting specific regions and time-spans, and choosing the frequency of updates. Tailored updates are then provided on schedule to researchers’ private storage.
As shown in Figure 81, the Data Subscription Service (DSS) involves the following basic components: 1) a data selection portal as frontend, 2) the Global Data Assembly Center (GDAC) of EuroArgo, 3) EUDAT B2SAFE storage, 4) DRIP, 5) EGI FedCloud resources, and 6) a subscription service component for managing the subscriptions registered via the data selection portal.
DRIP (the Dynamic Real-time Infrastructure Planner, developed by WP7, Task T7.2) is integrated to execute the data selection process using parallel computation. DRIP can dynamically deploy and manage as many virtual machines (VMs) as the load requires, so that subscriptions are processed in a timely manner. Once results are available, they are pushed to B2SAFE and the user is notified by email.
The typical workflow is as follows: users interact with the DSS via the portal, registering to receive updates for specific areas and time ranges for selected parameters such as temperature, salinity, and oxygen levels. The GDAC receives new datasets from regional centres and pushes them to the B2SAFE data service. The DSS maintains records of subscriptions including selected parameters and associated actions. DRIP plans, provisions, deploys, scales and controls the data filtering application. EGI FedCloud provides cloud resources to host the application. The application itself is composed of a master node and a set of worker nodes.
When new data become available, the GDAC pushes them to the B2SAFE service, triggering a notification to the DSS, which then initiates actions on the new data. If the application is not yet deployed to FedCloud, DRIP provisions the necessary VMs and network so that it can be deployed. Next, the deployment agent installs the application together with all necessary dependencies, including the configuration needed to access the Argo data. The DSS signals to the application master node that input parameters are available for processing, whereupon the master partitions the input tasks into sub-tasks and distributes them to the workers. If the input parameters include deadlines, the master prioritises the tasks accordingly. The monitoring process keeps track of each running task and passes that information to the DRIP controller. If the programmed threshold is exceeded, the controller requests more resources from the provisioner. Finally, the results of each task are pushed back to the B2SAFE service, triggering a notification to the subscription service, which in turn notifies the user.
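The master node's part in this workflow (partitioning, deadline-based prioritisation, and the threshold check that leads to a request for more resources) can be illustrated with a minimal Python sketch. This is not the DSS or DRIP implementation. All function names are hypothetical, and the threshold is assumed here to be a maximum number of pending sub-tasks per worker, which the description above does not specify.

```python
import heapq
from datetime import datetime
from itertools import count
from typing import Iterable, Iterator, List, Optional, Tuple


def partition(items: List[str], chunk_size: int) -> List[List[str]]:
    """Split the inputs of one task into sub-tasks of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def order_by_deadline(
    tasks: Iterable[Tuple[str, Optional[datetime]]]
) -> Iterator[str]:
    """Yield task names with the earliest deadline first.

    Tasks without a deadline come last, in submission order.
    """
    heap = []
    seq = count()  # tie-breaker that preserves submission order
    for name, deadline in tasks:
        key = (deadline is None, deadline or datetime.max, next(seq))
        heapq.heappush(heap, (key, name))
    while heap:
        yield heapq.heappop(heap)[1]


def extra_workers_needed(pending: int, per_worker_threshold: int,
                         current_workers: int) -> int:
    """Return how many additional workers to request.

    Assumption: the threshold is the maximum number of pending
    sub-tasks that a single worker should have queued.
    """
    if per_worker_threshold < 1:
        raise ValueError("per_worker_threshold must be at least 1")
    required = -(-pending // per_worker_threshold)  # ceiling division
    return max(0, required - current_workers)
```

In this sketch, ten pending sub-tasks with a threshold of three per worker and two running workers give a request for two more workers. A real controller would also need to release idle resources, which is left out here.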
As shown in Figure 92, users can use the DSS web portal to subscribe to datasets of interest. A typical subscription task is made up of a set of inputs: a) an area expressed as a bounding box; b) a time range; c) a list of parameters required in data products (e.g. temperature); and d) optionally, a deadline.
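The four inputs of a subscription task map naturally onto a small record type. The following Python sketch shows one possible representation. The class and field names are hypothetical and do not reflect the actual DSS data model, and the bounding box is assumed to be given in decimal degrees and not to cross the antimeridian.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class BoundingBox:
    """Geographic area in decimal degrees (hypothetical field names)."""
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float

    def contains(self, lat: float, lon: float) -> bool:
        # Assumes the box does not cross the antimeridian.
        return (self.lat_min <= lat <= self.lat_max
                and self.lon_min <= lon <= self.lon_max)


@dataclass(frozen=True)
class Subscription:
    """One subscription task: area, time range, parameters, optional deadline."""
    area: BoundingBox
    time_start: datetime
    time_end: datetime
    parameters: Tuple[str, ...]
    deadline: Optional[datetime] = None

    def matches(self, lat: float, lon: float,
                when: datetime, parameter: str) -> bool:
        """Check whether a single observation falls within this subscription."""
        return (self.area.contains(lat, lon)
                and self.time_start <= when <= self.time_end
                and parameter in self.parameters)


# Example: temperature and salinity in a box for the year 2017.
example = Subscription(
    area=BoundingBox(lat_min=30.0, lat_max=46.0, lon_min=-6.0, lon_max=36.0),
    time_start=datetime(2017, 1, 1),
    time_end=datetime(2017, 12, 31),
    parameters=("temperature", "salinity"),
)
```

Keeping the deadline optional mirrors the description above: a subscription without a deadline is still valid and is simply not prioritised.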
The pilot activity was initiated by the marine research community. However, the possibility of receiving regular transmissions of data, especially in near-real time, directly from the organisation responsible for data collection and (pre-)processing is very important to many large initiatives. Generic initiatives will themselves be interested in operating subscription services for their outputs, based around a trusted repository hosting synchronised versions of their data collections. Such a service may also allow new features to be offered to end users, generating more visibility.
Data subscription services are expected to play an increasing role in the future, as the number of data producers and the volume of their output continue to increase rapidly. The mechanism tested and implemented by this demonstrator could contribute both to the development of a common standard for input streams and to enhancements of digital collaborative spaces for researchers and data providers.
A YouTube video is available at: https://youtu.be/PKU_JcmSskw