User stories
Instruction
Requirements are based on user stories: informal, natural-language descriptions of one or more features of a software system. User stories are often written from the perspective of an end user of the system. Depending on the community, they may be written by various stakeholders, including clients, users, managers, or development team members. They facilitate sensemaking and communication; that is, they help software teams organise their understanding of the system and its context. Do not confuse user stories with system requirements: a user story is an informal description of a feature, whereas a requirement is a formal description of a need (see the Requirements section below).
User stories may follow one of several formats or templates. The most common are:
"As a <role>, I want <capability> so that <receive benefit>"
"In order to <receive benefit> as a <role>, I want <goal/desire>"
"As <persona>, I want <what?> so that <why?>" where a persona is a fictional stakeholder (e.g. user). A persona may include a name, picture; characteristics, behaviours, attitudes, and a goal which the product should help them achieve.
Example:
“As provider of the Climate gateway I want to empower researchers from academia to interact with datasets stored in the Climate Catalogue, and bring their own applications to analyse this data on remote cloud servers offered via EGI.”
No. | User stories |
---|---|
US1 | As a user I want to be able to perform detailed analysis on large volumes of data in parallel, using scalable cloud resources, in order to achieve results more rapidly than with sequential processing and to avoid downloading large quantities of data to local storage. |
US2 | As a user I want the results of my analysis to be available to me anywhere, and to be able to share them with colleagues before publishing, in order to discuss and confirm the outcomes. |
US3 | As a user I want to ensure my input data remains accessible regardless of its physical location (for example, by making use of persistent identifiers), so that I do not need to implement my own code to deal with changes in data location. |
US4 | As provider of the Climate gateway I want to empower researchers from academia to interact with datasets stored in the Climate Catalogue, and bring their own applications to analyse this data on remote cloud servers |
US5 | As a data producer I would like scientists to be able to reference the source data used for downstream analysis and get accreditation in any subsequent publications. |
US6 | As a data manager I want any analysis to generate provenance metadata in order to understand what analysis has been performed to allow both confirmation of results and increase confidence in the scientific methods and analysis. |
US7 | As a decision maker I want to have confidence in the scientific results on which I rely to make policy decisions. |
US8 | As a research infrastructure provider I would like to link up the community AAI portal (ensuring my users only need to use a single EOSC portal to interact with ECAS) so that my users can still use the portal they are familiar with to access resources outside of the community. |
US9 | As a user, I want to access ECAS from my familiar community portal or the workflow I am used to without having to make use of additional services that I first would have to learn about in order to make use of ECAS. |
US10 | (This is not relevant to ECAS: ECAS is not doing data replication?) |
US11 | (Same as above, not an ECAS concern?) |
US12 | As an infrastructure manager I want to reduce the effort of maintaining client-side code support. |
US13 | A user would like to run a climate data analysis experiment across CMIP5 or CMIP6 data. The targeted model output data (used as input to the analysis) come from multiple modelling groups across the globe and are therefore hosted at different ENES data sites across Europe. For a specific target experiment, as a preliminary step, the user runs a distributed search on the ENES data nodes to discover the required input files, which results in a list of input dataset PIDs. The user then assembles a processing job specification and submits it to the ENES Data Analytics Service. The challenge is to perform the server-side, parallel computation on the distributed data while making access to the data transparent. |
US14 | At a later point, another user discovers the data from the previous analysis on the publication service. In order to be certain that the |
US15 | As a user, I want to be able to make selected results of my data analysis or the analysis script I developed available to others. These recipients may be my immediate colleagues but also a wider range of external third parties. The workflow to make these data available should be largely hassle-free for me. |
Note: US12 and US13 come from the original ECAS proposal; the others were derived from information on the ECAS Confluence page and discussions with Tobias Weigel.
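The distributed-analysis workflow in US13 (discover input PIDs, assemble a job specification, submit it for server-side processing) can be sketched as follows. This is an illustrative sketch only: the function name, field names, and PID values are assumptions for illustration, not the actual ECAS/ENES API.

```python
# Hypothetical sketch of the US13 workflow; names and PIDs are invented.

def build_job_spec(experiment, input_pids, operation):
    """Assemble a processing job specification from a list of input
    dataset PIDs discovered via the distributed search (US13)."""
    return {
        "experiment": experiment,    # e.g. a CMIP5/CMIP6 experiment id
        "inputs": list(input_pids),  # PIDs returned by the distributed search
        "operation": operation,      # the server-side analysis to run
        "execution": "server-side",  # computation moves to the data, not vice versa
    }

# Step 1: a distributed search yields input dataset PIDs (static here).
discovered_pids = ["21.14100/aaaa-bbbb", "21.14100/cccc-dddd"]  # hypothetical PIDs
# Step 2: assemble the job specification to submit for processing.
spec = build_job_spec("cmip6-tas-anomaly", discovered_pids, "time_mean")
```

The key design point US13 makes is that only the small job specification travels to the data sites; the large input datasets are never downloaded to local storage.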
Use cases
Instruction
A use case is a list of actions or event steps typically defining the interactions between a role (known in the Unified Modeling Language as an actor) and a system to achieve a goal.
Include in this section any diagrams that could facilitate the understanding of the use cases and their relationships.
Use case | Description of action | Dependency on 3rd party services (EOSC-hub or other) |
---|---|---|
UC1 | User needs to discover the location of all required input data | ESGF Metadata Service/B2FIND |
UC2 | Input data must have a PID associated with it. | Community solutions assigning PIDs, possibly via B2HANDLE |
UC3 | ENES Data Analytics Service must be able to transfer data from its current location to the processing site based on PID. (Low priority: I am not sure if we will do this; it is not entirely in the original plan, though I agree it makes sense. It depends on how data input integration ultimately looks and what can be done with limited effort.) | GridFTP/other? |
UC4 | Output data must be moved to a site where users can share it with others, so they can access it via a link provided by the ECAS system. | B2DROP |
UC5 | Users will need to register to use the ECAS service | Appropriate EOSC-AAI Solution |
UC6 | Data must be movable from the output storage in UC4 to a data publication service, where it must be given appropriate metadata and a PID | B2SHARE |
UC7 | Output data shall have appropriate and sufficient metadata and provenance information associated with it, to enable other users to have trust in the data. | ECAS, B2HANDLE profiles (possibly their usage by B2DROP) |
UC8 | A link between the output data and the sources must be maintained, in addition to provenance information related to the processing steps. | ECAS, B2HANDLE profiles (possibly their usage by B2DROP) |
UC9 | Input data must be accessible to the computation regardless of location. | B2HANDLE usage by communities and the DataHub. Support for B2HANDLE PID profiles by DataHub. |
UC10 | Published output data must be assigned a PID | B2SHARE, DataHub |
UC11 | The provenance information must be accessible for published output data | B2SHARE & DataHub usage of B2HANDLE profiles |
UC12 | Users will select individual files or entire directories from their ECAS workspace and then choose to publish them. The ECAS workspace will ask for a destination location for the files in the user's B2DROP workspace. The publishing workflow will start from the ECAS workspace but end with a view of the publishing repository (B2DROP) showing the newly published files as confirmation. | B2DROP |
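Location-independent access via PIDs (UC2, UC9) rests on resolving a Handle PID through the global Handle System resolver rather than hard-coding a physical data location. The sketch below shows this resolution step; the PID value is hypothetical, and the resolver REST endpoint is the public hdl.handle.net proxy.

```python
# Sketch of PID resolution for location-independent data access (UC2/UC9).
# The PID below is invented; the proxy URL is the public Handle resolver.

HANDLE_PROXY = "https://hdl.handle.net/api/handles/"

def resolution_url(pid: str) -> str:
    """Return the Handle System REST API URL that resolves a PID to its
    handle record, which includes the data's current location."""
    return HANDLE_PROXY + pid

url = resolution_url("21.14100/aaaa-bbbb")  # hypothetical dataset PID
# An HTTP GET on this URL returns a JSON handle record whose URL entry
# points to wherever the data currently lives, so client code needs no
# knowledge of the physical storage site.
```

If the data is moved, only the handle record is updated; every client resolving the same PID transparently reaches the new location.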
Requirements
Technical Requirements
Requirement ID | EOSC-hub service | GAP (Yes/No) + description | Requirement description | Source Use Case | Related tickets |
---|---|---|---|---|---|
Example | EOSC-hub AAI | Yes: EOSC-hub AAI doesn’t support the Marine IdP | EOSC-hub AAI should accept Marine IDs | UC1 | |
RQ1 | EOSC-hub AAI | Yes: ESGF AAI is not integrated with any EOSC AAI service | Integration of ESGF AAI with one of the EOSC AAI services | UC5 | EOSCWP10-41 |
RQ2 | B2DROP | Can be a central service; no need for a local installation. The user has no interface to the B2DROP filesystem; currently users log in to Jupyter with username and password, and files are moved to B2DROP automatically without user intervention. GAP: need to integrate AAI with B2DROP. For training purposes, consider using a proxy user. | Need to be able to write directly to B2DROP (via a mount point inaccessible to users), or have the workflow copy data in using the Nextcloud OpenCloudMesh API. Will require separate instances for training and production. | UC4 | |
RQ2.1 | B2DROP | Publishing files from an ECAS workspace to B2DROP will not require the user to log in to B2DROP separately. Aside from selecting files to publish and a destination folder, the user should also not be asked for additional information (e.g. metadata). | B2DROP must be able to understand and accept IAM security tokens provided by ECAS. Some detail questions may remain to be clarified regarding session management (transparent authentication, selecting the destination folder, and initiating and confirming the transfer as one seamless workflow). | UC12 | |
RQ3 | B2DROP | GAP - UNSURE - If data is moved using OpenCloudMesh, security needs to be considered. The Nextcloud website recommends using SSL, since user information is otherwise passed in plain text. Need to check how B2DROP is configured. | B2DROP must run with SSL enabled | UC4 | |
RQ4 | B2SHARE | GAP - NO (if RQ2 is satisfied), YES (otherwise). Enable users to push files to B2SHARE. If RQ2 works, there is no gap to deal with, as the bridge exists; if RQ2 does not work, AAI needs to be integrated with B2SHARE. | B2DROP/B2SHARE bridge required | UC6 | |
RQ5 | DataHub | GAP - UNCLEAR. Data publishing and data ingest; allows contacting multiple communities. | | | EOSCWP10-67, EOSCWP10-45 |
RQ6 | B2HANDLE | GAP - UNCLEAR. Both input data and published derived data must be assigned a PID. For third-party users to access provenance information, B2SHARE and possibly also B2DROP need to support recording of minimal provenance information, possibly organised via B2HANDLE profiles. | | UC7, UC8, UC11 | |
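The token-based publishing step described in RQ2.1 can be sketched as building a WebDAV upload request that carries an IAM-issued token instead of interactive credentials. This is a sketch under assumptions: the host, folder, filename, and token below are placeholders, and whether B2DROP's WebDAV endpoint accepts such bearer tokens is exactly the gap RQ2.1 identifies.

```python
import urllib.request

# Hypothetical sketch of RQ2.1: ECAS pushes a result file to the user's
# B2DROP space using a token, with no separate B2DROP login. The host,
# path, and token are placeholders, not real endpoints or credentials.

def build_publish_request(filename: str, dest_folder: str, token: str):
    """Construct (but do not send) a WebDAV PUT request that would copy
    one result file from the ECAS workspace into a B2DROP folder."""
    url = f"https://b2drop.example.org/remote.php/webdav/{dest_folder}/{filename}"
    return urllib.request.Request(
        url,
        method="PUT",
        headers={"Authorization": f"Bearer {token}"},
    )

req = build_publish_request("tas_anomaly.nc", "ecas-results", "IAM_TOKEN")
```

From the user's perspective this keeps the workflow seamless: they choose files and a destination folder in ECAS, and the authentication against B2DROP happens transparently via the token.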
Capacity Requirements
EOSC-hub services | Amount of requested resources | Time period | Related tickets |
---|---|---|---|
B2DROP | Testing & Training (200GB, <1GB/file) | M5 onwards | |
B2DROP | Production (see table, >500MB) | M7 onwards | |
B2SHARE | Testing & Training (200GB, <1GB/file) | M5 onwards |
B2SHARE | Production (see table, >500MB) | M7 ideally, M12 latest | |
B2HANDLE | Production, 2-4 prefixes required (CMCC, DKRZ, EGI, spare) | M15 onwards |
DataHub | Unknown | M15 |
IM/Orchestrator | Unknown | M18 onwards | EOSCWP10-68 |