Introduction defining context and scope

General comment

<this part isn’t intended for our “chapter”, but rather for the general introduction of the whole deliverable - if it is considered useful, of course...>

It is important to keep in mind that updating existing technology, or implementing totally new technology, does not by itself improve “usage performance”: the behavior of the “designated scientific community” will also influence the discoverability and ease of reuse of research data. Scientific traditions and previous investments in software or hardware can lead to long time constants for change. Adopting new database technology quickly could on paper provide large benefits to the data providers, like lower costs and easier administration/curation, but may de facto lower overall productivity for significant parts of the user community over a long period of time.

<needs a lot more work to set main points in focus...>


<the following are text snippets from articles & reports, that should be synthesized and combined with other info - do *not* consider this as anywhere near the final text!>

Socha et al. 2013 (ch 5): To cite data, we need metadata elements that uniquely identify the data set(s) and make them discoverable. These elements include author, title, publisher, publication date, resource type, edition, version, feature name and URI, verifier, identifier and (persistent) location. In addition, information on granularity, provenance, privacy controls and reuse rights is needed to guide reuse.
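As an illustration of how a few of these metadata elements could be held together and rendered as a citation, here is a minimal Python sketch. The class name `DataCitation`, its field names, the layout of the citation string and the example values (including the identifier) are all assumptions made for this illustration; they are not taken from Socha et al. or from any particular metadata schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCitation:
    """Hypothetical container for a subset of the citation metadata elements."""
    authors: List[str]
    title: str
    publisher: str
    publication_year: int
    identifier: str                  # persistent identifier, e.g. a DOI or handle
    resource_type: str = "Dataset"
    version: Optional[str] = None

    def format(self) -> str:
        """Render a human-readable citation string (illustrative layout only)."""
        parts = [
            f"{'; '.join(self.authors)} ({self.publication_year})",
            self.title,
        ]
        if self.version:
            parts.append(f"Version {self.version}")
        parts.extend([self.publisher, self.resource_type, self.identifier])
        return ". ".join(parts)


# All values below are made up for the example.
example = DataCitation(
    authors=["Doe, J.", "Roe, R."],
    title="Example flux measurements",
    publisher="Example Data Centre",
    publication_year=2016,
    identifier="hdl:12345/example-0001",
    version="2",
)
print(example.format())
```

Elements such as granularity, provenance and reuse rights would need richer structures than a flat record like this one.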

Persistent (and unique) identifiers have an especially important role for digital data, as such data are potentially more “mutable” and “changeable” than printed publications. Handles can serve both humans and machines and redirect to the data object of interest. (But handle registries must be highly accessible and sustainably maintained!)
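To make the machine-facing side of handles more concrete, the following Python sketch extracts the redirect target from a handle record. It assumes that the record is available as JSON in the layout returned by the Handle proxy REST API (a `values` list whose entries carry `type` and `data.value`); the sample record, the handle `12345/example-0001` and the helper names are made up for this illustration, and no network call is made.

```python
from typing import Optional

# Made-up record, shaped like the JSON assumed to come from a handle proxy.
SAMPLE_RECORD = {
    "responseCode": 1,
    "handle": "12345/example-0001",
    "values": [
        {
            "index": 1,
            "type": "URL",
            "data": {"format": "string",
                     "value": "https://data.example.org/objects/0001"},
        },
    ],
}


def handle_api_url(handle: str, proxy: str = "https://hdl.handle.net") -> str:
    """Build the lookup URL for a handle (path layout is an assumption)."""
    return f"{proxy}/api/handles/{handle}"


def first_value(record: dict, value_type: str) -> Optional[str]:
    """Return the first value of the given type in a handle record, if any."""
    for entry in record.get("values", []):
        if entry.get("type") == value_type:
            return entry["data"]["value"]
    return None


print(handle_api_url(SAMPLE_RECORD["handle"]))
print(first_value(SAMPLE_RECORD, "URL"))
```

A human user would normally just follow the handle in a browser and be redirected; a program can read the same record and decide for itself which of the stored values to use.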

Issues: granularity, version control, microattribution (fine-grained and unambiguous credit), contributor identifiers and facilitation of reuse.

Tools needed for data citation discovery, tracking and reuse.

Citation indices (Thomson Reuters, DataCite, ...). Usage metrics! Full-text searches. Altmetrics. Browser tools: citation support as plug-ins. Dynamic citation tools: embed into (online) editing software. Search tools, based e.g. on SPARQL and/or RDF. Archiving tools, preserving also data citation snapshots. Data citation mining tools.

The FORCE11 Data Citation Principles state that 1) data should be considered legitimate, citable products of research; 2) data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors; 3) claims in scholarly literature that rely on data must include a citation of the corresponding data; 4) data citations should include a persistent method of identification that is (human and) machine actionable, globally unique and widely used by the community; 5) data citations should facilitate access to the data themselves and to all relevant metadata and other resources needed to make informed use of them; 6) (at least) unique identifiers and metadata describing the data and their disposition should persist even if the data do not; 7) data citations should facilitate identification of, access to and verification of specific (subsets of) data, and should include information on provenance; 8) data citation methods must be flexible, but at the same time they must support interoperability.

Technologies of interest

Some specific “technology issues” that could be covered:

  • Main technology needs: versionable databases to support “time machine” retrieval of large, dynamic datasets (including sensor data)
  • Systems for cataloguing and handling more complex collections, both of data sets and metadata (cf. “research objects”)
  • PID (handle) records used also for identity and fixity verification
  • Metadata systems that allow fast and flexible bibliometric data mining and impact analysis
  • (micro)attribution of credit to data producers and others involved in the processing & management of data objects - especially in conjunction with 1) subsetting of larger datasets; and 2) collections comprising large numbers of smaller individual datasets (perhaps not even from the same RI or even domain)
  • Handle registries also need to become federated, and allow users to add community- or project-specific metadata to the handle records (see recommendations of the RDA WG on PID information types)
  • Connected: data type registries that can support subset identification (and retrieval) - see e.g. recommendations of the RDA WG on Data Type Registries and its proposed spin-off (“Data Types WG”)
  • Query-centric citations for data, allowing for both unambiguous and less storage resource-intensive handling of dynamic data sets
  • ORCID, ISNI: keeping track of people and institutions and their “scientific activities”
  • Provenance connections, supporting automated metadata extraction and production for machine-actionable workflows

And the list goes on - there are plenty more to choose from...
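One of the items above, the use of PID records for fixity verification, can be sketched in a few lines of Python. The sketch assumes that the PID record carries a SHA-256 checksum of the data object in a field called `CHECKSUM`; that field name, the dictionary layout of the record and the sample data are hypothetical and chosen only for this illustration.

```python
import hashlib


def sha256_hex(payload: bytes) -> str:
    """Return the SHA-256 digest of the payload as a hex string."""
    return hashlib.sha256(payload).hexdigest()


def verify_fixity(payload: bytes, pid_record: dict) -> bool:
    """Check retrieved bytes against the checksum stored in the PID record.

    The "CHECKSUM" key is a hypothetical field name used for this sketch.
    """
    expected = pid_record.get("CHECKSUM")
    return expected is not None and sha256_hex(payload) == expected.lower()


# Made-up data object and PID record.
data = b"time,co2_ppm\n2016-01-01T00:00Z,402.1\n"
record = {
    "URL": "https://data.example.org/objects/0001",
    "CHECKSUM": sha256_hex(data),
}

print(verify_fixity(data, record))         # unchanged object
print(verify_fixity(data + b"x", record))  # altered object
```

Storing the checksum in the PID record rather than next to the data means a user can verify an object even when it was obtained from a copy outside the original repository.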

Change history and amendment procedure

The review of this topic will be organised by Margareta Hellström in consultation with the following volunteers: . They will partition the exploration and gathering of information and collaborate on the analysis and formulation of the initial report. Record details of the major steps in the change history table below. For further details of the complete procedure see item 4 on the Getting Started page.

Note: Do not record editorial / typographical changes. Only record significant changes of content.

Date | Name | Institution | Nature of the information added / changed

Sources of information used

Two-to-five year analysis

<In progress...>

This is quite difficult to summarize, as the field is evolving rapidly. We will concentrate on issues and ideas that are being discussed now (ca. 2016), and try to extrapolate from these...

  • Trends towards tighter information exchange (primarily links to content) between publishers, data repositories and data producers.
  • Systems for allocating persistent identifiers will become more user-friendly, but at the same time e.g. handle registries should allow more complex metadata about the objects they index.
  • Moves towards labeling “everything” and “everyone” with PIDs, to allow unambiguous (and exhaustive!) links between entities and therefore also a complete record of activities. (ORCID, ISNI, ...)
  • More effective usage tracking and analysis systems, that harvest citation information not only from academic literature but from a wide range of sources (DataCite, CrossRef, MDC)

State of the art

Subsequent headings for each trend (if appropriate in this HL3 style)

Problems to be overcome

Sub-headings as appropriate in HL3 style (one per problem)

Details underpinning above analysis

Sketch of a longer-term horizon

<to be worked on>

Almost impossible! Some guesses:

  • Much more tightly integrated systems for metadata, provenance, identification and citation
  • Move towards automation of those aspects of the research data life cycle that involve basic tasks like assigning identifiers and citing/referring to all kinds of resources - including data and metadata objects, software, workflows, ...
  • Evolution towards more complex “collections” of research resources, like Research Objects, will necessitate more flexible approaches both to identification strategies and to detailed, unambiguous citation/referencing of parts of such objects

Relationships with requirements and use cases

<to be expanded?>

Connections to RI requirements gathered for identification & citation, cataloguing, curation, provenance, and possibly also processing/workflows.

Work Package 6: The overarching objective is to improve the efficiency of data identification and citation by providing recommendations and good practices for convenient, effective and interoperable identifier management and citation services. WP6 will therefore focus on implementing data tracing and citation functionalities in environmental RIs, and on developing tools for the RIs where these are not otherwise available.

ENVRIplus case studies of interest are mainly IC_01 “Dynamic data citation, identification & citation” and IC_09 “Use of DOIs for tracing of data re-use” (likely to be merged, possibly also with IC_06 “Identification/citation in conjunction with provenance”). The primary aim of IC_01 is to provide demonstrators of the RDA Data Citation Working Group’s recommendation for a query-centric approach to how retrieval, and subsequent citation, of dynamic data sets should be supported by the use of versionable database systems. This may be combined with support also for collections of data sets, which can be seen as a sub-category of dynamic datasets, thus addressing also the goals of IC_09. 
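The query-centric idea can be illustrated with a small Python sketch using an in-memory SQLite table in which every row version carries validity timestamps. A citation then stores the query parameters, the time of execution and a hash of the result set, so that the same subset can be retrieved and verified later even after the data have been corrected. The table layout, the column names, the function names and the sample values are assumptions made for this illustration; it is not the Working Group's reference implementation.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE obs (
           station    TEXT,
           value      REAL,
           valid_from TEXT NOT NULL,  -- when this row version was inserted
           valid_to   TEXT            -- NULL while the row version is current
       )"""
)
conn.executemany(
    "INSERT INTO obs VALUES (?, ?, ?, ?)",
    [
        ("A", 1.0, "2016-01-01", "2016-03-01"),  # later corrected
        ("A", 1.5, "2016-03-01", None),          # the corrected value
        ("A", 2.0, "2016-01-01", None),
    ],
)


def run_as_of(station, as_of):
    """Return the rows for a station as they were visible at time `as_of`."""
    return conn.execute(
        "SELECT station, value FROM obs "
        "WHERE station = ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR valid_to > ?) "
        "ORDER BY station, value",
        (station, as_of, as_of),
    ).fetchall()


def result_hash(rows):
    """Hash the result set so a later re-execution can be verified."""
    return hashlib.sha256(repr(rows).encode("utf-8")).hexdigest()


# "Citing" the subset: keep the query parameters, timestamp and result hash.
cited_rows = run_as_of("A", "2016-02-01")
citation = {"station": "A", "timestamp": "2016-02-01",
            "hash": result_hash(cited_rows)}

# Later, after the correction, the cited state can still be reproduced.
replayed = run_as_of(citation["station"], citation["timestamp"])
print(result_hash(replayed) == citation["hash"])
print(run_as_of("A", "2016-04-01"))  # current state differs from the cited one
```

Only the query and its timestamp need to be stored per citation, not a copy of the retrieved subset, which is what makes this approach less storage-intensive for large dynamic data sets.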

Summary of analysis highlighting implications and issues

Bibliography and references to sources

<note: not all of these are used now, and there are also other refs not yet added...>

R.E. Duerr et al. (2011), “On the utility of identification schemes for digital earth science data: an assessment and recommendations”. Earth Science Informatics, vol 4, 2011, 139-160. Available at

R. Huber et al. (2013), “Data citation and digital identification for time series data & environmental research infrastructures”, report from a joint COOPEUS-ENVRI-EUDAT workshop in Bremen, June 25-26, 2013. Available via

M.A. Parsons et al. (2010), “Data citation and peer review”, EOS, Transactions of the American Geophysical Union vol 91, no 34, 24 August 2010, 297-304. Available at

A. Rauber et al. (2015). “Data citation of evolving data. Recommendations of the Working Group on Data Citation (WGDC)”. Preliminary report from 20 Oct 2015. Available at

U. Schwardmann (2015). “ePIC Persistent Identifiers for eResearch” Presentation at the joint DataCite-ePIC workshop Persistent Identifiers: Enabling Services for Data Intensive Research, Paris 21 Sept 2015. Available at

Y.M. Socha, ed. (2013), “Out of cite, out of mind: The current state of practice, policy, and technology for the citation of data”. Data Science Journal vol. 12, 13 Sept 2013. Available at

M. Martone, ed. (2014), “Joint Declaration of Data Citation Principles”, Data Citation Synthesis Group and FORCE11, San Diego CA. Available at
