<this part isn’t intended for our “chapter”, but rather for the general introduction of the whole deliverable - if it is considered useful, of course...>
It is important to keep in mind that updating existing technology, or implementing entirely new technology, does not by itself improve “usage performance”: the behavior of the “designated scientific community” also influences the discoverability and ease of reuse of research data. Scientific traditions and previous investments in software or hardware can lead to long time constants for change. Adopting new database technology quickly could on paper provide large benefits to the data providers, such as lower costs and easier administration and curation, but may de facto lower overall productivity for significant parts of the user community over a long period of time.
<needs a lot more work to set main points in focus...>
<the following are text snippets from articles & reports, that should be synthesized and combined with other info - do *not* consider this as anywhere near the final text!>
Socha (ed.) 2013 (ch. 5): To cite data, we need metadata elements that uniquely identify the data set(s) and make it (them) discoverable. These elements include author, title, publisher, publication date, resource type, edition, version, feature name and URI, verifier, identifier and (persistent) location. In addition, information on granularity, provenance, privacy controls and reuse rights is needed to guide reuse.
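As a rough illustration of how a subset of these elements could be assembled into a human-readable citation, the sketch below uses a hypothetical `DatasetCitationMetadata` record and a hypothetical `format_citation` helper. The field names and the element order are illustrative choices, not a prescribed citation format, and only some of the elements listed above are covered.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitationMetadata:
    """A subset of the citation metadata elements (hypothetical field names)."""
    authors: list[str]
    title: str
    publisher: str
    publication_year: int
    resource_type: str
    identifier: str                # e.g. a DOI or Handle, ideally as a resolvable URL
    version: Optional[str] = None  # omitted from the citation when not given


def format_citation(meta: DatasetCitationMetadata) -> str:
    """Join the elements into one citation string; the order is an illustrative choice."""
    parts = [f"{'; '.join(meta.authors)} ({meta.publication_year}): {meta.title}."]
    if meta.version:
        parts.append(f"Version {meta.version}.")
    parts.append(f"{meta.publisher}.")
    parts.append(f"{meta.resource_type}.")
    parts.append(meta.identifier)
    return " ".join(parts)
```

For a hypothetical record with authors "Doe, J." and "Roe, R.", year 2016, version 1.0 and the made-up identifier `https://doi.org/10.5072/example`, this yields `Doe, J.; Roe, R. (2016): Example atmospheric time series. Version 1.0. Example RI. Dataset. https://doi.org/10.5072/example`. Granularity, provenance and reuse rights would need further fields.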
Persistent (and unique) identifiers have an especially important role for digital data, as such data are potentially more “mutable” and “changeable” than printed publications. Handles can serve both humans and machines, redirecting them to the data object of interest. (But Handle registries must be highly accessible and sustainably maintained!)
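To make the point about identifiers serving both humans and machines concrete, here is a minimal sketch that turns a Handle or DOI (written as `hdl:...`, `doi:...` or bare) into an HTTPS URL on the public proxy resolvers hdl.handle.net and doi.org, which then redirect to the registered location of the object. The function name and the simplified syntax check are assumptions; the sketch performs no network lookup and does not percent-encode unusual suffix characters.

```python
import re

# Public proxy resolvers: hdl.handle.net resolves Handles; doi.org resolves DOIs
# (DOIs are themselves Handles whose prefix starts with "10.").
HANDLE_PROXY = "https://hdl.handle.net/"
DOI_PROXY = "https://doi.org/"

# A Handle has the form "<prefix>/<suffix>". This simplified check only requires a
# non-empty prefix without "/" or whitespace and a non-empty suffix without whitespace.
_HANDLE_RE = re.compile(r"^[^/\s]+/\S+$")


def to_resolver_url(pid: str) -> str:
    """Turn 'doi:...', 'hdl:...' or a bare Handle/DOI into an actionable HTTPS URL."""
    pid = pid.strip()
    if pid.lower().startswith(("doi:", "hdl:")):
        pid = pid[4:]  # drop the scheme-like prefix
    if not _HANDLE_RE.match(pid):
        raise ValueError(f"not a Handle-style identifier: {pid!r}")
    # Note: suffix characters such as '#' or '?' would need percent-encoding in a real URL.
    proxy = DOI_PROXY if pid.startswith("10.") else HANDLE_PROXY
    return proxy + pid
```

For example, `to_resolver_url("doi:10.6084/m9.figshare.1285728")` returns `https://doi.org/10.6084/m9.figshare.1285728`, the DOI of the Huber et al. (2013) workshop report in the reference list.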
Issues: granularity, version control, microattribution (fine-grained and unambiguous credit), contributor identifiers and facilitation of reuse.
Tools needed for data citation discovery, tracking and reuse.
Citation indices (Thomson Reuters, DataCite, ...). Usage metrics! Full-text searches. Altmetrics. Browser tools: citation support as plug-ins. Dynamic citation tools: embedded into (online) editing software. Search tools, based e.g. on SPARQL and/or RDF. Archiving tools that also preserve data citation snapshots. Data citation mining tools.
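Data citation mining tools typically start by locating identifiers in full text. A minimal sketch, assuming DOIs are the identifiers of interest, could use a simplified regular expression loosely modelled on a pattern Crossref has published for modern DOIs. The function name `find_dois` and the trimming heuristics are hypothetical, and such a pattern will miss some older or unusually formatted DOIs.

```python
import re

# Simplified DOI pattern: "10." + 4-9 digit registrant code + "/" + suffix characters.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def _trim(doi: str) -> str:
    """Drop trailing sentence punctuation and unbalanced closing parentheses."""
    while doi and (doi[-1] in ".,;:" or (doi[-1] == ")" and doi.count(")") > doi.count("("))):
        doi = doi[:-1]
    return doi


def find_dois(text: str) -> list:
    """Return the DOIs found in free text, in order of appearance, without duplicates."""
    found, seen_keys = [], set()
    for match in DOI_PATTERN.finditer(text):
        doi = _trim(match.group(0))
        key = doi.lower()  # DOIs are case-insensitive, so deduplicate on lower case
        if key not in seen_keys:
            seen_keys.add(key)
            found.append(doi)
    return found
```

A real mining tool would also handle URL-encoded DOIs (e.g. `%2F` instead of `/`) and other identifier schemes such as Handles.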
The FORCE11 Data Citation Principles state that:
1) data should be considered legitimate, citable products of research;
2) data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors;
3) claims in scholarly literature that rely on data must include a citation of the corresponding data;
4) data citation should include a persistent method of identification that is (human and) machine actionable, globally unique and widely used by the community;
5) data citations should facilitate access to the data themselves and all relevant metadata and other resources needed to make informed use of them;
6) (at least) unique identifiers and metadata describing the data and its disposition should persist even if the data do not;
7) data citations should facilitate identification of, access to and verification of specific (subsets of) data, and should include information on provenance;
8) data citation methods must be flexible, but at the same time they must support interoperability.
Technologies of interest
Some specific “technology issues” that could be covered:
And the list goes on - there are plenty more to choose from...
The review of this topic will be organised by Margareta Hellström in consultation with the following volunteers: . They will partition the exploration and gathering of information and collaborate on the analysis and formulation of the initial report. Record details of the major steps in the change history table below. For further details of the complete procedure see item 4 on the Getting Started page.
Note: Do not record editorial / typographical changes. Only record significant changes of content.
||Date||Name||Institution||Nature of the information added / changed||
This is quite difficult to summarize, as the field is evolving rapidly. We will concentrate on issues and ideas that are being discussed now (ca 2016) and try to extrapolate from these...
<to be worked on>
Almost impossible! Some guesses:
<to be expanded?>
Connections to RI requirements gathered for identification & citation, cataloguing, curation, provenance, and possibly also processing/workflows.
Work Package 6: The overarching objective is to improve the efficiency of data identification and citation by providing recommendations and good practices for convenient, effective and interoperable identifier management and citation services. WP6 will therefore focus on implementing data tracing and citation functionalities in environmental RIs, and on developing tools for the RIs where these are not otherwise available.
ENVRIplus case studies of interest are mainly IC_01 “Dynamic data citation, identification & citation” and IC_09 “Use of DOIs for tracing of data re-use” (these are likely to be merged, possibly also with IC_06 “Identification/citation in conjunction with provenance”). The primary aim of IC_01 is to provide demonstrators of the RDA Data Citation Working Group’s recommendation for a query-centric approach, in which the retrieval, and subsequent citation, of dynamic data sets is supported by versionable database systems. This may be combined with support for collections of data sets, which can be seen as a sub-category of dynamic data sets, thereby also addressing the goals of IC_09.
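A minimal, in-memory sketch of the query-centric idea, loosely following the RDA WGDC recommendations (Rauber et al. 2015): rows are versioned with validity timestamps instead of being overwritten, each cited query is stored with its execution timestamp and hashes of the normalised query and of the stably sorted result set, and an identifier is reused only when both hashes match. All names (`VersionedTable`, `QueryStore`, `run_query`), the threshold-style query and the integer timestamps are hypothetical; a real RI would implement this inside a versioned database and assign DOIs or Handles to the stored queries.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    """Append-only table: changed or deleted rows are closed via 'valid_to', never overwritten."""
    rows: list = field(default_factory=list)

    def _close_current(self, key, ts):
        for row in self.rows:
            if row["key"] == key and row["valid_to"] is None:
                row["valid_to"] = ts

    def upsert(self, key, value, ts):
        """Insert a new version of 'key', valid from timestamp ts."""
        self._close_current(key, ts)
        self.rows.append({"key": key, "value": value, "valid_from": ts, "valid_to": None})

    def delete(self, key, ts):
        """Mark 'key' as deleted from ts onwards; older versions stay retrievable."""
        self._close_current(key, ts)

    def as_of(self, ts):
        """Return the rows that were valid at timestamp ts."""
        return [r for r in self.rows
                if r["valid_from"] <= ts and (r["valid_to"] is None or ts < r["valid_to"])]


def run_query(table, query, ts):
    """Execute a (hypothetical) threshold query on the table state at ts, stably sorted."""
    hits = [r for r in table.as_of(ts) if r["value"] >= query["min_value"]]
    return sorted((r["key"], r["value"]) for r in hits)


def sha256_of(obj):
    """Hash a normalised JSON serialisation (sorted keys) of a query or result set."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()


class QueryStore:
    """Stores cited queries with their execution timestamp and result-set hash."""

    def __init__(self):
        self._entries = {}

    def cite(self, table, query, ts):
        """Return an identifier for 'query' run at ts, reusing one if nothing has changed."""
        query_hash = sha256_of(query)
        result_hash = sha256_of(run_query(table, query, ts))
        for pid, entry in self._entries.items():
            if entry["query_hash"] == query_hash and entry["result_hash"] == result_hash:
                return pid  # identical query and identical result set: reuse the identifier
        pid = f"query-{len(self._entries) + 1}"  # hypothetical local ID; in practice a DOI/Handle
        self._entries[pid] = {"query": query, "query_hash": query_hash,
                              "timestamp": ts, "result_hash": result_hash}
        return pid

    def resolve(self, table, pid):
        """Re-execute the stored query as of its original timestamp and verify the result."""
        entry = self._entries[pid]
        result = run_query(table, entry["query"], entry["timestamp"])
        if sha256_of(result) != entry["result_hash"]:
            raise RuntimeError(f"result set for {pid} can no longer be reproduced")
        return result
```

The key design choice is that the citation points at a query plus a timestamp rather than at a copy of the data: later corrections or deletions create new row versions, so an earlier citation still resolves to exactly the subset that was originally used, while a re-run of the same query on changed data receives a new identifier.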
<note: not all of these are used now, and there are also other refs not yet added...>
R.E. Duerr et al. (2011), “On the utility of identification schemes for digital earth science data: an assessment and recommendations”, Earth Science Informatics vol. 4, 139-160. Available at http://link.springer.com/content/pdf/10.1007%2Fs12145-011-0083-6.pdf
R. Huber et al. (2013), “Data citation and digital identification for time series data & environmental research infrastructures”, report from a joint COPEUS-ENVRI-EUDAT workshop in Bremen, June 25-26, 2013. Available via http://dx.doi.org/10.6084/m9.figshare.1285728
M.A. Parsons et al. (2010), “Data citation and peer review”, EOS, Transactions of the American Geophysical Union vol. 91, no. 34, 24 August 2010, 297-304. Available at http://modb.oce.ulg.ac.be/wiki/upload/Alex/EOS_data_citation.pdf
A. Rauber et al. (2015), “Data citation of evolving data. Recommendations of the Working Group on Data Citation (WGDC)”, preliminary report from 20 Oct 2015. Available at https://rd-alliance.org/system/files/documents/RDA-DC-Recommendations_151020.pdf
U. Schwardmann (2015), “ePIC Persistent Identifiers for eResearch”, presentation at the joint DataCite-ePIC workshop “Persistent Identifiers: Enabling Services for Data Intensive Research”, Paris, 21 Sept 2015. Available at https://zenodo.org/record/31785
Y.M. Socha, ed. (2013), “Out of cite, out of mind: The current state of practice, policy, and technology for the citation of data”. Data Science Journal vol. 12, 13 Sept 2013. Available at https://www.jstage.jst.go.jp/article/dsj/12/0/12_OSOM13-043/_pdf
M. Martone, ed. (2014), “Joint Declaration of Data Citation Principles”, Data Citation Synthesis Group and FORCE11, San Diego CA. Available at https://www.force11.org/group/joint-declaration-data-citation-principles-final