About PIDs

Identifying data (and associated metadata) at every stage of processing is central to any research infrastructure (RI). This can be achieved by allocating unique and persistent digital identifiers (PIDs) to data objects throughout the data processing life cycle. PIDs allow unambiguous references to be made to data during curation and cataloguing, and they support provenance tracking. They are also a prerequisite for correct citation (and hence attribution) of the data by end users, as this is only possible when persistent identifiers exist and are applied in the attribution. In short, in today’s expanding “open data world”, PIDs are an essential tool for establishing clear links between all entities involved in or connected with any given research project.

A range of different types of persistent identifiers is available. ICOS has chosen to work primarily with those built on the Handle system (http://www.handle.net/), including PIDs from the European Persistent Identifier Consortium (ePIC; http://doc.pidconsortium.eu/) and DOIs (Digital Object Identifiers) from DataCite (https://www.datacite.org/). For people in ICOS, the Open Researcher and Contributor ID system (ORCID; http://orcid.org) is recommended (required?).
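Because ePIC PIDs and DOIs are both Handle-based, they share the same `prefix/suffix` structure and can be turned into resolvable links through a public proxy resolver. The following is a minimal sketch in Python, assuming the public resolvers at hdl.handle.net and doi.org; the function name `resolver_url` and the example identifiers are hypothetical and are not part of any ICOS tooling.

```python
from urllib.parse import quote

# Public proxy resolvers (assumption: these are the resolvers one wants to use).
RESOLVERS = {
    "handle": "https://hdl.handle.net/",
    "doi": "https://doi.org/",
}


def resolver_url(identifier: str, scheme: str = "handle") -> str:
    """Build a resolvable URL for a Handle-based identifier ('prefix/suffix')."""
    if scheme not in RESOLVERS:
        raise ValueError(f"unknown scheme: {scheme!r}")
    # Split on the first slash only: the suffix may itself contain slashes.
    prefix, sep, suffix = identifier.partition("/")
    if not (sep and prefix and suffix):
        raise ValueError(f"expected 'prefix/suffix', got {identifier!r}")
    # Percent-encode unsafe characters but keep the slashes intact.
    return RESOLVERS[scheme] + quote(identifier, safe="/")


# Hypothetical identifiers, for illustration only.
print(resolver_url("12345/example-object"))
print(resolver_url("10.12345/example", scheme="doi"))
```

The sketch only builds the link; it does not check that the identifier is actually registered.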

Identifying data objects

As a rule of thumb, if one can think of at least one situation in which someone will need to refer to a given digital object, then that object should be registered and assigned its own unique and unambiguous identifier. The associated entry in the registry must contain at least the location of the data object (either a direct pointer to the storage location or a link to a so-called landing page), who created it, and when it was created. Other useful metadata include the size, checksum and (MIME) type of the object.
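The minimal registry entry described above could be modelled roughly as follows. This is a sketch under stated assumptions, not the ICOS or ePIC record format: the class `PidRecord`, the helper `describe_object`, the field names, the choice of SHA-256 as checksum and the example values are all hypothetical.

```python
import hashlib
import mimetypes
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class PidRecord:
    """Hypothetical registry entry for one data object."""
    pid: str            # the identifier itself, e.g. "prefix/suffix"
    location: str       # storage location or landing page
    creator: str        # who created the object
    created: datetime   # when it was created
    size_bytes: int     # optional but useful: object size
    sha256: str         # optional but useful: checksum (SHA-256 assumed here)
    mime_type: str      # optional but useful: MIME type


def describe_object(pid: str, location: str, creator: str,
                    content: bytes, filename: str) -> PidRecord:
    """Derive size, checksum and MIME type from the object's content and name."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return PidRecord(
        pid=pid,
        location=location,
        creator=creator,
        created=datetime.now(timezone.utc),
        size_bytes=len(content),
        sha256=hashlib.sha256(content).hexdigest(),
        # Fall back to a generic type when the extension is not recognised.
        mime_type=guessed or "application/octet-stream",
    )


# Hypothetical values, for illustration only.
record = describe_object(
    pid="12345/example-object",
    location="https://example.org/objects/example-object",
    creator="A. Researcher",
    content=b"time,co2\n",
    filename="example.csv",
)
print(record)
```

Storing the checksum alongside the location lets anyone who later retrieves the object verify that it is the same sequence of bytes that was registered.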

DOIs have been used for scientific articles and reports for over a decade, and are therefore well known in the research community. In ICOS, DataCite DOIs will be assigned to all published data objects (Level 2 and Level 3, the “citable data” in the Figure below), since these are the ones most likely to be referred to, or cited, in scientific contexts. All other data objects stored in the ICOS repository, including sensor data and Near Real Time data products, can be considered “raw data” or “referable data” in the Figure. These will instead be assigned PIDs from ePIC. These PIDs are just as unique and persistent as DOIs and could very well be used for citing data in articles and reports, but they are primarily used for referencing data objects in workflows and provenance records.
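The assignment rule can be summarised in a few lines of Python. This is only an illustrative sketch: it assumes that data levels are represented as the integers 0 to 3, and the names `IdentifierScheme`, `CITABLE_LEVELS` and `scheme_for_level` are hypothetical.

```python
from enum import Enum


class IdentifierScheme(Enum):
    DOI = "DataCite DOI"
    EPIC_PID = "ePIC PID"


# Published ("citable") data levels, as stated in the text.
CITABLE_LEVELS = frozenset({2, 3})


def scheme_for_level(data_level: int) -> IdentifierScheme:
    """Pick the identifier scheme for a data object from its data level.

    Assumption: levels are the integers 0-3, with 0 and 1 covering
    sensor data and Near Real Time products.
    """
    if data_level not in range(4):
        raise ValueError(f"unknown data level: {data_level!r}")
    if data_level in CITABLE_LEVELS:
        return IdentifierScheme.DOI
    return IdentifierScheme.EPIC_PID


for level in range(4):
    print(f"Level {level}: {scheme_for_level(level).value}")
```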

Versioning of datasets

How to use PIDs in relation to different versions of a dataset, or to dynamic datasets to which new data are continuously added, is still under investigation.