4.2 Different services (workflows) for different data deposit agreements

The service that a repository offers might vary according to the chosen license agreement. For example, different pseudonymisation procedures may apply depending on how openly the data sets are published. Conversely, the condition of the data set (e.g., its universe) might prescribe the necessary license agreement. This applies in particular to small, sensitive groups of respondents, e.g., pupils or students in one district of a city, or refugees, that is, respondents who are per se vulnerable.

Furthermore, it is common that different Creative Commons (n.d.) license agreements apply to data, metadata and documentation material. Usually, metadata are CC0 and documentation is mostly CC BY 4.0, while data are either CC BY 4.0 or under another license agreement.
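The per-component licensing described above could be recorded in an ingest configuration along the following lines. This is a minimal sketch, not an archive's actual setup: the names `DEFAULT_LICENSES` and `resolve_licenses` are hypothetical, and the defaults simply mirror the typical pattern mentioned in the text.

```python
# Hypothetical default licenses per deposited component, mirroring the
# typical pattern described in the text. Real archives may differ.
DEFAULT_LICENSES = {
    "metadata": "CC0",
    "documentation": "CC BY 4.0",
    "data": "CC BY 4.0",
}


def resolve_licenses(overrides=None):
    """Return the license for each component, applying agreement-specific overrides."""
    licenses = dict(DEFAULT_LICENSES)  # copy so the defaults stay untouched
    for component, license_name in (overrides or {}).items():
        if component not in DEFAULT_LICENSES:
            raise KeyError(f"Unknown component: {component}")
        licenses[component] = license_name
    return licenses


# Example: a deposit whose data are released under a different agreement.
print(resolve_licenses({"data": "Scientific Use"}))
```

Keeping the defaults in one place and passing only the deviations makes it visible to the ingest team where a deposit departs from the usual agreement.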

The image presents the different paths to take depending on whether the data will be stored permanently or for the short term.

4.2.1 Open Access, Scientific Use, Replication data

The Open Access (OA) license agreement (also called Public Use) aims at the maximal re-use of the data; often, the Creative Commons (n.d.) licenses serve as copyright licenses. It is meant to be used by teachers, students, journalists, and the public but is usually not informative enough for researchers. The Scientific Use (SUF) license agreement, with restricted account-based access, aims at re-use by researchers with a scientific interest in the data; students are also included in this group. For data under restricted controlled access, an archive member is involved in the delivery of the data. The data users might be asked to complete and sign a form. An archive member then verifies the scientific legitimacy of the applicant and grants access to the data.

Because of the different user groups and degree of openness, different ingest procedures might be applied (e.g., regarding pseudonymisation of personal data). This can also vary from archive to archive.

It is often possible to disseminate multiple versions of the data under different access conditions, according to the data access policy of the respective data archive. This means that there can be different versions of the same dataset: one is open to the public and includes little if any demographic information, while the version meant for scientific researchers is accessible only after registration and includes demographic information.
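As an illustration of how two versions of the same dataset might be derived, the following sketch produces a public version by omitting demographic variables from the scientific-use version. The function name, field names and sample records are invented for this example. The sketch only shows the idea of versioning; it is not a pseudonymisation procedure.

```python
def make_public_use_version(records, demographic_fields):
    """Return copies of the records without the listed demographic fields."""
    blocked = set(demographic_fields)
    return [
        {key: value for key, value in record.items() if key not in blocked}
        for record in records
    ]


# Invented sample data standing in for the scientific-use version.
scientific_use = [
    {"id": 1, "age": 34, "district": "North", "answer": "yes"},
    {"id": 2, "age": 51, "district": "South", "answer": "no"},
]

# The public version drops the (assumed) demographic variables.
public_use = make_public_use_version(scientific_use, ["age", "district"])
print(public_use)
```

The scientific-use records are left unchanged, so both versions can be disseminated side by side under their respective access conditions.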

4.2.2 Special conditions; e.g., Embargo or restricted access

If the depositor needs special conditions that require different workflows, the ingest team needs to know about them. Depositors might want to publish only parts of their data and keep other parts under embargo. In that case, the ingest agent needs to be informed which files should be restricted or inaccessible in the archive solution and at what point in time (if at all) access shall be given to re-users. The process of how the data is treated internally should also be clear. Does the data stay in ‘ingest status’ until the embargo is lifted? Or is the procedure regarded as finished, with all material (plus internal documentation) stored at its final storage location?
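The information the ingest agent needs per file (whether it is restricted and when, if at all, access is to be given) could be captured as in the sketch below. The class `DepositedFile`, its fields and the function `is_accessible` are hypothetical and not taken from any particular archive solution.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DepositedFile:
    name: str
    restricted: bool = False
    # Date on which the embargo ends. None on a restricted file means
    # that no release to re-users is planned.
    embargo_until: Optional[date] = None


def is_accessible(file: DepositedFile, today: date) -> bool:
    """Decide whether re-users may access the file on the given day."""
    if not file.restricted:
        return True
    if file.embargo_until is None:
        return False  # restricted with no release date
    return today >= file.embargo_until


# Invented example deposit: one open file, one embargoed, one never released.
deposit = [
    DepositedFile("survey_wave1.csv"),
    DepositedFile("survey_wave2.csv", restricted=True, embargo_until=date(2025, 1, 1)),
    DepositedFile("interview_notes.pdf", restricted=True),
]

for f in deposit:
    print(f.name, is_accessible(f, date(2024, 6, 1)))
```

Recording the release date explicitly, rather than only a restricted flag, answers the "at what point in time (if at all)" question at ingest rather than later.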

4.2.3 Re-use or preservation?

Making data available for re-use does not necessarily imply a long-term commitment to preservation; likewise, a long-term commitment to preservation does not imply that data is available for re-use. It is possible that a repository stores data in the long term but does not make it accessible to re-users. (This is uncommon, partly due to the costs of holding data that are not available to users.) Alternatively, long-term preservation may be a service that depositors are charged for; if they do not pay for it, long-term preservation is not offered (only short-term storage, that is, a few years). GESIS (n.d. [Accessed July 30, 2022b]) offers a variety of services, with different elements and costs.