1.2 What does an archive look like and what does it do?

What does a social science data archive look like?

[Image: several colleagues in an office, in discussion at a computer.] Archives nowadays mostly work with digital assets.

An archive:

  • will be (part of or affiliated to) an organization that has a physical location of residence,
  • is often related to an institution of higher education (university) or other research institution,
  • has a director and employees,
  • provides research data to a variety of users via the Internet.

Social science data archives differ considerably between European countries. They can vary in size, designated community, position in the wider context, and services performed.

Size can be expressed as the number of employees or the number of archived datasets. An example of a smaller archive is the Czech Social Science Data Archive (CSDA), while an example of a larger archive is the UK Data Service (UKDS). For more information, you can browse the archives' websites, or refer to 'DMEG - Chapter 7 Discover' (CESSDA Training Team, 2017 - 2022).

Other differences can be due to the position of an archive. An archive embedded in or connected to a (higher education) institute can have a different workflow and focus than independent archives. The services provided by an archive can also differ based on its characteristics, as well as due to the national context (e.g., legislation and mandates). Different archives can also have different designated communities, or audiences. Some archives focus exclusively on certain disciplines or domains, whereas other archives are more general in their focus.

What does a social science data archive do?

The actual tasks of a social science data archive can be very broad and extensive. They vary based on the characteristics of an archive and its context, and can be developed and updated.

An archive acquires data and documentation from the data producers, checks the quality of both, prepares them for sharing with users, and makes them available. In countries where the archiving of data from publicly funded research projects is not mandatory, an archive can actively approach data owners and ask them to provide their data for archiving. An archive also stores data and documentation, maintains and updates databases and file formats, and provides services to its users. Archives also provide training for data users (researchers, students, data journalists), data producers, and the archiving community, e.g. the employees of other social science data archives.

What are the core job titles and what do they do?

There are typically several core roles in data archives, such as data manager, data curator, policy advisor, and information system engineer. The type and naming of roles and positions, the number of people holding them, and the responsibilities assigned to them may vary. However, they always relate to the functions that a data archive must provide according to the Open Archival Information System (OAIS) reference model (find out more about OAIS in the section What is the overall process of archiving from beginning to end?).

  • Curation of data

Data curation is one of the main activities of data archives. It is a chain of processes necessary for the long-term preservation of research data. Data curation means that datasets that are ingested, stored and shared need to be examined for consistency, authenticity, integrity, long-term quality and relevance over time. Implicit in data curation are the tasks of data preservation and data management (Palmer et al. 2013), both of which are broad terms encompassing many different tasks. Depending on the division of roles and responsibilities in an archive, these tasks can be executed by one person or a large team.
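One small part of examining integrity over time can be illustrated with a checksum comparison (often called a fixity check). The following is only a minimal sketch: the choice of SHA-256 and the function names `sha256_of` and `verify_fixity` are assumptions made for illustration, not a procedure prescribed by any particular archive.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large data files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, expected_digest: str) -> bool:
    """Compare the current digest of a file with the digest recorded at ingest.

    `expected_digest` is a hypothetical value that an archive would have
    stored alongside the dataset when it was first deposited.
    """
    return sha256_of(path) == expected_digest.strip().lower()
```

If the digest recorded at ingest no longer matches, the file has changed since deposit and needs to be investigated. A matching digest only shows that the bytes are unchanged; it says nothing about consistency, authenticity or relevance, which still require human review.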

  • Persistent Identifiers (PIDs) to link information

A persistent identifier (PID) is a long-lasting reference to a document, file, web page, author, organisation or other object. While URLs can change or become unavailable, PIDs are meant to reliably point to a digital resource. A common PID used for datasets is a DOI (Digital Object Identifier). DOIs can be registered through DataCite, but an archive will first have to become a DataCite member. Some CESSDA archives register DOIs through DaRA, a service offered by GESIS in collaboration with DataCite. Other PIDs, such as a Handle, can also be used. For more information on persistent identifiers, see the Persistent Identifier Guide and the section What is the overall process of archiving from beginning to end?

CESSDA ERIC has had a PID policy since 2019, which was updated in 2022. The policy is intended to support locating, discovering, referencing, identifying and citing CESSDA Service Providers’ data holdings (Hausstein & Horton, 2022).
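To illustrate how a DOI acts as a stable pointer, the sketch below turns a DOI written in several common forms into a link through the public resolver at https://doi.org/. It is a minimal example under stated assumptions: the function names `normalize_doi` and `doi_to_url` are hypothetical, the pattern is a deliberately loose syntax check, and the DOI in the usage note is a made-up placeholder rather than a registered identifier.

```python
import re
from urllib.parse import quote

# Loose syntax check (an assumption for illustration): "10.", a registrant
# code of 4 to 9 digits, a slash, then a non-empty suffix without whitespace.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes that people commonly put in front of a bare DOI.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Strip a known prefix and return the bare DOI, or raise ValueError."""
    doi = raw.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a syntactically valid DOI: {raw!r}")
    return doi


def doi_to_url(raw: str) -> str:
    """Build a resolver link; characters unsafe in URLs are percent-encoded."""
    return "https://doi.org/" + quote(normalize_doi(raw), safe="/")
```

For example, `doi_to_url("doi:10.1234/example-dataset")` returns `https://doi.org/10.1234/example-dataset`. The check is purely syntactic: whether a DOI is actually registered and where it points is decided by the resolver, which is what lets the link stay the same when the dataset's landing page moves.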

  • Making data available

Another core purpose of an archive is to make data available (e.g., for reuse). Data can be made available via an online data catalogue or specialized online software, such as Dataverse. Find out more about making data available in the section How does an archive make data available?

 

European diversity:

Overview of the archive positions present in some CESSDA archives:

Archive name: DANS
Positions:
  • Application manager
  • Information scientist / information systems engineer
  • Software developer
  • Data manager
  • Head IT
  • Head data archive
  • Policy officer
  • Programme/project manager
  • Project acquisition officer
  • Communication officer
  • Privacy Officer

Archive name: CSDA
Positions:
  • Head of data archive
  • Data Acquisition Administrator
  • Archival System Administrator
  • Administrator of user services

Archive name: NSD
Positions:
  • Data manager
  • Software Developer
  • Privacy Officer

Archive name: AUSSDA
Positions:
  • Data curator
  • DevOps Engineer
  • Senior Research Associate
  • Project Assistant

Find out more about your archive

Here are some questions that you can ask yourself to learn more about your archive:

  • To what institution is your data archive affiliated?
  • Is your archive rather small or large?
  • What kind of training activities does your archive offer?
  • What are the core positions in your archive? What are the responsibilities of these positions?
  • How does your archive make data available to its users?