1.2 What does an archive look like and what does it do?

What does a social science data archive look like?

Image: several colleagues in an office, discussing at a computer. Archives nowadays mostly work with digital assets.

An archive:

  • has a physical location,
  • is often related to an institution of higher education (university) or other research institution,
  • has a director and employees,
  • provides research data to a variety of users via the Internet.

Social science data archives differ considerably between European countries. They can vary, for example, in size, designated community, position in the wider context, and services performed.

Size can be measured by the number of employees, the number of archived datasets, or the number of datasets available in English. An example of a smaller archive is the Czech Social Science Data Archive (CSDA), while an example of a larger archive is the UK Data Service (UKDS). For more information, you can browse the archives' websites, or refer to 'DMEG - Chapter 7 Discover' (CESSDA Training Team, 2017-2022).

Other differences can be due to the position of an archive. An archive embedded in or connected to a (higher education) institution can have a different workflow and focus than an independent archive. The services an archive provides can also differ based on its characteristics and on the national context (e.g., legislation and mandates). Different archives can also have different designated communities, or audiences. Some archives focus exclusively on certain disciplines or domains, whereas others are more general in their focus.

What does a social science data archive do?

The actual tasks of a social science data archive can be very broad and extensive. They vary based on the characteristics of an archive and its context, and can be developed and updated based on new insights.

An archive acquires data and documentation from data producers, checks the quality of the documentation and data, prepares them for sharing with users and makes them available. In some countries where the archiving of data from publicly funded research projects is not mandatory, an archive actively approaches data owners and asks them to provide their data for archiving. An archive also stores data and documentation, maintains and updates databases and file formats, and provides services to its users. Archives also provide training for data users (researchers, students, data journalists), data producers and the archiving community, e.g. the employees of other social science data archives.

What are the core job titles and what do they do?

There are typically several core roles in a data archive, such as data manager, data curator, policy advisor, and information systems engineer. The type and naming of roles and positions, the number of people holding them, and the responsibilities assigned to them may vary. However, they always relate to the following functions that a data archive must provide according to the Open Archival Information System (OAIS) reference model (find out more about OAIS in the section What is the overall process of archiving from beginning to end?):

  • Curation of data

Data curation is one of the main activities of data archives. It is a chain of processes that are necessary for the long-term preservation of research data. Data curation means that datasets that are ingested, stored and shared need to be examined for consistency, authenticity, integrity, long-term quality and relevance over time. Implicit in data curation are the tasks of data preservation and data management (Palmer et al. 2013), both of which are similarly broad terms covering many different tasks. Depending on the division of roles and responsibilities in an archive, these tasks can be carried out by one person or by a large team.
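One way to make the integrity check mentioned above concrete is a fixity check: a checksum is recorded when a file is ingested and recomputed later to confirm that the file has not changed. The following is a minimal sketch, assuming SHA-256 as the checksum algorithm; the function names sha256_of and is_unchanged are hypothetical and do not refer to any specific archive's tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so that large data files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: Path, recorded_checksum: str) -> bool:
    """Compare the current checksum with the one recorded at ingest (hypothetical helper)."""
    return sha256_of(path) == recorded_checksum
```

In such a setup the recorded checksum would be stored alongside the dataset's documentation, and a mismatch would signal that the stored file needs to be inspected or restored from a copy.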

  • Persistent Identifiers (PIDs) to link information

A persistent identifier (PID) is a long-lasting reference to a document, file, web page, author, organisation or other object. While URLs can change or become unavailable, PIDs are meant to point reliably to a digital resource. More information on PIDs can be found in the section What is the overall process of archiving from beginning to end? A common PID for datasets is the DOI (Digital Object Identifier). An archive can register DOIs through DataCite (DataCite, n.d. [Accessed July 30, 2022b]) once it has become a member. Some CESSDA archives register DOIs through da|ra, a service offered by GESIS in collaboration with DataCite (da|ra, n.d.). Other PIDs, such as Handle, can also be used. For more information on persistent identifiers, see the Persistent Identifier Guide (Persistent Identifier Wijzer n.d.).
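To illustrate how a DOI points to a resource, the sketch below turns a DOI into a link through the public resolver at https://doi.org/. It is a minimal example under stated assumptions: the pattern is a loose check for the common "10.registrant/suffix" shape rather than a full validation, the function name doi_to_url is hypothetical, and the DOI 10.1234/example-dataset is made up for illustration.

```python
import re
from urllib.parse import quote

# Loose check for the common DOI shape: "10.", a numeric registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Turn a bare or 'doi:'-prefixed DOI into an https://doi.org/ link."""
    cleaned = doi.strip()
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    if not DOI_PATTERN.match(cleaned):
        raise ValueError(f"not a recognisable DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping "/" intact.
    return "https://doi.org/" + quote(cleaned, safe="/")


# Hypothetical DOI, used only for illustration.
print(doi_to_url("doi:10.1234/example-dataset"))
```

The link stays the same even if the archive later moves the dataset's landing page, because only the target registered for the DOI needs to be updated.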

  • Making data available

Another core purpose of an archive is to make data available (e.g., for reuse). Data can be made available via an online data catalogue or specialized online software, such as Nesstar, developed by the Norwegian Centre for Research Data (Nesstar n.d.), or Dataverse (Dataverse n.d.). Find out more about making data available in the section How does an archive make data available?

European diversity:

Overview of the archive positions present in some CESSDA archives 

Archive name: DANS
Positions:

  • Application manager
  • Information scientist / information systems engineer
  • Software developers
  • Data manager
  • Head IT
  • Head data archive
  • Policy officer
  • Programme/project manager
  • Project acquisition officer
  • Communication officer
  • Privacy officer

Archive name: CSDA
Positions:

  • Head of data archive
  • Data acquisition administrator
  • Archival system administrator
  • Administrator of user's services

Archive name: NSD
Positions:

  • Data managers
  • Software developers
  • Privacy officer

Archive name: AUSSDA
Positions:

  • Data curators
  • DevOps Engineer
  • Senior Research Associate
  • Project assistance

Find out more about your archive

Here are some questions that you can ask yourself to learn more about your own archive:

  • To what institution is your data archive affiliated?
  • Is your archive relatively small or large?
  • What kind of training activities does your archive offer?
  • What are the core positions in your archive? What are the responsibilities of these positions?
  • How does your archive make data available to its users?