1.4 What is data acquisition?

Acquiring data from producers

The image shows a data archivist who receives several different digital objects (such as documentation and a data file) that she will need to review, archive, and publish.

Data acquisition is the process of obtaining data from data producers, who in this context are referred to as data depositors.

In some countries, research funding organisations require data sharing, i.e. data producers are obliged to make their data available. They usually do so in cooperation with data archives. Data acquisition is thus initiated by the producers, and archives act as “passive” ingestors of data.

In other countries, where data producers are not obliged to share their data, archives must actively search for potential depositors and persuade them to store their data in the repository. Archives usually contact potential depositors and inform them about the benefits of depositing data in the archive (see the section Why is archiving important?). A deposit agreement specifying the requirements for storage of and access to the data is then negotiated, and the data are ingested by the archive. For regular depositors, such as research institutions or universities, the acquisition process can be formalised through long-term agreements that set out the principles of collaboration.

Once the data have been identified and the deposit agreed, data ingestion begins (you can find more information on the pre-ingest process in Chapter 3 and on ingest and curation in Chapter 4). When the data are received from the depositor, the archive inspects the datasets and documentation (metadata). If the data do not meet the archive’s quality requirements, or their content falls outside the archive’s scope, the data may be rejected, corrected, or resubmitted following feedback, depending on the archive’s policies (such as its data collection policy). All important steps of the acquisition should be evaluated and documented in the archive’s internal records.

Archives can set up tools and workflows to support self-archiving by researchers. In principle, self-archiving is a process in which researchers publish their data themselves using online tools. The level of curation applied in these “self-archiving services” differs significantly between archives but, put simply, the depositor plays a more active role in the process. Two examples are ReShare, run by the UK Data Service, and SowiDataNet|datorium, hosted by GESIS. Both organisations offer their expertise, archiving tools, procedures, and curation services to researchers. See also the section What are the main tools used by archives?

Find out more about your archive

Here are some questions that you can ask yourself to learn more about your archive:

What requirements or mandates does your country/institute have regarding data sharing and archiving?
Does your archive have a formal acquisition policy?
Does your archive have restrictions regarding the data producers that are allowed to deposit data (e.g. only institutions are allowed to deposit data, not individual researchers)?
Which of your colleagues is responsible for acquisitions? How do they do this?
Does your archive have a self-deposit tool?

Expert tips

Much relevant data is produced and used not only in the “core” social sciences (e.g. sociology, psychology) but also in related research areas (e.g. health sciences, education research), so it can be worthwhile to include these areas in your acquisition policy.

It is useful to have different acquisition strategies for new and regular depositors, as the UK Data Service does, for example.
