Chapter 3: Pre-ingest: From early contact to data transfer


Chapter 3 of the guide focuses on what data archives do to ensure that incoming data meet the archive's collection criteria and quality requirements before the data are accepted in the archive for further curation and preservation.

After introducing the concept of pre-ingest and why it is important, the chapter is organised in four main parts following a timeline, starting with the outreach activities that precede or lead to data submission, and ending with rejection or acceptance of the deposit and the transition to data curation in the archive. The principles to follow in pre-ingest are complemented with examples of how data archives apply them in practice.

After finishing this chapter, you should be able to understand the rationale for the pre-ingest phase, and to define, evaluate and develop pre-ingest activities relevant and meaningful to your data archive.

OAIS framework

Many archives use the Open Archival Information System (OAIS) Reference Model as a conceptual framework for defining processes, procedures and actions necessary for the long-term preservation and accessibility of data. The OAIS framework starts with a data producer submitting a data and documentation package to the archive for curation (Ingest). However, preparations for the preservation of research data may start before that. Data and metadata can be reviewed, processed and complemented several times before they are accepted for curation and Ingest in the archive. This stage is often called Pre-ingest, inspired by the OAIS model, even though it is not formally part of it.

Why Pre-Ingest?

The activities in the Pre-ingest stage aim to make sure that data can be properly preserved, and thus reduce the workload of data curators later in the curation and preservation process.

Ideally, all researchers producing data of high value for the research community share them through a trusted digital data archive. Data submitted to an archive should be well described and structured, with proper and extensive documentation and in appropriate formats. In reality, the situation is often different. Researchers may be hesitant to share their data for a variety of reasons, such as lack of incentives, fear of misuse of data, or lack of the time and resources needed to prepare the data for sharing. As a result, data archives might find they invest a lot of time and effort in advocating for the benefits of sharing data. Even when researchers are willing to share their data, the data are not always complete, thoroughly documented according to standards and in formats suitable for preservation, so there is often a considerable amount of work to be done before data can be accepted into the archive.

Thus pre-ingest might be a way to effectively reduce the workload for data curators by tackling any quality, completeness and format issues in the data and metadata before the curation and preservation process in the archive starts. In addition, the pre-ingest phase might give an opportunity to build and strengthen relationships with the designated community and contribute to the data sharing culture and good Research Data Management (RDM) practices through personal contacts with researchers.

Pre-ingest timeline

In general, the Pre-ingest stage starts with the first contact with the researcher(s) producing data that can be deposited to the archive.

Depending on the size, level of maturity, needs and resources, a data archive may decide on the best timing and extent of support it offers to researchers. A broader approach may include outreach activities like advocating for benefits of data archiving, educating research communities in RDM practices and cooperating in data management planning, assuming these activities would have a positive effect on data deposits coming into the archive in the future. Another approach to timing and extent of researcher support is more focussed on incoming data, as data and documentation are reviewed, assessed and prepared to meet the requirements for acceptance for further curation in the archive.

Even though pre-ingest activities can be aligned around a timeline, it is important to acknowledge that some activities are not limited to one time-point only. 

The Pre-ingest phase is usually finalised when data are accepted into or rejected by the archive. At that point the data archive staff have enough information from the depositor to decide whether the data can be included in the collection of the data archive. After a data deposit agreement is signed, the archive can start the curation and preservation process and any other action on the basis of the contract.
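As an illustration only, the decision that closes the Pre-ingest phase could be sketched as follows. This is a minimal sketch under stated assumptions, not a description of how any particular archive works: the field names, the three possible outcomes and the rule that out-of-scope data are rejected while other issues are returned to the depositor are all hypothetical, and each archive defines its own criteria.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    # Hypothetical outcomes of a pre-ingest review.
    ACCEPT = "accept"
    RETURN_TO_DEPOSITOR = "return to depositor"
    REJECT = "reject"


@dataclass
class Deposit:
    # All field names are assumptions made for this example.
    title: str
    in_collection_scope: bool
    documentation_complete: bool
    formats_suitable: bool
    agreement_signed: bool = False


def open_issues(deposit: Deposit) -> list[str]:
    """List the issues the depositor would be asked to resolve."""
    issues = []
    if not deposit.documentation_complete:
        issues.append("documentation incomplete")
    if not deposit.formats_suitable:
        issues.append("formats unsuitable for preservation")
    return issues


def decide(deposit: Deposit) -> Decision:
    """Assumed rule: out-of-scope data are rejected, fixable issues go back."""
    if not deposit.in_collection_scope:
        return Decision.REJECT
    if open_issues(deposit):
        return Decision.RETURN_TO_DEPOSITOR
    return Decision.ACCEPT


def can_start_curation(deposit: Deposit) -> bool:
    """Curation starts only after acceptance and a signed deposit agreement."""
    return decide(deposit) is Decision.ACCEPT and deposit.agreement_signed


survey = Deposit(
    title="Example survey",
    in_collection_scope=True,
    documentation_complete=True,
    formats_suitable=False,
)
print(decide(survey).value)  # return to depositor
```

In this sketch a deposit can go back to the depositor several times before it is accepted, which mirrors the idea that data and metadata may be reviewed and complemented repeatedly during Pre-ingest.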

Pre-ingest timeline: graphical overview

The image presents the four elements of pre-ingest: Outreach and support, Data submission, Data appraisal and review, and Acceptance or Rejection.

Extensive and more diverse data sharing advocacy and support activities might be easier to implement in well-established, better-resourced archives with more staff. Smaller archives might have the potential to develop closer, more personalised relationships with research groups.

In some archives, Pre-ingest activities may be carried out by dedicated staff who work separately from the data curators handling the data after ingest. In that case it might be useful to have a formal transition from the Pre-ingest to the Ingest phase. In other archives, Pre-ingest and Ingest activities may be carried out by the same person. In that case there is little need to separate the two, and a formal transition between them in workflows might be less important.