1.11 What tools do archives use?

What systems, software and technology do archives use in archiving research data?

This image shows different elements of the data archiving process, such as curation, documentation and versioning, and suggests that there are tools you can use in each phase.

Different types of tools in different phases of the archiving process

When a data archive receives research data, the data are checked to ensure that they meet the requirements set by the archive and/or in the agreement between the archive and the data depositor.
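As a purely illustrative sketch, such a requirements check could be as simple as validating the deposited files against a whitelist of accepted formats. The accepted-format list and directory name below are invented for the example; real archives apply far richer checks (metadata completeness, readability, disclosure risk and so on).

```python
# Illustrative sketch only: an ingest-time check that deposited files use
# formats a (hypothetical) archive accepts. The whitelist and directory
# name are invented for this example.
from pathlib import Path

ACCEPTED_SUFFIXES = {".csv", ".tsv", ".txt", ".pdf", ".xml", ".sav"}

def check_deposit(deposit_dir: str) -> list[str]:
    """Return the files in deposit_dir that do not meet the format requirement."""
    rejected = []
    for path in Path(deposit_dir).rglob("*"):
        if path.is_file() and path.suffix.lower() not in ACCEPTED_SUFFIXES:
            rejected.append(str(path))
    return rejected

if __name__ == "__main__":
    for p in check_deposit("incoming_deposit"):  # placeholder directory
        print("Not an accepted format:", p)
```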

Some archives have solutions where data producers can upload their data directly into the archive, so-called "self-deposit" solutions. These are commonly an addition to the main archive. Here are some examples of deposit and self-deposit technologies from some archives (a sketch of a programmatic deposit follows the table):

Archive | Deposit | Self-Deposit | Software/Technology
DANS Data Stations + DataverseNL | DANS Data Stations | DANS Data Stations + DataverseNL | SWORD protocol; Dataverse; inhouse developed software (Vault)
NSD | Archiving portal | - | Inhouse developed software, Colectica
ADP | Archiving portal | Dataverse | -
CSDA | via direct communication with archiving staff | - | -
AUSSDA | AUSSDA Dataverse | AUSSDA Dataverse | Dataverse
Progedo | via secure FTP | none | DBnomics
FORS | Archiving portal | none | SWISSUbase
FSD | Archiving portal "Aila Data Service" | none | Inhouse developed software
ISSDA | via secure FTP | none | Dataverse
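Dataverse, which several of the archives above rely on, exposes HTTP interfaces for deposit: the SWORD protocol and a native API. The following is a minimal sketch, not a working recipe: the server URL, API token and collection alias are placeholders, and a real deposit requires more of the required citation metadata fields (author, contact, description, subject) than are shown here.

```python
# Sketch of a programmatic "self-deposit" via the Dataverse native API.
# SERVER, API_TOKEN and COLLECTION are placeholders, and the metadata is
# trimmed; a real server would reject it for missing required fields.
import json
import requests

SERVER = "https://dataverse.example.org"  # placeholder archive URL
API_TOKEN = "0000-0000-0000"              # placeholder token issued by the archive
COLLECTION = "demo"                       # placeholder collection alias

dataset = {
    "datasetVersion": {
        "metadataBlocks": {
            "citation": {
                "fields": [
                    {"typeName": "title", "multiple": False,
                     "typeClass": "primitive", "value": "Example survey data"},
                ],
            }
        }
    }
}

resp = requests.post(
    f"{SERVER}/api/dataverses/{COLLECTION}/datasets",
    headers={"X-Dataverse-key": API_TOKEN},
    data=json.dumps(dataset),
)
resp.raise_for_status()
print("Created dataset:", resp.json()["data"]["persistentId"])
```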

 

Curating, administration, documentation, upgrades, versioning 

The main function of a data archive is to curate research data so that they retain their value for the research community throughout long-term archiving. The archive also needs to make sure that the archived research data fulfil the FAIR principles.

The tools and software that archives use for these tasks vary. Some examples are presented here (a sketch of a typical format-migration script follows the table):

Archive | Administrative | Curation | Software/Technology
DANS Data Stations + DataverseNL | various | various software applications and scripts to export data to preferred file formats | Dataverse + inhouse developed software + scripting languages such as Python + applications such as Microsoft Office, Adobe Creative Cloud, SPSS, STATtransfer, IrfanView, ArcGIS, QGIS, MapInfo, FFmpeg, ...
CSDA | MS Office | Nesstar, SPSS, inhouse development | Nesstar, SPSS, inhouse development
AUSSDA | Dataverse, project management tool, ticket system | software and scripts to curate and export data to preferred file formats | Dataverse, R, Adobe, Microsoft Office, Stata, STATtransfer, SPSS, Python
Progedo | MS Office | Nesstar, Oxygen, R, MS Office | Nesstar, Oxygen, R, MS Office
FSD | inhouse ticketing and information management system (TIIPII) | TIIPII, SPSS, inhouse data curation tools | SPSS, Adobe Acrobat, Oxygen
ISSDA | Google/MS Office | - | Dataverse, MS Office, SPSS, OpenRefine
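Several archives describe curation as running scripts that export data to preferred long-term file formats. As a hedged illustration of what such a step can look like, assuming a hypothetical SPSS file survey.sav and the pyreadstat library, a minimal migration script might be:

```python
# Minimal format-migration sketch: convert an SPSS file to CSV and keep
# the variable labels as separate documentation. "survey.sav" is a
# hypothetical input file (pip install pyreadstat).
import pyreadstat

# Read the SPSS file into a pandas DataFrame plus a metadata object.
df, meta = pyreadstat.read_sav("survey.sav")

# Export the data values to CSV, a widely supported preservation format.
df.to_csv("survey.csv", index=False)

# Preserve variable-level documentation alongside the data.
with open("survey_labels.txt", "w", encoding="utf-8") as out:
    for name, label in zip(meta.column_names, meta.column_labels):
        out.write(f"{name}\t{label}\n")
```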

 

Dissemination

To enhance the value of archived data, data should be made FAIR, i.e. research data should be findable, accessible, interoperable and reusable. Data archives therefore need tools that make archived data easy for students, researchers and others to find and search.

Data archives also need tools that provide access management for the data, saving time and reducing administrative work while ensuring the security of data that needs protection. This is especially important to avoid breaching agreements with data producers or regulations such as the GDPR. One of the main tools is Dataverse, which is used by several CESSDA service providers, including DANS and AUSSDA.
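For findability, Dataverse installations expose a public Search API. As an illustrative sketch (the server URL and query below are placeholders), datasets can be located programmatically like this:

```python
# Sketch: querying a Dataverse installation's Search API for datasets.
# The server URL and search term are placeholders.
import requests

SERVER = "https://dataverse.example.org"  # placeholder archive URL

resp = requests.get(
    f"{SERVER}/api/search",
    params={"q": "election survey", "type": "dataset", "per_page": 10},
)
resp.raise_for_status()

for item in resp.json()["data"]["items"]:
    # Each hit carries a human-readable name and a persistent identifier.
    print(item["name"], "-", item.get("global_id", ""))
```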

More information on tools for the wider topics of FAIR data and research data management is covered in Chapter 5.

These tools alone are not enough to facilitate quality archiving and curation. Archives need experts to operate and update these tools and to perform other archival tasks that tools cannot, or do not yet, cover.

Find out more about your archive

Here are some questions you can ask yourself to learn more about your archive:

  • Which tool(s) does your archive use:

    • in the ingest phase?
    • when curating data?
    • for administration, covering both data acquisition and dissemination?
    • for documentation of data?
    • for handling upgrades and versioning of datasets?