1.7 What is archived?
An archive can provide services for different kinds of data
The kinds of data that an archive stores depend on the archive’s policies. Typically, social science data archives store both qualitative and quantitative research data together with the relevant documentation; see Data in the Social Sciences (CESSDA Training Team 2017-2022).
Most commonly, archives store digitised survey data, i.e. datasets that contain matrices of respondents’ coded answers. Many CESSDA archives also archive administrative data. Qualitative data, such as transcriptions of interviews and focus groups, photographs, videos and newspaper texts, are generally stored less frequently. All data in archives are accompanied by documentation describing them. For more information about data in social science data archives, see DMEG - Chapter 1 - Plan (CESSDA Training Team 2017-2022).
The data files handled by an archive can be accessed by users via the online data catalogue or via specialised online software provided by the archive (e.g. Nesstar or Dataverse; see also the section What are the main tools used by archives?).
For informed and accurate use, data need to be described by source (e.g. historical data, voting results), by format (e.g. numerical, textual, still image) or by the period in which they emerged (e.g. social media data, big data).
For data archivists, it is crucial to think about long-term preservation formats and about how to handle personal data in compliance with current legislation, such as the General Data Protection Regulation (GDPR) (CESSDA Training Team 2017-2022), so that the data they offer remain available and reusable for their users over time.
Formats
Digital curators often face the challenge that data and documentation are not prepared in formats suitable for long-term preservation. The safest way to guarantee long-term access to usable data is to convert the data to standard formats that most software is capable of interpreting and that are suitable for data interchange and transformation. This typically means using open or standard formats (such as OpenDocument Format or ODF, ASCII, tab-delimited format, comma-separated values or XML) instead of proprietary ones.
Proprietary file formats are often not backwards compatible and therefore run the risk of becoming inaccessible when newer versions are released. Some proprietary formats (such as Microsoft Rich Text Format, Microsoft Excel and SPSS) are widely used and likely to remain accessible for a reasonable time. However, this time is not unlimited, and therefore even these widely used formats are not the best suited for long-term sustainability and accessibility; see DMEG - Chapter 6 Archive and Publish (CESSDA Training Team 2017-2022).
Even when data cannot be deposited in sustainable formats, data curators have a responsibility to ensure that the data remain usable in the long term. One way to meet this responsibility is to transform data and documentation into formats suitable for long-term preservation.
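As a rough illustration of the triage that precedes such a transformation, the following Python sketch sorts deposited files into those already in open or standard formats and those in proprietary formats that a curator may want to convert. The extension lists are assumptions derived only from the example formats named above (they are not an official CESSDA list), and classify_deposit is a hypothetical helper name.

```python
from pathlib import PurePath

# Assumed extension lists, based only on the example formats named in the text.
OPEN_EXTENSIONS = {".odt", ".ods", ".txt", ".tsv", ".tab", ".csv", ".xml"}
PROPRIETARY_EXTENSIONS = {".rtf", ".xls", ".xlsx", ".sav"}


def classify_deposit(filenames):
    """Group file names by what their extension suggests: an open format,
    a proprietary format, or something that needs manual review."""
    report = {"open": [], "proprietary": [], "unknown": []}
    for name in filenames:
        # Compare extensions case-insensitively (".SAV" and ".sav" are the same).
        suffix = PurePath(name).suffix.lower()
        if suffix in OPEN_EXTENSIONS:
            report["open"].append(name)
        elif suffix in PROPRIETARY_EXTENSIONS:
            report["proprietary"].append(name)
        else:
            report["unknown"].append(name)
    return report


if __name__ == "__main__":
    # Hypothetical deposit used only for demonstration.
    deposit = ["survey_2020.sav", "codebook.odt", "answers.csv", "notes.docx"]
    for category, files in classify_deposit(deposit).items():
        print(category, files)
```

A file extension is only a hint about the actual format, so files in the proprietary and unknown groups would still need to be inspected by a curator before any conversion.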
More information on formats is available in File formats and data conversion (CESSDA Training Team 2017-2022).
(Sensitive) personal data
Data producers can deposit ‘raw’ data (primary data that have not been transformed in any way), anonymised data, or both. In practice, data archives often receive and accept ‘raw’ data. If these include personal data, archivists usually work closely with data producers to decide on anonymisation, since only anonymised data (i.e. data that do not contain any information that may lead to the identification of a particular individual) can be shared with or distributed to secondary users. If datasets contain so much personal information that deleting all of it would severely limit the scientific value of the data, the data are usually left as they are and access to these datasets is restricted. This does not apply if data sharing is stated explicitly in a consent form, e.g. when respondents approved the sharing of their data.
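The decision logic described above can be sketched in Python as follows. This is a simplified reading of the paragraph, not an archive's actual procedure: the function name decide_access, its three yes/no parameters and the returned labels are all hypothetical, and real decisions are made case by case together with the data producer and in line with applicable legislation.

```python
def decide_access(has_personal_data, consent_covers_sharing, anonymisation_keeps_value):
    """Return a hypothetical access route for a deposited dataset.

    All three arguments are booleans that an archivist would determine
    beforehand; the returned labels are illustrative only.
    """
    if not has_personal_data:
        # Nothing identifies an individual, so the data can be distributed.
        return "share"
    if consent_covers_sharing:
        # Respondents explicitly approved the sharing of their data.
        return "share under consent"
    if anonymisation_keeps_value:
        # Identifying information can be removed without crippling the data.
        return "anonymise, then share"
    # Removing identifiers would severely limit scientific value.
    return "restrict access"


if __name__ == "__main__":
    # Hypothetical case: personal data, no sharing consent, anonymisation feasible.
    print(decide_access(True, False, True))
```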
See the section What are relevant legislations and legal questions in relation to data archiving? for more information on personal data and legislation.
Find out more about your archive
Here are some questions that you can ask yourself to learn more about your archive:
- What kinds of data does your archive generally receive?
- How can your archive's data be accessed by users? Do you have an interface or website? How can users search your data holdings?
- Does your archive advise data depositors on preferred file formats?
- Can you find example consent forms in the archive or on websites?
- Do you have a colleague responsible for the handling of sensitive data? What do they do?