6.3 The role of data archives in replication
Why use a data archive for replication data
Trusted repositories offer many of the same advantages for replication data as they do for research data. The two main arguments for using them are long-term preservation and expert advice and service during the deposit process. Placing replication materials in a trusted repository means that the data and other materials should remain usable even if someone reads the article decades later, although the guaranteed period can vary. For example, at GESIS, replication materials are guaranteed with bitstream preservation for a minimum of 25 years. The expert advice offered by trusted repositories can cover what materials to deposit, which file formats to use, how to meet legal and ethical obligations, and more (refer to the support offered by FORS, for example).
Workflows need to be adapted
From the archive or repository perspective, the workflow for replication materials differs from the usual process of archiving and documenting datasets in three ways. First, replication materials may not need to be preserved indefinitely, so preservation practices may differ. Second, in practical terms, replication materials (including the data) may require less work to curate, or at least may require different skills on the part of archivists, for example in checking syntax files against the data. Finally, it is usually desirable for replication materials to have fewer access restrictions: obtaining them should be easier than obtaining the comprehensive underlying data, in part because the partial nature of the data may reduce the likelihood that individual cases can be identified. In some cases, the replication data may not include any features (such as variables or cases) that would enable identification.
With this in mind, archives that accept replication materials should ensure that, at a minimum:
- there are sufficient materials for replicating findings (i.e., all relevant data and documentation);
- there is sufficient metadata regarding the materials;
- any partial data do not allow for the identification of individual participants unless appropriate consent has been obtained and documented; and
- a DOI is assigned to the materials so that they can be cited in the article.
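The minimum requirements listed above could be turned into a simple pre-acceptance check. The following is only a minimal sketch under stated assumptions: the `ReplicationDeposit` record, the `check_deposit` function, and the `REQUIRED_METADATA` fields are hypothetical names introduced for illustration and do not describe any archive's actual system. Note also that whether materials are truly sufficient for replication requires human judgement; the sketch only checks that data and documentation are present.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReplicationDeposit:
    """Hypothetical record describing one replication package."""
    data_files: list[str] = field(default_factory=list)
    documentation_files: list[str] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)
    contains_identifying_data: bool = False
    consent_documented: bool = False
    doi: str = ""


# Assumed minimal metadata fields; a real archive would define its own schema.
REQUIRED_METADATA = ("title", "creator", "description")


def check_deposit(deposit: ReplicationDeposit) -> list[str]:
    """Return the unmet minimum requirements (an empty list if all are met)."""
    problems = []
    # Presence of data and documentation is only a proxy for sufficiency.
    if not deposit.data_files or not deposit.documentation_files:
        problems.append("materials incomplete: data and documentation are both required")
    missing = [key for key in REQUIRED_METADATA if not deposit.metadata.get(key)]
    if missing:
        problems.append("metadata missing: " + ", ".join(missing))
    # Identifying data are acceptable only with documented consent.
    if deposit.contains_identifying_data and not deposit.consent_documented:
        problems.append("identifying data without documented consent")
    if not deposit.doi:
        problems.append("no DOI assigned")
    return problems


# Example with placeholder values (the file names and DOI are made up).
example = ReplicationDeposit(
    data_files=["analysis_subset.csv"],
    documentation_files=["README.txt", "analysis_syntax.do"],
    metadata={"title": "Replication data", "creator": "A. Author", "description": "Subset used in the article"},
    doi="10.1234/placeholder",
)
print(check_deposit(example) or "all minimum requirements met")
```

A check of this kind can only flag obvious gaps before a deposit is accepted; it does not replace an archivist's review of the materials.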
Challenges for archives in handling replication materials
The differences in nature and workflow between replication materials and standard, fully archived data can present particular challenges for archives that aim to integrate replication materials into their regular archival systems. Because the two types often should not be treated in the same manner in terms of preservation, curation, and access, archives that try to use a single system for everything may run into difficulties.
For example, full curation of replication materials and data may not be desirable or feasible when resources are limited. Given the extensive effort needed for full curation, many archives choose to invest their resources in larger datasets with more reuse potential. In addition, restricted access, which can be the default condition for full datasets held in archives, may be unnecessarily burdensome for those who wish to obtain the materials and conduct replications. Maintaining a single system with distinct access conditions for partial and full data may also be technically complicated.
For this reason, some data archives and service providers (e.g., GESIS and FORS within CESSDA, as well as ICPSR) have opted to establish completely separate systems for replication materials, apart from their regular archival solutions.
While there are challenges associated with the handling of replication materials, there are clear benefits for researchers and data archives alike.