6.2 About replication materials and related data

Peer-reviewed journals increasingly require authors to share the data and materials on which the findings in their publications are based (Corti et al., 2020). However, the journals themselves usually do not provide the infrastructure for this, leaving it to the authors to deposit their replication materials in a repository. Because data archives have strong expertise in data preservation and documentation, they are well placed to host and curate these materials.

What are replication materials?

From an archival perspective, replication can be framed around a key question: what do researchers need if they want to replicate a study for verifiability, robustness, repeatability, or generalizability (Freese & Peterson, 2017)? To make any of these types of replication possible, authors should provide the materials that allow others to follow or reproduce the analyses reported in the publication. These include well-documented data and analysis code, as well as a description of the analytic decisions taken. The materials should also state which data, and which version of the data, were used, and which software (and software version) was used. Finally, the information on the data should cover the context and mode(s) of data collection, the sampling frame, and the population.
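For analyses run in Python, part of this version information can be captured automatically. The sketch below is one possible approach, not a prescribed archival format: it records the interpreter version, the installed versions of a list of packages, and a SHA-256 checksum of each data file, so that the exact data version used can later be verified. The function names and the package list are hypothetical and should be adapted to the packages the analysis actually imports.

```python
import hashlib
import json
import platform
from importlib import metadata


def file_checksum(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def environment_record(packages, data_files=()):
    """Hypothetical helper: collect software versions and data checksums.

    Packages that are not installed are recorded as None instead of
    raising an error, so the record still shows what was looked for.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {
        "python": platform.python_version(),
        "packages": versions,
        # One checksum per data file identifies the exact file version used.
        "data_files": {path: file_checksum(path) for path in data_files},
    }


if __name__ == "__main__":
    # Assumed package list; replace with the packages used in the analysis.
    record = environment_record(["numpy", "pandas"])
    print(json.dumps(record, indent=2))
```

The resulting JSON can be saved alongside the analysis code in the replication package; it complements, rather than replaces, a written description of the data and the analytic decisions.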

Replication materials and archives

How do they differ from “typical” archived datasets?

While replication materials are ultimately no more (or less) than a collection of related digital files, they differ in important ways from the data and documentation usually deposited in archives. Most notably, replication materials concern the analyses presented in a particular publication, usually a single article. Consequently, the associated data are typically partial: they include only a subset of a larger dataset (e.g., a subset of variables or cases). The documentation also differs in nature: replicating the results of a publication requires only the information that relates to the subset of data actually used. By contrast, a comprehensive dataset deposit usually includes the full data collected for a defined research project, and its documentation and metadata describe the complete context of the work.