5.4 Trustworthy data archives
As mentioned in Section 5.1, some important aspects of data management are not covered by the FAIR guiding principles. The development of the FAIR guiding principles led to a definition of good practice for digital objects (making them findable, accessible, interoperable, and reusable), but this guidance does not address the long-term sustainability of these objects. Yet even perfectly FAIR data will have very limited impact if it does not remain available for reuse over time. Clearly, the scientific ecosystem needs more than FAIR alone to arrive at a stable system optimised for data management and reuse. The 2019 State of Open Data report (Digital Science et al. 2019) found that “the biggest barrier to research data sharing and reuse seems to be a matter of trust”. Researchers depositing their data in an archive need to be able to trust that their data will be cared for and that access to it will remain stable. To foster that trust, data archives need to convince researchers that 1) they and their data will receive satisfactory support, curation, and preservation, and 2) the repository's services will remain stable and consistent over time. By providing transparent and verifiable evidence, archives can clearly communicate their trustworthiness, and stakeholders can confidently place their trust in the archive.
5.4.1 Trustworthy qualities in data archives
The term Trustworthy Digital Repository (TDR) is often used to describe repositories (and archives) that clearly demonstrate their trustworthiness. Other stakeholders can trust these archives to perform good data curation and long-term digital preservation of their data holdings. For TDRs, providing reliable, long-term access to their data holdings for their designated community is typically an explicit part of their mission.
There are different ways for an archive to demonstrate its trustworthiness. An archive can be considered trustworthy when it demonstrates trustworthy qualities in a transparent manner. To guide archives in this process, various tools and (community) standards have been developed that can be used to capture all relevant information accurately. An archive can carry out a self-assessment using such a tool, but formal audit and certification carry more weight and are more often encouraged.
Different trust standards have been developed worldwide. The European Framework for Audit and Certification of Digital Repositories recognises three levels of certification of increasing stringency: 1) basic or core certification, 2) extended certification, and 3) formal certification (European Framework of Audit and Certification n.d.). The certification standard prescribed by CESSDA for its Service Providers is the CoreTrustSeal (CoreTrustSeal, 2017), which is the most commonly used standard in Europe. For other certification standards, see Section 1.5 of Chapter 1.
CoreTrustSeal is an international, community-based, non-governmental, and non-profit organisation promoting sustainable and trustworthy data infrastructures.
To obtain CoreTrustSeal certification, an archive must complete a self-assessment against the Requirements defined by the CoreTrustSeal Board. This self-assessment is then submitted and reviewed by qualified members of the Assembly of Reviewers, who are experts on the topic. After a positive evaluation, possibly following some rounds of improvements, the archive obtains the CoreTrustSeal certificate. The archive can then display the CoreTrustSeal logo on its website and platform to signal clearly to stakeholders that it is a certified Trustworthy Digital Repository. The certificate is valid for three years; after that, the assessment has to be repeated to keep the certified status, which helps ensure that the archive's quality is maintained over time. The final assessment reports of all certified archives are also publicly available on the CoreTrustSeal website, giving a transparent overview of all certified repositories and archives worldwide.
The CoreTrustSeal Requirements describe the characteristics of a TDR (CoreTrustSeal Standards and Certification Board 2022). After some initial items on background information, an archive must assess itself on topics such as organisational infrastructure, ethics, data integrity, workflows, and security. The Requirements are reviewed and updated every three years through a community feedback process, so that they continue to reflect the repository community and the qualities a trustworthy repository needs as accurately as possible.
5.4.2 The TRUST principles
In 2020, the TRUST principles for digital repositories were created (Lin et al. 2020) as a set of high-level principles to facilitate discussion among stakeholders and to guide repositories on the concept of trustworthiness.
The TRUST principles for digital repositories:
- Transparency: to be transparent about specific repository services and data holdings that are verifiable by publicly accessible evidence. In particular, the following information should be easily findable for users:
  - mission statement
  - scope of the repository
  - minimum digital preservation timeframe for the data holdings
  - pertinent additional features or services (e.g., stewarding sensitive data)
- Responsibility: to be responsible for ensuring the authenticity and integrity of data holdings and for the reliability and persistence of its service. This is demonstrated by:
  - adhering to the metadata and curation standards of its designated community, along with providing stewardship of the data holdings (e.g., quality control, documentation, technical validation, persistence)
  - providing data services (e.g., a portal, machine interface, ability to download data, server-side processing)
  - managing the intellectual property rights of data producers, the protection of sensitive information resources, and the security of the system and its content
- User focus: to ensure that the data management norms and expectations of target user communities are met. This is demonstrated by:
  - implementing relevant data metrics and making these available to users
  - providing (or contributing to) community catalogues to facilitate data discovery
  - monitoring and identifying evolving community expectations and responding as required to meet these changing needs
- Sustainability: to sustain services and preserve data holdings for the long term. This is demonstrated by:
  - planning sufficiently for risk mitigation, business continuity, disaster recovery, and succession
  - securing funding to enable ongoing usage and to maintain the desirable properties of the data resources that the repository has been entrusted with preserving and disseminating
  - providing governance for necessary long-term preservation of data so that data resources remain discoverable, accessible, and usable in the future
- Technology: to provide infrastructure and capabilities to support secure, persistent, and reliable services. The fitness of technological capabilities (software, hardware, technical services) can be demonstrated by:
  - implementing relevant and appropriate standards, tools, and technologies for data management
  - having plans and mechanisms in place to prevent, detect, and respond to cyber or physical security threats
Source: Lin et al. 2020
Unlike the FAIR principles, the TRUST principles were reverse-engineered from already existing community standards, such as certification schemes. The TRUST principles were not created to replace any of the community standards and best practices already in place for demonstrating trustworthiness. Rather, they provide a unified set of high-level guidelines to better support, implement, assess, and discuss the concept of Trustworthy Digital Repositories, and to highlight the importance of TDRs in the scientific ecosystem. The principles are designed as a mnemonic device to remind stakeholders of what is needed to create an environment in which data is well cared for and stewarded. They serve as guidelines for archives on what to offer and how to communicate it, as criteria for depositors looking for a suitable place for their data, and as a call to other stakeholders, such as funders and governing bodies, for funding and development.
Taken together, the TRUST principles communicate the following. Transparently providing information on an archive and its services allows users to determine more easily whether the right services are available for the data they want to deposit. By demonstrating its responsibility for stewardship, an archive gives users and depositors confidence that its data holdings will receive adequate curation and will remain accessible over time. A focus on the designated community shows users that an archive can provide detailed care for their type of data and will continuously implement developments in community standards and requirements in its services, supporting reuse and interoperability in the long term.
Users also need to be convinced that access to their data will continue uninterrupted after deposit, so that they know their data will not disappear or need to be relocated over time. Beyond policies, soft services, and community focus, the technical side of providing these services must also be in order for users to feel confident storing their data safely. Taken together, this allows users to make informed decisions about where to deposit their data with confidence.
TRUST and OAIS
Much like the TRUST principles, the OAIS reference model includes recommendations on providing long-term preservation of data (‘Reference Model for an Open Archival Information System (OAIS)’ 2012). This overlap shows that the much newer TRUST principles are not new inventions but are based on archival processes that many archives have long adhered to. However, the OAIS recommendations on long-term preservation were never intended as a guarantee of trustworthiness, and additional concepts such as governance, resources, and security are absent from them. After the FAIR principles proved to be a valuable way to communicate the importance of data management across the community, the TRUST principles were created to communicate the essential elements of trustworthiness in the same way, with long-term preservation being one of those elements.
Depending on the maturity of your archive, obtaining certification can be a challenging and lengthy process. All departments of the archive must work together to present the necessary information transparently, and some processes, workflows, or infrastructures may need to be changed to align with trustworthy practices. To help with this process, different types of support are sometimes offered, such as financial support to pay the CoreTrustSeal administrative fee, or expert support to help complete the assessment and learn more about trustworthy practices.
Within CESSDA, a dedicated Trust Group supports all CESSDA Service Providers and Observers in obtaining CoreTrustSeal certification. The group consists of representatives from Service Providers with expertise on the topic of trust, who navigate the continuously evolving landscape surrounding trustworthiness and translate it into relevant support for CESSDA members (Dolinar et al. 2022).
Each CESSDA Service Provider has a dedicated ‘trust contact’ with whom the Trust Group is in close contact, and individual support plans are crafted to fit each member's specific needs. On a broader scale, the Trust Group organises workshops and meetings to discuss relevant topics and common challenges. It also develops shared materials for members to use when pursuing certification (L’Hours et al. 2022a). In addition, the group has produced an overview of support approaches, which has been used to design the support programmes run by the FAIRsFAIR, SSHOC, and EOSC-Nordic projects (L’Hours et al. 2020). In these programmes, broad groups of archives were supported in obtaining certification or in becoming more trustworthy in general.
Archives that use Dataverse can also use the guide created by the Dataverse Project community to gather shared information on software and infrastructure for their CoreTrustSeal application (Dataverse Software Guide for CTS Certification 2021).
Examples of displaying trustworthiness from CESSDA Service Providers
This table gives some examples of how different CESSDA members display their trustworthiness in an easily accessible and transparent way.
Display of trustworthiness
The Service Map gives an overview of key information on topics such as licences, data access, and security.
The Provenance and Data Processing page on DANS’ website clearly presents information on the curation of its data holdings.
This page on Metadata Records describes the metadata standard FSD uses. All of FSD’s metadata records are openly available and harvestable.
TÁRKI’s mission statement, along with other important information on data security and management, is clearly displayed on its website. As part of its CoreTrustSeal application, TÁRKI is also working on making all of this information available in English.
5.4.3 FAIR-Enabling Trustworthy Digital Repositories
To conclude this chapter on FAIR and TRUST, it is important to highlight the efforts to bring these two topics closer together. As detailed throughout this chapter, it is important for archives both to be FAIR-enabling and to demonstrate their trustworthiness. An archive that succeeds in both is considered a FAIR-enabling Trustworthy Digital Repository. Researchers are strongly recommended (for example, by funding bodies, data professionals, archives, and open science advocates) to deposit their data in this kind of organisation, to ensure their data is made and kept FAIR for the long term. For trustworthiness, certification is a clear way to communicate adherence to the right qualities. For FAIR-enabling qualities, no formal standards exist yet, but work is under way to bring the two fields closer together and to develop standards for showcasing FAIR-enabling qualities clearly and transparently.
The CoreTrustSeal+FAIRenabling capability-maturity model (L’Hours et al. 2022b) is a self-assessment tool that lets archives evaluate themselves not only against the CoreTrustSeal Requirements but also on accompanying FAIR-enabling qualities. In the future, this tool might be incorporated into the CoreTrustSeal standard to facilitate certification of FAIR-enabling TDRs. Efforts are also being made to unite FAIR-enabling TDRs in a network, where trustworthy archives can share experiences and resources and jointly determine standards for the community. Given how rapidly all the topics discussed in this chapter are developing, support in the form of a network can help keep archives afloat and draw recognition to the important position they hold in the data ecosystem.
Increase your understanding
Find out more about your archive
Here are some questions that you can ask yourself to learn more about your own archive:
- What is the designated community of your archive? How do you specifically cater to it?
- Using the CoreTrustSeal list of certified repositories, can you find your archive? What about other CESSDA Service Providers that are certified?
- Has your archive received any kind of support to become more FAIR-enabling or trustworthy?
According to archives that participated in the FAIRsFAIR repository support programme, the self-assessment document that you submit to CoreTrustSeal is an excellent information tool for new archive employees, as it bundles all the most vital information regarding the archive. So if you are new and would like to get up to speed with your archive, have a look at the latest certification submission! (FAIRsFAIR, 2022)
Find out more about the CESSDA Trust Group on their website.