3.1 Outreach and support
Regular contacts with designated communities are important to keep up with their needs. This includes both communication with data producers and potential data users.
One of the goals of Pre-Ingest is to ensure that the data archive accepts only data that meet the requirements of its Data collection and Acquisition policy and its quality requirements.
Outreach and support activities may vary depending on RDM practices, the data sharing culture, Open Access (OA) policies for research data, and the level of formal regulation regarding Data Management Plans (DMPs), which describe the technical, organisational and legal aspects of data handling.
Outreach activities may address the designated communities in general, for example by organising general RDM workshops, or may be targeted at a particular institution, research group, or individual researchers planning a data deposit.
Outreach activities may be more training oriented or more acquisition oriented, but usually serve both purposes. Basically, outreach and support serve two main goals: (i) to facilitate the inflow of data, and (ii) to make sure the data and metadata meet the quality requirements of the archive.
3.1.1 Identify and localise data
There are different channels and strategies for identifying and localising data of interest to the designated community of the data archive. It is important to consult the Data collection and Acquisition policy to decide what data and what channels might be most appropriate.
Systematic data inventories
Systematic inventories of data produced by professional and reputable research institutions in social sciences or certain research areas (sociology, psychology, political science etc.) might be used in order to identify and localise high quality data with a high potential of reuse.
The starting point could be a list of projects in an institution or a list of recent publications in a research area aiming to identify the data sets used by researchers.
Another possibility might be to look systematically at research funded by national research funders during the current or previous funding cycle, identifying projects in which production of important data is planned (for example, national or regional representative surveys), and contacting the researchers. However, if funders do not explicitly require that data be published, this approach may not prove very productive.
This approach might be interesting
- for newly established data archives, with services that are not yet well-known or acknowledged in the research community,
- for data archives with a low inflow of data.
Identify and localise data: why and how do we do it?
From 2014 to 2016, the Swedish National Data Service (SND) kept track of projects financed by major national research funders in the social sciences in a dedicated project database. Even though the funders did not require data publishing, SND sent these projects introductory information advocating data sharing and offered support with research data management and planning for data publication. The response varied: some useful contacts were established, but contacts with several hundred research projects produced only a few published data sets.
In a project with a similar focus in 2010-2011, UKDS targeted high-profile and well-funded projects at or near the start of their funding and tried to get good RDM in place early, in order to make the final data appropriate for archiving. A description of the project is available from the UK Data Archive (UK Data Archive n.d.).
Targeting specific data
There might be interest in the designated community for special data collections, enabling research on highly visible topics currently in the news (e.g. Covid-19 data now or data on migration-related aspects during the refugee crisis of 2015-2016 in Europe).
There are different ways of achieving this, but it is important to involve researchers in the process, as they might have good knowledge of data that are being produced and data that might be reused in further research. If it is an interdisciplinary effort (e.g., data collection for studying climate change might include both social sciences and climate data), it might be good to cooperate with relevant research infrastructures in other scientific domains.
There may be different reasons that lead to the need to rescue data. For example, research organisations can be reorganised or closed, funding for long-term projects can be terminated, or primary researchers might change jobs or retire. In such situations, data are at risk of getting lost because of the lack of resources for proper curation and preservation. Usually, there are regulations in place on how research material should be archived in such situations. However, institutional archiving practices seldom focus on the preservation of research data and even less on actively promoting the reuse of data.
It is very important to preserve unique, high-value data that are impossible to reproduce. Such rescue actions are often time-sensitive: the archive needs to work with the researchers or data producers before the project or institution has closed down and the people who can provide the context information needed to make the data reusable have moved on. A data archive therefore needs visibility and contacts in the research community so that it can reach out in time, should the need arise.
Rescuing data: Carmichael Watson Project
The case of the Carmichael Watson Project (2009-2013) at the Centre for Research Collections (CRC) illustrates how technological obsolescence and security issues led to an urgent need to capture and preserve a web resource rich in unique archival material. As a result of security issues with the underlying infrastructure, the resource was taken off the live web in 2018. Then, in 2020, the Virtual Machine (VM) RedHat5 technology on which it was hosted internally became obsolete, putting the resource at imminent risk of loss. This series of challenges highlights the special vulnerability of web-based technologies to rapid change and evolution.
In response to these challenges, the CRC has adopted a web archiving approach using open-source (OS) pywb-based tools (archiveweb.page and replayweb.page). The CRC aims to maintain access to a version of the web resource that is as authentic as possible, through the more sustainable formats generated by these tools. The implementation of this web archiving approach, supported by further CRC investment, has (i) mitigated the imminent risk of losing this valuable resource, and (ii) provided a precedent for maintaining access to these types of web-based materials without the need to resource the maintenance of outdated systems and infrastructure.
The CRC, in collaboration with freelance web archivist Anisa Hawes, has begun web archiving this resource. So far, around 4,000 unique URLs have been captured, but only a fraction of the full archival resource has been completed. A more comprehensive case study presenting the experience of the Carmichael Watson Project website is planned for completion by the end of 2021 or the beginning of 2022. A detailed description of the current situation is available from the University of Edinburgh (The University of Edinburgh n.d.).
Cooperation with journals
Preservation of data linked to publications encourages validation and reproducibility of research; a data archive might therefore consider targeting data linked to high-impact publications or to journals with a high impact factor.
In addition, more and more journals require data availability statements from authors and encourage data publication in repositories, so data archives should consider offering a preservation service for data linked to publications.
Cooperating with national journals by supporting them in developing open data policies and providing data archiving services can be especially rewarding in raising general awareness of the benefits of data archiving.
Cooperation with journals: why and how do we do it?
Implementing the RDA Research Data Policy Framework in Slovenian Scientific Journals
In 2019, ADP carried out a pilot project introducing the RDA guidelines for journal data policies at four national journals in different disciplines (Štebe et al. 2020). The project is described in detail at https://datascience.codata.org/articles/10.5334/dsj-2020-049/
The workflow for the preservation and publishing of data linked to publications in academic journals might have some special characteristics. For example, there might be some additional metadata elements required (linked data, linked publications), or some additional roles, like data replication specialist or data manager from a journal. Management of journal data is currently developing rapidly.
Following data requests
If data users frequently inquire about particular topics or specific data sets, it might make sense to find out whether the data can be acquired and preserved in the archive for future use. If this is not possible, because of the sensitive character of the data or for any other reason, or if the researchers are not interested, there is not much to be done. If, however, there is reason to expect that the researchers of the original study are simply unaware of the services the data archive can offer and would be willing to invest time and effort in publishing their data, following up on data user requests might be beneficial.
3.1.2 Advocate for data sharing
Even though data sharing culture is developing along the OA policy framework, there still might be a need to advocate for data sharing to facilitate higher inflow into a data archive.
Besides advocating the benefits of sharing research data on the archive webpages, participation in events like national conferences and contacting research institutions might be important too.
Arguments should be adapted to the national and local circumstances. When advocating for data sharing, a data archive might consider focussing on the following aspects:
- merits of data sharing, including higher visibility for own research and increased recognition through higher citation rates;
- data from publicly funded research should be treated as an asset that should be published and reused, providing the general public with more value for the money that research funders have invested;
- opportunities for long-term preservation of own research data, which might be important for senior researchers considering retirement;
- opportunity to inspire further research and hands-on learning for students interested in the reuse of data;
- contribution to quality, transparency and accountability of research;
- compliance with funders’, institutional, or journals’ data policies;
- addressing researchers’ fear of misinterpretation, misuse, and losing control over data;
- addressing practical constraints (lack of time, resources) that researchers perceive as hindrances to proper data management.
It might be beneficial to study the arguments against data sharing that are typical of the local social sciences research community. Knowing them helps in the dialogue with researchers about the issues that hinder data sharing, about possible solutions, and about the support and services the data archive can offer.
Advocate for data sharing: why and how do we do it?
Open Data Excuse Bingo in ADP
According to the experience of ADP in Slovenia, the arguments against data sharing are usually quite typical. Therefore, it is a good idea to study them before contacting researchers, and to be ready to discuss them. You can use the list of typical arguments to create an "Open Data Excuse Bingo" (e.g., http://data.dev8d.org/devbingo/ - University of Southampton n.d.) and cross out each argument from the list that you hear during the discussion. ADP often used it when working with researchers in a group.
With a growing variety of repositories for digital content, some research communities may have developed data sharing routines through generalist rather than domain repositories. Compared to domain repositories, generalist repositories usually require less time and effort before data can be published. It might be worth advocating for the benefits of using a trusted social sciences domain repository for data publishing. Its data and documentation quality controls and its curation and preservation processes help ensure that published data are findable and reusable. They may also reduce the risk of data being misused or misinterpreted, and thus the risk to researchers’ reputation.
Even if researchers are generally positive about data sharing, there still might be issues to negotiate regarding publishing a particular data set or study, for example, access levels, embargos, data delivery formats and documentation.
Some issues are difficult to predict before data are submitted, but some are reasonable to negotiate before data submission. Issues not resolved in negotiation can be addressed later in the data review and appraisal phase.
Legal and ethical aspects
Data archives have different policies and technical capacities regarding which data can be accepted and distributed. There might be cases when a data archive cannot accept data for distribution for secondary use because of ethical or legal aspects. Therefore, legal and ethical grounds for preservation and publishing research data should be discussed with the researchers depositing data.
Additional scrutiny and discussions with data producers are advisable if there is reason to believe that
- data might include some copyrighted content that the archive does not have the right to publish (for example, mass media content might be subject to copyright; social media platforms may have rules on how information published there can be re-published),
- data might include (sensitive) personal data.
Data access levels
In addition to directly downloadable data with unrestricted access, there might be several levels of restricted access. Data access levels should be defined in the Acquisition policy. There are, however, cases when it is important to negotiate the most appropriate level of access for the particular data set ensuring that data are as open as possible, and as protected as necessary. Offering support and consulting could be important here.
Researchers might want to embargo access to the whole dataset or part of it. There might be different reasons for that, and it is important to discuss them.
- In some cases, embargos on the dataset or part of it can be legally required, and researchers and data archives are obliged to apply them.
- In other cases, researchers would like to protect data for publishing reasons. It might then be reasonable to persuade them to choose a data access level that gives them some control, for example to make sure that potential data users do not intend to duplicate the ongoing research.
Some CESSDA SPs have a maximum time limit for an embargo: for example, data depositors in ADP are offered a maximum 6-month embargo, in DANS it is 2 years and in FORS it should not exceed 3 years.
Data archives should in general aim to make data accessible as soon as possible and describe their policy on embargos in the Acquisition policy.
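The embargo limits quoted above can be expressed as a simple check that an archive might run when a depositor requests an embargo. The following is a minimal sketch only: the function names are hypothetical, and it assumes that the embargo period is counted in calendar months from the deposit date, which may differ from how each archive actually defines it.

```python
import calendar
from datetime import date

# Maximum embargo lengths in months, taken from the figures quoted above
# (ADP: 6 months, DANS: 2 years, FORS: 3 years).
MAX_EMBARGO_MONTHS = {"ADP": 6, "DANS": 24, "FORS": 36}


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month when that
    month is shorter (e.g. 31 August + 6 months gives 28 February).
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def latest_release_date(archive: str, deposit_date: date) -> date:
    """Latest date on which the data must become accessible.

    Assumption: the embargo starts on the deposit date.
    """
    return add_months(deposit_date, MAX_EMBARGO_MONTHS[archive])


def embargo_is_acceptable(archive: str, deposit_date: date,
                          requested_release: date) -> bool:
    """Check a requested release date against the archive's maximum embargo."""
    latest = latest_release_date(archive, deposit_date)
    return deposit_date <= requested_release <= latest
```

For example, `embargo_is_acceptable("ADP", date(2022, 1, 15), date(2022, 9, 1))` is rejected under these assumptions, because the requested release date falls more than six months after the deposit date.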
Formats and documentation
Archives generally have a policy setting out the requirements for file formats and the documentation needed (Acquisition policy). This information is often published on the data archive’s website, but in some cases, especially in large studies with extensive documentation, it makes sense to discuss what is needed and which formats are best for submission.
3.1.4 Support and consult
Advocating for data sharing and negotiating data deposits are usually combined with support and consultations regarding RDM and data archive services. This support can have a more general RDM capacity-building focus, or it can concentrate on a particular potential data deposit.
RDM support and training
Data archives can offer RDM support and training. This is an important way for data archives to help researchers develop high-quality datasets.
Support and training might include an RDM guide with national specifics on the data archive’s webpage or recorded video tutorials.
Data archives might hold regular events for particular target groups (PhD students, researchers having acquired research grants, specific research communities, etc.). These usually include:
- informing about the benefits of data repositories, accepted data types and formats, and metadata profiles,
- advocating for the benefits of data archiving and RDM, and
- offering guidance and tutoring on how to prepare data to make it shareable (technical and legal aspects).
- CESSDA Training Calendar: https://www.cessda.eu/Events?training=true (CESSDA n.d. [Accessed July 30, 2022b])
- CESSDA Data Management Expert Guide: https://dmeg.cessda.eu (CESSDA Training Team 2017-2022)
Consulting on data archive services and conditions
Archives can offer consultations on data archive services and conditions, like licences and data access conditions.