3.2 Outreach and support

It is important to communicate with the designated community, whether established researchers or PhD students.

Regular contact with designated communities is important in order to keep up with their needs. This includes communication with both data producers and potential data users.

One of the goals of the pre-ingest phase is to ensure that the data archive accepts only data that meet its quality requirements and the requirements of its Data collection and Acquisition policy.

Outreach and support activities may vary depending on research data management (RDM) practices, data sharing culture, open access (OA) policies for research data, and the level of formal regulation regarding data management plans (DMPs), which describe the technical, organisational and legal aspects of data handling.

Outreach activities may address the designated communities in general, for example through general RDM workshops, or be targeted at a particular institution, research group, or individual researchers planning a data deposit.

Outreach activities may be more training oriented or more acquisition oriented, but usually serve both purposes. Basically, outreach and support serve two main goals: (i) to facilitate the inflow of data, and (ii) to make sure the data and metadata meet the quality requirements of the archive.

3.2.1 Identify and localise data

There are different channels and strategies for identifying and localising data of interest to the designated community of the data archive. It is important to consult the Data collection and Acquisition policy to decide what data and what channels might be most appropriate.

Systematic data inventories

Systematic inventories of data produced by professional and reputable research institutions in the social sciences or in specific research areas (sociology, psychology, political science, etc.) can be used to identify and localise high-quality data with a high potential for reuse.

The starting point could be a list of projects in an institution or a list of recent publications in a research area, which can then be used to identify the data sets used by researchers.

Another possibility might be to look systematically at research funded by national research funders during the current or previous funding cycle, identifying projects in which the production of important data is planned (for example, national or regional representative surveys), and contacting the researchers. However, if funders do not explicitly require data to be published, this approach may not prove very productive.

Systematic reviews of this sort may be interesting for:

  • newly established data archives, with services that are not yet well-known or acknowledged in the research community,
  • data archives with a low inflow of data.

Identify and localise data: why and how do we do it?

From 2014 to 2016, the Swedish National Data Service (SND) kept track of projects financed by major national research funders in the social sciences in a special project database. Even though the funders did not require data publishing, SND sent these projects introductory information advocating data sharing and offered support with research data management and planning for data publication. The response varied. Some useful contacts were established, but contacts with several hundred research projects produced only a few published data sets.

In a project with a similar focus in 2010-2011, UKDS targeted high-profile, well-funded projects at or near the start of their funding and tried to get good RDM practices in place early, so that the final data would be suitable for archiving. A description of the project is available (UK Data Archive n.d.).

Targeting specific data

There might be interest in the designated community for special data collections, enabling research on highly visible topics currently in the news (e.g. Covid-19 data now or data on migration-related aspects during the refugee crisis of 2015-2016 in Europe).

There are different ways of achieving this, but it is important to involve researchers in the process, as they might have good knowledge of data that are being produced and data that might be reused in further research. If it is an interdisciplinary effort (e.g., data collection for studying climate change might include both social science and climate data), it might be good to cooperate with relevant research infrastructures in other scientific domains.

Rescuing data

There may be different reasons that lead to the need to rescue data. For example, research organisations can be reorganised or closed, funding for long-term projects can be terminated, or primary researchers might change jobs or retire. In such or similar situations, data are at risk of getting lost because of the lack of resources for proper curation and preservation. Usually, there are regulations in place on how research material should be archived in such situations. However, institutional archiving practices seldom focus on the preservation of research data and even less on actively promoting the reuse of data.

It is very important that unique, high-value data that are impossible to reproduce are preserved. Such rescue actions are often time sensitive. The archive needs to work with the researchers or data producers before the project or institution has been closed down and before the people who can provide the contextual information needed for data to be reusable have moved on. It is therefore important for a data archive to have visibility and contacts in the research community, so that it can reach out in time should the need arise.

Rescuing data: Carmichael Watson Project

The case of the Carmichael Watson Project (2009-2013) at the Centre for Research Collections (CRC) illustrates how technological obsolescence and security issues led to an urgent need to capture and preserve a web resource full of rich and unique archival material. As a result of security issues with the underlying infrastructure, the resource was taken off the live web in 2018. Then in 2020, due to the obsolescence of the Virtual Machine (VM) RedHat5 technology on which it was internally hosted, the resource was at imminent risk of loss. This series of challenges highlights the special vulnerability of web-based technologies to rapid change and evolution.

In response to these challenges, the CRC has undertaken a web archiving approach using open-source, pywb-based tools (archiveweb.page and replayweb.page). The CRC aims to maintain access to a version of the web resource that is as authentic as possible through the more sustainable formats generated by these tools. The implementation of this web archiving approach, supported by further CRC investment, has (i) mitigated the imminent risk of losing this valuable resource, and (ii) provided a precedent for maintaining access to these types of web-based materials without the need to resource the maintenance of outdated systems and infrastructure.

The CRC, in collaboration with freelance web archivist Anisa Hawes, has begun web archiving this resource. So far, around 4,000 unique URLs have been captured, but only a fraction of the full archival resource has been completed. A more comprehensive case study presenting the experience of the Carmichael Watson Project website is planned to be completed by the end of 2021 or the beginning of 2022. A detailed description of the current situation is available (The University of Edinburgh n.d.).

Cooperation with journals

Preservation of data linked to publications encourages validation and reproducibility of research. A data archive might therefore consider targeting data linked to high-impact publications or to journals with a high impact factor.

In addition, more and more journals require data availability statements from authors and encourage data publication in repositories, so data archives should consider offering a service for preserving data linked to publications.

Cooperating with national journals, by supporting them in developing open data policies and by providing data archiving services, can be especially rewarding in raising general awareness of the benefits of data archiving.

Cooperation with journals: why and how do we do it?

Implementing the RDA Research Data Policy Framework in Slovenian Scientific Journals

In 2019, ADP implemented a pilot project introducing the RDA guidelines for journal data policies to four national journals in different disciplines (Štebe et al. 2020). The project is described in detail at https://datascience.codata.org/articles/10.5334/dsj-2020-049/

The workflow for the preservation and publishing of data linked to publications in academic journals might have special characteristics. For example, there might be some additional metadata elements required for the study description (linked data, linked publications), or some additional roles, like data replication specialist or data manager from a journal. Management of journal data is currently developing rapidly.

Following data requests

If there are topics or specific data sets that data users frequently inquire about, it might make sense to see whether the data can be acquired and preserved in the archive for future use. If this is not possible, because of the sensitive character of the data or for any other reason, or if the researchers are not interested, there is not much to be done. If, however, there is reason to expect that the researchers of the original study are simply not aware of the services the data archive can offer, and that they would be willing to invest time and effort to publish their data, following up on data user requests might be beneficial.

3.2.2 Advocate for data sharing

Even though data sharing culture is developing along the OA policy framework, there still might be a need to advocate for data sharing to facilitate higher inflow into a data archive.

Besides advocating the benefits of sharing research data on the archive web pages, participation in events like national conferences and contacting research institutions might be important too.

Arguments should be adapted to the national and local circumstances. When advocating for data sharing, a data archive might consider focussing on the following aspects:

  • merits of data sharing, including higher visibility for one's own research and increased recognition through a higher citation rate;
  • data from publicly funded research should be treated as an asset that should be published and reused;
  • opportunities for long-term preservation of their own research data might be important for senior researchers considering retirement;
  • opportunity to inspire further research and hands-on learning for students interested in the reuse of data;
  • contribution to quality, transparency and accountability of research;
  • compliance with funders’, institutional, or journals’ data policies;
  • address researchers’ fear of misinterpretation, misuse, and fear of losing control over data; and/or
  • address practical constraints (lack of time and resources) that researchers perceive as hindrances to proper data management.

It might be beneficial to study the arguments against data sharing that are typical in the local social sciences research community. A dialogue with researchers about the issues that hinder data sharing might help to identify possible solutions, and the support and services the data archive can offer.

Advocate for data sharing: why and how do we do it?

Open Data Excuse Bingo in ADP

Based on the experience of ADP in Slovenia, the arguments against data sharing are usually quite typical. Therefore, it is a good idea to study them before getting in contact with researchers, and to be ready to discuss these issues with them. You can use the list of typical arguments to create an “Open Data Excuse Bingo” (e.g., http://data.dev8d.org/devbingo/; University of Southampton n.d.) and cross out each argument from the list that you hear during the discussion. ADP often used it when working with researchers in a group.

With a growing variety of repositories for digital content, some research communities may have developed data sharing routines through generalist rather than domain-specific repositories. Compared to domain repositories, generalist repositories usually require less time and effort before data can be published. It might be worth advocating for the benefits of using a trusted social sciences domain repository for data publishing. Data and documentation quality controls, together with curation and preservation processes, ensure that data will be findable and reusable when they are published in a trusted social sciences data repository. Proper documentation reduces the risk of data being misused or misinterpreted, thus potentially reducing risks to researchers’ reputation.

3.2.3 Negotiate data sharing

Even if researchers are generally positive about data sharing, there still might be issues to negotiate regarding publishing a particular data set or a study, for example, access levels, embargos, data delivery formats and documentation.

Some issues are difficult to predict before data are submitted, but some are reasonable to negotiate before data submission. Issues not resolved in negotiation in advance can be addressed later in the data review and appraisal phase.

Legal and ethical aspects

Data archives have different policies and technical capacities that determine what data can be accepted and distributed. There might be cases when a data archive cannot accept data for distribution for secondary use because of ethical or legal aspects. Therefore, legal and ethical grounds for preservation and publishing of research data should be discussed with the researcher depositing data.

Additional scrutiny and discussions with data producers are advisable if there is reason to believe that:

  • data might include copyrighted content that the archive does not have the right to publish (for example, mass media content might be subject to copyright; social media platforms may have rules on how information published there can be re-published),
  • data might include (sensitive) personal data where permission was not obtained to publish the data.

Data access levels

In addition to directly downloadable data with unrestricted access, there might be several levels of restricted access. Data access levels should be defined in the Acquisition policy. There are, however, cases when it is important to negotiate the most appropriate level of access for the particular data set ensuring that data are as open as possible, and as protected as necessary. Offering support and consulting could be important here.

Embargos

Researchers might want to embargo access to the whole dataset or part of it. There might be different reasons for that, and it is important to discuss them.

  • In some cases, embargos on the dataset or part of it can be legally required, and researchers and data archives are obliged to apply them.
  • In other cases, researchers would like to protect data until their own publications are out. Here it might be reasonable to try to persuade researchers to choose a data access level that gives them some control over access, for example to check that potential data users do not intend to duplicate the ongoing research.

Repositories, such as those maintained by CESSDA Service Providers (SPs), might have a maximum time limit for an embargo (for example, data depositors in ADP are offered a maximum 6-month embargo; in DANS it is 2 years; in FORS it should not exceed 3 years from the day of publication).
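
The maximum embargo periods quoted above could be turned into a simple check at deposit time. The following is a minimal sketch, assuming that limits are counted in whole months from a given start date. The mapping `MAX_EMBARGO_MONTHS` and the functions `add_months` and `embargo_within_limit` are hypothetical names used for illustration only and are not part of any archive's system. The actual rules, such as the date from which the period is counted, should be taken from each archive's own Acquisition policy.

```python
import calendar
from datetime import date

# Maximum embargo lengths in months, based on the examples in the text.
# This mapping is an illustrative assumption, not an official registry.
MAX_EMBARGO_MONTHS = {"ADP": 6, "DANS": 24, "FORS": 36}


def add_months(start: date, months: int) -> date:
    """Return the date `months` months after `start`.

    The day is clamped to the last day of the target month,
    so 31 August plus 6 months gives 28 (or 29) February.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def embargo_within_limit(archive: str, start: date, requested_end: date) -> bool:
    """Check whether a requested embargo end date respects the archive's maximum."""
    latest_end = add_months(start, MAX_EMBARGO_MONTHS[archive])
    return start <= requested_end <= latest_end


if __name__ == "__main__":
    start = date(2021, 3, 1)
    # A five-month embargo at ADP is within the six-month maximum.
    print(embargo_within_limit("ADP", start, date(2021, 8, 1)))
    # A twelve-month embargo at ADP exceeds the maximum.
    print(embargo_within_limit("ADP", start, date(2022, 3, 1)))
```

Keeping the limits in a single mapping means a change in an archive's policy requires updating only one value.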

Data archives should in general aim to make data accessible as soon as possible and describe their policy on embargos in the Acquisition policy.

Formats and documentation

Archives generally have a policy (the Acquisition policy) that sets out the requirements for file formats and the documentation needed. This information is often published on the data archive’s website, but in some cases, especially in large studies with extensive documentation, it makes sense to discuss what is needed and which formats are best for submission.

3.2.4 Support and consult

Advocating for data sharing and negotiating data deposits are usually combined with support and consultations regarding RDM and data archive services. This support can have a more general RDM capacity-building focus, or it can focus on a particular potential data deposit.

RDM support and training

Data archives can offer RDM support and training. This is an important activity that can help researchers develop high-quality datasets.

Support and training might include an RDM guide with national specifics on the data archive’s webpage or recorded video tutorials.

Data archives might hold regular events for particular target groups (PhD students, researchers who have acquired research grants, specific research communities, etc.). These usually include:

  • information on the benefits of data repositories, accepted data types and formats, and metadata profiles,
  • advocacy for the benefits of data archiving and RDM, and
  • guidance and tutoring on how to prepare data to make them shareable (e.g. FAIR principles, technical aspects, legal considerations).

Examples:

Consulting on data archive services and conditions

Archives can offer consultations on data archive services and conditions, like licences and data access conditions.