4.3 Metadata

See also the main source of this section for more information: CESSDA Data Management Expert Guide, Chapter 2 - Documentation and metadata (CESSDA Training Team 2017-2022). 

4.3.1 Relevance of metadata

Metadata describe the data resources available in an archive and make them easier to search and catalogue. They offer a structured and systematic overview of the data resources and contain, for example, information on the author(s) or keywords describing the data. Open and unrestricted access to metadata is essential for effective data use and re-use. To this end, metadata are generally published under the public domain dedication (CC0 1.0 Universal) and may thus be freely and openly accessed and used by the public.

4.3.2 Vocabularies (CMM, DDI, etc.)

It is essential to make data FAIR (Findable, Accessible, Interoperable, Reusable). The increasing availability of online resources means that data need to be created with long-term storage in mind. Providing other researchers with access to research data facilitates knowledge discovery and improves research transparency. During ingest, staff prepare the metadata that make the data FAIR, so that the research data become more readily available to other researchers. The description of the deposited data follows international standards set by the Data Documentation Initiative (DDI n.d.) and the Consortium of European Social Science Data Archives (CESSDA n.d. [Accessed July 30, 2022a]). For several metadata entries, established “controlled vocabularies” of standardised items are used. This standardisation makes the metadata comparable across the entire data holdings of the repository and within the international CESSDA data catalogue. Using the controlled vocabularies should therefore be mandatory for all publications. See more on “How does an archive make data available?” in Chapter 1.6 of the Data Archiving Guide.

The following metadata items are mandatory for harvesting for the CESSDA Data Catalogue (CDC): study title, person/institution reference, study number, publisher (name of actual CESSDA service provider), publication year (ISO 8601), abstract, topic classification (CESSDA 2022), keyword (preferably from the European Language Social Science Thesaurus, ELSST; CESSDA And Service Providers 2021), type of time method (DDI), study area country (ISO 3166-2), language (of the data; ISO 639-2 codes), date of data collection (ISO 8601), type of data source (DDI), kind of data (DDI), unit of analysis type (DDI), universe, time method (DDI), sampling procedure (DDI), method of data collection (DDI), type of research instrument (DDI) (CESSDA n.d. [Accessed July 30, 2022c]). The User Guide for the CESSDA Metadata Model (CMM) gives very detailed information on the relevant aspects of these elements and how to use them (Storviken et al. 2019).
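A completeness check over the mandatory items listed above could be sketched as follows. This is a minimal illustration, not part of the CMM or of any CESSDA tool: the field keys are hypothetical names chosen for this sketch, the two time-method entries in the list are represented by a single key, and a real repository would map the keys to its own metadata schema.

```python
# Hypothetical keys for the mandatory CDC items named in the text.
# They are illustrative only and not taken from the CMM itself.
MANDATORY_CDC_FIELDS = [
    "study_title",
    "creator",                  # person/institution reference
    "study_number",
    "publisher",
    "publication_year",
    "abstract",
    "topic_classification",
    "keyword",
    "time_method",
    "study_area_country",
    "language",
    "data_collection_date",
    "data_source_type",
    "kind_of_data",
    "analysis_unit",
    "universe",
    "sampling_procedure",
    "data_collection_method",
    "research_instrument_type",
]


def missing_mandatory_fields(record: dict) -> list[str]:
    """Return the mandatory fields that are absent or empty in `record`."""
    missing = []
    for field in MANDATORY_CDC_FIELDS:
        value = record.get(field)
        if isinstance(value, str):
            value = value.strip()  # whitespace-only text counts as empty
        if value is None or value == "" or value == []:
            missing.append(field)
    return missing
```

Calling `missing_mandatory_fields(record)` on a deposited record returns the list of fields that ingest staff still need to request from the depositor; an empty list means every mandatory item has a value.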

Check which languages are used to describe the metadata in your data repository. Does your repository support only a widely used language such as English? Does it additionally use the national language, or is there more than one national language? For example, GESIS (n.d. [Accessed July 30, 2022a]), the German Social Science Data Archive, offers the metadata of the archived datasets in two languages, German and English. ADP (2017), the Slovenian Social Science Data Archive, offers English and Slovenian metadata in its catalogue. AUSSDA, the Austrian Social Science Data Archive, describes metadata exclusively in English to date.

Some archives provide datasets in two languages to reach a broader audience. In the Aila Data Service of the Finnish Social Science Data Archive (FSD), for example, you can freely search and browse study descriptions of archived data in Finnish and in English. The variables of many quantitative datasets have already been translated and are available in English. FSD (n.d. [Accessed July 30, 2022b]) also offers to provide an English translation of a dataset on request.

Additional information may be needed to complete the data entry in your archive. For example, ask the depositor whether there are citations to related publications that should be included in the catalogue metadata.

4.3.3 Be clear about rules for free text fields that apply in your archive solution

For data description, there are many possibilities to enter free text in common metadata fields. For consistency reasons, it is essential that the ingest staff choose certain notations. Here are some examples:

  • Capital letters (Example: ‘Educational opportunities’, ‘Educational Opportunities’ or ‘educational opportunities’ as a keyword)
  • British English vs. American English (‘Organisations’ or ‘Organizations’)
  • Abbreviations 
  • Full names of institutions (Example: ‘University of Linz’ or ‘Johannes Kepler University Linz’)
  • Links to related publications need a persistent identifier (PID), e.g., a DOI or URN, but not a plain URL
  • Be clear about which fields are mandatory and which are optional.
  • Which CMM version does your repository apply?
  • Do you have a limit on how many items (e.g., in Topic Classification or in Keywords) the depositor can choose? 
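Two of the rules above, consistent capitalisation and PIDs instead of plain URLs, lend themselves to automated checks. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the DOI pattern is a common heuristic rather than a full validator, and a DOI given in its `https://doi.org/` resolver form is accepted as a PID here.

```python
import re
from collections import defaultdict

# Heuristic patterns (assumptions of this sketch, not official validators).
DOI_RE = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|doi:)?10\.\d{4,9}/\S+$", re.IGNORECASE
)
URN_RE = re.compile(r"^urn:[a-z0-9][a-z0-9-]{0,31}:\S+$", re.IGNORECASE)


def looks_like_pid(link: str) -> bool:
    """True if `link` looks like a DOI or URN rather than a plain URL."""
    link = link.strip()
    return bool(DOI_RE.match(link) or URN_RE.match(link))


def find_case_variants(terms: list[str]) -> dict[str, list[str]]:
    """Group terms that differ only in capitalisation or spacing.

    Returns a mapping from the case-folded term to the sorted list of
    spellings found, restricted to terms with more than one spelling.
    """
    groups = defaultdict(set)
    for term in terms:
        cleaned = " ".join(term.split())  # collapse repeated whitespace
        groups[cleaned.casefold()].add(cleaned)
    return {key: sorted(found) for key, found in groups.items() if len(found) > 1}
```

Run over all keywords in the catalogue, `find_case_variants` lists the entries that ingest staff should harmonise according to the archive's chosen notation. It only reports inconsistencies and deliberately does not rewrite terms, since automatic re-capitalisation would damage acronyms and proper nouns.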

4.3.4 Checks on metadata in ingest

  • Did you receive all relevant metadata information? Are all mandatory fields filled in? Do you have more information on optional metadata fields?
  • Depending on the tool you use for dissemination, metadata fields should not contain separators (e.g., dots or commas as thousands separators), so that your metadata can be harvested easily.
  • Are the metadata fields filled in correctly, according to CESSDA CMM and DDI standards?
  • For topics and keywords, adhere to the CESSDA Topic Classification (CESSDA 2022) and the CESSDA ELSST Thesaurus (CESSDA And Service Providers 2021). Is there a maximum number of topics the depositors can pick?
  • Give the researchers assistance in choosing the right term, neither too broad nor too narrow.
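Some of the ingest checks above can be supported by small helper functions. The sketch below is illustrative only: the function names are hypothetical, the vocabulary is assumed to be available locally as a set of terms, terms are compared case-insensitively, and the limit on topics is whatever your archive has decided.

```python
import re

# Digits grouped in threes by dots or commas, e.g. "1,500" or "12.345.678".
THOUSANDS_RE = re.compile(r"^\d{1,3}(?:[.,]\d{3})+$")


def has_thousands_separator(value: str) -> bool:
    """True if a numeric field value appears to use thousands separators."""
    return bool(THOUSANDS_RE.match(value.strip()))


def unknown_terms(terms: list[str], vocabulary: set[str]) -> list[str]:
    """Return the terms not found in the controlled vocabulary."""
    allowed = {term.casefold() for term in vocabulary}
    return [t for t in terms if t.strip().casefold() not in allowed]


def exceeds_limit(terms: list[str], limit: int) -> bool:
    """True if the depositor chose more terms than the archive allows."""
    return len(terms) > limit
```

A value such as "1.500" is ambiguous, since it could also be a decimal number with three decimal places, so a hit from `has_thousands_separator` is best treated as a prompt for manual review rather than an automatic correction.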