How to open up data, improve metadata and create linked open data – tools online

Open data, as defined by the European Data Portal, is data that anyone can access, use and share. Linked open data means that data is not only open so that anyone can use it, but also published in a machine-readable format and linked to other datasets. Open data can be used in many different, innovative and unexpected ways. However, we also need to understand the users of data, and what is important to consider when publishing data online so that users are able to search, find, and potentially develop services and products out of the data. Which training materials, tools and services can be used to publish data as open or linked open data?

Opening up data

DCAT-AP is a cataloguing specification used as a common vocabulary for describing public sector datasets in Europe. The European Union Open Data Portal shows open data that has been made available in DCAT-AP, and provides an API for developers as well as information and tools. To learn more about open data, there is also training material, “Open Data Support”, available from the EU Commission. To facilitate the implementation of DCAT-AP, a list of tools such as validators, harvesters and exporters of DCAT-AP metadata has been published, including an open source DCAT-AP validator.
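To give a flavour of what a DCAT-AP description looks like, here is a minimal sketch that serialises a dataset record as RDF Turtle using only the Python standard library. The dataset, its URIs and its field values are invented examples, not real catalogue records, and a real DCAT-AP record carries many more mandatory properties.

```python
# A minimal sketch of a DCAT-AP-style dataset description, serialised as
# RDF Turtle. The dataset and its example.org URIs are hypothetical.

DATASET = {
    "uri": "http://example.org/dataset/coins-1850",          # hypothetical
    "title": "Coin collection 1850",
    "description": "Digitised records of a 19th-century coin collection.",
    "distribution": "http://example.org/dist/coins-1850.csv",
}

def to_dcat_turtle(d):
    """Serialise one dataset record as DCAT-AP-style Turtle."""
    return "\n".join([
        "@prefix dcat: <http://www.w3.org/ns/dcat#> .",
        "@prefix dct:  <http://purl.org/dc/terms/> .",
        "",
        f"<{d['uri']}> a dcat:Dataset ;",
        f'    dct:title "{d["title"]}" ;',
        f'    dct:description "{d["description"]}" ;',
        f"    dcat:distribution <{d['distribution']}> .",
    ])

print(to_dcat_turtle(DATASET))
```

The point of such a shared vocabulary is that any harvester that understands `dcat:Dataset` can index this record, whichever institution published it.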

”Metadata is a love note to the future”, Flickr, by cea+, licence: CC BY 2.0.

Improving metadata

Access to more open data is a first requirement for creating a better understanding of collections and for helping people to engage with them in different ways: by searching, using, re-using or developing products. However, data does not speak for itself; information needs to be described so that it can be interpreted correctly, which in turn increases its quality and usefulness.

Cultural heritage information is structured according to metadata standards and formats. Metadata is ”data about data”: data that provides information about digital collections and enables us to structure information so that it is as high-quality, identifiable and usable as possible. Cultural heritage institutions use different types of metadata standards, such as technical, descriptive, conceptual and preservation standards. These standards can be developed specifically for a certain kind of information, or be compatible across different domains.

Sometimes we need to re-structure metadata in order to adapt it to an international standard. Often this work is done when digital information is to be published online, or in connection with the migration of an information management system. This process often requires substantial resources, as well as technological development at the institutions. How can these processes be made more effective with the use of open online resources? Which tools can be used to improve metadata and make it as usable as possible for anyone?

There are currently several open source tools for re-structuring metadata from one metadata standard or format to another. When aggregating metadata for publication in a European or international online portal, metadata needs to be re-defined, often through common national aggregation services. So in order to publish metadata in Europeana, you are probably using a national aggregation service, or a tool developed in a domain-specific project connected to Europeana. Through those services and tools, you need to create metadata element mappings between two different metadata schemas. Europeana is currently working towards enabling institutions to improve this process by publishing metadata directly to Europeana, with no need for intermediary services, and a pilot study has examined the effects of this new kind of aggregation.
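The heart of such a metadata element mapping (often called a crosswalk) can be sketched very simply: each source field is renamed to its counterpart in the target schema, and fields with no counterpart are dropped. The Swedish source field names and the record below are invented for illustration; real aggregation tools handle much richer, nested structures.

```python
# A sketch of a metadata element mapping ("crosswalk") between two schemas.
# Source field names are hypothetical; targets are Dublin Core elements.

CROSSWALK = {                  # local source field -> Dublin Core element
    "objektnamn": "dc:title",
    "beskrivning": "dc:description",
    "tillverkningsar": "dc:date",
    "upphovsman": "dc:creator",
}

def map_record(record, crosswalk):
    """Rename a record's keys according to the crosswalk, dropping any
    field that has no target element in the destination schema."""
    return {crosswalk[k]: v for k, v in record.items() if k in crosswalk}

source = {"objektnamn": "Spinning wheel", "upphovsman": "Unknown",
          "internal_id": "A-1023"}        # internal_id has no mapping
print(map_record(source, CROSSWALK))
# -> {'dc:title': 'Spinning wheel', 'dc:creator': 'Unknown'}
```

In practice a mapping also has to handle value transformations, repeatable fields and controlled vocabularies, which is exactly what the tools described below provide.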

To support the structuring of metadata according to other formats and standards, there are also open source tools available today on Github. These tools often also contain functionality for creating linked open data.

Mapping Tools

3M is an open source mapping tool, available on Github. With 3M it is possible to re-define data from databases, and other associated contextual information, according to other schemas. Fields or elements from a database (source nodes) are mapped to one or more units described in a target schema, so that data from an entire system can be transformed.

B2SHARE is a service developed within the framework of the EUDAT project (https://www.eudat.eu/), which aims to support the visibility and searchability of digitally stored information. B2SHARE includes mapping functions adapted to international standards, as well as persistent identifiers. It is possible to manage information during the mapping process or afterwards. The service can be accessed online.

MINT is a mapping tool developed in the European projects Athena, Linked Heritage and Athena Plus. It supports harvesting and mapping metadata from content providers to the LIDO format, and the transformation of items to the Europeana Data Model (EDM).

The Europeana Connection Kit (ECK) is a tool developed in the framework of the Europeana Inside project; it identifies existing workflows, standards and tools that can simplify the aggregation process for institutions.

Tools for Linked Open Data

The semantic web can simplify the processes that make information searchable and usable. There are currently several initiatives relating to publishing data as open data in a machine-readable format (e.g. RDF), with open licences and linked to other data sources. Linked open data creates many more opportunities to use and re-use cultural heritage collections. To learn more about linked open data, there is training material, ”Using Linked Data Effectively”, from the EUCLID project, also downloadable as an e-book.
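At its core, RDF represents linked data as subject–predicate–object triples whose URIs can point into other datasets. The following minimal sketch shows that idea; the example.org URIs are invented, while `dct:title` and `owl:sameAs` are real vocabulary terms commonly used for this kind of linking.

```python
# Linked data at its simplest: subject-predicate-object triples whose
# URIs point across datasets. The example.org URIs are hypothetical.

DCT_TITLE = "http://purl.org/dc/terms/title"
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"

triples = [
    ("http://example.org/object/42", DCT_TITLE, "Spinning wheel"),
    ("http://example.org/object/42", OWL_SAMEAS,
     "http://example.org/other-collection/item/9"),
]

def objects_of(subject, predicate, graph):
    """Return every object reachable from subject via predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]

# Which records in other collections describe the same object?
print(objects_of("http://example.org/object/42", OWL_SAMEAS, triples))
```

Because the links are URIs rather than local identifiers, anyone on the web can follow them and combine collections that were described independently.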

OpenRefine is an open source tool that has been used in different linked open data projects within the cultural heritage sector. It can also be used to “clean” metadata, transform it from one format into another, and extend it with web services and external data.
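To illustrate the kind of “cleaning” this involves, here is a simplified sketch of fingerprint-style clustering, which groups near-duplicate metadata values by normalising them (lowercase, strip punctuation, sort the unique tokens). This is an illustration of the idea, not OpenRefine's actual implementation.

```python
# A simplified sketch of fingerprint-style clustering of near-duplicate
# metadata values, similar in spirit to OpenRefine's clustering feature.
import re
from collections import defaultdict

def fingerprint(value):
    """Normalise a value so near-duplicates collide on the same key."""
    tokens = re.sub(r"[^\w\s]", "", value.lower()).split()
    return " ".join(sorted(set(tokens)))

def cluster(values):
    """Group values sharing a fingerprint; keep groups of two or more."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]

print(cluster(["Stockholm, Sweden", "stockholm sweden", "Uppsala"]))
# -> [['Stockholm, Sweden', 'stockholm sweden']]
```

Once such a cluster is found, a curator can pick one canonical spelling and apply it to every member, which is how inconsistent free-text fields in collection databases get harmonised.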

Karma is an open source tool that allows users to quickly and easily integrate data from various sources, including databases, spreadsheets, delimited text files, XML, JSON, KML and web APIs. Users can integrate information by modelling it according to an ontology, with automated processes. The tool can also automatically generate an ontology model, which users can adjust and then publish as RDF and/or store in a database.

The Data Tank is an open source tool available on Github. With The Data Tank it is possible to transform datasets into an HTTP API and describe them with DCAT-AP.
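The underlying idea of turning a raw file into something machine-readable can be sketched as a plain CSV-to-JSON transformation using the Python standard library. The sample rows below are invented, and The Data Tank itself of course does far more, including serving the result over HTTP and describing it with DCAT-AP.

```python
# Sketch: transform a raw CSV dataset into JSON, the kind of
# machine-readable representation an HTTP data API would serve.
import csv
import io
import json

RAW_CSV = "id,title\n1,Spinning wheel\n2,Loom\n"   # stand-in for an upload

def csv_to_json(text):
    """Parse CSV text and re-serialise each row as a JSON object."""
    return json.dumps(list(csv.DictReader(io.StringIO(text))))

print(csv_to_json(RAW_CSV))
# -> [{"id": "1", "title": "Spinning wheel"}, {"id": "2", "title": "Loom"}]
```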

Recogito is an open source tool developed as an initiative of Pelagios Commons. It is currently available for beta testing, and is scheduled for official launch in December 2016. Recogito includes annotation functionality for open data and linked data.

All these common resources, services and tools make the process more efficient: the need for technological development at the institutions is reduced, while information managers gain greater expertise in issues concerning standards and the management of digital cultural heritage information. Today, digital collections are mostly managed in the domain-specific standards tied to the information management system in use. Use of common tools can therefore be an advantage when managing different kinds of information. Both materials specific to an institution (such as documents in the archives) and others (for example, objects that are part of the archival holdings, linked to documents) can be managed using the most appropriate standard, while interoperability between these standards can be implemented using the same tool.

Some tools are more developed than others, and adjustments will probably be needed to adapt them to the specific needs of cultural heritage institutions, but they are a valuable basis for collaboration between institutions!

Sanja Halling