Linked data and the future of aggregation
Today, we live in a world with great expectations of digital access to information. Meeting those expectations creates a strong need for more, and better, digital cultural information online. According to the latest Enumerate report on digitisation, digital access and preservation at cultural heritage institutions, most of the institutions that took part in the survey hold a rich mix of cultural heritage materials. On average, 23% of the heritage collections have been digitally reproduced, and 32% of the digitally reproduced and born-digital heritage collections at the institutions are available online for general use. Obviously, there is a lot more to be done. What are the main obstacles to making cultural heritage information available and usable online?
Cultural heritage information today is mainly described according to traditional methods that have developed over a long time. The digital era has given us tools to present the institutions’ collections without physical limitations, so the material and the information about it can reach a wider audience than ever before. New technology has also given heritage institutions opportunities to create digital stories and engage users in the digital cultural arena. Still, the new technical possibilities are sometimes seen as limitations rather than opportunities, mainly because they challenge professionals to think differently and to change their traditional ways of describing information. Technology is only an instrument for adding value. The real challenge is not to use technology for its own sake, as a fashionable add-on, but as a way to create something completely new.
Metadata is structured data used for identification and description, and for facilitating access. It is the information that describes an object: “data about data”. Several different descriptive metadata standards are currently in use for managing digital cultural heritage information.
Objects and documents that were originally part of the same collection, or held at the same institution, may today be fragmented across several types of organisations or parts of an organisation. Those objects and documents would then have been described according to the traditions (for example, for registration of objects and cataloguing of documents) that were current at the time the different collections were created. This means that those old traditions still define how information is “translated” into digital form today.
Take a map as an example. It can be described in very different ways depending on the cultural heritage domain in which it is described. Sometimes the descriptions are similar but expressed in different ways; sometimes they actually contain complementary information. In a museum system, a map is described as an object, using the associated metadata standards. As an archival object, it would be described in entirely different terms and under other procedures and standards than in a museum system.
What does this mean? At the museum you could probably find a lot of information about, for example, a specific drawing technique or a certain aquarelle painting technique used to produce this specific map. At the archive, you could instead learn a lot about the context of the map: less about the document itself than at the museum, but more about, for example, the people and organisations that were involved in creating it, why it was created, who has owned it and so on. At the library, you could see whether it was also part of a hand-drawn atlas, and whether it is related to published reproductions. And is the person who made the map defined as an artist, land surveyor, cartographer, illustrator or records creator?
This means that it is quite difficult today to find out about all the aspects of one map in a single search. It takes quite a lot of navigation between different online portals and websites to get there. The situation is further complicated by the fact that descriptions are not harmonised between domains, so the same word, for example the term “provenance”, has different meanings in different domains. This makes things complicated for users who are not familiar with all the description models. Putting all those descriptions together in a meaningful way is the key to making high-quality cultural heritage information available for easy use and reuse across domains. To achieve this, cultural heritage institutions have a general need for common technical support for implementing established international metadata standards and for making links between different data sets.
Today, cultural heritage institutions invest a lot of time and resources in getting over those obstacles by developing processes of aggregation. In the context of digital cultural heritage, the word aggregation often refers to the transfer of collections of metadata from different institutions to different kinds of web portals, or to making metadata available through APIs. The processes of aggregation are connected to aggregators. Linked Heritage defines an aggregator as follows: In the context of digital cultural heritage and particularly in the context of Europeana, aggregators are gathering material from individual organisations, standardising formats and metadata into a common model, and channelling them into Europeana according to the Europeana guidelines and procedures. There are different kinds of aggregators: country-specific (national or regional, cross-domain or domain-specific), project aggregators and independent organisations.
The services that aggregators provide are based on “mappings” (translations) between metadata models: those used at the institutions and those required for delivering the information to the aggregators and their services and online portals. Developing and managing these mappings takes a lot of time and resources today, because cultural heritage information is under constant development. Every single change, such as an institution moving to an updated version of its collection management system or the national aggregator changing its metadata model, brings additional costs.
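To make the idea of a mapping concrete, here is a minimal sketch in Python. It is not taken from any real aggregator: the field names, the sample record and the helper `apply_mapping` are all hypothetical, chosen only to show how a crosswalk translates an institution's fields into a common model and what happens to fields that have no counterpart.

```python
# Hypothetical crosswalk: institutional field name -> aggregator field name.
# None of these names come from a real metadata standard.
MUSEUM_TO_AGGREGATOR = {
    "object_title": "title",
    "maker": "creator",
    "production_date": "date",
}


def apply_mapping(record, mapping):
    """Split a record into fields the crosswalk covers and fields it drops."""
    mapped = {mapping[key]: value for key, value in record.items() if key in mapping}
    unmapped = {key: value for key, value in record.items() if key not in mapping}
    return mapped, unmapped


# Invented sample record for a map held by a museum.
museum_record = {
    "object_title": "Map of the harbour",
    "maker": "Unknown surveyor",
    "production_date": "1750",
    "technique": "watercolour",
}

mapped, unmapped = apply_mapping(museum_record, MUSEUM_TO_AGGREGATOR)
print("Delivered to the aggregator:", mapped)
print("Left behind at the institution:", unmapped)
```

In this sketch, renaming a field in the collection management system or in the aggregator's model means editing the crosswalk by hand, which is the maintenance cost described above. The `technique` field, which has no counterpart in the common model, simply stays behind.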
This also means that, at the end of the aggregation process, information from different institutions or domains is often harmonised only at a very basic common level, by connecting just those parts of the descriptions that are really comparable between institutions. This makes it easier for the user to find information from different domains in a single search. However, users who are looking for additional information, or asking more advanced questions, are still left with the trouble of navigating through different digital sources.
Aggregation digital cultural heritage – current situation bottlenecks. Image: Elco van Staveren, CC BY-SA 2.0
The outlook for these issues has become much brighter since the development of the semantic web, and big data technology is opening up new possibilities for making information high-quality, linkable and usable in many different ways. Will the role of the aggregators in the future instead be to link all information together, thus creating added value in terms of quality and complementary information?
Nevertheless, there are still issues to be solved before semantic technologies can be widely implemented at an institutional level. Interlinking with other data sets is done at a quite basic level today, and there is a lot more to be done before we can take real advantage of the semantic web. Descriptions still need to be standardised so that models can link information that is described at completely different levels. A promising model that would allow cultural heritage information from different domains to be linked and expressed in high quality is CIDOC CRM, an ISO standard. The latest version was published in 2015, including a CRM RDFS schema. A lot of work has also already been done to harmonise CIDOC CRM with metadata standard schemas, resulting in FRBRoo (harmonisation with library material) and CRMsci (scientific observations). This sort of standardisation helps to create automated (or semi-automated) linking processes.
Linked data. Image: Elco van Staveren, CC BY-SA 2.0
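The linking idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the identifiers under example.org, the property names and the three tiny data sets are invented, and plain tuples stand in for RDF triples. The property names are not CIDOC CRM or EDM properties.

```python
from collections import defaultdict

# Hypothetical identifier that all three institutions use for the same map.
MAP = "http://example.org/object/map-1"

# Each institution publishes its own statements as (subject, property, value).
museum = [(MAP, "technique", "watercolour")]
archive = [(MAP, "commissioned_by", "http://example.org/agent/harbour-board")]
library = [(MAP, "part_of", "http://example.org/object/atlas-7")]


def merge(*datasets):
    """Pool statements from several sources into one index keyed by subject."""
    graph = defaultdict(set)
    for dataset in datasets:
        for subject, prop, value in dataset:
            graph[subject].add((prop, value))
    return graph


graph = merge(museum, archive, library)

# One lookup now returns what the museum, the archive and the library each know.
description = sorted(graph[MAP])
for prop, value in description:
    print(prop, "->", value)
```

The sketch only works because all three sources agree on one identifier for the map and keep their own properties instead of reducing them to a common minimum. Agreeing on identifiers and on the meaning of properties is the part that standardisation has to provide.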
However, linked open data also means that the data needs to be open. Here, Europeana has done excellent work in pushing forward CC0 licences for metadata. That is the main strength of the EDM model: it makes it possible for developers and users to freely create applications based on the material from Europeana. Nevertheless, when searching for additional or more advanced information, developers are left with the trouble of navigating through different materials, most of them not linkable and with unclear licences. Institutions that are willing to provide more metadata descriptions to Europeana as CC0 are often unable to do so. This is partly because of current aggregation processes, and partly because the EDM model is established as an ontology at a basic common level, not adapted to manage all the additional information that is based on the different, non-harmonised data models coming from different institutions and aggregators.
Therefore, it would be very interesting to explore the possibilities of connecting the EDM and CIDOC CRM-RDFS models, thus enabling institutions to provide as much information as linkable open data as they want to. Attempts to map the connections between those models have been made, for example in a graphical representation of the harmonised EDM-CRM models. This could be a pragmatic path towards a future in which the institutions themselves move, step by step, from collection databases to linked data. It would improve the interoperability of cultural heritage data and make it linkable at the same time. Of course, that is a long-term perspective. In the near future, this could be done at the aggregators’ level, in combination with common tools and services that make those processes easier.
What does all this mean for the future of aggregation? Today, metadata still needs to be aggregated. Adapting to linked open data takes time, and it requires new kinds of expertise at the institutions as well as the implementation or development of new tools and services.
Working towards an evolving method of creating linked open data would certainly be a way to create usable data across domain boundaries, to link together all information, thus creating added value in terms of quality and complementary information.
Hopefully, this will be a future step for the aggregators.