Archival Information + CIDOC CRM = true?


Today there is a strong need to improve the quality of metadata descriptions of digital cultural heritage information, to harmonise information between different domains, and to make collections available as machine-readable linked data. Linked open data, in combination with accepted international standards and support for those standards, is a first step towards wider use of high-quality cultural heritage data and towards interconnecting different data sets.

Digisam has been participating in a project coordinated by the National Archives of Sweden, with the British Museum among the partners. The aim of the project was to examine whether archival information can be harmonised with CIDOC CRM, what conditions must be met to make data from archives and museums interoperable with this model, and how a support service could facilitate those processes. The project primarily tested the service 3M (Mapping Memory Manager), developed by FORTH.

The project also aimed to discuss various alternatives for the design of persistent identifiers (PIDs), the unique code strings that identify digital objects. A workshop on persistent identifiers was organised to identify and discuss current routines and systems in the LAM (libraries, archives and museums) sector.

Archives & Collections

The challenges we faced in describing archival information with the CIDOC CRM RDF model included defining the role of “archive creator”, describing the “volume”, and producing object-based descriptions.

In CIDOC CRM a person or organisation can be assigned different roles. An information object can be created by a “creator”. But the creator in the archival context is not necessarily the person who created the information. A person or organisation can, in the role of records creator or archive creator, receive information created by others. This means that in the archival context there is a difference between “archive” and “collection”. A collection is a selection of items gathered on the basis of a specific theme or choice, which could form an archive but does not necessarily do so. A collection requires creators, but may have been acquired through several independent collecting activities.

Another challenge is the concept of “volume”, which goes back to a time when archives usually consisted of paper documents, so that the volume represented both the physical object (the cardboard box) and the logical object (the information content of the documents in the box). This complicates the use of CIDOC CRM, which requires a clear distinction between the physical and the logical nature of an object (for example for updates), although in real archival practice much of the information about a volume relates to either the physical or the logical description of the object.
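To make the distinction concrete, here is a minimal Python sketch that splits one volume into a physical carrier and a logical content node and links the two. It is an illustration only, not the mapping used in the project. The base URI `http://example.org/archive/`, the function `split_volume` and the sample values are hypothetical, and the choice of `E22_Man-Made_Object`, `E73_Information_Object`, `P128_carries` and `P3_has_note` is our assumption about which CIDOC CRM classes and properties would fit.

```python
# Hypothetical namespaces; only the CRM local names are taken from CIDOC CRM.
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
EX = "http://example.org/archive/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def split_volume(volume_id, physical_note, content_note):
    """Return triples describing one archival volume as two linked nodes."""
    box = f"{EX}volume/{volume_id}/carrier"      # the cardboard box
    content = f"{EX}volume/{volume_id}/content"  # the information it holds
    return [
        (box, RDF_TYPE, CRM + "E22_Man-Made_Object"),
        (content, RDF_TYPE, CRM + "E73_Information_Object"),
        (box, CRM + "P128_carries", content),
        # Each note is attached to the side of the volume it describes.
        (box, CRM + "P3_has_note", physical_note),
        (content, CRM + "P3_has_note", content_note),
    ]


# Sample values are invented for the illustration.
triples = split_volume("A1-12", "Cardboard box, 8 cm wide", "Letters 1901-1905")
for s, p, o in triples:
    print(s, p, o)
```

With this split, an update to the box (for example a new storage location) touches only the carrier node, while a corrected content description touches only the content node.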

On the other hand, additional information could be added when adapting to a more object-oriented description. A letter might have a completely different value for researchers and users, and could be interesting as more than a document. This may concern, for example, the specific material or a special ink.

The results of the mappings between archival data and CIDOC CRM indicate that there are currently challenges with regard to the specific requirements for describing archival information. While there is great potential in the ability to link information descriptions, it has also become obvious that an initiative to harmonise the descriptions should be taken on a more general level.

Linking archival information and museum information

Given the challenges that we met when mapping archival information to CIDOC CRM, we decided to test creating links between archival and museum information using the CIDOC CRM model. After a few searches in the material we decided to run a test with some photographs by the photographer Viktor Lundgren. In the collection system of the museum Murberget, we found many photographs by Viktor Lundgren, including some showing horses.

b2ap3_thumbnail_Skarmavbild-2016-05-12-kl.-17.37.52.png

Photo: Viktor Lundgren CC BY

The photographs were described at the object level, and it was quite straightforward to do a mapping of the information that was included.
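As a rough illustration of what an object-level mapping can look like, the Python sketch below turns a flat photograph record into CIDOC CRM style triples. It is not the model drawn up in the project. The record fields, the base URI `http://example.org/museum/` and the function `map_photograph` are hypothetical, and linking the photographer through an `E12_Production` event and treating generic subjects as `E55_Type` is one possible modelling choice that we assume here.

```python
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
EX = "http://example.org/museum/"  # hypothetical base URI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def map_photograph(record):
    """Map a flat object-level record (a dict) to CIDOC CRM style triples."""
    photo = EX + "object/" + record["id"]
    production = photo + "/production"
    maker = EX + "person/" + record["photographer_id"]
    triples = [
        (photo, RDF_TYPE, CRM + "E22_Man-Made_Object"),
        # The photographer is linked through a production event.
        (photo, CRM + "P108i_was_produced_by", production),
        (production, RDF_TYPE, CRM + "E12_Production"),
        (production, CRM + "P14_carried_out_by", maker),
        (maker, RDF_TYPE, CRM + "E21_Person"),
    ]
    # Generic subjects such as "horse" are treated as types (E55_Type).
    for subject in record.get("subjects", []):
        triples.append((photo, CRM + "P62_depicts", EX + "type/" + subject))
    return triples


# Invented sample record; the identifiers are placeholders.
record = {"id": "photo-1", "photographer_id": "viktor-lundgren", "subjects": ["horse"]}
photo_triples = map_photograph(record)
for s, p, o in photo_triples:
    print(s, p, o)
```

In a real mapping the type URIs would point to a shared thesaurus rather than to a local namespace.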

b2ap3_thumbnail_Skarmavbild-2016-05-12-kl.-17.55.38.png

To find a basic common level based on the metadata available in both the museum system and the archival system, and using CIDOC CRM as a starting point, we drew up the following model, which includes information from both the archival information system and the collection system:

[Image: model diagram]

In the archival system NAD (National Archive Database) we found a photography collection by Janrik Bromé with a hit on the same Viktor Lundgren, though not as photographer but as the subject of a photograph. It is probably a self-portrait that he made to send out as a Christmas card, with the text on the back: “Merry Christmas and a Happy New Year! Best wishes Viktor Lundgr” (the rest is missing).

viktor

The image below shows the result of the mapping on the basis of archival information, a graphic representation of the hierarchical structures, expressed through relationships in the CIDOC CRM.

b2ap3_thumbnail_visualisering2.png

Once we had the basic information about Viktor Lundgren, we could easily find much more about him in NAD, including church records (birth records, parish books) and county judicial archives (probate records). We also found information about Lundgren as a writer, including an authority record, in Libris, the National Library database, and in VIAF (Virtual International Authority File).

However, a search for the photographer Viktor Lundgren in the national photographer register, available on Kulturnav, a web platform for authority files, gave no match, even though an authority record for him as a photographer was embedded in a metadata record from Sundsvall museum. Kulturnav allows cooperation on authority lists, and in our working group the question arose of how to add a single authority record to the national photographer register. We contacted Kulturnav/Nordic Museum, and they published an authority record for the photographer Viktor Lundgren in the register, so that we could link to it. This raised an interesting question about identifiers. Generally, authority records should not be duplicated, but here there are two different authority lists: one for writers, pointing to Lundgren in his role as a writer, and the national photographer register, pointing to his role as a photographer. There is no doubt that the authority record for Lundgren should be part of both lists, reflecting his different roles, but is there a need for two separate persistent identifiers (linked by a “SameAs” connection), or should an identifier from Libris/VIAF be reused? Technically, both approaches are possible, and we look forward to dealing with this question in our future work.
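The practical difference between the two options can be sketched in a few lines of Python. If each authority list mints its own identifier and the identifiers are linked with “SameAs” statements, a consumer has to follow those links to learn that they all describe one person. If one identifier is reused, no such step is needed. The identifiers below are invented placeholders, not real Kulturnav, Libris or VIAF identifiers, and `same_as_clusters` is a hypothetical helper written for this illustration.

```python
from collections import defaultdict


def same_as_clusters(links):
    """Group identifiers connected by sameAs links into clusters."""
    neighbours = defaultdict(set)
    for a, b in links:
        # sameAs is symmetric, so record the link in both directions.
        neighbours[a].add(b)
        neighbours[b].add(a)
    clusters, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk collects everything reachable from `start`.
        stack, cluster = [start], set()
        while stack:
            node = stack.pop()
            if node in cluster:
                continue
            cluster.add(node)
            stack.extend(neighbours[node] - cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters


# Invented placeholder identifiers, one per authority list.
links = [
    ("kulturnav:photographer-123", "libris:writer-456"),
    ("libris:writer-456", "viaf:789"),
]
clusters = same_as_clusters(links)
print(clusters)
```

Here the three identifiers end up in a single cluster, which is the information a reused identifier would have carried directly.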

For other authority files (for example terms like “photographer”) we used TMP2 (Thesaurus Management Platform), a web platform for collaborating on and publishing thesauri and authority files. There, we could link information in the metadata to terms like “photographer”, “Professional photography” and “Black-and-white photograph”, to name a few.

Results

Regarding interoperability between archival and museum information harmonised through CIDOC CRM, there are both opportunities and challenges. The results of the mappings between archival data and CIDOC CRM RDF show that there are challenges with regard to the specific requirements for describing archival information. Given current limitations, the main difficulty lies in describing the material itself, because the information is not mapped at the same level; a further difficulty is finding a way to express certain specific terms, such as “archival volume”.

Today, when information is linked between different metadata models, the focus is on linking through authority files. There is also great potential in linking information by creating interoperability between data models, which is what we explored with the help of CIDOC CRM in the tests carried out. It is also clear that a comprehensive initiative should be taken on a more general level. In the library domain, similar challenges have been addressed at a global level in cooperation with ICOM/CIDOC, by developing adaptations of CIDOC CRM for library materials, including authority data: the FRBR, FRAD and FRSAD models, and FRBRoo. This means that library and museum data today have a common conceptual model for describing information.

Do you have personal experience of the linking of information from archives and museums? Have you been working on harmonisation of these data models? We are grateful for your comments and views on the project, either directly here on the blog or by email to sanja.halling@riksarkivet.se (note: deadline for feedback is May 23).

Lina Marklund and Sanja Halling

Comments (5)

  • Vladimir Alexiev | 17 May, 2016

    Suggestions on the mapping diagram http://digisam.se/images/modell_viktor.png:
    – don’t use E62_String but directly a literal
    – Represent the title as P102 has title E35_Title, not as P1_is_identified_by E41_Appellation
    – P67_refers_to is wrong. You must use a production event:
    P108i_was_produced_by .
    a E12_Production; P14_carried_out_by .
    – P62_depicts E20_Biological_Object is wrong. You should use E20_Biological_Object only for a specific individual (eg a specific horse), but for a generic concept use P62_depicts E55_Type
    – P62_depicts E44_Place_Appellation is wrong. That would mean something like a photo of a name plate, and not even that since a name plate is a physical object, while
    – Use global thesauri wherever possible. Eg:
    P2_has_type aat:300046300 ”photograph”
    P62_depicts aat:300250148 ”horse”
    P62_depicts tgn:1234567-place ”Some specific place”
      – Using the Kulturnav thesaurus is ok, if it’s well established in Sweden.
      – Using the Athenaplus TMS2 thesaurus is probably not a good idea, since it’s not widely used and the URLs are not well designed.

    About EAD->CRM mapping: the question is far from decided, because:
    1. EAD is not very semantic at all. Eg holds a bunch of event & person info, all in a free text field.
    2. It’s not obvious how much of EAD to represent as RDF.
    Eg see ”The Semantic Mapping of Archival Metadata to the CIDOC CRM Ontology” (Journal of Archival Organization, 9:174–207, 2011), which proposes to use 6 parallel hierarchies of CRM nodes to represent the EAD levels hierarchy (from fonds to item). That is completely impractical, not to mention there are mistakes.

  • Sanja Halling | 24 May, 2016

    Thank you for your comment, it is a very valuable input for us, and for our further work. Your mapping suggestions are very helpful and we will discuss them in our working group. We also find the question about design of the URIs and persistent identifiers very important, and have also published a checklist on that: http://digisam.se/index.php/hem/entry/a-checklist-for-persistent-identifiers 
    Please feel free to send us also your suggestions on this issue. 
    Best regards, Sanja

  • Lise Summers | 12 June, 2016

    Hi Sanja

    A really interesting discussion, and one which has had me chasing up CIDOC CRM and descriptive crosswalks all day. One of the things you don’t mention is which archival standards you use? Vladimir mentions EAD, but if you are describing a person you also need to look at EAC, and should really go back to the international standards produced by the International Council on Archives – ISAD(G) for the object description and ISAAR-CPF for the creator/custodian.

    One of the problems for me is that you are describing the object in your photograph out of context – archival description starts (and sometimes stops) at the highest level e.g. fonds or series. Looking at CIDOC-CRM, I can see that there is the E78 Collection definition, which would equate with the holdings or collection of an archival institution, but no real way to create a subset of E78, such as fonds, records group or series? Maybe E41 Appellation? or E70 – thing?

    I think your volume is two things – one is the measurement of the size of the fonds, collection or series, or is a measurement of the individual object; the other is a physical object type.

    It’s been a fascinating exercise, and one that I can see will continue to engage me. I’d love to see some more detail of the work you are engaged in.

  • Lise Summers | 13 June, 2016

    Further to my previous post, you may be interested in this outline of the work being done by the International Council on Archives to develop an integrated ontology (EGAD) for archival description – http://www.girona.cat/web/ica2014/ponents/textos/id56.pdf

  • Sanja Halling | 13 June, 2016

    Hi Lise, and thanks a lot for your input, it is very useful for us!

    Regarding mapping to archival information particularly, we were looking at both EAD and EAC (for person/organisation records), and even raw data from the relational database. We faced challenges in defining relationships between different information levels in archival data, which are basically more dependent on their context and therefore can not always be understood equally well if context is not described first. We are going to publish some more information about the overall results of the project.

    However, what we were trying to examine in this part of the project was the possibility of connecting domain-specific data models and CIDOC CRM. The basic idea was to see whether a common extensible ontology could enable institutions to provide as much information as linkable open data as they want to. Basic common models for cultural heritage metadata (built on the lowest common level) often need to be developed further later on, and the extensibility of CIDOC CRM could perhaps be a pragmatic path towards a future in which the institutions themselves move, step by step, from collection databases to linked data with a common ontology. This could also be a way of improving the interoperability of cultural heritage data while making it linkable. Those issues are also connected to our previous blog post on linked data and the future of aggregation: http://digisam.se/index.php/hem/entry/linked-data-and-the-future-of-aggregation

    Thank you also for the update about the work being done by the International Council on Archives to develop an integrated ontology (EGAD) for archival description, it will be very interesting to follow!

    Best regards,

    Sanja