Enhancing CIDOC-CRM and compatible models with the concept of multiple interpretation

Modelling cultural heritage and archaeological objects is used as much for management as for research purposes. To ensure the sustainable benefit of digital data, models benefit from taking the data specificities of historical and archaeological domains into account. Starting from a conceptual model tailored to storing these specificities, we present, in this paper, an extended mapping to CIDOC-CRM and its compatible models. Offering an ideal framework to structure and highlight the best modelling practices, these ontologies are essentially dedicated to storing semantic data which provides information about cultural heritage objects. Based on this standard, our proposal focuses on multiple interpretation and sequential reality.


INTRODUCTION
Modelling cultural heritage (CH) and archaeological data is a research topic shared by a broad scientific community. Although the question of CH information modelling has been extensively studied, we believe that some aspects still have to be tackled. In that respect, we wish to point out two key points: the modelling of all available data about a given item, including hypothetical or refuted data, and the management of the entire lifecycle of an item, with the changes which have affected, or will affect, it. This means taking into consideration not only its past states, but also its current and future states (such as treatment, predictive modelling or restoration). This approach requires digitally preserving all kinds of scientific information relating to cultural heritage objects, in the broadest sense of the term. This preservation is intended to be linked with the facts and arguments on which scientific information is constructed. One of the consequences of this method is an increase in the number of indexed data. Index-linking facilitates subsequent data reuse for the creation of new proposals. All these concepts were integrated in a model that we developed and presented in 2014 (Van Ruymbeke et al., 2015) and that we will from now on call the Multiple Interpretation Data Model (MIDM).
In this paper, we pursue this research with a proposal for an extended mapping to CIDOC-CRM (Le Boeuf et al., 2017) and its compatible models. We explain below how the existing CRM classes, properties and explored paths can almost entirely cover the notions of the MIDM. We also explain that some additions are necessary to complete the coverage. In particular, we put forward two ways of providing these additions. Finally, we outline future research perspectives.

ARCHAEOLOGICAL DATA SPECIFICATION
Although cultural heritage management and archaeology share some common study subjects, their goals are completely different. Where cultural heritage management acts to preserve "places of cultural significance" (The Burra Charters, 2013, p. 1), archaeology excavates to unearth information about the past. Even if the two disciplines record overlapping data, the target of archaeological study is the past, whereas cultural heritage management turns towards present and future generations.

Archaeological objects and archaeological views
As Dean Saitta (Saitta, 2014) recently summarized: «on the one hand, archaeology is a rigorous search for truth about the ancient past. On the other, it is a political dialogue with the present.». These short sentences, and the lines following them in Saitta's paper, highlight two important facts:
- Archaeological objects are information carriers: to reach the past reality that they tirelessly track down, archaeologists use all the data and all the peer-approved methods available to them. The data that they use are not only observations and analyses made during archaeological excavations, but also archaeological objects, cultural heritage buildings, etc. Beyond their intrinsic value and their architectural or artistic merits, these objects are also appreciated for the information that they carry. This information is of great interest to historians and archaeologists: it teaches them about the past context of the object and, thanks to it, events, people and culture are gradually revealed to present-day researchers. To reach this goal, it is therefore imperative that data, even the most insignificant, be searched for, gathered and preserved.
- Archaeological views are interpretations: most archaeological views, speeches and papers are the result of interpretive reasoning that includes the subjectivity of the author. Moreover, this reasoning is undeniably influenced by the context of its creation. Scientific information expressed by a researcher is the result of many influences, such as social, political, economic and ideological ones, not to mention epistemological backgrounds.

Imperfection of archaeological data
Even if it is well known that archaeological data are incomplete, imprecise, fuzzy, uncertain and sometimes contradictory (Desjardin et al., 2012), this imperfection poses growing problems in the digital era and more specifically in GIS implementation. As Jeffrey Stuart says (Stuart, 2014): «A further significant focus for Archaeological Informatics is the representation of uncertainty. This is still considered a challenge more generally in informatics, but has particular implications for systems holding cultural heritage information. Many aspects of cultural heritage defy precise definition, geographically, temporally, and culturally, and even where the subject matter is amenable to some form of precise definition, there is often a lack of certainty due to incomplete evidence or competing interpretation.».

Multiple Interpretation Data Model background
Originally intended to semantically enrich a 3D scan of a hundred-year-old city mock-up, and to interact with digitized figurative and literal data (engravings, old maps, archives, bibliography, …), this conceptual model has evolved over the years (Billen et al., 2012; Pfeiffer et al., 2013; Van Ruymbeke et al., 2014, 2012, 2008). The proposed version (Figure 1) was designed in 2014. In this model, the historical reality and the information about it were clearly separated into different classes. Indeed, the Life Map class was dedicated to storing all information relating to the Historical Object class. This distinction allowed for the separation of historical reality from the hypotheses describing it.
In addition, Life Map was able to gather all available information, even if it was contradictory or refuted. The Episode, Version, and Event classes stored and managed various perfect and imperfect information relating to a historical object's state (the imperfection was mainly geometrical, chronological, or semantic ambiguity or incompleteness). Lastly, the Interpretative Sequence class ensured the organization of episodes into different ordered paths, following the different scientific hypotheses.

A necessary mapping
Considering the scientific and technological emergence of the semantic web and ontological standards, it appeared necessary to transform our conceptual model into an RDF ontology (RDF - Semantic Web Standards). It also seemed reasonable to adopt cultural heritage standards and thereby join a community of scientists and users.

Which standard?
To reach these goals, considering CIDOC-CRM and its compatible models was the obvious choice. In fact, recent papers (Ronzino, 2015; Ronzino et al., 2016a) showed just how prevalent CIDOC-CRM is in the cultural heritage and semantics domain.
In addition, we identified in this model the flexibility and the richness necessary to map our own model. Moreover, its compatible models offer a wide range of interesting extensions.

Advantages of CIDOC-CRM and Compatible models
CIDOC-CRM (Le Boeuf et al., 2017) is an ontology developed more than twenty years ago. First dedicated to homogenizing museum inventory databases, it expanded and became an international standard for cultural heritage in 2006 (ISO 21127:2006). Enriched by several extensions, it now concerns not only the cultural heritage domain and its semantics but also related activities.
The CRMinf model provides the ability to link semantic proposals with the steps (observation, inference making, belief adoption) of the reasoning leading up to them. A very recent paper proposes using events to express reliability with coefficients (Niccolucci and Hermon, 2016).
To easily link CIDOC-CRM to GeoSPARQL, CRMgeo, an ontology integrating spatiotemporal properties of CIDOC-CRM items, proposes to separate real-world classes (called phenomenal classes) from information classes (called declarative classes) (Hiebel et al., 2016). This distinction between the real world and the world described by information concerns the time and geometry dimensions only.
CIDOC-CRM and its compatible models ensure the modelling of various streams of information. CIDOC-CRM has been designed to "accommodate alternative opinions and incomplete information" (Le Boeuf et al., 2017). To that end, most properties are quantified as optional and repeatable for their domain and range ("many to many (0,n:0,n)"). However, other cardinalities may be used, and some properties of CIDOC-CRM or its compatible models are very constrained, notably in CRMarchaeo or CRMba.

Core mapping
The core mapping (Figure 2) uses existing classes, properties, and paths of CIDOC-CRM and compatible models to encompass most of the concepts of the MIDM. It constitutes the backbone of the two proposals described hereafter. It also relies on the concept of an "object's identity", understood as "the property intrinsic to each object which allows it to be differentiated from all others" (Billen and Hallot, 2016).

The formal language:
This mapping and the two extension proposals adopt the formal language and the naming conventions applied in CIDOC-CRM (Le Boeuf et al., 2017) and compatible models. Classes are identified by numbers preceded by letters. They are named using nominal groups. The letters used are as follows: "E" for CIDOC-CRM classes, "S" for CRMsci classes, "I" for CRMinf classes, "A" for CRMarchaeo classes, "B" for CRMba classes, and "SP" for CRMgeo classes. Properties are also identified by numbers preceded by letters. Unlike classes, they are named using verbal phrases. The letters used are as follows: "P" for CIDOC-CRM properties, "O" for CRMsci properties, "J" for CRMinf properties, "AP" for CRMarchaeo properties, "BP" for CRMba properties, and "Q" for CRMgeo properties.
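As an illustration only, the naming convention described above can be sketched as a small lookup in Python. The function name `classify_identifier` and the returned tuple layout are assumptions made for this sketch; only the prefix letters come from the convention itself.

```python
import re

# Prefix tables transcribed from the naming convention described above.
CLASS_PREFIXES = {
    "E": "CIDOC-CRM",
    "S": "CRMsci",
    "I": "CRMinf",
    "A": "CRMarchaeo",
    "B": "CRMba",
    "SP": "CRMgeo",
}
PROPERTY_PREFIXES = {
    "P": "CIDOC-CRM",
    "O": "CRMsci",
    "J": "CRMinf",
    "AP": "CRMarchaeo",
    "BP": "CRMba",
    "Q": "CRMgeo",
}

# Letters followed by a number, e.g. "E92", "SP1", "Q11".
_IDENTIFIER = re.compile(r"([A-Z]+)(\d+)")


def classify_identifier(identifier):
    """Return (kind, model, number) for an identifier such as 'E92' or 'Q11'.

    Hypothetical helper: it only decodes the prefix, it does not check
    that the numbered class or property actually exists in the model.
    """
    match = _IDENTIFIER.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a CRM-style identifier: {identifier!r}")
    prefix, number = match.group(1), int(match.group(2))
    if prefix in CLASS_PREFIXES:
        return ("class", CLASS_PREFIXES[prefix], number)
    if prefix in PROPERTY_PREFIXES:
        return ("property", PROPERTY_PREFIXES[prefix], number)
    raise ValueError(f"unknown prefix: {prefix!r}")
```

Because the whole letter prefix is matched before the lookup, "SP1" is read as a CRMgeo class rather than as a CRMsci class "S" followed by a stray "P".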

Historical Object:
The main class of the MIDM was defined as follows: "a consistent group of elements belonging to the same body from its emergence until its disappearance. The body in question can be an architectural body, a professional corporate body, a human body, etc." (Van Ruymbeke et al., 2015). We assume that this could correspond to an S15 Observable Entity, phrased in CRMsci in these terms: "This class comprises instances of E2 temporal Entity or E77 persistent Item, i.e.: items or phenomena that can be observed, either directly by human sensory impression, or enhanced with tools and measurement devices, such as physical things, their behaviour, states and interactions or events." (Doerr et al., 2017b). This definition and the hierarchical place of class S15 include a wide range of classes: Built Work (Doerr et al., 2017a; Ronzino, 2016; Ronzino et al., 2016b), Man-Made Object (Le Goff et al., 2014; Marlet et al., 2015), Stratigraphic Unit (Doerr et al., 2017a), but also Actor and Person, and all classes that descend from the Conceptual Object class (Le Boeuf et al., 2017). In this hierarchy, it is important to emphasize that all classes that descend from E92 Spacetime Volume (that is, E4 Period, E18 Physical Thing and their subclasses) occupy (properties Q1 and Q2, cardinality many to one, necessary (1,1:0,n)) a Phenomenal Spacetime Volume (Hiebel et al., 2015). This class has a temporal and a spatial projection (properties Q3 and Q4, cardinality one to one (1,1:1,1)), which can be described by instances of declarative spatial or temporal classes (Hiebel et al., 2015).

Version:
S16 State, a subclass of E2 Temporal Entity, is described in CRMsci as follows: "This class comprises the persistence of a particular value range of the properties of a particular thing or things over a time-span." (Doerr et al., 2017b). We assume that it partially encompasses the MIDM Version class. In other words, we see S16 State as a phenomenal Version, that is to say a step in the spatial and functional evolution of an item. Admittedly, S16 State is not a subclass of E92. As a result, it does not occupy an SP1 Phenomenal Spacetime Volume. However, if the item to which the state belongs is itself an instance of a subclass of E92 Spacetime Volume, and thus occupies or has occupied an SP1 Phenomenal Spacetime Volume, one can consider that the spatial values that occurred during the time span of the state constitute the spatial projection of that state.

CRMinf paths, belief and reliability
Assured by CRMinf paths (Stead et al., 2015a, 2015b), the link with the Source class explored in the MIDM is deeply enriched. Thanks to this model, the entire development of an argumentation can be detailed. It allows for complete traceability, which also includes the formulators of a hypothesis. Moreover, a recent paper suggests the possibility of adding an index of reliability (Niccolucci and Hermon, 2016).
Phenomenal and declarative classes created in CRMgeo for extents in space, time, and space-time separate the real world from the world described by information. Thanks to this, declarative classes store multiple, imperfect and conflicting data, while the unique reality remains in phenomenal or general classes. Indeed, "in the real world, exact spatiotemporal properties of phenomena [Periods (E4) or Physical Things (E18)] cannot be known due to factors such as fuzzy boundaries of the phenomena and errors in measurements. Nevertheless, the spatiotemporal properties exist and CRMgeo introduces them as Phenomenal Spacetime Volume (SP1), Phenomenal Place (SP2) and Phenomenal Time Span (SP13) as subclasses of Spacetime Volume (E92), Place (E53) and Time Span (E52)." (Hiebel et al., 2016).
Unlike the spatiotemporal properties of phenomena, which are hard to perceive in the real world, their semantic properties can be more easily discerned by contemporaneous observers. But most of the phenomena described in CIDOC-CRM occurred in the past. Consequently, our knowledge of their properties depends on historical and archaeological sources. Dedicated to storing semantic contents (covered by the Function class in the MIDM), CIDOC-CRM and compatible models can describe in detail information about real phenomena by use of properties or paths. These properties and paths, however, provide no distinction between reality and the information depicting it. In the Multiple Interpretation Data Model, this difference was expressed by the cardinality (1,n) between Historical Objects and Interpretative Sequences, because we assumed that it is important to specify whether we model reality or information about it. Reality is supposed to be unique and true, whereas information can be varied, fuzzy and uncertain.

Reality is sequential
One final aspect of the MIDM is not yet present in CIDOC-CRM and compatible models: the sequence of events. Just as constructed works can be divided into morphological building sections (Ronzino, 2016; Ronzino et al., 2016a, 2016b), we assume that all phenomena (for example a building life cycle) can be divided into different moments corresponding to the succession of their different states. We assume that such a succession occurs in reality and must of course be the subject of historical and archaeological hypotheses. Even if we can model different states, different events and different properties in CIDOC-CRM and compatible models, there is no class for sequences as such. There are two key advantages to having a specific class for sequences: the possibility of discretizing reality into smaller entities, and consequently the possibility of linking information to each of them.

Multiplicity management
With the current state of CIDOC-CRM and compatible ontologies, multiple instances of semantic information regarding a given reality can be stored in two ways: by keeping only the most recent one and therefore losing the versioning (Bruseker et al., 2015; Stead et al., 2015b), or by adding information layers. This addition of layers is nothing but an accumulation.
In the archaeological domain, research subjects stretch over the long term and produce a huge amount of data. It is thus necessary to organise this data. This organization would support data reliability evaluation, semantic indexing, linking with sources and arguments, and so on. Thanks to it, researchers would easily be able to find previous information and reuse it in new reasoning.

Objectives of proposed extensions
The present proposals aim at separating reality from positions held about it, breaking down reality into event sequences, allowing several different models of a given event or sequence, ensuring documented versioning of knowledge, and enabling links between positions targeting a given item to create new working hypotheses or new arguments. They are expressed as extension proposals added on top of the classes and properties of CIDOC-CRM and its compatible models.
To differentiate between reality and the discourse held about it, and to model interpretative sequences, we propose to follow the track built by Hiebel, Doerr and Eide (Hiebel et al., 2015) and to add (proposal A), or identify (proposal B), declarative classes to model functional (or semantic) parts of information. In both cases, a new class is also proposed for sequential aspects of phenomena. At the current stage of our research, we are investigating several extension possibilities. Two of them are described below. For simplicity, the extension proposals have been called "Multiple Interpretation Data Ontology Proposals". New classes are identified by numbers preceded by the letter M. New properties are identified by numbers preceded by the letters MP.

Extension proposal A (Figure 3)
The first proposal consists in creating five classes and four properties (M1, M2…). Most of them were conceived for semantic modelling. This aspect corresponds to the MIDM "Function" notion, which is here epitomized by M1.

The M1 class, Semantic Dimension, comprises all the semantic contents of a material or immaterial phenomenon. These semantic contents may be explicit or implicit, known or unknown, unique or multiple. It can be seen as all the real facts of which a phenomenon is comprised.
To take an example, the semantic content of the event "the murder of Caesar" would include all real facts and real persons implicated in the event: the exact location and date, the murderers, the witnesses, the weapon, Caesar's last words, the fatal outcome and so on. M1 is a superclass of S15 Observable Entity. It can be understood as the semantic equivalent of E92 Spacetime Volume. It gathers all significant contents of the entities and activities constituting a complex entity.

4.3.2
The M2 class, Phenomenal Semantic Contents, represents the global contents carried by a phenomenon during its existence. This class corresponds to the real semantic contents of an instance. In historical and archaeological domains, it is impossible to describe these contents in their entirety. At the very least, one can approximate them by way of hypothetical discourses. An instance of M2 could be, for example, one of the functions carried out by a built work, or the symbol it represents, or one of its owners.

4.3.3
The M3 class, Declarative Semantics Contents, includes all information describing the semantic dimension of an object. We propose to use this class to store hypotheses relating to an item or its evolution. Historical and archaeological discourses could find their place in this class. Like declarative classes in CRMgeo, M3 Declarative Semantics is a subclass of E89 Propositional Object. It is also a subclass of M1 Semantics Dimension.

4.3.4
The M4 class, Semantics Expression, includes all means of expressing the contents of M3 Declarative Semantics. Indeed, declarative semantics contents are most often expressed in the form of text, but they could also be instances of ontology relations. Like SP5 Geometric Place Expression, SP12 Spacetime Volume Expression and SP14 Time Expression, it is a subclass of E73 Information Object. We propose to make it a subclass of E62 String and I4 Proposition Set (as Stead (2014) showed, an instance of a CIDOC relation can be an I4 Proposition Set). We propose to add this I4 ancestry to the expressive classes of CRMgeo. This hierarchical dependence is of importance for the "source" paths presented above.

4.3.5
The M5 class, Sequence, is the new class for a sequence of events constituting a phenomenon in the real world. It is built from one or more instances of S16 State. M5 Sequence is the range of property MP4 "constitutes", whose domain is S16 State.

4.3.6
The MP1 property: M2 Phenomenal Semantics is the range of property MP1 "carries (is carried by)", whose domain is S15 Observable Entity. This property can be seen as equivalent to CRMgeo properties Q1 and Q2 "occupied". Since MP1 expresses a state, we phrase it in the present tense (Le Boeuf et al., 2017). Q1 and Q2 are quantified as many to one, necessary (1,1:0,n). We assume that this should not be the same for MP1: each phenomenon could have an unlimited quantity of semantic contents, but must have at least one. We would quantify this property as many to many, necessary (1,n:0,n).
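To make the difference between the two quantifications concrete, the following Python sketch checks a set of property instances against a cardinality written in the CIDOC-CRM notation. It assumes the usual reading of that notation, in which the first pair bounds the number of property instances per domain instance and the second pair bounds the number per range instance. The function names and the sample instances ("church", "worship", "landmark") are hypothetical.

```python
from collections import Counter


def parse_cardinality(text):
    """Parse '(1,n:0,n)' into ((dom_min, dom_max), (rng_min, rng_max)).

    An upper bound of 'n' (unbounded) is returned as None.
    """
    domain_part, range_part = text.strip("() ").replace(" ", "").split(":")

    def bounds(part):
        low, high = part.split(",")
        return int(low), (None if high == "n" else int(high))

    return bounds(domain_part), bounds(range_part)


def violations(links, domain_instances, range_instances, cardinality):
    """List the instances whose number of links breaks the cardinality.

    links is a collection of distinct (domain_instance, range_instance) pairs.
    """
    (d_min, d_max), (r_min, r_max) = parse_cardinality(cardinality)
    per_domain = Counter(d for d, _ in links)
    per_range = Counter(r for _, r in links)

    def out_of_bounds(count, low, high):
        return count < low or (high is not None and count > high)

    problems = [i for i in domain_instances if out_of_bounds(per_domain[i], d_min, d_max)]
    problems += [i for i in range_instances if out_of_bounds(per_range[i], r_min, r_max)]
    return problems


# Hypothetical MP1 instances: one S15 carrying two M2 semantic contents.
mp1_links = [("church", "worship"), ("church", "landmark")]
allowed = violations(mp1_links, ["church"], ["worship", "landmark"], "(1,n:0,n)")
rejected = violations(mp1_links, ["church"], ["worship", "landmark"], "(1,1:0,n)")
```

Under the quantification proposed for MP1 the two links are accepted, whereas the quantification of Q1 and Q2 would reject the second semantic content for the same entity.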

The MP2 property: M3 is the domain of property MP2 "approximates", whose range is M1 Semantics Dimension. As is the case with Q11, Q12 and Q13 (Hiebel et al., 2015), this property approximates a semantic dimension. It does not state the quality or accuracy of the approximation, but states the intention to approximate the semantic dimension.

4.3.8
The MP3 property: M4 is the domain of property MP3 "defines Semantics Contents". Like Q10, Q14, and Q16, it associates an instance of M4 with an instance of M3 Declarative Semantics Contents whose contents it defines, and "syntactic variants or use of different scripts may result in multiple instances of M4 defining exactly the same M3" (Hiebel et al., 2015).
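The classes and properties of proposal A can be summarized in a small data structure. The Python sketch below records only the superclass links and the domains and ranges stated in the text; identifiers are used without their labels, and the helper names `is_subclass` and `check_statement` are assumptions of this sketch. Links that the text does not state (for example between M2 and M1) are deliberately left out.

```python
# Direct superclass links stated in the text for proposal A.
SUPERCLASSES = {
    "S15": ["M1"],               # M1 is a superclass of S15 Observable Entity
    "S16": ["E2"],               # S16 State is a subclass of E2 Temporal Entity
    "M3": ["E89", "M1"],         # declarative semantics
    "M4": ["E73", "E62", "I4"],  # semantics expression (E62 and I4 are proposed)
}

# property -> (domain, range), as stated for MP1 to MP4.
PROPERTIES = {
    "MP1": ("S15", "M2"),  # carries (is carried by)
    "MP2": ("M3", "M1"),   # approximates
    "MP3": ("M4", "M3"),   # defines Semantics Contents
    "MP4": ("S16", "M5"),  # constitutes
}


def is_subclass(cls, ancestor):
    """True if cls equals ancestor or reaches it through stated superclass links."""
    if cls == ancestor:
        return True
    return any(is_subclass(parent, ancestor) for parent in SUPERCLASSES.get(cls, []))


def check_statement(prop, subject_class, object_class):
    """True if the classes fit the domain and range recorded for the property."""
    domain, range_ = PROPERTIES[prop]
    return is_subclass(subject_class, domain) and is_subclass(object_class, range_)
```

For instance, `check_statement("MP2", "M3", "S15")` holds in this sketch, because S15 is recorded as a subclass of M1, the range of MP2.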

MIDM Version, Interpretative Sequence and Life Map classes:
As shown in Figure 3, property P67 "refers to" and its subproperties, specifically P129 "is about", link E89 to E1. E89 is the superclass of M3 Declarative Semantic Content.
Instances of these properties whose domain is a declarative class and whose range is an S16 State constitute the MIDM Versions of an item. If the range is an instance of M5 Sequence, instances of P67 or P129 constitute MIDM Interpretative Sequences. All instances of P67 and P129 referring to the same S15 Observable Entity constitute the MIDM Life Map of a Historical Object (=S15).
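The rule just described can be read as a simple classification of P67 and P129 instances by the class of their range. The following Python sketch illustrates it under one explicit assumption: each reference directly records the S15 Observable Entity it concerns (field `observable`), since the way a state or a sequence is tied back to its entity is not detailed here. The `Reference` type and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """One instance of P67 'refers to' or P129 'is about'."""
    declarative: str   # identifier of the declarative instance (M3 or E89)
    prop: str          # "P67" or "P129"
    target: str        # identifier of the instance referred to
    target_class: str  # class of the target, e.g. "S16", "M5" or "S15"
    observable: str    # assumption: the S15 the target belongs to


def midm_role(ref):
    """Return the MIDM notion a reference stands for, or None."""
    if ref.prop not in ("P67", "P129"):
        return None
    if ref.target_class == "S16":
        return "Version"
    if ref.target_class == "M5":
        return "Interpretative Sequence"
    return None


def life_map(refs, observable):
    """All P67/P129 references concerning one S15 Observable Entity."""
    return [r for r in refs if r.prop in ("P67", "P129") and r.observable == observable]


# Hypothetical instances for a single building.
refs = [
    Reference("hypothesis-1", "P129", "state-1", "S16", "building"),
    Reference("hypothesis-2", "P129", "sequence-1", "M5", "building"),
    Reference("hypothesis-3", "P67", "state-9", "S16", "bridge"),
]
```

In this sketch the Life Map is not a stored object but a view computed over the references, which matches the idea that it gathers everything said about one historical object.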

Extension proposal B (Figure 4)
The second proposal dramatically simplifies the first one. It works on the assumption that a semantic dimension is intrinsically included in the CRM classes, and therefore in S15 Observable Entity and its subclasses. Consequently, we suggest considering E1 Entity as equivalent to the MIDM Function class.
From this point of view, to distinguish reality from information, we assume that E89 Propositional Object can be seen as the semantics declarative class, and that I4 Proposition Set may be considered as the semantics expression class.

Distinction between reality and discourse about it:
In the two proposals described above, the instances of the semantic dimension of the real world and of the information world belong to different classes. But, as shown in Table 2, semantics are encompassed differently in each proposal.

Sequence Modelling:
With the M5 Sequence class, real-world phenomena can be divided into as many states as necessary. Thanks to declarative classes (M3 or E89 depending on the proposal) targeting semantic contents, the organization of successive states into sequences can be the subject of unlimited hypotheses (M4 or I4). Such hypotheses, which can be seen as "Interpretative Sequences", are feasible without the addition of other classes or properties.

Multiplicity management:
Multiple semantics modelling is often missing from information systems (Bruseker et al., 2015; Guillem et al., 2015; Stead et al., 2015b). Included in our initial version, multiplicity is allowed by declarative classes and particularly by the semantics declarative class.
Indeed, if we (again) take the battle of Trafalgar example, well known to CIDOC-CRM followers, our approach adds the possibility to model, for the same E92 Spacetime Volume at the end of the battle, two declarative states for the French ship "Le Redoutable": its sinking or its capture by the English navy. We could also model two different names for the ship: "Le Redoubtable" (Hiebel et al., 2015, 2016) or "Le Redoutable", without "b" before "t" (Beatty, 1825). This example shows that, in the approaches proposed above, semantic versioning is restored.
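The contrast between overwriting and versioning in this example can be sketched in a few lines of Python. The tuple layout, the key "ship" and the two storage functions are hypothetical and serve only to illustrate the idea; the source of a statement is left as None where none is named above.

```python
from collections import defaultdict

# (item, aspect, declared value, source) for the Trafalgar example.
STATEMENTS = [
    ("ship", "state at end of battle", "sunk", None),
    ("ship", "state at end of battle", "captured by the English navy", None),
    ("ship", "name", "Le Redoubtable", "Hiebel et al., 2015, 2016"),
    ("ship", "name", "Le Redoutable", "Beatty, 1825"),
]


def keep_latest(statements):
    """Overwrite strategy: only the last value per (item, aspect) survives."""
    store = {}
    for item, aspect, value, _source in statements:
        store[(item, aspect)] = value
    return store


def keep_all(statements):
    """Declarative strategy: every value is kept together with its source."""
    store = defaultdict(list)
    for item, aspect, value, source in statements:
        store[(item, aspect)].append((value, source))
    return dict(store)


latest = keep_latest(STATEMENTS)
versions = keep_all(STATEMENTS)
```

With the first strategy the earlier name and the earlier state are lost, while the second keeps both alternatives for each aspect, which is the behaviour expected from declarative classes.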

CONCLUSIONS
Semantics and knowledge modelling are ubiquitous concepts in CIDOC-CRM and compatible models. Their latest improvements empower modellers to store several versions of information regarding geometrical and temporal data. Although CIDOC-CRM is an unconstrained model, the management of multiple semantic information remained an issue. In this paper, we showed through possible extensions that the distinction between a real phenomenon and the discourse held about it makes this management easier. We propose to extend CIDOC-CRM with the key concept of the MIDM, "Sequence", and we envisage two different options to model phenomenal and declarative semantic data.
Although going through sequential reality aids in understanding it, this concept is not present in the classes defined by CIDOC-CRM and compatible models. This is the reason why we have proposed creating a new class: M5 Sequence. In the future, this class could help to model dynamic phenomena, such as contemporary artistic installations. Proposal A isolates semantic data and separates them from the rest of the CIDOC-CRM classes. It therefore makes the difference between phenomenal and declarative data easier to see. Proposal B, by contrast, uses existing CRM classes. The difference between phenomenal and declarative semantic content is thus less evident, but the extension is simpler and easier to implement. Conceptual theory aside, we now have to experiment, and either validate or modify our proposals with the help of practical testing.

Figure 1. Multiple Interpretation Data Model

3.3.4
Other classes: In this mapping, E5 Event encompasses the MIDM Event class. Considering class structuring in CIDOC-CRM and compatible models, our Episode class, which generalized any change affecting an item, can be assimilated into class E2 Temporal Entity. The Figure and Agent classes match the CIDOC classes E21 Person and E39 Actor.

Figure 2. Mapping of MIDM on CIDOC-CRM and compatible models

Figure 3. Extension Proposal A

In extension proposal B, only one class and one property are created: M5 Sequence and MP4 "constitutes". As shown in Figure 4, the MIDM classes Life Map, Version and Interpretative Sequence are here also instances of P67 and P129.

Table 2. Separate worlds with semantic dimension.