Towards Limiting Semantic Data Loss In 4D Urban Data Semantic Graph Generation

To enrich urban digital twins and better understand city evolution, the integration of heterogeneous, spatio-temporal data has become a large area of research in the enrichment of 3D and 4D (3D + Time) semantic city models. These models, which can represent the 3D geospatial data of a city and their evolving semantic relations, may require data-driven integration approaches to provide temporal and concurrent views of the urban landscape. However, data integration often requires the transformation or conversion of data into a single shared data format, which can be prone to semantic data loss. To combat this, this paper proposes a model-centric ontology-based data integration approach towards limiting semantic data loss in 4D semantic urban data transformations to semantic graph formats. By integrating the underlying conceptual models of urban data standards, a unified spatio-temporal data model can be created as a network of ontologies. Transformation tools can use this model to map datasets to interoperable semantic graph formats of 4D city models. This paper will firstly illustrate how this approach facilitates the integration of rich 3D geospatial, spatio-temporal urban data and semantic web standards with a focus on limiting semantic data loss. Secondly, this paper will demonstrate how semantic graphs based on these models can be implemented for spatial and temporal queries toward 4D semantic city model enrichment.


INTRODUCTION
The continuing improvement of semantic 3D city models has provided powerful tools for comprehending the dynamics of the constantly evolving urban landscape. Data-driven approaches, such as the construction of virtual environments like digital twins (Batty, 2018, Julin et al., 2018, Schrotter and Hürzeler, 2020), can help visualize and simulate city events and dynamics, both real and imaginary, over time (Biljecki et al., 2015, Chaturvedi and Kolbe, 2019, Jaillot et al., 2020, Samuel et al., 2020).
Currently, the Open Geospatial Consortium's (OGC) CityGML open data model is often used to facilitate the storage and exchange of virtual 3D city models. In this model, all city objects can be represented in up to five different, well-defined levels-of-detail (LOD0 to LOD4) with increasing accuracy and structural complexity (e.g., a building is composed of walls, roofs, etc.) (Kolbe, 2009). It defines the three-dimensional geometry, topology, semantics, and appearance of the most relevant topographic objects in the urban context (Gröger and Plümer, 2012).
The CityGML standard is still evolving (Kutzner et al., 2020) to represent all the complex data requirements of 4D (3D + Time) semantic city models (Chaturvedi and Kolbe, 2019) and may need enrichment through the integration of other heterogeneous data sources to achieve more complete or detailed views of the urban landscape. To facilitate integration, the use of ontologies has been promoted, as they provide flexible, machine-processable formalizations of data models as semantic graphs (Claramunt, 2020, Psyllidis, 2015). Ontologies then allow the creation of semantic urban graphs based on urban data, which may require the transformation or conversion of data towards a single common data format. However, transformation approaches are naturally prone to semantic data loss, and thus limiting this data loss poses a principal integration challenge.
Another challenge, within the context of multisource temporal urban data integration, is the inconsistent use of identifiers of city objects (Chaturvedi et al., 2017). Different snapshots of a city from different sources may use different identifiers for the same city objects between concurrent and successive representations of that city. Recently, graph-based change detection approaches (Jaillot et al., 2020, Nguyen and Kolbe, 2020) have been proposed to calculate which city objects are being referenced across different city snapshots, even when there are geometric or semantic differences between the same object in different temporal representations of the city.
As ontologies and graph formats have proven beneficial towards the semantic enrichment of interoperable multidimensional city models, this paper proposes a model-centric ontology-based data integration approach towards limiting semantic data loss in 4D semantic urban data transformations to semantic graph formats. This contribution consists of several aspects:
2. A data transformation approach for the generation of versioned CityGML 3.0 datasets as interoperable semantic RDF graphs.
3. Implementation of these graphs in a spatio-temporal triple store to facilitate geospatial and temporal queries on real-world city data.
The remainder of this paper is structured as follows: Section 2 describes related works in multidimensional semantic city models and spatio-temporal urban graph implementations; Section 3 details the proposed integration approach; Section 4 is dedicated to the implementation of the approach on a real-world city dataset and discusses its strengths and limitations; Section 5 concludes the proposed contribution, discusses future works on this research, and provides resources for reproducing the work.

RELATED WORKS
Although research in geometric and semantic data integration of CityGML with Building Information Model (BIM) standards like Industry Foundation Classes (IFC) has evolved considerably (Deng et al., 2016), temporal 4D semantic city model integration remains a large area of research. As previously stated, many existing standards lack rich, native temporal support, and generally require one of two types of integration: either a data model is extended to integrate temporal data formats, or multiple data models are integrated as a single, spatio-temporal data model. This data integration can also take place at two levels: data model integration at the conceptual, physical, or implementation level, and integration at the model instance or dataset level. Taking this into consideration, some existing works propose approaches to facilitate this integration for enriching 3D semantic city models.

4D Semantic Urban Data Model Extension
Extending a 3D data model requires starting with a strong, semantic data model. Since its adoption by the OGC in 2012, CityGML has become a commonly used data model standard for this purpose as it is geospatial, easily extensible through Application Domain Extensions (ADEs), and provides a modular semantic conceptual model for various domains of the urban landscape. In the case of CityGML-based 4D city models, the conceptual model is traditionally extended to allow better temporal data support (Chaturvedi et al., 2017). Some recent works allow the representation of concurrent and successive versions of city models with graph-based versioning frameworks (Samuel et al., 2020), while others allow for time series and time-dependent data representations (such as internet of things sensor data) (Biljecki et al., 2018).
These extensions are usually created manually and may be time-consuming to generate, verify, and update. Due to the utility demonstrated by these proposed models, the upcoming CityGML 3.0 standard will incorporate parts of these works as new Versioning and Dynamizer modules.

Ontology-Based 4D Urban Data Integration
When creating a spatio-temporal data model from two or more sources, a common data format is often used to facilitate the integration. For this purpose, much research has been conducted and official standards have been published on implementing 3D and 4D geospatial data using semantic web technologies, such as RDF (Resource Description Framework) and OWL (Web Ontology Language) (Bonduel et al., 2019, Brink et al., 2014, Lemmens et al., 2016, Métral and Falquet, 2018, Nuninger et al., 2020, Psyllidis, 2015). These proposals use machine-readable description logics to create ontologies for semantic modeling and integration.
Furthermore, integration approaches based on networks of ontologies have also been proposed for 4D semantic data integration (Hor et al., 2018, Tran et al., 2020, Psyllidis, 2015). These approaches are modular, as each ontology integrated into the network can describe a single domain of information or data model. Additionally, extensions to RDF's query language, SPARQL, such as GeoSPARQL and stSPARQL, can be integrated to support queries of 2D geometries and 2D geometric and topological relations, with stSPARQL also supporting temporal queries on graph data.
Works have also been proposed for the automatic generation of geospatial and spatio-temporal semantic ontologies from urban data. Some of these approaches use mapping transformations of conceptual urban data models from UML (Unified Modeling Language) to OWL (Brink et al., 2014, De Paepe et al., 2017). These approaches implement their own mappings or propose using the standard ISO 19150-2 transformation rules for geospatial data. Other approaches utilize mappings from physical urban data models as XML Schema to OWL (Vinasco-Alvarez et al., 2020, Vinasco-Alvarez et al., 2021, Usmani et al., 2020). These approaches all incur some amount of semantic loss during transformation. In general, UML to OWL transformations may have more well-defined mappings, as both modeling languages use similar concepts like classes and properties. However, even the ISO 19150-2 standard can be ambiguous (OGC, 2017), and XML Schema may provide better definitions of spatial semantics (Brink et al., 2014).

Summary
In summary, most of these proposals demonstrate effective approaches towards integrating 4D semantic city data for 4D semantic city model enrichment. Implementing strong data standards in these models and preserving their interoperability is key during integration, which requires minimizing semantic loss during transformation. Currently, the upcoming CityGML 3.0 promises to be one of the most popular data models in this context. This paper proposes a model-centric integration approach structured around CityGML 3.0, in its latest draft, and other spatio-temporal data standards. This approach incorporates network of ontologies integration and automated ontology generation approaches to limit semantic data loss of the original conceptual data models. It also demonstrates how this model can be used to generate interoperable 4D semantic graphs.

MODEL-CENTRIC DATA INTEGRATION FOR LIMITING SEMANTIC DATA LOSS
This section will define key concepts and the problem, introduce the proposed approach, and demonstrate its implementation.

Problem Definition
Within 4D semantic city models, the entities concerned are often referred to as city objects. These objects can represent buildings, roads, landmarks, and many other urban entities, and are composed of 3 types of data:
• Geospatial: the object's 2D and/or 3D geometric representation, combined with its terrestrial location, often defined by coordinate reference systems.
• Temporal: the period when the object exists, represented by two points in time referencing the object's creation and destruction.
• Semantic: data that can describe, categorize, or provide context to the object and its components or their relationships to other objects.
Integration of geospatial and temporal data is the most straightforward, as the values that represent these data are real numbers and timestamps. Their integration may require conversion but is only prone to a loss of numeric precision (e.g., loss of geometric or geospatial precision during the conversion of coordinate reference systems (Seeger, 2005)). In contrast, integrating semantic urban data, either as data models or as data, can be more challenging.
Semantic integration at the conceptual level can be done by combining or mapping concepts from 2 or more conceptual models. For this, each conceptual model must be represented in the same modeling language to facilitate integration. Since different modeling languages have varying amounts of expressivity, a concept may not be representable in its entirety or may have to be represented differently after transformation. This implies a loss of interoperability, and data based on the new model may also share this loss. In this paper, these results are referred to as semantic loss. Once the conceptual models are represented in a homogeneous modeling language, equivalent concepts can be related to one another to integrate the models. Once this conceptual integration is complete, integrated 4D semantic datasets can be generated based on this model to enrich 4D semantic city models.
The following section will introduce the proposed integration methodology for limiting semantic loss in 4D semantic urban data integration through 4D semantic graphs.

4D Semantic Model Integration
For the integration of different 4D semantic urban conceptual models, the approach uses a network of ontologies as a semantic model. To this end, the integrated model will be constructed from several existing conceptual models:
• CityGML 3.0 UML model's most recent draft, in particular the Building and Versioning modules for their urban and spatio-temporal semantics.
• GML 3.2 model represented through the ISO 19107 standard and GeoSPARQL's GML 3.2 ontology to support CityGML's geometric and geospatial semantics.
• GeoSPARQL and stRDF ontologies to provide a spatio-temporal query framework for semantic graph interaction.
These models were chosen for their synergy as a network of ontology integration approach. Most of these models have [...] can be implemented to create OWL ontologies from geospatial UML models. However, the use of this tool (ShapeChange) for this purpose has been tested by the OGC according to the Testbed 12 ShapeChange Engineering Report (OGC, 2017), and several ambiguities in the ISO 19150-2 standard have been identified which affect the CityGML 3.0 UML model to OWL transformation. The most notable of these is the ambiguous mapping of unions to OWL, which affects CityGML elements such as the union core:cityModelMember. This union groups several important properties of CityGML 3.0 city models that reference city objects, versions, and version transitions of the model.
To overcome this ambiguity, the Testbed 12 report proposes 3 ShapeChange configurations, of which the "flattening" approach is used for transforming unions: for each UML property or attribute A that has a union as its value type, a new property is created for each property option B of the union. The name of the new property is constructed by merging the names of A and B with a union separator as follows: [...] The results of this mapping are shown in figure 2. While this ap[...]

It is also important that the relationships between UML classes and their properties are preserved in the generated ontology. This is ensured by configuring ShapeChange with three mapping transformation rules:
1. Universal quantifications are used in the definition of classes to refer to any properties they may have.
2. Local naming conventions are used to distinguish between different properties of the same name.
3. The domain and range of properties are always declared.
This results in bidirectional references between classes and properties, which are useful for automatic semantic graph generation and help avoid inconsistencies when validating the ontology.
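The union flattening rule described above can be illustrated with a minimal sketch. This is not ShapeChange's implementation: the separator character, the union name CityModelMemberUnion, the option names versionMember and versionTransitionMember, and the attribute exampleAttribute are assumptions chosen only for illustration, since the source does not give the exact naming formula.

```python
from dataclasses import dataclass

# Hypothetical separator: the source does not state the one actually used.
UNION_SEPARATOR = "_"


@dataclass(frozen=True)
class UmlProperty:
    name: str
    value_type: str


def flatten_union_properties(properties, unions, separator=UNION_SEPARATOR):
    """Replace each property A typed by a union with one property per option B.

    `unions` maps a union type name to the list of its property options.
    Properties whose value type is not a union are kept unchanged.
    """
    flattened = []
    for prop in properties:
        options = unions.get(prop.value_type)
        if options is None:
            flattened.append(prop)
            continue
        for option in options:
            merged_name = f"{prop.name}{separator}{option.name}"
            flattened.append(UmlProperty(merged_name, option.value_type))
    return flattened


# Toy model: union and option names are illustrative, not taken from the standard.
unions = {
    "CityModelMemberUnion": [
        UmlProperty("cityObjectMember", "AbstractCityObject"),
        UmlProperty("versionMember", "Version"),
        UmlProperty("versionTransitionMember", "VersionTransition"),
    ]
}
city_model_properties = [
    UmlProperty("cityModelMember", "CityModelMemberUnion"),
    UmlProperty("exampleAttribute", "CharacterString"),
]

flat = flatten_union_properties(city_model_properties, unions)
flat_names = [p.name for p in flat]
```

In this toy model the single union-typed property becomes three properties, one per option, each keeping the value type of its option, so the range of every generated property can still be declared explicitly.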
Finally, ShapeChange is configured to map any geometric references to their equivalent in the GeoSPARQL and ISO standard ontologies. Any other configurations follow the OGC Testbed 12 recommendations, enabling ISO 19150-2 rules whenever possible. The final output of the transformation results in 17 linked ontologies, one for each module of the CityGML 3.0 Conceptual Model. Figure 3 exemplifies a mapping between the UML representation of a version and its representation in OWL 2. Next, links are made manually between the ISO 19107 and GeoSPARQL ontologies. Once validated, this is considered as a 4D semantic urban data network of ontologies.

Graph Generation of 4D Semantic City Models
In this section, the second part of the proposed methodology is demonstrated: how to generate 4D semantic datasets from the integrated network of ontologies. To do this, the class and property assertions (or the TBox) of the ontology network are queried to generate 4D urban data graphs as instances of these assertions (or the ABox). First, these queries must answer the following questions of an atomic datum within a 4D urban dataset:
• Which Class, Property, or Datatype in the network defines the datum?
• Does the datum represent geometry?
• Does the datum have temporal properties?
SPARQL queries can be implemented to answer these questions. For simplification, a city model as a CityGML 3.0 XML document is considered to be a set of nodes, where each node may have parents and/or children. Determining if a node can be a class or property instance is done using the node's XML tag and searching if a class or property assertion of the same identifier is in the network. Query 1 shows how this can be done for verifying if a node is an object property. This query uses a node's identifier "ntype" and its parent's identifier, "ptype", to find properties with "ptype" in their domain (line 5) or universal quantifications with "ntype" asserted by class restrictions (line 8). A regex function is used to filter these properties with local naming conventions (line 15). Once these types of queries are defined, they can be used to map city model datasets to the ABox of the ontology network as shown in algorithm 1.
Query 1 [...]
] .
14: }
15: FILTER regex(STR(?property), "ptype.* .ntype$")
16: }

Algorithm 1 iterates over all nodes of a given city model (line 1). For every node that is defined by a class assertion (line 2), a triple is created depending on whether each child of the node (line 4) is defined by a class assertion (line 5), an objectProperty assertion (line 8), a datatypeProperty assertion (line 14), or a datatype assertion (line 21). In all of these cases, the triple is added (lines 11, 16, 19, 23) to a graph that is returned in line 28.

Algorithm 1 Semantic Graph Generation
[...]
for gc in c.children do
[...]

During graph generation of CityGML datasets, datatypes from the GeoSPARQL standard are used to represent geometric and geospatial data. To determine when an XML node represents geometry in GML 3.2, GeoSPARQL and stRDF's rules for declaring geo:gmlLiteral values can be implemented according to the definition:

Valid geo:gmlLiterals are formed by encoding geometry information as a valid element from the GML schema ... in GML 3.2.1 this is every element directly or indirectly in the substitution group of the element {http://www.opengis.net/ont/gml/3.2}AbstractGeometry. (OGC, 2012)

By this definition, if an XML node is an instance of a subclass of geo:AbstractGeometry in the ontology network, the XML node represents geometry and its 2D geometric representation can be stored in a triple in the graph. The value of the triple is a copy of the XML data of the original node and its descendant nodes, encoded as a literal string. To represent temporal data, xsd:dateTime values are used to represent a point in time. In CityGML 3.0, every city object can use its core:creationDate and core:terminationDate properties as temporal points representing the beginning and end of its existence.
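The generation procedure described for Algorithm 1 can be approximated by the following simplified sketch. It is not the authors' algorithm: the TBox lookups, which the paper performs with SPARQL queries against the ontology network, are replaced here by hypothetical Python sets, triples are plain tuples rather than RDF terms, namespaces are omitted, and only the class, object property, and datatype property cases are covered.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the TBox of the ontology network.
CLASSES = {"CityModel", "Building"}
OBJECT_PROPERTIES = {"cityObjectMember"}
DATATYPE_PROPERTIES = {"creationDate", "terminationDate"}


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def generate_triples(root):
    """Walk the XML tree and emit (subject, predicate, object) tuples."""
    triples = []
    for node in root.iter():
        node_type = local_name(node.tag)
        if node_type not in CLASSES:
            continue  # only nodes defined by a class assertion become subjects
        subject = node.get("id")
        triples.append((subject, "rdf:type", node_type))
        for child in node:
            child_type = local_name(child.tag)
            if child_type in OBJECT_PROPERTIES:
                # Link the subject to each class instance nested under the property.
                for grandchild in child:
                    if local_name(grandchild.tag) in CLASSES:
                        triples.append((subject, child_type, grandchild.get("id")))
            elif child_type in DATATYPE_PROPERTIES:
                triples.append((subject, child_type, (child.text or "").strip()))
    return triples


# Toy document: element names mimic CityGML but the structure is simplified.
SAMPLE = """
<CityModel id="cm1">
  <cityObjectMember>
    <Building id="b1">
      <creationDate>2000-01-01T00:00:00</creationDate>
      <terminationDate>2015-06-01T00:00:00</terminationDate>
    </Building>
  </cityObjectMember>
</CityModel>
"""

triples = generate_triples(ET.fromstring(SAMPLE.strip()))
```

Keeping the TBox lookups behind simple membership tests makes it easy to swap in real queries against the ontology network later without changing the traversal itself.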

Methodology Implementation
Under the Urban Data Services and Visualization Project (UD-[...] apply this methodology to generate 4D semantic graphs as RDF from CityGML 3.0 data using the RDFLib Python library. Figure 4 illustrates the methodology proposed in sections 3.2 and 3.3. An additional preprocessing step is necessary before graph generation to prepare GML data for use with GeoSPARQL's 2D queries. This step adds coordinate-reference-system declarations if necessary and flattens 3D GML geometry to 2D. During generation, gml:id attributes are used as triple identifiers whenever possible. This also applies to XLinks, which are used to link CityGML 3.0's versioning data, such as a city model's version, to the corresponding city objects of that version. Afterward, these 4D semantic urban graphs were loaded into a Parliament triple store to implement spatio-temporal queries through GeoSPARQL.
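The preprocessing step mentioned above (declaring a coordinate reference system when missing and flattening 3D geometry to 2D) might look like the following sketch. It is an illustration under stated assumptions rather than the project's code: it handles only gml:posList elements, assumes each position has exactly three coordinates, and uses "EPSG:3946" purely as an example default CRS name.

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml/3.2"


def flatten_to_2d(geometry, default_srs_name):
    """Drop the z value of every gml:posList position and declare a CRS if absent.

    Assumes every posList holds 3D positions (x y z triples).
    """
    if geometry.get("srsName") is None:
        geometry.set("srsName", default_srs_name)
    for pos_list in geometry.iter(f"{{{GML}}}posList"):
        values = pos_list.text.split()
        # Keep x and y (indices 0 and 1 of each triple), skip z (index 2).
        xy = [v for i, v in enumerate(values) if i % 3 != 2]
        pos_list.text = " ".join(xy)
        pos_list.set("srsDimension", "2")
    return geometry


# Toy polygon: a sloped quadrilateral given as a closed 3D ring.
SAMPLE = (
    f'<gml:Polygon xmlns:gml="{GML}" gml:id="p1"><gml:exterior><gml:LinearRing>'
    '<gml:posList srsDimension="3">0 0 10 4 0 10 4 3 12 0 0 10</gml:posList>'
    "</gml:LinearRing></gml:exterior></gml:Polygon>"
)

polygon = flatten_to_2d(ET.fromstring(SAMPLE), "EPSG:3946")
pos_list = polygon.find(f".//{{{GML}}}posList")
```

Note that simply dropping z turns vertical surfaces such as walls into degenerate 2D rings, so a real pipeline would likely need to filter or repair such geometries before loading them into a triple store.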
The following section will detail how UD-Graph and this methodology are applied to an initial 4D semantic urban dataset.

Resulting Dataset
The 4D semantic urban datasets that are used in these experiments originate from the 1st district of the city of Lyon, France, as CityGML 2.0. The building and geospatial data from these datasets are converted to CityGML 3.0 with a CityGML 2 to 3 open-source conversion tool and then enriched with temporal versioning data and stored on the CityGML 3.0 Encoding GitHub. Using the proposed transformation methodology, an initial 4D semantic city model dataset was created. Figure 5 illustrates a historical succession of 2 versions of a city model containing 4 buildings. A version transition between these two versions is composed of 3 transactions (or changes): a building deletion (Transaction 1), a building replacement (Transaction 2), and a building insertion (Transaction 3). Figure 6 shows urban data classes and class instances generated from the dataset according to the CityGML building and core ontologies (highlighted blue in figure 1) and the GML geospatial ontologies (highlighted red in figure 1), visualized by OntoGraph.
Query 2 returns all the buildings that have a physical existence intersecting a given point of time.Lines 5-7,9-11 define the creation and termination dates of a building, i.e., the time period of its physical existence.Lines 8 and 12 filter the buildings that exist during the temporal point specified by the user (lines 3,4).
In this example, the query looks for buildings that existed on 01/01/2010.Also, note that the properties in the query use local naming conventions as mentioned in section 3.2.1.
Query 2 Find all buildings that existed at a temporal point
[...]

[...]
?cityobjectmember a ?type .
7: }
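The temporal condition described for Query 2 (a building's period of existence must contain the requested point in time) can be expressed outside SPARQL as a simple interval test. The sketch below is only an illustration: the building names and dates are invented, bounds are assumed to be inclusive, and a missing termination date is assumed to mean that the building still exists, which the source does not specify.

```python
from datetime import datetime


def existed_at(creation, termination, instant):
    """Return True if `instant` falls within [creation, termination].

    Assumption: a termination of None means the object has not been destroyed.
    """
    if instant < creation:
        return False
    return termination is None or instant <= termination


# Invented sample data: (creation date, termination date) per building.
buildings = {
    "building_1": (datetime(1990, 1, 1), datetime(2005, 1, 1)),
    "building_2": (datetime(2000, 1, 1), datetime(2015, 1, 1)),
    "building_3": (datetime(2012, 1, 1), None),
}

instant = datetime(2010, 1, 1)
existing = sorted(
    name
    for name, (creation, termination) in buildings.items()
    if existed_at(creation, termination, instant)
)
```

With this sample data, only building_2 spans 01/01/2010: building_1 was terminated before that date and building_3 was created after it.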

Discussion
This initial approach integrates standardized conceptual models of spatio-temporal and urban data to create integrated 4D semantic datasets as a first step toward 4D semantic city model enrichment. Unlike approaches based on extending the CityGML 2.0 model to support spatio-temporal data, the CityGML 3.0 conceptual model is rich enough to represent 4D semantic city models without extension. By implementing automated conversion tools such as ShapeChange, ISO standard generation of CityGML 3.0 as an ontology network is possible, and the benefits of ontology-based integration approaches can be obtained with minimal semantic data loss.
The CityGML 3.0 model is especially synergistic with this approach regarding the creation of 4D semantic data, since the OGC generates the physical data model as XML Schema directly from the UML model using ShapeChange. Therefore, class and property names from the UML model closely resemble their XML representation, and thus transformation mappings from CityGML 3.0 to semantic urban graphs lose minimal semantic data and remain interoperable with their original conceptual model.
In addition, semantic graph formats like RDF align well with CityGML 3.0's graph-like representations of city models in the Versioning module. By integrating geospatial query frameworks like GeoSPARQL, spatial navigation of these 4D semantic urban data graphs is possible. This can have many potential applications, like change detection of city objects between versions of a city model, which may rely on graph formats (Nguyen and Kolbe, 2020), and smart city applications based on Semantic Web technologies (Gaur et al., 2015, Bischof et al., 2014).
However, there are two modeling limitations of this approach to be addressed in future works. First and foremost, while the CityGML 3.0 application schema is directly generated from the conceptual model, the GML 3.2 model does not share this characteristic. Many discrepancies in naming conventions between the GeoSPARQL ontology, ISO 19107 models, and GML 3.2 data were identified. For example, the application schema uses gml:id as a unique local identifier for geometric entities, while this does not exist in the GeoSPARQL ontology, and ISO 19107 uses featureID as its identifier. To solve this, additional mappings are being integrated into the ontology network based on the ISO TC211 standards to GML 3.2 instance mappings implemented by Enterprise Architect, an ISO standard compliant UML modeling platform.
Secondly, as the generated CityGML 3.0 ontologies are constructed from UML's frame-based, "closed-world" assumption of the conceptual model, the mapping transformations to OWL's more "open-world" assumption of the conceptual model are subjective and require some interpretation (Brink et al., 2014, Cox, 2013). These ontologies are defined according to a more restrictive interpretation in order to preserve as many semantic relationships as possible. Because of this, while this ontology network works well to generate 4D semantic graphs, other applications of UML to OWL mappings may require a more "open-world" interpretation, depending on the purpose of the ontologies.
In addition, to improve the temporal queries proposed in this paper, spatio-temporal query frameworks such as stSPARQL can be used to implement more complex temporal queries and improve performance (Garbis et al., 2013).

CONCLUSIONS
The evolving urban analysis landscape requires the integration of multiple heterogeneous and autonomous models catering to describing the urban lifecycle. However, a direct translation of data in different evolving data formats to the desired format may lead to semantic data loss. This article presents a methodology for integrating 4D semantic urban data models as a network of ontologies. This standards-based UML transformation approach helps to preserve interoperability and reduce semantic loss. This paper also proposes a pipeline for generating integrated RDF data in conformance to this model. The data obtained with this approach were validated, and spatio-temporal queries were tested and executed.
Future courses of action include improving the existing semantic model and integrating more urban data models like cadastral data, city documents, and concurrent points of view of urban evolution. Also, the addition of bidirectional transformations could enable RDF to CityGML transformations for non-RDF ready applications. Finally, the implementation of more complex 3D and 4D urban data queries with SPARQL extensions like those provided by stSPARQL and BimSPARQL (Zhang et al., 2018) is being explored.

DATA AND REPRODUCIBILITY
Detailed notes and the code for reproducing the results of sections 3 and 4 can be found on Software Heritage. The generated CityGML 3.0 Versioning dataset is located under the folder /Transformations/test-data/RDF/OWL-basedtransformations.

Figure 1. 4D Urban Data Semantic model as a network of ontologies

Figure 2. UML Union Flattening example with core:cityModelMember and core:cityObjectMember (top) and their mappings in OWL (bottom)

Figure 4. Proposed 4D Semantic Model Integration and Graph Generation Pipeline

Figure 5. Diagram of an RDF Graph of a Historical Succession Transition Between 2 City Model Versions (above) with several Buildings (below)

[...] all the city objects in a version, v1, of a concurrent point of view. Line 3 verifies whether the version specified by the user is indeed a version, and lines 4 and 5 return all the city object members and their types for this given version.

Query 3 Find all city objects in a city model version
1: SELECT DISTINCT ?cityobjectmember ?type
2 [...]

Figure 6. RDF Graph of classes and instances concerning building data generated according to the 4D Urban Data Semantic model defined in Figure 1; CityGML 3.0 Building data is circled in blue; GML geospatial data is circled in red; dashed arrows represent instances of semantic properties (or relationships)