RECONCILING CITY MODELS WITH BIM IN KNOWLEDGE GRAPHS: A FEASIBILITY STUDY OF DATA INTEGRATION FOR SOLAR ENERGY SIMULATION

Solar energy simulations are used to quantify the potential of the passive use (daylight, solar gains) and the active use (photovoltaics and solar thermal) of solar energy. The simulations can be performed at different scales, e.g. buildings, neighbourhoods and cities, each with different data requirements. For example, neighbourhood simulations need simplified building geometries, which can be retrieved from city models, and window information, which can be extracted from BIM models (in many cases window information is missing in city models). In this context, city models and BIM need to be integrated and reconciled. In this paper, we investigate two approaches to integrating and retrieving such information in a case study where the BIM data is stored in IFC and the city model in CityGML (LOD2). The first approach is to perform schema matching in an ETL tool, so as to convert and import window information from the IFC file into the CityGML model to create a LOD2-3 building model. The second is a semantic web approach, in which both the BIM and the city model are transformed into knowledge graphs (linked data). City models and BIM utilize their respective but interlinked domain ontologies. In particular, two ontologies are investigated for BIM data: the ifcOWL ontology and the Building Topology Ontology (BOT). This paper compares these paths of integrative data retrieval and identifies the gaps, mainly in the semantic web approach, that must be closed to unlock its potential.

GeoBIM is often realised through data integration. Most commonly, BIM data is converted to geospatial data (often in the form of city models, e.g. CityGML). Such an approach of data integration usually relies on schema matching and conversion methods, which are often implemented using Extract, Transform, Load (ETL) tools. GeoBIM could also be accomplished with an integrated query approach (Karan and Irizarry, 2015). In such an approach the BIM and the geospatial data reside in separate environments, but there is a common query interface to the data.
One way to implement an integrated query approach is to use knowledge graphs (i.e. linked data in the Resource Description Framework (RDF) model), which are the means of representing knowledge and data on the Semantic Web. A knowledge graph consists of data stored in triples, where each triple consists of a subject, a predicate and an object (head entity, relation and tail entity). Knowledge graphs are developed on the basis of ontologies, in order to describe domain knowledge, to make data more interpretable and reusable by others, and to enable machines to infer implicit information. Knowledge graphs have been increasingly utilized in both the BIM and the geospatial domains (Rasmussen et al., 2019; Huang et al., 2018; Huang et al., 2019). In addition, some early studies on aligning these knowledge graphs have been performed, see e.g. Delgado et al. (2013).
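To make the triple model and the idea of inferring implicit information concrete, the following is a minimal sketch in plain Python. It is not an RDF library and not part of the study: all identifiers (`ex:window1`, `ex:Window`, and so on) are invented for illustration, and only a simple subclass rule is applied.

```python
# Each triple is (subject, predicate, object). All names are illustrative assumptions.
TRIPLES = {
    ("ex:window1", "rdf:type", "ex:Window"),
    ("ex:Window", "rdfs:subClassOf", "ex:BuildingElement"),
    ("ex:BuildingElement", "rdfs:subClassOf", "ex:PhysicalObject"),
    ("ex:building1", "ex:hasElement", "ex:window1"),
}


def superclasses(cls, triples):
    """Return all direct and indirect superclasses of cls."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def inferred_types(entity, triples):
    """Asserted types plus the types implied by the subclass hierarchy."""
    asserted = {o for s, p, o in triples if s == entity and p == "rdf:type"}
    implied = set()
    for cls in asserted:
        implied |= superclasses(cls, triples)
    return asserted | implied
```

Here `ex:window1` is only asserted to be an `ex:Window`, yet `inferred_types` also reports it as a building element and a physical object, which is the kind of implicit information an ontology lets a machine derive.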
The general aim of this paper is to evaluate the knowledge graph based data integration approach for GeoBIM from a feasibility perspective. This is demonstrated in a case study of solar energy simulations, which entails synthesised information extraction. We believe the results and insights drawn from the case study can be generalised to other GeoBIM applications. The specific objectives are to answer the following research questions:
- Are there suitable ontologies available for GeoBIM applications?
- What are the potential benefits and drawbacks of using a knowledge graph based approach for GeoBIM?

Data requirement in solar energy simulations
Solar energy simulations are mainly performed on three platforms: Computer Aided Design (CAD), BIM and Geographic Information Systems (GIS). In most CAD programs, users can build a 3D model of a single building, a neighbourhood or a whole city. Most municipal urban planning departments work with CAD tools to design a detailed development plan (Kanters & Wall, 2016). CAD programs then perform the solar energy analyses through an external program, such as Radiance (see footnote 1), and feed the results back into the 3D modelling environment. BIM software also provides the possibility to conduct solar energy analyses, sometimes through a built-in analysis tool, sometimes via an external plugin (Jakica, 2017). Geodata (city models) is mainly used at the city level, usually to develop solar maps: platforms that show the inhabitants of a city the solar potential of their roofs (and facades) (Eicker et al., 2014).
The data requirements for solar energy analysis differ slightly between the three platforms, but in general the analyses require geometry (at least building surfaces, and preferably a digital elevation model and vegetation data) and weather data. The requirements on level of detail and geometric quality of the data have been studied by e.g. Biljecki et al. (2015). Solar energy analyses differ in purpose. Active solar energy analyses quantify how incoming solar energy can be transformed into electricity or heat. Daylight simulations and Building Performance Simulations (BPS), which simulate the energy need of buildings, require more detailed information, for instance details about the construction of the walls, fenestration details (e.g. total area and properties of windows), occupancy behaviour, and heating, ventilation and air conditioning (HVAC) settings. Fenestration details are not only needed for BPS; they are also of interest from an indoor working environment perspective, since it has been shown that access to incoming daylight increases employees' productivity and real estate value (Ander, 2003; Figueiro et al., 2002; Turan et al., 2020; Yang & Nam, 2010).

Data integration of BIM and geodata
To perform accurate solar energy simulations, it is vital that the data required for the simulation is available, e.g. that windows are included in the building model(s) used. Most of the 3D city models available today are at LOD1-2 (Donkers et al., 2016), namely models with simple roof shapes and without windows.
For buildings that are available as BIM models (both existing and planned buildings) a solution would be to integrate the BIM model with an existing 3D city model (geodata) to enrich the city model with required details.
A common approach to integrating BIM and geodata is to apply schema matching to convert BIM data in IFC format to CityGML. This conversion is, however, challenging due to differences in coordinate systems, geometric representations, and storage and access methods, as well as semantic mismatches between BIM and GIS data models (for reviews see e.g. Liu et al., 2017; Zhu et al., 2018).
Several studies have addressed the semantic mapping of element types in IFC (Industry Foundation Classes) to CityGML (Isikdag & Zlatanova, 2009; El-Mekawy et al., 2012; de Laat & van Berlo, 2011). Only 60-70 of the 900 element types in IFC can be converted to CityGML (de Laat & van Berlo, 2011), and there is no one-to-one mapping between element types. As an example, an IfcSlab element can be a roof/ceiling or a floor surface in a CityGML model. Some studies have extended CityGML with application domain extensions (ADEs) (de Laat & van Berlo, 2011; Stouffs et al., 2018) to support the richer semantics in IFC. An example relevant to this study is that IFC has attributes for the height and width of a window, while in CityGML the size of a window must be calculated from the geometry (de Laat & van Berlo, 2011).
1 https://github.com/NREL/Radiance/releases
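The difference between reading a window size from attributes and deriving it from geometry can be sketched as follows. This is an illustration under stated assumptions, not code from the study: the attribute names `OverallHeight` and `OverallWidth` are the IfcWindow attributes assumed to hold the size, the coordinates are invented, and the polygon is assumed to be planar with its vertices given in order.

```python
import math


def area_from_attributes(window):
    """IFC-style: the size is stored directly as attributes of the window."""
    return window["OverallHeight"] * window["OverallWidth"]


def polygon_area_3d(vertices):
    """CityGML-style: derive the area from a planar boundary polygon in 3D.

    Vertices are given in order, without repeating the first vertex at the end.
    """
    nx = ny = nz = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1, z1 = vertices[i]
        x2, y2, z2 = vertices[(i + 1) % n]
        # Sum the cross products of consecutive position vectors;
        # the result is twice the area times the unit normal of the polygon.
        nx += y1 * z2 - z1 * y2
        ny += z1 * x2 - x1 * z2
        nz += x1 * y2 - y1 * x2
    return 0.5 * math.sqrt(nx * nx + ny * ny + nz * nz)


# Invented example: a 1.2 m wide, 1.5 m high window in a vertical wall.
ifc_window = {"OverallHeight": 1.5, "OverallWidth": 1.2}
citygml_ring = [(2.0, 5.0, 1.0), (3.2, 5.0, 1.0), (3.2, 5.0, 2.5), (2.0, 5.0, 2.5)]
```

Both routes give about 1.8 square metres for this rectangle, but the geometric route needs the full boundary ring, which is why the mapping between the two models is not a simple attribute copy.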
The different geometric representations are another challenge for the integration: IFC uses solids (CSG and sweep volumes), whereas CityGML uses boundary representations. In addition, it is common that there is no information in the IFC models stating which elements belong to the outer shell of a building (Donkers et al., 2016). This means that, to find the outer walls of a building, all IfcWall elements must be retrieved and the outer walls identified based on their geometries. Examples of studies that have performed both semantic and geometric conversions are Donkers et al. (2016), who converted an IFC model to a valid LOD3 CityGML building model by mapping attributes from IFC to CityGML and applying morphological operators to achieve valid geometries, and Stouffs et al. (2018), who developed a method to perform a lossless conversion from IFC to CityGML with a triple graph grammar (TGG) method and by creating a CityGML ADE with additional classes and attributes.

Knowledge graphs and semantic web
Over the last two decades, semantic web technologies have been increasingly appreciated in both the geospatial and BIM domains, due to the evident and pressing need for data exchange and integration both within and across domains. Semantic web technologies have also been proposed as a solution to the BIM and geodata integration problem (Karan and Irizarry, 2015; Liu et al., 2017).
Knowledge graphs are seen as a promising way to break data silos in today's big data era, and have become a backbone of many AI applications, including search engines, recommendation systems and question answering. A knowledge graph is a multi-relational graph composed of entities (nodes) and relations (edges). Each edge is represented as a triple of the form (head entity, relation, tail entity), indicating that two entities are connected by a particular relation. Such an intuitive data model provides an infrastructure for organizing data into connected graph structures, so that multi-source and heterogeneous data can be interlinked and integrated.
SPARQL is the most commonly used query language for RDF data (knowledge graphs). SPARQL can be used to express queries across diverse data sources, whether the data is stored natively in RDF or viewed as RDF via middleware. Such a query language provides opportunities for querying across the GeoBIM data sources, instead of integrating BIM and geodata with an ETL process.
Knowledge graphs (linked data) have also been recognized as a promising means of lifting BIM to maturity level 3, i.e. a level at which data and processes are exchanged purely on a web scale and are fully integrated across disciplines and companies (Rasmussen et al., 2019).
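The idea of one query spanning both data sources can be illustrated with a small conjunctive pattern matcher. This is a hedged stand-in written in plain Python, not SPARQL and not a real RDF store: the prefixes `city:` and `bim:`, the predicate `bim:hasWindow` and all instance names are invented, and the matcher supports only exact terms and `?variables`.

```python
def match(pattern, triple, binding):
    """Try to extend binding so that pattern equals triple; return None on conflict."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable either gets bound now or must agree with its earlier value.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def query(patterns, triples):
    """Evaluate a conjunction of triple patterns and return all variable bindings."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return bindings


# Two separately produced graphs plus one link between them (all names assumed).
CITY = {("city:bldg_12", "rdf:type", "city:Building")}
BIM = {
    ("bim:building_A", "bim:hasWindow", "bim:win_1"),
    ("bim:building_A", "bim:hasWindow", "bim:win_2"),
}
LINKS = {("city:bldg_12", "skos:exactMatch", "bim:building_A")}

# One query over the union of the graphs: windows of every city-model building.
results = query(
    [
        ("?b", "rdf:type", "city:Building"),
        ("?b", "skos:exactMatch", "?m"),
        ("?m", "bim:hasWindow", "?w"),
    ],
    CITY | BIM | LINKS,
)
windows = sorted(r["?w"] for r in results)
```

The point of the sketch is that neither source is converted into the other's schema; the shared variable `?m` joins the two graphs at query time.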

Ontologies for CityGML and IFC data
Knowledge graphs have become prevalent for data exchange and integration on the Web, for both geospatial data (CityGML city models) and BIM data from the Architecture, Engineering, Construction, Owner and Operation (AECOO) industry. In this context, ontologies have been designed for both domains to serve as formal foundations for data integration and exchange.
A number of ontologies have been developed in the geospatial domain over the last decades, owing to the widespread use of geospatial data in various domains and applications. One of the most prominent outcomes is the standardisation of GeoSPARQL by the Open Geospatial Consortium (OGC), which includes a vocabulary for representing geospatial data as well as an extension of SPARQL for querying geospatial data in knowledge graphs (Perry and Herring, 2012). However, GeoSPARQL does not currently support 3D city models such as CityGML, in spite of active discussions in OGC; such support has been added to the list of further developments of GeoSPARQL. For CityGML, some works have transformed the CityGML data schema into ontologies. In this study, we utilize the CityGML ontology developed by the University of Geneva.
In the BIM domain, the design of ontologies for IFC data has gained attention, mainly for information exchange across the different sectors involved in the building process. A pioneering initiative in this direction is ifcOWL (Pauwels and Terkaj, 2016). ifcOWL is mainly based on a direct transformation of the IFC EXPRESS schema, and the major consideration in its design is backward compatibility with that schema. As a consequence, ifcOWL has two major drawbacks: it is complex and large. It is particularly complex because it does not follow best practices in the semantic web; for example, it models relations as classes. Many endeavours have been made to overcome these drawbacks. In this context, a World Wide Web Consortium (W3C) community group was formed to address linked building data, i.e. the W3C LBD CG. After several years of incubation, the group designed the Building Topology Ontology (BOT), which provides a high-level description of buildings such as storeys and spaces, the building elements that they contain, and their web-friendly 3D models (Rasmussen et al., 2019). BOT significantly simplifies the representation of building data on the web, and thus eases its integration with other types of data (e.g. geospatial data and sensor web data). Nevertheless, it does not include the representation of concrete geometries of the building elements, which are among the most important information in building models. The rationale is that geometry concepts are used in many domains and applications beyond building information, and thus ontologies for geometric information should be developed separately from BOT. The approaches for representing construction-related geometries in knowledge graphs have been summarised by Wagner et al. (2020). The core of geometry representation is the Ontology for Managing Geometry (OMG).
OMG is an ontology for attaching geometry descriptions to the corresponding objects (e.g. building objects) at three levels, depending on whether metadata of the geometries needs to be incorporated. OMG can be extended by the File Ontology for Geometry formats (FOG) and the Geometry Metadata Ontology (GOM). For details, refer to Wagner et al. (2020) and Bassier et al. (2020).
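The remark that ifcOWL models relations as classes, whereas BOT uses direct relations, can be made concrete with a small comparison. The sketch below is illustrative only: the instance names are invented, and the property names follow what we understand to be the naming conventions of ifcOWL and BOT, so they should be checked against the published ontologies before use.

```python
# ifcOWL style: the containment relation is itself a node, an instance of a relation class.
IFCOWL_STYLE = {
    ("inst:rel_1", "rdf:type", "ifc:IfcRelContainedInSpatialStructure"),
    ("inst:rel_1", "ifc:relatingStructure_IfcRelContainedInSpatialStructure", "inst:storey_1"),
    ("inst:rel_1", "ifc:relatedElements_IfcRelContainedInSpatialStructure", "inst:window_1"),
}

# BOT style: a single direct relation between the storey and the element.
BOT_STYLE = {("inst:storey_1", "bot:containsElement", "inst:window_1")}


def elements_ifcowl(storey, triples):
    """Two hops: find the relation nodes for the storey, then their related elements."""
    relations = {
        s for s, p, o in triples
        if p == "ifc:relatingStructure_IfcRelContainedInSpatialStructure" and o == storey
    }
    return {
        o for s, p, o in triples
        if s in relations and p == "ifc:relatedElements_IfcRelContainedInSpatialStructure"
    }


def elements_bot(storey, triples):
    """One hop: follow the direct containment relation."""
    return {o for s, p, o in triples if s == storey and p == "bot:containsElement"}
```

Both functions return the same window, but the ifcOWL-style graph needs three triples and a two-step traversal for what BOT expresses in one triple, which is one reason queries over BOT tend to be shorter.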

Background
The aim of this feasibility study is to investigate the potential benefits and drawbacks of using a knowledge graph based approach for GeoBIM. For that purpose we apply a use case from solar energy applications in which we need to know the basic geometry of the building and its fenestration. Furthermore, it is assumed that this information should be in CityGML format (LOD3) to be used in the solar energy simulation program. To create the input data for the solar energy simulations, we have used two approaches. The first is a schema matching solution using an ETL tool, where all the information is retrieved from a BIM model. The second is a knowledge graph approach, where we start with a CityGML LOD2 model and a BIM model and, based on these, create the required CityGML LOD3 data.

Data
The building KTH demohuset is used as test data in this study. Its IFC model was created as an example model for educational purposes.

A schema matching solution using an ETL tool
The IFC model was converted to CityGML version 2.0 with the ETL tool Feature Manipulation Engine (FME) from Safe Software (https://www.safe.com/). First, all IfcWall elements from the IFC model were extracted and transformed to surfaces, and a ray-casting method was applied to find the outer wall surfaces (see Olsson, 2018, for further details). Then surfaces were created to fill all openings in the wall surfaces. For holes (blue in Figure 1), surfaces were created by filling the openings; for openings that were aligned to an edge of a wall (yellow in Figure 1), the convex hull of the wall surface was used to find the surface edge that was not closed by the wall.
Figure 1. One outer wall surface with two openings, where one opening is a hole (blue) and one opening is aligned to the edge of the wall surface (yellow).
Spatial matching was performed to pair the opening surfaces with the corresponding IfcWindow or IfcDoor elements in the IFC file. Finally, the upper surface of the IfcRoof element was extracted and a gml:closureSurface was added to the bottom of the model.
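The idea behind using ray casting to find outer walls can be sketched in a simplified 2D plan view. This is not the method of Olsson (2018), which operates on 3D surfaces in FME; it is a minimal illustration under the assumption that walls are line segments and that a wall is an outer wall if a ray cast from its midpoint along at least one of its two normals hits no other wall. Concave footprints (e.g. courtyards) would need a more careful test.

```python
def cross(ax, ay, bx, by):
    """2D cross product (z-component)."""
    return ax * by - ay * bx


def ray_hits_segment(origin, direction, segment, eps=1e-9):
    """True if the ray origin + t * direction (t > 0) intersects the segment."""
    (ox, oy), (dx, dy) = origin, direction
    (ax, ay), (bx, by) = segment
    ex, ey = bx - ax, by - ay
    denom = cross(dx, dy, ex, ey)
    if abs(denom) < eps:
        return False  # ray and segment are parallel
    t = cross(ax - ox, ay - oy, ex, ey) / denom  # position along the ray
    u = cross(ax - ox, ay - oy, dx, dy) / denom  # position along the segment
    return t > eps and -eps <= u <= 1 + eps


def outer_walls(walls):
    """Indices of walls with at least one side from which no other wall is hit."""
    result = []
    for i, ((ax, ay), (bx, by)) in enumerate(walls):
        midpoint = ((ax + bx) / 2, (ay + by) / 2)
        normals = [(-(by - ay), bx - ax), (by - ay, -(bx - ax))]
        others = [w for j, w in enumerate(walls) if j != i]
        if any(not any(ray_hits_segment(midpoint, n, w) for w in others) for n in normals):
            result.append(i)
    return result


# Invented floor plan: a 10 m x 10 m square with one interior partition at x = 4.
walls = [
    ((0, 0), (10, 0)),    # south
    ((10, 0), (10, 10)),  # east
    ((10, 10), (0, 10)),  # north
    ((0, 10), (0, 0)),    # west
    ((4, 0), (4, 10)),    # interior partition
]
```

For this plan, `outer_walls(walls)` selects the four perimeter walls and rejects the partition, because rays from the partition are blocked on both sides.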
To link the CityGML objects to the corresponding IFC elements in the IFC file, the GUIDs from the IFC model were used as gml:ids in the CityGML model, with the prefix IFC_ added to state the source of the gml:id.
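The identifier convention described above amounts to a reversible string mapping, sketched below. The helper names and the example GUID are hypothetical; only the `IFC_` prefix comes from the text.

```python
PREFIX = "IFC_"  # prefix stating that the gml:id originates from an IFC GUID


def gml_id_from_guid(guid):
    """Build the gml:id used in the CityGML model from an IFC GUID."""
    return PREFIX + guid


def guid_from_gml_id(gml_id):
    """Recover the IFC GUID, or return None if the gml:id has another source."""
    if gml_id.startswith(PREFIX):
        return gml_id[len(PREFIX):]
    return None


example_guid = "0aB3xK9LnDpQ7mVtR2sYwZ"  # invented 22-character GUID
example_gml_id = gml_id_from_guid(example_guid)
```

Because the mapping is reversible, a CityGML object can be traced back to its IFC element without a separate lookup table.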

GeoBIM in knowledge graphs
We first introduce the general system architecture. Subsequently, we showcase synthesised information extraction in knowledge graphs in a window information extraction case, i.e. extracting the window information for a LOD2 CityGML model through its link with an IFC model in the knowledge graph. For IFC data, we investigate both the ifcOWL and the BOT ontologies, to compare them and to evaluate whether the simplified BOT ontology can facilitate our GeoBIM data integration task. For CityGML, we only employ the CityGML ontology designed by the University of Geneva (cf. Section 2.4).
Figure 2 illustrates the overall system architecture of GeoBIM data integration in knowledge graphs. Simply put, the data from the respective sources are transformed into knowledge graphs (to RDF). The constructed knowledge graphs are interlinked at both the ontological and the instance level. An integrative query interface based on SPARQL (potentially GeoSPARQL in the future, if it is extended) can retrieve the integrated data in the knowledge graphs. The knowledge graphs and the query interface comprise the core of this approach. On top of them, programs and user interfaces can be developed to manipulate and consume the data.
ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume VI-4/W1-2020, 2020. 3rd BIM/GIS Integration Workshop and 15th 3D GeoInfo Conference, 7-11 September 2020, London, UK

System architecture
In this study, we transform the data into knowledge graphs based on their respective ontologies (CityGML and ifcOWL/BOT ontologies), and link the building instances through a relation in the SKOS vocabulary (https://www.w3.org/TR/skos-reference/), i.e. skos:exactMatch.
Figure 2. System architecture of GeoBIM data integration in knowledge graphs

Information extraction
The proposed system architecture enables information extraction based on synthesised data in knowledge graphs. The case study is to retrieve the window information, i.e. the total area of the windows of the building and the window geometries, to be used in e.g. solar energy simulations.
The information extraction is based on SPARQL queries. The queries vary depending on the ontologies adopted for the two data sources. The CityGML ontology is used for the city model data. When using the ifcOWL ontology for the IFC data, we are able to use the SPARQL query in Listing 1 in the Appendix to extract the total window area of the building, yet we are unable to construct a query to get the geometries of the windows, as the coordinates are stored in lists that need to be reconstructed. Note that the query gives the total window area of the building regardless of whether the windows are on an outer or an inner wall. In this test data there are no windows on inner walls, but for more complex buildings with windows on inner walls, the IFC model must state whether a window is internal or external using the property IsExternal; this property is, however, seldom used in real-world BIM models. An equivalent SPARQL query that extracts the total area of the windows using the BOT ontology is shown in Listing 2 in the Appendix.
With the BOT ontology, we can use the SPARQL query in Listing 3 in the Appendix to get the geometries of the windows. The building elements and geometries are associated using the OMG ontology (basic linking, i.e. level 1). In this example the geometries are given as JSON strings, which are created by converting the geometries to JSON in FME. Note that the representation of geometries when using the BOT ontology has not been standardised; using text strings is one plausible and simple avenue.
The results of the listings are RDF triples (i.e. knowledge graphs). To be used in the simulation program, these triples need to be converted to CityGML format. This step is not conducted in this feasibility study.
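Consuming geometries stored as JSON strings could look like the sketch below. The JSON layout (a GeoJSON-like polygon with 3D coordinates), the predicate `ex:hasGeometryJson` and the instance name are assumptions made for illustration; the actual structure produced by FME and the OMG property used in the study may differ.

```python
import json

# One triple whose object is a geometry serialised as a JSON string (layout assumed).
TRIPLES = [
    (
        "inst:window_1",
        "ex:hasGeometryJson",  # hypothetical stand-in for the geometry-linking property
        '{"type": "Polygon", "coordinates": [[[2.0, 5.0, 1.0], [3.2, 5.0, 1.0], '
        '[3.2, 5.0, 2.5], [2.0, 5.0, 2.5], [2.0, 5.0, 1.0]]]}',
    ),
]


def window_rings(triples, predicate="ex:hasGeometryJson"):
    """Map each element to the outer ring of its geometry, parsed from the JSON literal."""
    rings = {}
    for s, p, o in triples:
        if p == predicate:
            geometry = json.loads(o)
            ring = [tuple(point) for point in geometry["coordinates"][0]]
            if len(ring) > 1 and ring[0] == ring[-1]:
                ring = ring[:-1]  # drop the repeated closing vertex
            rings[s] = ring
    return rings


def bounding_box(ring):
    """Axis-aligned bounding box of a ring as (min corner, max corner)."""
    xs, ys, zs = zip(*ring)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


rings = window_rings(TRIPLES)
```

Storing the geometry as an opaque string keeps the graph small, but it also means that any geometric reasoning happens in application code after the query rather than inside the triple store.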

CONCLUDING REMARKS
In this paper, we demonstrate the feasibility of using knowledge graphs (semantic web technologies) for parts of the GeoBIM data integration. We find that the ontologies for these domains, i.e. CityGML and IFC, are becoming increasingly mature. The BOT ontology can significantly simplify the representation and querying of IFC data in knowledge graphs compared to ifcOWL, but the representation of geometries with the BOT ontology has not been standardised yet. For the CityGML ontologies, we anticipate an upgrade of GeoSPARQL in which 3D city models will potentially be supported.
A key step in the GeoBIM data integration process with knowledge graphs is the linking between the two data sources. Such linking includes alignment at both the ontological and the instance level. For the alignment of concepts and relations in the ontologies from the two domains, a number of studies have been performed (see e.g. Delgado et al., 2013), in which a number of methods for ontology matching have been developed. Nevertheless, we believe one challenge here is the standardisation of the ontologies. For BIM (IFC), the BOT ontology has gained momentum, yet it still has a long way to go. For 3D city models, we need standardised ontologies. We believe further development of GeoSPARQL would be promising in this respect. For the alignment at the instance level (building and building element entity alignment), we can partly reuse techniques developed for entity alignment in knowledge graphs, with either rule-based or machine learning methods. However, one significant challenge here is that vocabularies for representing meaningful relations between buildings and building elements from multiple data sources are needed, and these are lacking at the moment. We hope that improved ontologies can be realized from the ongoing cooperation between buildingSMART (founder of IFC) and OGC (founder of CityGML).
This feasibility study offers insights into the benefits and drawbacks of a knowledge graph approach to the GeoBIM data integration problem. Compared with the ETL methods that are commonly used at present (and illustrated in Section 3.3), the knowledge graph based approach has several advantages, ranging from knowledge formalisation, and data exchange and integration on the web, to the utilisation of mainstream web technologies and more compact data (especially when using the BOT ontology). However, the drawbacks of the knowledge graph approach should not be neglected, such as the lack of software support and the lack of genuinely standardised ontologies. Another highly relevant shortcoming is the lack of platforms that handle GeoBIM data in knowledge graphs, either as add-ons for RDF stores or in a way that lets specific programs import and consume such data. This shortcoming also applies to the user interface. SPARQL queries, as shown in Section 3.4.2, can be complicated to compose and require significant effort for users to grasp. In this regard, graphical interfaces are important for users to compose queries and potentially analyse the extracted data visually (see e.g. the user interface for geospatial linked business data in Gür et al., 2017).
To conclude, there is potential in using a knowledge graph approach in GeoBIM applications (such as solar energy simulations), but much work remains to formalise improved ontologies and to create tools that handle the knowledge graphs.