TOWARDS THE AUTOMATIC ONTOLOGY GENERATION AND ALIGNMENT OF BIM AND GIS DATA FORMATS

Establishing semantic interoperability between BIM and GIS is vital for geospatial information exchange. Semantic web technologies are naturally suited to providing seamless semantic representation and integration across heterogeneous domains such as BIM and GIS through the use of ontologies. Ontology models can be defined (or generated) from domain-data representations and then aligned with other ontologies by the semantic similarity of their entities, yielding cross-domain ontologies that achieve interoperability of heterogeneous information. However, due to the extensive semantic features of, and complex alignment (mapping) relations between, BIM and GIS data formats, many approaches fall short of generating semantically rich ontologies and performing effective alignment to address geospatial interoperability. This study highlights the fundamental perspectives to be addressed for BIM and GIS interoperability and proposes a comprehensive conceptual framework for automatic ontology generation followed by ontology alignment of open-standard BIM and GIS data formats. It presents an approach based on transformation patterns to automatically generate ontology models, and semantic-based and structure-based alignment techniques to form a cross-domain ontology. The proposed two-phase framework generates ontology models from input XML schemas (i.e. those of the IFC and CityGML formats) and illustrates an alignment technique for potentially developing a cross-domain ontology. The study concludes that the anticipated cross-domain ontology can provide future perspectives for knowledge-discovery applications and seamless information exchange between BIM and GIS.


INTRODUCTION
The future of urban planning and development has emerged as an important research area, motivated by the goal of building smart cities (Jamei et al., 2017). A smart city is manifested as a virtual representation of detailed, integrated geospatial information in the form of a digital twin. Open-standard data collection and representation using cutting-edge technologies such as Geographic Information Systems (GIS) and Building Information Modeling (BIM) have further accelerated the emergence of this research area (Fosu et al., 2015, Ma, Ren, 2017, Wang et al., 2019). Several studies have recently investigated the benefits of effective integration of BIM and GIS (Song et al., 2017), including from the perspective of infrastructure planning, development, and analysis. However, achieving integration between the two distinct domains of BIM and GIS is quite challenging (Pauwels et al., 2017).
The traditional methods for integrating BIM and GIS have highlighted issues such as information loss, software and data incompatibility, and use-case specific limitations (Liu et al., 2017, Noardo et al., 2020) (more details are provided in Section 2). In contrast, integration methods based on semantic web technology have shown a promising contribution towards achieving BIM and GIS interoperability (Karan et al., 2016, Hor et al., 2018).
Industry Foundation Classes (IFC) and City Geography Markup Language (CityGML) are prominent information exchange formats accepted by the BIM and GIS communities respectively, and are further used in integration methods. The Extensible Markup Language (XML) based versions of these open-standard formats are more flexible for integration. Because the semantic web is itself closely tied to an XML-based environment, these XML-based formats and the semantic web have been investigated together for interoperability. These technologies complement one another in terms of interdisciplinary information exchange that is extensible and flexible (Bikakis et al., 2013). Many efforts have been made to devise semantic web solutions for both ontology development and mapping (alignment) of the built ontologies. However, these potential solutions have limitations, primarily with respect to a fundamental of the semantic web technique: developing ontologies (Pauwels et al., 2017, Zhu et al., 2018).
The main contribution of this study is to highlight how the alignment of concepts between the domain ontologies of information systems can be adapted, which requires identifying correspondences. This approach in turn requires enriched ontology models to be generated beforehand so that a correlation between geospatial ontologies can be established. In particular, a conceptual framework with a neural network-based alignment approach for cross-domain correspondence is presented. The rest of the paper is organized as follows. Section 2 provides a comprehensive literature review of studies carried out on BIM and GIS integration, and further recognizes the role of the semantic web in BIM and GIS integration along with the room for improvement. A framework detailing ontology generation and alignment is presented in Section 3. In Section 4, a prospective experimental evaluation and the anticipated results are outlined. Section 5 summarizes this study and points to future directions.

LITERATURE REVIEW
Several studies in the literature provide critical, state-of-the-art reviews of BIM and GIS integration methodologies, weighing their domain-oriented and data-oriented strengths and weaknesses. With regard to the literature on ontology generation and alignment itself, semantic web techniques have been investigated separately for many years, mostly in the context of corpus (text) based solutions for mapping, matching and interoperable information exchange. The sections below provide a summarized review of both backgrounds and of how they pave the way for using the semantic web as a potential solution to the BIM and GIS integration problem.

State-of-the-Art: BIM and GIS Integration
Various efforts have been made to classify BIM and GIS integration methodologies, including by semantic or geometric level, and by unidirectional or bidirectional conversions involving commercial or open-source software (Fosu et al., 2015, Irizarry et al., 2013). Based on similar subject keywords, (Kang, Hong, 2015) showed that BIM and GIS integration has been approached from various perspectives and that those approaches can be classified into five groups: schema mapping (El-Mekawy, Östman, 2010, Deng et al., 2016), integrated web services (Cruz et al., 2004, Karan, Irizarry, 2015), ontological modelling (Karan, Irizarry, 2014, Peachavanish et al., 2006, Hor et al., 2016), and data transformations and schema extensions (El-Mekawy et al., 2012). Furthermore, a significant three-level framework by (Amirebrahimi et al., 2016) categorizes the integration studies into application, process and data levels.
As the literature above shows, the integration of building and geospatial information has been pursued for over a decade through various forms of mapping, schema modelling, service implementation, data transformation and schema extension using the standard IFC and CityGML formats. Nonetheless, no satisfactory route to interoperability has yet been established, since the data formats of cross-domain building elements are geometrically and semantically inconsistent (Pauwels et al., 2017, Zhu et al., 2018, Noardo et al., 2020). A study that compared GIS and BIM integration methods on the parameters of Effort, Extensibility, Effectiveness and Flexibility (EEEF) (Liu et al., 2017) identifies the semantic web as a much more promising solution for their integration than other methods. Therefore, this study proposes a semantic web based framework to achieve interoperability with potentially minimal loss of information.

State-of-the-Art: Ontology Generation and Alignment
The process of ontology development is complex and relies mostly on manual approaches (Liu et al., 2017). Expressing the correct semantics of data in an ontological representation itself requires domain knowledge. The ontology development literature describes semi-automated and fully automated processes for ontology generation (Hacherouf et al., 2015). Nonetheless, these established frameworks are designed for corpus-based approaches. Frameworks for generating ontologies for the integration of BIM and GIS formats are not widely available. Most studies follow a manual or semi-automated approach to generate ontology models (Karan et al., 2016, Hor et al., 2016), and in some cases a reference ontology is established for BIM and GIS integration (El-Mekawy, Östman, 2010, Deng et al., 2016).
Furthermore, to interlink entities (concepts) between ontologies, the ontology models of heterogeneous data formats require mapping techniques that yield a cross-domain integrated ontology for information analysis and knowledge-graph applications. Ontology alignment, also called ontology mapping, is the key to reaching interoperability over cross-domain ontologies (Raad, Evermann, 2015). The XML data formats of BIM and GIS, ifcXML and CityGML respectively, have heterogeneous representations; hence, their ontologies are distributed. It is therefore necessary to find an alignment between them before processing information across these domains. Ontology alignment has been investigated for several years, with specialized studies that help formally integrate ontologies or knowledge bases formed in different domains (Giunchiglia et al., 2012, Farah et al., 2016). These approaches are generally limited to corpus-based studies, so alignment knowledge for entities specific to the building and geospatial domains requires further investigation, as most approaches adopted for geospatial ontology alignment are either manual or fall short in mapping across entities (El-Mekawy, Östman, 2010, Deng et al., 2016). Hence, in this paper, we propose a framework to automatically generate ontology models from IFC and CityGML schemas. The study further extends towards aligning (mapping) geospatial ontology models, mainly by applying the semantic-based Word2vec algorithm (Mikolov et al., 2013) and the structure-based Node2vec algorithm (Grover, Leskovec, 2016) to the ontology alignment of the generated BIM and GIS ontologies.

METHODOLOGY
A conceptual framework is outlined in Fig. 1 as a composite of multiple processes and algorithms that accomplish the study objectives, defined in two phases: ontology generation and ontology alignment of BIM and GIS data formats. For the first phase, this study builds on previous work, Ontology Generation for Geospatial Data (OGGD) (Usmani et al., 2020), to generate ontology models in Web Ontology Language (OWL) format from an XML Schema Document (XSD). In the second phase, an innovative approach, Ontology Alignment for Geospatial Data (OAGD), is introduced that applies semantic and structural alignment techniques to the BIM and GIS ontologies generated in the previous phase. The details of the framework for achieving interoperability between the distinct data formats are discussed in the following sub-sections.

Ontology Generation of Geospatial Data
In the presented framework, OGGD carries out the preparatory steps for automatically generating an ontology from a given schema document, as presented in the first phase of Fig. 1. An XML Schema Document (XSD) is a set of XSD constructs arranged in a hierarchical structure to precisely describe and validate XML documents. The structural complexity of an XSD depends on the XML documents it describes, in this case ifcXML and CityGML, the open-standard formats for BIM and GIS respectively.
The framework of OGGD, itself, is based on three steps (see  (Ganter, Wille, 1999). In the second step, sets of patterns in the XSD constructs are identified using the FS(XS) model and transformation patterns in the FCA context. Further, if multiple patterns are identified for a set of XSD constructs, a pertinent-patterns algorithm is applied to identify the correct pattern precisely. Finally, each XSD construct associated with an appropriately identified transformation pattern is represented as an ontology fragment, and the fragments are later merged into an OWL model. Accordingly, for given ifcXML and CityGML schemas, the respective ontology models $O_{BIM}$ and $O_{GIS}$ are generated.
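To make the idea of pattern-based transformation more concrete, the following minimal Python sketch maps a toy XSD to ontology fragments. It is an illustration only: the three patterns used here (a named complex type becomes a class, an element of a built-in type becomes a datatype property, an element of a user-defined type becomes an object property) are simplifying assumptions, not the pattern set of OGGD, and the schema, the function name `xsd_to_owl_fragments` and the tuple layout are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical toy schema; it is not taken from ifcXML or CityGML.
TOY_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="BuildingType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="storey" type="StoreyType"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="StoreyType">
    <xs:sequence>
      <xs:element name="elevation" type="xs:double"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""


def xsd_to_owl_fragments(xsd_text):
    """Return (axiom kind, name, domain, range) tuples for a small XSD.

    Assumed patterns (illustrative only):
      named xs:complexType         -> Class
      element with built-in type   -> DatatypeProperty
      element with any other type  -> ObjectProperty
    Built-in types are recognised by the literal prefix "xs:", which is an
    assumption about how the schema is written.
    """
    root = ET.fromstring(xsd_text)
    fragments = []
    for ctype in root.iter(XS + "complexType"):
        class_name = ctype.get("name")
        if class_name is None:
            continue  # anonymous types are ignored in this sketch
        fragments.append(("Class", class_name, None, None))
        for elem in ctype.iter(XS + "element"):
            type_name = elem.get("type")
            if not type_name:
                continue  # elements without a type attribute are skipped
            kind = "DatatypeProperty" if type_name.startswith("xs:") else "ObjectProperty"
            fragments.append((kind, elem.get("name"), class_name, type_name))
    return fragments


for fragment in xsd_to_owl_fragments(TOY_XSD):
    print(fragment)
```

Keeping each pattern as a separate rule that emits an independent fragment mirrors the description above, in which fragments are produced per construct and merged into one model afterwards. A real schema would also need rules for attributes, extensions, nested anonymous types and lists, none of which are handled here.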

Ontology Alignment of Geospatial Data
The second phase of the proposed framework, OAGD, includes an alignment process to determine the correspondences (semantic relations) between concepts (entities) across ontologies, where the entities are the classes and individuals in the generated ontology models. It follows the alignment method proposed by (Geng et al., 2020) to form a cross-domain ontology from the BIM and GIS ontologies.
In this phase, features are first extracted from the $O_{BIM}$ and $O_{GIS}$ ontologies as sets of classes, properties and individuals along with their annotations, and stored in a formal structure for efficient retrieval of specific entities and their correspondences (Jiang et al., 2014). Feature extraction involves traversing the input ontologies and storing the labels or descriptions of entities in a hash table as key-value pairs. Next, to obtain the alignment and to infer potential ontology mapping relations among concepts (e.g., one-to-many, many-to-one, or many-to-many), the semantic-based Word2vec and structure-based Node2vec algorithms are applied to the generated hash table. These algorithms represent each concept (entity) of the ontology graph models as a vector, and the vectors are used to estimate the semantic similarity between entities, which constitutes a correspondence. From these vectors, an aggregated confidence value representing the similarity between two candidate entity nodes can be estimated, which potentially identifies whether the two entities are aligned according to their similarity levels. For example, the "Building" entity is one of the components in CityGML and may have multiple Level of Detail (LoD) instances, while in IFC it is referred to as "IfcBuilding" with a single instance. In this case, both entities carry the semantics of the same real-world entity (i.e. the "building"). This process of combining similarity measures is called similarity aggregation, and can be carried out using the weighted average similarity proposed in (Acampora et al., 2013), which gives the aggregated similarity value for a correspondence c as in Eq. 1:

$$sim_{agg}(c) = \frac{\sum_{i=1}^{h} w_i \, sim_i(c)}{\sum_{i=1}^{h} w_i} \qquad (1)$$

where $w_i$ is the weight associated with the $i$-th of the $h$ similarity measures, and $sim_i(c)$ is the similarity value computed by the $i$-th measure for a correspondence $c$ in an alignment $A = \{c_1, c_2, ..., c_k\}$ with $k$ correspondences.
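The feature-extraction step that opens this phase can be sketched as follows. This is a minimal illustration using only the Python standard library, assuming the ontology is serialized as RDF/XML; the toy ontology, the `example.org` IRIs and the function name `extract_features` are hypothetical, and a production pipeline would more likely use a dedicated RDF or OWL library.

```python
import xml.etree.ElementTree as ET

OWL = "{http://www.w3.org/2002/07/owl#}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"

# Hypothetical toy ontology in RDF/XML; the IRIs are placeholders.
TOY_OWL = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/bim#IfcBuilding">
    <rdfs:label>IfcBuilding</rdfs:label>
  </owl:Class>
  <owl:ObjectProperty rdf:about="http://example.org/bim#hasStorey"/>
  <owl:NamedIndividual rdf:about="http://example.org/bim#building_1"/>
</rdf:RDF>
"""

ENTITY_KINDS = ("Class", "ObjectProperty", "DatatypeProperty", "NamedIndividual")


def extract_features(owl_text):
    """Build a hash table mapping entity IRI -> {"kind", "label"}.

    The label is the rdfs:label annotation when present; otherwise the
    fragment after '#' in the IRI is used as a fallback (an assumption of
    this sketch).
    """
    table = {}
    root = ET.fromstring(owl_text)
    for kind in ENTITY_KINDS:
        for node in root.iter(OWL + kind):
            iri = node.get(RDF + "about")
            if iri is None:
                continue  # anonymous entities are skipped
            label = node.findtext(RDFS + "label")
            if label is None:
                label = iri.rsplit("#", 1)[-1]
            table[iri] = {"kind": kind, "label": label}
    return table


features = extract_features(TOY_OWL)
for iri, info in features.items():
    print(iri, info)
```

A Python `dict` is a hash table, so looking up an entity by IRI takes constant time on average, which is the retrieval property the text relies on. The labels stored here are the strings that an embedding step would later consume.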
A leading study (Geng et al., 2020) uses similarity measures, aggregated as a weighted average over two entity nodes, to determine the similarity of a candidate correspondence, which ranges from 0 to 1. A value close to 0 means that there is no alignment mapping between the two entities of the distinct domain ontologies, while a value close to 1 denotes an alignment mapping, which consequently indicates an equal-to relation between the two entities. In OAGD, the Word2vec and Node2vec algorithms are integrated to estimate the similarity among entities with the aim of better alignment results. By applying the weighted average similarity to the resulting vectors, the similarity between candidate entity nodes can be estimated, which potentially identifies aligned entities. The entities identified as aligned could then be integrated to produce a cross-domain ontology.
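The aggregation and the interpretation of the resulting score can be illustrated with a short Python sketch. Everything numeric here is made up for illustration: the small vectors stand in for embeddings that a semantic model and a structural model would produce, the equal weights and the 0.8 decision threshold are hypothetical choices, and clipping cosine similarity to [0, 1] is an assumption made so that the score matches the range described above.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def aggregate_similarity(similarities, weights):
    """Weighted average of h similarity values, normalised by the weight sum."""
    if len(similarities) != len(weights):
        raise ValueError("one weight is required per similarity measure")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, similarities)) / total


# Toy vectors standing in for semantic and structural embeddings (assumed values).
SEMANTIC = {
    "Building": [0.9, 0.1, 0.2],
    "IfcBuilding": [0.8, 0.2, 0.1],
    "IfcDoor": [0.1, 0.9, 0.3],
}
STRUCTURAL = {
    "Building": [0.7, 0.3],
    "IfcBuilding": [0.6, 0.4],
    "IfcDoor": [0.2, 0.8],
}


def correspondence_score(gis_entity, bim_entity, weights=(0.5, 0.5)):
    """Aggregated similarity in [0, 1] for one candidate correspondence."""
    sims = [
        cosine_similarity(SEMANTIC[gis_entity], SEMANTIC[bim_entity]),
        cosine_similarity(STRUCTURAL[gis_entity], STRUCTURAL[bim_entity]),
    ]
    # Clip to [0, 1]: cosine can be negative or exceed 1 by rounding error.
    sims = [min(1.0, max(0.0, s)) for s in sims]
    return aggregate_similarity(sims, weights)


THRESHOLD = 0.8  # hypothetical cut-off for accepting a correspondence

for bim_entity in ("IfcBuilding", "IfcDoor"):
    score = correspondence_score("Building", bim_entity)
    print(bim_entity, round(score, 3), "aligned" if score >= THRESHOLD else "not aligned")
```

Dividing by the sum of the weights keeps the result a true weighted average even when the weights do not sum to 1, so the score stays in the same range as its inputs. How the weights and the threshold should be chosen for BIM and GIS ontologies is left open by the text.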

Evaluation Metrics
Each phase of the proposed framework is evaluated separately. To evaluate the ontologies generated in the first phase, a quantitative evaluation based on ontology metrics will be carried out, together with a comparative analysis against other leading ontology development approaches in the geospatial domain (Pauwels, Terkaj, 2016). Alongside this, the validation of the generated ontologies and performance measures will be discussed.
Furthermore, for the next phase, the generated ontologies will be used as input data to the ontology alignment phase, and the reliability of the proposed approach will be assessed against available and documented alignment approaches following the evaluations of the Ontology Alignment Evaluation Initiative (OAEI). In the OAEI ontology mapping campaign, ontology integration methods are evaluated using the precision, recall and F-measure metrics (Euzenat, Shvaiko, 2013), which can be described as follows. Precision is the percentage of correctly identified mappings among all identified mappings, while recall is the percentage of correctly identified mappings among all existing mappings, as presented in Eq. 2 and 3 respectively:

$$Precision = \frac{|\text{correctly identified mappings}|}{|\text{identified mappings}|} \qquad (2)$$

$$Recall = \frac{|\text{correctly identified mappings}|}{|\text{existing mappings}|} \qquad (3)$$
Here, correctly identified mappings are the identified mappings that match the ground truth derived from base knowledge and domain expertise, and identified mappings are the results produced by the respective alignment approaches. Existing mappings are the existing alignment mappings used as test data. Accordingly, Eq. 4 gives the F-measure, the harmonic mean of precision and recall:

$$F\text{-}measure = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall} \qquad (4)$$
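As a worked illustration of these three metrics, the following Python sketch scores a small set of identified mappings against a reference set. The entity pairs are invented for the example and are not results of this study; the function name `evaluate_alignment` is likewise hypothetical.

```python
def evaluate_alignment(identified, reference):
    """Return (precision, recall, f_measure) for two collections of mappings.

    Each mapping is a hashable pair such as (gis_entity, bim_entity).
    Empty inputs yield 0.0 instead of raising a division error.
    """
    identified, reference = set(identified), set(reference)
    correct = identified & reference  # correctly identified mappings
    precision = len(correct) / len(identified) if identified else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Invented example data: three identified mappings, four reference mappings.
identified = {
    ("Building", "IfcBuilding"),
    ("Door", "IfcDoor"),
    ("Room", "IfcWall"),
}
reference = {
    ("Building", "IfcBuilding"),
    ("Door", "IfcDoor"),
    ("Room", "IfcSpace"),
    ("WallSurface", "IfcWall"),
}

precision, recall, f_measure = evaluate_alignment(identified, reference)
print(f"precision={precision:.3f} recall={recall:.3f} f_measure={f_measure:.3f}")
```

With two of three identified mappings correct and two of four reference mappings found, precision is 2/3, recall is 1/2 and the F-measure is 4/7. Treating mappings as set members assumes exact matches only; graded or relation-typed correspondences would need a different comparison.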

Anticipated Outcomes and Discussion
For the first phase of the methodology, the pilot study of OGGD (Usmani et al., 2020) implements a framework with defined transformation patterns as a concept validation of automatic ontology generation from the XSD schemas of the ifcXML and CityGML formats. The study produces ontological axioms including Class, Datatype, ObjectProperty and DatatypeProperty in ontology models, mapped to the respective XSD elements, as potentially rich semantics of concepts, relations and properties. For the ontology results to be conclusive, more transformation patterns are to be incorporated in OGGD so that the generated ontologies can be evaluated as semantically exhaustive. Also, even when transformation patterns are defined, significant effort is required to realize an automated framework that translates XSD schema elements into an OWL representation, which makes ontology generation an evolving process.
The second phase of the methodology is a conceptual framework that identifies mappings, with similarity measures, between BIM and GIS domain entities by integrating two approaches: semantic-based alignment and structure-based alignment. The ontologies generated in the previous phase are taken as its input. In addition, the results of the proposed approach will be compared with those of other ontology alignment tools, such as COMA 3.0 and YAM++, in terms of precision, recall and F-measure. The execution time of both phases of the proposed methodology will also be investigated in order to guide performance improvements.

CONCLUSION AND FUTURE PERSPECTIVES
The integration of BIM and GIS has come a long way, with innovative methods adopted to bridge the gap between two fundamentally distinct domains. The proposed method uses the open-standard BIM and GIS data formats, ifcXML and CityGML respectively, to highlight the semantic web as a promising approach towards establishing integration, and provides a conceptual framework comprising ontology generation and ontology alignment phases.
First, an extension of the preparatory OGGD framework is required. For the maximum transformation of an XSD schema into a comprehensive ontology model, sufficient XSD to OWL transformation patterns must be implemented. Furthermore, new transformation patterns are required to account for XSD elements (e.g. list) that are not covered by existing transformations (Bedini et al., 2011, Hacherouf et al., 2019). This paper aims to establish implementations for a complete set of transformation patterns and to generate comprehensive ontology models for XML-based geospatial schemas. Second, this paper proposes a new approach to ontology alignment that integrates the semantic-based Word2vec and structure-based Node2vec algorithms through similarity assessment for conceptual mapping across the generated domain ontologies. Consequently, a cross-domain integrated ontology becomes available to facilitate analysis, information exchange and knowledge discovery applications.
The findings of the conceptual framework will be shared, and essential optimization strategies for the proposed method will be presented. Representing data from BIM and GIS in a semantic web technology stack with minimal human intervention advances information integration and exchange. Generating ontology models of geospatial data and automatically interlinking their cross-domain entities according to defined specifications will make the semantic process efficient, extensible, and effective towards achieving interoperability and supporting further research.