The need for a differentiation between heterogeneous information integration approaches in the field of “BIM-GIS Integration”: a literature review

The heterogeneous character of information models results in communication barriers between subsystems in railway organizations dealing with Building Information Modeling (BIM) and Geographic Information Systems (GIS). The integration of information is a promising way to bridge the heterogeneity of information models and satisfy the need for more efficient communication. Integration efforts explored in the expert literature are often referenced using umbrella terms like “BIM-GIS Integration” or “GeoBIM”, although they deal with different challenges and address different purposes. This paper highlights the need for a differentiation between integration efforts covered by the umbrella term “BIM-GIS Integration”. For this, a new approach for the categorization of information integration efforts was developed based on a literature research. Afterwards, challenges concerning information integration efforts in the field of “BIM-GIS Integration” were identified and assigned to the respective categories to illustrate the importance of differentiating between heterogeneous information integration efforts.


The need for efficient, lateral communication
Organizations are highly specialized due to the division of labor (Smith 2010) and the increasing complexity of tasks an individual has to carry out (Bar-Yam 2002). Due to the independent development of highly specialized subsystems of organizations and the related heterogeneity of information requirements, organizations deal with heterogeneous information models developed for different purposes. The heterogeneous character of these information models causes communication barriers. However, the increasing complexity of the demands of

Research Method
This paper is part of a doctoral research investigation following the Design Science Research (DSR) method. The DSR method aims to create and evaluate Information Technology (IT) artifacts intended to solve identified problems. According to Peffers et al. (2007), the DSR process generally includes six steps: 1) problem identification; 2) definition of objectives; 3) design and development of artifacts (e.g. frameworks, methods, models); 4) demonstration by using the artifact to solve the problem; 5) evaluation of the solution by comparing the objectives with the observed results; 6) communication of the problem, the artifact, its utility, and its effectiveness. This paper solely addresses the problem identification step and its communication, to provide an understanding of integration efforts concerning the keyword "BIM-GIS Integration" as a basis for the further research investigation.
The paper is structured into four major parts (Figure 1). First, a literature research concerning "BIM-GIS Integration" was conducted. Second, categories concerning information integration efforts were developed. Third, challenges of information integration efforts were identified. Last, the challenges were assigned to the respective categories. The literature research was mainly conducted using Scopus, a database of peer-reviewed literature. The scope of the paper is limited to integration efforts addressed by the term "BIM-GIS Integration".

Categorization Method
The developed categorizations follow three different approaches based on the concept of enumerative definitions. Applying enumerative definitions means that information integration efforts are identified and assigned to categories in order to convey an abstract meaning of the respective category (Waldmann 2008). The concept of enumerative definition is applied because formal definitions are difficult to create. The first approach makes use of existing categorizations developed in the same or a similar context. The second approach adapts existing categorizations developed in the same or a similar context. The third approach creates new categories from scratch. All three approaches use the concept of enumerative definition to assign meaning to the adopted, adapted, or created categories. The creation and adaptation of categories is an iterative process, and the validity of the developed categorization is tied to the context in which it is created. To ease understanding, the categories are described, although the descriptions are not meant as formal definitions.

Conceptualization
In this paper, the usage of the keyword "BIM-GIS Integration" is seen as problematic due to the vagueness of both the comparison BIM-GIS and the term integration. While BIM is generally associated with a method, the term GIS refers to systems. The comparisons CAD and GIS at system level, BIM and Urban Information Modeling (UIM) at method level, and e.g. Industry Foundation Classes (IFC) and CityGML at data level would be more appropriate (Hijazi and Donaubauer 2017). Similarly, Herle et al. (2020) compare BIM and Geospatial Information Modelling (GIM), and Amirebrahimi et al. (2015) distinguish between integration efforts at data, process, and application level. Thus, the vagueness of the comparison "BIM-GIS" leaves scope for interpretation, e.g. whether the integration occurs at system, method, or data level. Furthermore, the integration subject at data level is not specified, since the integration effort may address e.g. databases, information models, or ontologies. This paper deals with integration efforts concerning information models, whereby the understanding of information models follows the specification of the Meta-Object Facility (MOF) from the Object Management Group (OMG) (2019).
On the other hand, there is no common definition of the term integration. Thus, the term does not specify whether the information integration effort addresses e.g. the combination, the inclusion, or the conversion of information models. Furthermore, integration subjects may refer to information models from different domains, heterogeneous information models, or different instantiations of the same information model, e.g. different versions. A conceptualization of the term integration is challenging, since the conceptualization needs to fit the context of the research investigation including its limitations. For example, information integration in terms of the combination of information models excludes the conversion of one information model into another. Therefore, this paper follows and adapts the approach of Scherer and Schapke (2011) and refers to interdependent information models as integration subjects. For instance, three major interdependencies are vertical (different levels of detail), horizontal (here: heterogeneity), and longitudinal (changes over time) interdependencies. The differences between the information models occurring along these interdependencies must be "bridged" (Kolbe and Plümer 2004) to create a whole, integrated artifact. Additionally, particular interdependencies belong to different levels of the MOF specification, e.g. instance and schema level. Consequently, the goal of the information integration process across the domains BIM and GIS is understood as the creation of a whole by bridging differences of interdependent information models at instance and/or schema level, whereby combining and interlinking information models are seen as different approaches to achieve this goal. In the scope of this paper, information models provided as input for the information integration effort are called source information models.

Related literature
Four different categorization subjects were identified in the literature concerning "BIM-GIS Integration" at data level. The first categorization subject is related to the heterogeneity of information models and can be further subdivided into two different categorization approaches. The first approach is the accumulation of information model differences (Brüggemann and von Both 2015; Kolbe and Plümer 2004; Liu et al. 2017; Herle et al. 2020). Here, differences of information models are listed without following a specific approach, e.g. different reference systems, geometric representations, or granularity. The second approach is based on rather general approaches following related research subjects like ontology integration (Brodeur 2012) or interoperability models (Herle et al. 2020). An example is the distinction between syntactic, structural, and semantic heterogeneities.
The second categorization subject covers integration approaches or methods. Zhu et al. (2018) and Liu et al. (2017) complement the categorization of Amirebrahimi et al. (2015) with examples; Zhu et al. (2018) additionally differentiate between semantic and geometric differences at data level, and Liu et al. (2017) extend the categorization at data level with the subcategories meta-model, extension, and conversion/translation. Furthermore, Hijazi and Donaubauer (2017, p. 44) distinguish between the integration methods conversion of IFC to CityGML, conversion of GIS/CityGML to IFC, Unified Modeling, and Linking BIM and UIM. This categorization approach is picked up by Herle et al. (2020). The categorization approaches of Amirebrahimi et al. (2015) and Hijazi and Donaubauer (2017) served as orientation for the categories of integration approaches developed in this paper. Further categorization approaches of information integration methods are provided by e.g. Juan et al. (2006) and Kang and Hong (2015).
The third categorization subject addresses the categorization of use cases related to "BIM-GIS Integration" (Liu et al. 2017; Fosu et al. 2015; Song et al. 2017; Noardo et al. 2019). However, the vagueness of the terminology complicates the comparison and evaluation of those categorizations. The use cases were relevant for developing the categorization concerning the purpose of "BIM-GIS Integration".
The fourth categorization subject refers to the superordinate categorization of information integration efforts. Wang et al. (2019) distinguish between the following information integration efforts: BIM leads and GIS supports, GIS leads and BIM supports, and BIM and GIS equally involved. Additionally, Wang et al. (2019) have assigned expert literature to these superordinate categories with respect to integration methods and use cases. This superordinate categorization approach is considered in the developed categorization concerning the purpose of information integration efforts.

Overview
The categorization of information integration efforts is based on three main categories: information characteristics, solution characteristics, and purpose. The information characteristics refer to the categorization of information provided by the source information models. The solution characteristics address the categorization of integration and communication methods. Finally, the categorization of the purpose refers to heterogeneous intentions of the respective information integration efforts. An overview of the categories is provided in Figure 2.

Purpose
Following West (2011), the "business value of data comes from its use in contributing to sound decisions". This contribution can occur in different steps of the decision-making process, e.g. problem analysis, simulation, or evaluation of alternatives. Consequently, the purpose of information integration efforts is based on the use of the integrated information in the respective steps of the decision-making process. Additionally, it is assumed that the decision-making process refers to a specific subject. Following Wang et al. (2019), there are three different points of view: the decision refers to subjects related to BIM and is supported by information related to GIS, the decision refers to subjects related to GIS and is supported by information related to BIM, or the decision subject refers to BIM and GIS equally.
According to the understanding in this paper, the requirement for sound decisions is the sufficient quality of the data embedded in the right context.
• Data Quality: The information integration effort aims to improve the data quality of the provided information set by obtaining relevant information from interdependent information models to satisfy the respective information requirements. Among others, key data requirements are timeliness, reliability, consistency and completeness. For instance, Karan and Irizarry (2016) write that planners require information from different domains for decision-making processes. Moreover, information required for decisions in supply chain management is stored in heterogeneous information models (Karan and Irizarry 2014). Also energy analysis and simulations may need information from heterogeneous information models (Sicilia and Costa 2017).
• Data Context: The information of the provided information models is created and embedded in a specific context. Transferring the information to a related context is in some cases purposeful for the decision-making process, e.g. to apply GIS functionalities on BIM data or vice versa. For instance, Benner et al. (2005) write that the transformation of data is necessary to make tools from CA(A)D accessible to city modeling. And Li et al. (2020) transfer city and building information to the context of Precinct Information Modeling (PIM).
In expert literature, the creation of software product interoperability is often highlighted as a major purpose of information integration efforts concerning "BIM-GIS Integration" (Li et al. 2020;Sicilia and Costa 2017). In this paper, software product interoperability is understood as potential capacity to fulfill the purpose of the integration process. Software product interoperability refers to the ability of software products to work with each other, whereas information integration means bridging differences between interdependent information models. Thus, information integration efforts are understood as necessary for the creation of interoperability between software products based on heterogeneous information models. However, not all information integration efforts are intended to achieve software product interoperability, e.g. linking information models to evaluate information consistency.

Information characteristics
2.3.1 Schema- and Instance-level: Information integration efforts can be applied at schema- and/or instance-level. The schema level refers to the level M1 of the MOF, whereas the instance level refers to level M0. A clear differentiation concerning schema- and instance-level is in some cases difficult, since information integration efforts at instance-level often implicitly deal with differences at schema level, e.g. when converting an instance model from IFC to CityGML. In the following, the referenced information integration efforts in the category schema-level address solely differences at schema level, whereas integration efforts referring to the category instance-level may cover differences at both levels.
Instance-level: The instance-level covers instantiations referring to natural or abstract real-world objects. The instance-level information is stored in textual specifications like STEP Physical Model or XML. For example, Donkers et al. (2016) converted IFC datasets to CityGML. Akob et al. (2019) transferred BIM data to ArcGIS. Moreover, Vilgertshofer et al. (2017) linked instance information of IFC and CityGML models.
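The observation that instance-level efforts implicitly deal with schema-level differences can be illustrated with a minimal Python sketch. It is not taken from any of the cited works: the mapping table, the entity names, and the attribute names are hypothetical and strongly simplified, and they do not reproduce the real IFC or CityGML definitions. The sketch converts one instance (M0) and, to do so, has to rely on a mapping between schema elements (M1).

```python
# Schema-level (M1) mapping between two hypothetical, simplified schemas.
# Entity and attribute names are illustrative only, not real IFC or CityGML definitions.
SCHEMA_MAPPING = {
    "Wall": ("WallSurface", {"height": "measuredHeight"}),
}

def convert_instance(instance: dict) -> dict:
    """Convert one instance-level (M0) object using the schema-level mapping."""
    target_type, attr_map = SCHEMA_MAPPING[instance["type"]]
    converted = {"type": target_type}
    for name, value in instance["attributes"].items():
        if name in attr_map:  # attributes without a counterpart are dropped
            converted[attr_map[name]] = value
    return converted

source = {"type": "Wall", "attributes": {"height": 2.8, "material": "concrete"}}
target = convert_instance(source)
```

In this sketch the attribute `material` has no counterpart in the target schema and is lost, which hints at why a plain conversion may lose information.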

Conceptual Differences:
Among others, heterogeneous information models exhibit differences at conceptual level. Following Benerecetti et al. (2001), conceptual heterogeneity of ontologies can be grouped into differences in granularity, coverage, and perspective. The transfer of conceptual differences from the field of ontologies to information models follows the assumption that schemas and ontologies are similar, since both provide a vocabulary of terms that describes a domain of interest and both constrain the meaning of terms used in the vocabulary (Euzenat and Shvaiko 2013). However, the clear assignment of conceptual differences to source information models is seen as problematic due to the vagueness of the terms granularity, coverage, and perspective. Furthermore, interdependencies between heterogeneous information models often refer to combined conceptual differences. In the following, three conceptual differences are described, and examples of combined conceptual differences are provided subsequently.
• Difference in granularity: Information models may differ regarding the granularity of the entities. However, a comparison of the granularity level of information models from different domains is often problematic due to the missing consistency.
• Difference in coverage: Integration efforts dealing with similar coverage of source information models mean that the information models represent similar real-world entities.
• Difference in perspective: Differences regarding the perspective of two source information models result from different views on information.
In the integration effort of Beetz and Borrmann (2018), the source information models represent the street network in the Netherlands (RWS-OTL and CB-NL) and Germany (OKSTRA). Here, the information models are based on similar coverage and similar perspective. Furthermore, a large share of information integration efforts address the interdependencies of the information models IFC and CityGML concerning building information (see categories intersection and conversion). The coverage overlaps since both source information models refer (partly) to buildings. The granularity differs because, for example, IFC models provide more detailed entities regarding interior characteristics of physical building elements. The source information models differ in perspective, because IFC models are generally created in the design view (prescriptive), whereas CityGML models are based on a topographic view (descriptive).

Real-world Objects:
The objects provided by information models at instance-level refer to real-world objects. However, conceptual and semantic differences impede a clear statement of whether the respective instances of heterogeneous information models represent the same or different real-world objects. The following categories represent a simplistic view using the principles of set operators and should be referenced with caution. Nevertheless, this categorization is deemed necessary and adequate for the purpose of this paper.
• Difference: Information models based on instance-level difference describe different real-world objects. For example, the integration of information models representing road networks from different countries refers to different real-world objects (Beetz and Borrmann 2018).
• Union: Source information models based on both instance-level intersection and difference refer to the same and to different objects. An example is converting an IFC model into a CityGML model and embedding that building information in an information model representing the surrounding environment. The source IFC model and the resulting CityGML model refer to the same real-world object, whereas the surrounding environment refers to different real-world objects.
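The simplistic set-based view described above can be sketched in Python. The object identifiers and the function `relation` are hypothetical, and reading "intersection" as "the two models reference only shared real-world objects" is an assumption made for this sketch. In practice, deciding whether two instances denote the same real-world object is itself a challenge and is simply presupposed here.

```python
# Hypothetical identifiers of the real-world objects referenced by two source models.
bim_objects = {"building_A", "wall_1", "door_3"}
gis_objects = {"building_A", "road_7", "tree_12"}

shared = bim_objects & gis_objects  # real-world objects referenced by both models

def relation(a: set, b: set) -> str:
    """Classify two source models by the real-world objects they reference."""
    if not a & b:
        return "difference"    # no common real-world object
    if a == b:
        return "intersection"  # assumption: only common real-world objects
    return "union"             # common and distinct real-world objects

category = relation(bim_objects, gis_objects)
```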

Solution characteristics
2.4.1 Integration Methods address the process of bridging the heterogeneity of the source information models. As mentioned previously, a variety of categorizations regarding information integration methods across the domains BIM and GIS were published during the past years. The categorization approaches of Amirebrahimi et al. (2015) and Hijazi and Donaubauer (2017) served as the basis for the developed categorization, which is illustrated in Figure 3.
Interlinking: The interlinking of information models means the establishment of explicit links between different source information models. Beetz et al. (2014) have developed a method defining RDF dictionaries for the wall structure of IFC models. Hor et al. (2016; 2018) propose a method to integrate IFC and CityGML using RDF graphs. Vilgertshofer et al. (2017) connected an IFC tunnel proposal with CityGML instance models using Semantic Web technologies. Esfahani (2013) proposes the multi-model method according to Scherer and Schapke (2011) to link the elements of heterogeneous information models in the field of infrastructure. Beetz and Borrmann (2018) Amirebrahimi et al. (2015) have developed an Urban Flood Model as metamodel. Teo and Cho (2016) propose a multi-purpose geometric network model (MGNM) to connect indoor and outdoor network connections. Choi et al. (2008) have developed a Ubiquitous Space Information Model for Indoor GIS. Moreover, Aien et al. (2013) have proposed a 3D cadastral data model (3DCDM). Benner et al. (2005) have developed the QUASY model as a meta model concerning IFC and CityGML.
Additionally, combinations of information integration methods at schema and instance level were identified during the literature research. For example, the respective information models are merged at schema-level, and the resulting shared model may be used as intermediate model (El-Mekawy et al. 2012) or as target model for the conversion process (Benner et al. 2005). Another example for combined methods is the interlinking between the source information model and a shared model. Among others, the three methods extension, interlinking and merging are intended to prevent information loss which may occur during plain conversion.
Figure 3: Categorization of integration methods.
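The idea behind interlinking, namely establishing explicit links while leaving the source information models untouched, can be sketched as follows. This is an illustrative Python sketch only: the class `Link`, the link type `sameObject`, the identifiers, and the attribute values are hypothetical, and the cited approaches typically rely on RDF and Semantic Web technologies rather than on plain Python structures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Link:
    """Explicit link between one element of each source model."""
    source_id: str  # element identifier in the first source model
    target_id: str  # element identifier in the second source model
    relation: str   # hypothetical link type, e.g. "sameObject"

# The source models stay untouched; only their element identifiers are referenced.
bim_model = {"ifc-101": {"type": "Tunnel", "length_m": 1200}}
gis_model = {"gml-9": {"type": "Tunnel", "municipality": "Example"}}
links = [Link("ifc-101", "gml-9", "sameObject")]

def linked_view(link: Link) -> dict:
    """Combine the attributes of two linked elements without altering the sources."""
    return {"bim": bim_model[link.source_id], "gis": gis_model[link.target_id]}

view = linked_view(links[0])
```

Because no attribute is converted or dropped, both source descriptions remain fully available, which reflects the stated intention of preventing the information loss of plain conversion.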

Communication Methods
Communication methods refer to the way the information is transmitted from the sender to the receiver. Two different ways of communicating the information with respect to integration efforts were identified.
• Exchange: The exchange of information means the import and export of the integrated information model in software products using network solutions or external storage devices. For instance, Akob et al. (2019) imported BIM data into the software product ArcGIS.

• Querying: Communication of information is achieved through querying the integrated information model. For instance, several integration efforts concerning interlinking methods use SPARQL to access the integrated information (Karan and Irizarry 2014; Hor et al. 2016; Zhao et al. 2019; Zhang and Beetz 2016; Sicilia and Costa 2017). Here, modifications of software products are either part of integration efforts at data or application level. For instance, modifications of software products at data level may be required when interpreting an extended information model, querying, or implementing web services. In distinction to that, modifications of software products implying changes of software product functionalities refer to integration efforts at application level.
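The querying approach can be illustrated with a small pattern-matching sketch. It is a plain Python stand-in and not SPARQL itself: the triples, prefixes, and the function `match` are hypothetical and only mimic how a basic triple pattern selects information from an integrated, interlinked data set.

```python
# Integrated information held as (subject, predicate, object) triples.
# All identifiers and predicates are hypothetical.
TRIPLES = [
    ("ifc:wall_1", "rdf:type", "ex:Wall"),
    ("ifc:wall_1", "ex:partOf", "gml:building_A"),
    ("gml:building_A", "rdf:type", "ex:Building"),
    ("gml:building_A", "ex:address", "Main Street 1"),
]

def match(pattern, triples=TRIPLES):
    """Return all triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]

# Which elements are part of building A?
parts = [s for s, _, _ in match((None, "ex:partOf", "gml:building_A"))]
```

The receiver obtains only the information selected by the pattern instead of importing a complete model, which is the main difference from the exchange approach.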

Identified challenges
Information overload: The scope of the integrated information needs to fit the purpose of the integration effort. Data that is irrelevant to the respective step of the decision-making process makes it harder to carry out that step in a target-oriented way (Laat and van Berlo 2011; Hijazi et al. 2019).
File size: The file size of the integrated information model may be significantly bigger than the file size of the source model. This occurs, for instance, after the conversion of IFC to CityGML (Laat and van Berlo 2011) or after converting an IFC model based on EXPRESS to an ifcOWL model (Pauwels and Roxin 2016). However, the trend in information systems is moving away from file-based information exchange. Thus, file size as a challenge may become less relevant or change in nature.
Semiotic heterogeneity: The meaning of symbols depends on the expectations that the sender or receiver of the information associates with these symbols (Ogden and Richards 1923). Shared expectations are essential for a common understanding of the relevant concepts and depend on the prior knowledge of the communication participants. In general, this prior knowledge differs between communication participants coming from different domains or disciplines. The resulting heterogeneous interpretation of concepts in formal schema specifications of source information models (Beetz 2009; Brodeur 2012) or in other messages hinders the successful integration of heterogeneous information models.
Selectivity: Several information integration efforts are designed for specific, artificial application scenarios and are intended to bridge selected heterogeneities (Hijazi and Donaubauer 2017). Additionally, the developers come from a specific discipline, such that their solutions are driven by a specific perspective. These circumstances often result in selective characteristics concerning the information integration solutions.
Lack of expert knowledge: The development and usage of information integration solutions often require comprehensive knowledge, e.g. knowledge about the integration subjects from both BIM and GIS and technical knowledge about data modelling and integration (Karan et al. 2016; Liu et al. 2017). However, individual developers and users seldom possess this comprehensive knowledge, which results in time-intensive training or insufficient solutions.
Automation of matching processes: The information integration procedure needs to be automated to avoid time-consuming and error-prone matching tasks carried out by the user. Additionally, automation is required because users often lack the expert knowledge necessary for successful matching. However, the full automation of the matching process is difficult to achieve (Schneider 2019).
Legal issues: Legal protection concerning the usage of the information is essential for the acceptance of the respective information integration solution for industrial application. In the field of information integration, the source data comes from different stakeholders maintaining the respective usage rights of the data. Especially in distributed environments with a multitude of stakeholders the protection of intellectual property rights is challenging, e.g. moving BIM data to public city databases (Hijazi and Donaubauer 2017).

Object identification: The information referring to the same real-world object needs to be identified to create an alignment between the information models at instance-level. This task is called object identification in the field of database integration (Batini and Scannapieco 2016). The identification of information referring to the same real-world object is complicated, since a clear 1:1 mapping is often not possible (Kolbe and Plümer 2004).
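Why a clear 1:1 mapping is often not possible can be illustrated with a deliberately naive matching sketch. The matching criterion (centroid distance), the threshold, and all identifiers and coordinates are assumptions made for illustration; the sketch also assumes that both models already share one coordinate reference system. It is not the method of any cited work.

```python
import math

def candidates(bim_objs: dict, gis_objs: dict, max_dist: float) -> dict:
    """Map each BIM object to all GIS objects whose centroid lies within max_dist.

    Assumes both models already share one coordinate reference system.
    """
    result = {}
    for bim_id, bim_xy in bim_objs.items():
        result[bim_id] = [
            gis_id for gis_id, gis_xy in gis_objs.items()
            if math.dist(bim_xy, gis_xy) <= max_dist
        ]
    return result

# Hypothetical centroids (x, y) in metres.
bim = {"wall_1": (0.0, 0.0), "wall_2": (10.0, 0.0)}
gis = {"facade_a": (0.5, 0.0), "facade_b": (1.0, 0.0), "facade_c": (50.0, 0.0)}
matches = candidates(bim, gis, max_dist=2.0)
```

Here `wall_1` receives two candidates and `wall_2` none, so neither case yields a clear 1:1 correspondence and further criteria or expert input would be needed.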
Conflict handling: Instance information referring to the same object may conflict with each other. Conflict handling strategies are a relevant research topic in the field of database integration (Batini and Scannapieco 2016). The conflicts between source information models are non-trivial in some circumstances. For instance, conflict handling strategies may need extensive expert knowledge concerning the integration subject and depend on the intended context. Moreover, the conflicting information may need to be processed first to identify an existing conflict.
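A minimal sketch of how conflicting instance information might be handled is given below. The two strategies (`trust_source` and `flag`), the function `resolve`, and the example values are hypothetical and chosen for illustration only; which strategy is adequate depends on the integration subject and the intended context.

```python
def resolve(values: dict, strategy: str, trusted: str = ""):
    """Resolve conflicting attribute values reported by several source models.

    values maps a source name to the value that source reports.
    """
    distinct = set(values.values())
    if len(distinct) <= 1:
        return next(iter(distinct), None)  # no conflict to resolve
    if strategy == "trust_source":
        return values[trusted]             # prefer one designated source
    if strategy == "flag":
        return None                        # leave the decision to an expert
    raise ValueError(f"unknown strategy: {strategy}")

# Hypothetical building heights in metres reported by two source models.
heights = {"bim": 12.4, "gis": 12.0}
preferred = resolve(heights, "trust_source", trusted="bim")
```

Note that the two values must already be comparable (same unit, same definition of height) before a conflict can even be detected, which corresponds to the processing step mentioned above.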

Validity of information: The integrated information needs to be valid in the respective context in which it is used. For instance, the functionalities of software applications may be limited to data related to real-world objects at a specific time or in a specific reference system. Information deviating from these requirements may be invalid in the context in which the software product's functionalities are embedded.
Assessment of data quality: The value of a decision strongly depends on the quality of the data on which the decision is based. The assessment of the data quality is relevant for information integration processes intended to attain a better data quality. Among others, this challenge addresses the question of which data requirements exist for the specific use cases.
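As one possible illustration of such an assessment, the following sketch measures completeness, one of the data requirements named earlier in this paper. The metric definition (share of required attribute values that are present), the function name, and the example records are assumptions made for this sketch; other requirements such as timeliness or consistency would need their own measures.

```python
def completeness(records: list, required: list) -> float:
    """Share of required attribute values that are present (not None)."""
    total = len(records) * len(required)
    if total == 0:
        return 1.0  # nothing is required, so nothing is missing
    present = sum(
        1 for rec in records for attr in required
        if rec.get(attr) is not None
    )
    return present / total

# Hypothetical integrated records; one required height value is missing.
records = [
    {"id": "b1", "height": 12.4, "year_built": 1998},
    {"id": "b2", "height": None, "year_built": 2005},
]
score = completeness(records, ["height", "year_built"])
```

Which attributes count as required depends on the use case, which is exactly the open question raised above.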

ASSIGNMENT
The assignment of the challenges to the respective categories means that the challenges tend to occur when the information integration effort refers to that category. Here, the aim is to illustrate the importance of a differentiation between heterogeneous information integration efforts. For example, information integration efforts at instance-level may face challenges regarding the model size and information overload, whereas integration efforts at schema level rather deal with semantic and semiotic heterogeneity issues. Object identification and conflict handling primarily occur in integration efforts dealing with several intersecting information models. Consequently, object identification and conflict handling are primarily relevant for the integration methods interlinking and merging. Differences in the perspective of information models result in major challenges in terms of bridging semiotic heterogeneities, whereas differences in coverage may lead to difficulties concerning the deployment of the interface between the information models. Challenges regarding software product interoperability mainly occur in information integration efforts dealing with extension, interlinking, or merging as integration method. Furthermore, interlinking information using Semantic Web technologies in distributed environments results in challenges regarding data security and access rights. Information integration efforts intended to bring the information into another context may face challenges regarding the validity of the information. The assessment of the data quality is especially relevant for information integration efforts that deal with multiple information models and are intended to improve the respective data quality. In addition, some challenges cannot be assigned to specific categories, but are relevant for all information integration efforts. Examples are the selectivity of the developed solution, the lack of expert knowledge, and the need for automation.

DISCUSSION
In this paper, the integration of information models is understood as the creation of a whole by combining or linking interdependent information models. In the field of "BIM-GIS Integration" the major interdependency refers to the heterogeneity of information models. Additional interdependencies, like differences caused by time-dependent changes, are generally not considered in the developed information integration efforts. However, this consideration might be relevant since heterogeneous information models are often not intended to represent the real-world objects at the same time.
The concept of enumerative definitions was followed to provide a meaningful assignment of the reviewed literature. However, this process turned out to be difficult in some cases due to the vagueness of the respective subcategories. For instance, the assignment of expert literature to categories referring to the real-world objects is complicated, since a clear understanding of when two objects refer to each other is missing. Similarly, the categorization regarding conceptual differences is too vague to assign expert literature in a clear manner. For instance, a comparison of the granularity level between heterogeneous information models is hardly achievable without further specifications. Also, the differences concerning the coverage are hard to identify due to the ambiguity of the concepts used in the respective schemata.
The developed categorization of integration methods could be adapted with respect to the integration level, since some information integration efforts refer to different categories at instance- and schema-level. For example, the usage of a shared model as intermediate model in the conversion process refers to both categories merging and conversion. An adaptation of the categorization approach may assign these combined integration approaches in a clearer manner. Additionally, the advantages and disadvantages of the respective integration methods and information exchange methods need to be analysed in more detail.
The dichotomous categorization of the purpose in terms of data quality and data context could be further subcategorized. For this, the different information requirements addressed by information integration approaches need to be investigated and the different data contexts need to be specified. Furthermore, an investigation of the correlation between the subcategories may be valuable, e.g. the correlation between the categories of information characteristics and solution characteristics may indicate patterns useful for the understanding of information integration efforts in the field of "BIM-GIS Integration".

CONCLUSION
In summary, there are two major findings. First, there is a need for a common understanding of the concept of information integration referenced by the keyword "BIM-GIS Integration". This need is underlined by the cumbersome process of deducing an adequate conceptualization of these terms for the context of this paper. Second, there is a need for a differentiation between heterogeneous information integration efforts. This need is illustrated by the developed categories regarding information integration efforts and the assignment of challenges to them. In conclusion, a scientific discourse that addresses these needs (e.g. by more precise keywords) would benefit both the communication and the understanding of information integration efforts currently referenced by the umbrella term "BIM-GIS Integration".