Integrated Management of Heterogeneous Geodata with a Hybrid 3D Geoinformation System

The availability of 3D geodata is increasing tremendously. However, there is still a lack of appropriate tools for integrated data management and analysis solutions that can cope with the great diversity of geodata. Thus, our goal is a hybrid 3D geoinformation system which is able to combine and analyse heterogeneous 3D geodata in an efficient and consistent way. Based on the accepted standard ISO 19107, we propose a hybrid data model to overcome structural, geometric and topological data heterogeneity on a conceptual level. Our concept is hybrid with respect to data given in different data models, dimensions and quality levels. Through the explicit modelling of geometric correspondences, multiple representations and data inconsistency can be handled.


INTRODUCTION
These days, we face an abundance of comprehensive geodata representing an immense variety of real world objects. However, due to the multitude of different sensors, algorithms and modelling concepts used for data acquisition and processing, such geodata is highly complex and heterogeneous, posing a big challenge when the data has to be evaluated together. Data heterogeneity generally goes back to structural aspects concerning conceptual data modelling on the one hand, and geometric and topological aspects on the other hand.
In terms of structural heterogeneity in conceptual modelling, Bishr (1998) distinguishes between semantic, schema and syntactic heterogeneity (Gröger and Kolbe, 2003). Semantic heterogeneity arises when dissimilar ways of understanding real world phenomena lead to different object abstractions. A typical example is the interpretation of streets. While they are treated as areal objects in the real estate cadastre, routing algorithms refer to line descriptions. Schema heterogeneity, however, denotes structural differences in the modelling concept. For instance, the same object property could be modelled as a class in concept A, as an attribute in concept B, while being neglected in concept C. Syntactic heterogeneity is related to different geometric data models. The 2D world is mainly based on raster and vector representations, whereas many more modelling concepts are in use for data of higher dimensions. Typical data models for 2.5D surfaces are grids or TINs; 3D solids can be described by voxel and boundary representations (BRep) as well as by mathematical definitions like parametric instancing or halfspace modelling. Constructive solid geometry (CSG) and cell decomposition specify different modelling strategies for generating complex 3D objects through the combination of several basic 3D primitives, which can be represented in any of the aforementioned data models for solids.
Geometric and topological heterogeneity is related to various aspects. These can be different reference systems but also properties of data quality. Due to the sensors used for data collection, their configuration during measurement and possibly applied post-processing steps, data sets can differ significantly in accuracy, resolution, density and completeness. Inconsistencies may occur when, for instance, the integration of various data sets leads to spatial intersections or interpenetrations of different geoobjects, or when there remain gaps between geometries which in fact should be adjacent. Geometric and topological problems can also result from combining data sets of different dimensions, such as 3D building models and a 2.5D digital elevation model. Inconsistencies will appear as floating or sinking buildings.
The problems mentioned so far are based on the assumption that each geoobject is given through only one spatial representation. Additional difficulties are caused when a single object is multiply represented. Besides showing the object in different levels of detail, multiple representations can be inhomogeneous with respect to all of the aforementioned types of heterogeneity.
Considering all these aspects, a meaningful and efficient usage of geodata necessarily requires geoinformation systems which allow for an integrated management and analysis of data given in various geometric data models, dimensions and quality levels. The importance and urgency of being able to handle heterogeneous data is not least demonstrated by the recently established INSPIRE (Infrastructure for Spatial Information in Europe) directive which drives the development of infrastructures for an interoperable exchange of geodata.
The contribution of this paper to the highly topical search for hybrid concepts and methods for handling geodata is an all-encompassing modelling concept which extends an existing ISO standard in such a way that it is hybrid in the sense of data model, dimension and quality. Our data model is designed to be an appropriate basis for a powerful and flexible 3D geoinformation system: powerful since it provides the basis for efficient consistency analyses and updating processes, and flexible since it supports multiple representations and is able to cope with structural as well as geometric and topological data heterogeneity. Visualization aspects and the modelling of semantics are not taken into account here.
The paper is organized as follows: After an overview of related work in chapter 2, our hybrid data model will be presented in chapter 3. Here, modelling principles to overcome both structural and geometric plus topological data heterogeneity will be introduced. Finally, chapter 4 will conclude the paper.

RELATED WORK
Since data heterogeneity is a complex and multifaceted topic, approaches dealing with inhomogeneous geodata usually focus on certain subproblems. They try to overcome either structural heterogeneity covering semantic, schema or syntactic issues, or geometric and topological heterogeneity which is mainly related to data quality and different levels of detail.

Work on structural heterogeneity
Concerning structural aspects, first investigations on hybrid data models and analysis methods go back to the 1980s; however, they cover solely the 2D world by focusing on raster and 2D vector data (Fritsch, 1988). An integrated view of hybrid 3D data has only been a topic of research for a few years. A step in this direction is taken by Dakowicz and Gold (2010) who go beyond pure 2D representations by suggesting a unified spatial model for 2D and 2.5D data. They convert points, lines, polygons and surfaces into separate Voronoi diagrams which can be integrated afterwards. Existing approaches which are also able to handle 3D data are generally tailored to specific applications and, thus, just address subproblems such as the combination of 2D and 3D building data (Inhye et al., 2007), the merging of TINs and grids for the representation of digital elevation models (Proctor and Gerber, 2004), or the handling of CSG and BRep models in the CAD world (Stekolschik, 2007). Lee and Zlatanova (2008) propose a 3D data model especially suited for emergency response. Here, neighbourhood relations are explicitly modelled through graph models allowing for efficient routing algorithms; the geometric part of the data model is limited to boundary representations, though. The same restriction holds for the slice representation which is introduced by Chen and Schneider (2009) as a general data representation method for 3D spatial data.
The approaches mentioned so far address structural data heterogeneity with respect to very specific application scenarios and, thus, are usually not suitable for general use. In principle, standards are indispensable when interoperability problems have to be avoided. The Open Geospatial Consortium Inc. (OGC) is one of the driving forces for the development of standards. OpenGIS is the brand name for standardization processes under the umbrella of OGC. The OGC closely cooperates with the International Organization for Standardization (ISO). Many OpenGIS specifications have already become ISO standards. This is the case for the OGC Topic 1 "Feature geometry", whose specifications and concepts can also be found in the ISO 19107 Spatial Schema standard. Focusing on the description of vector data only, ISO 19107 comprises geometric and topological modelling concepts for 3D objects (Herring, 2001). Since only boundary representations are supported, syntactic data heterogeneity remains a problem (Gröger and Kolbe, 2003).
The Special Interest Group 3D (SIG 3D), a working group within the German initiative GDI NRW for the development of infrastructures for geodata, deals with an interdisciplinary definition of 3D city models. The proposed specification for city models, CityGML, is based on ISO 19107 (Kolbe et al., 2005). CityGML represents a considerable advance in the interoperability of 3D city models. Nevertheless, a lossless integration of data which follows the CSG modelling approach is still not possible since CSG concepts are not supported. Nor is it possible to integrate parametric instancing as it is often used for modelling frequently occurring similar objects.
Seen from a conceptual point of view, data integration is feasible without interoperability problems when data sets follow the same modelling standard. However, this only holds true if, additionally, the data is consistent on the object level in terms of geometric and topological aspects. In practice, spatial data from different data sets covering the same region cannot be expected to be consistent as to accuracy, completeness, level of detail etc.

Work on geometric and topological heterogeneity
Geometric and topological heterogeneity of geodata inevitably leads to inconsistencies in merged data sets. Consistency analysis of 3D geodata is a complex task. Gröger and Plümer (2011) approach the problem by defining the consistency of a 3D city model through a modularly designed axiomatic characterization for topological components and their aggregations. For this purpose, the city model is topologically interpreted as a complete and unique 3D tessellation where each geometric object is represented exactly once. Multiple representations are not supported. However, detecting and managing multiply represented objects plays an important role in geoinformation systems, especially when different data sets are to be combined. In the 2D world, a number of approaches have been developed, each of them focussing on specific data types. For instance, Walter (1997) proposed a method for the matching of street data from different sources. Based on this, Volz and Walter (2004) realized the integration of multiply represented 2D vector data on the schema level, and Chen and Walter (2010) presented a solution for the automatic quality assessment of such data. While the range of approaches for identifying and processing multiply represented 2D geodata is wide, the situation is different for 3D data. Although first ideas for analysing the consistency of selected 3D geometries have been presented over recent years (for example, Peter (2009) compares geometric properties of planar 3D faces to estimate the consistency of different building representations), there is still a considerable need for research in this area.

Consequences for our work
The review of existing approaches dealing with inhomogeneous geodata reveals a number of yet unsolved challenges on the way towards full data interoperability. The vast majority of approaches present application-specific solutions for a rather narrow range of different data types, data models or quality levels; an overall modelling concept for arbitrary geodata is still missing. Additional problems and limitations result from the separate consideration of structural heterogeneity on the one side and geometric and topological heterogeneity on the other side. For example, while CityGML ensures interoperability on a semantic and syntactic level, the explicit treatment of geometric and topological heterogeneity is neglected; consistency analyses are not supported. However, full interoperability and, no less important, a sustainable management of geodata, which is a basic requirement for efficient analyses and updating processes, necessarily demand an integrated view on all heterogeneity aspects, structural as well as geometric and topological ones.

HYBRID DATA MODEL
We introduce an application-independent modelling concept which is hybrid in the sense of structural and geometric plus topological aspects: The data model developed for handling structural data heterogeneity will be described in section 3.1; hybrid modelling strategies to cope with geometric and topological data heterogeneity will be subject of section 3.2.

Approach to overcome structural heterogeneity
In order to overcome syntactical heterogeneity, we base our data model on fundamental modelling elements which are part of the most relevant existing geometric data models. We introduce the term hybrid core to denote such common modelling elements. It appears that a general hybrid core which is valid for all data models does not exist. To prove this, it is sufficient to compare the 2D vector format with the 2D raster representation. 2D vector data is modelled through points, lines and faces. Since lines and faces, in turn, are described by sequences of points, the point turns out to be the basic modelling element of 2D vector data. The existence of a hybrid core for vector and raster data would imply that the point is a basic modelling element of raster data, too. However, points cannot be expressed in the raster format in purely geometric terms. The explicit semantic modelling as a point object is additionally required because, as a consequence of the approximating character of raster data, a raster cell could also represent a short line or a small surface.
Since a general hybrid core is not available, we create an artificial one based on the working hypothesis that all modelling types considered so far (e.g. vector, raster, TIN, grid, voxel, cell decomposition, CSG, etc.) can be transferred to BRep. By internally creating boundary representations for all data sets, even syntactically inhomogeneous geodata can be reduced to a hybrid core comprising points, lines, surfaces and solids. In the case of raster and voxel data, where each 2D or 3D cell then is described by its bounding lines or surfaces respectively, this modelling concept is of course not efficient. However, given the fast advances in the development of high-speed processors and parallel computing, it seems reasonable to ignore performance issues for now. Efficient access structures can be added to the model at a later stage.
In order to ensure as much interoperability as possible, we build our modelling concept on the ISO 19107 standard. ISO 19107 is a widely accepted international standard for the modelling of geometric and topological aspects of a so-called feature, which denotes an abstraction of a real world phenomenon (Andrae, 2009). Based on BRep, it is appropriate to describe 2D and 3D vector data as well as TINs and grids. We propose several standard-compatible extensions which open the standard to further geometric representations. Here, the focus is on approximating data models like raster and voxel, and on the CSG modelling approach.
Figure 1 shows our data model in UML notation; explanations will be given in the following sections: Essential modelling principles of ISO 19107, the basis of our data model, will be described in section 3.1.1 (Figure 1 presents corresponding object classes in light grey). The standard-compatible extensions for raster and voxel data will be given in section 3.1.2 (highlighted in orange (horizontally hatched)), while section 3.1.3 will show how the CSG concept (also highlighted in orange (horizontally hatched)) can be integrated in ISO 19107. By means of the object classes and corresponding associations coloured in green (diagonally hatched), Figure 1 illustrates how various existing geometric data representations finally can be expressed by our data model.
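To make the working hypothesis concrete for raster data, the following minimal sketch converts a single raster cell into its four bounding lines. The function name `pixel_to_brep`, the parameters `cell_size` and `origin`, and the convention that the row index grows with the y coordinate are assumptions made for illustration; they are not part of the data model described here.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Edge = Tuple[Point, Point]


def pixel_to_brep(row: int, col: int, cell_size: float = 1.0,
                  origin: Point = (0.0, 0.0)) -> List[Edge]:
    """Return the four bounding edges of one raster cell as a closed ring.

    Assumption: the cell (row, col) covers the square whose lower-left
    corner is origin + (col * cell_size, row * cell_size).
    """
    x0 = origin[0] + col * cell_size
    y0 = origin[1] + row * cell_size
    x1, y1 = x0 + cell_size, y0 + cell_size
    # Corners in counter-clockwise order, starting at the lower-left one.
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    # Each edge connects one corner to the next; the last edge closes the ring.
    return [(corners[i], corners[(i + 1) % 4]) for i in range(4)]


edges = pixel_to_brep(row=2, col=3)
```

Storing four edges per cell illustrates why this representation is not efficient for large rasters, as noted above.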

Basic modelling principles
ISO 19107 defines GM_Object as a base class for the geometric properties of all geoobjects. An instance of GM_Object is either a GM_Primitive, a GM_Aggregate or a GM_Complex. Specializations of GM_Primitive are the classes GM_Point, GM_Curve, GM_Surface and GM_Solid. These geometric primitives cannot be divided into further primitives and, thus, represent basic elements. Instances of the class GM_Aggregate are unstructured collections of geometries free of any topological restrictions. Aggregates whose components all belong to the same primitive type are elements of the class GM_MultiPrimitive.
In contrast to GM_Aggregate, GM_Complex offers an opportunity to combine geometric elements in a structured way. Topological constraints ensure that these elements are disjoint and not self-intersecting; they are allowed to touch each other, though. A complex belongs to the class GM_Composite if the following additional conditions are fulfilled: 1) all components of the complex are of the same primitive type; 2) the complex is isomorphic to a primitive. Important specializations of GM_Composite are GM_CompositeCurve, GM_CompositeSurface and GM_CompositeSolid.
As mentioned above, ISO 19107 additionally allows for the explicit modelling of a geoobject's topological properties by separate classes. To simplify matters for now, we do without an explicit topological modelling. Topological properties can be derived anyway when geoobjects are modelled as instances of the class GM_Complex. A geometric complex describes topology implicitly since ISO 19107 defines that, in contrast to primitives and aggregates which represent open sets, a complex contains its components plus the boundary of each component.
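The class relations described in this section can be summarized in a small sketch. It mirrors only the class names mentioned in the text and is not an implementation of the standard: the attributes (`position`, `control_points`, `elements`) and the helper `_require_same_primitive_type` are assumptions, and the topological constraints of a complex (disjointness, isomorphism to a primitive) are deliberately not checked.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


class GM_Object:
    """Base class for the geometry of all geoobjects."""


class GM_Primitive(GM_Object):
    """Geometry that cannot be divided into further primitives."""


@dataclass
class GM_Point(GM_Primitive):
    position: Tuple[float, ...]


@dataclass
class GM_Curve(GM_Primitive):
    control_points: List[GM_Point]


def _require_same_primitive_type(elements: List[GM_Object]) -> None:
    """Raise TypeError unless all elements are primitives of one type."""
    if not all(isinstance(e, GM_Primitive) for e in elements):
        raise TypeError("all components must be primitives")
    if len({type(e) for e in elements}) > 1:
        raise TypeError("all components must share one primitive type")


@dataclass
class GM_Aggregate(GM_Object):
    """Unstructured collection without topological restrictions."""
    elements: List[GM_Object] = field(default_factory=list)


@dataclass
class GM_MultiPrimitive(GM_Aggregate):
    def __post_init__(self) -> None:
        _require_same_primitive_type(self.elements)


@dataclass
class GM_Complex(GM_Object):
    """Structured collection; topological constraints are NOT checked here."""
    elements: List[GM_Object] = field(default_factory=list)


@dataclass
class GM_Composite(GM_Complex):
    def __post_init__(self) -> None:
        # Condition 1 from the text; condition 2 (isomorphism) is omitted.
        _require_same_primitive_type(self.elements)


multi_point = GM_MultiPrimitive([GM_Point((0.0, 0.0)), GM_Point((1.0, 1.0))])
```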

Extensions for raster and voxel data
According to our working hypothesis, a raster representation of a geoobject can be interpreted as a composition of single surface elements, in which each surface element corresponds to one pixel and is described by its bounding lines. Figure 2 shows an exemplary 2D geoobject in both raster (Figure 2a) and boundary representation (Figure 2b). In order to emphasize the different character of these two concepts, pixels are illustrated in black, surface elements in grey with black boundaries. Due to the properties and topological relations of raster cells (not self-intersecting, disjoint), such a composition of surface elements meets the requirements of a GM_Complex. But modelling a raster object as a general complex means losing knowledge about important geometric properties since a complex does not know about its components' primitive types: ISO 19107 does not specify or restrict which primitive types may occur in a complex; even a mixture of dissimilar types is allowed.
Modelling a raster object instead as an instance of GM_CompositeSurface, which is a specialization of GM_Composite and, thus, also of GM_Complex, would preserve the knowledge about occurring primitive types. However, as will be shown by the examples in Figure 2c, GM_CompositeSurface cannot express raster objects of arbitrary shape. The reason is that a composite is defined to be isomorphic to a primitive; consequently, a composite surface (here, the union of various raster cells) has to be isomorphic to a single surface primitive. As ISO 19107 requires a surface primitive to be simple, i.e. free of self-intersections and self-touches, only those raster objects can be modelled as a valid composite surface whose raster cells each have at least one edge in common with another raster cell. While this is true for Figure 2c(1), raster objects similar to the example in Figure 2c(2) cannot be modelled as a composite since the outer boundary of the merged cells touches itself.
To overcome this problem, the new class GM_ComplexComposite is introduced as a specialization of GM_Complex. An instance of this class is a complex of several composites which can be of different composite types. Restrictions forcing these composites to be of identical primitive type are realized through the specializations GM_ComplexCompositePoint, GM_ComplexCompositeCurve, GM_ComplexCompositeSurface and GM_ComplexCompositeSolid. Based on this extension to the data model, it is now possible to model raster objects of arbitrary shape. The class appropriate for this purpose, GM_ComplexCompositeSurface, even allows for the modelling of completely unconnected raster cells or raster configurations in which cells are connected through just a corner, as is the case in Figure 2c(2). As illustrated in Figure 2d by means of different colours, parts of the object which are isomorphic to a single surface are modelled as instances of GM_CompositeSurface; all together, they can then be interpreted as a complex of three composite surfaces, i.e., as an instance of the class GM_ComplexCompositeSurface. Extensions for the modelling of voxel representations follow analogous considerations. The new object class introduced for this purpose is called GM_ComplexCompositeSolid.
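One possible way to derive the composite surfaces of a raster object is to group its cells by shared edges, so that cells touching only at a corner end up in different groups. The sketch below does this with a simple flood fill; the function name `split_into_composites` and the `(row, col)` cell encoding are assumptions. It uses edge adjacency as the only grouping criterion, following the description above, and does not detect corner self-touches that may still occur inside an edge-connected group.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (row, col) index of an occupied raster cell


def split_into_composites(cells: Set[Cell]) -> List[Set[Cell]]:
    """Group raster cells into edge-connected sets (4-neighbourhood).

    Each returned set is a candidate for one GM_CompositeSurface; the list
    as a whole corresponds to one GM_ComplexCompositeSurface.
    """
    remaining = set(cells)
    groups: List[Set[Cell]] = []
    while remaining:
        seed = remaining.pop()
        group = {seed}
        stack = [seed]
        while stack:
            r, c = stack.pop()
            # Only neighbours sharing an edge count; diagonal ones do not.
            for neighbour in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    group.add(neighbour)
                    stack.append(neighbour)
        groups.append(group)
    return groups


# Two cells share an edge; the third touches them only at a corner.
raster_object = {(0, 0), (0, 1), (1, 2)}
composites = split_into_composites(raster_object)
```

For voxel data, the same idea would apply with a 6-neighbourhood (shared faces) instead of shared edges.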

Extensions for CSG data
In principle, CSG data can be converted into BRep by determining the visible bounding faces. Doing so, however, implies the loss of information on the construction process and geometric conditions of the CSG object (Gröger et al., 2005). Such information can be relevant for updating purposes.
We integrate the CSG concept in the data model through the new object class GM_CSGObject. Derived from the aggregate GM_MultiSolid, this class allows its components to overlap and penetrate each other, which is a characteristic property of CSG objects. By means of the so-called CSG node, realized through the class GM_CSGNode, the hierarchical structure of the CSG construction process can be modelled. GM_CSGNode serves as a base class to define transformations, Boolean operations and CSG solids, the constructive elements of a CSG object. A Boolean operation, for example, refers to two nodes to which it is applied. Transformations are modelled accordingly. A CSG solid refers to an instance of GM_CompositeSolid, which ensures that the solid's boundary is a part of the object.
Our object-oriented way of modelling CSG objects makes it possible to completely hide their constructive design from the rest of the standard. Special analysis methods for CSG objects can be introduced without changing the standard.
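The hierarchical CSG structure can be sketched as a small tree of node classes. Only GM_CSGNode is a name taken from the text; the subclasses `CSGSolid`, `CSGTransformation` and `CSGBooleanOperation`, their attributes, and the helper `leaf_solids` are hypothetical names chosen for illustration. The reference to a GM_CompositeSolid is replaced by a plain string identifier to keep the sketch self-contained.

```python
from dataclasses import dataclass
from typing import List, Tuple


class GM_CSGNode:
    """Base class for all constructive elements of a CSG object."""


@dataclass
class CSGSolid(GM_CSGNode):
    # Stand-in for a reference to a GM_CompositeSolid instance.
    solid_id: str


@dataclass
class CSGTransformation(GM_CSGNode):
    child: GM_CSGNode
    translation: Tuple[float, float, float]  # simplified: translation only


@dataclass
class CSGBooleanOperation(GM_CSGNode):
    operator: str  # "union", "intersection" or "difference"
    left: GM_CSGNode
    right: GM_CSGNode


def leaf_solids(node: GM_CSGNode) -> List[str]:
    """Collect the identifiers of all solids below a node, left to right."""
    if isinstance(node, CSGSolid):
        return [node.solid_id]
    if isinstance(node, CSGTransformation):
        return leaf_solids(node.child)
    if isinstance(node, CSGBooleanOperation):
        return leaf_solids(node.left) + leaf_solids(node.right)
    raise TypeError(f"unknown node type: {type(node).__name__}")


# A box with a shifted cylinder subtracted from it.
tree = CSGBooleanOperation(
    "difference",
    CSGSolid("box"),
    CSGTransformation(CSGSolid("cylinder"), (1.0, 0.0, 0.0)),
)
solids = leaf_solids(tree)
```

Keeping the tree alongside the solids preserves the construction history that a plain conversion to BRep would discard.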

Approach to overcome geometric and topological heterogeneity
The data model proposed in section 3.1 can cope with structural heterogeneity; data of different dimensions and geometric data representations can be handled, analyzed and visualized together. An arbitrary geoobject, which is called a "Feature" in our data model, can be realized through one or more representations, each of them modelled as an instance of GM_Object. These instances actually do not need to cover the geoobject completely, but instead can also describe only parts of the object. Thus, on the one hand, our modelling concept provides the possibility to manage multiply represented geoobjects. On the other hand, it is also feasible to combine various object parts to one geoobject, even if these object parts stem from very different geometric representations (e.g. from a TIN mesh and a voxel representation).
However, an efficient usage, analysis and interpretation of the data is only possible if geometric equivalences between different object representations are known, i.e., if it is known which geometry of one representation corresponds to which geometry of another representation of the same geoobject. In the following, we will denote such geometric correspondences between different object representations as hybrid identities.
Assuming an ideal world, in which coordinates of corresponding object representations coincide exactly, hybrid identities are given implicitly through incident geometries. As an example, Figure 3a depicts several representations of a simple building: a 3D vector representation of the building's solid, the 2D vector outline, a raster representation of the building's footprint, and a 3D point cloud observed at one building face. Since the boundaries of these representations exactly match with each other, corresponding geometries can automatically be derived by means of geometric comparisons. The ideal situation illustrated in Figure 3a is a special case which can only occur as a result of specific conversions or when one representation has been created based on another (e.g. a 3D solid through extruding a 2D outline). In practice, we usually face geodata which is geometrically and topologically heterogeneous due to inaccuracies, generalization processes or incomplete data acquisition. As a consequence, multiple object representations derived thereof show significant discrepancies between corresponding geometries (Figure 3b). Thus, knowledge about hybrid identities is not given implicitly any more, but has to be added explicitly instead. Details on modelling aspects and the possible usage of hybrid identities are described in the following sections 3.2.1 and 3.2.2.
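The difference between the ideal and the real case can be illustrated with a simple point comparison. The sketch below pairs points of two representations whose coordinates coincide within a tolerance; with a tolerance of zero it finds only exactly coincident points, which corresponds to the ideal case. The function name `implicit_identities` and the parameter `tol` are assumptions, and real matching of lines, faces or solids would need considerably more than this point-wise test.

```python
import math
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def implicit_identities(rep_a: Sequence[Point3D], rep_b: Sequence[Point3D],
                        tol: float = 0.0) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) where rep_a[i] and rep_b[j] coincide.

    Two points coincide if their Euclidean distance is at most tol.
    """
    pairs = []
    for i, p in enumerate(rep_a):
        for j, q in enumerate(rep_b):
            if math.dist(p, q) <= tol:
                pairs.append((i, j))
    return pairs


# Corner points of the same building edge in two data sets; the second
# data set is slightly off in height for one of the points.
solid_corners = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
cloud_points = [(10.0, 0.0, 0.0), (0.0, 0.0, 0.3)]

exact = implicit_identities(solid_corners, cloud_points)            # ideal case
tolerant = implicit_identities(solid_corners, cloud_points, tol=0.5)
```

With exact comparison, the displaced point finds no partner, which is why correspondences have to be stored explicitly once the data is no longer perfectly consistent.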

Data model for hybrid identities
Figure 4 shows the concept we developed for the explicit modelling of hybrid identities. The concept goes beyond the modelling of purely geometric aspects since knowledge about correspondences and relations between different object representations is introduced. The class HybridIdentity is used for managing hybrid identities. Each hybrid identity refers to at least two mutually corresponding structures modelled as instances of the class HybridElement. Depending on whether such a hybrid element stands for a single primitive or is a collection of several primitives, it can be a hybrid primitive, a hybrid complex or a hybrid aggregate. The way in which several hybrid primitives are combined to a hybrid complex or aggregate follows the basic modelling principles as proposed in section 3.1.1. In order to avoid redundancy, a hybrid primitive does not contain an explicit geometric description but refers to an existing instance of the class GM_Primitive. Conversely, an instance of GM_Object refers to all hybrid identities in which it is involved.
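A minimal sketch of these relations is given below. HybridIdentity, HybridElement and GM_Primitive are class names from the text; `HybridPrimitive`, `HybridAggregate`, the attribute names and the helper `_primitives` are assumptions, and a hybrid complex would be modelled analogously to the aggregate shown here. The geometry class is reduced to a name plus the back-references to its hybrid identities.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class GM_Primitive:
    name: str
    # Back-references to all hybrid identities this geometry takes part in.
    identities: List["HybridIdentity"] = field(default_factory=list, repr=False)


class HybridElement:
    """Base class for the structures linked by a hybrid identity."""


@dataclass(eq=False)
class HybridPrimitive(HybridElement):
    # Reference to existing geometry; no coordinates are duplicated here.
    geometry: GM_Primitive


@dataclass(eq=False)
class HybridAggregate(HybridElement):
    members: List[HybridElement]


def _primitives(element: HybridElement) -> List[HybridPrimitive]:
    """Flatten a hybrid element into the hybrid primitives it contains."""
    if isinstance(element, HybridPrimitive):
        return [element]
    return [p for m in element.members for p in _primitives(m)]


@dataclass(eq=False)
class HybridIdentity:
    elements: List[HybridElement]

    def __post_init__(self) -> None:
        if len(self.elements) < 2:
            raise ValueError("a hybrid identity needs at least two elements")
        # Register the identity with every geometry it involves.
        for element in self.elements:
            for primitive in _primitives(element):
                primitive.geometry.identities.append(self)


# A vector line corresponds to two surface patches of a raster object.
line = GM_Primitive("line_A")
patches = [GM_Primitive("patch_B1"), GM_Primitive("patch_B2")]
identity = HybridIdentity([
    HybridPrimitive(line),
    HybridAggregate([HybridPrimitive(p) for p in patches]),
])
```

Because each geometry keeps a list of identities, the same line or patch can take part in several hybrid identities at once.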

Potential usage of hybrid identities
The data model for hybrid identities is designed to offer as much flexibility as possible. Being modelled independently of each other, hybrid identities can be defined for either a whole object or components of it. Additionally, one and the same object or object part can belong to several hybrid identities. Based on the example of a multiply represented 2D line object, Figure 5 demonstrates a small selection of the many possibilities to define hybrid identities. Figure 5a shows the two representations available for the 2D line object. The linear one (in the following referred to as rep_A) stems from a 2D vector representation and is modelled as an instance of GM_Complex, here consisting of a single line and its boundary. The areal one (rep_B) originates from raster data and is given as an instance of GM_ComplexCompositeSurface. Possible hybrid identities can, for instance, be defined for the following geometries: the line of rep_A and a subset of the surface patches of rep_B (Figure 5b); the line of rep_A and a sequence of lines bounding the surface patches of rep_B (Figure 5c); the points bounding the line of rep_A and single surface patches of rep_B (Figure 5d); the points bounding the line of rep_A and single endpoints of lines which bound the surface patches of rep_B (Figure 5e). The definition of hybrid identities neither resolves multiple representations nor geometric and topological data heterogeneity but makes them manageable. The explicit modelling of correspondences between various object representations provides the basis for consistency evaluations. Respective analyses, of course, require a detailed definition of the term consistency. However, the definition and interpretation of data consistency is not fully application-independent, and, thus, is beyond the scope of this paper.

CONCLUSIONS AND OUTLOOK
We proposed a data model which is meant to provide an application-independent conceptual basis for smart geoinformation systems. The data model is hybrid in the sense of structural and geometric aspects. Through targeted extensions of an existing ISO standard, our concept is able to bridge the gap between 2D, 2.5D and 3D data, and break down barriers between various modelling strategies. The consideration of geometric and topological heterogeneity is realized on the conceptual level: So-called hybrid identities can be defined for various objects or object parts no matter if they are geometrically and topologically consistent with each other or not. The explicit modelling of such geometric correspondences allows not only for the connection of objects or object parts given in different types, geometric data models, dimensions and quality levels; it also supports consistency analyses and updating measures, which is an important aspect considering the frequently occurring changes in geodata. The system supports multiple representations which can be based on either the same or differing data models. Additionally, it is also possible to model parts of a single object using different modelling concepts. While, for example, the main body of a building can efficiently be represented by cell decomposition, decorative elements such as 2.5D reliefs could be added as fine surface meshes.
In future work, we will evaluate the efficiency and the potential of our hybrid modelling concept based on exemplary application scenarios. One application might be mapping and integrating multiply represented inconsistent building data into our hybrid data concept, and modelling hybrid identities for corresponding geometries. Another scenario could be the connection of disjoint or only partially overlapping data sets, such as vector representations of street data and raster images of evacuation plans representing the interior of buildings, as a basis for combined outdoor-indoor navigation.
Through the integrated evaluation of geodata from different sources covering different aspects of real world objects, we expect a deeper insight into geometric but also semantic relations. Explicitly defined hybrid identities constitute links between various data sets, and, thus, provide a basis for the inference and comprehension of higher context.

Figure 3. Multiple representations of a building in an ideal consistent and error-free world (a), and in the real world (b).

Figure 4. Data model for hybrid identities.
Figure 5. (a) Raster and vector representation of a line object, (b)-(e) exemplary definitions of correspondences (red, bold).