USING GEOMETRY-BASED METRICS AS PART OF FITNESS-FOR-PURPOSE EVALUATIONS OF 3D CITY MODELS

Three-dimensional geospatial information is increasingly used in a range of tasks beyond visualisation. 3D datasets, however, are often produced without exact specifications and at mixed levels of geometric complexity. This leads to variations within the models' geometric and semantic complexity as well as in the degree of deviation from the corresponding real-world objects. Existing descriptors and measures of 3D data, such as CityGML's level of detail, are perhaps only partially sufficient in communicating data quality and fitness-for-purpose. This study investigates whether alternative, automated, geometry-based metrics describing the variation of complexity within 3D datasets could provide additional relevant information as part of a process of fitness-for-purpose evaluation. The metrics include: mean vertex/edge/face counts per building; vertex/face ratio; minimum 2D footprint area; and minimum feature length. Each metric was tested on six 3D city models from international locations. The results show that geometry-based metrics can provide additional information on 3D city models as part of fitness-for-purpose evaluations. The metrics, while they cannot be used in isolation, may provide a complement to enhance existing data descriptors if backed up with local knowledge, where possible.


INTRODUCTION
Three-dimensional geospatial information (3D GI) is increasingly used in a large range of tasks beyond visualisation, with an expectation of 3D capability both among specialist users and the general public (Ellul and Wong, 2015). 3D GI offers additional functionality not available in 2D, in particular when analysing visibility, surface, sub-surface and shadowing (Zlatanova et al., 2002). Other 3D-specific applications include volumetric calculations, which allow for accurate assessment of building capacity as well as forest size (Rahlf et al., 2014; Vanegas et al., 2012). 3D city models initially focused on visualisation and geometry (Batty et al., 2000) rather than on their geometric-topological structure (Gröger and Plümer, 2012). In recent years, governments and councils around the world have been extending their 2D GIS implementations in cities to 3D (Albrecht and Moser, 2010). The availability of open 3D city models has overcome the cost barrier of data, allowing for many new applications that can be supported by this technology. These datasets, however, are created in isolation, by different producers, and may be created to a local specification for a specific purpose. This leads to variations in the models' geometric and semantic complexity, as well as in the degree of deviation from the corresponding real-world objects (Löwner and Gröger, 2016). Further inconsistencies in these datasets may include: the choice of features modelled; the level of geometric detail at which features are modelled; the level of semantics; the inclusion of textures; the choice of representation; the file formats used; and the delivery mechanisms to potential users. Although applications such as environmental analysis (Ngo et al., 2014), navigation (Musliman et al., 2006) and cadastre and land management (Jazayeri et al., 2014) increasingly call for standardized 3D models with consistent topology, many visually convincing datasets show weak or invalid geometry (Zhao et al., 2014). In practice, 3D city models focus on the recreation of correct geometry and visual satisfaction, with little to no consideration of attributes or semantics.
As 3D datasets vary intrinsically and extrinsically, there is a need to quantify and describe these datasets to users so that they can make informed choices where multiple 3D datasets exist and when selecting a dataset for a specific visualisation or analytical task. Existing measures of 3D data such as CityGML's concept of level of detail (LoD) (Kolbe et al., 2005) are perhaps only partially sufficient in communicating data quality and fitness-for-purpose, as the specification is not unambiguous (Biljecki et al., 2016). This allows for a high freedom of interpretation, resulting in potential inconsistencies and misunderstanding. There is therefore a need to explore alternative measures to describe complexity levels, which may allow users to better assess the suitability of data for specific applications.
In this paper, two aspects of the problem are considered. Firstly, alternative, automated and geometry-based metrics to analyse the variation of complexity within 3D city datasets are investigated. As a trusted source of 3D ground-truth data is often not available for 3D city datasets, the descriptive metrics explored in this study aim to operate without reference to validation data. Further, the metrics are intended to be independent of any particular 3D format and should be applicable to any 3D boundary representation of building models. Secondly, the paper also investigates whether automated metrics can better describe 3D data to users, providing additional information on the dataset and thus allowing for a more effective assessment of fitness-for-purpose. The simplicity of the metrics should ensure ease of understanding for the users. The study also looks to understand whether the above metrics are useful in comparing different 3D city models.

BACKGROUND
3D city models and 3D representations
3D models can be represented in many ways, from triangulated meshes to simple extruded polygons to structured boundary 3D models, but there is not a single data structure that works best for all purposes and applications (Ohori et al., 2015; Stoter and Zlatanova, 2003). Where one representation may excel in one aspect such as modelling curves, another may better represent and manage 3D tunnels (Tuan, 2013). For 3D GI, boundary representation (BRep) is the most widespread representation, with many algorithms available for computing physical properties from that representation (Haala et al., 1998). BRep defines spatial objects by their bounding elements such as planar faces, with vertices and edges defined by the intersection of the bounding planes. Different methods can be used to create BRep models, from semi-automated methods such as photogrammetry to manual modelling based on computer-aided design (CAD) (Döllner and Buchholz, 2005). The methodologies vary in their aims: where semi-automated methods seek to produce large-coverage datasets with the least manual effort, CAD-based modelling allows for much finer detail at a smaller coverage. These different methodologies introduce different sources of errors and artefacts into the 3D data.

3D data quality
Data quality is defined by the degree to which a set of inherent characteristics fulfils requirements (ISO, 2005). Specifically, geographic information data quality is expressed by multiple elements including completeness, logical consistency, positional accuracy, thematic accuracy, temporal quality and a usability element (ISO, 2013). As yet, no standard approaches to measuring 3D data quality have been developed, although 3D city models are being produced worldwide at an increasing rate (Stoter et al., 2016).
Existing methodologies for evaluating GI quality can be split into two general approaches: extrinsic (comparison with external data) and intrinsic (evaluation derived from the data itself). For extrinsic evaluations, Akca et al. (2010) assessed the geometric accuracy of the model generation process from LiDAR data, comparing the generated models to a validation dataset. Haala and Kada (2010) provide a state-of-the-art review of 3D building reconstruction methods, identifying that the development of fully automatic algorithms is still required to overcome the considerable manual operations. Cheng et al. (2015) proposed a framework using a fuzzy realistic index to evaluate the visual and geometric quality of 3D models. Haithcoat et al. (2001) examined geometric fidelity by subdividing the model and verification data into voxels to calculate omission (missing data) and commission (false positives) within 3D datasets. Krämer et al. (2007) describe a quality model for 3D city models by translating the six criteria for 2D quality measurements: positional accuracy; completeness; semantic accuracy; correctness; temporal conformance; and logical consistency. These measures, while they may apply to 3D in theory, cannot always be generated in practice. For 2D quality assessments, a verification dataset may be available from an existing external source or collected using ground-based survey methods. In comparison, an external 3D dataset may not exist, and capturing primary verification data in 3D is arguably more difficult and time-consuming. Methods that compare verification and input 3D datasets to assess measures such as reference system accuracy, positional accuracy and completeness are therefore not always possible in practice. On a more subjective basis, Durupt and Taillandier (2006) provided a visual evaluation of automatic building reconstruction methods, using an operational approach.
For intrinsic evaluations, Wagner et al. (2013a) explored 3D geometric quality and outlined several metrics as part of the CityDoctor tool, including planarity, self-intersection, surface orientation error and duplicated points. The tool and its validation process provide an error report which is useful for healing invalid geometries, but they do not provide a general statement on the grade of compliance with data specifications or on the usability of the model (Wagner et al., 2013a). Ledoux (2013) presented a prototype methodology to validate individual solids according to international standards for geographic information. The prototype is able to inform the user of the nature of the errors and of their locations, but requires manual effort to modify and correct the geometry. Alam et al. (2013) looked at validation and healing of CityGML, while similarly Zhao et al. (2014) developed a framework for the geometric repair of CityGML models.

Level of detail, geometric and semantic complexity
Level of detail (LoD) is a term and concept adopted by a wide range of disciplines, from building information modelling to computer graphics, each with its own nuances of definition (Bolpagni, 2016). Within GIS, the concept is most commonly used in 3D city modelling to represent different levels of geometric and semantic complexity. A practical implementation is within the Open Geospatial Consortium's CityGML standard (Kolbe et al., 2005), which uses five LoDs to indicate how much detail should be modelled, ranging from a simple 2.5D model of footprints to detailed interiors (further details can be found in Löwner et al. (2013)). It is important to note that the level of detail does not explicitly convey data quality as defined by the ISO (2013). For example, it is possible to have a LoD1 block model which is accurate and a LoD3 model which is of poorer quality (Biljecki et al., 2015). The current measures of level of detail, however, should not be discounted entirely, as they are still useful in the wider context of data quality and within fitness-for-purpose evaluations. Descriptors of geometric and semantic complexity can provide general information from which quality-related knowledge may be derived. Users can therefore assess the suitability of a dataset and ascertain whether it is able to satisfy the requirements of their application, e.g. selecting LoD2 or higher if roof geometry is required.
Since its introduction, the concept of LoD has become increasingly inadequate in communicating to users the geometric and semantic qualities of 3D models in an unambiguous way (Biljecki et al., 2014). Shortcomings specific to the CityGML standard include, but are not limited to, the coupling of indoor objects with the highest LoD, the lack of multiple indoor LoDs, the lack of explicit representation of windows until LoD3 and the freedom of interpretation at each LoD (Löwner and Gröger, 2016). Additionally, many datasets may in fact consist of buildings at different LoDs, with modelling focused on important buildings (landmarks) as opposed to suburban areas. It should be noted that efforts are underway in refining and updating the CityGML standard for version 3.0 to overcome these deficiencies (Benner et al., 2013; Machl, 2013; Biljecki et al., 2016; Löwner et al., 2013; Löwner and Gröger, 2016).
In summary, while some 3D data quality assessments may be automated, others may still contain manual elements which may be laborious and time-consuming. Further problems include the fact that detailed reference datasets might not yet be available at a large scale for extrinsic data quality assessment (Elberink and Vosselman, 2011). The remainder of this paper presents alternative, simpler, intrinsic and automated metrics for communicating the usefulness of a 3D dataset. The metadata or descriptors could provide additional and supplementary quality-related information which users could utilise within data selection processes and fitness-for-purpose evaluations.

Data
Six 3D city datasets were selected and are presented in this study (Figure 1). Due to the cost-prohibitive nature of commercial 3D datasets, the selection criterion was for the data to be freely available. The datasets include: 1) Berlin, Germany (Berlin Business Location Center, 2016); 2) Frankfurt, Germany [city centre only] (Stadtvermessungsamt Frankfurt am Main, 2016); 3) Toronto, Canada (City of Toronto, 2016); 4) Washington D.C., USA (District of Columbia, 2016); 5) Adelaide, Australia (Adelaide City Council, 2016); and 6) Rotterdam, The Netherlands (Gemeente Rotterdam, 2016). While it is recognised that this is not an exhaustive list of the 3D datasets available, the datasets were generated using a variety of methodologies (see Table 1 for references), providing a range of potential 3D modelling artefacts with which to test the proposed metrics.

Metrics
Six metrics are investigated in this study: mean number of vertices/edges/faces per building (Metrics 1, 2 and 3); mean number of vertices per face (4); frequency distribution of 2D footprint area (5); and frequency distribution of feature length (6). The metrics selected are intentionally simple to ensure ease of understanding by the user. The first three are simple geometry metrics normalised by the total number of buildings to create three complexity measures. They provide an indication of the detail and complexity of the buildings within the dataset but are strongly dependent on the architecture of the modelled area. The mean number of vertices per face metric provides an indication of the efficiency of the vertices in modelling. For example, if a representation can define a building with fewer nodes without losing detail, then it is more efficient. While it is relatively easy to understand and could potentially identify representations with a large number of redundant vertices and collinear points, the metric is dependent on the specific modelling process chosen. Lastly, two minimum size metrics are included. Under a revised LoD specification for 3D buildings by Biljecki et al. (2016), the inclusion of minimum size was suggested to help users identify the selection criteria for objects to be acquired during modelling. Minimum size could be defined by either the minimum footprint area or the minimum feature length. Both variations are implemented and tested in this study. A mean value of each metric was also produced for each dataset to provide a city-scale measure. Frequency distributions of the minimum size metrics were also calculated.
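As an illustration of Metrics 1 to 4, the following is a minimal Python sketch rather than the parser used in the study. It assumes each building is held as a list of faces, each face being a ring of (x, y, z) tuples with no repeated closing vertex, and it assumes that the vertices-per-face value is the number of unique vertices divided by the number of faces across the dataset. The function names and the `unit_cube` test geometry are hypothetical.

```python
from statistics import mean


def building_counts(faces):
    """Return (vertices, edges, faces) counts for one BRep building.

    `faces` is a list of rings; each ring is a list of (x, y, z) tuples
    (assumed layout, closing vertex not repeated).
    """
    vertices = set()
    edges = set()
    for ring in faces:
        for i, v in enumerate(ring):
            w = ring[(i + 1) % len(ring)]
            vertices.add(v)
            # frozenset makes the edge undirected, so an edge shared by
            # two adjacent faces is counted only once
            edges.add(frozenset((v, w)))
    return len(vertices), len(edges), len(faces)


def dataset_metrics(buildings):
    """City-scale means per building plus an assumed vertex/face ratio."""
    counts = [building_counts(b) for b in buildings]
    total_vertices = sum(c[0] for c in counts)
    total_faces = sum(c[2] for c in counts)
    return {
        "mean_vertices": mean(c[0] for c in counts),
        "mean_edges": mean(c[1] for c in counts),
        "mean_faces": mean(c[2] for c in counts),
        "vertex_face_ratio": total_vertices / total_faces,
    }


# Hypothetical test geometry: a unit cube modelled with six quadrilateral faces
unit_cube = [
    [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)],  # ground
    [(0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)],  # roof
    [(0, 0, 0), (1, 0, 0), (1, 0, 1), (0, 0, 1)],  # walls
    [(1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 0, 1)],
    [(1, 1, 0), (0, 1, 0), (0, 1, 1), (1, 1, 1)],
    [(0, 1, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)],
]
```

For the cube, `building_counts(unit_cube)` gives 8 vertices, 12 edges and 6 faces. Matching vertices by exact coordinates is a simplification; real datasets may need a snapping tolerance before counting.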

Method
Each dataset was converted from its delivery format and stored in an Oracle Spatial Database 11g using FME 2014 SP1. Storage in a spatial database with a spatial index allowed for efficient interrogation of the geometry at the city scale. It also provided consistency between datasets when querying the geometry. Following conversion, a custom Java parser generated the metrics, storing the results back to the database. For the minimum 2D footprint, only the polygon representing the ground surface was evaluated. Where datasets were structured as CityGML, the elements were differentiated as Roof Surface, Wall Surface or Ground Surface, allowing for simple extraction. For the remaining datasets, a parser extracted 2D footprints from the 3D models. Finally, for minimum feature length, a parser decomposed every feature into its component edges and calculated the length of each edge, which was stored with the building identifier in an output table. The minimum value was then extracted for each building to provide the shortest 3D length of each building.
Zhao et al. (2014) identify four main sources of error within 3D city models: 1) choice of modelling tools; 2) model optimization; 3) conversion; and 4) semantics editing. As the datasets were converted and stored in an Oracle Spatial database, it was important to ensure that additional geometry errors were not introduced into the data during the database conversion process. There may be, however, errors inherent in the data derived from any conversion processes carried out by the data producer, e.g. conversion from CAD models. The validation of the data conversion process ensures that any errors subsequently discovered within the data are not related to the database conversion. A comparison of vertex count, edge count, face count and coordinates was conducted on samples of each of the six datasets, before (in the native format as delivered) and after the conversion process, to ensure there was no loss of information and that no distortions or artefacts were introduced. The process of storing the data in the Oracle Spatial database filters out any invalid geometries, excluding them from analysis.
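The two minimum size calculations described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the Java parser used in the study: faces are rings of (x, y, z) tuples without a repeated closing vertex, the footprint is the ground-surface ring projected to the x-y plane, and exactly zero-length edges are skipped. The function names and sample geometry are hypothetical.

```python
import math


def footprint_area(ring):
    """2D area of a ground-surface ring via the shoelace formula (z ignored)."""
    total = 0.0
    for i, (x1, y1, *_) in enumerate(ring):
        x2, y2, *_ = ring[(i + 1) % len(ring)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def edge_lengths(faces):
    """Yield the 3D length of every unique edge of one building."""
    seen = set()
    for ring in faces:
        for i, v in enumerate(ring):
            w = ring[(i + 1) % len(ring)]
            key = frozenset((v, w))
            # assumption: skip exact duplicates of a vertex (zero-length edges)
            if v != w and key not in seen:
                seen.add(key)
                yield math.dist(v, w)


def minimum_feature_length(faces):
    """Shortest 3D edge of the building."""
    return min(edge_lengths(faces))


# Hypothetical sample: a 4 m x 3 m ground surface and one 2.5 m high wall
ground = [(0, 0, 0), (4, 0, 0), (4, 3, 0), (0, 3, 0)]
wall = [(0, 0, 0), (4, 0, 0), (4, 0, 2.5), (0, 0, 2.5)]
```

Here `footprint_area(ground)` is 12.0 and `minimum_feature_length([ground, wall])` is 2.5. `math.dist` requires Python 3.8 or later.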

RESULTS
The focus of all six datasets was predominantly on geometric detail. Where attributes were present, they may have been incomplete or inconsistent. The geometric detail focused on roof surfaces rather than façade detail: none of the six datasets sampled contained representations of windows or doors. This may have been due to the use of airborne data acquisition methods, as all six datasets employed aerial photography, LiDAR or a combination of both to create the 3D city models. Some inconsistencies were found when processing the geometry. For example, where buildings were composed of multiple polyhedrons using a parent-child identifier relationship (e.g. GML PARENT ID and GML ID), there were instances where the parent identifier was mislabelled or omitted. This led to omission or commission errors when analysing a building as a single entity. The results of the metrics analysis are presented below (Tables 1 & 2).

DISCUSSION
In this study, intrinsic, automated, geometry-based metrics were analysed. This is because: 1) there is a lack of external, ground-truth 3D data; 2) assessing data at city-wide coverage is laborious and time-consuming; and 3) existing 3D datasets focus on geometry, with incomplete or no attributes.
The metrics described in section 3 were developed in the absence of any ground-truth data for 3D and relied solely on the interrogation of the geometry. The aim was to investigate whether these automated, geometry-based metrics could provide users with additional relevant information as part of the fitness-for-purpose evaluation. These metrics are descriptive and city-wide in coverage, rather than focusing on individual geometry.
It is important to note that the metrics are not direct measures of data quality but rather geometry-based characteristics useful for fitness-for-purpose evaluations (similar to level of detail). Secondly, these metrics cannot be viewed in isolation and should be inspected relative to each other. Their utility is enhanced when evaluated in conjunction with local knowledge of the architecture and an understanding of the wider context of the city model. For example, the total number of vertices of a 3D city model does not provide much information on its own, but the size of the geographic area covered and the total number of buildings provide the context required for interpretation. Thirdly, the metrics are normalised by the total number of buildings to a notionally common scale. The metrics could also be normalised by other geometric attributes such as volume or footprint area, but these are not investigated in this study. Lastly, the metrics are dependent on local architecture; thus, comparison between multiple 3D datasets from different locales is not possible.

Simple geometry metrics
The simple geometry metrics (mean number of vertices/edges/faces per building) provide an indication of the detail and complexity of the buildings within a dataset. All three measures indicate a similar pattern in the results. Specifically, the mean number of vertices per building measure reveals that Adelaide (204.014) and Washington D.C. (84.969) have significantly higher values than any of the other datasets. Referring back to the method of generation in Table 1, the high values are most likely attributable to the choice of CAD-based modelling tools. One exception is the Toronto dataset which, while created using CAD-based software, has a relatively low mean number of vertices per building. This is due to the dataset being composed predominantly of LoD1 block buildings. As suggested by Zhao et al. (2014), to produce visually satisfying 3D models with the least effort, interactive modelling tools can be used to shape the appearance of models. The freedom these tools provide, however, may lead to error-prone meshes (Botsch et al., 2010) which can contain excessive and redundant detail. A 3D city model with a high number of vertices per building could affect subsequent 3D spatial analyses. A trade-off must be made between the adequacy of 3D detail, the visual impact of the resulting 3D dataset, the suitability of the response times and the overall usability of the 3D model (Ellul and Altenbuchner, 2014). For example, within the application of 3D noise mapping, Deng et al. (2016) argue that having more detailed and complex geometry is, in fact, not beneficial.
The mean number of vertices per face metric was intended to provide a measure of efficiency and detail in a model, where the lower the ratio, the less efficient the model. For example, it is possible for a building model to have a large number of vertices, and therefore seemingly more detail, but for these vertices to provide no additional information. The extra vertices could be collinear points, which are therefore redundant within the representation. With this measure, Adelaide presents the lowest score (0.619), which could, on first inspection, be a result of the superfluous complexity introduced by CAD modelling tools, indicating a lack of efficiency. However, Washington D.C.'s score (1.596) is in line with the other, non-CAD software generated datasets, which possess values between 1.59 and 1.91. This metric is dependent on both the local architecture and the specific modelling process chosen. The metric may be more effective and provide more utility if measured as a frequency distribution rather than as a normalised, single-value, city-wide metric.
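The notion of redundant collinear vertices mentioned above can be made concrete with a small check. The following Python sketch is illustrative only and was not part of the study: it counts the vertices of a face ring that lie on the straight line through their two neighbours, using the cross product of the adjacent edge vectors. The function name, the ring layout (a list of (x, y, z) tuples without a repeated closing vertex) and the absolute tolerance `tol` are all assumptions.

```python
def collinear_vertex_count(ring, tol=1e-9):
    """Count ring vertices that are collinear with both of their neighbours.

    Note: a vertex at the tip of a zero-width spike is also flagged, since
    the cross product of its two edge vectors vanishes as well.
    """
    n = len(ring)
    redundant = 0
    for i in range(n):
        p, q, r = ring[i - 1], ring[i], ring[(i + 1) % n]
        a = [q[k] - p[k] for k in range(3)]  # incoming edge vector
        b = [r[k] - q[k] for k in range(3)]  # outgoing edge vector
        cross = (
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0],
        )
        # the squared magnitude of the cross product is zero when p, q and r
        # lie on one line; tol is an assumed, unit-dependent threshold
        if sum(c * c for c in cross) <= tol:
            redundant += 1
    return redundant


# Hypothetical face: a 2 m square with one superfluous vertex at (1, 0, 0)
square_with_midpoint = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 2, 0), (0, 2, 0)]
```

For this ring the function returns 1: the vertex at (1, 0, 0) adds to the vertex count without adding detail. A per-building tally of such vertices is one possible way to examine the distribution suggested above, rather than a single city-wide value.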

Minimum size metrics
Two interpretations of minimum size were investigated in this study: minimum footprint area and minimum feature length. Both metrics, however, had issues when calculating a single city-wide value. Errors within the modelling process from small parcels and short edges meant that the values generated were not representative, e.g. very small (< 0.001 m) or null values. The frequency distribution for both measures was therefore investigated.
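A frequency distribution of this kind can be produced with a simple binning step. The Python sketch below is illustrative and rests on assumptions: the class boundaries (1, 10 and 100 m²) are chosen for the example and are not taken from Table 2, null areas are represented as `None` and excluded, and the function and variable names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

BIN_EDGES = [1.0, 10.0, 100.0]  # m^2, assumed class boundaries
BIN_LABELS = ["<1", "1-10", "10-100", ">=100"]


def area_distribution(areas):
    """Return the percentage of buildings falling in each footprint-area class."""
    valid = [a for a in areas if a is not None]  # drop null footprints
    if not valid:
        return {label: 0.0 for label in BIN_LABELS}
    # bisect_right places a value equal to a boundary in the upper class
    counts = Counter(BIN_LABELS[bisect_right(BIN_EDGES, a)] for a in valid)
    return {label: 100.0 * counts[label] / len(valid) for label in BIN_LABELS}


# Hypothetical footprint areas in m^2, including one sliver and one null value
sample_areas = [0.004, 5.0, 50.0, 150.0, 250.0, None]
```

With these sample values, `area_distribution(sample_areas)` assigns 20% of buildings to each of the three smaller classes and 40% to the largest. A non-zero share in the smallest class is the kind of signal that would prompt manual inspection for slivers.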
Minimum footprint area was proposed to provide an indication of the smallest 2D area at which a dataset was modelled. Table 2 shows the frequency distribution of 2D footprint area for all six datasets. The metric works well for clean datasets. For example, 99.82% of buildings in the Toronto dataset were above 100 m² in footprint area. A user could therefore identify and define the minimum size modelled to be 100 m². Upon manual inspection of the remaining 208 buildings that were less than 100 m², only two buildings (both under 0.005 m² in area) represented digitisation or modelling errors. Of the six datasets, Rotterdam registered the largest proportion of 1 to 10 m² buildings (24.24%). These were composed of small buildings or shed-like subsidiary structures, each attributed with its own unique parent building identifier (Figure 2). These may also be features shared between two separate buildings, e.g. a shared entrance. Further inspection of the 44,078 features in the Rotterdam dataset with footprints of less than 10 m² in area showed that 929 (2.1%) shared a common boundary with two or more buildings with an area of 10 m² or larger. The ambiguity in assigning building parent identifiers for shared features in this instance has rendered the metric less effective. Additional work is required to identify and quantify whether these features are standalone, small, subsidiary structures or misattributed components of a larger building. There is also a need for future standards to explicitly define the consistent handling of shared building parts and the subsequent assignment of parent building identifiers. This is a practical demonstration of the requirement for semantic checks on the relationship between buildings and building parts as described by Wagner et al. (2013b). In summary, the metric is useful in revealing the minimum dimensions of a modelled feature only if the data is relatively clean and consistent; assessing the frequency distribution, however, makes it useful for identifying inconsistencies and errors derived from the modelling process.
Table 2 shows the frequency distribution of minimum feature length across the six datasets. In this study, the metric is defined as the length of the shortest 3D edge of each building. With the exception of Berlin and Frankfurt, over 70% of the buildings in the other datasets contained at least one edge of between 0 and 1 m in length. Further inspection of the frequency distribution between 0 and 1 m shows that Adelaide (51.9%) and Rotterdam (62.4%) have a very large number of very short edges of up to 20 cm in length. The sources of these short edges include: the method of building reconstruction (e.g. manual, automatic or semi-automatic); the choice of modelling tools, e.g. CAD modelling; detail derived from the 2D footprint; straight-line representation of curved surfaces; or erroneous short edges (e.g. from collinear points). Regardless of the source, the presence of these short edges vastly increases the computational load required to store, visualise and analyse these datasets. An absolute count of short edges (defined as any edge with a length of less than 0.5 m) was conducted for every building (Table 2). It can be seen that both Adelaide (82%) and Rotterdam (68%) have a large proportion of buildings with at least one short edge. Toronto (18%) and Berlin (20%) contain the fewest buildings with at least one short edge. Breaking this down further shows that 70.3% of buildings in Adelaide possess 11 or more short edges and almost half (46.91%) of buildings in Rotterdam have between two and ten short edges. It must be noted, however, that the sources of these short edges are different. For Adelaide, the short edges are derived from the choice of modelling tools, as the dataset was created from Autodesk 3ds Max models (Figure 3). For Rotterdam, the high frequency of short edges was due to a large number of buildings with curves represented as multiple short, straight segments (Figure 4). This is an inherent inadequacy of BRep models. For example, although CityGML contains an abstract class for curves, features composed of multiple curves are not recommended within the modelling guidelines (SIG3D, 2014). Similarly, Oracle Spatial is able to store MULTICURVE, but this was not used in order not to distort or alter the original geometry prior to analysis. It is therefore important to consider the local architectural style when choosing the 3D representation used to model the area.
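The short-edge count described above can be sketched as follows. This Python sketch is illustrative and is not the implementation used in the study. The 0.5 m threshold comes from the text; the ring layout (lists of (x, y, z) tuples without a repeated closing vertex), the function names and the four count classes are assumptions, the classes being chosen to match the groupings discussed above (none, one, two to ten, 11 or more).

```python
import math

SHORT_EDGE_M = 0.5  # threshold stated in the text


def short_edge_count(faces, threshold=SHORT_EDGE_M):
    """Count the unique edges of one building that are shorter than `threshold`."""
    seen = set()
    count = 0
    for ring in faces:
        for i, v in enumerate(ring):
            w = ring[(i + 1) % len(ring)]
            key = frozenset((v, w))  # undirected edge
            if key in seen:
                continue
            seen.add(key)
            # zero-length edges (duplicate vertices) also count as short here
            if math.dist(v, w) < threshold:
                count += 1
    return count


def short_edge_class(count):
    """Assign a building to an assumed reporting class by its short-edge count."""
    if count == 0:
        return "0"
    if count == 1:
        return "1"
    if count <= 10:
        return "2-10"
    return "11+"


# Hypothetical footprint with a small 0.2 m x 0.3 m notch at one corner
notched = [(0, 0, 0), (5, 0, 0), (5, 0.2, 0), (5.3, 0.2, 0), (5.3, 4, 0), (0, 4, 0)]
```

For the notched ring, `short_edge_count([notched])` is 2, placing the building in the "2-10" class. The count alone cannot separate edges from curve approximation from edges caused by modelling artefacts, which is why the source of the short edges still has to be inspected.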
The poor attribution of building parts to buildings and the artefacts of the creation process that exist within the datasets biased the metric, making it difficult to use and interpret. It does, however, allow users to identify models which may be overly complex due to a proliferation of short edges within their representation. These redundant short edges render the datasets larger than necessary without providing additional detail, and therefore less efficient to use. Further work is required to clarify and define minimum feature length as an indication of the lowest level of modelling, as it is not possible to calculate the metric retrospectively. Alternative definitions of feature length, such as the diagonal of the minimum bounding rectangle, could be considered and require further investigation.

Applying the metrics in practice
The metrics can provide potential users with an indication of the complexity and usefulness of the dataset to an extent, but they cannot be viewed in isolation. A certain level of expert and local knowledge, if available, is therefore required on the part of the user to interpret the metrics. The approach taken in modelling the buildings also strongly affects the interpretation of these metrics.
Where one data producer may model aggregated buildings as one building, another may model them as multiple, individual buildings. The metrics may be useful for comparing multiple 3D datasets of the same area, but cannot be easily compared with other areas due to architectural variation. Beyond the users, these metrics could allow data producers to compare geometric qualities between different iterations of the same 3D dataset.

Recommendations and future work
Several recommendations arise from this study. Firstly, the incorporation of simple geometry measures within metadata could provide better contextual information for potential users when carrying out fitness-for-purpose evaluations. Secondly, exploration of existing 3D city models shows that in practice there is a need for clearer and less ambiguous 3D specifications and for detailed clarification of exception cases such as shared building parts. Thirdly, there is a need to consider the impact of the choice of modelling tools on visual satisfaction and the performance of a model. There is also a need to quantify and communicate whether a 3D model is better suited for visualisation or analysis purposes.
Further work is required in identifying and quantifying the different sources of 3D error utilising a larger sample set. The metrics investigated in this study focused on BRep models. Additional work on the potential application of the metrics to different forms of data such as voxels and point clouds is required. Other geometry measures could also be investigated, such as minimum height, minimum bounding volume, the ratio between roof and ground vertices, and assessing surface normal vectors. Investigating the spatial variation of geometric complexity and of the other metrics may also be of use. Exploring existing algorithms and 3D data quality measures from other fields such as geometry processing could avoid duplication of effort. Testing the usability of the metrics with real users could also be of benefit.

CONCLUSION
This study provides an alternative and automated approach to describing the variation of complexity within 3D city models in the absence of ground-truth data. It demonstrates that a wealth of information can be derived and extracted solely from the geometry, providing additional information on the 3D city models relevant to users as part of a process of fitness-for-purpose evaluation. These metrics, although they cannot be used in isolation, may provide a complement to enhance existing data descriptors if backed up with local knowledge, where possible. Further work is required on quantifying sources of 3D error and on continued improvement of data quality assessment methods.

Figure 2. Shared features such as overhangs found in Rotterdam, each with its own unique parent identifier

Figure 4. Curved surfaces represented by multiple short straight segments in Rotterdam

Table 1. Summary of the 3D datasets and metrics