AUTOMATED CLASSIFICATION OF HERITAGE BUILDINGS FOR AS-BUILT BIM USING MACHINE LEARNING TECHNIQUES

Semantically rich three-dimensional models such as Building Information Models (BIMs) are increasingly used in digital heritage. They provide the required information to various stakeholders during the different stages of a historic building's life cycle, which is crucial in the conservation process. The creation of as-built BIM models is based on point cloud data. However, manually interpreting this data is labour-intensive and often leads to misinterpretations. By automatically classifying the point cloud, the information can be processed more efficiently. A key aspect in this automated scan-to-BIM process is the classification of building objects. In this research, we aim to automatically recognise elements in existing buildings to create compact semantic information models. Our algorithm efficiently extracts the main structural components, such as floors, ceilings, roofs, walls and beams, despite the presence of significant clutter and occlusions. More specifically, Support Vector Machines (SVMs) are proposed for the classification. The algorithm is evaluated using real data from a variety of existing buildings. The results show that the classifier recognises the objects with both high precision and high recall. As a result, entire data sets are reliably labelled at once. The approach enables experts to better document and process heritage assets.


INTRODUCTION
Data management is becoming increasingly important in the heritage industry. Stakeholders are adopting intelligent data models such as Building Information Models (BIM) to better control their data. These intelligent databases centralise the immense amount of information about a building at the various stages of the conservation process. By combining metric and non-metric information in a specific data structure, experts can better manage, analyse and diagnose the asset (Giudice and Osello, 2013). Furthermore, the centralised approach mitigates the problem of data heterogeneity.
Whereas BIM was initially adopted for new structures, the industry and the heritage sector now seek to implement the technology for existing buildings for purposes such as heritage documentation, maintenance and quality control (Volk et al., 2014, Pauwels et al., 2013, Simeone et al., 2014). These models reflect the building in its current state (as-built condition). Such a model serves as a spatial database in which all the available documentation of the structure can be stored and analysed by the stakeholders. In contrast to traditional as-design models, these as-built BIM models are based on existing documentation. This is problematic, as the information on heritage buildings is often sparse, outdated or non-existent.
The production of as-built BIM models involves acquiring the geometry of the structure and reconstructing the BIM model from point cloud data (Garagnani and Manferdini, 2013). Currently, this process is manual, which makes it labour-intensive and prone to misinterpretation. In order to create BIMs more efficiently, we aim to automate this process. A key step in the automated workflow is the identification of structural elements such as floors, ceilings, roofs, walls and beams (Bassier et al., 2016). This field of research is typically referred to as semantic labelling.
In this work, we propose an automated method for the classification of structural elements to aid the production of as-built BIM models. More specifically, we aim to identify structural objects in existing buildings (Fig. 2). These objects typically have varying characteristics, are surrounded by clutter and are partially occluded (Tang et al., 2010, Xiong and Huber, 2010). The scope of this research is the processing of a wide variety of buildings in varying conditions. Additionally, our method is applicable to any point cloud data.
The remainder of this work is structured as follows. The background is presented in Section 2. In Section 3, the related work is discussed. In Section 4, the methodology is presented. The test design and experimental results are presented in Section 5. Finally, the conclusions are presented in Section 6.

BACKGROUND
Laser scanning and photogrammetry are becoming increasingly widespread in the recording of cultural heritage sites (Boehler and Marbs, 2004). Where previously only sparse, selected measurements were available, current systems are able to measure or compute hundreds of thousands of points per second, grouped in point clouds. A popular system for building surveying is the Terrestrial Laser Scanner (TLS). This static scanning device is placed on a surveying tripod at multiple locations and captures up to a million points per second.
The point cloud data serves as a basis for the reconstruction of the BIM geometry. The user manually identifies the elements and fits primitives to the point cloud. The fitted objects are predefined in libraries or created by the user (Quattrini et al., 2015, Baik et al., 2014). For instance, Dore and Murphy presented a heritage-specific component library named Heritage Building Information Modeling (HBIM) (Dore et al., 2015, Murphy et al., 2013). As the manual effort is extensive, automated approaches have been proposed for the BIM creation of existing buildings. In general, this Scan-to-BIM process consists of the following steps.
First, the points are clustered into groups using statistical procedures (Anand et al., 2012). The grouped points are replaced by primitives for computational efficiency. Typically, planar primitives are used as a basis for further reconstruction. Second, the candidate data is partitioned into different classes. Reasoning frameworks exploiting geometric and contextual information are employed to identify the objects of interest. Finally, the labelled data set is used as a basis for class-specific reconstruction algorithms. In this research, an automated approach is proposed for the semantic labelling of the objects using machine learning techniques.
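As an illustration of the primitive step, the following minimal sketch (Python, assuming NumPy) replaces one point cluster by a least-squares plane, represented by its centroid and unit normal. The function names `fit_plane` and `rms_residual` are hypothetical, and the sketch is not the fitting procedure of any particular software.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through an (n, 3) point cluster.

    Returns (centroid, unit_normal). The normal is the right singular vector
    of the centred points with the smallest singular value, i.e. the
    direction of least variance.
    """
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)
    normal = vt[-1]
    # SVD leaves the sign arbitrary: make the dominant component positive.
    if normal[np.argmax(np.abs(normal))] < 0:
        normal = -normal
    return centroid, normal

def rms_residual(points, centroid, normal):
    """Root-mean-square point-to-plane distance, a simple flatness check."""
    d = (np.asarray(points, dtype=float) - centroid) @ normal
    return float(np.sqrt(np.mean(d ** 2)))
```

A low residual indicates that the cluster is well approximated by a planar primitive; clusters with a high residual would need to be split further.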
Semantic labelling is a classification task that identifies the class of new observations given a set of values known as features (Bishop, 2006, Alpaydin, 2010). These values encode distinct local and contextual information about the observation. Features are either intuitively defined or extracted from training data sets. Examples of local geometric features are the surface area, dimensions and orientation. Geometric contextual features may describe the similarity, coplanarity and orthogonality between surfaces. The feature values of each primitive are grouped in a feature vector.
The feature vectors are processed by a reasoning model to identify the class of the primitives. These functions are referred to as classifiers. Both heuristics and machine learning algorithms have been proposed for classification. Heuristic models are based on user-defined rules in a hierarchy. These rules require no training of model parameters, as they are set intuitively. While very efficient, heuristics are inherently biased and case-specific. Alternatively, more complex models are employed, such as Discriminant Analysis (DA), Decision Trees, Ensemble Classifiers, Support Vector Machines (SVM), Neural Networks, Random Fields, etc. (Bishop, 2006, Alpaydin, 2010, Sutton and McCallum, 2011, Koller and Friedman, 2009, Criminisi and Shotton, 2013, Domke, 2013). These algorithms train their model parameters on training data. While the non-linear functions and probabilistic nature of some of these models allow for a more accurate approximation of reality, their computational cost can be high. Also, these models are typically black-box solutions that leave little room for user control. Additionally, extensive training data is required for these methods to work adequately.
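To make the contrast concrete, a heuristic classifier can be sketched as a short hierarchy of hand-set rules. The `Surface` fields, the rule order and all thresholds below are hypothetical illustrations, not rules taken from the cited works. Nothing is trained, which is why such rules are efficient but case-specific.

```python
import math
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Surface:
    area: float                          # surface area in m^2
    normal: Tuple[float, float, float]   # normal pointing towards the room/scanner (assumed)
    nothing_above: bool                  # True if no surface lies directly above

# Hypothetical, hand-set thresholds: no training is involved.
MIN_STRUCTURAL_AREA = 0.4   # m^2
ANGLE_TOL_DEG = 10.0

def classify_heuristic(s: Surface) -> str:
    nx, ny, nz = s.normal
    nz /= math.sqrt(nx * nx + ny * ny + nz * nz)   # vertical component of the unit normal
    if s.area < MIN_STRUCTURAL_AREA:
        return "clutter"
    if abs(nz) >= math.cos(math.radians(ANGLE_TOL_DEG)):   # horizontal surface
        return "floor" if nz > 0 else "ceiling"
    if abs(nz) <= math.sin(math.radians(ANGLE_TOL_DEG)):   # vertical surface
        return "wall"
    return "roof" if s.nothing_above else "clutter"        # sloped surface
```

Every threshold encodes a user assumption; a building that violates one (for example a flat roof, which this rule set labels as a floor or ceiling) is misclassified, illustrating the bias mentioned above.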

RELATED WORK
Classification is a major field of research in the area of pattern recognition (Bishop, 2006). It is considered an instance of supervised learning, with applications such as text processing, image identification and insurance. Several researchers have reported promising results for the labelling of built environments using user-defined rules (Bassier et al., 2016, Pu and Vosselman, 2009, Lin et al., 2013). However, to classify a wider variety of buildings, more complex models have been proposed. For instance, probabilistic graphical models are considered (Koller and Friedman, 2009, Sutton and McCallum, 2011). By connecting several nodes into a graph, probabilistic reasoning allows the likelihood of the node labels to be maximised. Markov Random Fields (MRF) and Conditional Random Fields (CRF) were proposed for the classification of indoor scenes (Koppula et al., 2013, Anand et al., 2012, Gerke and Xiao, 2014, Niemeyer et al., 2014). Similar approaches were used for close-range terrestrial classification (Lim and Suter, 2009). Ensemble classifiers have also been considered for the identification of building elements (Xiong et al., 2013). The input data differs between approaches. Some researchers directly segment the point cloud (Weinmann et al., 2015), while others prefer to work with primitives such as planar meshes.
Although working directly on the point clouds can be more accurate, it also introduces a higher computational cost and uncertainty into the process.In our work, we use primitives as a basis because of the computational advantages.
Prior knowledge has also been considered for the semantic labelling of buildings. Existing plans or models significantly reduce the search area for inliers and aid scene understanding. Several methods have been proposed for model matching between the as-built and as-designed conditions (Yue et al., 2007, Bosche and Haas, 2008). Our approach does not rely on prior knowledge, as it is not always available and we aim for a general approach that can process any building.

METHODOLOGY
In this paper, a reasoning framework is presented that identifies the structural elements in buildings. An automated feature extraction algorithm combined with an SVM classifier is proposed for the classification of the surfaces. The algorithm takes any set of planar meshes as input and outputs the classified objects. The algorithm and the extracted features are discussed in the following paragraphs.

Preprocessing
The scope of this work is the processing of any point cloud data of a building. Therefore, the input data of the algorithm is independent of specific sensors or algorithms. Only the metric information is processed, as it is the basis of any point cloud. During the preprocessing step, the data is segmented and replaced by primitives. These are well-understood problems, and commercial software is available. In this research, the Pointfuse engine of Arithmetica is used (Arithmetica, 2015). After the points are loaded into the software, planar meshes are fitted incrementally through each point cluster with similar normals. A 1 cm sampling resolution is used for the primitive fitting. As a result, planar triangular meshes are created.
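The internal algorithm of the Pointfuse engine is not described here, so the following is only a generic stand-in for grouping points with similar normals: a minimal region-growing sketch. It assumes NumPy, precomputed unit normals, and hypothetical values for the neighbourhood radius and angle threshold; the brute-force neighbour search is only suitable for small examples, and a real pipeline would use a spatial index.

```python
from collections import deque
import numpy as np

def region_grow(points, normals, radius=0.05, max_angle_deg=10.0):
    """Group points into clusters of similar normals (hypothetical parameters).

    points, normals: (n, 3) arrays; normals are assumed to be unit length.
    Returns an integer cluster label per point.
    """
    pts = np.asarray(points, dtype=float)
    nrm = np.asarray(normals, dtype=float)
    cos_tol = np.cos(np.radians(max_angle_deg))
    labels = np.full(len(pts), -1, dtype=int)
    next_label = 0
    for seed in range(len(pts)):
        if labels[seed] != -1:
            continue                      # already part of a cluster
        labels[seed] = next_label
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            # brute-force radius search over the still-unlabelled points
            near = np.flatnonzero((labels == -1) &
                                  (np.linalg.norm(pts - pts[i], axis=1) <= radius))
            # abs(): normals pointing in opposite directions still count as similar
            similar = near[np.abs(nrm[near] @ nrm[i]) >= cos_tol]
            labels[similar] = next_label
            queue.extend(similar.tolist())
        next_label += 1
    return labels
```

Each resulting cluster could then be replaced by a planar primitive; the triangulation into meshes performed by the commercial software is not reproduced here.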

Feature Extraction
The various features are computed from the planar meshes. They encode information that allows the algorithm to classify the surfaces even in cluttered environments. Therefore, the features should be both distinctive and robust. Both local and contextual information is exploited. The former encodes the object's geometric information, while the latter encodes both associative and non-associative information in relation to other surfaces. In our approach, the contextual information is computed for nearby surfaces and for the data set as a whole. By integrating global information, general patterns are better detected. Table 1 summarises the different types and the number of features used in the experiments.
The local features are robust descriptors for determining the general type of the object. The Surface Area is a good indicator for separating clutter from structural elements. The Orientation of the surface indicates whether the surface belongs to a vertical or a horizontal class. For instance, a large horizontal surface is more likely to be a floor, ceiling or roof. The Dimensions give more information about the shape of the object.
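The three local features can be sketched for a single planar triangular mesh as follows (Python, assuming NumPy). The function name, the tilt convention (0° for horizontal and 90° for vertical surfaces) and the use of principal in-plane axes for the Dimensions are assumptions for illustration; the exact definitions of the predictors in Table 1 may differ.

```python
import numpy as np

def local_features(vertices, faces):
    """Surface Area, Orientation and Dimensions of one planar triangular mesh.

    vertices: (m, 3) coordinates in metres; faces: (t, 3) vertex indices.
    """
    v = np.asarray(vertices, dtype=float)
    f = np.asarray(faces, dtype=int)
    a, b, c = v[f[:, 0]], v[f[:, 1]], v[f[:, 2]]
    cross = np.cross(b - a, c - a)                    # per-triangle normal * 2 * area
    area = 0.5 * np.linalg.norm(cross, axis=1).sum()
    # Area-weighted mesh normal; assumes consistently oriented triangles.
    normal = cross.sum(axis=0)
    normal /= np.linalg.norm(normal)
    # Orientation as tilt: 0 deg = horizontal surface, 90 deg = vertical surface.
    tilt_deg = float(np.degrees(np.arccos(np.clip(abs(normal[2]), 0.0, 1.0))))
    # Dimensions: extents along the two principal in-plane axes of the vertices.
    centred = v - v.mean(axis=0)
    _, _, vt = np.linalg.svd(centred)
    proj = centred @ vt[:2].T
    length, width = sorted(proj.max(axis=0) - proj.min(axis=0), reverse=True)
    return {"area": float(area), "tilt_deg": tilt_deg,
            "length": float(length), "width": float(width)}
```

For a 2 m × 1 m horizontal rectangle made of two triangles, this yields an area of 2 m², a tilt of 0° and dimensions of 2 m by 1 m.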
The contextual features are more refined descriptors for recognising the specific class of the object. The context differs between the feature descriptors. Both the immediate neighbourhood of the observed surface and the entire data set are exploited. Additionally, several features only use surfaces with a specific orientation or size as context. These surfaces act as a reference for the spatial analysis of the observed surfaces. For instance, the Normal Similarity feature $d_{ij} = \mathbf{n}_i \cdot \mathbf{n}_{large}$ compares the normal vector $\mathbf{n}_i$ of the surface with the normals $\mathbf{n}_{large}$ of the nearby large surfaces. Coplanarity is defined as
$$d_{ij} = \left|(\mathbf{c}_i - \mathbf{c}_j) \cdot \mathbf{n}_i\right| \qquad (1)$$
where $\mathbf{c}_i$ and $\mathbf{c}_j$ are the centroids of the observed and the neighbouring surface, so that $d_{ij}$ is the distance between the centroids along the normal of the observed surface. Only neighbouring surfaces whose normals deviate from $\mathbf{n}_i$ by less than a threshold angle are considered; a default angle of 30° is specified as the threshold. Parallelity is defined as the distance along the normal to the closest parallel surface. For this feature, the reference surfaces are conditioned to be directly in front of or behind the observed surface. These contextual descriptors also help to detect wall geometry detailing such as niches.
The Topology is encoded by the percentage of occurrence of a certain relation. Five relations are evaluated: (1) the percentage of the area of the observed surface that is located directly underneath a large horizontal surface within a threshold, (2) the percentage of the area of the observed surface that is located directly above a large horizontal surface within a threshold, (3) the percentage of the area of the observed surface that has nothing above it, (4) the percentage of the area of the observed surface that is in between two large vertical surfaces, and (5) the presence of noise surfaces within line of sight directly above the observed surface. Finally, all computed feature values are bundled into feature vectors and passed to the classifier model for further processing.
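A minimal sketch of the Normal Similarity, Coplanarity (Eq. 1) and Parallelity features is given below, assuming NumPy and surfaces represented by a centroid and a normal. Several details are assumptions rather than the authors' definitions: how the per-neighbour values are aggregated (maximum or minimum), the use of |n_i · n_j| so that opposite-facing normals count as parallel, and the hypothetical `lateral_tol` and `min_gap` parameters used to approximate "directly in front of or behind" the observed surface.

```python
import numpy as np

def _unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def normal_similarity(n_i, large_normals):
    """Largest |n_i . n_large| over the nearby large surfaces (aggregation assumed)."""
    n_i = _unit(n_i)
    return max(abs(float(n_i @ _unit(n))) for n in large_normals)

def coplanarity(c_i, n_i, neighbours, max_angle_deg=30.0):
    """Eq. 1, d_ij = |(c_i - c_j) . n_i|, minimised over neighbours (c_j, n_j)
    whose normal deviates at most max_angle_deg from n_i."""
    c_i, n_i = np.asarray(c_i, dtype=float), _unit(n_i)
    cos_tol = np.cos(np.radians(max_angle_deg))
    dists = [abs(float((c_i - np.asarray(c_j, dtype=float)) @ n_i))
             for c_j, n_j in neighbours
             if abs(float(n_i @ _unit(n_j))) >= cos_tol]
    return min(dists) if dists else float("inf")

def parallelity(c_i, n_i, others, max_angle_deg=10.0, lateral_tol=1.0, min_gap=0.01):
    """Distance along n_i to the closest parallel surface directly in front of or behind.

    'Directly in front/behind' is approximated by a lateral centroid offset <= lateral_tol;
    offsets along n_i below min_gap are treated as coplanar and skipped (both hypothetical).
    """
    c_i, n_i = np.asarray(c_i, dtype=float), _unit(n_i)
    cos_tol = np.cos(np.radians(max_angle_deg))
    best = float("inf")
    for c_j, n_j in others:
        if abs(float(n_i @ _unit(n_j))) < cos_tol:
            continue                                   # not parallel
        offset = np.asarray(c_j, dtype=float) - c_i
        along = float(offset @ n_i)                    # distance along the normal
        lateral = float(np.linalg.norm(offset - along * n_i))
        if lateral <= lateral_tol and abs(along) > min_gap:
            best = min(best, abs(along))
    return best
```

For the inner face of a 0.3 m thick wall, for example, `parallelity` returns the distance to the opposite face, which is one way such a feature can separate walls from thin clutter.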

Model formulation
Given the feature vectors $\{\mathbf{x}_1, \dots, \mathbf{x}_n\}$ of the observed surfaces, one of $k$ labels is predicted for each surface. In this research, Support Vector Machines are proposed to classify the observed surfaces (Bishop, 2006, Brownlee, 2015). These non-probabilistic functions separate the feature space into two parts by defining a hyperplane. The geometrical distance of $\mathbf{x}_i$ to the hyperplane is given by
$$\frac{\left|\mathbf{w}^T\phi(\mathbf{x}_i) + b\right|}{\|\mathbf{w}\|} \qquad (2)$$
where $\phi(\mathbf{x}_i)$ denotes a fixed feature-space transformation (implied by the kernel function), $b$ the bias and $\mathbf{w}$ the weight vector. In our research, we employ a quadratic kernel function, as it is better suited to the complexity of the structural element parameters.
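The role of the feature-space transformation in Eq. 2 can be illustrated with a small NumPy sketch. It assumes the common polynomial form of the quadratic kernel, k(x, z) = (x · z + 1)², for which an explicit map φ exists. The function names are hypothetical, and in practice an SVM evaluates the kernel directly instead of forming φ.

```python
from itertools import combinations
import numpy as np

def quadratic_kernel(x, z):
    """Polynomial kernel of degree 2 (assumed form): (x . z + 1)^2."""
    return (np.dot(x, z) + 1.0) ** 2

def quadratic_feature_map(x):
    """Explicit phi(x) such that phi(x) . phi(z) == quadratic_kernel(x, z)."""
    x = np.asarray(x, dtype=float)
    pairs = [np.sqrt(2.0) * x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return np.concatenate([x ** 2, pairs, np.sqrt(2.0) * x, [1.0]])

def distance_to_hyperplane(x, w, b):
    """Eq. 2: |w^T phi(x) + b| / ||w||, measured in the transformed feature space."""
    w = np.asarray(w, dtype=float)
    return abs(w @ quadratic_feature_map(x) + b) / np.linalg.norm(w)
```

The map contains all squared and pairwise products of the features, so a hyperplane in φ-space corresponds to a quadratic decision boundary in the original feature space.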
The decision function is computed by maximising the margin, i.e. the distance from the hyperplane to the closest feature vectors $\mathbf{x}_i$. This is achieved by optimising the function for $\mathbf{w}$ and $b$.
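To satisfy the separation criteria, the solution is constrained by $y_i(\mathbf{w}^T\phi(\mathbf{x}_i) + b) - 1 \geq 0$ for every training vector, where $y_i \in \{-1, +1\}$ is the label of $\mathbf{x}_i$. Under this constraint, the distance from the hyperplane to the closest feature vectors equals $1/\|\mathbf{w}\|$, so the maximum margin is obtained by solving
$$\arg\min_{\mathbf{w},\, b}\ \frac{1}{2}\|\mathbf{w}\|^2 \qquad (3)$$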
The problem is solved by incorporating the constraints into the optimisation using Lagrange multipliers (Bishop, 2006). New surfaces are classified by evaluating the sign of the decision function $\mathbf{w}^T\phi(\mathbf{x}) + b$, i.e. on which side of the hyperplane the surface feature vector lies. As Support Vector Machines are fundamentally two-class classifiers, we employ multiple SVMs in a one-versus-one configuration: $k(k-1)/2$ different two-class functions are computed on all possible pairs of classes. New surfaces are labelled according to the number of votes of the combined SVMs. This approach is efficient as long as the number of labels is fairly low.
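The one-versus-one voting scheme can be sketched independently of how each pairwise SVM is trained. Below, `binary[(a, b)]` is assumed to be any function returning a signed score (positive for class a). The toy prototype classifiers in `make_prototype_classifiers` are hypothetical stand-ins for trained SVMs, used only to exercise the voting logic. With the six classes used in this work, k(k − 1)/2 = 15 pairwise classifiers are needed.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple

Pair = Tuple[str, str]
BinaryClassifier = Callable[[float], float]

def one_vs_one_predict(x, classes: Sequence[str],
                       binary: Dict[Pair, BinaryClassifier]) -> str:
    """Label x by majority vote over all k(k-1)/2 pairwise classifiers.

    binary[(a, b)](x) > 0 is a vote for class a, otherwise for class b.
    Ties are broken by the order of `classes` (an arbitrary, deterministic choice).
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[a if binary[(a, b)](x) > 0 else b] += 1
    return max(classes, key=lambda c: votes[c])

def make_prototype_classifiers(prototypes: Dict[str, float]) -> Dict[Pair, BinaryClassifier]:
    """Hypothetical toy pairwise classifiers on a 1-D feature: vote for the nearer prototype."""
    return {
        (a, b): (lambda x, a=a, b=b: abs(x - prototypes[b]) - abs(x - prototypes[a]))
        for a, b in combinations(prototypes, 2)
    }

classes = ["floor", "ceiling", "roof", "wall", "beam", "clutter"]
toy = make_prototype_classifiers({c: float(i) for i, c in enumerate(classes)})
print(len(toy))                                 # 15 pairwise classifiers
print(one_vs_one_predict(3.2, classes, toy))    # wall
```

Because the number of pairwise classifiers grows quadratically with k, this configuration stays cheap only for a small label set such as the one used here.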

Learning
The maximum-margin hyperplane is optimized using a large set of known data points. A downside of Support Vector Machines is their tendency to overfit the hyperplane to the training data. Therefore, a regularization parameter ($\lambda$) is introduced that penalizes large parameters (Criminisi and Shotton, 2013). This keeps the model from relying too heavily on individual data points. During training, cross-validation is employed to enhance the model performance.
The data is partitioned into K folds. Each fold is withheld in turn, while the other folds are used for training. The final optimized maximum-margin hyperplane is given by the averaged model parameters.
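The training procedure can be sketched as follows, under clear simplifications. A linear soft-margin SVM trained by full-batch subgradient descent stands in for the quadratic SVM, and λ weights the penalty λ/2·‖w‖². The folds are formed by withholding one structure at a time, as in the experiments. All function names and default values (`lam`, `lr`, `epochs`) are hypothetical. Averaging the fold parameters is only straightforward here because this stand-in has an explicit weight vector.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Soft-margin linear SVM, labels y in {-1, +1}.

    Minimises lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b)))
    by full-batch subgradient descent (a simple stand-in, not a QP solver).
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1                  # points inside the margin
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / len(y)
        grad_b = -y[active].sum() / len(y)
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

def leave_one_group_out(groups):
    """Yield (train_idx, test_idx), withholding one structure (group) per fold."""
    groups = np.asarray(groups)
    for g in np.unique(groups):
        yield np.flatnonzero(groups != g), np.flatnonzero(groups == g)

def cross_validate(X, y, groups, **train_kw):
    """Train one model per fold; return averaged (w, b) and held-out accuracies."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    ws, bs, accuracies = [], [], []
    for train_idx, test_idx in leave_one_group_out(groups):
        w, b = train_linear_svm(X[train_idx], y[train_idx], **train_kw)
        pred = np.where(X[test_idx] @ w + b >= 0, 1.0, -1.0)
        accuracies.append(float(np.mean(pred == y[test_idx])))
        ws.append(w)
        bs.append(b)
    return np.mean(ws, axis=0), float(np.mean(bs)), accuracies
```

A larger λ shrinks w more strongly and widens the margin at the cost of more margin violations. Many SVM libraries express the same trade-off through a box constraint C, which plays an inverse role to λ.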

User interface
The feature extraction and prediction algorithms are implemented in Grasshopper, a plug-in for Rhinoceros. This intuitive procedural programming platform allows for flexible data processing and evaluation. Additionally, the classified surfaces are exported to the Rhinoceros model space for validation and further processing.

EXPERIMENTS
The algorithm was trained and tested on a variety of existing buildings, including houses, offices, industrial buildings and churches. Ten structures representing different types of heritage buildings were selected for the evaluation (Fig. 1). The sites were acquired with terrestrial laser scanning, using between 20 and 120 scans per site. The buildings were measured under realistic conditions and are highly occluded and cluttered. After registration, Arithmetica's Pointfuse (Arithmetica, 2015) was employed for the planar mesh extraction. Only surfaces larger than 0.4 m² were retained for the identification of the elements, as they are more likely to belong to a structural object. Over 7000 surfaces were manually labelled as ground truth.

Training
The model was trained with the 17 predictors from Table 1. The available classes were floors, ceilings, roofs, walls, beams and clutter. The k-fold cross-validation was performed by sequentially using nine structures for training and the remaining structure for testing. The quadratic SVM was trained in under 40 s.

Classification SVM
The performance of the model is shown in Fig. 2. The average accuracy of the model is 81%. The average recall and precision are 80% and 82%, respectively. Overall, this is a good result given the large variety of buildings that were evaluated. Typical objects such as floors, ceilings and walls were extracted with over 85% accuracy. This shows that, although buildings contain many unique objects and are heavily cluttered, their structural elements are reliably detected. This is all the more notable given the amount of clutter: over 28% of the scenes consists of clutter surfaces larger than 0.4 m². Together with the small surfaces, over 90% of the environment consists of non-structural objects.
Fig. 2 shows that the beam and roof classes slightly underperform. This is due to the limited amount of available data. With only a small percentage of the surfaces available for training, the parameter estimation of these classes is error-prone. Additionally, beams have less distinctive features. They are often occluded by ceilings, have varying directions and are hard to approximate by planar surfaces.
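For reference, the recall and precision figures reported above follow from a confusion matrix such as those in Fig. 2. Recall divides the correct predictions of a class by the number of surfaces that truly belong to it (row sum); precision divides them by the number of surfaces predicted as that class (column sum). The sketch below assumes NumPy and rows as true classes. The toy counts are invented for illustration and are not results from this paper.

```python
import numpy as np

def confusion_metrics(conf):
    """conf[i, j]: number of surfaces of true class i predicted as class j.

    Returns per-class recall, per-class precision and overall accuracy.
    A class with an empty row or column yields nan (division by zero).
    """
    conf = np.asarray(conf, dtype=float)
    correct = np.diag(conf)
    with np.errstate(divide="ignore", invalid="ignore"):
        recall = correct / conf.sum(axis=1)
        precision = correct / conf.sum(axis=0)
    accuracy = correct.sum() / conf.sum()
    return recall, precision, accuracy

# Toy two-class example (invented counts, not results from this paper)
recall, precision, accuracy = confusion_metrics([[8, 2], [1, 9]])
print(recall, precision, accuracy)
print(recall.mean(), precision.mean())   # macro-averaged recall and precision
```

Because rare classes such as beams and roofs contribute few rows to the matrix, a handful of misclassified surfaces noticeably lowers their recall, which matches the behaviour discussed above.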

Discussion
The labelling of structural elements in heritage buildings can be performed with heuristics or machine learning techniques.Both are capable of detecting typical objects in built environments.
Commonly, classifiers such as SVMs outperform heuristics in complex environments, as they are not limited by user assumptions. However, the amount of training data these models require to perform adequately is problematic for certain classes. Elements such as roofs and beams represent only a small fraction of the data, and thus the model provides a poor solution for these classes. In these cases, hard constraints from heuristics in combination with machine learning may provide a solution.
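One possible way to combine the two, sketched here purely as an assumption and not as a method evaluated in this paper, is to let hand-written rules veto classes before taking the classifier's highest-scoring label. The class list mirrors the six classes used in the experiments; `horizontal_rule` and its 10° tolerance are hypothetical.

```python
import numpy as np

CLASSES = ["floor", "ceiling", "roof", "wall", "beam", "clutter"]

def constrained_label(scores, allowed):
    """Pick the highest-scoring class among those the rules allow.

    scores  : classifier scores per class (e.g. one-versus-one vote counts)
    allowed : boolean mask per class produced by hand-written rules
    """
    scores = np.asarray(scores, dtype=float)
    allowed = np.asarray(allowed, dtype=bool)
    if not allowed.any():
        return CLASSES[int(np.argmax(scores))]   # no admissible class: fall back to the classifier
    masked = np.where(allowed, scores, -np.inf)
    return CLASSES[int(np.argmax(masked))]

def horizontal_rule(tilt_deg, tol=10.0):
    """Hypothetical hard constraint: near-horizontal surfaces cannot be walls."""
    allowed = np.ones(len(CLASSES), dtype=bool)
    if tilt_deg <= tol:
        allowed[CLASSES.index("wall")] = False
    return allowed
```

In such a scheme, the rules only remove implausible labels, while the ranking among the remaining classes is still learned from data.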

CONCLUSION
In this paper, a classification algorithm is presented for the automated identification of structural elements in heritage buildings. Using Support Vector Machines, floors, ceilings, roofs, beams, walls and clutter are reliably labelled, even in highly cluttered and occluded environments. The experiments show that the classification is accurate (81% on average) for a wide variety of buildings, including regular houses, castles and churches. Furthermore, the integration of the algorithm in a flexible environment allows for intuitive validation by the user.
The identification of building objects is crucial in the process of digitising heritage. More specifically, the proposed approach can be integrated into the automated creation of semantically rich three-dimensional models such as BIMs. In future work, the method will be investigated further to improve the labelling performance for infrequently occurring classes. Additionally, research will be performed to detect a wider variety of structures and objects.

Figure 1: Test cases used for the model evaluation. The test data includes houses, offices, factories, a castle and a church.
Proximity captures the recurrence of certain object configurations. It is defined as the minimum distance $d_{min,ij} = |\mathbf{r}_i - \mathbf{r}_j|$ between the boundary of the observed surface $\mathbf{r}_i$ and a set of reference surfaces $\mathbf{r}_j$. The following values are evaluated: (1) the vertical distance to the closest large horizontal surface above, (2) the vertical distance to the closest large horizontal surface underneath, (3) the distance to the closest large vertical surface, and (4) the number of connected surfaces. The Topology feature encodes the location of the observed surface in relation to its context. These are good descriptors to differentiate floors from ceilings and roofs and to detect wall geometry detailing such as niches.
ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume IV-2/W2, 2017 26th International CIPA Symposium 2017, 28 August-01 September 2017, Ottawa, Canada. This contribution has been peer-reviewed. The double-blind peer-review was conducted on the basis of the full paper. doi:10.5194/isprs-annals-IV-2-W2-25-2017 | © Authors 2017. CC BY 4.0 License.

Figure 2: Confusion matrices of the multiclass SVM training: recall performance (left) and precision performance (right). The percentages under the true classes represent the data distribution of the surfaces.

Table 1: Types of predictor features
A very promising group of classifiers is Support Vector Machines. These fundamentally two-class algorithms are non-probabilistic classifiers that use hyperplanes to separate the data into classes. SVMs have been successfully implemented for the classification of both indoor and outdoor point cloud data (Adan and Huber, 2011, Yang and Dong, 2013). In our research, we employ Support Vector Machines in combination with extensive feature vectors, as they are very efficient and accurate.