ROBUST CLASSIFICATION AND SEGMENTATION OF PLANAR AND LINEAR FEATURES FOR CONSTRUCTION SITE PROGRESS MONITORING AND STRUCTURAL DIMENSION COMPLIANCE CONTROL

The application of terrestrial laser scanners (TLSs) on construction sites for automating construction progress monitoring and controlling structural dimension compliance is growing markedly. However, current research in construction management relies on the planned building information model (BIM) to assign the accumulated point clouds to their corresponding structural elements, which may not be reliable in cases where the dimensions of the as-built structure differ from those of the planned model and/or the planned model is not available with sufficient detail. In addition, outliers exist in construction site datasets due to data artefacts caused by moving objects, occlusions and dust. In order to overcome the aforementioned limitations, a novel method for robust classification and segmentation of planar and linear features is proposed to reduce the effects of outliers present in the LiDAR data collected from construction sites. First, coplanar and collinear points are classified through a robust principal components analysis procedure. The classified points are then grouped using a robust clustering method. A method is also proposed to robustly extract the points belonging to the flat-slab floors and/or ceilings without performing the aforementioned stages in order to preserve computational efficiency. The applicability of the proposed method is investigated in two scenarios, namely, a laboratory with 30 million points and an actual construction site with over 150 million points. The results obtained from the two experiments validate the suitability of the proposed method for robust segmentation of planar and linear features in contaminated datasets, such as those collected from construction sites.


INTRODUCTION
Construction project progress monitoring and deviation control are essential to allow decision makers to identify discrepancies between the planned and the as-built states of a project in order to take timely measures where required (Maalek and Sadeghpour, 2012). In practice, monitoring is performed manually, which is a time-consuming, error-prone and labour-intensive task, particularly on large-scale projects (Golparvar-Fard et al. 2009). To reduce the time and cost associated with such manual approaches, only a limited amount (and/or frequency) of onsite data is collected, which diminishes the ability of the project manager to identify the causes of delays and cost overruns in time.
In addition, the reliable determination of project performance is highly dependent on the accuracy of the data collected during the monitoring process (Saadat and Cretin, 2002). Currently, site supervisory personnel spend 30-50% of their time manually inspecting and controlling the quality of the manually accumulated onsite data (Golparvar-Fard et al. 2009). Reduction of this time by means of a novel approach to onsite data collection and analysis suggests that more time can be allocated towards improving vital construction related concerns such as safety, as well as workforce productivity and communications. In order to help overcome the aforementioned limitations of current manual practices, automating the monitoring and control processes on construction sites has been proposed in recent years.

State-of-the-Art in Construction Management
In current practices, the time of completion of an activity is recorded in order to measure the potential deviations between the planned and the actual states of the project (Cox et al. 2003; Golparvar-Fard et al. 2015). However, this metric does not provide sufficient information to determine: i) the compliance of the dimensions of the as-built structures to those of the planned; and ii) the schedule delays throughout the progression of an activity (Maalek et al. 2014). In order to help address these limitations, the "scope of work performed" should be determined by means of a remote sensing technology (Maalek et al. 2014). Terrestrial laser scanners (TLS) are widely used to measure the 3D coordinates of the structural elements.
Current research in construction management is devoted to the automatic extraction of the "scope of the work performed" for each structural element from the accumulated TLS point clouds. However, most object-based recognition models use the planned 4D model as a priori knowledge to assign the collected 3D point clouds to their corresponding structural elements (Golparvar-Fard et al. 2009; Bosché et al. 2015). This approach may not be reliable in cases where the location of the as-built structure differs from that of the planned (Shahi et al. 2013) or the issued-for-construction (IFC) plan with sufficient detail is not readily available.
In order to reduce this dependency on the planned model, it is proposed to generate the 3D/4D as-built model using only the geometric primitives of the accumulated points. Since the most generic building elements as well as most man-made objects are constructed from the intersection of planar (columns, beams) and linear (reinforcement bar) features (Nunnally, 2010; Vosselman et al. 2004), the classification and segmentation of planar and linear features are the major focus of this study.

Point Cloud Classification and Segmentation
As mentioned, the automatic detection of planar surfaces from TLS point clouds is the initial step to identify the most important structural elements. In order to extract features from point clouds, the initial step is devoted to labelling and grouping of the point clouds with similar physical attributes, also known as the classification and segmentation processes respectively (Rabbani et al. 2006).

PCA-based Point Cloud Classification:
There are two commonly-used methods to classify point clouds into planar surfaces, namely, the 3D Hough transform and principal components analysis (PCA). Vosselman et al. (2004) use the 3D Hough transform to define every point in space with a plane in the parameter space, which allows the determination of planar surfaces without the estimation of the normal vectors. However, the use of the Hough transform for planar classification is computationally expensive and the results are highly affected by outliers (Lari, 2014). Therefore, special consideration is given to the use of PCA for the classification of point clouds.
PCA is the eigenvalue decomposition of the covariance matrix of a multivariate data set. It is used to summarize the variation of the data set along independent (orthogonal) axes (Johnson and Wichern, 2007). In the case of a three-dimensional point cloud, three orthogonal axes can be determined. Many researchers have used PCA for the classification of planar surfaces (Tovari and Pfeifer, 2005; Rottensteiner et al., 2005; Rabbani et al. 2006; Pu and Vosselman, 2006; Belton and Lichti, 2006; Filin and Pfeifer, 2006; Kim et al. 2007; Bremer et al. 2013; Lari, 2014). First, a neighbourhood is defined for each point in the cloud. The PCA is then performed on the pre-defined neighbourhood of each point. For coplanar points, the variation of a noise-free dataset in the direction of the surface normal is equal to zero. If the pattern of the neighbourhood of the desired point forms a planar surface, the point is classified as planar.
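The eigenvalue test described above can be sketched as follows. This is a minimal illustration of classical (non-robust) PCA on a single neighbourhood, not the implementation used in this study. The function names and the tolerance `tol` are assumptions introduced for the example, and the eigenvalues of the 3x3 covariance matrix are obtained with the standard closed-form trigonometric solution for symmetric matrices.

```python
import math

def covariance3(points):
    """Sample covariance matrix (3x3 nested lists) of a list of (x, y, z) points."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    cov = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[i] - mean[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                cov[i][j] += d[i] * d[j] / (n - 1)
    return cov

def eigenvalues_sym3(a):
    """Eigenvalues of a symmetric 3x3 matrix in descending order (closed form)."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    if p2 == 0.0:
        return [q, q, q]  # matrix is a multiple of the identity
    p = math.sqrt(p2 / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)]
         for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # clamp against rounding error
    phi = math.acos(r) / 3.0
    e1 = q + 2.0 * p * math.cos(phi)
    e3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    e2 = 3.0 * q - e1 - e3
    return [e1, e2, e3]

def classify_neighbourhood(points, tol=1e-3):
    """Label a neighbourhood 'linear', 'planar' or 'other'.

    tol is a hypothetical threshold on the normalized eigenvalues.
    """
    l1, l2, l3 = eigenvalues_sym3(covariance3(points))
    total = l1 + l2 + l3
    if total <= 0.0:
        return "other"
    if l2 / total < tol:   # all variation along one axis
        return "linear"
    if l3 / total < tol:   # no variation along the surface normal
        return "planar"
    return "other"

# Noise-free synthetic neighbourhoods (illustrative data only).
plane = [(float(x), float(y), 0.0) for x in range(5) for y in range(5)]
line = [(float(t), 2.0 * t, 3.0 * t) for t in range(10)]
print(classify_neighbourhood(plane), classify_neighbourhood(line))
```

With real, noisy data the smallest eigenvalue is never exactly zero, so the tolerance would have to be tuned to the instrument noise.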
Currently, there are methods available to classify points to planar/linear surfaces for datasets with no data contamination (i.e. no outliers). However, when a dataset is affected by outliers¹, classification using the classical PCA method is highly affected by the outlying points (Serneels and Verdonck, 2008; Hubert et al. 2012). In order to improve the classification results for contaminated data sets, Nurunnabi et al. (2012a, b) proposed the use of robust PCA, which incorporates a robust estimate of the covariance matrix called the fast minimum covariance determinant (Fast-MCD) proposed by Rousseeuw and Driessen (1999). Their proposed robust PCA method for planar classification and segmentation showed significant improvement in contaminated data sets. Their comparison to the random sample consensus (RANSAC) method indicated that the robust PCA is able to detect more outliers (Nurunnabi et al., 2013, 2014). In order to determine the most efficient robust covariance matrix estimate, a review of the current state of robust dispersion (covariance) estimates is given in the following sub-section.
¹ Which is the case on construction sites.

Robust Dispersion Estimates:
Robust statistics are methods of estimating models of contaminated data by reducing the effect of the outliers (Maronna et al. 2006). The breakdown value is the measure of robustness of an estimator with respect to the outlying observations (Hampel, 1971). It indicates the smallest fraction of contaminants in a sample that causes the estimator to break down (i.e. to take on values that are arbitrarily meaningless). An estimate with a breakdown point of 50% is ideal since it is able to detect the pattern of the majority of the uncontaminated data with up to 50% data contamination. There are currently two well-known multivariate dispersion estimates with high breakdown values (i.e. 50%), namely, the minimum volume ellipsoid (MVE) and the MCD.
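The notion of a breakdown value can be illustrated with a small univariate analogue. The sketch below uses invented numbers and compares the sample mean (which breaks down with a single gross outlier) against the median (breakdown value of 50%). It is only meant to convey the idea and does not involve the multivariate dispersion estimators discussed here.

```python
import statistics

# Nine clean measurements around 10.0 (illustrative values).
clean = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0]

# Replace four of the nine values (about 44%) with gross outliers.
contaminated = clean[:5] + [1000.0] * 4

mean_shift = abs(statistics.mean(contaminated) - statistics.mean(clean))
median_shift = abs(statistics.median(contaminated) - statistics.median(clean))

print(f"shift of the mean:   {mean_shift:.2f}")
print(f"shift of the median: {median_shift:.2f}")
```

The mean is dragged far away from the clean value, whereas the median stays within the range of the uncontaminated observations as long as fewer than half of the values are replaced.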
The MVE is the smallest ellipsoid that covers a subset of h data points out of a set of n observations. The (n-h) points left are the outliers of the dataset. The MCD is concerned with selecting h points out of n for which the covariance matrix has the lowest determinant. The MCD has the same breakdown point as the MVE except that it is asymptotically normal (Butler et al. 1993) and has a higher convergence rate (Davies, 1992). In the study conducted by Jensen et al. (2007), it was concluded that the MCD is more suitable for larger sample sizes with a large percentage of data contamination. Therefore, an estimator of the MCD is preferred for the processing of point clouds in highly occluded areas such as a construction site.
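The MCD criterion can be made concrete in one dimension, where the determinant of the covariance matrix reduces to the variance and the optimal h-subset is known to be a contiguous run of the sorted values. The sketch below performs this exhaustive search. It is a univariate illustration only, with a hypothetical function name and invented elevation values, and it is neither the Fast-MCD nor the Det-MCD algorithm.

```python
import statistics

def univariate_mcd(values, h):
    """Return (mean, variance, subset) of the h sorted-contiguous values
    with the smallest variance (exact MCD in one dimension)."""
    xs = sorted(values)
    best = None
    for start in range(len(xs) - h + 1):
        window = xs[start:start + h]
        var = statistics.pvariance(window)
        if best is None or var < best[1]:
            best = (statistics.mean(window), var, window)
    return best

# Seven elevations near 2.70 m plus three gross outliers (illustrative).
data = [2.70, 2.71, 2.69, 2.70, 2.72, 2.70, 2.70, 5.0, 6.0, 7.5]
h = (len(data) + 1 + 1) // 2  # h = floor((n + p + 1) / 2) with p = 1
centre, variance, subset = univariate_mcd(data, h)
print(centre, subset)
```

The n - h values left outside the selected subset are the candidates for outliers, in the same spirit as the (n-h) points mentioned above.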
There are currently two well-known MCD estimators, namely, the Fast-MCD (Rousseeuw and Driessen, 1999) and the deterministic MCD (Det-MCD; Hubert et al. 2012). In contrast to the Fast-MCD, the Det-MCD is permutation invariant (i.e. the outcome of the estimator is not a function of the order of the observations). This is of great importance since the reordering of the point cloud samples then does not affect the subset selected for the robust covariance estimation. In addition, the computation time of the Det-MCD is much lower than that of the Fast-MCD (Hubert et al. 2012). Therefore, in this study, the Det-MCD proposed by Hubert et al. (2012) is used to improve the classification of point clouds.

Point Cloud Segmentation:
Two methods are generally used to segment the classified planar/linear point clouds, namely, region growing and clustering. Region growing methods are widely implemented (Tovari and Pfeifer, 2005; Rottensteiner et al., 2005; Rabbani et al. 2006; Pu and Vosselman, 2006; Belton and Lichti, 2006; Belton, 2008; Nurunnabi et al. 2012a, 2012b, 2014) due to their computational efficiency. However, since the result of the segmentation is a function of the selected seed point/region (i.e. not permutation invariant), it is not considered as a robust method (Wang and Shan, 2009). Therefore, particular interest is given to segmentation procedures using cluster analysis.
In cluster analysis, an n-dimensional array of attributes is first defined. The points sharing similar attributes are then segmented into the same cluster. In the research carried out by Song and Feng (2008) and Shi et al. (2011), the k-means clustering algorithm was used to group point clouds with similar attributes. However, a k-means clustering approach requires a priori knowledge of the number of clusters and hence is not suitable for applications in which this number is unknown. In the work of Filin and Pfeifer (2006), clustering of the point clouds was carried out by seeking the mode of the histogram of the frequency of the attributes. However, the correct identification of the mode may be challenging in multivariate attribute cases (Haralick and Shapiro, 1992). In the work of Lari and Habib (2014), a two-step segmentation method is proposed. First, a region growing method is used to identify planar patches. These planar patches are then grouped/clustered in order to complete the segmentation. However, the choice of threshold used to cluster the attributes is currently subjective, which may result in over- or under-segmentation depending on the dataset.
As explained, the attributes in this study are robustly estimated during the classification process. Therefore, compact clusters are expected to be formed. In the research carried out by Bayne et al. (1980), Golden and Meehl (1980), Hartigan (1985), and Everitt et al. (2011), the complete linkage method was shown to be efficient for identifying compact clusters. This method does not require a priori knowledge about the number of clusters. In addition, it is not highly affected by outliers. However, it can break large clusters (Steinbach et al. 2003), resulting in over-segmentation. Here, an iterative robust complete linkage algorithm is proposed to reduce over-segmentation.

OBJECTIVE AND METHODOLOGY
The overall goal of this research is to automatically summarize acquired point clouds of construction sites into a set of vertices (i.e. automatic generation of the as-built model) using only the geometric primitives. To that end, a novel method is proposed to robustly segment coplanar and collinear points as a means of extracting the most common structural elements (beams, columns, slabs and reinforcement bars). Initially, the points are classified into planes and lines through a robust PCA, which uses the Det-MCD proposed by Hubert et al. (2012) to robustly estimate the covariance matrix. The coplanar and collinear points with similar attributes are then grouped together using a novel clustering approach. The modified convex-hull algorithm is used to detect the boundaries of each segment. The closest segments are then intersected in order to generate the 3D as-built model. A detailed explanation of the aforementioned stages is given in the following sub-sections.

Robust Planar and Linear Classification
In order to classify point clouds into planes and lines, a neighbourhood is defined around each point. The 50 mm neighbourhood size is chosen based on the dimensions of the smallest structural elements that are required to be extracted². Robust PCA is performed to determine the pattern of the variation within each neighbourhood. For coplanar points, the variation of the data in the direction of the surface normal is zero. For collinear points, all of the variation is summarized in one direction. This is illustrated in Figure 1.
² In the work of Belton and Lichti (2006) and Weinmann et al. (2014), efforts were made to optimize the neighbourhood size while performing the classical PCA. As will be shown in the following, the robust PCA is able to detect the outliers present within the predefined neighbourhood, which reduces the dependency of the classification results on the initially defined neighbourhood size.
In order to illustrate the benefits of using robust PCA over classical PCA, in particular for the identification of mixed pixels, a point cloud comprising four adjacent planes scanned from a single instrument location was simulated. Random errors were added to the data using the specifications of the Leica HDS6100 TLS³, the instrument used to collect real data for this research. Mixed pixel artefacts were added between two of the planes using the following equation:
(1)
where X1, X2, S1, S2 and SM are shown in Figure 2a. The simulated point clouds are shown in Figure 2b.
Figure 3a represents the percentage of misclassified mixed pixels with respect to the threshold used for the percentage of variance explained by the largest eigenvalue (the neighbourhood size was fixed at 100). It can be seen that the planar classification results using the robust PCA include fewer type II errors than those using the classical PCA.
Figure 3b shows the relative percentage of improvement in the number of misclassified mixed pixels with respect to the neighbourhood size (the threshold of the maximum normalized eigenvalue was fixed to 55%). It can be inferred that the percentage of improvement in the misclassified points within the planar classification is more evident as the neighbourhood size increases. The results shown in Figure 3 indicate that the proposed robust PCA is less dependent on the thresholds used (i.e. more robust) and the choice of initial neighbourhood size.

Robust Planar Segmentation
From the robust PCA, points belonging to planar and linear features are identified. For each planar point, the four planar attributes, the robustly estimated surface normal vector and location (robust mean of the neighbourhood), are used to cluster points with similar attributes. As expressed in Section 2.2.3, the complete linkage algorithm is used to cluster coplanar points. According to the complete linkage algorithm, initially, a cluster is assigned to each point. The two clusters (say U and V) with the most similarity are merged together to form cluster UV. The distance between the similarity attribute of cluster (UV) and any cluster W is then calculated as follows:
$d_{(UV)W} = \max\{d_{UW}, d_{VW}\}$ (2)
where $d_{UW}$ and $d_{VW}$ are the distances between clusters U and W and between clusters V and W, respectively. The cluster with the minimum distance to cluster UV, say cluster W, is merged into UV, and the process is continued for cluster UVW. The grouping is finalized when the distance measured by Equation (2) is greater than a predefined threshold. The process is then repeated for the remaining clusters. However, the choice of the similarity threshold is subjective, which reduces the robustness of the method⁴. In order to reduce the dependence of the segmentation on the specific value of the threshold, a new iterative process is proposed (Figure 4). Initially, the complete linkage algorithm is performed on the robustly-estimated plane parameters to group the coplanar points with similar attributes. The threshold is chosen so as to prevent under-segmentation⁵. For each cluster, the plane parameters are then estimated from the eigenvalue decomposition of the covariance matrix robustly estimated by DetMCD. The complete linkage algorithm is then carried out for the new plane parameters. The process is continued until the number of clusters remains constant.
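A minimal sketch of threshold-stopped complete linkage clustering is given below. It implements the standard agglomerative form, in which the globally closest pair of clusters is merged at each step and the cluster-to-cluster distance is the largest pairwise distance between members. The function name, the attribute tuples and the threshold value are assumptions made for illustration. Using a plain Euclidean distance on (normal, offset) attributes is also a simplification, and the iterative re-estimation with DetMCD described above is not included.

```python
import math

def complete_linkage(items, threshold):
    """Agglomerative clustering of attribute tuples.

    The distance between two clusters is the largest distance between
    any pair of their members; merging stops once the closest pair of
    clusters is farther apart than the threshold.
    Returns a list of clusters, each a list of indices into items.
    """
    clusters = [[i] for i in range(len(items))]

    def cluster_distance(a, b):
        return max(math.dist(items[i], items[j]) for i in a for j in b)

    while len(clusters) > 1:
        pairs = [(cluster_distance(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        d, i, j = min(pairs)
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical plane attributes (nx, ny, nz, offset) for five points:
# three on a horizontal surface, two on a vertical wall.
attributes = [
    (0.00, 0.00, 1.00, 2.70),
    (0.01, 0.00, 1.00, 2.71),
    (0.00, 0.01, 1.00, 2.69),
    (1.00, 0.00, 0.00, 5.00),
    (1.00, 0.01, 0.00, 5.02),
]
clusters = complete_linkage(attributes, threshold=0.1)
print(clusters)
```

This brute-force version recomputes all pairwise cluster distances at every step, so it is only suitable for small examples; the number of clusters is an output of the procedure rather than an input.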
For the identified clusters, a robust complete linkage is implemented to help reduce the dependency on the initial threshold (i.e. minimize over-segmentation). First, the closest clusters are identified, say clusters I and J with sizes NI ≤ NJ. A random set of observations from cluster I is added to cluster J (no more than 25% of NJ)⁶. For the newly developed cluster, the DetMCD is performed to identify the outliers. The two clusters are merged if and only if fewer than half of the determined outliers are from cluster I. The process continues until no more clusters can be added to cluster IJ. The process is then repeated for the remaining clusters. In order to improve the computational efficiency, clusters whose attributes are farther apart than a certain threshold are not examined.

Robust Extraction of Flat Slab Floor and Ceiling
A new method is proposed to identify and extract the points on planar slab floors and ceilings, using only the histogram of point elevation, before the proposed robust PCA is performed. This is particularly beneficial in reducing the calculation time of the proposed segmentation procedure. A similar idea was introduced by Arastounia and Lichti (2013) to reduce the points on the ground in an electrical substation dataset. Here, a robust floor and ceiling extraction method is proposed to minimize the dependency on the thresholds used.
The typical histogram of point elevation for a room or a construction site with flat slab ceiling and floor is schematically shown in Figure 5. As illustrated, the histogram of elevation consists of two major peaks, representing the points of the floor and the ceiling. To determine the location of these two modes, the median-shift algorithm proposed by Shapira et al. (2009) is used. The two modes are regarded as points Pf and Pc in Figure 5. In order to robustly identify the points on the ceiling and the floor using the identified modes (peaks), first, all points within a predefined radius (r), here 5 cm, from the modes Pf and Pc are identified. The Det-MCD algorithm is then applied on the specified points in order to identify the floor and ceiling.
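The two-stage idea (locate the two elevation modes, then robustly keep the points around each mode) can be sketched as follows. Two simplifications are made and should be read as assumptions: the modes are taken as the most populated histogram bins instead of using the median-shift algorithm, and the robust refinement uses the median and the scaled median absolute deviation (MAD) as a univariate stand-in for Det-MCD. Function names, bin width, minimum mode separation and the cut-off factor k are hypothetical.

```python
import statistics
from collections import Counter

def two_modes(z_values, bin_width=0.05, min_separation=1.0):
    """Elevations of the two most populated histogram bins that are at
    least min_separation apart, returned as (lower, upper)."""
    counts = Counter(round(z / bin_width) for z in z_values)
    ranked = [b * bin_width for b, _ in counts.most_common()]
    first = ranked[0]
    second = next(z for z in ranked if abs(z - first) >= min_separation)
    return tuple(sorted((first, second)))

def robust_slab(z_values, mode, r=0.05, k=3.0):
    """Keep elevations within r of the mode, then discard those farther
    than k scaled MADs from the median of the retained values."""
    near = [z for z in z_values if abs(z - mode) <= r]
    med = statistics.median(near)
    mad = 1.4826 * statistics.median(abs(z - med) for z in near)
    if mad == 0.0:
        return near
    return [z for z in near if abs(z - med) <= k * mad]

# Synthetic elevations in metres (illustrative only): floor near 0.0,
# ceiling near 2.7, tables near 0.9, plus three artefacts near the floor.
floor = [0.001 * ((i % 11) - 5) for i in range(200)]
ceiling = [2.7 + 0.001 * ((i % 11) - 5) for i in range(150)]
tables = [0.9 + 0.001 * ((i % 11) - 5) for i in range(40)]
artefacts = [0.04, 0.045, -0.04]
z = floor + ceiling + tables + artefacts

pf, pc = two_modes(z)
floor_points = robust_slab(z, pf)
ceiling_points = robust_slab(z, pc)
print(pf, pc, len(floor_points), len(ceiling_points))
```

In this toy example the three artefacts fall inside the 5 cm radius around the floor mode but are rejected by the robust step, which is the behaviour the second stage is intended to provide.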

Linear Segmentation
Every line in space can be uniquely defined by the intersection of two non-parallel planes. This concept is used to segment collinear points. After performing the robust PCA, each linearly classified point is defined by the robust directional vector and the robust location (mean of the neighbourhood). The cross product of the directional vector and the location vector results in a normal vector of a plane that passes through the line of interest and the origin. Initially, this metric is used within the complete linkage method to segment points with similar normal vectors. For each planar segment, the origin is then moved to an arbitrary location outside of the plane. The normal vector for each point in the cluster is again estimated using the robust directional vector and the new location⁷. The complete linkage algorithm is again performed to determine the final segments.
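The geometry behind the two-step procedure can be checked with a short sketch. The function below (a hypothetical name, with an assumed sign convention so that n and -n are treated as the same attribute) computes the unit normal of the plane containing a line and a chosen origin. Points on the same line always yield the same normal, but so do points on a different, parallel line lying in the same plane through the origin, which is why a second pass with a relocated origin is needed.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def line_plane_normal(direction, location, origin=(0.0, 0.0, 0.0)):
    """Unit normal of the plane that contains the line (direction,
    location) and the given origin. Undefined if the line passes
    through the origin (the cross product is then the zero vector)."""
    rel = tuple(l - o for l, o in zip(location, origin))
    n = cross(direction, rel)
    norm = math.sqrt(sum(c * c for c in n))
    n = tuple(c / norm for c in n)
    # Sign convention: make the first non-zero component positive.
    lead = next(c for c in n if abs(c) > 1e-12)
    return n if lead > 0 else tuple(-c for c in n)

d = (0.0, 0.0, 1.0)  # two vertical bars (illustrative values)
same_bar_a = line_plane_normal(d, (1.0, 2.0, 0.0))
same_bar_b = line_plane_normal(d, (1.0, 2.0, 3.0))
other_bar = line_plane_normal(d, (2.0, 4.0, 0.0))
print(same_bar_a, same_bar_b, other_bar)  # all three coincide

# After moving the origin out of the shared plane, the bars separate.
new_origin = (5.0, 0.0, 0.0)
print(line_plane_normal(d, (1.0, 2.0, 0.0), new_origin),
      line_plane_normal(d, (2.0, 4.0, 0.0), new_origin))
```

How well the second pass separates lines depends on where the new origin is placed, which the sketch chooses arbitrarily.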

Boundary Detection and Robust Surface Fitting
Using the clustering methods proposed in Sections 3.2 and 3.4, spatially discontinuous surfaces with similar attributes are also grouped together. In order to enforce surface continuity, outer boundary points are determined using the modified convex hull algorithm proposed by Sampath and Shan (2007) and inner boundary points are defined using the method proposed by Lari (2014). Therefore, discontinuous surfaces are separated into different clusters.
The plane and line parameters for each identified cluster are robustly estimated using DetMCD. The closest planes and lines are then intersected to determine the vertices of the structural elements.
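As an illustration of the intersection step, a vertex can be obtained from three non-parallel planes by solving a 3x3 linear system. The sketch below uses Cramer's rule, with each plane written as n . x = d. The function names, the plane representation and the example values are assumptions made for this example, and the selection of which planes are "closest" is not addressed.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def intersect_three_planes(planes, eps=1e-9):
    """Intersection point of three planes, each given as (normal, d)
    with normal . x = d. Returns None if the normals are (nearly)
    linearly dependent, i.e. no unique vertex exists."""
    normals = [list(n) for n, _ in planes]
    rhs = [d for _, d in planes]
    det = det3(normals)
    if abs(det) < eps:
        return None
    vertex = []
    for col in range(3):
        m = [row[:] for row in normals]
        for r in range(3):
            m[r][col] = rhs[r]  # replace one column by the right-hand side
        vertex.append(det3(m) / det)
    return tuple(vertex)

# Two walls and a ceiling (illustrative values in metres).
wall_x = ((1.0, 0.0, 0.0), 4.0)
wall_y = ((0.0, 1.0, 0.0), 3.0)
ceiling = ((0.0, 0.0, 1.0), 2.7)
corner = intersect_three_planes([wall_x, wall_y, ceiling])
print(corner)
```

Two parallel walls together with the ceiling have no unique intersection, in which case the function returns None.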

EXPERIMENTS
Two sets of LiDAR data were collected using a Leica HDS6100 TLS. The first set of experiments was for the as-built modelling of a laboratory at the University of Calgary. The second set of data was collected from an actual construction site and the planar and linear features are robustly segmented.

Experiment 1: Mechanics of Materials Laboratory
The first set of data was collected from the Mechanics of Materials laboratory at the University of Calgary (Figure 6). As illustrated in Figure 6a, the laboratory consists of many metallic tables, which may result in data contamination due to multipath reflections. Therefore, it can be considered as a fair representation of an actual indoor construction site. Approximately 30 million 3D points of the interior surfaces were recorded from three different scan-stations. Figure 6b shows the plan view of the planned model. As illustrated, the lab consists of 26 different walls. The elevation of the ceiling relative to the floor is 2.7 m. The planned model suggests that the roof, floor and the surrounding walls are planar surfaces. The objective of this experiment is to robustly extract the planes representing the walls, floor and ceiling in order to control dimension compliance.

Robust Extraction of Floor and Flat Slab Ceiling:
First, the points of the flat slab floor and ceiling are extracted using the method presented in Section 3.3. The histogram of the elevation is shown in Figure 7, which complies with the hypothesis presented in Figure 5. The smaller peak, shown in blue, represents the metallic tables. The precision, recall and accuracy (Olsen and Denlen, 2008) of the extracted points are 91.5%, 100% and 92% for the floor and 92.4%, 100% and 93.4% for the ceiling respectively⁸. As illustrated, no Type II errors were detected during the planar feature extraction, which indicates the robustness of the proposed method. In addition, the extracted points accounted for approximately half of the total accumulated points, which suggests a significant reduction in the time of data classification and segmentation.
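For reference, the three quality measures follow from the counts of true positives (TP), false positives (FP, type I errors), false negatives (FN, type II errors) and true negatives (TN). The sketch below uses the standard definitions. The counts in the example are invented and are not the values behind the results reported above.

```python
def precision_recall_accuracy(tp, fp, fn, tn):
    """Standard definitions from the confusion-matrix counts."""
    precision = tp / (tp + fp)            # share of extracted points that are correct
    recall = tp / (tp + fn)               # share of true points that were extracted
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy

# Hypothetical counts: no false negatives, so recall is 100%.
precision, recall, accuracy = precision_recall_accuracy(tp=90, fp=10, fn=0, tn=100)
print(f"{precision:.1%} {recall:.1%} {accuracy:.1%}")
```

A recall of 100% corresponds to the absence of type II errors, i.e. no floor or ceiling point was missed by the extraction.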

Segmentation and As-built Model:
Using the method presented in Section 3.1, the robust PCA was performed on the remaining points. The planar parameters were then clustered using the method described in Section 3.2. The results of the segmentation are shown in Figure 8b. Approximately 94.7% of the points were segmented correctly. Figure 8c shows the as-built 3D model of the laboratory. The vertices were determined by intersecting the nearest planar clusters using the method described in Section 3.5.

Experiment 2: Graduate Student Residence Hall Construction Site
The second dataset was collected from the Graduate Student Hall of Residence construction site at the University of Calgary (Figure 9a). Approximately 150 million points were collected with the Leica HDS6100 from the four scan locations shown in Figure 9b. The building is a concrete structure with box-shaped columns. The goal was to robustly segment the planar surfaces (floor slab and column facets) and linear features (reinforcement bar) using the methods proposed in Section 3.

Robust Floor Extraction:
The points on the planar floor slab were extracted using the method proposed in Section 3.3. Figure 10 shows the histogram of point elevation of the acquired data. As illustrated, the shape of the histogram of the points on the floor complies with that proposed in Figure 5. Approximately 65 million points were removed using the proposed method, which led to a great reduction in the calculation time for the planar and linear segmentation of the remaining points. The precision, recall and accuracy rates are 90.3%, 100% and 94.6% respectively⁹.
Figure 10. Histogram of point elevation

Robust Classification:
The robust PCA proposed in Section 3.1 was performed on the remaining point cloud to identify the planar and linear features. The results of the classification are presented in Figure 11. Figure 11a represents the point cloud after the removal of the points on the floor. Figure 11b illustrates the points classified as lying on planar surfaces. As illustrated in Figure 11b, the proposed robust classification and floor extraction methods are able to correctly distinguish planar plates with a thickness of 5 cm from the points on floors. Figure 11c shows the remaining points after removing planar surfaces. The points classified as linear are shown in Figure 11d. The precision, recall and accuracy for the planar classification are 93.2%, 92.4% and 91.6% respectively. For linear classification, the precision, recall and accuracy are 91.8%, 89.6% and 92.8% respectively.
Figure 11. Robust planar and linear classification: a) after removing the floor; b) points classified as planar surfaces; c) points after removing the points classified as planes; d) points classified as linear
⁹ Approximate values since the actual points are determined manually

Robust Segmentation:
The results of the robust segmentation of the classified point cloud are shown in Figure 12. Figures 12a through 12c¹⁰ show the improvement of the planar segmentation results after each stage of the method proposed in Section 3.2. Figure 12a represents the segmentation of planar surfaces after the first iteration, in which 185 clusters were identified and over-segmentation is apparent. Figure 12b shows the planar segmentation after the last iteration (the third iteration). The number of clusters has been reduced to 132. Figure 12c illustrates the results after the robust complete linkage algorithm has been applied. The number of clusters was further reduced to 87. After this stage, approximately 95.2% of points were segmented correctly. Figure 12d shows the linear segmentation results. The point density has been reduced for clarity. The reinforcement bar on the top of the elevator shaft has also been magnified to better represent the linear segmentation results. Approximately 91.4% of the reinforcement bars were clustered correctly. For the remaining linearly classified points, about 86.9% of points were clustered correctly. It may be possible to improve the linear segmentation by means of a better choice for the location of the origins in the method proposed in Section 3.4.

CONCLUSION
The use of LiDAR for construction site progress monitoring and structural dimension compliance control is evolving markedly. However, the point clouds collected in a dynamic environment such as a construction site are expected to be contaminated with outliers. Here, a robust method for the classification and segmentation of planar and linear features in LiDAR data collected from construction sites has been introduced. The classification method uses a robust PCA to reduce the effects of outliers on the pattern of the data. It was also shown that the results of the classification are less affected by the choice of the size of neighbourhood. However, a robust optimum neighbourhood search method is required to further enhance the classification results.
¹⁰ The boundary detection has been carried out to differentiate discontinuous surfaces
A novel method for robust planar segmentation was proposed using an iterative complete linkage clustering method and the DetMCD covariance estimator. The method is particularly beneficial since its performance is not a function of a subjectively pre-defined threshold.
A robust method for the extraction of planar floors and ceilings has been developed. This method has been shown to be very efficient in extracting the points on floors and ceilings as well as in reducing the calculation time for the classification and segmentation of the remaining points.
A new two-step method for linear segmentation was also introduced. Currently, the choice of the second origin after the initial segmentation is arbitrary and subjective and hence more investigation is required to find the optimum location of the origins to improve the linear segmentation results.
The applicability of the proposed planar and linear segmentation methods has been investigated on two datasets. The results indicate promise for the robust segmentation and classification of planar and linear features in contaminated datasets.
In future studies, the applicability of the proposed methods will be examined on two construction sites located at the University of Calgary as construction progresses. The inconsistencies between the planned 4D BIM model and the automatically generated as-built model will be investigated through a novel change detection algorithm. The robust segmentation and classification of NURBS surfaces and the use of alpha-shapes in detecting the boundaries of these types of segments will also be studied.