K-MEANS CLUSTERING BASED ON OMNIVARIANCE ATTRIBUTE FOR BUILDING DETECTION FROM AIRBORNE LIDAR DATA

: Building detection is an important process in urban applications. In the last decades, 3D point clouds derived from airborne LiDAR have been widely explored. In this paper, we propose a building detection method based on K-means clustering and the omnivariance attribute derived from eigenvalues. The main contributions lie on the automatic detection without the need for training and optimal neighborhood definition for local attribute estimation. Additionally, one refinement step based on mathematical morphology (MM) operators to minimize the classification errors (commission and omission errors) is proposed. The experiments were conducted in three study areas. In general, the results indicated the potential of proposed method, presenting an average F score around 97%.


INTRODUCTION
Geographic information systems (GIS) are essential tools for several urban applications such as evaluation of damage caused by natural disasters, urban growth planning and monitoring. Today, around 55% of the world's population live in urban areas, a proportion that is expected to increase to 68% by 2050, adding 2.5 billion people to urban areas by 2050 (The United Nations, 2018). These numbers support the importance of urban planning and the use of up-to-date cartographic bases to assist decision making.
In general, buildings play an important role in monitoring urban environments, since a high percentage is covered by them. Besides, the development and growth of cities is usually related to the increase in the number of buildings. Considering these aspects, several research efforts have been conducted aiming at the development of automatic or semi-automatic building detection methods.
In the last decades, the scientific community has explored the use of remote sensing data such as aerial images, satellite images, point clouds derived from photogrammetric and LiDAR systems. According to Santos et al. (2021), airborne LiDAR data emerges as a suitable alternative, since the 3D point cloud is obtained directly from the integration of sensors, not requiring a photogrammetric matching process, and is not influenced by imaging conditions. According to Hui et al. (2021), building detection methods can be divided into two categories: machine learning and classical methods. The first category is based on learning classifiers, which require training data such as Random Forest (RF) and Convolutional Neural Network (CNN) (Protopapadakis et al., 2016;Ni et al., 2017;Maltezos et al., 2019;Zhang et al., 2020). However, the use of training data can be a limitation, since, in some cases, there is still a need to label the initial dataset, which is a time-consuming task.

___________________________ * Corresponding author
In the second category, classical methods explore geometric morphological features that allow for distinguishing buildings from other objects. Usually, these methods perform region growing using geometric attributes, and also clustering algorithms such as RANSAC, K-means and C-means (Awrangjeb and Fraser, 2014;Cai et al., 2019, Santos et al., 2019Liu et al., 2020;Hui et al., 2021). The limitation of the classical methods is related to the definition of multiples thresholds for segmentation and region growing stage, for example. Due to the point density variation, the definition of local neighborhood and attribute estimation are also challenging stages.
Despite the diversity of developed methods, there is no general approach capable of encompassing all the complexity of an urban scene and airborne LiDAR data characteristics. Thus, building detection is still a topic of interest to researchers. In this sense, we propose a building detection method based on the well-known K-means clustering and omnivariance attribute. The main contribution is automatic building detection; thus avoiding the need for training data. As complementary contributions, we have the optimal neighborhood definition and the use of a single geometric attribute to separate above ground points into building and non-building classes. Additionally, we propose a refinement based on mathematical morphology (MM) to minimize the classification errors (omission and commission).

BUILDING DETECTION METHOD
In Figure 1, we show the main steps of the proposed method, which is composed of three main stages. In the first stage, the selection of above ground points is performed considering a normalized digital surface model (nDSM) and a height threshold (TH). In the second stage, above ground points are separated into two classes: building and non-building. The classification is carried out using the omnivariance attribute and the K-means clustering. In the last stage, the classification results are submitted to a refinement process based on mathematical morphology theory. The input data is a 3D point cloud acquired by an airborne laser scanning system.

Selection of above ground points
Initially, a filtering process is executed to obtain the digital terrain model (DTM). This stage is performed using the lasground tool from LAStools software. The lasground tool has implemented the progressive TIN (Triangulated Irregular Network) densification approach proposed by Axelsson (2000). Additionally, it provides some options with parameters already defined. In this work, we selected the option "city", which considers the following threshold values: step_size = 25 m, offset = 0.05 m, spike = 0.5 m and stddev = 0.1 m.
In the next step, the nDSM is generated using the lasheight tool from LAStools by evaluating the relative height for each point in available point cloud. This height corresponds to the vertical distance between the point in question and its projection on the DTM.
The selection of above ground points is performed by comparing the height of each point (Hi) with a height threshold (TH). In the following, we show the criterion applied: Yes: p i is above the ground No: p i is not above the ground where Hi represents the height of a given point pi and TH the height threshold. In this work, we adopted TH = 2 m, whose value is based on the minimum height of building roof.

Classification using Omnivariance Attribute and K-Means Clustering
The majority of the points selected in the previous step are sampled on the building roofs and/or trees. As discussed in Chehata et al. (2008), geometric, textural and spectral attributes can be used to classify 3D point cloud. For airborne LiDAR data, local geometric attributes are usually explored, as can be seen in Weinmann et al. (2014Weinmann et al. ( , 2017 and Grilli et al. (2019). In the proposed method we explore the local omnivariance derived from eigenvalues. Additionally, the K-means clustering is adopted for separating above ground points into two classes: building and non-building.

Neighborhood Selection
In the context of 3D point cloud, the neighborhood of a given point pi can be established by using a sphere or a vertical cylinder. Additionally, there is the possibility to define the neighborhood by considering the number of neighboring points (n) near pi, where n ϵ ℕ.
In this method, we adopt the criterion based on n nearest neighbors, however, instead of using a fixed number of neighbors, we establish one criterion to automatically select the optimal neighborhood for each point. For this purpose, we consider a minimum (nmin) and maximum (nmax) value for n, as well as an increment value ∆n. The minimum value is nmin = 10, in accordance to Demantkle et al. (2012) and Weinmann et al. (2014). For maximum value, we select a relatively high number of n, i. e., nmax = 50. As for the increment, it is adopted ∆n = 5.
The variable Ɲi,n represents the neighborhood of a given point pi and its n nearest neighbors. In total, for each point, we estimate nine possible neighborhoods (Ɲi,10 Ɲi,15 Ɲi,20 . . . Ɲi,50). In contrast to Demantkle et al. (2012) and Weinmann et al. (2014), our strategy uses the omnivariance feature instead of Shannon entropy (Shannon, 1948). In the next section, the criterion for choosing the optimal neighborhood is outlined.

Omnivariance Attribute Estimation
The omnivariance is a geometric attribute/feature commonly applied to describe the local 3D structure around an interest point. This attribute was selected based on previous work results (Weinmann et al., 2017). According to West et al. (2004), the omnivariance is calculated using Equation 1: where λ1, λ2 and λ3 correspond to eigenvalues, with λ1 ≥ λ2 ≥ λ3≥ 0, estimated from the neighborhood around each point.
As can be seen in the omnivariance map (Figure 1), low values are usually related to points sampled on flat surfaces such as roof buildings. In contrast, high Oλ values correspond to rougher surfaces, as, for example, ridge lines and vegetation areas. The eigenvalues (λ1, λ2 and λ3) in Equation 1 are estimated from the 3D covariance matrix also known as 3D structure tensor, which is a symmetric positive-definite matrix (Weinmann et al., 2014). This 3D covariance matrix is obtained from the 3D coordinates of the point in question and its neighbors. corresponds to a different neighborhood (Ɲi,n). In this sense, it is relevant to select a unique value of omnivariance for the point in question. Some statistical measures can be considered such as mean, median, standard deviation, minimum and maximum, as mentioned by Santos et al. (2021). In Figure 2, we show an example with different statistical measures computed for building points. Performing a visual analysis, it is possible to observe that the ridge lines appear thinner for minimum Oλ. Then, based on this analysis, the number of n* neighbors for one generic point pi can be selected according the following equation.

Class Definition using K-means
In this stage, we perform the classification using the K-means clustering. As an attribute, we consider the minimum omnivariance estimated from Equation 2. The number of classes set for the clustering method was K = 2, which corresponds to building and non-buildings classes. The main advantages of using K-means are that no training is required and the classes are automatically defined. The centroids corresponding to each class are iteratively defined as described in Johnson and Wichern (2007).
Additionally, we use the Euclidean distance in the 1-D space as the similarity measure. As an output, the K-means provides two centroids. In our context, the centroid with lower value represents the building class, whereas the higher is the non-building class.

Refinement Based on Mathematical Morphology
The result derived from the classification usually has errors, affecting the quality of the building detection. Figure 3a shows the building points detected by the K-means clustering. In this figure, the classification errors (highlighted by arrows) are associated with holes in the ridge lines (omission errors) and points wrongly labeled as building (commission errors). To minimize the classification errors, a refinement based on the concept of mathematical morphology (MM) is adopted.
To allow the application of filtering via MM, a regular grid is generated using the building points. The size of the grid cell (Sgrid) is set equal to the average point cloud spacing (psavg). After the grid generation stage, a 3x3 median filter is applied to minimize noise effects. Then, the refinement using MM is performed, applying an opening followed by a closing filter. According to Pei et al. (1997), this sequential application of filters is known as alternating filter and was introduced by Sternberg (1986). For this purpose, we adopt a square structuring element (SE) of dimension 4x4. The size of the SE is defined empirically.
In the refinement stage, the grid is assumed to be a grayscale image. Let X denote the grid and B the structuring element. The alternating filter (AF) is defined as an opening followed by a closing: where AFB represents the alternating filter response for the SE B, and (∘) represents the opening and (•) closing operation.
To illustrate the effect of the AFB filter on real data Figure 3 is shown. In Figure 3a, we show an example of building detection with some classification errors. As can be observed, there are noise around the building (arrows in cyan) and gaps in roof buildings (arrows in yellow). The results of applying only opening and closing MM filters are shown in Figures 3b and 3c, respectively. In Figure 3d, we show the results derived from applying the AFB filtering, where it is possible to observe that both types of errors are minimized.

STUDY AREAS AND DATASETS
For the analysis of the proposed method, three study areas were selected. Two areas are located in the city of Vaihingen/Germany and the other one in Presidente Prudente/Brazil (Figure 4). The first area (Area 1 - Figure 4a) is composed of high-rise residential buildings that are surrounded by trees. The second area (Area 2 - Figure 4b) is a purely residential region with small detached houses. The third area (Area 3 - Figure 4c) has a high concentration of residential houses (left side) and large buildings with nearby trees (right side).  The airborne LiDAR data corresponding to areas 1 and 2 comes from the ISPRS Test Project on Urban Classification, 3D Building Reconstruction and Semantic Labeling, which is available for download through the ISPRS website (https://www2.isprs.org/commissions/comm2/wg4/benchmark/) . The dataset of area 3 is part of the Unesp Photogrammetric Data Set and is also available to the community (Tommaselli et al., 2018). In Table 1  To facilitate the visual analysis, we include an aerial image and clipping of the point cloud. The cyan arrows indicate the objects that were incorrectly identified. In the first case (Figure 6a, I), a non-building object was incorrectly detected as a building. In the second case (Figure 6b, II), the incorrect detection corresponds to part of the canopy.

RESULTS
In Figure 7, we show an example where the building was not detected by the proposed approach. In this figure, it is also shown the aerial image, point cloud, and one clipping containing only sampled points on the roof.
Additionally, Figures 8a and 8b show buildings partially occluded by nearby vegetation. In both figures, we display aerial and terrestrial images, as well as the point cloud. The buildings in question are highlighted by cyan arrows.
The quantitative analysis was carried out using the following quality parameters: completeness (Comp.), correctness (Corr.) and Fscore (Sokolova et al., 2006;Wiedemann et al., 1998). The reference data for areas 1 and 2 is available in the ISPRS dataset, whereas the reference for area 3 was manually generated using the CloudCompare v2.10.2 software.

Area 1 (a)
Area 2 (b) Area 3 (c) Figure 5. Building detection using the proposed method for the study areas 1 (a), 2 (b) and 3 (b). Point cloud according to height (left column), result derived from proposed method (middle column) and reference map (right column).   According to Hui et al. (2021), the quality parameters can be estimated considering different types of information (or scope): pixel-based or object-based. In this work, these parameters are estimated at the pixel-based level. In Table 2, we show the completeness, correctness, and Fscore for three study areas using the proposed method. In order to perform a comparative analysis with previous works, Table 3 is shown, where the quality parameters are reported for areas 1 and 2. In Figure 9, we show an error map for each study area, containing the spatial distribution of true positives (TP), false positives (FP), and false negatives (FN).  Table 3. Quality parameters estimated for areas 1 and 2 considering different building detection approaches.

DISCUSSION
Conducting a visual analysis of the results ( Figure 5), we can observe that most of the buildings are correctly detected. Despite a low occurrence, it is possible to note the presence of false positives and false negatives, which correspond to commission and omission errors, respectively. Analyzing the error map in Figure 9, it is noticed that area 1 presents higher occurrence of false positives, whereas area 2 presents higher occurrence of false negatives.
In Figure 6, we highlight two commission errors. In the first case, the region of interest (highlighted by the arrow in cyan) is probably the entrance or exit of an underground parking lot, as can be observed in point cloud and height profile. Visually, this object has geometric characteristics similar to a building roof. In the second case, part of the canopy was incorrectly detected. This problem may be related to the scan pattern of the Leica ALS50 laser system at the edges of the strip, where we have an oversampling. This issue is associated with the deceleration and acceleration of the oscillating mirror. Since the omnivariance is locally determined based on n* nearest neighbors (see Equation 2), the oversampling problem affects the attribute estimation. In Figure 5, a car was incorrectly detected as building, since this object typically has similar omnivariance values to buildings. To minimize this problem, a larger structuring element can be adopted, for instance.
For the omission errors highlighted in Figure 7, it is possible to verify an under-sampling problem, probably caused by the roof material. Due to this characteristic, the proposed method did not detect the building. Additionally, in Figure 8, we highlight two buildings partially occluded by nearby trees. In the first case, only one corner is partially covered by vegetation; whereas in the second the building is almost completely covered. Despite the high complexity, both buildings are detected by the proposed method ( Figure 5c).
When analyzing the completeness metric (Comp.) (Table 2), it can be observed that the results for areas 1, 2 and 3 present values around 99%, 94% and 97%, respectively. These results indicate the low rate of omission errors. Considering the analyzed areas, the lowest omission rate occurs in area 1, followed by areas 3 and 2, respectively. Considering the correctness metric (Corr.) (Table 2), we obtained values around 94%, 98% and 98% in areas 1, 2 and 3, respectively. These results also point to the low occurrence of commission errors. Additionally, it is possible to observe that areas 3 and 2 present lowest omission rates followed by area 1.
Analyzing the Fscore metric (Table 2), it is noted that the proposed method presents values around 96%, 96% and 98% in areas 1, 2 and 3, respectively. In general, this indicates the potential of the proposed method in identifying buildings, as well as in separating building class from other types of objects.
Performing a comparative analysis (Table 3), it is possible to observe that the quality parameters of the proposed method are in agreement with previously developed approaches. The proposed approach presents the best values of Fscore for both areas, as well as of completeness and correctness for areas 1 and 2, respectively. These results reinforce the potential of the proposed method.
In summary, the qualitative and quantitative analysis indicate that proposed method is robust in detecting building in an urban environment with different complexities.

CONCLUSION
This paper proposes an automatic building detection method from airborne LiDAR. The method consists of three main steps: selection of above ground points, classification, and refinement. The main contribution lies in the automatic building detection, without requiring training and automated selection of neighbors around each point based on the minimum omnivariance, followed by one refinement step using MM operators. The results indicate that the proposed method has the potential to be applied in urban environment with different complexities, presenting an average Fscore around 97%. Additionally, the proposed method presents results similar to those previously developed. As a limitation, this approach requires the conversion of LiDAR data into a regular grid, losing information about building shape.
For future research, we suggest the development of a refinement process in 3D space, avoiding the need of grid generation. Additionally, we suggest a comparative analysis in terms of processing time and application of the proposed method on photogrammetric point clouds.