Graph-theoretical approach to robust 3D normal extraction of LiDAR data

Low-dimensional primitive feature extraction from LiDAR point clouds (such as planes) forms the basis of the majority of LiDAR data processing tasks. A major challenge in LiDAR data analysis arises from the irregular nature of LiDAR data, which forces practitioners either to regularize the data using some form of gridding or to use a triangular mesh such as a triangulated irregular network (TIN). While there have been a handful of applications using LiDAR data as a connected graph, a principled treatment of graph-theoretical approaches for LiDAR data modelling is still lacking. In this paper, we try to bridge this gap by utilizing a graphical approach for normal estimation from LiDAR point clouds. We formulate the normal estimation problem in an optimization framework, where we find the corresponding normal vector for each LiDAR point by utilizing its nearest neighbors while simultaneously enforcing a graph smoothness assumption based on the point samples. This is a non-linear constrained convex optimization problem which can then be solved using projected conjugate gradient descent to yield a unique solution. As an enhancement to our optimization problem, we also provide different weighted solutions based on the dot product of the normals and the Euclidean distance between the points. In order to assess the performance of our proposed normal extraction method and weighting strategies, we first provide a detailed analysis on repeated randomly generated datasets with four different noise levels and four different tuning parameters. Finally, we benchmark our proposed method against existing state-of-the-art approaches on a large-scale synthetic plane extraction dataset. The code for the proposed approach along with the simulations and benchmarking is available at https://github.com/arpan-kusari/graph-plane-extraction-simulation.


INTRODUCTION
There has been a proliferation of laser scanners and three-dimensional (3D) cameras in domains as varied as mobile robotics (Suger et al., 2015), autonomous vehicles (Urmson et al., 2008) and advanced manufacturing (Joshi and Caputo, 2020), along with traditional domains such as land surveying. These 3D scanners measure the 3D locations of points on objects and store them as point cloud data. Analyzing point cloud data is a fundamental task in mapping and navigation applications. For example, urban and indoor environments usually comprise a large number of planar surfaces. The planar information can be collected as 3D point clouds via light detection and ranging (LiDAR) techniques. Identifying planes from 3D point clouds is a non-trivial task in the presence of multiple inlier structures and contamination of the observed data with noise (Amayo et al., 2018).
Generally speaking, there are two disparate domains which utilize 3D point clouds and perform plane identification. While photogrammetry utilizes plane fitting methods such as region growing (Xu et al., 2017), the 3D Hough transform (Duda and Hart, 1972) and RANSAC (Fischler and Bolles, 1981) to cluster points belonging to an actual planar surface, computer vision typically relies on per-point normal estimation computed from neighboring points in a local planar neighborhood (Jordan and Mordohai, 2014). In this research, we take the view that computing the per-point normal is a lower-level function whose output can then be clustered to derive the planar surfaces, as detailed in (Hoover et al., 1996).
Given that 3D point cloud data is inherently unstructured, our proposed approach is established on graphs, which provide an inherently principled way of connecting adjacent points in a global manner and of encoding relationships between them based on the given edge characteristics. Although the graphical formulation of 3D point clouds has been attempted in some previous studies (Moosmann et al., 2009, Landrieu et al., 2017), primarily for segmentation of ground and objects, these studies do not provide a rigorous formulation of normal estimation using graphical approaches. On the other hand, normal estimation from multi-model fitting has been attempted under an optimization setup (Amayo et al., 2018), which can be shown to be a specific instance of our proposed approach.
In order to develop a fast and robust extraction algorithm, we analyze point clouds through a variant of the proximity graphs, the k-nearest neighbor graph (k-NNG) (Toussaint, 1989). The plane extraction problem can then be formulated as the problem of finding the latent structure in the graph. This stems from an underlying assumption that the data samples can be aggregated on a low-dimensional manifold and that this manifold can be represented by its discrete conjugate, the graph (Rubin-Delanchy, 2020). Thus, instead of finding a handful of planar surfaces by segmentation, as done by most other approaches, we find the normal of each point in the point cloud conditioned on the graph. Inspired by analogous research in mining low-dimensional structure from high-dimensional data using robust principal component analysis (rPCA) (Shahid et al., 2016), we impose a graph smoothness constraint on the samples.
We further extend the proposed formulation to a weighted version to improve the estimation accuracy of the normals at points near the boundary between two distinct planar surfaces. In the weighted version, we estimate the normals by formulating an optimization problem with a weighted loss function and update the weights iteratively. The weighting strategies are chosen intuitively: the dot product of the estimated normals between neighboring points; the inverse of the distance between neighboring points; and the product of the dot product and the inverse distance. The weights provide a discriminative feature where preference is given to closer points with similar normal vectors.
The rest of the paper is organized as follows. In Section 2, we introduce the related work. In Section 3, the proposed method for normal estimation is elaborated. Section 4 demonstrates the effectiveness of the method using a small simulated dataset composed of points sampled from three orthogonal planar surfaces, as well as a large-scale synthetic plane estimation benchmarking dataset, SynPeb (Schaefer et al., 2019). A comparison between the proposed method and other benchmark methods is also provided. The paper ends with a conclusion in Section 5.

RELATED WORK
Normal estimation for unorganized point clouds can be broadly classified into regression-based, optimization-based, and Voronoi-based methods.

Regression- and optimization-based methods
Regression-based normal estimation was first proposed by (Hoppe et al., 1992) where the authors proposed estimating the normal of every individual point in the point cloud as the normal of the local neighborhood of k-neighbors with the assumption that the surface is smooth everywhere (i.e. there are no sudden abrupt normal changes). The normal estimation was performed by fitting a least squares plane via principal component analysis (PCA) which is analogous to minimizing the L2 norm. Given the ease of use and speed of computation of the method, it has been ubiquitously used to estimate the normals of a surface.
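The PCA-based estimator described above can be sketched in a few lines; this is a minimal illustration (the function name and the toy neighborhood are ours, not from the original work):

```python
import numpy as np

def pca_normal(neighborhood):
    # Least-squares plane fit in the style of Hoppe et al. (1992): the normal
    # is the eigenvector of the neighborhood covariance with the smallest
    # eigenvalue, i.e. the direction of least variance.
    centered = neighborhood - neighborhood.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(centered.T @ centered)  # ascending order
    return eigvecs[:, 0]

# Four points sampled from the plane z = 0; the estimate is +/- (0, 0, 1),
# the sign being inherently ambiguous.
pts = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [1., 1., 0.]])
n = pca_normal(pts)
```

Note the sign ambiguity: PCA recovers the normal only up to orientation, which is why consistent normal orientation is usually a separate post-processing step.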
However, there are two main disadvantages of this method: it is very sensitive to noise in the data, and it fails to preserve edges since it tends to average the normals of two adjoining surfaces. Various modifications have been suggested to make the surface fitting robust to noise: (Fleishman et al., 2005) replaced PCA with the Least Median of Squares (LMS), and (Lipman et al., 2007) replaced the L2 norm with the L1 norm, which has been theoretically proven to be robust to outliers. Other researchers used weighting to heighten or suppress the contribution of neighboring points, which also has the effect of removing outliers and noise; for example, (Pauly et al., 2002) used weighting to give more importance to nearer neighbors.
Modifications addressing the second disadvantage have been much less studied. (Fleishman et al., 2005) divided the neighborhood into piece-wise planar surfaces by first applying LMS, selecting points below a certain threshold in the neighborhood, and then running moving least squares (MLS). The drawback of this method was that LMS was sensitive to noise, which made the overall method unstable. Recently, (Sanchez et al., 2020) used a well-known robust m-estimator, the Geman-McClure estimator, to address the noise issue and subsequently performed two independent initializations, leading to two different solutions to counter the anisotropy around edge features. However, the m-estimator is known to be very sensitive to the initial estimate, a problem common to all nonlinear regression procedures (Myers and Myers, 1990); it is sensitive to the scale chosen; and, finally, convergence of the m-estimator is not guaranteed. On the other hand, probabilistic plane extraction (PPE) (Schaefer et al., 2019) explores point association to a finite set of planes found by agglomerative hierarchical clustering. There are two drawbacks which can affect the accuracy of the plane extraction: the clustering results vary widely based on the kind of linkage chosen, and the number of possible planes is fixed a priori.

Voronoi-based methods
An alternate approach for normal estimation is to construct some form of Delaunay triangulation (DT), or its conjugate the Voronoi diagram, for a given point cloud. This method was first proposed by (Amenta and Bern, 1999), who defined the normal vector of a point as the line through the point to the furthest vertex of the corresponding Voronoi cell, known as the polar ball. While this method yielded accurate normals in the noise-free case, in the presence of noise it broke down completely. (Dey and Sun, 2006) extended this method to deal with noise by finding large polar balls to approximate the normal vector. However, these methods are still not robust to the noise in real-life LiDAR datasets (Sanchez et al., 2020).

Problem formulation
The space of measurement data is denoted by X ⊂ R^3. Let X = {x_1, ..., x_m} denote the set of m measurement points in the LiDAR point cloud, where x_i ∈ X is the i-th point in X. Throughout the paper, we denote scalars by lowercase letters, vectors by lowercase boldface letters, and matrices and sets of vectors by uppercase boldface letters. Let n_i denote the target normal vector at point x_i. Our objective is to estimate {n_i; i = 1, ..., m} based on the point cloud data X.
According to the definition of the normal vector (Morvan and Thibert, 2004), we would like n_i to be perpendicular to the tangent plane at x_i, or equivalently, to be perpendicular to any vector on the tangent plane at x_i. Since the tangent plane is unknown, as an empirical approach we require n_i to be perpendicular to the vectors between x_i and its k nearest neighboring points. In this way, the estimation of n_i is converted to the following optimization problem:

\hat{n}_i = \arg\min_{\|n_i\| = 1} \sum_{x \in X_i} \langle n_i, x - x_i \rangle^2,    (1)

where X_i represents the set of k nearest neighboring points of x_i, \|\cdot\| represents the l2 norm, and \langle\cdot, \cdot\rangle represents the inner product.
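For a single point, the problem in Eq. (1) admits a closed-form solution: the minimizer is the smallest-eigenvalue eigenvector of the scatter matrix of the difference vectors. A sketch under that reading (the function name and toy data are illustrative):

```python
import numpy as np

def eq1_normal(x_i, neighbors):
    # Minimize sum_{x in X_i} <n, x - x_i>^2 subject to ||n|| = 1:
    # the solution is the smallest-eigenvalue eigenvector of D^T D,
    # where the rows of D are the differences x - x_i.
    D = neighbors - x_i
    eigvals, eigvecs = np.linalg.eigh(D.T @ D)  # eigenvalues in ascending order
    return eigvecs[:, 0]

x_i = np.zeros(3)
nbrs = np.array([[1., 0., 0.], [0., 1., 0.], [1., 1., 0.], [-1., 0.5, 0.]])
n_i = eq1_normal(x_i, nbrs)  # +/- (0, 0, 1) for these coplanar neighbors
```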
However, the point cloud data X is often subject to random measurement noise. In this case, the differences x - x_i, x ∈ X_i, may not approximate the vectors on the tangent plane at x_i well, and the resulting \hat{n}_i in Eq. (1) is biased. To improve the robustness of \hat{n}_i, we impose a regularization term on the optimization problem in Eq. (1). Let N be the m × 3 matrix whose row vectors are the n_i's, and let n^{(j)} denote the j-th column vector of N. The regularized optimization problem is then expressed as follows:

\min_{\|n_i\| = 1, i = 1, \ldots, m} \sum_{i=1}^{m} \sum_{x \in X_i} \langle n_i, x - x_i \rangle^2 + \lambda \sum_{j=1}^{3} (n^{(j)})^T L n^{(j)},    (2)

where L is the graph Laplacian that measures the spatial similarity among the points i = 1, ..., m, and λ is the tuning parameter. The second term on the right-hand side of Eq. (2) forces the resultant normals n_i and n_j to be close to each other when the distance between the points x_i and x_j is small, and hence guarantees the robustness of N with respect to the measurement noise in X.
We now develop the matrix form of the optimization problem in Eq. (2) and show that the target problem is a constrained convex optimization problem. Let n denote the vectorized N, that is, n = (n_1^T, ..., n_m^T)^T ∈ R^{3m}. Let R_i denote the 3 × 3m matrix that extracts the i-th normal from n; its (1, 3i-2), (2, 3i-1) and (3, 3i)-th elements are 1, while the rest are 0. It can be easily validated that

R_i n = n_i.    (3)

Similarly, let C_j be the m × 3m matrix whose (1, j), (2, 3+j), ..., (m, 3m-3+j)-th elements are 1 and whose remaining elements are 0. Under this definition, C_j projects n to the j-th column of N:

C_j n = n^{(j)}.    (4)

Letting \mathcal{L} denote the loss function, Eq. (2) yields

\mathcal{L}(n) = \sum_{i=1}^{m} \|X_{c,i} R_i n\|^2 + \lambda \sum_{j=1}^{3} (C_j n)^T L (C_j n),    (5)

where X_{c,i} is the k × 3 matrix whose j-th row is the difference between the j-th point in X_i and x_i. The loss function is thus expressed as the sum of two quadratic forms, implying that minimizing Eq. (5) subject to \|R_i n\| = 1, i = 1, ..., m, is a constrained convex optimization problem with respect to n. The optimization problem is well defined once the graph Laplacian L is specified. In the next subsection, we discuss the choice of L.
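The selector matrices R_i and C_j are sparse 0/1 matrices and can be built directly. The sketch below uses 0-based indices (the description in the text is 1-based) and hypothetical helper names:

```python
import numpy as np
from scipy import sparse

def R_matrix(i, m):
    # 3 x 3m selector with R @ n = n_i (0-based point index i).
    R = sparse.lil_matrix((3, 3 * m))
    for a in range(3):
        R[a, 3 * i + a] = 1.0
    return R.tocsr()

def C_matrix(j, m):
    # m x 3m selector with C @ n = the j-th column of N (0-based j).
    C = sparse.lil_matrix((m, 3 * m))
    for i in range(m):
        C[i, 3 * i + j] = 1.0
    return C.tocsr()

n = np.arange(12.0)            # stacked normals for m = 4 points
n_1 = R_matrix(1, 4) @ n       # the second point's normal: [3, 4, 5]
col_0 = C_matrix(0, 4) @ n     # the first column of N: [0, 3, 6, 9]
```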

Choice of graph Laplacian
The first hurdle is to choose a graph Laplacian to adequately represent these points. Since the 1980s, there has been a lot of work done in characterizing the distance in irregular point sets (planar and high-dimensional) which uses the notion of proximity graphs. Proximity graphs are geometric graphs in which any two vertices p, q are connected via an edge epq if and only if there exists a certain exclusion region where no other points are located (Mitchell and Mulzer, 2017). In this research, we utilize the most basic form of proximity graph, k-nearest neighbor graph (k-NNG) to represent the sampled points. k-NNG is a graph where any two vertices p, q are connected if the distance between p and q is among the k-th smallest distances for either p or q.
For a given graph G, an adjacency matrix can be constructed which encodes the weights of the edges. As previously mentioned, we utilize graph regularization with a graph G encoding the connections to neighboring data samples. We therefore construct the adjacency matrix A of the k-NNG, with A_{ij} = 1 if x_i and x_j are connected by an edge and A_{ij} = 0 otherwise. Using the adjacency matrix, we construct the normalized graph Laplacian matrix L as

L = I − D^{−1/2} A D^{−1/2},

where D is the degree matrix, a diagonal matrix of the row-wise sums of the adjacency matrix. Given the graph Laplacian, we can solve the optimization problem.
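A compact construction of the k-NNG adjacency and normalized Laplacian, assuming a binary (unweighted) adjacency and SciPy's k-d tree (the function name is ours):

```python
import numpy as np
from scipy import sparse
from scipy.spatial import cKDTree

def knn_graph_laplacian(points, k):
    # Build a symmetric k-NN adjacency A (edge if either point lists the
    # other among its k nearest) and return L = I - D^{-1/2} A D^{-1/2}.
    m = len(points)
    _, idx = cKDTree(points).query(points, k=k + 1)  # first hit is the point itself
    rows = np.repeat(np.arange(m), k)
    A = sparse.csr_matrix((np.ones(m * k), (rows, idx[:, 1:].ravel())),
                          shape=(m, m))
    A = A.maximum(A.T)                               # symmetrize
    d = np.asarray(A.sum(axis=1)).ravel()            # degrees (row-wise sums)
    D_inv_sqrt = sparse.diags(1.0 / np.sqrt(d))
    return sparse.identity(m) - D_inv_sqrt @ A @ D_inv_sqrt

rng = np.random.default_rng(0)
L = knn_graph_laplacian(rng.normal(size=(20, 3)), k=4)
```

The diagonal of the normalized Laplacian is 1 for every connected vertex, and the matrix is symmetric, which keeps the quadratic regularizer well behaved.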

Solving optimization using conjugate gradient
With the graph Laplacian L well defined, the next step is to solve the constrained convex optimization problem in Eq. (5). Although \mathcal{L} is the sum of two quadratic forms, the equality constraints \|R_i n\| = 1, i = 1, ..., m, make a closed-form solution intractable. Here we adopt the Projected Conjugate Gradient Descent algorithm to obtain the numerical solution. The algorithm iterates two sub-steps until convergence. In the first sub-step, the solution n is updated based on the gradient of \mathcal{L}; specifically, in iteration t + 1, the normal is updated as

n_{t+1} = n_t − α ∇_n \mathcal{L}(n_t),

where α is the learning rate and

∇_n \mathcal{L}(n) = 2 \sum_{i=1}^{m} R_i^T X_{c,i}^T X_{c,i} R_i n + 2λ \sum_{j=1}^{3} C_j^T L C_j n.

In the second sub-step, n_{t+1} is projected onto the constraint set by normalizing each of its 3-vectors to a unit vector. Since the optimization problem is convex, the Projected Conjugate Gradient Descent algorithm returns the unique solution. Our results, as detailed in the next section, show that this method yields acceptable results in a small number of iterations.
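The two sub-steps can be illustrated with a plain projected gradient iteration. This is a simplification: a conjugate-gradient search direction would replace the raw gradient step, and the quadratic form Q stands in for the full loss:

```python
import numpy as np

def project_rows(n):
    # Second sub-step: renormalize each 3-vector in n to unit length.
    v = n.reshape(-1, 3)
    return (v / np.linalg.norm(v, axis=1, keepdims=True)).ravel()

def projected_gradient(Q, n0, alpha=0.01, tol=1e-6, max_iter=1000):
    # Minimize n^T Q n subject to unit-norm rows by alternating a gradient
    # step (first sub-step) with the row-wise projection (second sub-step).
    n = project_rows(n0)
    for _ in range(max_iter):
        n_new = project_rows(n - alpha * 2.0 * (Q @ n))
        if np.linalg.norm(n_new - n) < tol:
            break
        n = n_new
    return n_new

# Toy case: one point whose neighbors all lie in the plane z = 0.
D = np.array([[1., 0., 0.], [0., 1., 0.], [1., 1., 0.], [-1., 0., 0.]])
n_hat = projected_gradient(D.T @ D, np.ones(3))  # converges to +/- (0, 0, 1)
```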

Weighted graph-based normal estimation
In this subsection, we introduce a weighted version of the graph-based normal estimation algorithm discussed in the previous subsections. The motivation is to correct the estimation bias when x_i is near a boundary between two surfaces that have distinct normals. In this scenario, solving the optimization problem using the loss function in Eq. (5) results in a normal lying somewhere in between the normals of the two surfaces, instead of returning the true normal of the surface the point x_i actually belongs to. This is because the loss function requires the estimated normal n_i to be perpendicular to the surface determined by the neighboring point set X_i, which can include points on different surfaces.
A natural approach to resolve this issue is to place higher weights on the points on the surface S_i to which x_i belongs, and lower weights on the rest of the points in X_i. The underlying assumption is that for a point x_i ∈ S_i, where S_i represents the true surface the point is sampled from, the majority of its neighboring points X_i also belong to S_i. Let W_i denote the target weight matrix; the loss function in Eq. (5) is then revised as

\mathcal{L}(n) = \sum_{i=1}^{m} (X_{c,i} R_i n)^T W_i (X_{c,i} R_i n) + \lambda \sum_{j=1}^{3} (C_j n)^T L (C_j n),    (11)

which can be seen as similar to weighted PCA. In this case, the gradient becomes

∇_n \mathcal{L}(n) = 2 \sum_{i=1}^{m} R_i^T X_{c,i}^T W_i X_{c,i} R_i n + 2λ \sum_{j=1}^{3} C_j^T L C_j n.    (12)

Since the true normals of the points in X_i are unknown, we refer to the geometric information of the point set X_i and provide the following three choices of the weight matrix W_i. Let w_{i,jk} denote the element of W_i corresponding to the pair formed by the j-th and k-th points in X_i.
• dot product between normals: the weight for each pair of points j and k is the dot product of their current normal estimates,

w_{i,jk} = ⟨n_j, n_k⟩;    (13)

• distance between points: the weight for each pair of points j and k is the inverse of the Euclidean distance between them,

w_{i,jk} = 1 / ‖x_j − x_k‖;    (14)

• combination of dot product and distance: the weight for each pair of points j and k is the product of the dot product and the inverse distance,

w_{i,jk} = ⟨n_j, n_k⟩ / ‖x_j − x_k‖.    (15)

We provide results for each of the weighting strategies in the following section and compare them against the other competing approaches. As a summary, we present the proposed algorithm as follows.
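The three candidate weight matrices can be computed directly from the current normal estimates and the neighbor coordinates. In this sketch, the absolute value on the dot product and the small additive guard on the distance are our additions, to keep the weights positive and finite:

```python
import numpy as np

def weight_matrix(normals, points, mode="dot_dist"):
    # normals, points: k x 3 arrays for one neighborhood X_i.
    # Element (j, k) of W_i follows the three strategies in the text.
    dots = np.abs(normals @ normals.T)           # |<n_j, n_k>| (abs is our choice)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    inv_dist = 1.0 / (dist + 1e-9)               # guard the zero diagonal
    if mode == "dot":
        return dots
    if mode == "dist":
        return inv_dist
    return dots * inv_dist                       # combined strategy

normals = np.tile(np.array([0., 0., 1.]), (3, 1))   # identical normals
points = np.array([[0., 0., 0.], [1., 0., 0.], [0., 2., 0.]])
W = weight_matrix(normals, points, mode="dot")       # all-ones matrix
```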
Algorithm 1: Graph-based Normal Estimation
  input: a set of point cloud data X
  initialize: neighboring point sets X_i, i = 1, ..., m; graph Laplacian L;
    projection matrices R_i, C_j; centered matrices X_{c,i}; tuning
    parameter λ; initial normal n_0; iteration t = 0; learning rate α
  while ‖n_t − n_{t−1}‖ ≥ ε do
    evaluate W_i based on n_t for i = 1, ..., m
    evaluate \mathcal{L} in Eq. (11)
    evaluate ∇_n \mathcal{L} in Eq. (12)
    n_{t+1} ← n_t − α ∇_n \mathcal{L}
    project each normal in n_{t+1} to a unit vector
    t ← t + 1
  end
  output: the estimated normal \hat{n} = n_t

In regard to the computational complexity, the computing time depends on the number of iterations and on the evaluation of the loss function and its gradient in Eq. (11) and Eq. (12). The cost of Eq. (11) and Eq. (12) is dominated by matrix multiplications of maximal size 3k × 3k and the summation over all the data points, which is O(m k^3) per iteration. Let T denote the total number of iterations; the overall computational complexity of the proposed method is hence O(T m k^3). This indicates that the computational time scales linearly with the number of points and iterations, so the method can be applied to high-volume point clouds. On the other hand, a large number of neighbors k dramatically increases the computational load, while improving the estimation accuracy for a non-smooth surface with a high measurement noise level. Given this trade-off between estimation accuracy and computational time, it is recommended to determine the number of neighbors based on prior information about the surface smoothness and the measurement noise level.

Simulation setup
We will use a toy example in subsection 4.1 to show the effectiveness of the proposed method in extracting normals from a small yet noisy dataset. In subsection 4.2, we will use the SynPeb dataset with 250,000 points to demonstrate the method's applicability to high-volume, complicated real data (Schaefer et al., 2019). The combination of the two studies validates the effectiveness of the proposed method under different scenarios.
We introduce in this subsection the simulation setup in which the efficacy and robustness of our proposed approach can be tested and compared to other state-of-the-art methods. We utilize three orthogonal planar surfaces representing the 3D coordinate planes and sample 100 points from each surface in a grid pattern, as shown in Fig. 1(a). The 3D locations of these points are then disturbed by adding random Gaussian noise with zero mean and standard deviations at four different levels: {0.025, 0.05, 0.075, 0.1}. The smallest noise level corresponds to the highest signal-to-noise ratio (SNR), typical of surveying-grade LiDAR, while the highest noise level corresponds to depth cameras such as the Kinect, as shown in Fig. 1(b) and 1(c).
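The simulated dataset can be reproduced along the following lines; the grid resolution, the unit-square extent and the random seed are our assumptions, with the noise levels taken from the text:

```python
import numpy as np

def sample_three_planes(n_per_side=10, noise_sd=0.025, seed=0):
    # Sample a grid on each of the three orthogonal coordinate planes
    # (x = 0, y = 0, z = 0) and add isotropic Gaussian noise.
    rng = np.random.default_rng(seed)
    g = np.linspace(0.0, 1.0, n_per_side)
    u, v = (a.ravel() for a in np.meshgrid(g, g))
    z = np.zeros_like(u)
    pts = np.vstack([
        np.column_stack([z, u, v]),   # plane x = 0, ground-truth normal e_x
        np.column_stack([u, z, v]),   # plane y = 0, ground-truth normal e_y
        np.column_stack([u, v, z]),   # plane z = 0, ground-truth normal e_z
    ])
    normals = np.repeat(np.eye(3), n_per_side ** 2, axis=0)
    return pts + rng.normal(0.0, noise_sd, pts.shape), normals

pts, gt = sample_three_planes()   # 300 noisy points, 100 per plane
```

Sweeping `noise_sd` over {0.025, 0.05, 0.075, 0.1} reproduces the four noise levels of the study.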
The simulation study is repeated for 30 iterations at each noise level. In each iteration, we generate the dataset, apply the proposed approach, and check the estimation performance. To illustrate the estimation uncertainty, we plot both the mean and ±3 standard deviations (corresponding to 99.7% confidence intervals) over the 30 iterations. We showcase the training loss in Eq. (11) (in the left panels of Fig. 2 and Fig. 3) and the estimation bias of the normals from the ground truth, ‖n̂ − n‖ (in the right panels of Fig. 2 and Fig. 3). In particular, we compare the proposed method under the basic optimization setup (referred to in the figures as no weight) and under the three weighting strategies (referred to in the figures as weight dot prod, weight dist and weight dot prod dist, respectively). We also compare our proposed approach with our reproduction of the robust normal estimation proposed by (Sanchez et al., 2020), which is illustrated as a horizontal line in the difference plots. The left panels of Fig. 2 and Fig. 3 indicate that the proposed algorithm consistently converges under the different choices of weighting strategies. Among them, weighting using a combination of dot product and distance shows a higher convergence speed than the other weighting strategies. The right panels of Fig. 2 and Fig. 3 imply that the weights using a combination of dot product and distance achieve the minimal bias compared to the other weighting strategies. It is worth noting that the formulation without weighting results in an estimation bias of 1.90, since the normals near the boundaries between two different surfaces are not correctly estimated. With the weighting strategy using a combination of dot product and distance, the estimation bias is reduced to about 0.30, demonstrating the necessity of the iterative weighting procedure.
For all the experiments conducted so far, we utilized a constant tuning parameter λ = 0.01. However, in the learning literature (e.g. (Do et al., 2007)), it has been noted that the value of the penalty is more important than the choice of the regularizer. In order to understand the effect of this hyperparameter, we perform a grid search for the tuning parameter λ over the set {0.001, 0.005, 0.01, 0.05} and showcase the loss values and estimation bias over all iterations for 10 randomly corrupted datasets at each level. The results for the loss and bias using the third weighting strategy for the different tuning parameters at all iterations are given in Fig. 4. The figure suggests using a λ greater than 0.005 to achieve a low estimation bias. The figure also implies that a higher tuning parameter increases the convergence speed, with the highest λ value leading to the fastest convergence. This is natural because a higher λ places more regularization on the similarity of the estimated normals at neighboring points, which reduces the effort spent searching for an individual normal for each data point.

SynPeb dataset results
The SynPeb dataset was released by (Schaefer et al., 2019) as a comprehensive benchmarking dataset for plane estimation from depth images. It eliminates some of the glaring problems of the original plane extraction benchmarking dataset, the SegComp Perceptron dataset (Hoover et al., 1996), namely errors in noise and labeling. Also, being a simulated dataset, SynPeb can provide accurate normal information while remaining very complex. Although (Hoover et al., 1996) presented a cluster-based normal estimation method and showed good estimation performance, the estimation procedure depends highly on the quality of the clustering, which may not always be high in real applications.
Alternatively, our proposed formulation does not require any prior clustering information. Given the 3D point clouds, we first apply the proposed method to estimate the normals, compute the pairwise dot products among all the estimated normals, and then directly apply any graph-based clustering method (Schaeffer, 2007) using the pairwise dot product as the similarity metric between a pair of points. We set the threshold to 0.95 and consider a pair of points to come from the same cluster when their pairwise dot product exceeds the threshold. We would like to point out that there might be additional noise depending on the clustering algorithm chosen for this task, but we report the entire error as part of the normal estimation (assuming that the clustering algorithm is perfect). Table 1 provides the results on the SynPeb dataset for the proposed approach and compares it to the state-of-the-art methods that use the same dataset for plane extraction. The table indicates that our proposed approach outperforms the state-of-the-art methods, including the best of them, PPE, in terms of the number of points correctly extracted per plane as well as the root mean square error. The graph Laplacian regularization, moreover, avoids the excessively large number of segments induced by neighboring normals not being similar. Compared to PPE, our proposed method does not constrain the number of planes and does not start with a clustering step, which can introduce additional error. Of course, as with PPE, the increased accuracy comes at the cost of increased computational time and space complexity.
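The thresholded dot-product clustering can be realized with connected components of the similarity graph. This quadratic-cost sketch uses the 0.95 threshold from the text; the function name and the absolute value on the dot product (to handle sign-ambiguous normals) are our choices, and a k-NN-restricted similarity would scale better to large clouds:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.csgraph import connected_components

def cluster_by_normal_similarity(normals, threshold=0.95):
    # Connect points whose pairwise normal dot product exceeds the
    # threshold, then label clusters as connected components.
    sim = np.abs(normals @ normals.T)
    adj = sparse.csr_matrix(sim > threshold)
    return connected_components(adj, directed=False)

# Three normals along e_z and two along e_x form two clusters.
normals = np.vstack([np.tile([0., 0., 1.], (3, 1)),
                     np.tile([1., 0., 0.], (2, 1))])
n_clusters, labels = cluster_by_normal_similarity(normals)
```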

DISCUSSIONS AND CONCLUSIONS
We propose a method for estimating normals from LiDAR datasets using a graph-based formulation. We show that the graph-based formulation provides a natural way of extracting normals while being robust to measurement noise. The weighted version of the optimization algorithm also naturally takes edge preservation into account by giving more preference to points in the neighborhood with similar normals, to those which are closer, or to a combination of both.
Our proposed method is truly a combination of the regression-based methods and the Voronoi-based methods. We solve a single nonlinear optimization, taking the neighborhood of each individual point in the point cloud as a locally planar surface, similar to the PCA-based approaches. However, we also add a penalization based on the graph Laplacian of a k-NNG. Delaunay triangulation, as used in the Voronoi-based approaches, is a special case of the family of proximity graphs, of which the k-NNG is also a member. Also, the graph Laplacian can be seen as a measure of the divergence of the gradient, which is connected to the polar ball introduced in the second set of methods.
As mentioned previously, our proposed approach also generalizes the multi-model planar fitting of (Amayo et al., 2018). Their loss function is given as

\sum_{l=1}^{L} \int_{\Omega} \rho_l(u, \phi_l(u)) + \lambda \omega_N R(\nabla_N \phi_l(u)) \, d\Omega + \beta L,    (16)

where the first term represents the cost of a point supporting a particular model, with \phi_l(u) an indicator function for a point being part of model l. The second term promotes homogeneity, with R the penalty function for neighboring points not belonging to the same model. Thus, the second term can be seen as a specific instance of the k-NNG regularization where only the immediate neighborhood is considered.
In this work, only the k-NNG is explored. However, the family of proximity graphs is very rich, with different kinds of proximity graphs promoting different strengths of neighboring connections. Our aim in future work is to compare and contrast different proximity graphs. Another direction for future work is to utilize graph-theoretical methods for registration of point clouds. We believe that this work will promote a larger focus on the use of graphs in LiDAR data processing.

Table 1 notation: fraction indicates the fraction of points correctly labeled; correct represents the percentage of correctly labeled points in each plane; RMSE represents how closely the extracted planes represent the point cloud; α provides the mean angular deviation; n_o, n_u, n_m, and n_s represent the number of oversegmented, undersegmented, missing, and spurious planes compared to the ground-truth segmentation.

Joshi, U., Caputo, A., 2020. Lidar technology in manufacturing: Think ahead: Social innovation. https://social-innovation.hitachi/en-us/think-ahead/manufacturing/a-simple-solution-for-complex-problems/.