VOXEL- AND GRAPH-BASED POINT CLOUD SEGMENTATION OF 3D SCENES USING PERCEPTUAL GROUPING LAWS

Segmentation is a fundamental step in recognizing and extracting objects from point clouds of 3D scenes. In this paper, we present a strategy for point cloud segmentation using a voxel structure and graph-based clustering with perceptual grouping laws, which provides a learning-free and completely automatic, though parametric, solution for segmenting 3D point clouds. Specifically, two segmentation methods utilizing voxel and supervoxel structures are reported and tested. The voxel-based data structure increases the efficiency and robustness of the segmentation process, suppressing the negative effects of noise, outliers, and uneven point densities. The clustering of voxels and supervoxels is carried out using graph theory on the basis of local contextual information, whereas conventional clustering algorithms commonly utilize merely pairwise information. By using perceptual laws, our method conducts the segmentation in a purely geometric way, avoiding the use of RGB color and intensity information, so that it can be applied to more general applications. Experiments using different datasets demonstrate that our proposed methods achieve good results, especially for complex scenes and nonplanar object surfaces. Quantitative comparisons between our methods and other representative segmentation methods also confirm the effectiveness and efficiency of our proposals.


INTRODUCTION
Point clouds obtained via laser scanners, photogrammetry, and range imaging cameras are widely used to represent the 3D spatial information of scenes and are applied in a wide variety of fields, including geodesy, geomatics, geology, forestry, and archeology. Across these applications, 3D scene reconstruction is drawing increasing attention for many related tasks such as constructing virtual reality, creating digital surface models, or monitoring construction projects. In particular, point clouds have proved to be a suitable data source for recognizing and reconstructing geometric objects from 3D scenes, as the measured 3D points provide the 3D coordinates of objects directly. However, most indoor and outdoor scenes consist of different objects and combinations of complex structures, surfaces, and sections. Thus, in practice, individual objects are commonly identified from the scene prior to the recognition procedure.
To this end, for unstructured raw point clouds, segmentation is normally adopted to partition the 3D scene into meaningful segments (e.g., groups of points having geometric consistency). An effective segmentation algorithm can facilitate the removal of disturbances and greatly reduce the workload, but the performance of conventional algorithms is often restrained by the complex environment of real outdoor scenes. The occlusions that frequently occur in datasets of outdoor scenes also limit the performance of commonly used methods, as most segmentation criteria use merely the pairwise information between elements (e.g., normals of points), which is sensitive to missing points and incomplete structures caused by occlusions. Moreover, the data quality is also a leading cause of inferior segmentations. For instance, outliers and uneven point density can significantly affect results that rely on point-based geometric features (e.g., normal vectors). Hence, apart from effectiveness, reliability plays a vital role in the development of segmentation algorithms as well. On the other hand, as point cloud segmentation is computationally intensive, efficiency is also crucial and should be considered when coping with large-scale datasets.
To address the aforementioned problems and to efficiently acquire geometric segments from large-scale point clouds, we present a novel point cloud segmentation strategy combining the voxel structure and graph-based clustering using perceptual grouping laws, which have not been applied to point cloud segmentation so far. The voxel structure is designed to suppress the negative effects of outliers and unevenly distributed densities. We adopt octree-based voxelization to organize the point cloud, facilitating the traversal of neighborhoods. Using the voxel structure to represent points also improves processing efficiency. The graph-based clustering groups voxels into segments: the connection of each voxel is estimated via a graph model encapsulating the local information of its neighborhood. Moreover, a novel strategy is proposed to encode the weights of graph edges by adopting the perceptual laws, also termed gestalt principles. Based on these ideas, we present two segmentation methods, namely voxel- and graph-based segmentation (VGS) and supervoxel- and graph-based segmentation (SVGS). We evaluate our proposed methods experimentally, comparing qualitative and quantitative results with those of state-of-the-art segmentation algorithms. We also conduct experiments using various datasets, namely laser scanned and photogrammetric point clouds of the same scene, in order to compare and analyze the performance of the approaches when coping with datasets from significantly different sources.

Related work
Point cloud segmentation has been studied for decades, exploiting methods and algorithms from different disciplines including computer vision, computational geometry, robotics, photogrammetry, remote sensing, machine learning, and statistics (Vosselman and Maas, 2010). In summary, the relevant point cloud segmentation approaches can be grouped into three major categories: model-based methods, region growing-based methods, and clustering-based methods (Vo et al., 2015).
Model-based methods evaluate points in terms of their geometric features (e.g., spatial position and normal vector) at a local or global scale using parametric models. The points meeting the criteria of fitting a parametric model (either in the spatial or the parametric domain) are segmented from the point cloud as one individual object. The 3D Hough Transform (HT) (Ballard, 1981) and RANSAC (Schnabel et al., 2007) are two widely used algorithms (Vosselman, 2013). The HT and its variations utilize a voting strategy for extracting planes (Vosselman et al., 2004), cylinders, and spheres (Rabbani et al., 2006) from the point cloud in the parameter domain, whereas RANSAC and its extensions directly estimate optimal parameters of the geometric models in the spatial domain (Schnabel et al., 2007). Model-based methods are commonly deemed robust to noise and outliers and simultaneously provide optimized parameters for modeling. Nevertheless, when dealing with large-scale datasets, they normally incur a large computational cost caused by the iterations of the robust estimator or the voting procedure, as well as high memory consumption (Vo et al., 2015). Besides, challenges arise when they are used to segment objects having no explicit mathematical expression, such as irregularly curved surfaces.
Region growing-based methods iteratively examine points in the regions around initial seeds and check whether they belong to the group of the seed via a given criterion. The growing criteria and the selection of seeds are two influential factors for this kind of method. The consistency of normal vectors (Tóvári and Pfeifer, 2005), the smoothness of the surface (Rabbani et al., 2006), and the curvatures (Besl and Jain, 1988) of the points are commonly used growing criteria. Recently, in Nurunnabi et al. (2012), local features based on principal component analysis (PCA) were also used as growing criteria for their saliency and distinctiveness. For the selection of seeds, the density of seeds determines the size of segments, while the location of seeds significantly affects the quality of segments. The region with the smallest curvature (Nurunnabi et al., 2012) or the surface with the minimal residual of a plane fitting (Rabbani et al., 2006) is frequently identified as a seed, in order to avoid boundaries and edges. Theoretically, region growing-based methods can preserve the boundaries of surfaces well, but they are sensitive to noise and outliers. For example, over-segmentation can easily occur for objects with large curvature (e.g., pipes with a long radius elbow joint) although their surfaces are smoothly connected (Su et al., 2016). In addition, their performance depends largely on the selection of seeds (e.g., the location and distribution of seeds).
The last major category comprises the clustering-based methods. These methods examine the neighboring points in a defined neighborhood by their proximity or similarity in the attribute or spatial space, on the basis of geometric characteristics and spatial coordinates. Points whose proximity or similarity meets the acceptable threshold are assessed as connected and are aggregated into one cluster. Euclidean distance (Aldoma et al., 2012) and normal vectors (Vo et al., 2015) are representative clustering criteria. Among clustering algorithms, k-means (Morsdorf et al., 2003), mean shift (Comaniciu and Meer, 2002), and connected relations (Stein et al., 2014) are the most widely adopted. Unlike region growing-based methods, clustering-based ones require no seeds. Note that the computational cost of clustering-based methods depends on the complexity of calculating the similarity or proximity of points: a complex clustering criterion greatly increases the computational burden. Besides, the setting of clustering thresholds also influences the granularity of the resulting clusters.
Recently, there has been a tendency to formulate the clustering of points as a graph construction and partitioning problem. The graph model can explicitly organize the elements (e.g., pixels or points) with a mathematically sound structure (Peng et al., 2013), encapsulating the contextual information for deducing hidden information from the given observations. Representative examples include graph-based approaches such as Min Cuts (Golovinskiy and Funkhouser, 2009) and graph segmentation (Green and Grobler, 2015), and Markov-based approaches such as the Markov Random Field (MRF) (Hackel et al., 2016a) or the Conditional Random Field (CRF) (Rusu et al., 2009). For graph-based methods, a large topology radius of the constructed graphs can provide better segmentation results, but a dense and large graph yields a heavier computational cost (Cour et al., 2005).
In addition, voxel-based segmentation methods have drawn increasing attention recently. Instead of using points as basic units, 3D regular cubes occupied by points are used as basic segmentation elements (Wang and Tseng, 2011). Octree-structured voxelization is the most commonly used approach. In Vo et al. (2015), the octree structure and the region growing process are combined for fast surface patch segmentation, whereas in Su et al. (2016) the octree-based voxel structure combined with graph-based sub-splitting is applied to segment cylindrical objects in industrial scenes. Using a voxel structure reduces the computational cost and suppresses the negative effects of outliers and varying point densities. Even so, selecting an appropriate voxel resolution is crucial to the accuracy of segments and the preservation of details. Lately, the supervoxel strategy has been introduced and applied to the basic voxel structures, better preserving the boundary features of segments and further improving the computational efficiency (Stein et al., 2014; Pham et al., 2016; Ramiya et al., 2016). However, the supervoxel method yields merely an over-segmentation of the data, and how to cluster over-segmented patches into segments is still a challenging task.

Our contributions
The following are the contributions specific to this work: 1) A bottom-up point cloud segmentation strategy, combining the voxel structure and graph-based clustering encoding the local contextual information, is proposed. Two novel segmentation methods (i.e., VGS and SVGS) are reported and are shown to be effective and efficient for 3D scene segmentation. 2) Instead of using conventional criteria, perceptual grouping laws are adopted to assess the geometric cues used in our methods, providing a purely geometric and unsupervised solution for segmentation. 3) Experiments using both laser scanned and photogrammetric point clouds of the same scene are conducted, and the performance of the proposed methods when coping with datasets from different sources is analyzed.

OVERVIEW OF METHODOLOGY
Conceptually, the implementation of our proposed segmentation strategy involves three core steps: the voxelization of the point cloud, the calculation of geometric cues, and the graph-based clustering. In the first step, the entire point cloud is voxelized into a 3D grid structure. For the VGS method, the voxel is the basic unit for segmentation, while for the SVGS method, voxels are further clustered into supervoxels having geometric consistency and spatial dependency, which serve as basic units. In the subsequent step, in order to estimate the geometric cues between basic units (i.e., voxels or supervoxels), the saliency of each basic unit is calculated from the set of points within it. Based on these saliencies, geometric cues between basic units are estimated according to perceptual grouping laws, so that the affinity between voxels or supervoxels can be assessed by the homogeneity of geometric cues, which is further used for weighting edges in the graph model. In the last step, graph-based clustering is conducted to merge voxels or supervoxels in terms of their affinity under a greedy framework, in order to generate complete segments. The graph model is constructed for each basic unit in its vicinity, encoding the local contextual information in the form of an adjacency graph. By applying the graph segmentation algorithm, the connectivity of each unit can be estimated, so that all connected units can be aggregated into complete segments by a simple clustering. The processing workflow is sketched in Fig. 1, with the key steps of the two methods and sample results illustrated. The VGS and SVGS methods are explained in detail in the following sections.

VOXEL- AND GRAPH-BASED SEGMENTATION
The VGS method is the basic solution implemented via our strategy, utilizing the voxel structure and a fully connected local graph, as reported in our recent work (Xu et al., 2017).

Voxelization of point cloud
In this work, we adopt octree-based voxelization to rasterize the entire point cloud with 3D cubic grids. Under the octree structure, the nodes have explicit linking relations, which facilitates the traversal for searching adjacent nodes (Vo et al., 2015). It is noteworthy that selecting the voxel size is a tradeoff between processing efficiency and the preservation of details: generally speaking, the smaller the voxel, the more details are kept. In our work, the voxel size is determined empirically according to the demands of the application.
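The idea of grouping points into cubic cells can be illustrated with a minimal sketch. It uses a flat hash grid rather than an octree, which is a simplification and not the structure described above; the function names `voxelize`, `centroid`, and `occupied_neighbors` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import product
from math import floor


def voxelize(points, voxel_size):
    """Group 3D points by the integer index of the cubic cell that contains them."""
    voxels = defaultdict(list)
    for x, y, z in points:
        key = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        voxels[key].append((x, y, z))
    return dict(voxels)


def centroid(pts):
    """Mean position of the points stored in one voxel."""
    n = len(pts)
    return tuple(sum(p[k] for p in pts) / n for k in range(3))


def occupied_neighbors(key, voxels):
    """Occupied cells among the 26 cells surrounding `key` (assumed neighborhood)."""
    found = []
    for offset in product((-1, 0, 1), repeat=3):
        if offset == (0, 0, 0):
            continue
        candidate = tuple(k + o for k, o in zip(key, offset))
        if candidate in voxels:
            found.append(candidate)
    return found


cloud = [(0.05, 0.05, 0.05), (0.07, 0.02, 0.01), (0.15, 0.05, 0.05), (0.95, 0.95, 0.95)]
grid = voxelize(cloud, voxel_size=0.1)
```

A smaller `voxel_size` yields more occupied cells and preserves more detail at a higher processing cost, which mirrors the tradeoff noted above.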

Calculation of geometric cues
Geometric cues stand for the geometric relations between two voxels. Their calculation includes two steps: voxel saliency estimation and the computation of geometric cues using perceptual laws.

Voxel saliency estimation
The saliency of each voxel can be regarded as its unary feature, delineating the points within it. It includes three factors: the spatial position, the geometric features, and the normal vector of the points.
The spatial position refers to the spatial coordinates of the centroid X of the points within a voxel V. For geometric features, the eigenvalue-based geometric features (Weinmann et al., 2015) are used, delineating the 3D properties of the points inside a voxel. They relate to the local shape and encapsulate the linearity Le, the planarity Pe, the scattering Se, and the change of curvature Ce (Weinmann et al., 2015). These four features are calculated from the eigenvalues e1 ≥ e2 ≥ e3 ≥ 0 obtained by eigenvalue decomposition (EVD) of the 3D structure tensor (i.e., covariance matrix) of the point coordinates. As stated in Weinmann et al. (2015), Le, Pe, and Se represent 1D, 2D, and 3D features of points, respectively, whereas Ce reflects the curvature of the surface. The normal vector N of the points within V is obtained from the eigenvectors of the structure tensor. Since noise and outliers always exist in point clouds, the estimation of eigenvalues and eigenvectors is susceptible to such disturbances. We therefore adopt the weighted covariance matrix proposed in (Salti et al., 2014), which assigns smaller weights to distant points in the covariance matrix of coordinates.
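As a rough illustration of the eigenvalue-based features, the sketch below computes an unweighted covariance matrix, its eigenvalues, and the four features using the commonly cited definitions Le = (e1 − e2)/e1, Pe = (e2 − e3)/e1, Se = e3/e1, and Ce = e3/(e1 + e2 + e3). These formulas are assumed here, since the text does not spell them out, and the distance weighting of Salti et al. (2014) is deliberately omitted. The eigenvalues are obtained with the standard trigonometric closed form for symmetric 3×3 matrices; all function names are hypothetical.

```python
from math import acos, cos, pi, sqrt


def covariance(points):
    """Unweighted 3x3 covariance (structure tensor) of a list of 3D points."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    return [[sum((p[a] - c[a]) * (p[b] - c[b]) for p in points) / n
             for b in range(3)] for a in range(3)]


def sym_eigenvalues(m):
    """Eigenvalues e1 >= e2 >= e3 of a symmetric 3x3 matrix (trigonometric form)."""
    p1 = m[0][1] ** 2 + m[0][2] ** 2 + m[1][2] ** 2
    if p1 == 0.0:  # matrix is already diagonal
        return sorted((m[0][0], m[1][1], m[2][2]), reverse=True)
    q = (m[0][0] + m[1][1] + m[2][2]) / 3.0
    p2 = sum((m[k][k] - q) ** 2 for k in range(3)) + 2.0 * p1
    p = sqrt(p2 / 6.0)
    b = [[(m[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
           - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
           + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det / 2.0))  # clamp against rounding errors
    phi = acos(r) / 3.0
    e1 = q + 2.0 * p * cos(phi)
    e3 = q + 2.0 * p * cos(phi + 2.0 * pi / 3.0)
    e2 = 3.0 * q - e1 - e3
    return [e1, e2, e3]


def eigen_features(e1, e2, e3):
    """Linearity, planarity, scattering, and change of curvature (assumed formulas)."""
    e1, e2, e3 = (max(e, 0.0) for e in (e1, e2, e3))
    if e1 == 0.0:
        return {"Le": 0.0, "Pe": 0.0, "Se": 0.0, "Ce": 0.0}
    return {"Le": (e1 - e2) / e1, "Pe": (e2 - e3) / e1,
            "Se": e3 / e1, "Ce": e3 / (e1 + e2 + e3)}


line = [(float(t), float(t), 0.0) for t in range(5)]
plane = [(float(x), float(y), 0.0) for x in range(3) for y in range(3)]
line_features = eigen_features(*sym_eigenvalues(covariance(line)))
plane_features = eigen_features(*sym_eigenvalues(covariance(plane)))
```

Points sampled along a line give a linearity close to one, while points on a flat square patch give a planarity close to one, in line with the 1D and 2D interpretation of Le and Pe.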

Geometric cues using perceptual laws
Perceptual grouping laws have a long history of use in computer vision for recognizing objects in a scene. They refer to determining which regions of the visual scene belong to the same part of higher-level perceptual units (Richtsfeld et al., 2014). Three representative principles of the grouping laws are selected as our clustering criteria: proximity, similarity, and continuity.
The proximity principle states that elements are likely to be categorized into the same group if they are close to each other, whereas the similarity principle claims that elements tend to be grouped together when they resemble each other. The continuity principle indicates that oriented elements are considered to be integrated into one part if they can be aligned with each other.
To measure the proximity of Vi and Vj, we utilize the Euclidean distance D^s_ij = ||Xi − Xj|| between the centroids Xi and Xj of Vi and Vj. Since the shape similarity denotes the conformity between the shapes of points within voxels, the stronger the similarity between the geometric features of voxels, the more similar the points within the voxels are. The similarity distance D^e_ij between Vi and Vj in this four-dimensional feature space is calculated using the histogram intersection kernel (Papon et al., 2013). The continuity corresponds to the smoothness (Awrangjeb and Fraser, 2014) and convexity criterion (Stein et al., 2014) of the surfaces formed by the points of adjacent voxels. In Fig. 2, we illustrate three typical connections between voxels. The smoothness is defined by the angle difference between the normal vectors Ni and Nj. The convexity criterion describes the concave or convex 3D relationship between the connected surfaces formed by the points of two adjacent voxels, inferred from the relation of Ni and Nj to the vector dij joining their centroids Xi and Xj. As shown in Fig. 2, the angles αi and αj are calculated, where dij = (Xi − Xj)/||Xi − Xj||. If αi − αj > θ, the surface connectivity is defined as a convex connection, where θ is the threshold for the convexity judgement. Otherwise, it is a concave connection. Similar to the work reported in (Stein et al., 2014), we also assume that within one object a convex connection should be preserved while a concave connection should be disconnected, on the basis of the degree of the convexity criterion. The "stair-like" surfaces (see Fig. 2a) are highly likely to be parts of different objects and should be disconnected. Considering these three situations, the surface connectivity D^c_ij is calculated according to Eq. 1, giving blunt convex or smoothly connected surfaces a higher proximity value, while assigning concave connected surfaces a constant penalty so that they are likely to be determined as disconnected. θ is calculated by a sigmoid function determined by the difference of αi and αj, following the description in (Stein et al., 2014).
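The proximity, similarity, and smoothness cues can be sketched as follows. This is an illustration under stated assumptions and not the formulation of Eq. 1: the shape distance is taken as one minus the normalized histogram intersection, the smoothness is the unsigned angle between normals, and the convexity test is left out because it depends on the angle definitions of Fig. 2. All function names are hypothetical.

```python
from math import acos, degrees, sqrt


def proximity_distance(xi, xj):
    """Euclidean distance between two voxel centroids."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))


def shape_distance(fi, fj):
    """One minus the histogram intersection of two normalized feature vectors.

    The mapping from the intersection kernel to a distance is an assumption.
    """
    si, sj = sum(fi), sum(fj)
    if si == 0 or sj == 0:
        return 1.0
    return 1.0 - sum(min(a / si, b / sj) for a, b in zip(fi, fj))


def normal_angle(ni, nj):
    """Unsigned angle in degrees between two normals.

    The absolute value removes the sign ambiguity of normals estimated by EVD.
    """
    dot = sum(a * b for a, b in zip(ni, nj))
    norm = sqrt(sum(a * a for a in ni)) * sqrt(sum(b * b for b in nj))
    cosine = max(-1.0, min(1.0, abs(dot) / norm))
    return degrees(acos(cosine))


d_s = proximity_distance((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
d_e = shape_distance([0.7, 0.2, 0.1, 0.05], [0.7, 0.2, 0.1, 0.05])
angle = normal_angle((0.0, 0.0, 1.0), (1.0, 0.0, 0.0))
```

Identical feature vectors yield a shape distance of zero, and flipped normals of the same plane yield an angle of zero, so neither case is penalized.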

Graph-based clustering
In many previous works, the connection of voxels is identified merely by the relation between two adjacent voxels, using their similarity or normal vectors (Wang and Tseng, 2011; Papon et al., 2013). However, due to the complex environment of the 3D scene, assessing connections using only the information of a voxel pair appears unreliable. We therefore introduce graph theory to assess the connections of a center voxel by considering all the voxels in its neighborhood simultaneously. Thus, a fully connected local graph G = (V, E) is constructed as shown in Fig. 3.

Fully connected local graph
For the fully connected local graph, voxels are set as vertices V, while edges E link all the vertex pairs. For the central voxel, the adjacent voxels belonging to the same group after the graph segmentation are regarded as connected to it. The weight wij ∈ [0, 1] between Vi and Vj is defined by integrating the affinities Dij between voxels in a multiplicative form, as they are treated as independent. In this weight, λs, λe, and λc denote the bandwidths of the Gaussian kernels, controlling the importance of the spatial distance, the geometric similarity, and the surface connectivity, respectively. In our case, all of them are set equally to 0.1.
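One plausible reading of the multiplicative Gaussian-kernel weight is sketched below. The exact expression is not reproduced in the text, so the form exp(−D²/(2λ²)) for each cue is an assumption, as is the function name `edge_weight`; only the product structure, the three bandwidths, and their default value of 0.1 come from the description above.

```python
from math import exp


def edge_weight(d_s, d_e, d_c, lam_s=0.1, lam_e=0.1, lam_c=0.1):
    """Product of three Gaussian kernels, one per cue (assumed functional form)."""
    def kernel(d, lam):
        return exp(-(d * d) / (2.0 * lam * lam))

    return kernel(d_s, lam_s) * kernel(d_e, lam_e) * kernel(d_c, lam_c)


w_same = edge_weight(0.0, 0.0, 0.0)
w_near = edge_weight(0.05, 0.02, 0.0)
w_far = edge_weight(0.30, 0.02, 0.0)
```

Because every factor lies in (0, 1], the product stays in [0, 1], and a large value of any single cue is enough to drive the weight toward zero.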

Graph-based segmentation
Once the graph is constructed, the connections of voxels can be obtained by an optimization method, namely the partition of the graph. For this purpose, a graph-based segmentation method is introduced by adapting the algorithm proposed in (Felzenszwalb and Huttenlocher, 2004).
Here, the segmentation C partitions the voxels V (i.e., the vertices of the graph) into segments S ∈ C corresponding to the connected components of the graph. As the initial step, every vertex Vi is regarded as one segment Si, and the edges are sorted in ascending order according to their weights. Then, the graph is partitioned via a recurrent process by comparing the weight w of an edge with the maximum internal difference Ii of a segment Si. For vertices Vi ∈ Sm and Vj ∈ Sn of an edge Eij, if the weight wij is larger than the threshold τmn, then Sm and Sn are merged into one segment. The threshold τmn is estimated from the internal differences and sizes of the two segments, where |S| denotes the size of the segment S and δ is a constant parameter setting the initial threshold value. In the extreme case, if |Sm| = 1 and |Sn| = 1, then τmn = δ. This merging process is performed repeatedly by traversing all the edges. After the graph partition, the connections of a center voxel within its neighborhood can be identified from the group of nodes it belongs to in the graph.
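For orientation, the sketch below implements the original greedy partition of Felzenszwalb and Huttenlocher (2004), not the adapted variant described above. In that original formulation, edge weights are dissimilarities and two segments are merged when the weight does not exceed τmn = min(Im + δ/|Sm|, In + δ/|Sn|), which reduces to δ for two single-vertex segments. The use of dissimilarities (for example, one minus an affinity) and the name `segment_graph` are assumptions of this illustration.

```python
def segment_graph(num_vertices, edges, delta):
    """Greedy graph partition; `edges` holds (i, j, dissimilarity) triples."""
    parent = list(range(num_vertices))
    size = [1] * num_vertices
    internal = [0.0] * num_vertices  # maximum internal difference per segment root

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for i, j, w in sorted(edges, key=lambda e: e[2]):
        a, b = find(i), find(j)
        if a == b:
            continue
        tau = min(internal[a] + delta / size[a], internal[b] + delta / size[b])
        if w <= tau:
            parent[b] = a
            size[a] += size[b]
            internal[a] = max(internal[a], internal[b], w)
    return [find(v) for v in range(num_vertices)]


edges = [(0, 1, 0.1), (1, 2, 0.1), (2, 3, 0.9), (3, 4, 0.1)]
labels = segment_graph(5, edges, delta=0.3)
```

In this toy graph, the weak link between vertices 2 and 3 is not merged, so the vertices fall into the two groups {0, 1, 2} and {3, 4}. A larger `delta` favors larger segments.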

Clustering of connected voxels
Once the connections of all the voxels are identified, the connected voxels are merged into one segment. This merging process is performed repeatedly by traversing all the voxels with a depth-first strategy. In the neighborhood of a center voxel, its connections are identified by the group of nodes in the graph, and all the connected voxels are then aggregated into one segment. In addition, a cross validation process is carried out to ensure the correctness of connections. For adjacent Vi and Vj, if Vi is identified as connected to Vj after segmenting the graph of Vi, then Vj should in turn be connected to Vi in the segmentation of the graph of Vj. Otherwise, they are regarded as disconnected.
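The cross validation and the depth-first aggregation can be sketched as below. The input format, a dictionary mapping each voxel to the set of voxels its own local graph marks as connected, is an assumption made for illustration, as is the name `cluster_connected`.

```python
def cluster_connected(connections):
    """Label voxels by depth-first traversal over mutually confirmed connections."""
    # Keep a link only if both local graphs agree on it (cross validation).
    mutual = {v: {u for u in nbrs if v in connections.get(u, ())}
              for v, nbrs in connections.items()}
    labels = {}
    next_label = 0
    for start in connections:
        if start in labels:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # iterative depth-first traversal
            v = stack.pop()
            for u in mutual[v]:
                if u not in labels:
                    labels[u] = next_label
                    stack.append(u)
        next_label += 1
    return labels


# Voxel 1 claims a link to voxel 2, but voxel 2 does not confirm it.
connections = {0: {1}, 1: {0, 2}, 2: set(), 3: set()}
segment_of = cluster_connected(connections)
```

Here voxels 0 and 1 form one segment, while voxel 2 stays separate because the link is confirmed from one side only.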

SUPERVOXEL- AND GRAPH-BASED SEGMENTATION
The SVGS method is an improved solution utilizing the supervoxel structure and the local affinity graph, developed from our former work (Xu et al., 2016). It differs from the VGS method in three significant respects. Firstly, supervoxels, instead of voxels, are used as basic units for clustering into segments. Secondly, the definition of the graph is different: we define a local adjacency graph rather than the fully connected graph used in VGS. Lastly, the clustering of connected basic units, here the aggregation of supervoxels, is conducted by merging adjacency graphs.

Supervoxel generation
The generation of supervoxels is carried out by the Voxel Cloud Connectivity Segmentation (VCCS) method (Papon et al., 2013), which clusters the voxels in terms of the distance between the seed and candidate voxels in a feature space involving geometric features and RGB colors. Slightly different from the way described in (Papon et al., 2013), we use merely the normal vectors and spatial coordinates of voxels to define the distance, which relates to the proximity and continuity principles. The VCCS we use is tailored from the implementation in the Point Cloud Library (PCL) (Rusu and Cousins, 2011). One of the most significant advantages of VCCS is its boundary preservation (Papon et al., 2013), so that we can obtain supervoxels sharing the same boundaries with the major structures of objects in the scene. Note that the voxel size and the seed resolution can greatly affect the performance of VCCS. The former determines the details preserved in the scene, while the latter influences the effectiveness of keeping boundaries. Empirically, we set these factors according to the point densities and the range from the sensor to the objects.

Local adjacency graph
To apply the graph model to the supervoxel structure, we define a local adjacency graph for each supervoxel, encoding all the neighboring supervoxels in a local vicinity, so that the connectivity of two adjacent supervoxels can be assessed in a context-aware way. In detail, for each supervoxel Vi, all its n neighbors whose centroids lie within a given radius Rc of its centroid are counted as candidates for building the contextual graph Gi = {V, E}, in which they are represented as nodes.
The spherical space defined by Rc is termed the local context of each supervoxel. For each node, only the edges connecting it to its adjacent nodes are considered. The weights of edges E are estimated from the aforementioned geometric cues in the same way as in VGS. The partition of the local adjacency graph G is also the same as in VGS, using the graph-based segmentation (Felzenszwalb and Huttenlocher, 2004), by which the segmented graph G+ is obtained. In G+, the connected nodes represent the connectivity of supervoxels.
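A minimal sketch of building one local adjacency graph is given below. It assumes that supervoxel centroids and a spatial adjacency relation between supervoxels are already available as dictionaries; these inputs and the name `local_adjacency_graph` are hypothetical.

```python
from math import dist


def local_adjacency_graph(i, centroids, adjacency, radius):
    """Nodes within `radius` of supervoxel i, with edges only between adjacent nodes."""
    nodes = {i} | {j for j in centroids
                   if j != i and dist(centroids[i], centroids[j]) < radius}
    edges = {(min(a, b), max(a, b))
             for a in nodes for b in adjacency.get(a, ())
             if b in nodes and a != b}
    return nodes, edges


centroids = {0: (0.0, 0.0, 0.0), 1: (0.2, 0.0, 0.0), 2: (0.4, 0.0, 0.0), 3: (1.0, 0.0, 0.0)}
adjacency = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
nodes, edges = local_adjacency_graph(0, centroids, adjacency, radius=0.5)
```

Unlike the fully connected graph of VGS, supervoxels 0 and 2 are both inside the local context but share no edge, because they are not spatially adjacent.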

Aggregation of supervoxels
To aggregate the supervoxels, all the segmented local adjacency graphs are traversed and checked. Segmented graphs having common nodes (see Fig. 4, where the node Vk is shared by the graphs G+_i and G+_j) are merged into one large graph G, encoding the connection information of the nodes within it. Finally, for each merged graph G, all the supervoxels represented by the connected nodes are aggregated into a complete segment, as shown in Fig. 4.
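Merging segmented graphs that share nodes amounts to computing the transitive closure of overlapping node groups, which can be sketched with a union-find structure. Each input set is assumed to hold the connected nodes of one segmented local graph; the name `merge_graphs` is hypothetical.

```python
def merge_graphs(node_groups):
    """Merge groups of connected nodes that share at least one node."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for group in node_groups:
        members = list(group)
        for v in members:
            find(v)  # register every node, including singletons
        for v in members[1:]:
            parent[find(v)] = find(members[0])

    segments = {}
    for v in list(parent):
        segments.setdefault(find(v), set()).add(v)
    return list(segments.values())


groups = [{1, 2}, {2, 3}, {4}, {5, 6}, {6, 4}]
segments = merge_graphs(groups)
```

The groups {1, 2} and {2, 3} share node 2 and become one segment, while {4}, {5, 6}, and {6, 4} are chained together through nodes 6 and 4.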

Experimental datasets
To test our proposed methods, point clouds acquired from two different scenes are used. One is a general outdoor building facade scene (see Fig. 5a), which is part of the terrestrial laser scanning point cloud from the large-scale point cloud classification benchmark datasets published by ETH Zurich (Hackel et al., 2016b). The other is a construction site (see Fig. 5b) located in downtown Munich, Germany, for which both laser scanned and photogrammetric point clouds are available (see Figs. 5c and 5d). Its testing area is around 320 m², including the foundation pit, ground objects, construction equipment, etc. The terrestrial LiDAR point cloud was surveyed with a Leica HDS 7000, while the photogrammetric point cloud was generated by a structure from motion (SfM) system and a multi-view stereo matching method (Tuttas et al., 2014), using a Nikon D3 DSLR camera with 105 images. Statistical outlier removal filtering (Rusu and Cousins, 2011) is applied to the point clouds prior to the main processing. The LiDAR and photogrammetric point clouds each contain around nine million points. To evaluate the performance of our methods, four representative segmentation algorithms, including the Euclidean distance and difference of normals (DON) based clustering (Ioannou et al., 2012), the smoothness based region growing (RG) (Rabbani et al., 2006), and the Locally Convex Connected Patches (LCCP) (Stein et al., 2014), are used as reference methods, implemented with the Point Cloud Library (PCL) (Rusu and Cousins, 2011). The quantitative evaluation is conducted by comparing the segments against the manually segmented ground truth (see Fig. 6) using the approach described in Awrangjeb and Fraser (2014) and Vo et al. (2015). Three standard metrics, Precision, Recall, and F1-score, which are calculated from the true positives (TP), true negatives (TN), false negatives (FN), and false positives (FP), are introduced to assess the quality of segmentation.
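For reference, the three metrics follow the standard definitions Precision = TP/(TP + FP), Recall = TP/(TP + FN), and F1 = 2 · Precision · Recall/(Precision + Recall). The sketch below computes them from raw counts; how segments are matched against the ground truth to obtain the counts is not covered here, and the counts in the example are made up for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall, and F1-score from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2.0 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative counts only, not results from the experiments.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```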

Results of building facade scene
In Fig. 7, the segmentation results of VGS and SVGS using the LiDAR point cloud of the building facade scene are illustrated, with segments rendered in different colors. As seen from the figures, the ground and wall surfaces, decks, fences, and window sills are segmented from the whole scene as individual objects.
Comparing the results of these two methods, it is clear that the result of the VGS method tends to be over-segmented, namely the details of a complete structure are segmented as individual parts. In contrast, the result of the SVGS method tends to be under-segmented, keeping large objects as complete segments; for example, the neighboring surfaces of the same facade are recognized as one planar surface. It is also noteworthy that in the result of SVGS, many small details, such as the window frames, are merged into larger objects and preserved in the output. In the result of VGS, however, the over-segmented objects consisting of merely one voxel are removed from the output as outliers. Of course, this is counterproductive to the completeness of the output. To carry out a quantitative evaluation, we compare our methods with the reference methods using the manually segmented ground truth data, consisting of 33 segments. Here, the voxel resolution used in VGS, SVGS, and LCCP is 0.1 m, equal to the radius of normal vector estimation in RG and the small radius of normal estimation in DON. The seed resolution of supervoxels in SVGS and LCCP is 0.2 m, equal to the graph size used in VGS and the large radius of normal estimation in DON. The graph size of SVGS is 0.4 m. As shown in Table 1, our proposed methods outperform the other reference methods according to the F1-scores, with values reaching around 0.81. It is noteworthy that the result of the RG method is comparable with those of our methods, but in terms of execution time our methods are more efficient.

Results of construction site scene
For the test in the construction site scene, the segmentation results of the point clouds generated by LiDAR and photogrammetry, obtained with the VGS and SVGS methods, are illustrated in Fig. 8. Similarly, different segments are rendered in varying colors. The parameters of the methods are the same as those used for the building facade case. The environment of the construction site scene is much more complex than that of the building facade, which significantly increases the difficulty of segmentation. This is also reflected in the results, which are visibly inferior to those of the building facade case. Comparing the results obtained with the LiDAR and photogrammetric datasets, we find that, for segmenting the major structures of the given point clouds, the result using LiDAR data is much better than that using photogrammetric data. One possible reason is that, unlike LiDAR points, photogrammetric points normally have larger position errors due to the stereo matching process, which may decrease the accuracy of their spatial positions. Moreover, the VGS method shows better performance than the SVGS method on the photogrammetric dataset, especially in the preservation of concave and "stair-like" connections. This is because, for the SVGS method, the generation of supervoxels is sensitive to the higher percentage of noise and outliers existing in the photogrammetric dataset, as the supervoxels are clustered by the use of normal vectors.
For the quantitative evaluation, as listed in Table 2, our VGS and SVGS methods outperform the other methods, with F1-scores larger than 0.7, for both LiDAR and photogrammetric datasets. Interestingly, for the tested sample point clouds, the results on the photogrammetric dataset are even better than those on the LiDAR dataset, for both VGS and SVGS methods, according to the F1-scores. One possible explanation for this phenomenon lies in the ground truth we used. Since the manually segmented ground truth of the photogrammetric dataset is rougher than that of the LiDAR dataset, it may influence the correctness of the evaluation. For the photogrammetric dataset we used, it is difficult to segment the point cloud manually, even by human vision, because of its quality. This phenomenon can also be observed in the results of the other reference methods. Therefore, in our future work, a reliable ground truth is necessary to provide more convincing evaluation results. Nevertheless, although a comparison across different ground truth datasets is not appropriate, the evaluation using the same ground truth can still support the superior performance of our proposed methods.

CONCLUSION
In this paper, we report a strategy for point cloud segmentation using a voxel structure and graph-based clustering with perceptual laws, which provides a learning-free and completely automatic, though parametric, method for segmenting 3D point clouds. The experiments using different datasets have demonstrated that our proposed methods achieve segmentation results effectively and efficiently, especially for complex scenes and nonplanar object surfaces. In addition, quantitative comparisons between our methods and other representative segmentation methods also validate the superior performance of our methods.

Figure 1. Workflow of voxel- and graph-based segmentation strategy

Figure 5. (a) LiDAR point cloud of the building facade scene. (b) Real scene of the construction site. (c) Photogrammetric and (d) LiDAR point clouds of the construction site scene.

Figure 8. Segmentation results of construction site using (a) VGS and (b) SVGS methods with LiDAR dataset, and using (c) VGS and (d) SVGS methods with photogrammetric dataset.

Table 1. Evaluation of segmentation results of the building facade dataset

Table 2. Evaluation results of the construction site dataset