AUTOMATED COARSE REGISTRATION OF POINT CLOUDS IN 3D URBAN SCENES USING VOXEL BASED PLANE CONSTRAINT

: For obtaining a full coverage of 3D scans in a large-scale urban area, the registration between point clouds acquired via terrestrial laser scanning (TLS) is normally mandatory. However, due to the complex urban environment, the automatic registration of different scans is still a challenging problem. In this work, we propose an automatic marker free method for fast and coarse registration between point clouds using the geometric constrains of planar patches under a voxel structure. Our proposed method consists of four major steps: the voxelization of the point cloud, the approximation of planar patches, the matching of corresponding patches, and the estimation of transformation parameters. In the voxelization step, the point cloud of each scan is organized with a 3D voxel structure, by which the entire point cloud is partitioned into small individual patches. In the following step, we represent points of each voxel with the approximated plane function, and select those patches resembling planar surfaces. Afterwards, for matching the corresponding patches, a RANSAC-based strategy is applied. Among all the planar patches of a scan, we randomly select a planar patches set of three planar surfaces, in order to build a coordinate frame via their normal vectors and their intersection points. The transformation parameters between scans are calculated from these two coordinate frames. The planar patches set with its transformation parameters owning the largest number of coplanar patches are identiﬁed as the optimal candidate set for estimating the correct transformation parameters. The experimental results using TLS datasets of different scenes reveal that our proposed method can be both effective and efﬁcient for the coarse registration task. Especially, for the fast orientation


INTRODUCTION
In the last decade, point clouds acquired via terrestrial laser scanning (TLS) are frequently utilized for various applications like urban planning, transportation management, 3D modeling, virtual reality, creating digital surface models, and environment monitoring (Vosselman and Maas, 2010).Especially for the task of the 3D reconstruction in large-scale urban areas, the point cloud has been recommended as the most suitable data source, as measured 3D points can provide spatial coordinates of surfaces directly, largely simplifying the following surface modeling process.However, to fully cover a large-scale urban scene, more than one laser scan of the scene is normally required, especially considering the occlusion of observations caused by the street buildings and the limited locations of scanner (Yang et al., 2016).Thus, prior to the further application of point clouds, a co-alignment of scans is theoretically mandatory in order to guarantee the full coverage of the research area.
Generally, for point clouds acquired by a well-calibrated TLS, a common registration method should solve two major issues: the extraction of geometric features and the identification of corresponding feature pairs (Habib et al., 2010).In theory, if the corresponding feature pairs are identified, the transformation parameters (i.e., three shifts and three rotations, without the consideration of scale changes) between reference frames can be well estimated.Nevertheless, the occlusion caused by the complex urban environment and low overlapping rate between scans make the automatic registration of point cloud a challenging problem.
Small overlapping areas and inferior data quality can cause a failing selection of geometric features.For example, outliers and missing points can significantly affect the reliability of the extracted features.For the TLS dataset itself, the points acquired can exhibit huge variations in density, decreasing quadratically with the increase of observation distances (Hackel et al., 2016), which may influence the extraction of geometric features as well.Besides, the complex and similar structures of urban scenes may also arouse the mismatching of corresponding feature pairs, due to similar building shapes and street scenes.On the other hand, since the registration of point clouds is computationally intensive, apart from effectiveness, efficiency is also vital to the point cloud processing and should be considered when coping with largescale datasets.
To tackle aforementioned problems, we propose an automatic marker free solution for coarsely aligning two scans of point clouds in the same scene, utilizing the geometric constraint formed by planar surfaces of building facades and ground surface under a voxel structure.The voxel structure is designed to suppress negative effects of outliers and uneven distributed density of points.Besides, using voxels instead of points to represent the data can largely improve the efficiency of processing as well.Compared with methods using points or lines as the features for registration, the planar feature which is a higher level geometric structure, deems more robust than the single key point or feature lines, since the determination of planar surfaces requires more geometric constraints (i.e., at least three co-planar points).To be specific, the determination of a key point or a line feature needs merely one and two points respectively, which corresponds to larger degrees of freedom (DOF).As we know, the Figure 1.Workflow of the proposed coarse registration strategy larger the DOF of the feature, the lower the reliability of the extracted feature will be.Moreover, in the case of urban scenes, the buildings and ground have plenty of planar features, facilitating the use of RANSAC strategy for searching feature pairs between corresponding scans.Considering all of these factors, the core concept of our proposal is to combine the advantages of the voxel structure and planar features, so that we can achieve a good balance between the efficiency and effectiveness for a coarse registration providing a good initial alignment.

Related work
To date, there is a wide variety of literatures providing solutions for registering different scans by the use of geometric features of point clouds.Generally, these registration methods can be classified into two essential categories according to the type of features used: the point-based approaches and the primitive-based ones.
For the point-based approaches, their basic concept is to find corresponding point pairs from different scans.For instance, the Iterative Closest Point (ICP) and its variants are the representatives, which are based on minimizing point-to-point distances in the overlapping areas between different scans (Besl and McKay, 1992, Habib et al., 2010, Al-Durgham and Habib, 2013).Similarly, the 4-point congruent sets (4PCS) and its variants, utilizing particular sets of congruent points with affine invariant ratio, are also typical methods (Aiger et al., 2008, Mellado et al., 2014, Theiler et al., 2014).The ICP-or 4PCS-like methods have been proven to be effective in terms of accuracy in plenty of applications.However, the matching of corresponding points requires is an time-consuming iterative process if no good initial alignment obtained.Instead of using all the points as candidates, the selection of key points is a solution that largely reduces the computation cost.For example, the SIFT key points (Böhm andBecker, 2007, Weinmann et al., 2011), DoG key points (Theiler et al., 2014), virtual intersecting points (Theiler and Schindler, 2012), FPFH key points (Weber et al., 2015), and semantic feature points (Yang et al., 2016).Although all these methods are generally able to align point clouds, the point-based methods are still sensitive to varying point density and noise, and they normally have problems in terms of efficiency when dealing with large-scale datasets.
For the primitive-based approaches, instead of using points, the geometric primitives formed by points (e.g., lines (Habib et al., 2005) or planes (Xiao et al., 2013)) are extracted as geometric features for the registration.As we have stated, the use of higher level geometric features can increase the robustness of identifying corresponding feature pairs.Lines, planes, and curved surfaces are the representatives of geometric features.Large numbers of studies using line features to register point clouds have been reported.For instance, the intersection lines of neighboring planes (Stamos and Leordeanu, 2003), 3D straight-lines (Habib et al., 2005, Al-Durgham andHabib, 2013), and spatial curves (Yang and Zang, 2014) are used as matching primitives.Whereas, plane correspondences (Dold and Brenner, 2006, Von Hansen, 2006, Xiao et al., 2012) and surface correspondences (Ge and Wunderlich, 2016) are frequently used as geometric primitives for alignment as well.Compared with the point-based methods, both lineor plane-based methods require abundant linear objects or smooth surfaces as candidates, which largely depends on the content of the scanned scenes.For the scene with only few buildings, they may meet problems when finding appropriate candidate features.The accuracy of extracting lines and planes will affect the registration result at same time.Besides, for the plane-based methods, the extraction of planar surfaces with region growing or modelfitting algorithm can be rather time consuming and unreliable, which largely limit the performance of the registration as well.
Recently, the number of researches using voxel structure in the field of point cloud processing is rapidly increasing.For the segmentation (Vo et al., 2015) or 3D modeling (Gehrung et al., 2016), plenty of studies has illustrated the advantages of using a voxel structure to represent the dataset.With respect to the registration task, in the recent work of Wang et al. (2016), the EGI features of voxel clusters are employed as correspondences for a coarse registration, showing effective results for aligning indoor scenes.This is an inspiration for us to use the voxel structure instead of points to organize the dataset, so as to get a good balance between effectiveness and efficiency.

Our contributions
The contributions and innovations of this work are listed as follows: 1) A voxel-based coarse registration frame is proposed, using a octree-based voxel structure instead of points to organize the entire point cloud, which largely decrease the computational cost and the robustness of extracted geometric features; 2) Based on the voxel structure, a geometric constraint using small planar patches is designed to rapidly estimate the local coordinate frame of a scan in the urban scene.Compared with conventional ways of extracting planar surfaces with model-fitting and region growing, our proposed planar constraint can be generated in a more efficient way with lower computational cost; 3) A novel RANSACbased strategy using planar patches sets between scans is developed, which overcomes the disadvantage of conventional planebased registration method in the search of corresponding plane pairs and improves the processing efficiency simultaneously.

METHODOLOGY
Conceptually, the implementation of our proposed coarse registration method consists of four core steps: the voxelization of the point cloud, the approximation of planar patches, the matching of corresponding plane sets, and the estimation of transformation parameters.In the first step, point clouds of individual scans are voxelized into the 3D grid structure with cubics.Then, the points set within each voxel is fitted with an approximated plane model.The planarity of all the patches is estimated and evaluated.Only those deemed as planar patches are selected for the following matching step.In the subsequent step, a RANSAC-based strategy is applied to randomly select candidate planar patches of different scans respectively.For two sets of candidate planar patches from different scans, a matching score will be estimated.Only the sets obtaining the highest score are assessed as the corresponding plane sets.In the last step, the transformation parameters are calculated by comparing the coordinate frames formed by the corresponding plane sets.The processing workflow is sketched in Fig. 1, with the key steps of involved methods and sample results illustrated.The detailed explanation on each step will be introduced in the following sections.

Voxelization of point clouds
To voxelize the point cloud of the scan, we adopt the octreebased voxelization to decompose the entire point cloud with 3D cubic grids.The reason of using the octree structure has been fully discussed in our former studies (Boerner et al., 2017, Xu et al., 2017b), namely the nodes of an octree structure have explicit linking relations, which can facilitate the traversal for searching the adjacent ones (Vo et al., 2015, Xu et al., 2017a).Note that in our method, all the voxels of one scan are of equal resolution (i.e., voxel size) (see Fig. 2), but the voxel sizes of different scans can be various.Besides, selecting the appropriate number of voxels generated from one scan is a trade-off between the efficiency of processing and the preservation of details.Generally speaking, the smaller the voxel, the more details kept.In this work, the number of voxels is determined according to the demands of our application empirically.

Approximation of planar patches
For the points set P = {p1, p2, ..., pn} within each voxel Vi, an implicit plane representation (Dutta et al., 2014) (see ( 1)) is adopted to fit the surface of points, which assumes an approximate plane via the normal vector Ni and the centroid Xi of Pi inside Vi.
< Ni, Xi > −d c i = 0 (1) where d c i stands for the distance from the origin to the approximate plane.Here, Ni is calculated via eigenvalues e1 ≥ e2 ≥ e3 ≥ 0 from eigenvalue decomposition (EVD) of the 3D structure tensor (i.e., covariance matrix) of the points coordinates.Once the approximated planar surface of each voxel is achieved, the planarity Pe of the surface (Weinmann et al., 2015) is assessed by the use of eigenvalues from EVD. Theoretically, the larger the Pe is, the better the planar surface is fitted.Here, a threshold α is introduced to select well fitted planar patches.

Matching of corresponding plane sets
The matching between corresponding plane sets is the most important part for estimating correct transformation parameters.However, in the case of urban scenes, the abundant planar surfaces (e.g., parallel facades) are highly likely to arouse the mismatching of plane pairs.To solve this problem, instead of traditional ways of finding the corresponding plane pair individually, similar to the work presented in (Theiler and Schindler, 2012), we use a triple of planar patch set, which forms a 3D corner in the urban scene (see Fig. 3), as a constraint to define a coordinate frame determining six degrees of freedom.Here, the normal vectors of the three planar patches of the corner are firstly calculated.Then, the coordinate frame is estimated by the cross product of these three normal vectors.Compared with the work of Theiler and Schindler (2012), we utilize only one triple set of planes as the constraint and our rotation parameters are estimated per the normals of planes rather than the set of intersecting points.By finding a pair of corners from different scans, we can define the transformation from one scan to another.By the use of two coordinate frames for different scans, the calculation of the transformation parameters is converted into two tasks: the estimation of rotations and a subsequent determination of shifts, which will be elaborated in the following section.
Theoretically, if we can find the corresponding corner pairs, namely the corner pairs representing the corresponding plane sets, the correct transformation can be calculated.To identify the correct corresponding corner pairs, we make an assumption that if the majority of the planar patches in one scan can be matched Figure 2. Octree decomposition of the 3D space Figure 3. Coordinate frame estimation from the corner of planes to those in another scan, we believe that the transformation parameters are estimated correctly from these corresponding corner pairs.In detail, a RANSAC-based strategy is applied, sampling a plane sets of three planar patches from two scans concurrently in each random selection round.The plane sets should satisfy that they are non-parallel planar patches.For each set of triple planar patches, a coordinate frame can be generated, an illustration of which is shown in Fig. 3.
In each round of the RANSAC process, we can obtain a set of transformation parameters.We apply the estimated transformation parameters to transform all the selected planar patches in the target scan.Then, a statistic of the number of coplanar patches is carried out.The coplanar patches PA and PB are assessed by two criteria in a given neighborhood of size d: angle difference θ and averaged distances (hA +hB)/2.Only if the planar patches meet both criteria simultaneously, they are regarded as a corresponding coplanar pair.In Fig. 4, we give an illustration of the coplanar criteria between patches.During the entire RANSAC process, the plane sets having the largest number of corresponding coplanar patches are identified as the optimal corresponding planar patch sets, and the estimated transformation parameters will be used as the final transformation parameters.

Estimation of transformation parameters
As we have stated in the previous section, the estimation of transformation parameters consists of the determination of rotations and the determination of shifts.For rotations, the angle differences α, β, and γ of three axis are calculated by comparing the vectors of axis in corresponding coordinate frames Fs and Ft (see Fig. 5).Then, the rotation matrix R is obtained as follows: (3) Then, the shift matrix T is calculated as follows: By applying R and T to the point cloud of the source scan, the source scan can be aligned to the target scan.

Refinement (optional)
By the use of the initial result of our proposed coarse registration method, the fine registration algorithms (e.g., ICP) can be applied to perform a fine registration between the transformed source and target scans, to obtain more accurate transformation parameters.

Experimental datasets
The proposed method for performing automated registration is tested using TLS datasets of various scanners and scenes.Specifically, two datasets are tested (see Fig. 6).The first dataset is from the TLS point cloud from the large-scale point cloud classification benchmark datasets published by ETH Zurich (Hackel et al., 2016).Whereas the second dataset we used is the TLS point cloud from the ThermalMapper project acquired by the Jacobs University Bremen (Borrmann et al., 2013).The detailed information of the datasets we used is listed in Table .1.  Table 1.Information of experimental datasets

Registration results
In Figs. 7 and 8, the coarse registration results of two experimental datasets are displayed.In these tests, the size of voxel is set to 1m, and the threshold for selecting the planar patches is set to 0.85.The maximum iteration number of RANSAC process is limited to 1 million times.In Table 2, we give the number of extracted and matched planar patches in different scans.By visual investigation, we can find that the scans of two scenes are aligned.Especially, the source scans are well oriented to the target scans, indicating a well estimated rotation parameters, which greatly facilitate the following fine registration process.However, when it comes to the translation parameters, we find that there is likely to occur a shift error between the target and transformed source scans (see Fig. 9a).One of the possible reasons for such a shift error is caused by the parallel planes existing in a urban environment.When forming a corner constraint with randomly selected planar patches sets, the planar patches extracted from parallel planes will result in ambiguities when searching corre-sponding plane pairs.In Fig. 9b, we give a sketch to illustrate such situations.In fact, for a coarse registration method, such a shift error can be easily corrected by a fine registration step.Thus, in our future work, we should add additional constraints (e.g., using two corners instead of only one corner as constraint) to remove such an ambiguity.Beside the shift error, for the proposed method, we note that two essential factors can significantly influence the quality of the reg-istration, in particular for the rotation estimation.The one is the geometric resolution of each voxel.Generally, the larger the voxel, the faster the registration will be.However, a large voxel size will lead to a more difficult selection process of qualified planar patches, because the surface of points within the voxel may be complex and therefore became non-flat.Another influential factor is the thresholds we have mentioned in Section 2.3, which is designed for selecting planar patches.In theory, the larger the thresholds, the less planar patches are extracted.It is true that less candidate planar patches can accelerate the RANSAC process for searching correspondences.However, a strict criterion for selecting planar patches may also decrease the representativeness of these patches, because some of the planes (e.g., rough building facade) in a urban scene may not have a strict planar surface.A high threshold will result in a negligence of such surfaces, which is counterproductive to the original intention of using the large amount of planar surfaces as constraints for aligning scans of urban scenes.To further quantitatively evaluate the performance of the rotation estimation, we create a set of point clouds with different known rotation parameters, in order to use them as ground truth.Since the majority of the TLS scans only have rotation differences in X-Y directions, the manually rotation we made is only around Zaxis with an incremental angle of 15 • , ranging from -30 • to 60 • .In Fig. 10, two examples of testing point clouds are shown.In this test, the classical ICP method, the FPFH feature based alignment (FA) (Holz et al., 2015), and the Normal Distributions Transform (NDT) algorithm (Magnusson, 2009) are used as baseline for comparison.The initial angle difference between scans for the ICP and NDT methods is set to 0 • .For our method, the size of voxel is fixed to 1m, and the planar threshold is set to 0.9.For the FA and NDT methods, the point cloud is downsampled with voxel grid filter (Rusu and Cousins, 2011) of 0.1m resolution.For the FA method, the radii for the estimation of the normal vector and the FPFH feature are 0.2m and 0.5m, respectively.For the NDT method, the minimum transformation difference and grid resolution are 0.01 and 0.5m, respectively.The baseline methods are implemented by the use of Point Cloud Library (PCL) (Rusu and Cousins, 2011).Our method is implemented in C++ on the basis of the existing octree functions of PCL.All the experiments are conducted on a computer with an Intel i7-4710MQ CPU and 16GB RAM.In Table 3, we give the results of tests using aforementioned manually rotated datasets.As seen from the table, our proposed method is more efficient than the classical ICP method and the FA method, especially for the cases with large rotation angles.While the NDT method shows the fastest speed of alignment.For our method, its accuracy also outperforms those of other methods, with the error of the estimated rotation angle smaller than 5 • .For a small ground truth rotation angle between -30 • and 30 • , our proposed method can achieve an estimation error of smaller than around 2 • , but for larger true rotation angle (e.g., 60 • ), the error grows.In contrast, the estimated angle errors of ICP and NDT increase drastically when the true rotation angle is larger than 15 • , when reveals an inferior reliability for aligning scans with large rotation angles.This is because that both the ICP and NDT methods are normally used as the fine registration method, requiring normally a good initial alignment, which is eligible as a fine registration step after the use of our method.Table 3. Experimental results using manually rotated datasets

CONCLUSION
In this work, we report an automatic marker free method for fast and coarse registration between point clouds using the geometric constrains of planar patches, combining the voxel structure and a RANSAC-based strategy for selecting the corresponding plane pairs.The experimental results using TLS datasets of different scenes have proved the effectiveness and efficiency of our proposal.Especially, for the task of the fast orientation between scans, our proposed method can be much more efficient than the classical ICP method and outperform some other baseline methods.The proposed method is a combination of the advantages of voxel structure and planar features, which achieves a good balance between the efficiency and effectiveness for a coarse registration providing a good initial alignment.However, there is still some problems that needed to be further investigated, for example, the reliability of the RANSAC process has not been considered, which is related to the finding of the max number of matches.Besides, the minimum requirement of overlap area between scans is not tested, which should be further investigated in our following work.In the future, we will correct the ambiguity caused by the parallel planes, and conduct thorough experiments to fully analyze the potential of combining the voxel structure and plane constraints for a fast alignment of scans.Comparisons to other registration methods on 3D descriptor and other planar patch based point cloud registration schemes will be conducted to explore the potential of the proposed technique.

Figure 4 .
Figure 4. Measuring of coplanar criteria between two patches

Figure 6 .
Figure 6.(a) Source and (b) target scans of Bremen datasets.(c) Source and (d) target scans of ETH datasets.

Figure 9 .
Figure 9. Illustration of ambiguities when identifying the translate parameters.(a) The translation error.(b) A sketch of the cause of ambiguity.