Towards Efficient Indoor/Outdoor Registration Using Planar Polygons

Abstract: The registration of indoor and outdoor scans with a precision reaching the level of geometric noise represents a major challenge for indoor/outdoor building modeling. The basic idea of the contribution presented in this paper consists in extracting planar polygons from indoor and outdoor LiDAR scans, and then matching them. In order to cope with the very small overlap between indoor and outdoor scans of the same building, we propose to start by extracting points lying in the buildings' interior from the outdoor scans, as points where the laser ray crosses detected façades. Since, within a building environment, most of the objects are bounded by planar surfaces, we propose a new registration algorithm that matches planar polygons by clustering polygons according to their normal direction, then by their offset along the normal direction. We use this clustering to find possible polygon correspondences (hypotheses) and estimate the optimal transformation for each hypothesis. Finally, a quality criterion is computed for each hypothesis in order to select the best one. To demonstrate the accuracy of our algorithm, we tested it on real data with a static indoor acquisition and a dynamic (Mobile Laser Scanning) outdoor acquisition.


INTRODUCTION

Context
The Building Indoor/Outdoor Modeling (BIOM) project aims at automatic, simultaneous indoor and outdoor modelling of buildings from heterogeneous data. The heterogeneity is both in data type (image and LiDAR) and in acquisition platform (terrestrial indoor/outdoor and aerial). State-of-the-art approaches generally deal with either the indoor or the outdoor, and often use strong priors of parallelism and orthogonality that are not necessarily verified.
Terrestrial laser scans can be acquired in static or dynamic mode, i.e., from static stations or from a mobile mapping system. In order to cover all the faces of the object, several points of view are necessary. In this work, we assume the outdoor (mobile mapping) scan to be the reference as its georeferencing system has direct access to GNSS data, so our problem is to register the indoor scans together and with the outdoor scan. The indoor scans are each defined in a local reference frame relative to the laser scanner. The registration consists in referencing all these point clouds in the same coordinate system. The goal is therefore to determine the geometric transformation (rotation, translation) necessary to bring the data into coherence. According to (Monnier et al., 2013), the registration technique can be decomposed into:
1. Feature extraction from each dataset.
2. Feature matching.
3. Using the matched features to determine the optimal transformation to apply.
The main contribution of this paper is to leverage the prior that man-made objects present large planar parts, each of which gives two rotational and one translational constraint, such that matching three pairs of polygons (with independent normals) is enough to recover a 6D (rotation + translation) transform between two scans.

State of the art
The ICP (Iterative Closest Point) algorithm is the most widely used approach for the registration of point clouds (Besl, 1992). ICP starts with two overlapping point clouds and an initial guess. The transformation parameters are iteratively estimated by generating pairs of corresponding points and minimizing an error metric. The major disadvantage of this method is convergence towards a local solution if the initial data are not spatially close or if the initial transformation is poorly estimated.
Several variants of the ICP algorithm have been proposed to improve its robustness, such as using a point-to-plane error metric. At each iteration of the algorithm, the relative pose that gives the minimal point-to-plane error is usually estimated using standard nonlinear least-squares methods. (Low, 2004) proposes an approximation of the nonlinear optimization problem with a linear least-squares one that can be solved more efficiently. An extension of the ICP framework to nonrigid registration that retains the convergence properties of the original algorithm was proposed in (Amberg et al., 2007). Another approach (Sharp et al., 2002) proposes to use Euclidean invariant features in a generalization of ICP registration of range images.
To find correspondences between 3D range images, the authors proposed to use either spherical harmonics or second-order moments. A method for detecting uncertainty in pose has been introduced in (Gelfand et al., 2003), where the transformations that can cause unstable sliding in the ICP algorithm are estimated using a sampling strategy, and the points that best constrain this sliding are selected. Other non-iterative methods have dealt with the registration of 3D point clouds, such as the method proposed in (Theiler et al., 2012), where the authors use virtual tie points generated by the intersection of three non-parallel planes in two different scans. The objective of this registration algorithm is to search for the assignment which preserves the geometric configuration of the largest possible subset of all tie points. The authors of (Forstner, Khoshelham, 2017) proposed methods to efficiently register point clouds by introducing new optimal and suboptimal direct solutions based on plane-to-plane correspondences for determining the relative motion. A fast 3D point cloud registration method was proposed in (Xiao et al., 2012), where the objective is to maximize the spherical correlation on S². Large planar patches are employed as attributes to find the maximum using a novel search algorithm.
Other works based on deep learning techniques have been proposed in the literature, such as the method of (Elbaz et al., 2017), where the authors select super-points using a Random Sphere Cover Set and then match them. A deep neural network auto-encoder is used to encode local 3D geometric structures.

Data
The data used for this study was acquired by two different means (Mobile LiDAR Scanning for the outside and static scans for the inside) at the Zoological Museum of Strasbourg.

Outdoor data
The outdoor data used to experiment our method is a Mobile LiDAR Scan (MLS) (cf. Fig. 1) acquired with the Stéréopolis II Mobile Mapping System (MMS) (Paparoditis et al., 2012). The acquisition system gives access to the sensor topology inherent to such MLS acquisitions, which is usually lost during export to formats such as .las or .ply. The data was collected from three streets, north, south and east of the museum, the west façade facing a park inaccessible to the MMS. Each outdoor scan contains approximately 3 million points.

Indoor data
The indoor data used in our study is composed of 30 static LiDAR scans of the inside of the Musée Zoologique, one or two per room. Each scan consists of roughly 500 million points, which were downsampled to around 2 million points for practical reasons.

Contributions
In this paper, we propose a new method for indoor/outdoor registration that consists in first extracting indoor and outdoor polygonal parts from the data, and then matching these polygons. The interest of polygons relative to planes is that they have a spatial extent limited to the areas where they have supporting points in the input data, so they form a simple and compact summary of our LiDAR scans. Given the specificity of LiDAR data that pass through windows, we propose to start with the extraction of the buildings' interior points captured from external scans; these are the points where the laser ray crosses the façades through apertures, mostly windows. Afterwards, we perform the registration. The work carried out has confirmed that the environment and the type of data drive the choice of the registration algorithm. For example, in man-made environments, where most objects are bounded by planar surfaces, the ideal is to choose a registration method based on plane correspondences (Theiler et al., 2012) or primitive correspondences. The basic idea of our contribution is to extract planar polygons from both the indoor and outdoor data, then to group them into three clusters according to their normals, which are used subsequently for the estimation of the transformation.
The remainder of this paper is organized as follows. Section 2 presents the extraction of planar polygons, Section 3 details the indoor/outdoor registration, Section 4 reports the evaluation and discussion, and the last section concludes.

PLANAR POLYGONS EXTRACTION
Due to its robustness to noise and outliers, RANSAC has become the most popular method for LiDAR point cloud segmentation.
Despite this success, it can generate false segments consisting of points from several nearly coplanar surfaces. False planes made up of points from different planes or roof surfaces represent a real obstacle for RANSAC (Xu et al., 2016). In order to overcome these limitations, we exploit two methods depending on the nature of the data.

RANSAC Based on Sensor Topology
For the outdoor scans, we have access to the sensor topology (adjacency between successive pulses in the same line and between lines), so we can use it to enhance the RANSAC polygon detector. We use for that a recent method (Guinard et al., 2020) that exploits the sensor topology to extract compact planar patches instead of planes:
• Sample selection: as we are looking for compact planar patches, once a first sample point is drawn randomly from the point cloud, the other two are drawn in a local neighborhood (defined based on the sensor topology).
• Region growing: instead of computing the distances of all the points of the cloud to the hypothetical plane, a region is grown starting at the first sample in order to recover a compact planar patch.
At each iteration, the planar patch with the most inliers is selected and approximated by a single polygon using the α-shape algorithm (Edelsbrunner et al., 1983). Figure 6 shows the inliers of the detected planes from an outdoor scan.
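As a minimal illustration (not the authors' implementation), the sample-selection and region-growing steps can be sketched as follows, assuming the sensor topology has already been converted into a simple adjacency list `neighbors`:

```python
def plane_from_points(p, q, r):
    """Return (unit normal, offset) of the plane through three points."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    n = [u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0]]
    norm = sum(c * c for c in n) ** 0.5
    n = [c / norm for c in n]
    return n, sum(n[i] * p[i] for i in range(3))

def grow_region(points, neighbors, seed, plane, t):
    """Grow the connected set of inliers (|n.p - d| < t) starting at seed,
    instead of testing every point of the cloud against the plane."""
    n, d = plane
    region, stack = {seed}, [seed]
    while stack:
        cur = stack.pop()
        for nb in neighbors[cur]:
            if nb not in region and \
               abs(sum(n[i] * points[nb][i] for i in range(3)) - d) < t:
                region.add(nb)
                stack.append(nb)
    return region
```

The returned region is the compact planar patch whose inlier count is compared across RANSAC iterations.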

Polygon MSAC
As we did not have access to the sensor topology for the indoor scans, we could not use the aforementioned method for the extraction of planar polygons. This is why we propose a straightforward adaptation of MSAC (M-estimator Sample Consensus), a RANSAC extension that provides a potential solution to the spurious planes problem (Torr, Zisserman, 2000). While RANSAC gives the same unit score to all inliers, MSAC gives each point a score measuring how well the point corresponds to the model:

s(d_i) = 1 − (d_i / t)² if d_i < t, 0 otherwise (2)

where d_i is the distance of LiDAR point P_i to the current hypothetical plane and t is a distance threshold.
Like RANSAC, MSAC produces hypothetical planes by randomly selecting three input points. The score of the sample plane is simply the sum of (2) over all the points P_i of the input scan. When the scores have been computed for all planes, the one with the highest score is extracted, its inliers (points P_i such that d_i < t) are removed from the point cloud, and the process is iterated until the score of the best plane is below a given threshold, as detailed in Figure 7. For each detected plane, we project its inliers onto the plane and extract planar polygons by computing the α-shape (Edelsbrunner et al., 1983) of these projected inliers. Figure 8 shows the inliers of the detected planes from an indoor scan and the polygons computed by the α-shape for the plane with the most inliers (corresponding to the ceiling).
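A hedged sketch of the scoring in (2), with illustrative helper names; a plane is represented as a (unit normal, offset) pair:

```python
def point_score(dist, t):
    """Soft MSAC-style score: full credit at distance 0, none beyond t."""
    return 1.0 - (dist / t) ** 2 if dist < t else 0.0

def plane_score(points, plane, t):
    """Sum of the per-point scores over the whole scan."""
    n, d = plane
    return sum(point_score(abs(sum(n[i] * p[i] for i in range(3)) - d), t)
               for p in points)
```

The plane with the highest `plane_score` would be extracted, its inliers removed, and the process repeated.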

INDOOR/OUTDOOR REGISTRATION
A crucial step of the BIOM project is achieving a registration of indoor and outdoor scans with accuracy close to the scan accuracy. To achieve this, we rely on matching planar polygons extracted from both indoor and outdoor data. This is quite difficult because most polygons detected in one scan will not be visible in the other. In order to facilitate this matching, we detect points from the outside scan that are inside the building and extract planar polygons only from these points.

Detecting points inside buildings in the outside scan
As the LiDAR beam usually passes through windows, as shown in Figure 9, we propose to start by detecting façades from the outdoor scan, then detecting indoor points as points behind a façade by ray tracing.

Façade detection
For façade detection, we use the planar polygon detection of Section 2.1 to detect façades as large vertical polygons. Thus we keep only the detected planar polygons which are sufficiently vertical (deviation below 3°).
The choice of the threshold on inlier distance is a crucial factor, as a bad choice can cause important under- or over-detection of planes. Following (Bughin, Almansa, 2010), we can define the error as the probability that a significant existing planar region is not detected. Another possibility is the probability p(K ≥ F | σ) that the number of detected planes K is less than the number of undetected planes F; this probability can be upper bounded by the tail of a binomial law whose parameter P depends on the diagonal diameter diag of the bounding box of the 3D point cloud and its volume V.
Ray tracing: Once the façades are detected, we detect indoor points as follows:
1. For each point P_i of the outdoor scan, consider the ray R_i from the sensor position to P_i.
2. Find the intersection points P_i^j of R_i with the supporting planes P_j of each façade polygon F_j.
3. Test if P_i^j is inside the polygon (using the CGAL library).
4. In order to remove outliers close to the façade plane, we additionally require the point to be at least 1 m away from the intersected polygon: dist(P_i, P_i^j) > 1 m.
5. If one P_i^j satisfies both criteria (inside the polygon and sufficiently far from the façade), tag P_i as an indoor point.
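The steps above can be sketched as follows (illustrative names, not the paper's code); the CGAL point-in-polygon test is replaced by a caller-supplied `inside_polygon` predicate, and the façade plane is given as a unit normal and offset:

```python
def is_indoor(origin, p, normal, d, inside_polygon, min_depth=1.0):
    """True if the segment origin->p crosses the plane n.x = d inside the
    façade polygon and p lies at least min_depth behind the crossing."""
    dir_ = [p[i] - origin[i] for i in range(3)]
    denom = sum(normal[i] * dir_[i] for i in range(3))
    if abs(denom) < 1e-12:          # ray parallel to the façade plane
        return False
    s = (d - sum(normal[i] * origin[i] for i in range(3))) / denom
    if not 0.0 < s < 1.0:           # façade must lie between sensor and point
        return False
    hit = [origin[i] + s * dir_[i] for i in range(3)]
    if not inside_polygon(hit):     # stands in for the CGAL polygon test
        return False
    depth = sum((p[i] - hit[i]) ** 2 for i in range(3)) ** 0.5
    return depth > min_depth        # step 4: at least 1 m behind the façade
```

In practice this test would be run against every detected façade polygon, tagging P as indoor as soon as one façade satisfies both criteria.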
A result of this indoor point detection method is presented in Figure 11. Finally, the polygon detection algorithm of Section 2 is run only on the points detected as indoor from the outdoor scan, yielding a limited number of polygons.

Matching planar polygons
Selection of correspondences is a crucial step for the registration. If we have at least three correct correspondences of polygons with independent normals, it is possible to find the relative rotation/translation between the indoor and outdoor scans. We start by presenting a simple threshold-based matching in Section 3.2.1 in order to introduce the three main criteria used to match polygons, then propose a more robust matching based on a clustering algorithm in Section 3.2.2.

Threshold based matching
Let us call:
• P1 and P2 detected planar 3D polygons from the two LiDAR scans: indoor scan (Scan1) and outdoor scan (Scan2),
• n1 and n2 the corresponding plane normals,
• g1 and g2 the corresponding centroids,
• P'1 and P'2 the projections of P1 and P2 on the bisector plane P1,2 of the two supporting planes.
We propose first a simple filter based on three measures:
1. Angle: Angle(P1, P2) = angle(n1, n2), the angle between the two normals.
2. Distance: as there is no standard definition for this distance, we have chosen to define it as the sum of distances from the centroid of each polygon to the bisector plane:
Dist(P1, P2) = dist(g1, P1,2) + dist(g2, P1,2).
3. Overlap:
Overlap(P1, P2) = |P'1 ∩ P'2| / min(|P'1|, |P'2|),
where |.| denotes the area. Note that we did not use the union in the denominator: a planar part of the inside scene is seen through an opening from the outside, so only a very limited portion of it can be detected, which would result in a very low overlap with the more standard definition.
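The three measures can be illustrated as follows, with each polygon reduced to a unit normal, plane offset, centroid and area; the exact polygon projection and intersection (done with CGAL/Boost in the paper's implementation) is assumed to be precomputed and passed in as an area:

```python
import math

def angle(n1, n2):
    """Unsigned angle between two unit normals (orientation ignored)."""
    c = abs(sum(a * b for a, b in zip(n1, n2)))
    return math.acos(min(1.0, c))

def bisector_distance(n, d1, d2, g1, g2):
    """Sum of centroid distances to the bisector plane n.x = (d1 + d2) / 2,
    for two nearly parallel supporting planes with common normal n."""
    dm = 0.5 * (d1 + d2)
    return abs(sum(n[i] * g1[i] for i in range(3)) - dm) + \
           abs(sum(n[i] * g2[i] for i in range(3)) - dm)

def overlap(inter_area, area1, area2):
    """Intersection over the *smaller* area (not the union), so a small
    indoor patch seen through a window can still score highly."""
    return inter_area / min(area1, area2)
```

A candidate pair would pass the filter when all three values fall under (or, for the overlap, over) their respective thresholds.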
In practice, finding appropriate thresholds for these three criteria is tedious and often leads to multiple or no matchings. This is why we propose a more robust approach.

Cluster based matching
We propose a more robust matching for polygons based on three main principles: clustering the polygons (by direction then by offset), enumerating match hypotheses between the clusters and evaluating which hypothesis is the best.

Algorithm 1 Greedy direction clustering
Input: a set of planes Pi with normals ni and numbers of inliers wi; a tolerance angle ε (typically π/4 rad).
1: Sort the planes by decreasing number of inliers.
2: Clusters initialization:
• C1 = {P1}, where P1 is the plane with the most inliers;
• C2 = {P2}, where P2 is the plane with the most inliers among the planes for which ni · n1 < cos(ε);
• C3 = {P3}, where P3 is the plane with the most inliers among the planes for which ni · n1 < cos(ε) and ni · n2 < cos(ε).
3: Mark P1, P2 and P3 as processed and all other Pi as unprocessed.
4: Each cluster Ck has a normal ck computed as the weighted centroid of all the normals of the planes in Ck.
5: While unprocessed planes remain:
6: Let Pcur be the unprocessed plane with the most inliers.
7: Add Pcur to the cluster Ck whose normal ck is closest to its normal, and update ck.
8: Mark Pcur as processed.
9: Call Ch and tag as horizontal the cluster for which 1 − |z · ck| is minimum.
10: Call Cv1 and Cv2 and tag as vertical the two remaining clusters.
Direction clustering: For each input scan, we greedily cluster the planes Pi according to their normals ni by decreasing number of inliers, as detailed in Algorithm 1, which produces three clusters Ch, Cv1 and Cv2 for each scan.

Direction cluster matching
We associate the horizontal clusters Ch of the two scans. We then associate each remaining cluster (Cv1 and Cv2) to the vertical cluster from the other scan with the smallest angle.
Alignment: we compute the rotation that aligns the three clusters, the vertical clusters Cv1 and Cv2 and the horizontal cluster, using the method described in Section 3.3.
Hypotheses enumeration: for each associated direction cluster, we enumerate possible plane matches:
• for each of the three associated main cluster pairs Am, we call Im the set of pairs of planes;
• for each (J1, J2, J3) in I1 × I2 × I3, compute the translation that aligns the planes in (J1, J2, J3) using the method described in Section 3.4;
• note that I3 is extracted from the horizontal clusters, which constrain the displacement along z; I2 is extracted from the vertical clusters which have the smallest angle with the y axis; and I1 is extracted from the vertical clusters which have the smallest angle with the x axis;
• keep the translation computed by the best hypothesis.
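The enumeration above is a plain Cartesian product over the three pair sets; `estimate_translation` and `score` stand for the translation estimation and the hypothesis quality criterion described in this section (a sketch, not the paper's code):

```python
from itertools import product

def best_hypothesis(I1, I2, I3, estimate_translation, score):
    """Try every combination of one plane pair per matched cluster and
    keep the translation whose hypothesis scores highest."""
    best, best_t = None, None
    for J1, J2, J3 in product(I1, I2, I3):
        t = estimate_translation(J1, J2, J3)
        s = score(t)
        if best is None or s > best:
            best, best_t = s, t
    return best_t
```

The cost is |I1| · |I2| · |I3| translation estimations, which stays small because only polygons detected behind the façades are enumerated.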

Selection of the best hypothesis:
To assess which hypothesis is the best, we define a criterion that (1) is robust to outliers, as many planes detected in one scan have no counterpart in the other, (2) favors important polygon overlaps and (3) favors small distances over these overlapping parts. For robustness, we need a distance threshold d_thr above which a polygon is simply considered an outlier. Then for a polygon P1 of Scan1 and polygons P2 of Scan2 with centroids O1, O2 in the same cluster, we define a robust error:

E(P1) = |P1| − Σ_{P2 ∈ Scan2} |P1 ∩ P2| · max(0, 1 − dist(O1, O2) / d_thr).

This error is 0 when P1 is completely overlapped by polygons of Scan2 with a distance of 0 (the perfect case), and increases as the distance augments and the overlap decreases, up to |P1| when the overlap is empty and/or the distance is over d_thr.
In practice, we approximate this error for efficiency. Note that we do not use our modified relative Overlap function anymore, as a large overlap surface should be favoured over a small one because it corresponds to more input points. Finally, our global criterion writes:

Q = Σ_{P1 ∈ Scan1} (|P1| − E(P1)),

and we consider that the best hypothesis is the one that maximizes this criterion.

Rotation estimation
Assuming that we have an association between the three directional clusters of the two scans:
• Ch_1 and Ch_2, the horizontal clusters of scans 1 and 2, with normals ch_1 and ch_2;
• Cv1_1, the first vertical cluster of scan 1, and Cv1_2, the associated vertical cluster of scan 2, with normals cv1_1 and cv1_2;
• Cv2_1, the second vertical cluster of scan 1, and Cv2_2, the associated vertical cluster of scan 2, with normals cv2_1 and cv2_2;
we want to find the rotation that best aligns these three pairs of directions. Let u1 = ch_1 and u2 = ch_2, and let v1 = cv1_1 and v2 = cv1_2. We want to create a first orthogonal basis M1 from u1 and v1, and a second orthogonal basis M2 from u2 and v2. Let:
• q1 = the projection of v1 on the plane orthogonal to u1 (normalized),
• q2 = the projection of v2 on the plane orthogonal to u2 (normalized).
We then have M1 = (u1, q1, u1 × q1) and M2 = (u2, q2, u2 × q2), and we can calculate the rotation as the base change matrix between M1 and M2:

M2 = R M1. (18)

Since M1 is orthogonal, R = M2 M1^T.
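The basis construction and base change above fit in a few lines of self-contained code (pure Python rather than the Eigen calls of the actual implementation); input normals need not be unit length since they are normalized internally:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def normalize(a):
    n = dot(a, a) ** 0.5
    return tuple(x / n for x in a)

def basis(u, v):
    """Orthonormal basis (u, q, u x q), q being v projected off u."""
    u = normalize(u)
    q = normalize(tuple(vi - dot(v, u) * ui for vi, ui in zip(v, u)))
    return (u, q, cross(u, q))

def rotation(u1, v1, u2, v2):
    """R such that M2 = R M1, the bases built from (u1, v1) and (u2, v2)."""
    b1, b2 = basis(u1, v1), basis(u2, v2)
    # With basis vectors as matrix columns and M1 orthonormal, R = M2 M1^T.
    return [[sum(b2[k][i] * b1[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]
```

For example, aligning a pair where the vertical direction is shared and the first horizontal normal turns from x to y yields a 90° rotation about z.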

Translation estimation
We have defined a hypothesis as three pairs of planes. Each pair is extracted from two matched clusters (Ci ∈ ScanA, Cj ∈ ScanB).
For each hypothesis we thus have three pairs {Plane_{i,l}(N_{i,l}, d_{i,l}), Plane_{j,l}(N_{j,l}, d_{j,l})}, l = 1, 2, 3,
where l represents the pair index in the hypothesis and i, j represent the indices of the two matched clusters. For each pair, once the rotation has been applied, the two supporting planes share the same normal, and the translation T must bring the first plane onto the second, which gives N_{j,l} · T = d_{j,l} − d_{i,l}. Therefore, according to the definition of a hypothesis, we can deduce the following linear system:

N_{j,1} · T = d_{j,1} − d_{i,1}
N_{j,2} · T = d_{j,2} − d_{i,2}
N_{j,3} · T = d_{j,3} − d_{i,3}

By solving this linear system, we can find the translation.
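Since the three normals are linearly independent by construction of the clusters, the 3×3 system can be solved directly; a minimal sketch using Cramer's rule (the actual implementation uses Eigen):

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1]*m[2][2] - m[1][2]*m[2][1])
          - m[0][1] * (m[1][0]*m[2][2] - m[1][2]*m[2][0])
          + m[0][2] * (m[1][0]*m[2][1] - m[1][1]*m[2][0]))

def translation(normals, deltas):
    """Solve N T = deltas, the rows of N being the three plane normals and
    deltas the offset differences d_{j,l} - d_{i,l}."""
    d = det3(normals)
    t = []
    for c in range(3):                 # Cramer's rule, column by column
        m = [list(row) for row in normals]
        for r in range(3):
            m[r][c] = deltas[r]
        t.append(det3(m) / d)
    return t
```

If the normals were nearly coplanar, det3 would approach zero and the hypothesis should be discarded as degenerate.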

EVALUATION AND DISCUSSION
Our proposed method has been tested on real data to evaluate its effectiveness. The proposed algorithm works without any constraint on the initial position of the two scans, unlike iterative methods, which require a correct estimation of the initial position to be able to converge to a global solution. The key step of our algorithm is the estimation of planar polygons, which was carried out using two methods depending on the nature of the data.
The first evaluation of our algorithm was carried out on two interior scans for which a perfect registration was done manually. Each input scan was subsampled to around 260 000 points.
Starting from this perfect position, we altered one of the two scans with an initial translation error ranging from a few centimeters to a few meters and an initial rotation error ranging from a few degrees to a few tens of degrees, as shown in Table 1. The chosen setting allowed us to estimate 22 planes from the first scan and 23 planes from the second scan. The obtained results show that our algorithm is able to register the two scans within a reasonable computation time regardless of the initial error. Afterwards, we performed the registration of indoor and outdoor scans. As we do not know the ground truth, we only considered the visual results. We consider that the obtained results demonstrate our algorithm's ability to efficiently register indoor and outdoor data, as shown in Figure 12.
Achieving precise results requires fine-tuning of the algorithm's parameters. The number of MSAC iterations must be set according to the number of points in the scan to ensure the robustness of the algorithm. The number of estimated planes depends on the inlier threshold and on the minimum size of a planar region, whereas the number of extracted polygons depends on the value of α. If we estimate more planes, we can generate more hypotheses and use more polygons for the evaluation of each hypothesis, and therefore find a more precise result.
Implementation: Our algorithm was implemented in C++. All geometric calculations were carried out using the CGAL library (https://www.cgal.org/). In addition, we used the Boost library (https://www.boost.org/) to calculate polygon intersections, the Eigen library (https://eigen.tuxfamily.org/dox/) for matrix calculations, and the Point Cloud Library (PCL) (https://pointclouds.org/) for point cloud processing.

CONCLUSION AND FUTURE WORKS
In the present work, we have proposed a method based on polygon detection and matching to address the challenging problem of indoor/outdoor registration, whereas state-of-the-art approaches tackle either the indoor or the outdoor registration problem. To the best of our knowledge, no method has proposed a joint indoor/outdoor registration with a unified formalism. Our preliminary tests have highlighted the potential of the proposed method. Our main perspective concerning this work is to extract outdoor points from the indoor scans in order to get additional information, and to precisely detect window outlines in order to introduce additional constraints in the registration.

ACKNOWLEDGMENT
This work has been supported by the Building Indoor/Outdoor Modelling ANR-17-CE23-0003 BIOM project. The authors would like to thank Mrs. Tania LANDES, who provided them with the indoor data.

Figure 1. Outdoor MLS scan acquired with an MMS.

Figure 2. Indoor scan acquired in static mode inside the Zoological Museum of Strasbourg.

Figure 3. Indoor scans of the ground floor of the Zoological Museum of Strasbourg.

Figure 4. Indoor and outdoor scans acquired in static mode at the Zoological Museum of Strasbourg: blue (outdoor scans), RGB (indoor scans).

[Pipeline diagram: indoor and outdoor LiDAR scans → points inside buildings → polygon extraction → clustering → hypothesis generation → optimal transformation → quality criterion → best transformation.]

Figure 6. Inliers of the estimated planes from an outdoor scan computed with sensor-topology-based RANSAC.

Figure 7.

Figure 8. Top: inliers of the estimated planes from an indoor scan. Bottom: polygons extracted from the prominent plane.

Figure 11. Indoor points detected from an outdoor scan. Pink: the detected points; blue: inliers of the vertical plane that represents the façade.

Table 1. Results of indoor/indoor registration tests.