APPROXIMATE REGISTRATION OF POINT CLOUDS WITH LARGE SCALE DIFFERENCES

3D reconstruction of objects is a basic task in many fields, including surveying, engineering, entertainment and cultural heritage. The task is nowadays often accomplished with a laser scanner, which produces dense point clouds, but lacks accurate colour information, and lacks per-point accuracy measures. An obvious solution is to combine laser scanning with photogrammetric recording. In that context, the problem arises to register the two datasets, which feature large scale, translation and rotation differences. The absence of approximate registration parameters (3D translation, 3D rotation and scale) precludes the use of fine-registration methods such as ICP. Here, we present a method to register realistic photogrammetric and laser point clouds in a fully automated fashion. The proposed method decomposes the registration into a sequence of simpler steps: first, two rotation angles are determined by finding dominant surface normal directions, then the remaining parameters are found with RANSAC followed by ICP and scale refinement. These two steps are carried out at low resolution, before computing a precise final registration at higher resolution.


INTRODUCTION
With the emergence of highly automated recording techniques, automatic point cloud registration has attracted great interest in past years.While researchers from different disciplines, including geodetic metrology, photogrammetry, computer graphics and computer vision have proposed different registration algorithms, only few attempts have been made to register point clouds with large scale differences.For point clouds with the same scale, such as individual laser scans, the iterative closest point algorithm (ICP) (Chen andMedioni, 1992, Besl andMcKay, 1992) is widely used.Since ICP is a local method and requires good approximations to converge to a correct result, different methods have been proposed to find approximate registration parameters using point correspondences.Algorithms such as 3D SIFT (Li and Guskov, 2005), and SPIN images (Johnson and Hebert, 1997) extract features and match them to obtain correspondences between point clouds.These methods have in common that the two point clouds must have similar point spacing and almost the same scale.A different approach is the 4-points congruent sets (4PCS) algorithm (Aiger et al., 2008), which does not attempt to extract features.Rather, it randomly picks four coplanar points with large baselines and matches them based on their geometric configuration in order to obtain the registration parameters.The method also requires a known scale, and that the point clouds have similar and relatively uniform point density.(Zinßer et al., 2005) proposed a method which incorporates a scale estimation into the ICP algorithm, but again is a local refinement which requires that the two models are already roughly aligned.
To overcome the limitation w.r.t.scale (Corsini et al., 2013) have recently proposed a method that automatically registers photogrammetric and laser point clouds without any approximations, including automated scale estimation.In their method, a sparse photogrammetric point cloud is first roughly registered with the laser point cloud using a combination of 4PCS and Variational Shape Approximation (VSA) (Cohen-Steiner et al., 2004).Then the orientation of each image is individually refined using mutual information.
Here we propose a different approach to fully automatic registration.Our approach is based solely on 3D geometry and does not use image or laser intensities.Instead, our method uses a dense photogrammetric point cloud, which nowadays can be generated reliably.Furthermore, the exterior orientation of the camera positions is fixed unlike in (Corsini et al., 2013) which keeps the rigidity of the photogrammetric point cloud intact.Additionally we do not need the image information, which makes it possible to apply our method on point clouds solely generated with laser scanners or similar active scanning devices.

MOTIVATION
A laser point cloud and a non-georeferenced photogrammetric point cloud exhibit significant differences in scale, rotation, translation and point density.Additionally, scans can differ in object coverage, and have different levels of noise and outliers.Fig. 1 shows a real example for the often neglected difference in coverage and detail.Laser scanners deliver metric scale, since they measure distances in world units.On the contrary, images are oriented based only on ray directions, thus the scale is undetermined and set arbitrarily at some point of the computation.This leads to the situation that the scale difference between the point clouds can be very large, i.e. a factor 50 or more.The arbitrarily large scale difference is a challenge, since standard registration algorithms do not accout for such large scale differences.Next to the scale differences, rotation differences are another major challenge.Terrestrial laser scans mostly have a known gravity vector since the instrument is levelled before recording, whereas the photogrammetric network again is not connected to the world coordinate system and can even be completely upside down.Finally, different acquisition techniques yield different point densities, as can be seen in fig. 2. The latter difference is problematic because it causes conventional 3D feature matching procedures to fail, since feature responses depend on point density.Translational differences, which can be very large, are only a minor challenge.

REGISTRATION METHOD
Figure 3: Registration workflow Fig. 3 shows the proposed point cloud registration approach that proceeds in steps, rather than solving for all transformation parameters at once.It is inspired by the way a human operator interactively performs this task: since the operator has to work with a 2D view of the point cloud, he first rotates and translates the point clouds to get a similar perspective for both in which he can see similar shapes or features (e.g. a birds-eye view), then adjusts the remaining parameters (translation, inplane rotation, scale), and finally refines the registration.
The proposed registration algorithm proceeds in a similar manner.First, the point clouds are pre processed, to reduce the point density and processing time.Next, the dominant normal directions of both point clouds are estimated and aligned, effectively solving for two of the three rotation angles and reducing the registration problem to a 2.5D rasted DEM (heightfield) matching process.The remaining parameters (3D translation, in-plane rotation, scale) are then solved by robust fitting with RANSAC (Fischler and Bolles, 1981), followed by fine registration with exhaustive local search and ICP.The involved parameters have been determined empirically using ten datasets with various properties, such as very high resolution, low resolution and high noise levels.

Preprocessing
Initially, a statistical outlier filter (Rusu and Cousins, 2011) is applied to both point clouds in order to reduce noise which saves computation time in later steps.In addition, both point clouds are downsampled with a voxel grid filter.On the one hand this yields a uniform point density, and on the other hand it reduces the point cloud to a manageable size for coarse initial registration.More precisely, the filter replaces all points that fall within a given voxel by a single point, in our case defined to be the centroid of those points.The

Finding dominant normal directions
After pre-processing, the dominant normal directions are detected.Since many scenes do not have an unique and unabiguous dominant normal, we find up to five normals and test all of them in subsequent steps.The normal directions of the two point clouds are then aligned to determine two rotation angles (similar to the way 1D rotation is accounted for in invariant image descriptors).To find dominant normal directions, point normals are estimated for both point clouds and clustered on the unit sphere.Clustering is done with nonparametric density estimation using a circle on the sphere with radius r k = 0.035 (corresponding to a polar cap with 2 • of latitude) as kernel.A similar approach is described in (Makadia et al., 2006).However, instead of unwrapping the normal sphere into a equirectangular projection we sample the normals directly on the sphere.This has the advantage that the sampling kernel always covers the same area on the sphere, without increasing/decreasing in size towards the equator/poles.The downside to this is an increased processing time.Next, normal directions are found by greedily picking the clusters with the highest number of normals, enforcing a minimum distance of 5•r k between clusters for non-maximum suppression.After computing the mean point count m of the clusters and its standard deviation σm, clusters with less than m + 2.5 • σm members are discarded.We keep at most five largest clusters, since few scenes will have more than 5 similarly dominant normal directions.More directions are typically only found on objects similar to platonic solids.Since registration of such objects is ill-posed, they are not treated separately.

Conversion to Heightfields
Having found the dominant normal direction(s), the point clouds are projected along the normal to a 2.5D heightfield, the normal being aligned with the z-axis of the 3D coordinate system.The points are projected onto the x, y-plane and discretized to a regular grid with step width d G , set to ten times the 2D median point spacing in the projection plane.To generate the heightfield, points with normals pointing away from the viewer are discarded ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume II-5/W2, 2013 ISPRS Workshop Laser Scanning 2013, 11 -13 November 2013, Antalya, Turkey as back-faces, and of the remaining points the highest one is retained.The projection results in a 2.5D heightfield (Fig. 5), like the ones often used to represent digital terrain models or range camera images.These heightfields have the advantage that the scale difference does not affect them.Rather, the point spacing imposes a change in scale, which makes the scale differences on these heightfields much smaller.To remove remaining blunders and simplify processing, the heightfields are median-filtered and rescaled to 8-bit images.It has proved beneficial for further processing (see below) to keep two versions, one rescaled globally to fit into the 8-bit range and one normalized locally in 32×32 pixels tiles.

Heightfield matching
Once the 2.5D heighfields have been created, they need to be matched.In case of multiple dominant normal directions, we test all pairwise combinations and keep the one with the best score.
Matching is again decomposed into two steps, namely • rough registration with RANSAC • refinement with exhaustive local search.
To obtain a rough initial registration, we first cut down the set of possible rotations to a small number (recall that only one rotation angle around the vertical is still unknown).To that end, we compute gradients for both heightfields (through Sobel-fitering) and build magnitude-weighted histograms over gradient directions.This procedure is identical to the one used to determine the canonical orientation in SIFT (Lowe, 1999) and other rotationinvariant descriptors.However, rather than settling on a single orientation, our method again proceeds conservatively: we compute the mean m κ and standard deviation σ κ of the histogram counts and keep all candidate directions with a count > mκ + 1.5 • σ κ .
The 2D Helmert transformation between the two heightfields is then obtained by RANSAC, using a minimum sample of 2 points.The sampling is constrained to a sensible parameter range of • rotation near one of the candidate directions • Point sampling difference (heightfield scale factor) between 0.1 and 10 • heightfield overlap > 25% in both directions For each such random sample, the median absolute differences (MAD) are computed twice, once between the globally normalized heightfields and once between the locally normalized ones.
If both are below a threshold (set to 30, respectively 50 graylevels), the transformation is refined with ICP.Refined solutions are accepted only if the ICP solution has an inlier ratio of at least 95%.The sampling process is repeated until the RMSE of the refined solution drops below 1.25 • dG, or a maximum number of iterations is reached.Finally, the solution is further polished by exhaustively testing small changes of the transformation parameters in a local neighborhood around the estimated values.

Scale refinement
The principal remaining source of discrepancies is the inaccuracy of the estimated scale, which proves to be the most brittle parameter.We thus search through range of scales around the scale found by the preceding steps, to minimize the RMSE of ICP registration.

Final Registration
Having found an optimal solution at low resolution for all pairs of dominant normal directions, the ones with reasonable accuracies (below the mean RMSE plus 3 standard deviations) are passed to high-resolution final registration.At this point, we revert to the high-resolution point clouds generated with a finer voxel grid (Y • d N N ), and polish the solution by again running ICP with exhaustive scale testing, this time using smaller scale steps.The solution with the lowest RMSE is our final solution.

EXPERIMENTS
To generate reference data for experimental evaluation, we have registered all data sets interactively, by clicking four common points to obtain 3D Helmert transformation parameters and refining the solutions with a non-scaled ICP, using the same stopping criterion as in the automatic registration method.Note that our algorithm is indeed fully automatic.Scale-and density-independent thresholds are kept fixed, and those depending on the properties of the specific point clouds are set in a data-driven fashion.The fixed parameters are: Other parameters, such as RANSAC stopping points (both "hard" and "soft") as well as downsampling ratios are variably defined according to the point cloud sizes.Both manual and automatic registration use identical point clouds, namely the high-resolution voxel grid filtered ones, to obtain consistent scores.The resulting RMSE as well as the ICP inlier ratio, number of used points, and ratio of laser and photogrammetry points used for registration are shown in tables 1, 2, 3, 4 and 5.The inlier ratio is based on the correspondence estimation of the ICP algorithm.It is defined as the amount of "good" correspondences over the total amount of found correspondences.The ratio of laser and photo points is then the amount of good correspondences over the amount of points in the laser/photogrammetric point cloud used for the registration.Additionally, the estimated scale is shown.It is important to note that the manual scale is not the "correct" scale as it is derived from manually measured point correspondences.It is therefore an approximation and serves mainly as an indicator whether the automated method obtained a reasonable scale factor.

DATASETS HXE
The HXE dataset consists of a high resolution, terrestrial laser scan and a photogrammetric point cloud obtained with an unmanned aerial vehicle (UAV).The images have been taken in a classic nadir acquisition setup.This dataset is a fairly "standard" as it does not have a lot of noise and features well defined manmade structures.Fountain-P11 The fountain dataset was kindly provided by the dense multiview stereo repository of the École Polytechnique Fédérale de Lausanne (Strecha et al., 2008).The photogrammetric point cloud was kept "as is" meaning that the estimated scale factor should be exactly "1", since the data was oriented using ground control points.The laser point cloud was downsampled beforehand as it was far too large to be handled in a reasonable manner.The photogrammetric reconstruction was performed with CMVS/PMVS (Furukawa and Ponce, 2010).The registration result of the HXE dataset is visually correct.However upon closer inspection there is a scale error which translates into differences up to 3 cm.The manual registration however creates slightly worse results, mainly due to the difficulties to pick four homologous points accurately enough.

Niederwald
Manual  The automated registration result of the Niederwald dataset is also successful but shows small scale errors.It succeeds to obtain better results than the manual registration.
The Erechtheion dataset seems to be correct from a first glance.However the scale is not estimated properly.This is mainly due to the long and planar wall as well as the fact that the laser dataset also covers the inside of the building walls.This, combined with the fact that the laser point cloud is rather sparse.results in a worse result than the manual registration.The Lucy dataset has been registered successfully.The difference to the manual registration is fairly small, mainly because picking four homologous points in this dataset is rather easy.This is owed to the fact that the datasets are essentially the same ones except for the simulated differences in downsampling.The estimated scale from the automated part deviates from the real value by about 500 parts per million (ppm).For the manual registration it is about 2500 ppm.
The fountain dataset is also registered successfully and again the difference to the manual registration is fairly small.Like in the Lucy dataset, the object coverage is pretty much the same and the point densities are similar.The scale differences to the real value of "1" are a bit larger though.For the automated part it is 3000 ppm, for the manual 5600 ppm.

DISCUSSION
The results show that the automated method manages to obtain approximations for all datasets.Furthermore, the proposed method offers a completely automated processing chain, which avoids any interactive point measurements.In most cases the result is even better than the manual registration method.The only exception is the Erechtheion dataset where the scale estimation is slightly worse than the manual result.This can be mainly attributed to the long and planar wall as well as the backwall of the interior.
The limitation of the proposed solution is that the datasets have to have a common dominant normal direction.This is in most cases true, but makes it difficult to apply the algorithm on datasets with very low overlap.A possible solution might be to resolve the ambiguity with the help of extended Gaussian images (Makadia et al., 2006).Another drawback is the completely automated, but nevertheless purely empirical setting of the parameters.Additionally, reducing the problem to 2.5D heightfield matching greatly simplifies the problem, but this comes at a price.Height fields are a lot smoother and have much less high-frequency detail than optical images, and are thus harder to match with standard image matching techniques.An approach using higher-level shape information rather than just per-pixel heightfield differences could potentially yield a more reliable registration pipeline, and also handle datasets with even smaller overlaps.
We also point out that, other than for example the recent (Corsini et al., 2013), our method views the photogrammetric reconstruction as a rigid object, keeping the network geometry unchanged.This can be seen as positive or negative.On the negative side, one sacrifices the possibility to remedy inaccuracies in the photogrammetric orientation.On the positive side, inaccuracies in either of the two point clouds will cause systematic registration errors and can be detected (e.g.model deformations in photogrammetry, or wave patterns due to vibrations of the laser scanner mechanics).Finally, there are still small scale differences.As pointed out by (Zinßer et al., 2005), scale selection with ICP is not very reliable.It may be possible to further improve the scale by employing a different fine registration method such as (Zinßer et al., 2005, Gruen andAkca, 2005).

CONCLUSION AND OUTLOOK
The fully automated registration of point clouds from different sensors allows for a better coverage of objects.By using different platforms and acquisition times, it is possible to cover an object in more detail both in terms of geometry as well as colour information.The results show that the proposed method manages to obtain approximate registration parameters reliably for four out of five datasets.For cultural heritage, archaelogical, architectural and even certain low-accuracy surveying applications these approximations might already be accurate enough.
Compared to the manual measurements, the automated results yield lower RMSE values for four out of the five datasets, which means that the automated scale estimation is more accurate than calculating it via four manually measured points.However, at least 25% of overlap in both directions is necessary for the procedure to work properly.Future research in heighfield matching and scale estimation in the final registration part would further improve the suggested method both in terms of accuracy and reliability.

Figure 1 :
Figure 1: Differences in detail.Top: Laser scan point cloud, Bottom: Photogrammetric point cloud While the laser scanner has only acquired a small part of the railway tracks, it was able to capture the catenary support as well as the catenary wires completely.The photogrammetric point cloud

Figure 2 :
Figure 2: Differences in point spacing.Left: Laser point cloud, Right: Photogrammetric point cloud.
voxel size is determined by estimating the median nearest-neighbor distance d N N in the point cloud, using a random sample of 5% of the points.Each point cloud is downsampled twice, once to X • d N N and once to Y • d N N , to generate representations at two different downsampling factors.The clouds with a higher downsampling factor (X • d N N ) are then used for coarse registration, while the ones with the smaller downsampling factor (Y • d N N ) enable accurate fine registration at the end.The downsampling factors are automatically calculated for each dataset.

Figure 4 :
Figure 4: Normal directions on an unity sphere with the circular kernel marked in red

Figure 5 :
Figure 5: Top: Globally normalized photo heightfield, Bottom: Locally normalized photo heightfield.The red square indicates the same region in both heightfields

Figure 6 :
Figure 6: HXE -Top: Laser, Bottom: Photogrammetry Niederwald The Niederwald dataset consists of a terrestrial laser scan and a photogrammetric point cloud which again was obtained with an unmanned aerial vehicle (UAV).The laserscan contains a large amount of outliers, most probably caused by a scanner malfunction or a slow subsidence of the laserscanner tripod because of thawing ground.The images have been taken in two stripes, one nadir and one in a 45 • oblique view.

Figure 7 :
Figure 7: Niederwald -Top: Laser, Bottom: Photogrammetry Erechtheion The Erechtheion dataset was obtained with a terrestrial laser scanner and terrestrial photogrammetry.The original laser scan data consists of over five billion points and covers the outside, as well as the inside of the Erechtheion.The acquistion is described in (El-Hakim et al., 2008) and was kindly provided.From this data a five million triangle VRML model was created which was converted into a point cloud.The photogrammetric point cloud, which is denser than the laser point cloud, covers one side of the Erechtheion, which includes the porch of the Caryatids.Lucy The Lucy dataset, kindly provided by the Stanford University Computer Graphics Laboratory, has been recorded using their large statue scanner and represents the laser dataset.The photogrammetric dataset has been simulated by rescaling, downsampling, rotating and translating the statue.The scale factor was set to 0.3 of the original model, which translates to a scale of 1 0.3 = 3.3333 that should be estimated.

Table 4 :
Lucy results