ACCURATE REGISTRATION OF AERIAL IMAGES AND ALS-POINTCLOUD VIA AUTOMATED JUNCTION MATCHING AND PLANAR CONSTRAINTS

Accurate geometric registration of images and pointclouds is the key step of many 3D-reconstruction or 3D-sensing tasks. In this paper, a novel L-junction based approach is proposed for semi-automatic accurate registration of aerial images and the airborne laser scanning (ALS) point-cloud in urban areas. The approach achieves accurate registration by associating the LiDAR points with the local planes extracted via L-junction detection and matching from multi-view aerial images. An L-junction is an intersection of two line-segments. Through the forward intersection of multi-view corresponding L-junctions, an accurate local junction-plane can be obtained. In the proposed approach, L-junction is manually collected from one view on the flat object-surfaces like walls, roads, and roofs and then automatically matched to other views with the aid of epipolar-geometry and vanishing-point constraints. Then, a plane-constrained bundle block adjustment of the image-orientation parameters is conducted, where the LiDAR points are treated as reference data. The proposed approach was tested with two datasets collected in Guangzhou city and Ningbo city of China. The experimental results showed that the proposed approach had better accuracy than the closest-point based method. The horizontal/vertical registration RMS of the proposed approach reached 4.21cm/5.72cm in Guangzhou dataset and 4.46cm/4.34cm in Ningbo dataset, which was much less than the average LiDAR-point distance (over 25cm in both datasets) and was very close to the image GSDs (3.2cm in Guangzhou and 4.8cm in Ningbo) and the a-priori ranging accuracy of the ALS equipment (about 3cm). * Corresponding author


INTRODUCTION
The recent two decades have witnesses the rapid development of the digital cameras and the Light Detection and Ranging (LiDAR) sensors. With the acquisition expense gets lower, the integration of both data sources (images and LiDAR pointclouds) becomes popular in dealing with 3D-sensing tasks like topographic mapping, auto-pilot, indoor Simultaneous Localization and Mapping (SLAM), 3D reconstruction, 3D object detection (Huang, et al., 2019), etc. For whatever tasks, the accurate geometric registration of the two data sources should be ensured at the beginning (Mishra and Zhang, 2012;Zhang and Lin, 2016). The geometric registration of both data sources is mainly implemented through two types of methods, i.e., the 2D multi-modal image-matching methods and the 3Dfeature-registration methods.
In the methods based on multi-modal image-matching, the 3D information in LiDAR data should be projected onto the 2D image-space at the beginning. In (Wong and Orchard, 2008), the intensity of the LiDAR reflections was projected onto the image-space to form the intensity map, and then the registration was achieved by extracting point-matches between the intensity map and the aerial images. In (Shorter and Kasparis, 2008), the edges of the buildings were extracted from the LiDAR pointclouds and projected onto the image-space to be matched with the aerial images through phase correlation. In (Barsai, et al., 2017), the edges of the buildings were extracted from both the LiDAR point-clouds and the aerial images, then the registration was achieved by minimizing the overall distances of the two sets of 2D-edges without establishing explicit correspondences. The above-mentioned methods transform the 3D alignment problem into a 2D multi-modal image-matching problem or a 2D structure-registration problem, and as a result, the alignment accuracy is strongly related to the average LiDAR-point distance and the ground sampling distance (GSD) of aerial images.
In the methods based on 3D feature registration, structures like points, line-segments, or planes are extracted from both the images and the LiDAR point-clouds. The structures extracted from the images need to be back-projected to the 3D-space through the forward-intersection process before the 3Dregistration is processed. For the aerial images and the airborne laser scanning (ALS) data acquired over urban areas, the widely existing corners, lines, and planes should be used to get better registration accuracy. Because the airborne laser scanning (ALS) data usually has better ranging accuracy (2cm-5cm) than its ground sample distance (GSD,. Extracting structures from LiDAR pointcloud usually begins with extracting planes. Two planes are needed to get a line-segment and three are needed to get a corner point. Line-segments and corner points can be easily extracted from the images, while planes cannot. Although a plane can be determined by three or more tie-points extracted from multi-view images, it is not easy to judge whether this plane is on an existed flat object surface. As a result, many researchers achieved accurate 3D registration through corners and line-segments. In (Gruen and Akca, 2005;Akca, 2007), the point-cloud extracted through dense-match of the aerial images were used for the registration with the LiDAR point-cloud. In (Zheng, et al., 2013;Huang, et al., 2018), the 3D points extracted from the images were constrained with the planes extracted from the LiDAR data through normal-shooting strategy. In (Costa, et al., 2017;Costa and Mitishita, 2019;Peng and Zhang, 2019), the line-segments or the corner points of the buildings, extracted from the LiDAR data through intersecting the point-cloud planes, were registered with those extracted from the images. In (Armenakis, et al., 2013), the author discussed the potential of directly using the planes extracted from both the images and the LiDAR data as the constraints for 3D registration. The above-mentioned 3D-registration methods utilize different geometric structures to deal with different tasks, scales, and targets.
In this paper, a new plane-based approach is proposed for the geometric registration of the aerial images and the LiDAR point-clouds in urban areas. The new approach extracts planes from the aerial images through obtaining the L-junctions from multi-view aerial images and then achieves accurate 3D registration through associating the junction-planes with the nearby LiDAR points. The novelty of the approach is in two aspects: 1. A new method of automatically matching the L-junction of the line-segments is proposed, with the aid of epipolargeometry and the vanishing-point constraints. 2. A new plane-constrained bundle block adjustment model of the aerial images is designed to achieve the accurate registration of the aerial images and the ALS pointcloud.
The workflow of the proposed approach is illustrated in Figure  1. The process has three main steps. First, the initial orientation models of the aerial images are prepared through tie-point matching and POS-aided bundle adjustment. Second, the Ljunctions are manually collected from the aerial images and then automatically matched to other views. The collected Ljunctions should correspond to real-existed planar objectsurfaces. Thus, the 3D planes can be extracted through the forward intersection process of the multi-view L-junctions, and by associate the planes with the nearby LiDAR points, the planar constraints are established. Finally, a plane-constrained bundle block adjustment is operated to the image orientation parameters to achieve accurate 3D registration between the two data sources.
The reminder of this paper is as follows. In Section 2, the method and the related theory of automatic L-junction matching and is introduced. In Section 3, the way of linking the LiDAR-points and the junction-planes is described and the planar constrained block adjustment model is introduced in detail. In Section 4, two datasets are used to demonstrate the proposed approach. In Section 5, conclusions are drawn.

EXTRACT PLANES FROM AERIAL IMAGES
As in Figure 1, three steps are needed for extracting planes from aerial images, i.e., collect the L-junctions manually, match the L-junctions automatically, and obtain the 3D junction-plane via forward intersection of the multi-view L-junctions.

Detect L-Junctions
Although there are many well-designed methods for automatically detecting the corners, line-segments, or junctionlike structures, this approach chooses to get the first imagecorrespondence of an L-junction manually and then find its correspondences from other views automatically. Because currently, no automatic detection methods can ensure its results satisfying the following rules.

Rules of L-junction Collection:
1. The plane yielded by the two branches of an L-junction must correspond to a real existed flat object-surface. 2. Each of the branch of an L-junction should be either horizontal or vertical.
The first rule ensures that the collected L-junctions and their correspondences can be used to extract real existed planes for 3D-registration (see Figure 2), while the second rules ensures that the L-junctions are easy to match under the vanishing-point constraints. The images-space L-junctions is a two-branch junction, which is denoted as: where c is the junction-centre, p and q are two branches. The branch of a L-junction is a line-segment from the junctioncentre to its terminal point (denoted as p and q).

Match the L-junctions
It is quite easy for a person to find an L-junction from an aerial image that satisfies the Rules of L-junction Collection in Section 2.1. However, it is quite difficult to find correspondences for an L-junction from unordered aerial images. Even the images have been bundle-adjusted, a well-designed graphical user-interface is needed to manually match them and the job is still very time-costing because one L-junction may have tens of image-correspondences in a set of oblique aerial images. Thus, an automatic method is proposed.
In this approach, the purpose of matching the L-junctions is to extract the 3D-planes yielded by the junction-branches. Thus, will be treated as correspondences when: 1. The junction-centre c and  c are corresponding points; 2. The junction-branch p and  p are on two corresponding straight-lines; 3. The junction-branch q and  q are on two corresponding straight-lines; As is described above, the terminate points of the corresponding branches do not need to be corresponding points. During the automatic matching process, the correspondence (denoted as J  ) of the junction J can be searched under the following constraints (see Figure 3).

Constraints for L-junction Matching:
1. The junction-centre c and  c should follow the principle of epipolar-geometry. 2. The straight-lines on which the corresponding junctionbranches lies should pass the corresponding vanishing points.
As a result, the epipolar-geometry and the constraints of two vanishing points can be used to aid the matching of L-junctions. Since the orientation model of the aerial images can be treated as an apriori in the proposed approach, the epipolar-geometric relationship of any image-pairs can be easily deduced. Denote the fundamental matrix between image I and  I as F, the epipolar-geometric-based constraint is: where dist(.) means the distance between the point  c and the epipolar-line Fc ; 1  is the threshold of the distance.
In order to match the L-junctions with the branch-direction constrained, we need to firstly locate the vanishing points for the branches of the artificially collected L-junctions, and then find the correspondences of the vanishing points from the other views. Vanishing point is the projection of a 3D infinite point. According to the Rules of Artificial L-junction Collection in Section 2.1, the branch of an L-junction is either horizontal or vertical. Thus, the vanishing point of a branch should be either the image-space nadir point or the intersection of the branch (the straight line that contains the branch) and the vanishing line (see Figure 4). Denote    (2) and (5). If more than one L-junctions satisfy the constraints, manually check is needed to find the correct one.

Forward Intersection and Plane Extraction
An object-space 3D L-junction is composed by a 3D point and two associating directions. Thus, at least seven parameters are needed to express an 3D L-junction, which is denoted as:

 
, , , , , , where X, Y, and Z are the coordinates of the centre point;  p and  p express the direction of the branch P ;  q and  q express the direction of the branch Q . The forward intersection of L-junctions has two steps, i.e., obtaining the initial value 0 and then optimize it through least-square-based multi-view forward intersection. The initial coordinate of the junctioncentre can be obtained by direct forward intersection with any two of the image-space junction-centres, while the initial branch-direction can be obtained from the 3D infinite point ( ) p and ( ) q used in L-junction matching.
The projection of the 3D L-junction is in the following steps. First, project the junction-centre (X, Y, Z) onto the image-space. Then, project an arbitrary 3D point on each of the 3D-branch onto the image-space and thus we can get the 2D branches.
Denote the projected junction-centre on the j-th image as ( ) The problem of optimizing the 3D L-junction is denoted as: . The 3D alignment between the aerial images and the LiDAR points will be operated by forcing the LiDAR points near ( ) P to be on it.

PLANE CONSTRAINED REGISTRATION
There are three steps to operate the plane-constrained 3D registration. First, the LiDAR points near the 3D junction-plane are searched to form the planar constraints. Second, a bundle block adjustment model with additional point-to-plane cost functions is formed and solved to refine the orientation model of the aerial images, so as to achieve the accurate registration. Finally, the registration accuracy is assessed by the independent check junctions (ICJs).

Point-to-Plane Cost Functions
The parameters of the 3D junction do not involve the length of the branches because they will not be used in the forward intersection process and the following block adjustment. However, estimating the branch-length can help us find the related LiDAR points. With the junction and two branchlengths ( ) L p and ( ) L q , the LiDAR points within the 3D cube formed by the junction-plane and a thickness threshold H  are related with the junction to build the point-to-plane constraint (see Figure 6). Figure 6. The points (green) within the buffer of the junctionplane will be constrained to the L-junction.
The point-to-plane constraint can be denoted as the following cost function with the t-th LiDAR point t and the i-th 3D L- is the point-to-plane distance; the weight value , it W will be fixed as the one at the first iteration of the block adjustment, and in the latter iterations will be modified according to the point-to-plane distance calculated with the updated image orientation model: is the junction-plane of i calculated by the orientation parameters updated in the last iteration.

Plane-constrained Bundle Block Adjustment
Combining the cost function of L-junction projection, tie-point projection, and the point-to-plane constraints, a mixed bundle block adjustment model is denoted as: where ε denotes the cost functions; x denotes the variables to be optimized; j t denotes the exterior orientation parameters of the j-th image; i is the coordinate of the i-th tie-point; i denotes the parameters of the i-th L-junction; denotes the interior orientation parameters and the distortion parameters of the aerial camera; T i ε denotes the projection-cost of the i-th Ljunction as in (7); Compared with the conventional bundle adjustment, the model in (10) does not involve the cost functions for ground control points and POS data, but involves the cost function for Ljunction projection and for plane constraints.
Since the aerial images have been pre-bundle-adjusted before their 3D registration with the LiDAR points, the pre-adjustment results (tie-point locations and image orientation parameters) can be used as initials. The L-junctions can be treated as control points that however can only control one direction (the normal direction of the junction-plane) of the 3D space. Thus, the block adjustment problem (10) can be solved by iterative linearization and variable update as we do for conventional bundle adjustment. During the iteration, the weight values of the (junction-and tie-point-) projection costs are fixed, while the weight values of the point-plane-constraining costs are updated according to (9). With the L-junctions carefully selected according to the Rules of Artificial L-junction Collection, the solving process can converge with several iterations.

Plane-based Accuracy Assessment
The L-junctions not used as control junctions (CJs) can be used as ICJs. In the accuracy assessment, the RMS of the distances between a (horizontal or vertical) junction-plane and the associated LiDAR points can be used to evaluate the (horizontal or vertical) registration error near this L-junction.

EXPERIMENTS AND ANALYSIS
In this section, the proposed approach was tested with two datasets acquired over Guangzhou and Ningbo of China (see Figure 7). The detailed information of the aerial images and the LiDAR data is given in Table 1. Both the LiDAR point-clouds have been geometrically rectified before the test and had better than 10cm RMSE. The accuracy of the Ningbo LiDAR data has been evaluated with the standard 1:500 topographic maps. The results showed that the horizontal RMSE was about 6.9cm and the vertical RMS was about 7.9cm. The ranging error of the LiDAR sensor was about 3cm. The images and the LiDAR data were acquired at different time.

Location Guangzhou Ningbo
Aerial Images

L-Junction Collecting and Matching
When matching the junctions on the aerial images of the two datasets, 1  (the threshold for junction-centre, see condition (2)) were set 3 pixels and 2  (the threshold for junction-branch, see condition (5)) were set 5 degrees. In the Guangzhou dataset, 42 L-junctions were manually collected and automatically matched. The screenshots of six L-junctions collected and matched in Guangzhou dataset are shown in Figure 8. The matching accuracies of the six illustrated junctions (Figure 8) are given in Table 2. In Ningbo dataset, 28 L-junctions were manually collected and automatically matched.  Table 2. The matching accuracy of the junctions in Figure 8.
As is shown in Table 2, five out of six L-junctions had subpixel reprojection errors. All the L-junctions used in the two datasets had less-than-1.5-pixel RMS, among which 35 in Guangzhou, and 22 in Ningbo had subpixel RMS. Furthermore, the Ljunction can even be accurately matched on the views where the junction-planes were invisible (see the images in row-1/column-5, row-2/column-2, row3/column-2, row-4/column-3, and row-4/column-5 in Figure 8). This is because the proposed matching strategy did not utilize the texture information of the images.

3D Registration Accuracy
When searching the LiDAR points associated with an Ljunction, the threshold H  was set 3m. During the bundle block adjustment of the proposed approach, the threshold p  was 0.15m. The method based on the closest-point principle and the collinearity equations (Huang, et al., 2018) was used for comparison. The tie-points used in the pre-process (POS-aided bundle adjustment) were also used in the 3D registration, so we did not need to obtain them again.  Table 3. The horizontal registration accuracy was evaluated with the VICJs (because they had vertical junction-planes with horizontal normal vectors) and similarly, the vertical registration accuracy was evaluated with the HICJs.
The registration accuracy is given in Table 4. As is shown, the L-junction based approach had better accuracy than Huang's method by the registration RMS of both the HICJs and the VICJs in both Guangzhou and Ningbo dataset. In Guangzhou dataset, Huang's method had lower maximum vertical accuracy (by HICJs) but higher maximum horizontal accuracy (by VICJs). In Ningbo dataset, the L-junction based approach achieved much better horizontal registration accuracy (by VICJs). The reason is that most of the LiDAR points in Ningbo dataset were on horizontal planes, thus the horizontal constraint was too weak for Huang's method. The L-junction based approach, on the other hand, can balance the constraints by controlling the numbers of HCJs and VCJs.

Discussion
With either Guangzhou dataset or Ningbo dataset, the whole process of the proposed approach took about half-an-hour manual works in searching the appropriate L-junctions for match, while other tasks (junction matching, LiDAR point searching, plane-constrained block adjustment, accuracy assessment) were fully automatic.
The junction matching accuracy was very high, the reprojection RMS of the junction-centre and the junction-branches can reach subpixel level for most L-junctions, which is very hard even for experienced workers. The experiment results of Guangzhou and

Artificially Collected Automatically Matched
Figure 8. The screen-shots of the artificially collected L-junctions (the first column) and the automatically matched L-junctions.
Ningbo datasets demonstrated that our L-junction based approach can reach very high accuracy in aligning the aerial images and LiDAR pointclouds.

CONCLUSION
In this paper, a novel L-junction based approach for semiautomatic accurate registration of the aerial images and the LiDAR pointclouds is proposed. The approach uses multi-view corresponding L-junctions to extract local planes and thus, form the planar constraint. The method of L-junction matching, using epipolar-geometry and vanishing-point constraints, achieves very high matching accuracy. The registration model is formed by adding the junction-projection costs and the plane-constraint costs to the conventional bundle block adjustment model. The registration accuracy in both horizontal and vertical directions is very high, being very close to the ranging accuracy of the LiDAR sensor and the image GSD. The proposed approach is demonstrated to be able to reach very high registration accuracy with very sparse plane-constraints (30 L-junctions for the 2415 images in Guangzhou and 17 L-junctions for the 1541 images in Ningbo), which means that users do not need to collect too many L-junctions manually.
The junction-based planar constraint is very suitable for aligning the very-high-resolution images to the sparse but accurate LiDAR data in urban areas, which is mostly the case of low-attitude (UAV/oblique) aerial photography and airborne LiDAR data. Because it is easy to find appropriate L-junctions in the urban area, and the junction-plane can be accurately extracted from the high-resolution images. For the rural areas, this approach is less suitable because selecting appropriate Ljunctions will be very difficult. When aligning the aerial images with the terrestrial laser scanning data, the full-automatic iterative-closest-point (ICP) based method is a better choice because the point density will be very high and will not influence the accuracy.
The proposed approach treats the LiDAR points as reference data and adjust the image orientation parameters. However, with the junction-based planar constraints, one can also adjust the LiDAR orientation parameters when the aerial images are accurate enough to be used as reference data or adjust the orientation parameters for both the data sources concurrently when accurate ground control points are available. Besides, there are also many works to do to improve the proposed approach. Future works include: 1. Testing the different adjustment model with the junctionbased planar constraints 2. Theoretically analysing the relationship between the junction-matching thresholds ( 1  and 2  ) and the orientation accuracy of the aerial images; 3. Robust plane detection method that can eliminate the LiDAR points which do not belong to the junction-plane. 4. Automatically collecting the appropriate L-junctions lying on the real-existed flat object surfaces.