FUSION OF PHOTO WITH AIRBORNE LASER SCANNING

Photogrammetry and Laser-Scanning are usually considered as complementary. Integration of these two observation methods has the potential to blend their individual advantages. The resulting benefit is likely to be higher in drone airborne mapping, which payload capacity (and thus the quality of the embedded IMU) is limited. Thus, the trajectory computed by the IMU is subject to important time-dependent errors: even if the global attitude is less adequate, it is self-coherent locally. For this reason, we propose a close integration of Photogrammetry with Laser-Scanning based on the correction of time-dependent error of the trajectory with the help of the image observations acquired by the camera. Apart from the trajectory, this hybridization requires optical correspondences between image and Laser measurements. Such full set of input data is rigorously fused together in a Bundle-Adjustment in order to better determine the trajectory, and thus the resulting point-cloud. The presented theory was practically evaluated in an airborne case against a reference solution.


INTRODUCTION
Photogrammetry and Laser-Scanning have been successfully used for countless mapping applications such as land usage analyses, cultural heritage site preservation, civil engineering infrastructure inspections, and so on. The exponential development of small drones 1 over the last decade has allowed for the transposition of plane and helicopter airborne photogrammetry into drone airborne photogrammetry, with the advantages of mapping small and inaccessible areas at a reduced cost (Pfeifer et al., 2012), (Rehak et al., 2014). More recently, the miniaturization of LIDAR sensors has allowed them to also be embedded in micro-UAVs (e.g. Velodyne Puck (Velodyne Lidar, 2020) and Riegl Vux (RIEGL -RIEGL VUX-1 UAV, 2020)).
The two mapping methods, Photogrammetry and Laser-Scanning can be seen as complementary (Table 1). This complementarity opens potential for the method of fusion to improve the final mapping product, in aspects as geometric precision, radiometric precision and exhaustiveness 2 . The complementarity could also be advantageous in further analyses, such as segmentation, classification and object recognition. In particular,several decades of experience in image processing ( (Fua, Hanson, 1987), (Fua, 1989), , (Achanta et al., 2012), (Sun, 2015), (Achanta et al., 2018)) has enabled the possibility to automatically recognize and segment geometrics features such as polygons. The precise and rigorous knowledge of the images IO and EO with respect to the points acquired by the LIDAR sensor could open a wide range of new possibilities in segmenting the point-cloud thanks to the images, and thus help the 3D CAD model reconstruction.
Data from several sensors are said to be loosely coupled when substancial pre-processing is performed separately for each sensor before integration. Data are said to be closely coupled when the data are fused together at an earlier stage. In the scope of Photo-LIDAR fusion, (Guidi et al., 2004) proposes a loosely coupled integration as the traditional photogrammetric processing chain is independent from the LIDAR one; the point-cloud generated via photogrammetry and the one from LIDAR are merged only after each is created separately. Photo-LIDAR closely coupled data integration is possible, however, and permits the use of the benefits from one method to help the other, and vice-versa. In particular, LIDAR measurements are taken one after the other, where each measurement represents a single point of an object and is considered independent from the next point, as the sensor moved between the subsequent measurements. Conversely, photos taken by global shutter cameras of photogrammetry give a coherent representation of [a sub-set of] an object at a given time. Direct geometric relationships can then be established between different points visible on the same image. This global coherence of any single image can give coherence both to the reconstructed 3D model and to the trajectory of the platform.

Photogrammetry
Laser scanning (Passive sensor) (Active sensor) + Sufficient for − Direct Sensor orientation indirect orientation dependent + Precise in planimetry − Less precise in altimetry + Precise in altimetry − Needs overlap to create + Direct measurement of 3D: (sensitive to 3D points (less sensitive to object occlusion) object occlusion) − Sensible to illumination + Insensible to illumination − Needs texture + Insensible to object texture − Problematic in + Penetrates tree canopy high vegetation Table 1. Comparison Photogrammetry/LIDAR ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume V-1-2020XXIV ISPRS Congress (2020 Forms of closely coupled integration are proposed in (Mandlburger et al., 2017), (Glira et al., 2019), (Glira et al., 2016), (Glira et al., 2015b) and (Glira et al., 2015a). In addition to these approaches, there is a possibility to adapt Bundle-Adjustment to interfere image observations (raw data of photogrammetry) with LIDAR raw observation. This requires establishing links between raw observations of both acquisition types. The principle and the weakness of links used in the works cited above will be discussed in section 2 together with the proposition of a stronger type of link.
The motivations for closely integration photogrammetry with LIDAR are as follows. a) IMU adapted for micro-UAV are small & low-cost, and generally not sufficiently precise for direct orientation of the platform in the scope of laser scanning. b) However, the trajectory is locally coherent i.e. the relative orientation of temporally close values is precise. c) Thus, the local shape of the trajectory described under point b) could be used under two well-oriented photos.
The paper contributions are organised as follow. First, different forms of optical correspondences are reviewed in terms of proposed sensor hybridization, where the goal is to improve the global redundancy and accuracy. Second, a rigorous interpolation is presented between subsequent photos and inertialderived relative orientation. This will allow, along with model description, the rigorous merge of the whole set of observations into the Bundle Adjustment: (Section 3). This method will then be tested on low quality IMU data and compared to high-quality results: (Section 4).

Tie-points: Link image-image
The prerequisite to Photogrammetry is the ability to recognize similar points (or other types of geometric features) between images representing the same real object. The coordinates of these points on the images are the image observations. The matching between a point observation on one photo and a corresponding point observation on another photo makes this point a tie-point (first column of table 2). This link between the observations is needed to produce the 3D position of the point itself, EO and IO 3 .

Coplanarity: Link LIDAR plane-LIDAR plane
In the scope of airborne LIDAR, the first link between observation sets acquired at different times is the plane to plane link (Skaloud, Lichti, 2006) and (Hebel, Stilla, 2011). There, the goal was to link planes representing the same building roof from different flight-lines to calibrate the boresight-matrix and possibly other parameter in LIDAR and IMU sensors (column 2 of table 2). The weakness of constraining two planes to be coplanar is that one could slip on the other, hence, planes of different slope and aspect needs to be present.

Point to Patch: Link LIDAR point-LIDAR pointcloud
The plane to plane method described in sec. 2.2 has been generalized in (Kersting et al., 2012) to generic surfaces. The established correspondence aims to constraint a point acquired by the laser scanner during one flight line (blue point of column 3 of Table 2) to a surface defined by several points acquired during another flight-line (red point-cloud of column 3 of Table  2).

Homologous point: Link LIDAR point-LIDAR point
Two LIDAR points taken at different moments could also be matched (represented by a double arrow in the third column of table 2). The naïve method of selecting them manually in a point cloud is not rigorous since the object surfaces sampling is somewhat random due to the nature of acquisition.
To input a match between two LIDAR points that are geometrically close but temporally spaced, the detected difference vector must be considered relative to LIDAR EO at the time of measurement of one of the two points.
The following methods (1-4) permit the matching of LIDAR points.
1. (Jayendra-Lakshman, Devarajan, 2013) suggests to first rasterize the pre-processed LIDAR point-cloud, then apply raster-based point detection and matching algorithm such as SIFT.
2. Target could be placed by the operator on the ground, and be detected within point-cloud.
3. Cloud-to-cloud registration algorithms such as ICP can be applied on two small point-cloud samples representing the same object, in order to find a correspondence. Adapted versions of the ICP algorithm have been used in (Glira et al., 2015b) and (Glira et al., 2015a) for data adjustment.
4. (Gojcic et al., 2019) proposes performing the cloud sample registration with a neural network.

Tie-Point to LIDAR point: Link Photo-LIDAR
Close photogrammetric-laser scanning integration needs links between photo and LIDAR data. This link could be established between a point acquired by the LIDAR and the corresponding 2D point in an image. The methods previously described for homologous point registration can be adapted, in particular for the use of rasterized point-cloud (Kumar Mishra, 2012).
Another method to match a 3D point with its corresponding 2D point on a photo is by using building roof vertex corners. This shape is characterized by the exact intersection of three planes (usually oblique). In the pre-processed point-cloud, the three planes could be fit with robust algorithms and then intersected.
In the image, the 2D point is the over-intersection of three lines.
2.6 Tie-Point to LIDAR point cloud: Link Photo-LIDAR (Glira et al., 2019) proposes to match tie-points (or points from photos computed by dense matching) to the LIDAR pointcloud. The proposed link aims to constrain the tie-point to be on the surface described by the LIDAR point-cloud (for clarity in the following explanation, we will consider this surface to IMU LIDAR Camera Figure 1. Description of the complete trajectory based on camera pose parameters only. No photos are taken at time t. Γt is not a parameter but could be computed from Γi and Γj via interpolation of double differences be close to horizontal). This constraint acts perpendicularly to this LIDAR point cloud surface, i.e. vertically (represented by a double arrow in the last column of Table 2). The tie-point could thus slip horizontally on this surface.
A tie-point in airborne photogrammetry is generally more precise in planimetry than in altimetry. This implies that altimetry could easily be modified by other information, such as the constraint of lying on a (close to) horizontal surface. The corollary of this property indicates that this constraint does not bring much information. The image-point to LIDAR-point link 2.5 is thus preferable for photo-LIDAR fusion because it contains more information.

METHODS AND MODELS
The links described in section 2 will be used as additional constraints in a Bundle-Adjustment in order to determine the most probable values of the so-called parameters, describing the mapping process. The choice of the parameters is an extremely important part of Bundle-Adjustment design (3.1) together with observation models 3.2, 3.3, 3.4 and 3.5 is the relation between the measurements (or observations) to the parameters.

Parametrizing variables
The first parameters to be considered are the 3D points on the ground, as well as other terrain-object modelling parameters. For example, the primitive geometrical feature of point P on Figure 1).
The trajectory could be described by the position and the orientation of the IMU at each IMU measurement. However, this will lead to a tremendously high number of parameters to determine, and thus to a high complexity. We propose to parametrize the trajectory only by using the position and the orientation of the camera while a photo is taken.
The position T (camera perspective center in local frame) and the orientation R (from camera to local frame) of the camera are aggregated into the matrix Γ ∈ SE3 using homogeneous formalism (Equation 1).
The entire trajectory has been pre-processed using the INS/GNSS integration (top of Figure 1). This pre-processed trajectoryΓ is subject to time dependent errors that must be corrected in the Bundle-Adjustment. This pre-processed trajectory will act as a measurement between two successive poses by virtue of relative orientations (Sec. 3.3), and allows the determination of the position and the orientation Γt of the camera at any time t between two poses. This Γt is related to the position and the orientation of the LIDAR sensor by the lever-arm and boresight matrix encapsulated in the SE3 matrix Γ bs .

Camera model: collinearity equation
The camera model is based on a corrected pinehole camera model: a tie-point P projects 4 to the image observation tp on a photo. Γ describes the position and orientation of the camera at the time of the photo.

Aerial control
The position measurement acquired by the embedded GNSS antenna (or INS/GNSS integration point) must be translated to the camera with the lever-arm − → a (from GNSS antenna phase center or IMU-centre respectively to camera perspective center in camera frame).
The IMU position and orientationΓ computed by the preprocessed IMU trajectory can also be input as measurements by virtue of relative measurements (Rehak, Skaloud, 2016).Γi andΓj correspond to IMU pre-processed trajectory for two consecutive poses i and j , whose camera pose parameters are Γi and Γj.
Note that this method of relative orientation permits the removal of the boresight matrix between the camera and the IMU.
The relative observations are weighted accordingly to their precision computed with (Rehak, Skaloud, 2016). 4 The collinearity equation 4 needs a so-called projection matrixΠ, whose aim is to remove the last unitary coordinate used by homogeneous coordinates formalism.Π The projection function π : R 3 → R 2 models the pinhole camera model while the function ξ : R 2 → R 2 models the camera interior orientation: effect of the principal distance, principal point, skewing parameters, radial and tangential distortions.

LIDAR model
While images are acquired at discrete moments (low frequency < 1 Hz), LIDAR measurements are acquired continuously (high frequency > 20kHz). The measurement LIDAR acquired at time t must be associated with the pose determination Γt. The pre-computed IMU position/orientation determination from IMU measurementsΓt must be corrected with the knowledge of the camera pose preceding (of index i) and succeeding (of index j) the LIDAR observation. The ratio τ ∈ [0, 1] quantifies the time difference between the lidar measurement event and the image i. It is null when the lidar measurement time coincides with photo i and one when the lidar measurement time coincides with photo j.
This ratio τ permits the interpolation 5 of the double-differences of relative orientations computed from the camera EO and IMU measurements.
5 The interpolation of SE 3 matrices are performed in the tangent-space of SE 3 denoted se 3 . The • operator transforms a R 6 vector into an element of the tangent-space se 3 .
The lidar measurement LIDAR (position of the point in the LIDAR sensor frame) could be expressed in the frame of the camera due to the boresight/lever-arm matrix Γ bs from the camera to the LIDAR sensor. This Γ bs could be known from previous calibration, or re-determined in the Bundle-Adjustment.
This approach is inspired by (Glira et al., 2016) and (Glira et al., 2015b), but is more rigorous as the 6 components of the trajectory (3 position and 3 orientation) are considered together (using lie-group formalism and theory).

GCPs
The GCP observation model is trivial as it relates directly the parameters P of a given point on the ground to a direct measurement of this same point with terrestrial independent methods (terrestrial GNSS, tacheometry, etc.). GCP = P (12)

Bundle-Adjustment
The real measurements related to the observation models presented in 3.2, 3.3, 3.4 and 3.5 are subjects to inherent errors due to the sensor and measurement processes.
The observations are going to be confronted with each other in an optimization process called Bundle-Adjustment in order to determine the most probable value of the parameters. In the scope of this work, the least-square criterion has been used, but other criterions (more robust ones for example) could be used.

RESULTS
A typical application for the proposed method of LIDAR/Photo data fusion is airborne mapping. The following test on orientation improvement with respect to low-cost INS-GNSS attitude determination has been set up in order to study the benefit of photo-LIDAR links. We propose the following practical benchmark. We embark high performance aerial control and navigation sensors on the same platform as low-cost IMU (UAV-LIDAR) sensors and fly them on a helicopter at altitude and speed mimicking UAV flight. Then, we study the gain on orientation with respect to the reference. We hypothesize that there is more room for progress for LIDAR sensor miniaturization than for IMU. This justifies the use of the same LIDAR sensor for experimenting and ground-truthing, while two different IMU will be used.
The chosen test site is situated in a hilly area at the corner of five towns: Bremblens, Romanel-sur-Morges, Aclens, Vufflens-la-Ville and Bussigny (VD, Switzerland). It presents a variety of terrain types: bare ground, crops, forest, and industrial buildings, and it is also crossed by roads, train railways and a hightension power line.
The N-S extension of the mapping area is 1.6 km while the E-W extension is 1.7 km (Figure 2).
The sensors were embedded on a helicopter as described in . The flight velocity was about 13 m/s (min: 8 m/s, max: 18 m/s) which is compatible with the velocity of a quadcopter or a fix wing drone. The flight height was between 250 m and 300 m above ground. The following subset sensors was used in this study.
• Laserscanner: Riegl VQ480U, using a rotating polygon mirror technology. The datasheet characteristics are 25 mm of accuracy with a Laser Beam Footprint of 9 mm at 300 m (Riegl, 2015).
• IMU: a low-cost MEMS-IMU (NavChip V1/2011, Thales) mounted on a gecko board (Kluter, 2012). The datasheet characteristics for the drift in run bias stability is 15 • /hr while switch-on bias variation is not specified. The noise is estimated to be 0.3 • / √ hr. Ground Truth point-cloud (in blue on Figure 3) has been computed from LIDAR data using LIEO (Skaloud, 2017) and the trajectory given by the navigation grade IMU (AIRINS) together with GNSS measurements. The same method has been used to compute the point-cloud that would have been produced with the consumer grade IMU (in pink on Figure 3).
This generated point-cloud originated directly from the INS/GNSS trajectory, is subject to time-dependence errors as analysed in  and (Vallet et al., 2020). Figure 3 represents the point-cloud acquired on an overlap zone: the same object has been measured in two different flightlines (thus, at two different moments in time). The ground truth point-cloud is coherent (blue), while the point-cloud generated with the consumer Grade IMU shows that the same object duplicated (pink).  The reference point-cloud data permits the simulation of 27 links between LIDAR measurements and image observations (corresponding on the terrain to the blue dots on Figure 2). In a practical application, these links would have been determined based on the methods described in 2.5. Observations from the camera, the LIDAR, GNSS and consumer grade IMU have been processed together in a Bundle-Adjustment renforced with these links as additional observations and 11 GCPs. The resulting point-cloud (in yellow on Figure 3) is self-coherent, unlike the one generated without fusion of photos (in pink on Figure  3). It has been compared to the Ground Truth. Figure 4 displays the absolute errors in point-cloud due to orientation errors of low-cost IMU (1 uncalibrated unit) after INS/GNSS fusion. The adjusted point-cloud (after fusion with photo) is displayed in the lower portion of the same figure. It can be seen that on the overlapped areas, the orientation errors were mitigated, so the vertical (and also the planimetric) errors are reduced below 15 cm. The errors in resulting DTM are likely to be even lower (< 10 cm) due to the effect of averaging.
The benefit of the fusion with photogrammetry is quantified by comparing the distance between the Ground Truth and the produced point-cloud, both without (first row of Figure 4) and with (second row of Figure 4) photogrammetric fusion. The improvement ratio of the fusion can be defined as the ratio between the true-error of the results of the fusion and the true-error of the results without fusion (where the true-error is computed as the distance from Ground Truth). This ratio is equal to 1 if the fusion brings no improvement, is below 1 if the fusion worsens the results, and is above 1 if the fusion mitigates the orientation errors. The improvement ratio varies with the location on the terrain. For example, on the area shown on Figure 3, the improvement ratio is approximatively 5. Figure 5 is the histogram of the ratio for each LIDAR points of the survey. The quantile at 0.5 % of the ratios is 1/2 and the quantile at 99.5 % is 7. Thus, 99% of the ratios belong to the interval [ 1/2 − 7 ]. The ratio is inferior to 1 only for less than 7 % of the points. Indeed, these points for which data-fusion worsen the results belongs to areas which are far from any Photo-LIDAR link. The geometric mean of all ratios is 2. This shows that globally, the fusion between photogrammetry and LIDAR greatly improves the precision of the final point-cloud. Further analysis has been proceeded with the trajectory computed from a SIMU (Synthetic IMU) composed from 4 calibrated IMU as described and analyzed in  and (Vallet et al., 2020). The improvement ration of the fusion of SIMU with LIDAR and photo is inferior (1.75) to the improvement ration of a single IMU with LIDAR and photo. The benefit of photo-LIDAR fusion decreases as the quality of the IMU increases.

CONCLUSION
We have proposed a new method for GNSS, IMU, LIDAR and photo data fusion utilising strong correspondences between photo and LIDAR as additional observations in a Bundle-Adjustment. The additional information provided by photogrammetry permits the improvement of the absolute trajectory determination over the period of time between photos during which relative IMU observation are self-coherent, and thus builds a point-cloud with smaller geometrical deformations. Experiments are performed on real data, but with simulated correspondences between photo and LIDAR observation (links), showed that even a small number of these links can globally improve the absolute 3D point-cloud precision by a factor of 2. With the long awaited miniaturization of LIDAR sensors, the proposed method could have an important impact on drone LIDAR mapping by reducing IMU volume and weight, improving the geolocalization and thus increasing the area that can be mapped.

FUTURE OUTLOOK
This study raises the need for accurate automated links between photo and LIDAR data. It focuses on points but the presented fusion method could be extended with different links between Photo and LIDAR based on other geometrical such as lines. Indeed, a line could be detected either in a pre-processed pointcloud (via plane intersection) or with images (as in Figure 6 from (Cledat, 2019)). (Kumar Mishra, 2012), (Chen, Shibasaki, 1998) and (Taillandier, Deriche, 2012) give the theory for linebased photogrammetry while (Pujol-Miro et al., 2017) proposes a method of image registration with respect to point-cloud of a particular type of line: Contour Cues. A preliminary study of line-based Photo-LIDAR fusion has been presented in (Cledat, 2019).
The presented approach is based on a pre-processing of IMU data to generate a trajectory. It aims to correct the time dependent trajectory errors up to a certain extent. The integration of IMU data could be more closely achieved from Photos and LIDAR data with the help of Dynamic Network (Cucci et al., 2017).
Finally, this method could be applied using different camera mounts (for example, an oblique camera or an horizontal camera could help determining the azimuth of the system), different types of camera (fish-eye camera) on different platforms (terrestrial mobile mapping handle by robots or humans).

ACKNOWLEDGEMENT
The data acquisition and its initial processing were achieved by Dr. Julien Vallet: the director of Helimap System. We would like to warmly thank him for his tremendous contribution for the inputs needed for the experimental evaluation of the method. Most of all, we would like to thank Annie Guillaume for her nice and precious help in proofreading the language.