MOUNTING CALIBRATION OF A MULTI-VIEW CAMERA SYSTEM ON A UAV PLATFORM

Multi-view camera systems are used more and more frequently for applications in close-range photogrammetry, engineering geodesy and autonomous navigation, since they can cover a large portion of the environment and are considerably cheaper than alternative sensors such as laser scanners. In many cases, the cameras do not have overlapping fields of view. In this paper, we report on the development of such a system mounted on a rigid aluminium platform, and focus on its geometric system calibration. We present an approach for estimating the exterior orientation of such a multi-camera system based on bundle adjustment. We use a static environment with ground control points, which are related to the platform via a laser tracker. In the experimental part, the precision and, in part, the accuracy that can be achieved in different scenarios are investigated. While we show that the accuracy potential of the platform is very high, the mounting calibration parameters are not necessarily precise enough to be used as constant values after calibration. However, this disadvantage can be mitigated by using those parameters as observations and refining them on-the-job.


Motivation and Goal
In many fields, such as autonomous driving or crash tests, a large field of view is necessary for cameras observing the environment, which can be achieved by using a multi-view camera system. Often, it is not possible to guarantee overlapping views due to cost, energy or physical capacity reasons. In order to be able to capture accurate and reliable measurements using a multi-sensor system (MSS), a calibration of the whole system is needed as a first step. Assuming a rigid platform, this exterior sensor calibration is achieved by geometrically referencing and synchronizing the sensors with respect to each other. In other words, the 6 degrees of freedom (6DoF) transformation, consisting of three translations and three rotations, and information about time synchronization for all sensors with respect to each other must be determined.
For the calibration, we define a so-called platform coordinate system (CS) and find the 6DoF of the sensors with respect to this CS. A least squares estimation of the unknowns is performed based on a Gauß-Markov model (GMM). The adjustment model for various scenarios is described in detail and is tested in a set of experiments. Specifically, we study the precision of the results in terms of the variance-covariance matrix of the unknowns when varying the number of cameras and the number of different positions of the platform in the lab (we call those positions "stations" in this paper), and we study the effect of considering the platform CS in the adjustment.

Multi-Sensor System
Our MSS comprises two cameras, a laser scanner and a GPS/IMU positioning system; it is installed on a rigid aluminium platform to be mounted on a hexacopter UAV (unmanned aerial vehicle) for surveying applications. It is advantageous to carry out the calibration with the platform attached to the gimbal of the UAV, because installing and un-installing the platform is complicated. In this paper, we are only concerned with the geometric calibration of the cameras. However, a similar procedure can be applied to laser scanners as well, e.g. (Hartmann et al., 2017). Fig. 1 shows the sensor platform. Note that the two cameras are attached with viewing directions differing by approximately 100°, which results in only a very narrow overlap between the two viewing cones. We use Basler acA2500-60uc cameras with a sensor having 2592 x 2048 pixels and a pixel size of 4.8 µm x 4.8 µm. The lens has a focal length of 6 mm. The interior orientation of the cameras is determined in a separate pre-processing step and is considered to be constant.
In this multi-sensor system, time synchronization is achieved using a hardware trigger based on the GPS time signal, which is then multiplied and provided for the cameras with a frequency of 10 Hz. This, however, is not a criterion to be fulfilled in the calibration procedure described here, because the calibration is performed in a static environment and therefore does not require time-synchronized data between different sensors.

RELATED WORK
There exist numerous approaches in the literature to determine the mounting calibration of multi-camera systems with non-overlapping fields of view. Xia et al. (2018) summarize several of these methods and divide them into different categories. The most important ones are discussed in the following paragraphs.
One category uses measuring devices with superior accuracy such as a laser tracker or theodolite. Kitahara et al. (2001) performed a large-scale space camera calibration. In order to realize a large calibration board, a small calibration checkerboard was scanned at various locations in the room using a laser scanner. Ortega et al. (2009) calibrated an outdoor distributed camera network using a laser range finder (LRF). Firstly, by registering the LRF data to an aerial image of the site, the coarse location, orientation and field of view (fov) of the cameras were estimated. Also, having the camera fov, the corresponding LRF data is found and a plane segmentation is done. Then, 3D lines are computed as intersections of perpendicular planes. In a second step the initial camera parameters were refined in a semi-automatic way by matching 2D and 3D features in a non-linear optimization.
The second category uses large-scale calibration targets. In (Liu et al., 2011a), the calibration field is divided into sub-targets for the non-overlapping views. The 6DoF between the sub-targets need not be known as long as the whole system is rigid. A minimum of four images of the target are acquired from different angles and the transformation parameters between the cameras and the global coordinate system are estimated, where the global coordinate system is based on one camera being selected as the reference sensor. In (Liu et al., 2011b), a calibration method applicable in a large area or narrow space based on 1D targets is proposed. In this work, each camera is paired with a neighboring camera and their relative rotations are calculated. Then, based on the feature point distances on the 1D target, the translation vector is determined. Strauss et al. (2014) present an approach which combines the calibration of interior and exterior orientation parameters of the cameras using coded targets and image sequences. Non-linear least squares optimization is used in the estimation procedure.
In the third category planar mirrors are employed. In this way Lébraly et al. (2010b) create overlap between different camera views. Sturm and Bonfort (2006) use a planar mirror to compute the position and orientation of one camera without direct view of the target placed side by side to the camera. Xu et al. (2015) use a similar approach by setting up one or more mirrors and calibrating cameras by solving the transformations between the cameras, including the mirrored virtual camera.
The fourth category is based on motion models. It uses image sequences to track movements and establish corresponding relationships for different fields of view. As these methods do not need any external devices, they are more cost efficient and flexible. Caspi and Irani (2002) calibrate a set of two non-overlapping cameras with a short baseline installed rigidly with respect to each other. The two sets of image sequences are then aligned using the assumption of similar changes over time within the two sequences. It is assumed that the cameras share the same center of projection, which restricts the achievable accuracy. Esquivel et al. (2007) process image sequences of non-overlapping cameras by a structure from motion (SfM; called structure and motion, SAM, in that paper) algorithm, which estimates the positions and orientations of the cameras. The camera rig is again assumed to be rigid; however, a more general case is solved, not requiring the cameras to share a common projection center. The authors calculate the orientation, position and scale of each camera separately in an iterative adjustment procedure. In a different and mobile application, Pagel (2012) uses an approach to solve challenges of hand-eye calibration. Lébraly et al. (2010a) propose a calibration method for non-overlapping cameras aboard a vehicle observing a static scene and determine all unknowns using a bundle adjustment. Micusik (2011) proposes a solution to the relative 6DoF problem of non-overlapping surveillance cameras observing a moving object (person). The lack of conjugate points between two images is compensated by the assumptions that the person is moving linearly and that the gravity vector is known for both cameras. The gravity vector can be estimated e.g. by an inertial sensor. These assumptions lead to a quadratic eigenvalue problem that is solved to obtain the unknown parameters.
The other categories are based on laser projection, e.g. (Liu et al., 2012, 2013), and visual measuring instruments, e.g. (Dong et al., 2016; Birdal et al., 2016; Gong et al., 2017). Further literature can also be found on the subject. A SLAM-based automatic exterior calibration technique is presented by Carrera et al. (2011) in a two and four camera configuration, where the camera rig is fixed. A final relevant paper is (Borkar et al., 2011), which presents a method to find the positions and angles of two cameras relative to a reference point by transforming images using Inverse Perspective Mapping (IPM) and acquiring bird's-eye views.
In our work we take advantage of an accurate external measuring device, namely a laser tracker, and a static environment, thus our method falls into category 1. As the calibration of the platform is part of the pre-processing stage of our research, for which a high accuracy is desired, it is established in a lab environment. While this paper deals only with the cameras, a joint system calibration of the cameras and laser scanner of our MSS is planned as one of the next steps, initially again in a controlled environment.

CALIBRATION PROCEDURE
In our work, the exterior calibration of the multi-sensor system, i.e. a 6DoF transformation for each sensor with respect to a common 3D platform CS, is estimated via non-linear least squares adjustment. The six parameters are also called mounting calibration parameters, or mounting calibration for short. In this paper, we restrict ourselves to estimating the mounting calibration of the two cameras of our system, for which the interior orientation (calibrated focal length, image coordinates of the principal point and lens distortion parameters) are assumed to be known. Considering the cameras separately, the task amounts to a spatial resection with respect to the platform CS. As mentioned before, the cameras only have a very small common field of view, therefore stereo models do not exist.
The platform CS is realized via three signalized points at three corners of the rigid platform. One point serves as origin of the platform CS, and the second point determines the direction of one of the coordinate axes (here x). A plane is then defined containing this coordinate axis and the third point; this plane also contains the second axis (y), which is perpendicular to the first axis. Finally, the third coordinate axis points in the direction of the plane normal. In order to determine the mounting calibration, we position the platform in a lab, where the cameras can see a number of targets placed at different heights on the lab walls. These targets, which were measured by the laser tracker, serve as ground control points (GCPs), cf. fig. 2.
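The construction of the platform CS from the three signalized points can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the sample coordinates and the sign of the plane normal (chosen here so that the system is right-handed and P3 has a positive y coordinate) are assumptions.

```python
import math

def sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def unit(a):
    n = math.sqrt(dot(a, a))
    return [c / n for c in a]

def platform_frame(p1, p2, p3):
    """Origin and axes of the platform CS, expressed in the CS of the inputs."""
    ex = unit(sub(p2, p1))               # x axis: from P1 (origin) towards P2
    ez = unit(cross(ex, sub(p3, p1)))    # z axis: normal of the plane (P1, P2, P3)
    ey = cross(ez, ex)                   # y axis: in the plane, perpendicular to x
    return p1, ex, ey, ez

def to_platform(x, frame):
    """Coordinates of point x in the platform CS defined by frame."""
    origin, ex, ey, ez = frame
    d = sub(x, origin)
    return [dot(d, ex), dot(d, ey), dot(d, ez)]

# Hypothetical laser tracker coordinates of P1, P2, P3 in the object CS [m]
p1 = [2.000, 1.000, 0.500]
p2 = [2.000, 1.150, 0.500]
p3 = [1.900, 1.000, 0.500]
frame = platform_frame(p1, p2, p3)
coords = [to_platform(p, frame) for p in (p1, p2, p3)]
```

By construction, six of the nine platform coordinates vanish: all three of P1, the y and z coordinates of P2, and the z coordinate of P3.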

Relationship between Coordinate Systems
The relationship between the coordinate systems used in the calibration is described in the following.
1. Placing a laser tracker in the lab, we can determine the coordinates of the GCPs and the three platform points (P1, P2, P3) in the CS of the laser tracker, which we call the object CS. As a result, the coordinates of all GCPs and of the three points are available in this system. Fig. 2 shows the GCPs, the three points on the platform, as well as the object, platform and camera coordinate systems.
2. Using the three points on the platform as identical points in object and platform coordinate systems, the parameters of a 6DoF transformation between the two systems can be computed, and the object coordinates of the GCPs can be transformed into the platform CS as shown in eq. 1.
where
X_object = (X_O, Y_O, Z_O)^T: 3D coordinate vector of a point in the object CS
X_platform = (X_P, Y_P, Z_P)^T: 3D coordinate vector of a point in the platform CS
R_0P: Rotation matrix to rotate from the platform CS into the object CS
X_0P: 3D translation vector between the origins of the object and platform CS.
3. As mentioned earlier, the mounting calibration (exterior orientation of the cameras in the platform CS) can be determined via spatial resection, based on the GCP coordinates in the platform system. In order to do so, we can use another 6DoF transformation and the standard collinearity equations, see eqs. 2 and 3. Here, the camera system is a right-handed system, where the y-axis points downwards and the z-axis in the viewing direction.
where
X_camera = (X_C, Y_C, Z_C)^T: 3D coordinate vector of a point in the camera CS
R_PM: Rotation matrix to rotate from the camera CS into the platform CS (camera viewing direction, part of the mounting calibration)
X_PM: 3D translation vector between platform and camera CS (camera projection centre, part of the mounting calibration).
where
x, y: Image coordinates of a point
X_C, Y_C, Z_C: Point coordinates in the camera CS
x0, y0, c: Parameters of interior orientation of the camera (principal point and calibrated focal length)
∆x = f_∆x(x, y, k1, k2, k3, p1, p2)
∆y = f_∆y(x, y, k1, k2, k3, p1, p2)
∆x and ∆y are the lens distortion corrections, k1, k2, k3 are radial distortion parameters, and p1, p2 are tangential distortion parameters of the camera. The distortion model of OpenCV is used in this context.
For R_0P, the angles ω, φ, κ were used, following the representation of rotations often found in photogrammetry. For the rotation R_PM of the mounting calibration, however, the angles α, ζ, κ are used to represent the rotation from the platform to the camera CS in order to avoid a possible gimbal lock. Here, α is the rotation around the Z axis, ζ around the (rotated) Y axis and κ is a second rotation around the (rotated) Z axis. These three angles define the camera viewing direction (α, ζ) and rotation around the viewing direction (κ) with respect to the platform CS.
Pixel coordinates of all visible GCPs are observed in the images. To ensure a geometrically stable solution, the targets should be well-distributed in the images. To determine initial values for the mounting parameters used in the subsequent adjustment, one can first determine approximate values for the pixel coordinates of four targets, preferably lying on the four corners of the image, either manually or automatically. In our case, due to the small number of points needed, this step was carried out manually. Spatial resection can then be carried out based on the Müller-Killian algorithm, discussed in (Müller, 1925) and (Killian, 1955).
Having thus determined initial mounting parameters for the camera and the 3D coordinates of all targets in the platform CS, their initial pixel coordinates can be determined by projecting the 3D coordinates into image space. Pixel positions can then be refined using any method for precise corner point location (we use the one described in (Förstner and Gülch, 1987)). Subsequently, a least squares adjustment is carried out to estimate the mounting parameters (see section 3.2 for details of the used functional model).
4. Obviously, the whole procedure can be repeated various times by moving the platform to another station in the lab. Such additional measurements offer a means to check the computed mounting calibration and to increase the precision and reliability of the unknowns. We have carried out measurements from two different stations and have processed the data separately as well as simultaneously (see section 4 for details).
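The chain of transformations described in steps 1 to 3 can be illustrated with a short sketch that maps a GCP from the object CS via the platform CS into the image. Several points are assumptions rather than statements of the paper: the transformations are written as X_local = R^T (X − X_0), which follows from the stated direction of R_0P and R_PM; the mounting rotation is composed as R_z(α) R_y(ζ) R_z(κ); the collinearity equations use positive signs, in line with a camera CS whose z-axis is the viewing direction; lens distortion is omitted; and all numerical values are hypothetical.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_zyz(alpha, zeta, kappa):
    """Rotation about Z, then the rotated Y, then the rotated Z axis."""
    return matmul(matmul(rot_z(alpha), rot_y(zeta)), rot_z(kappa))

def to_local(r, t, x):
    """R^T (x - t): coordinates of x in the frame with axes R and origin t."""
    d = [x[i] - t[i] for i in range(3)]
    return [sum(r[k][i] * d[k] for k in range(3)) for i in range(3)]

def project(x_cam, c, x0, y0):
    """Collinearity equations without distortion terms (assumed sign convention)."""
    xc, yc, zc = x_cam
    return x0 + c * xc / zc, y0 + c * yc / zc

# Hypothetical station: platform rotated by 90 deg about Z, origin at (1, 2, 0) m
r_0p, x_0p = rot_z(math.pi / 2), [1.0, 2.0, 0.0]
# Hypothetical mounting: camera at the platform origin, looking along platform x
r_pm, x_pm = rot_zyz(0.0, math.pi / 2, 0.0), [0.0, 0.0, 0.0]
# Principal distance in pixels: 6 mm focal length / 4.8 um pixel size
c_px, x0_px, y0_px = 6e-3 / 4.8e-6, 1296.0, 1024.0

gcp_object = [1.0, 7.0, 0.4]
gcp_platform = to_local(r_0p, x_0p, gcp_object)
gcp_camera = to_local(r_pm, x_pm, gcp_platform)
x_img, y_img = project(gcp_camera, c_px, x0_px, y0_px)
```

With ζ = 90°, the camera z-axis coincides with the platform x-axis, which shows how α and ζ encode the viewing direction while κ only rotates the image about it.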

Adjustment Approach
The determination of the mounting parameters is based on an extended bundle adjustment in the form of a GMM. Non-linear observation equations are derived by introducing eq. 1 into eq. 2, and the result into eq. 3, giving eq. 4.
We use direct observations for the GCP coordinates in the object CS and for the object coordinates of the three points on the platform. For the latter, we also introduce their coordinates in the platform system in order to be able to position the platform CS with respect to the object CS. In order not to have to define the platform coordinates with superior precision, we select the weights for the observations of the three platform points in the platform CS to allow for a determination of the transformation without constraints: the three coordinates of Point P1 obtain very high weights, the Y and Z coordinates of P2 and the Z coordinate of P3 also, whereas the other coordinates obtain low weights. In this way the six transformation parameters can be determined and the other coordinates of these points do not affect the parameters in a negative way.
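The weighting scheme for the platform coordinates of the three platform points can be sketched as a small table of weights. The function name and the two standard deviations are hypothetical; the sketch only assumes that the weight of an observation is the inverse of its variance, given an a priori standard deviation of unit weight of 1.

```python
def platform_point_weights(sigma_fixed, sigma_free):
    """Weights (1 / sigma^2) for the (X, Y, Z) platform coordinates of P1, P2, P3."""
    high = 1.0 / sigma_fixed ** 2   # coordinate effectively held fixed
    low = 1.0 / sigma_free ** 2     # coordinate left practically free
    return {
        "P1": (high, high, high),   # origin: fixes the three translations
        "P2": (low, high, high),    # on the x axis: fixes two rotations
        "P3": (low, low, high),     # in the xy plane: fixes the third rotation
    }

# Hypothetical standard deviations in metres
weights = platform_point_weights(sigma_fixed=1e-5, sigma_free=1.0)
n_fixed = sum(1 for w_xyz in weights.values() for w in w_xyz if w > 1.0)
```

Exactly six coordinates receive a high weight, which matches the six parameters of the transformation between object and platform CS.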
The resulting non-linear observation equations are shown in eq. 5 for the image coordinates, eqs. 6 and 7 for the 3D coordinates of the GCPs and the three platform points in the object CS, and eq. 8 for the points on the platform in the platform CS respectively. v denotes the residuals and k is the number of GCPs which are visible in the images.
We thus use the following observations:
• Image coordinates (x, y) of GCPs
• 3D coordinates of GCP targets in the object CS (X_GCPj,object, Y_GCPj,object, Z_GCPj,object); j ∈ {1, ..., k}
• 3D coordinates of platform points Pi in the object CS (X_Pi,object, Y_Pi,object, Z_Pi,object); i ∈ {1, 2, 3}
• 3D coordinates of platform points Pi in the platform CS (X_Pi,platform, Y_Pi,platform, Z_Pi,platform); i ∈ {1, 2, 3}
to determine the following unknowns:
• Camera mounting calibration (X0, Y0, Z0, α, ζ, κ) as elements of X_PM and R_PM for each camera (see eq. 2)
• 6 parameters for the transformation between the object and the platform CS contained in X_0P and R_0P for each station (see eq. 1)
• 3D coordinates of GCP targets in the object CS (X_GCPj,object, Y_GCPj,object, Z_GCPj,object); j ∈ {1, ..., k}
• 3D coordinates of platform points Pi in the object CS for each station (X_Pi,object, Y_Pi,object, Z_Pi,object); i ∈ {1, 2, 3}
In the stochastic model, the variance-covariance matrix is assumed to be diagonal. For all observations, appropriate variances, and thus weights, are selected (numerical values are given in Section 4). The solution is based on the well-known formulae of the GMM in the non-linear case, see eq. 9. Here, ∆x is the difference between the vectors of estimated unknowns in the last two iterations, and ∆l is the difference between the observation vector and the observations computed from the approximate values of the parameters. The Jacobi matrix A contains the derivatives of the observation equations with respect to the unknowns. The adjustment is carried out iteratively according to the standard formulae. The stopping criterion is based on the relative change rate of the sum of the weighted squared residuals Ω between the previous and the current iteration.
ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume V-1-2021 XXIV ISPRS Congress (2021 edition)
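The iterative solution of a non-linear GMM with a diagonal weight matrix can be sketched as follows. The sketch uses the standard update ∆x = (A^T P A)^-1 A^T P ∆l and stops when the relative change of Ω falls below a threshold. It is not the adjustment of this paper: the toy problem (a 2D position estimated from four distance observations), the threshold value and all names are assumptions chosen only to keep the example small.

```python
import math

def solve(n_mat, rhs):
    """Solve n_mat * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(n_mat)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            fac = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= fac * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def adjust(f, jac, l_obs, weights, x0, tol=1e-8, max_iter=50):
    """Iterative least squares; returns unknowns, Omega and RMS of unit weight."""
    x, omega_prev = list(x0), None
    n, u = len(l_obs), len(x0)
    for _ in range(max_iter):
        a = jac(x)                                          # Jacobi matrix A
        dl = [l - fx for l, fx in zip(l_obs, f(x))]         # reduced observations
        n_mat = [[sum(a[k][i] * weights[k] * a[k][j] for k in range(n))
                  for j in range(u)] for i in range(u)]     # A^T P A
        rhs = [sum(a[k][i] * weights[k] * dl[k] for k in range(n))
               for i in range(u)]                           # A^T P dl
        dx = solve(n_mat, rhs)
        x = [xi + di for xi, di in zip(x, dx)]
        v = [fx - l for l, fx in zip(l_obs, f(x))]          # residuals
        omega = sum(w * vi * vi for w, vi in zip(weights, v))
        # Stop when the relative change of Omega is below the (assumed) threshold
        if omega_prev is not None and abs(omega_prev - omega) <= tol * omega_prev:
            break
        omega_prev = omega
    return x, omega, math.sqrt(omega / (n - u))

# Toy problem: distances from an unknown 2D point to four known stations
stations = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]

def f(x):
    return [math.hypot(x[0] - sx, x[1] - sy) for sx, sy in stations]

def jac(x):
    return [[(x[0] - sx) / d, (x[1] - sy) / d]
            for (sx, sy), d in zip(stations, f(x))]

l_obs = [5.01, 8.05, 6.72, 9.21]       # noisy distances to a point near (3, 4)
weights = [1.0 / 0.01 ** 2] * 4        # a priori sigma of 0.01 per distance
x_hat, omega, s0 = adjust(f, jac, l_obs, weights, [5.0, 5.0])
```

The returned value s0 = sqrt(Ω/(n − u)) corresponds to the RMS value of unit weight; a value below 1 indicates that the a priori standard deviations were chosen too pessimistically.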

Goals
In this section, we report the precision of the adjustment results, and in particular of the mounting calibration parameters, in different scenarios. Specifically, we vary the number of cameras, the number of stations, and the effect of introducing the platform CS into the adjustment.
The background of this study is the fact that the platform must be small to be flown on a UAV, and thus the three points defining the platform CS are located within a few centimeters of each other (10-15 cm). For the transformation between the object and the platform CS, these three points serve as identical points. While the platform coordinates of these points can be defined as mentioned above, the object coordinates are determined using a laser tracker from a distance of several meters. As a result, the elements of the rotation matrix R_0P between object and platform system can only be determined with a limited precision.

Data Acquisition
Data acquisition for the exterior calibration was done in a lab with reference targets on the walls and ceiling, which serve as GCPs. The object coordinates of these targets as well as those of the three platform points were accurately measured using a laser tracker. Initial values for the transformation parameters between object and platform CS were determined using the commercial software SpatialAnalyzer. The platform coordinates of the three points on the platform are shown in table 1. The UAV platform was placed in the middle of the room with the two cameras tilted about 30° upwards. The focal length of 6 mm and the relatively long distance to the targets allow for a wide field of view and a minimum of 10 targets to be seen in each image (two of which appear in both cameras). After capturing two images of the scene in station 1, the platform was slightly rotated and moved (approximately 20° and 15 cm, respectively) and another two images were taken. The images can be seen in Fig. 3 for the left camera and in Fig. 4 for the right camera.
At an average camera-to-object distance of 5 m, one pixel covers an area of 4 x 4 mm² in object space. The actual distance of the cameras to the targets varied from 3 to 6 m. Targets closer to the platform cover an area of up to 90 x 90 px, while the (few) targets further away were depicted in a slanted view and with as few as 20 x 30 px. In addition, a number of targets were also viewed from a slanted direction, which might increase the standard deviation of the corresponding image coordinates.
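The stated footprint of one pixel follows from the pixel size, the focal length and the object distance. The following sketch reproduces the calculation for the distances mentioned above; the function name is hypothetical and a viewing direction perpendicular to the target is assumed.

```python
def pixel_footprint_mm(pixel_size_m, focal_length_m, distance_m):
    """Side length of one pixel projected into object space, in millimetres."""
    return pixel_size_m * distance_m / focal_length_m * 1000.0

PIXEL_SIZE = 4.8e-6    # 4.8 um
FOCAL_LENGTH = 6e-3    # 6 mm

footprint_avg = pixel_footprint_mm(PIXEL_SIZE, FOCAL_LENGTH, 5.0)
footprint_near = pixel_footprint_mm(PIXEL_SIZE, FOCAL_LENGTH, 3.0)
footprint_far = pixel_footprint_mm(PIXEL_SIZE, FOCAL_LENGTH, 6.0)
```

For the 3 to 6 m range of target distances, the footprint thus varies between roughly 2.4 mm and 4.8 mm.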

Assumptions about the Stochastic Model
Uncorrelated observations and a priori standard deviations for each observation type are assumed as follows.
For the platform coordinates of the three platform points, this high weight was only used for the six coordinates defining the transformation between the object and platform CS (i.e., those with a value of 0.000 in table 1), while a very low weight was chosen for the other three coordinates. Furthermore, the a priori standard deviation of the weight unit is set to 1.

Results
As mentioned before, we compare the precision of the adjustment obtained in different scenarios:
1. Cases 1a and 1b: the images of one camera only, but of both stations, are used (1a refers to the left camera, 1b to the right camera);
2. Case 2: the images of both cameras and both stations are employed simultaneously in one adjustment;
3. Case 3: same as case 2, but the transformation from the object to the platform system is assumed to be constant and error-free. For this experiment we transformed the object GCP coordinates to the platform CS using the parameters determined via SpatialAnalyzer, and then considered the platform CS to be the object CS. Consequently, the three points on the platform were not needed any longer, and we dropped equations 7 and 8 from the adjustment.
4. Case 4: a special variant of case 2, where only one station is used.
5. Case 5: a special variant of case 3, where only one station is used. In other words, case 5 is similar to case 4, with the difference that the object CS is the platform CS.
Table 2. Adjustment results. Columns: Case; RMS value of unit weight; RMS value of reprojection error per image [px] (by station); Max. residual [px]; Precision of 3D coordinates [mm]. The first columns show the image-based data, namely the unit weight error of the adjustment (√(Ωcurr/(n − u)), where n is the number of observations and u the number of unknowns), the reprojection errors, for each image separately and for all targets, gathered as one value by the root mean square (RMS), and the maximum residual (highest reprojection error in the image plane) seen in the adjustment (in x or y, however not specified in the table).
The results are presented in tables 2 and 3, in which the precision values are determined on the basis of the estimated RMS error of the unit weight and the inverse of the normal equation matrix. They can be interpreted as follows:
1. The a posteriori RMS value of unit weight is smaller than 1 in all cases except case 3, where it is 1.08. Thus, the introduced standard deviations (see section 4.3) are somewhat too large. However, the results shown in table 2 are acceptable, and fine-tuning these standard deviations, e.g. via variance-covariance estimation, is beyond the scope of this paper.

2. A similar behavior is found in the RMS values of the reprojection errors, which in most cases are smaller than the assumed value of 0.8 pixel. The general range of the reprojection errors is not surprising considering the employed tool for image coordinate determination; a better precision can be reached with dedicated measurement algorithms. The values for the left camera (case 1a and upper line in cases 2-5) are smaller than those for the right one (case 1b and lower line in cases 2-5). A possible reason is that the interior orientation parameters of the right camera might have changed slightly between the interior calibration and the mounting calibration procedures. The reported values are rather consistent across the different cases, perhaps with the exception of case 3, for which the left camera has slightly larger values. This can partly be explained by the fact that any inconsistencies of the adjustment can no longer be absorbed by the transformation parameters from object to platform CS, and are thus more visible in the image coordinates.
3. The maximum residuals amount to a little more than 2 pixels. The maximum residual is located, in all cases except case 1a, in the right image and belongs to the small, slanted target, which is not surprising.
As the values discussed so far are image-based, major differences between the different cases were not expected, and are not visible in the results.
4. The estimated precision of the 3D coordinates varies (last column of table 2), but this variation is mainly due to the change of the variance factor of the unit weight. The largest part of it can be explained by a pure scaling of the inverse of the normal equation matrix by the RMS value of unit weight due to the high accuracy of the laser tracker measurements in comparison to the camera measurements in object space.

5. The estimated values of the mounting calibration parameters are nearly all consistent in all cases, taking into account their precision.
6. While the standard deviations for the components of the translation vector between platform and camera are nearly identical in all cases, differences can be found in the angular values. The difference of the standard deviations in case 1 between the left and right camera can be explained again by the difference of the RMS value of the unit weight (see also point 4). Comparing case 1 to case 2 we find contradictory results: for the right camera, case 2 delivers more precise results, which is expected, as the redundancy for the mounting calibration parameters is higher; the inverse is true for the left camera. At this stage the reason is not clear, but a comparison with table 2 suggests that, perhaps, the results of case 1a are a little too optimistic.
7. The potential of the MSS can be seen when looking at the precision for the rotation angles of case 3. Here, compared to cases 1 and 2, the achieved standard deviations are smaller by a factor of 3 to 5, demonstrating the high accuracy potential of the developed system. As a consequence, when using the system for high accuracy applications, the rotation angles of the mounting calibration should not be considered as constant values, but as observations with the precision obtained in the calibration phase, similar to a self-calibration scenario. Consequently, some scene information must be available to improve the precision of these mounting angles.
8. Case 4, where only a single station was used, shows a somewhat higher standard deviation compared to case 2, which is a consequence of having fewer observations and therefore a lower redundancy. However, the differences are not significant. Introducing more than two stations is therefore not expected to yield much improvement in terms of precision, although it would improve the detection of blunders and make the setup more reliable.
In general, case 5 shows the same behavior to case 4 as case 3 does to case 2, which is expected.

CONCLUSIONS AND OUTLOOK
In this paper, we have presented an approach to determine the mounting calibration of an MSS with a focus on two cameras with non-overlapping fields of view. To do so, we have employed a static lab environment with GCPs and have computed the mounting calibration relative to a platform assumed to be rigid. Hence, the 6DoF of one sensor relative to another can be calculated easily, as long as it remains constant over time.
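How the 6DoF of one camera relative to another follows from two mounting calibrations can be sketched as below. The relations R_12 = R_1^T R_2 and t_12 = R_1^T (t_2 − t_1) assume that each R rotates from the camera CS into the platform CS and each t is the projection centre in the platform CS; the mounting values are hypothetical and merely chosen to resemble viewing directions that differ by about 100°.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_zyz(alpha, zeta, kappa):
    return matmul(matmul(rot_z(alpha), rot_y(zeta)), rot_z(kappa))

def relative_pose(r1, t1, r2, t2):
    """Pose of camera 2 expressed in the CS of camera 1."""
    r1_t = transpose(r1)
    r12 = matmul(r1_t, r2)
    d = [t2[i] - t1[i] for i in range(3)]
    t12 = [sum(r1_t[i][k] * d[k] for k in range(3)) for i in range(3)]
    return r12, t12

# Hypothetical mounting calibrations of the left and right camera
r_left = rot_zyz(math.radians(50.0), math.radians(90.0), 0.0)
t_left = [0.05, 0.10, 0.0]
r_right = rot_zyz(math.radians(-50.0), math.radians(90.0), 0.0)
t_right = [0.05, -0.10, 0.0]

r12, t12 = relative_pose(r_left, t_left, r_right, t_right)
# Angle between the two viewing directions (camera z axes), in degrees
view_angle = math.degrees(math.acos(r12[2][2]))
baseline = math.sqrt(sum(c * c for c in t12))
```

Since both poses refer to the same rigid platform CS, the relative pose does not depend on the station at which the images were taken.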
Being based on single camera resection, this approach works with any number of cameras. If cameras have a common field of view, the resulting ray intersections are automatically taken into account, as the approach is based on photogrammetric bundle adjustment.
The results show that while image coordinate determination still has a potential for improvement, the calibration procedure works well, and the obtained precision for the calibration parameters has reached the expected values. However, as they are determined indirectly using a laser tracker, additional object space information should be available for the mounting angles in order to be able to exploit the full accuracy potential of the multi-sensor system. A general benefit of using a laser tracker is the high accuracy with a validation of the results by reprojection. It can therefore be assumed that the estimated precision is close to the accuracy of the calibration.
Future work will address the common calibration of the cameras and the laser scanner in both static and dynamic environments. Also, we will carry out georeferencing experiments of the platform based on building models similar to (Unger, 2020), but with input from the cameras and the laser scanner.