CAMERA CALIBRATION USING THE DAMPED BUNDLE ADJUSTMENT TOOLBOX

Camera calibration is one of the fundamental photogrammetric tasks. The standard procedure is to apply an iterative adjustment to measurements of known control points. The iterative adjustment needs initial values of internal and external parameters. In this paper we investigate a procedure where only one parameter — the focal length is given a specific initial value. The procedure is validated using the freely available Damped Bundle Adjustment Toolbox on five calibration data sets using varying narrowand wide-angle lenses. The results show that the Gauss-Newton-Armijo and Levenberg-Marquardt-Powell bundle adjustment methods implemented in the toolbox converge even if the initial values of the focal length are between 1/2 and 32 times the true focal length, even if the parameters are highly correlated. Standard statistical analysis methods in the toolbox enable manual selection of the lens distortion parameters to estimate, something not available in other camera calibration toolboxes. A standardised camera calibration procedure that does not require any information about the camera sensor or focal length is suggested based on the convergence results. The toolbox source and data sets used in this paper are available from the authors.


INTRODUCTION
Camera calibration is one of the fundamental photogrammetric tasks.The standard procedure is to apply an iterative adjustment to measurements of known control points.The iterative adjustment needs initial values of the internal and external orientation, and optionally any object points.The theory and formulation of radial and de-centring lens distortions from (Brown, 1971) have been adopted in close range photogrammetry as well as in e.g. the popular free Computer Vision Camera Calibration Toolbox for Matlab 1 and in the commercial Computer Vision System Toolbox for Matlab 2 based on the work by Zhang (2000).The aid of coded targets is now the norm for off-line automatic camera calibration (Fraser, 2013).Automatic camera calibration using self calibration is a standard in most of the image based dense matching tools available (Snavely et al., 2008;Furukawa and Ponce, 2009), whatever the kind of camera used, but unfortunately detailed quality results from the adjustment are usually not available for the end users.
In this paper we focus on the progress of the bundle adjustment during the calibration process.Initial values for the internal parameters can be obtained from several sources, including the camera manufacturer, EXIF information in JPG images, or prior knowledge.We investigate a procedure where the focal length is the only parameter that is given a specific initial value; the other parameters are given standard initial values or computed from the data.Our target is to investigate the pull-in range of this procedure using the set of bundle adjustment techniques available in the free Damped Bundle Adjustment Toolbox for Matlab, described in Börlin and Grussenmeyer (2013a), and to suggest a 1 http://www.vision.caltech.edu/bouguetj/calibdoc 2 http://www.mathworks.com/products/computer-visionstandardised procedure that does not require any camera knowledge.

CAMERA CALIBRATION
If the lens distortion (LD) parameters are included, estimating the internal camera parameters is a non-linear problem, usually performed by bundle adjustment (Luhmann et al., 2006, Ch. 7.2).After the images have been acquired and the measurements have been performed, initial values of the parameters to be estimated are given to a bundle adjustment (BA) algorithm which is allowed to iterate "until convergence".In addition to the internal parameters (internal orientation -IO), the parameters to estimate usually include the camera positions and orientations (external orientation -EO) and possibly also object point (OP) coordinates.

Main algorithm
The following high-level algorithm was used for the camera calibration: 1. Decide which of the radial K1-K3 and tangential P1-P2 LD parameters that should be included in the calibration.
2. Decide the initial estimate f0 of the focal length.This will be discussed in detail in Section 3..

Use standard values for the other IO parameters:
(a) The principal point was set to the centre of the sensor.
(b) All lens distortion parameters were set to zero.
(c) The pixel aspect ratio was assumed to be unity.
(b) With more then 3 CPs available, choose the 3 triplets covering the largest measured image area.
(c) As the initial values, choose the resection that produces the smallest re-projection error of all measured CPs.
5. Based on the assumed IO and EO parameters, compute the position of any non-control OP with forward intersection.
6. Fine-tune the initial values with a BA algorithm.
7. Analyse the resulting parameter values for high correlations, statistical significance, etc.
Note that the algorithm does not make any additions or removals of LD parameters during the bundle iterations.Instead, the bundle is allowed to converge, and any decision of whether to add or remove any LD parameters for a subsequent run of the main algorithm is decided in step 7 after the convergence.
Throughout this paper, the pixel aspect ratio and skew were kept constant and were not estimated.

Bundle adjustment algorithms
The bundle adjustment (BA) algorithms used in this paper are from the free Damped Bundle Adjustment Toolbox3 , described in Börlin and Grussenmeyer (2013a): GM The classical Gauss-Markov bundle adjustment algorithm.
GNA The Gauss-Newton algorithm with Armijo line search.
LM The original Levenberg-Marquardt algorithm.
LMP The Levenberg-Marquardt algorithm with Powell dogleg.

Data sets
One calibration data set for each of five different camera-lens combinations was generated.The lenses were conventional, i.e. no fish-eye lenses were used.Details about the cameras are given in Table 1.Three data sets used the 2D Photomodeler calibration sheet consisting of an approximately 1-meter-square, 10-by-10 grid of black circular targets, including 4 coded targets used as control points.One data set used a 5-by-6-by-2 meter 3D calibration coded targets indoor test field.The final data set used a 60-by-35-by-20 meter array of control points measured on the INSA Building in Strasbourg (Börlin and Grussenmeyer, 2013b).
Images and the complete camera networks are shown in figures 1 and 2, respectively.
Except in data set 5, the object and control points were measured by the automatic circular target measurement technique in Photomodeler Scanner 2012.Details about each calibration data set are given in Table 2.The measured x, y coordinates and control point coordinates were exported from Photomodeler and imported into the toolbox.
For data set 3, a manufacturing flaw of the calibration sheet was discovered.Thus, the nominal coordinates of the control points could not be trusted.Instead, a minimum datum definition was used (two control points + Z coordinate of third) for the bundle.However, for the resection process in step 4 of the main algorithm, the control points were used as such with their nominal coordinates.
The toolbox does not yet include automatic blunder detection.Instead manual blunder detection was performed based on result and residual plots available in the toolbox.

EXPERIMENTS
For each data set, two experiments were performed to determine the pull-in range of the camera calibration with respect to the initial focal length estimate f0.For the FULL experiment, the full set of lens distortion (LD) parameters K1, K2, K3, P1, and P2 was estimated.The initial focal length f0 = mf * value was successively set to between m = 1/8 and m = 128 times the true value f * .Steps 1-5 of the algorithm in Section 2.1 were used to determine the other initial values.In step 6 of the algorithm, the same initial values were given to each of the four bundle algorithms listed in Section 2.2.The required number of iterations or failure to converge to the true values was recorded for each algorithm.A maximum number of 100 iterations was allowed for convergence.
For the second experiment, a STABLE set of LD parameters was determined for each data set.The posterior statistical properties optimal EO initial EO Figure 3: Initial EO and OP values for f0 = 0.3f * (left) and f0 = 3f * (right) for data set 3 (f * = 20.942mm).Initial (white) and optimal (blue) camera positions are connected by dotted lines.Initial OP coordinates are red, optimal blue.Given the smaller f0 value, the distance to the target is underestimated and the cameras cluster above the centre of the target.Their incorrect position affects the initial OP values as well.Given the larger f0 value, the distance to the target is overestimated and the cameras fan out radially away from the target.In both cases, the orientations of the cameras are approximately correct.
of the LD parameters from the FULL experiment were analysed to determine which parameters were stable and should be part of the second experiment.Parameters with correlation values above 95% were considered unstable and were excluded, as were parameters whose estimated values were not statistically significant.The parameters were tested in the following order: K3, (P1, P2), K2, and finally K1, where (P1, P2) were tested together.If any parameter was removed, the remaining parameters were re-estimated and the process was repeated until only stable parameters remained.The parameter testing procedure was applied once for each data set after which the pull-in investigation described above was executed.
For each data set and experiment, the "true" calibration parameters were estimated by one of the bundle algorithms based on the best available initial values after which consensus on the optimal values was verified by the other algorithms.
The effect of too small or too large initial focal length on the other parameters is illustrated in Figure 3 for data set 3. Given a smaller f0 than the true, i.e. m < 1, the distance to the target is underestimated and the initial EO position are grouped together closer to the centre of the target.The opposite is true for a larger than true f0 with m > 1: The distance to the target is overestimated by approximately a factor of m and the initial camera positions are distributed radially away from the target.In both cases, the orientations of the cameras are approximately correct.

RESULTS
The results of the calibration and parameter selection process are given in Table 3 together with σ0 and computed distortion of each camera.The σ0 for data set 5 is significantly higher than for the other projects, consistent with the use of manual measurements.The other σ0 values based on coded target measurements are similar to those reported by Fraser (2013).Furthermore, the distortion for the compact Olympus camera is more than twice that of the DSLR cameras.Regarding the convergence, the same pattern is present in the results of all data sets and experiments.As a typical example, Figure 5 shows the number of required iterations for the four BA methods on data set 3. In a region near the true focal length, all methods converge within a few iterations.The width of the region vary by method.Near the lower end of the region, around f0 = 1/4f * , the resection process fails, and all methods signal failure.
At the upper end of the region, around f0 = 2f * , the GM and LM algorithms switch abruptly from convergence to failure.In contrast, the GNA and LMP algorithms have a gradual increase in the number of required iterations, roughly following a parabola, and their tolerance to too high initial f0 values is significantly higher than for the GM and LM algorithms.The GM method (a) makes an early over-correction where the cameras end up on the wrong side of the target.After that the GM method never recovers.For the the GNA (b) and LMP (d) algorithms, damping is active during the first iterations and stops the updates from becoming too large.Near the solution, damping is relaxed and no damping is required during the final iterations.The LM algorithm (c) oscillates between damped and undamped and makes an early over-correction of the rotation angles after which is does not recover in the allowed number of iterations.The shown results are typical for m > 2 for all data sets and experiments.
algorithm makes an initial over-correction that results in the cameras appearing on the wrong side of the target.After that, the GM algorithm never recovers.In contrast, the GNA and LMP algorithms follow approximately the same pattern.Initially, their damping schemes are active and stop the updates from becoming too large.Later, the damping is relaxed as they approach the minimum.For cases where the GNA and LMP algorithms did not converge, the trace plots show that they were making progress but did not reach the optimal solution in the allowed number of iterations.The behaviour of the LM algorithm is more complex.It oscillates between damped and undamped steps and makes an early over-correction of the orientation angles from which it does not recover within the allowed number of iterations.
The results of all experiments and data sets show the same pattern.At the lower end, all BA algorithms fail below a certain m due to resection failure.At the upper end, the GM and LM algorithms fail abruptly at roughly the same m, whereas GNA and LMP converges for higher m values, albeit in an increasing number of iterations.Figure 6 shows the pull-in region for each method and experiment, defined as the interval where convergence was achieved in less than 100 iterations.The undamped GM method converged when the initial f0 was within a factor of 2 of the true.The damped GNA and LMP algorithms converged from a wider range of initial focal lengths and have a large tolerance to too high initial values with a pull-in range of at least 32 times f * .The behaviour of the LM algorithm is again more complex.On some experiments, it shows a pull-in range comparable to the undamped GM algorithm.On others, the pull-in range was significantly smaller.On the STABLE experiment on data set 5, the LM algorithm failed to converge even for f0 = f * .
An analysis of the iteration trace revealed that the reason was a combination of poor initial EO values for two cameras combined with the damped-undamped oscillations described above.

Unknown focal length and/or sensor size
In this paper we used the actual sensor size and focal lengths, available either from the EXIF information or from knowledge about the used camera.Our data suggests that an initial value of f0 ≤ 32f * will give convergence, at least for geometrically strong networks with good image coverage.
If the sensor size is unknown, we observe that the choice of unit for the sensor size and focal length is arbitrary, since only their ratios appears in the collinearity equations.Furthermore, said ratios are functions of the angle of view of the camera.Since it is reasonable to assume that the angle of view is restricted to a reasonable range, we suggest that the sensor height is fixed arbitrarily and the other internal parameters are scaled accordingly.
Barring other knowledge, we suggest fixing the sensor height to 24 mm to match the 36-by-24 mm "35 mm" standard film format, in which case the calibrated focal length will be the "35 mm equivalent" focal length of the camera.
For our cameras, the angle of view is 60-90 degrees and the focal length is approximately equal to the sensor height.Based on that and the conclusions from the previous paragraphs, we suggest an f0 of approximately 25 times the sensor height, or 600 mm in "35 mm equivalent" focal length, corresponding to an angle of view of about 4 degrees.Unless the real focal length is very long (angle of view is very small), this should be on the high side of the actual focal length and within the pull-in range of the best damped algorithms.If the angle of view is significantly higher than 90 degrees, a lower f0 should be used.Figure 6: The pull-in range for varying values of m for all data sets and experiments.Gray background corresponds to the FULL experiment, yellow to the STABLE experiment.In general, the undamped GM method (blue) converges when the initial focal length is roughly within a factor of 2 of the true for all experiments and data sets.The damped GNA (green) and LMP (purple) methods converge in less than 100 iterations for m = 1/2 up to at least m = 32.The damped LM (red) behaviour is more complex and is discussed in the text.Data set 4 has the widest pull-in range, data set 5 the narrowest.
If the EXIF focal length is available it could be used.However, using the standard f0 value suggest above instead does handle the situation when the EXIF information is incorrect, something that has happened to the first author using another copy of the zoom lens used in data set 2.

Proposed algorithm
In summary, we propose the following rules of thumb for using the algorithm in Section 2.1 for camera calibration: • If the sensor size is known, use it.Otherwise, assume a sensor height of 24 mm and scale the sensor width to match the assumed (unit or otherwise) pixel aspect ratio.
• If the angle of view is roughly 60 degrees, use 25 times the sensor height as f0.If the angle of view is above 90 degrees, use 10 times the sensor height as f0.
• Use the GNA or LMP algorithms for the bundle.
• Initially, estimate all LD parameter with the bundle.After convergence, exclude any unstable parameters and reestimate until only stable parameters remain.

Discussion
Even though photogrammetric state of the art often include selfcalibration or on-the-job-calibration, off-line camera calibration is still important to investigate the stability of the camera, investigate the metric quality of new cameras, or as initial values for self-calibration (Shortis et al., 1998;Luhmann et al., 2006;Fraser, 2013).
The theory and formulation of radial and de-centring lens distortions from Brown (1971) have been adopted in close range photogrammetry as well as in the field of computer vision.Whereas the photogrammetric literature focus on quality, the importance of strong networks, high redundancy, good initial values and the use of the Gauss-Markov estimation model (McGlone et al., 2004;Luhmann et al., 2006;Fraser, 2013), the computer vision literature focus on simplicity, high level of automation, and the use of the Levenberg-Marquardt method (Zhang, 2000;Hartley and Zisserman, 2003;Snavely et al., 2008;Furukawa and Ponce, 2009).
While several computer vision-based camera calibration software are publicly available, including their source, the computation of statistical quality parameters is generally lacking, especially with respect to correlations between the estimated parameters (Remondino and Fraser, 2006).In contrast, photogrammetric software often provide a detailed statistical analysis, including automatic exclusion of unstable parameters.However, if the bundle fails to converge, the available statistical analysis may be anything between limited and useless.Good photogrammetric software include tools and suggest guidelines, but due to the closedsource nature of photogrammetric software, the detailed information needed to determine the reason for the failure, be it poor camera network configuration, measurement blunders, or poor initial values due to erroneous EXIF information, is often hidden from the end user.
In this paper we investigated a camera calibration algorithm based on the freely available Damped Bundle Adjustment Toolbox for Matlab that required initial values of only one parameter -the focal length.The results show that the damped GNA and LMP bundle algorithms have a wide pull-in range, especially compared to the GM and LM algorithms, even on highly correlated camera parameter sets.This is consistent with the results reported by Börlin and Grussenmeyer (2013a).The ability to converge with correlated parameters, enables a posteriori statistical analysis and exclusion of unstable parameters starting with the full parameter set rather than adding parameter incrementally.
Other techniques for estimating the initial focal length, not investigated in this paper, include the DLT method for 3D targets by Abdel-Aziz and Karara (1971) and the relative orientation techniques by Nistér (2004) and Snavely et al. (2008).These techniques could of course be used as a complement, should our standardised approach fail.
Given its popularity in the computer vision community, the performance of the LM algorithm was surprisingly poor.The reason was largely attributed to an implementation detail of the algorithm in Börlin and Grussenmeyer (2013a); the choice of λ0, the initial and minimum damping value.The λ0 was introduced to ensure bias-free co-variance estimates at the minimum, should the algorithm converge.Advocates for the LM method may argue that the algorithm was unfairly treated and it may certainly be possible to fine-tune the LM parameters to allow the LM method to have a wider pull-in range.However, since neither the GNA nor LMP algorithms had the same problems, this rather reinforces the argument in Börlin and Grussenmeyer (2013a) that unless a biased convergence is acceptable, the λ damping scheme used by LM introduces an implementation detail that may -and in this case, did -cause problems.
The STABLE experiment may be seen as multi-stage process to estimate both which parameters are stable and the actual parameter values, similar to the established multi-stage practice used by many photogrammetric software.The difference is twofold; whereas the established "forward" practice usually starts with few parameters and add new parameters incrementally until no more significant parameters can be added, our "backward" procedure starts with all parameters and removes any statistically insignificant and/or highly correlated parameters until all remaining parameters are stable.A further difference is that in the "forward" procedure, the estimated parameter values at one stage is used as the initial values for the next.In our "backward" strategy, all remaining parameters to be estimated are reset to their original initial values before adjustment.The "backward" strategy was chosen for simplicity, but our main algorithm could also be modified to use the "forward" strategy.
The plotting features of the toolbox, especially the camera network plots illustrated in figures 2-4, where useful to understand the reason for bundle failure.Other plots were helpful for blunder detection.The toolbox did not do any automatic parameter exclusion.However, the computed statistical analysis provided the information necessary to determine which parameters were stable.Furthermore, the availability of the source code simplified the automation of several key tasks.
The fixed unit aspect ratio and manual outlier detection are limitations to this study, as is the relatively small number of cameras and lenses.However, the wide pull-in range reported does indicate that the suggested algorithm should work for a large majority of conventional camera-lens combinations.

CONCLUSIONS
The Gauss-Newton-Armijo (GNA) and Levenberg-Marquardt-Powell (LMP) algorithms of the Damped Bundle Adjustment Toolbox applied to the camera calibration problem have superior pull-in range compared to the classical Gauss-Markov algorithm and the Levenberg-Marquardt method.The GNA and LMP algorithms converged even if the initial focal length estimate was 32 times too large.This was the case even for highly correlated parameter sets, enabling a statistical analysis to remove unstable parameters.
The source of the Damped Bundle Adjustment Toolbox is freely available and includes implementations of the above mentioned algorithms.The toolbox estimate the standard deviations and correlations of the estimated parameters, something only partially available in other free camera calibration toolboxes.The plotting features of the toolbox (see figures 2-4) were especially useful to understand the behaviour of the bundle algorithms during the iterations.

Figure 1 :
Figure 1: (a) The calibration target used in data set 2. The same type of target was used in data sets 1 and 3. (b) The calibration test field used in data set 4. (c) Part of the control point network on the INSA building used in data set 5. 4. Based on the assumed IO parameters, determine initial EO values: (a) Use the 3-point spatial resection algorithm (McGloneet al., 2004, Ch. 11.1.3.4) on the available control points (CPs).
Network 4: 10 images (2 rolled) of the 3D calibration field taken by the Canon EOS 7D with fixed Canon EF20mm lens.Network 5: 13 images (2 rolled) of the control point network on the INSA building taken by the Canon EOS 5D with Canon EF 20mm fixed lens.

Figure 2 :
Figure 2: The camera networks used for the calibrations.Control points are plotted as red triangles, object points as blue dots.The object space unit is meters.

Figure 4
Figure4shows the difference in behaviour of the algorithms for data set 3 at f0 = 3f * .The iteration trace shows that the GM

Figure 4 :
Figure4: Iteration trace of the EO parameters for the four bundle algorithms on data set 3 with f0 = 3f * .The initial EO positions, same for all algorithms, are indicated by red crosses.The EO positions at subsequent iterations are indicated by blue crosses.The GM method (a) makes an early over-correction where the cameras end up on the wrong side of the target.After that the GM method never recovers.For the the GNA (b) and LMP (d) algorithms, damping is active during the first iterations and stops the updates from becoming too large.Near the solution, damping is relaxed and no damping is required during the final iterations.The LM algorithm (c) oscillates between damped and undamped and makes an early over-correction of the rotation angles after which is does not recover in the allowed number of iterations.The shown results are typical for m > 2 for all data sets and experiments.

Figure 5 :
Figure 5: Convergence results for data set 3: The Canon EOS 7D with Sigma lens for the FULL (upper panel) and STABLE (lower panel) experiments.The panels show the number of iterations required for each BA algorithm for varying values of m.A value of 100 indicate failure.Near the true focal length (m = 1), all algorithms converge quickly.Below m = 1/4 the resection fails, so neither bundle works.Above approximately m = 2, the GM (blue) and LM (red) algorithms fail whereas the GNA (green) and LMP (purple) algorithms degrade gracefully and converge up to at least m = 32.

Table 1 :
The cameras and lenses used in the experiments.The listed crop factor is the ratio of the diagonal of the 35mm film format (36-by-24 mm) to the sensor diagonal.The angle of view value is computed across the sensor diagonal.All cameras were focused at infinity.Zoom lenses were set at their smallest setting.The sensor sizes have been retrieved from public sources.The EXIF info shows what focal length and/or sensor width information was available in the images.

Table 2 :
Statistics about the measurements of each data set.The target depth-to-width and distance values are computed with respect to a stationary camera, i.e. as if the calibration object was moving.The reported redundancy value is the number of observation minus the number of unknowns.The average max angle is the average of the maximum intersection angle of the observation rays for each target.In data set 3, the control point coordinates were used by the resection process only.For details, see the text. *

Table 3 :
The calibration results of the FULL experiment (upper half) and STABLE experiment (lower half).The maximum distortion is computed at the sensor corners and is given in percent of the sensor half-diagonal.