ON THE QUALITY AND EFFICIENCY OF APPROXIMATE SOLUTIONS TO BUNDLE ADJUSTMENT WITH EPIPOLAR AND TRIFOCAL CONSTRAINTS

Bundle adjustment is a central part of most visual SLAM and Structure from Motion systems and thus a relevant component of UAVs equipped with cameras. This paper makes two contributions to bundle adjustment. First, we present a novel approach which exploits trifocal constraints, i.e., constraints resulting from corresponding points observed in three camera images, which allows to estimate the camera pose parameters without 3D point estimation. Second, we analyze the quality loss compared to the optimal bundle adjustment solution when applying different types of approximations to the constrained optimization problem to increase efficiency. We implemented and thoroughly evaluated our approach using a UAV performing mapping tasks in outdoor environments. Our results indicate that the complexity of the constraint bundle adjustment can be decreased without loosing too much accuracy.


INTRODUCTION
Precise models of the environment are needed for several robotic applications and are central to several UAV-based services. Most SLAM and visual mapping systems use a form of bundle adjustment (BA) for simultaneously refining camera pose parameters and 3D point coordinates. Thus, effectively solving the BA or the underlying error minimization problem is essential for many approaches such as structure from motion (Agarwal et al., 2011) and online SLAM or visual odometry. BA has favorable properties: it is statistically optimal in case all statistical properties are modeled and considered correctly, it is efficient in case sparse matrix operations are used, and can be parallelized. A broad review is given by Triggs et al. (2000). Steffen et al. (2010) have shown that rigorous BA can be formulated based only on epipolar and trifocal constraints. The epipolar constraint is a relation between two camera views, that enforces an image point to be on the epipolar line described by an corresponding image point in another image and the essential or fundamental matrix between the views (Hartley and Zisserman, 2004). A trifocal constraint between image points is necessary if the corresponding scene point lies on the trifocal plane, which practically always is true for neighbored images in an image sequence, where projection centers are collinear or nearly collinear. Epipolar and trifocal constraints lead to implicit functions that enforce the intersection of bundle of rays in 3D space without explicitly representing 3D point coordinates. This reduces the number of unknown parameters of the underlying optimization problem to the camera pose parameters. The obtained normal equations are equivalent to the normal equation system of classical BA when applying the Schur Complement to eliminate the unknown 3D point coordinates. Its solution is therefore in statistical terms as optimal as classical BA.
BA based only on epipolar and trifocal constraints has several advantages over classical BA: • it allows to integrate image points, whose projections rays have small parallactic angles, which in classical bundle adjustments would lead to 3D points lying numerically at infinity. Including such observations increases the generality of BA and improves the estimated rotations of the camera pose parameters (Schneider et al., 2012); • it leads directly to the normal equation system reduced to the camera pose parameters. Classical BA needs to apply the Schur Complement to eliminate the 3D points; • it does not require an initial guess for the locations of the 3D points, which is required for classical BA; • it allows to arrive at approximate solutions, e.g., by neglect correlations between multi-view constraints and the relinearization of observations, which significantly increases the efficiency without a substantial loss in accuracy.
Nevertheless, the trifocal BA formulation without simplifying approximations leads to higher computational complexity than classical BA. Because of the implicit epipolar and trifocal constraints one needs to employ the Gauss-Helmert Model for optimization, which requires the costly determination of corrections for all observations, which is not needed in the Gauss-Markov Model employed for classical BA. To substantially reduce the computational complexity Indelman et al. (2012) propose to neglect (1) correlations between the constraints and (2) corrections to the observations during optimization and (3) fix the weights for the individual constraints after the first iteration.
In this paper we investigate (1) the gain in efficiency and (2) the loose of quality of several assumptions, which lead to an approximate solution of BA which substantially reduces computational complexity. Additionally we propose a new formulation for trifocal constraints which can be employed in BA without structure estimation. Contrary to formulations of trifocal constraints in previous work it does not degenerate in specific situations.

RELATED WORK
Sparse BA is most efficient in case a sparse representation is used (Hartley and Zisserman, 2004). The publicly available software package SBA for generic sparse bundle adjustment by Lourakis and Argyros (2009) is used for example in a modified version in Bundler (Agarwal et al., 2011) to solve large-scale structure from motion. Konolige (2010) introduced Sparse SBA (sSBA) which exploits the sparse secondary structure of sparse camera to camera relations, which increases computational and memory efficiency. More recently, the popular software package g2o (Kümmerle et al., 2011) shows a comparable efficiency as sSBA but uses a more generic formulation of the optimization problem using factor graphs.
BA without structure estimation has been proposed by Rodríguez et al. (2011), but their approach relies only on epipolar constraints, which are not able to transfer a consistent scale between cameras having parallel epipolar planes which occur on straight camera trajectories. Thus Steffen et al. (2010) propose to use epipolar and trifocal constraints in BA without structure estimation. But their trifocal constraints can not be computed in a closed form expression which is why a stable condition needs to be sampled, where the number of samples is not fixed. Indelman et al. (2012) propose simplifying approximations to the optimization problem by rewriting the implicit trifocal and epipolar constraints into explicit expressions. This way the authors obtained a pose graph formulation, which can be optimized with the computational efficient incremental smoothing and mapping (iSAM) algorithm by Kaess et al. (2012). But their approach can not handle all possible camera configurations.

CLASSICAL BUNDLE ADJUSTMENT
The general objective of BA is to optimally estimate camera rotations Rt, camera positions Zt and 3D point coordinates Xi simultaneously. In the following, we assume that each observed 2D image point xit in view t is associated to a certain 3D point i and that the intrinsic camera calibration is given by calibration matrix Kt. Given an initial guess, i.e., knowing approximate quantities R a t , Z a t and X a i , the reprojection with projection matrix Pt = Kt R aT t I3 | − Z a t yields the homogeneous image point With the three rows P1,t, P2,t and P3,t of Pt we obtain the reprojected image point in Euclidean coordinates x a it = P1,t X a i /P3,t X a i , P2,t X a i /P3,t X a i T and the reprojection error vit = x a it −xit in the image plane. Assuming the image points to be corrupted with mutually uncorrelated Gaussian noise Σx it x it , maximum likelihood estimates are obtained by iteratively improving the unknown parameters by minimizing the squared Ma- x it x it vit using the Gauss-Markov model, see (Förstner and Wrobel, 2016, Sect. 4.4).
With T camera poses and I observed 3D points, the total number of unknown parameters to be optimized counts 6T + 3I. If one is only interested in estimating the camera poses, the normal equation system can be reduced to the 6T pose parameters by applying the Schur complement. However, as BA needs to be solved iteratively due to its non-linearity one is forced to compute the 3D point in each iteration, even if they are not of interest. In contrast to that, we can directly obtain the reduced normal equation system without applying the Schur complement or determining 3D points by employing epipolar and trifocal constraints, which are introduced in the next section.

EPIPOLAR AND TRIFOCAL CONSTRAINTS
The classical solution to BA seeks to minimize the reprojection error of corresponding points. Alternatively, we can formulate an error minimization problem that exploits constraints from the epipolar geometry of image pairs as well as constraints that result from observing the same point from three different camera images, so-called trifocal constraints. This formulation has the advantage, that only the camera extrinsics are unknown parameters and we do not need to estimate the 3D point parameters in the optimization. After describing how to formulate such constraints in this section, we explain in Sec. 5 how to consider them in BA.
Given three corresponding image points (x1,x2,x3) in three views t = 1, 2, 3, we need to formulate three independent constraints (g1, g2, g3) for each correspondence related to a 3D point. Two-view epipolar constraints do not allow to transfer a consistent scale given straight trajectories with collinear projection centers (Rodríguez et al., 2011), which usually appear in image sequences. We always use one trifocal and two epipolar constraints, which are simpler and one trifocal constraint is sufficient.
The first two constraints are epipolar constraints and enforce the camera rays to be on their epipolar lines w.r.t. the first camera where S(·) is the skew symmetric matrix of the input vector. Note that we assume here, that the the camera calibration is given, such that we can convert an image point x into a ray direction x, e.g.in case of a pinhole camera with calibration matrix K by The third constraint enforces the intersection of all ray directions in a single point, which we formulate in the following way. Consider two planes A2 and A3 that go along the ray direction x2 and x3 and are projected as 2D lines l2 and l3 such that we have A2 = P T 2 l2 and A3 = P T 3 l3, see Figure 1. The intersection of both planes in 3D yields the 3D line and along the ray direction x1 has to intersect L23 in a single point. Using homogeneous coor-dinates this leads to the constraint where D is the dualizing matrix.
We need to guarantee that L1 intersects L23 in one single point.
In order to achieve a numerically stable constraint we choose the two lines l2 and l3 and specify their direction v2 and v3 to be perpendicular to the epipolar lines in the second and third image, such that we have When using the constraint g3 in an estimation procedure, the vectors v2 and v3 can be treated as fixed entries.
These constraints work for all points if they are not close to an epipole, which would also not work in a classical BA. Then at least two projection planes are nearly parallel and the intersecting line is numerically unstable or, in case of observational noise, inaccurate. This especially holds for forward motion, for which image points close to the focus of expansion, i.e. the epipole, cannot be handled.
If none of the image points are close to the epipoles, then, following Figure 1, the two projection planes A2 and A3 of the second image intersect the projection line L1 of the first image in well defined points and are not parallel, thus have a well defined intersection line L23, which therefore needs to pass the projection line L1. Hence, the triplet constraint never has a singularity and fixes the image points x2 and x3 perpendicular to the epipolar lines w.r.t. the first image used in Eq. (2) and (3).

Indelman (2012) uses the trifocal constraint
but this formulation degenerates in case the epipolar plane normals n12 and n23 of first and second camera and second and third camera are perpendicular. The constraint projects normal direction n12 given with different lengths in Eq. (9) by S(R2x2)R1x1 and S(R1x2)(Z2 −Z1) on normal direction n23 given with different lengths by S(R3x3)(Z3 − Z2) and S(R3x3)R2x2. In case of perpendicular normal directions, the constraint would be fulfilled under multiple solutions.
So far we have only considered three-view correspondences. In case of correspondences in less than three view we can only apply the epipolar constraint (2). In case of Ni > 3 correspondences, we need to avoid to use the same constraints twice, and use only independent constraints between the different views. Each corresponding image observation contributes with two constraints, therefore the total number of constraints between corresponding views counts (2Ni − 3). Each correspondence needs to be involved in at least one epipolar and one trifocal constraint. Indelman et al. incorporate an new image of an image sequence by formulating epipolar constraints between the last two recent images and trifocal constraints between the last three recent images.
As in a classical BA the estimation of the poses of calibrated cameras is only possible up to a similarity transformation. To overcome the 7 DOF ambiguity of the overall translation, rotation and scale, we define either the gauge by imposing seven centroid constraints on the approximate values of the projection centers. This results in a free BA, where the trace of the covariance matrix of estimated camera poses is minimal. Or we estimate the camera poses relative to one camera, which fixes six DOF. To define the overall scale, we constrain two cameras to have a certain distance to each other, see (Förstner and Wrobel, 2016, Chapt. 4.5).

TRIFOCAL BUNDLE ADJUSTMENT
We sketch the maximum likelihood estimation with implicit functions, also called estimation with the Gauss-Helmert model, see (Förstner and Wrobel, 2016, Chapt. 4.8) and relate it to the classical regression model, also called Gauss-Markov model. This is the basis of four variants for simplifications. These lead to approximations which are then compared with the statistically optimal ones w.r.t. accuracy and speed of convergence.

Gauss-Helmert Model
The Gauss-Helmert model starts from G constraints, g = [gg], among the N observations l = [ln], which are assumed to be a sample of a multivariate Gaussian distribution N (IE(l), σ 2 0 Σ a ll ), and U unknown parameters x = [xu]: We assume the covariance matrix of the observations is approximately Σ a ll , hence we assume σ0 = 1; we will be able to estimate this factor later. Given observations l = [ln] there are no parameters x for which g(l, x) = 0 holds. Therefore the goal is to find corrections v of the observations and best estimates x such that the constraints between the fitted observations l = l + v and the estimated parameters x hold and the weighted sum of the squared residuals is minimum.

Solution in the Gauss-Helmert model
The solution is iterative. Starting from approximate values l a and x a for the fitted observations l and the estimated parameters x we determine corrections ∆l and ∆x to iteratively update the fitted observations and the unknown parameters Each iteration solves for the corrections ∆l and ∆x with the linearized substitute constraints Observe, due to we introduce the covariance matrix of the constraints which we assume is regular, thus has the weight matrix of the constraints W gg as its inverse.
We can determine the corrections in two steps. First, the corrections ∆x are determined from the linear equation system . (17) Second, we determine the corrections ∆l from From Eq. (12) we can determine the estimated variance factor and the weighted sum of the residuals Eq. (12) evaluated at the estimated values 5.1.3 On the Structure of the Weight Matrix W gg If each constraint gj depends on one observational group lj, whose observations are not part of another constraint g j , the covariance matrix Σgg is diagonal. The same holds, if the constraints can be partitioned into groups g i which only depend on one observational group li; then the covariance matrix Σgg is block diagonal.
If the corresponding Jacobians for each group are A T i and B T i , the matrix N of the normal equation system can be expressed as a sum over all constraints: and accordingly If a group of constraints g i shares observations, as in our case, the covariance matrix Σg i g i = B T i Σ l i l i Bi will not be diagonal or block diagonal any more. Then their inverse, i.e. weight matrix, will be full in general.
For example, in BA, all Ni observations referring to the same scene point Xi will have a sparse but not diagonal covariance matrix Σg i g i , hence a full weight matrix of size Gi×Gi. Let us consider three epipolar constraints g = 1, 3, 5 and two trifocal constraints g = 2, 4 between the image points of four consecutive images, xit, t = 1, 2, 3, 4. Then the structure of B T i for this group of cameras will be as follows and the covariance matrix will be Σg 1 g 1 Σg 1 g 2 Σg 1 g 3 0 0 Σg 2 g 1 Σg 2 g 2 Σg 2 g 3 Σg 2 g 4 Σg 2 g 5 Σg 3 g 1 Σg 3 g 2 Σg 3 g 3 Σg 3 g 4 Σg 3 g 5 0 Σg 4 g 2 Σg 4 g 3 Σg 4 g 4 Σg 4 g 5 0 Σg 5 g 2 Σg 5 g 3 Σg 5 g 4 Σg 5 g 5 with Σg j g k = B T j Σ ll B k . The covariance matrix has in general a full inverse. Hence, matrix N reads as The effort of inverting the generally sparse matrix Σg i g i can be significantly reduced, if the matrix product F i = W g i g i Ai is determined by solving the (generally sparse) equation system Σg i g i F i = Ai for F i.

Solution in the Gauss-Markov Model
The solution for the estimated parameters can also be obtained from a Gauss-Markov model when substituting into Eq. (14). Using Eq. (15) and cg from Eq. (17) we immediately obtain the linearized Gauss-Markov model which leads to the same estimates as in Eq. (17).
The iterative solution of this model, however, has to take the linearization point for the Jacobians A and B into account, which are the fitted observations l and the the estimated parameters x. Hence, the result of estimation in the Gauss-Markov model only is the same, if we in each iteration step determine ∆l via Eq. (18) to obtain the fitted original observations l via Eq. (13). This is possible, but requires access to the Jacobian B. Then there is no difference between the Gauss-Markov and the Gauss-Helmert model. In addition, we need the inverse of the covariance matrix Σgg, which in general will not be a diagonal block matrix with small blocks referring to groups of two or three constraints.
These are reasons to investigate approximate solutions, which can be expected to be computationally more efficient.

Approximations of the Optimal Model
We address four cases of simplifications of the original estimation model. All are approximations of the original model and lead to suboptimal results.

CASE A: Approximated Jacobians
The Jacobians A and B are approximated, by linearizing at the original observations l, instead of at the fitted observations l. The approximation will increase if the standard deviations of the observations increases, or if there are outliers in the observations. The suboptimality of this approximation has already been discussed in (Stark and Mikhail, 1973).

CASE B: Approximated Weights of the Constraints
The matrix W gg is approximated by neglecting the correlations between the constraints. Hence we use the inverse of the diagonalized covariance matrix, with the rows bg of B. This significantly reduces the effort for determining W gg . For CASE B we assume the Jacobians A and B are taken at the estimated parameters and the estimated observations. This can only be realized within the Gauss-Helmert model, since otherwise the estimated observations l are not available.

CASE C: Approximated Jacobians and Weights for the Constraints
We approximate both the Jacobians, by linearizing at the given observations, and the weight matrix, by neglecting the correlations between the constraints. This approximation is useful when applying the Gauss-Markov model.

CASE D: Approximated Weight Matrix W gg of the First Iteration
We approximate the weight matrix W gg by that obtained in the first iteration. This reduces the computational burden in the further iterations.
The weight matrix W gg then depends on the approximate values for the observations and the parameters. Since the approximate values for the parameters usually deviate more from the estimated parameters, than the observations deviate from their fitted values, the degree of approximating the weight matrix by the one of the first iteration mainly depends on the quality of the approximate values for the parameters.
This type of approximation may refer to the full weight matrix or to its diagonal version, as in CASE C. Here, we assume the constraints are treated as uncorrelated. Then we arrive at the same iteration scheme as Indelman et al. (2012).
We will investigate the effect of these four approximations onto the result as a function of the noise level, namely the assumed variance σ 2 0l of the observations, and the variance σ 2 0x of the approximate values x a . In order to be able to make the two standard deviations comparable, we assume they describe relative uncertainties with unit 1, for angles units radians. The standard deviation σ 0l is the directional uncertainty σ l /c, where σ l is the standard deviation of the image coordinates and c the focal length.

Generating Approximate Values with a Specified Relative Precision
We perform tests with simulated data by taking the final estimates of real datasets as true values and artificially generate noisy observations and noisy approximate values. In this section we describe how to generate approximate values for the rotation matrices Rt and the positions Zt of the projection centers.
The relative precision of directions or angles is easily specified by their standard deviation measured in radians. Hence if we prespecify the relative precision of the approximate values with σ0x, e.g. σ0x = 0.01 = 1 %, we just need to deteriorate the rotation axes and rotation angles by zero-mean noise with standard deviation σα = σ0x.
The relative precision of the coordinates of a set of camera positions, say Zt, is less clear. We propose to use the standard deviation of the direction vectors D tt = (Zt − Z t )/d tt , with the distance d tt = |Zt − Z t | between two neighbouring points Zt and Z t ; here we assume isotropic uncertainty. Furthermore we take relative standard deviation of the distance d tt between neighbouring points, i.e. σr tt := σ d tt /d tt as measure.
There is no obvious way to generate a set of points such that the average relative standard deviation of a given point set fulfills this measure. The following approximation appears sufficient for the experiments. We assume the true values of the camera positions are given by Zt. We distort them by taking them as approximate values. Then we generate disturbing observations, namely the coordinate differences D tt with a covariance matrix of ID(D tt ) = σr d tt I3, with σr = σ0x. We only use pairs (tt ) ∈ T from a Delaunay triangulation. Since coordinate differences alone do not allow to estimate the coordinates, we fix the gauge by requiring the sum of all estimated coordinates to be zero. Therefore we have the following linear Gauss-Markov model with constraints Minimizing tt |D tt | 2 under the constraint leads to estimates Zt. Due to the estimation process, their relative standard deviations will generally be smaller than σrd tt , for all t in the neighbourhood of t. Hence, we need to increase the distance of the points Zt from the true values adequately: By taking the average relative variance we can adapt all coordinates by and thus achieve σr = σr = σ0x.

Evaluating the Results of the Approximations
As quality measure we use the differences between estimated pose parameters xCASE obtained using the the approximation of a certain case and the true values x. Due to the freedom of choosing seven gauge parameters for BA, we can only compare U = 6T −7 parameters, if we have T unknown poses.
In order to illustrate the loss in accuracy we will employ the rootmean-square error (RMSE) of the of deviations of the coordinates and of the rotation angles We will give the deviation of the RMSECASE of each case from the RMSE0 obtained with the rigorous estimation. In addition to the deviation of the RMSE we will report the loss in accuracy for each case compared to the ideal solution. Observe, these measures do not take the inhomogeneous precision of the estimates into account and depend on the chosen gauge.
Therefore we also provide the squared normalized Mahalanobis distance with the covariance matrix Σ x x of the parameters obtained using the rigorous estimation. The squared normalized Mahalanobis distance F is a test statistic which follows a Fisher distribution F (U, ∞), if the model is statistically optimal, which is the zero hypothesis H0. It is a sufficient test statistic and does not depend on the chosen gauge. The expected value for F is 1, the one-sided confidence interval for a significance level S is [0, F (U, ∞, S)]; we use S = 0.99 in the following.
In addition to the Fisher test statistic F we also give the loss in accuracy related to the Fisher test statistic Hence, we assume the loss in accuracy is induced due to a bias x b,CASE caused by the approximation, so that xCASE = x+ x b,CASE .

EXPERIMENTS
Our experimental evaluation is designed to investigate the accuracy decrease of BA when applying the individual approximations proposed in Sec. 5.2, which are meant to increase efficiency. We illustrate the loss in accuracy as a function of the noise level, namely the standard deviation of observations, and on the relative precision of camera poses. We evaluate the individual simplifications on two image sequences recorded on different UAVs.
The first image sequence BUILDING contains 119 images taken with a 5 MPixel camera with a focal length of c = 1587.87 pixel on a 5 kg UAV platform, triggered each second. The flight was guiding the UAV along the facade of a house, the variation in position is around 60 m and 15 m in height, see Figure 2.
The second image sequence FIELD contains 24 images taken with a 12 MPixel camera of the DJI Panthom 4 with a focal length of c = 2347.1 pixel. The camera was pointing downwards while the copter was flying a meandering pattern at 100 m height with three stripes, each stripe consists of eight images, see Figure 3. The images have an front and sidelap of 80 %, this way the 24 images cover an area of 100×90 m 2 .
For both datasets we match interest points in the images to obtain corresponding image points. In order to determine the simplification effects, we need ground truth for camera poses and corresponding image points, to incorporate deteriorations under controlled conditions. We use the observed image coordinates as input for the BA software BACS (Schneider et al., 2012) to obtain estimated pose parameters and fitted image points for two realistic UAV flight scenarios, which are consistent and are used as ground truth.

Checking the Rigorous Reference Solution
First we check how the rigorous trifocal BA reacts on different noise level σ 0l of observed image points on the BUILDING dataset. The estimated variance factor σ 2 0 , see Eq. (19), needs to become one, in case the noise level σ 0l used to deteriorate the image points is used also in the covariance matrix Σ a ll in Eq. (10). We follow Sec. 5.3 to generate deteriorated approximate values by using a moderate relative precision of σ0x = 0.001. We deteriorate the observations on each noise level 100 times with different random noise and apply the trifocal BA. Figure 4 shows the mean of the obtained standard deviation σ0 of estimated variance factors using different noise levels σ 0l to disturb the observed image points. Having high noise we observe, that σ0 deviates from one, as then second order effects, which are neglected in the estimation procedure of BA, become visible. The effects are negligibly small and within the tolerance bounds [0.9943, 1.0058] of  the fisher test using a significance level of 1 %. For the following evaluation of the accuracy decrease of the individual approximations we will use a maximum noise level of σ 0l = 0.003. for observations ∆ln and parameters ∆xu are small compared to their standard deviation, | ∆ln/σ 0l | < Tc, | ∆xu/σx u | < Tc, with a threshold Tc = 0.001, thus requiring the corrections to be less than 0.1 % of their standard deviation. This requires σx u to be known, which is for this experiment derived in each iteration from the inverse normal equation matrix.

Effects of Approximations
We now experimentally evaluate the decrease of accuracy when applying the individual approximations for BA proposed in Sec. 5.2 on the two image sequences recorded by UAVs.
We add normal distributed noise to the true observation values with different magnitudes to obtain different noise levels. We use a moderate relative precision of σ0x = 0.001 to deteriorate the approximate values of the camera poses. After that we optimize the pose parameters with the rigorous estimation, which is called CASE 0 in the following, and with the approximations of CASE A-D. With the estimated and true pose parameters we can determine the root mean square error of the estimated camera positions and rotations according to Eq. (33) and Eq. (34). For each noise level we randomly generate 100 times different noise for the observations and determine the RMSE for each case. In both datasets the approximation made in CASE A induces the smallest deviations to the optimally estimated coordinates and rotations in both datasets, the deviations induced by the approximation of CASE B are almost twice as big. CASE C, which contains the approximations of CASE A and CASE B, shows slightly higher deviations than CASE B, thus is mainly affected by the approximations of CASE B. CASE D shows smaller deviations than CASE C, even though it contains an additional approximation. The reason could be the high relative precision σ0x used in this experiment. The convergence of CASE D is affected by the relative precision as the weight matrix W gg is fixed after the first iteration, while CASE A-C are not affected. Thus we will investigate the decrease in precision of CASE D by varying σ0x in a further experiment.   The RMSE does not take the inhomogeneous uncertainty of the estimated positions and rotations of all images into account and depends on the chosen gauge. Thus we use the squared normalized Mahalanobis distance given in Eq. (37) which considers the covariance information of the parameters, which are obtained by the rigorous estimation. Table 1 lists the mean loss in accuracy ∆FCASE of the estimated pose parameters in percent when applying the individual approximation cases under different noise levels σ 0l . The loss of accuracy ∆FCASE is obtained with Eq. (38). The obtained values are similar to the values obtained by using the root mean square error.
CASE A, CASE B and therefore also CASE C are mainly affected by the noise level σ 0l , whereas CASE D is affected from both, noise level σ 0l and the relative precision σ0x of the approximate values. Therefore we also investigate the average decrease in precision by varying σ0x from a moderate relative precision of 0.1 % to an inferior relative precision of 10 % to deteriorate the approximate values of the camera poses. We use a moderate noise level  of σ 0l = 0.001 for the observations by adding normal distributed noise to the true observation values with different magnitudes to obtain different noise levels. Note that the estimated parameters converge differently when applying randomly generated approximate values. Thus, we randomly generate 100 times different noise for the observations and approximate values and determine the mean loss of accuracy ∆FCASE D for each relative precision level σ0x. Table 2 shows the obtained ∆FCASE, which increases with the relative precision σ0x of the approximate values for the camera poses.
In both experiments the decrease in accuracy is less than a 1/3 of noise variance. This is a moderate loss, and -if computing time is essential -may be accepted.

CONCLUSION
In this paper, we presented an approach to bundle adjustment without structure estimation by employing epipolar and trifocal constraints between corresponding image points. We introduced a novel closed-form expression for the trifocal constraint, which does not degenerate at certain configurations. The proposed bundle adjustment is as optimal as classical bundle adjustment, but leads to more computational complexity as the Gauss-Helmert model needs to be employed for optimization.
We evaluated the quality decrease of simplifying approximations which allow to employ the Gauss-Markov model to increase the  Table 2. The loss in accuracy ∆FCASE D in percent of estimated pose parameters at noise level σ 0l = 0.0001 and different relative precision σ0x of approximate values, both in radian. computational efficiency on two datasets acquired by UAVs. The empirically investigated loss in accuracy of the estimated camera pose parameters are shown to be small in case of small noise in the observations. In spite of this favorable result w.r.t. the investigated approximations, the effect of the approximations onto outlier detection, which relies on the variances of the residuals needs to be investigated, in order to identify the loss in the power of outlier detection methods.