A HYBRID GLOBAL IMAGE ORIENTATION METHOD FOR SIMULTANEOUSLY ESTIMATING GLOBAL ROTATIONS AND GLOBAL TRANSLATIONS

In recent years, the determination of global image orientation, i.e. global SfM, has gained a lot of attentions from researchers, mainly due to its time efficiency. Most of the global methods take relative rotations and translations as input for a two-step strategy comprised of global rotation averaging and global translation averaging. This paper by contrast presents a hybrid approach that aims to solve global rotations and translations simultaneously, but hierarchically. We first extract an optimal minimum cover connected image triplet set (OMCTS) which includes all available images with a minimum number of triplets, all of them with the three related relative orientations being compatible to each other. For non-collinear triplets in the OMCTS, we introduce some basic characterizations of the corresponding essential matrices and solve for the image pose parameters by averaging the constrained essential matrices. For the collinear triplets, on the other hand, the image pose parameters are estimated by relative orientation using the depth of object points from individual local spatial intersection. Finally, all image orientations are estimated in a common coordinate frame by traversing every solved triplet using a similarity transformation. We show results of our method on different benchmarks and demonstrate the performance and capability of the proposed approach by comparing with other global SfM methods.


INTRODUCTION
Image orientation (also known as Structure-from-Motion -SfM or pose estimation) plays a key role in the field of photogrammetry and computer vision. Although this topic has been very well studied in the last several decades, it recently again caught the interest of photogrammetrists due to the increasing number of images (e.g., images shared through websites) and images taken without proper acquisition planning. Today, according to the procedure in which images are oriented, there are typically three different strategies to solve this problem: incremental, hierarchical and global methods. Incremental SfM (Snavely et al., 2006;Agarwal et al., 2009;Schönberger and Frahm, 2016;Wu, 2013;Wang et al, 2018 and starts with an initial subset of images, e.g., initializing a small reconstruction, and iteratively adds further images to the block with repetitive intermediate bundle adjustment . Farenzena, et al. (2009), Mayer (2014 and Toldo, et al. (2015) present a so called hierarchical method, which improves the incremental idea by first dividing the images into overlapping subsets, and then processing all subsets individually by incremental SfM, finally merging them in a hierarchical way with a number of bundle adjustments. Both of these strategies are relatively slow because of the repeated use of bundle adjustments. To overcome this problem, Martinec & Pajdla (2007), Arie-Nachimson et al (2012), Jiang et al. (2013), Moulon et al. (2013), and Wang et al. (2019a and) present global SfM methods which first estimates all available image pose parameters, and then perform only one final bundle adjustment for refinement. All above mentioned global methods have a common limitation: they work in two individual steps. Only after image rotations have been solved, the translation parameters can be estimated. This can create problems, if rotations are incorrectly estimated. In addition, many researchers (Wilson and Snavely, 2014;Shah et al, 2018) also found that global methods are sensitive to outliers of relative orientations, since outliers are difficult to detect in global computations. We are most interested in those time efficient strategies, and thus present a novel hybrid global image orientation approach. To improve the time efficiency and robustness, among the overlapping image pairs and their corresponding relative orientations we first extract an optimal minimum cover connected triplet set (OMCTS) such that it not only includes all available images with a minimum number of triplets, but also makes the corresponding relative orientations within extracted triplets as compatible as possible. Then, we apply a hybrid method by considering non-collinear and collinear triplets separately, where collinear means the three image projection centres are collinear (see Fig. 1 for the workflow of our method). For the non-collinear triplets, we make use of algebraic constraints of the corresponding essential matrices and derive eligible essential matrices, subsequently image pose parameters are computed by essential matrix averaging. For collinear triplets, this method is invalid, thus their image pose parameters are recovered by using relative orientations with the depth of object points from individual local spatial intersection. As the estimated image pose parameters are given in local coordinate systems of each individual triplet, we need to transform them into a global unified system. We do so by traversing image poses of each individual triplet using similarity transformations. Finally, we run one single bundle adjustment to refine our results.
Our main contributions are threefold: First, we present an idea to build an optimal minimum cover connected triplet set that can combine both, collinear and non-collinear triplets. Second, we introduce a hybrid method to solve the image pose parameters for both non-collinear and collinear triplets separately, from which global pose parameters in a unified coordinate system are estimated. Finally, via testing various settings for solving noncollinear triplets, a very reasonable set is suggested, and our hybrid method is then evaluated by comparing the results with other global SfM methods using various benchmarks.

RELATED WORK
Recently, research of image pose estimation or SfM has become very active, since more and more complicated datasets have become available, e.g. images downloaded from the Internet and images with repetitive structures and critical configurations (Wang et al., 2019c). In this section we review some state-of-theart works in global SfM research. Typically, global SfM methods are conducted in two separate steps, global rotation estimation and the subsequent global translation estimation. However, there are also some works, called integrative global methods, which estimate global rotations and translations simultaneously.
Global rotation estimation. The problem of global rotation estimation from relative rotations of image pairs have been studied by many researchers. Govindu (2001) use quaternions to represent rotations, global quaternions are determined by a constrained least squares optimization. Martinec and Pajdla (2007), Arie-Nachimson et al (2012) and Moulon et al. (2013) first relax the constraints on rotation parameters and present a linear homogeneous equation system, which is solved using SVD (singular value decomposition). Hartley et al. (2011) present a robust iterative method using L1 norm optimization based on the Lie algebra of SO(3). Chatterjee and Govindu (2013) present a two-stage approach: they first calculate initial global rotation using a minimum spanning tree and then refine the solution with an iterative reweighting scheme combining the Lie algebra of SO(3). Reich et al. (2016Reich et al. ( , 2017) present a method which extends the approach of Chatterjee and Govindu (2013), they studied the algebraic characterization of relative rotations in multi-image settings and apply a convex relaxed semidefinite program to obtain a more robust initial solution which is further refined by using Lie algebra of SO(3).

Global translation estimation.
Unlike global rotation parameters, global translation parameters cannot be directly estimated, since the baseline of an image pair has an arbitrary length. Nevertheless, many methods have been studied for global translation estimation. Govindu (2001) present an iterative reweighting scheme to obtain global scale unified translation vectors; however, this method is invalid in degenerate cases, e.g., when projection centres of images are (nearly) collinear. Jiang et al. (2013) present a solution which can solve degenerate cases by using depth information of tie points; a global linear equation system is built by concatenating connected triplets. Since the triplets are required to be well connected, this method normally recovers fewer images. It was extended by Cui et al. (2015) and Wang et al. (2019b), they first solve for the global scale factor for each eligible relative translation and then resize all relative translations such that they are all in the same global scale unified system, and finally global translations are estimated by using those resized relative translations. Wilson and Snavely (2014) propose a method called 1DSfM, to robustify their result, they first detected blunders of relative translations by projecting the 3D relative translation into different 1D direction vectors. Typically, the blunders clearly stand out in some directions of the 1D vectors. Then, a non-linear method based on inliers of relative translations and tie points is proposed, this non-linear method is not guaranteed to converge when outliers exist. By using collinearity equations and the information of tie points, Wang et al. (2019a) propose a linear global method. Given the global rotation and tie point information, they first selected some robust tie points that can connect all available images into the same photogrammetric block. Then, the translation parameters and selected 3D tie point coordinates are solved simultaneously. But, as the number of images increases, so does the number of unknown tie points, which brings much more computational burden for the linear global method.
Integrative global method. Recently, ideas were published to avoid having to compute rotation and translation separately. Bourmaud et al. (2014) derive the image pose parameters as a Lie group SE(3), they propose a generative model based on the formulation of a concentrated Gaussian distribution on the matrix Lie group and solve an iterated extended Kalman filter on that group to compute the elements of SE(3). Kasten et al. (2019a) propose a method to globally recover the projection matrix of each image by using fundamental matrices of image pairs. However, as the projection matrix yields a projective reconstruction, information on interior orientation parameters cannot be introduced. Later, the authors extended their work. Exploring the algebraic characterizations of essential matrices, they introduced a method to simultaneously solve for rotation and translation of each image from essential matrices (Kasten et al., 2019b). The disadvantage is that this method cannot deal with projection centres that are all (nearly) collinear.
The remainder of this paper is structured as follows: In Section 3 we introduce some basics of essential matrices in multi-image settings. Section 4 describes our method of estimating image pose parameters by using the information of triplets. In Section 5, we report results of experiments on various benchmarks to evaluate our method. Finally, Section 6 concludes our work.

THE N-IMAGE ESSENTIAL MATRIX
Following partial content of Kasten et al. (2019b) to make this paper more self-contained, we next give some definitions and corollaries with respect to the so called N-Image essential matrix.
Given a set of n images which are denoted as 1, 2, 3,…, n, let t i ϵ ℝ 3 and R i ϵ SO(3) be the translation and rotation parameters of image i in a global coordinate system. The essential matrix of two images i and j can be derived as

Definition 1.
A matrix E ϵ Sym3n (Sym3n denotes the space of all the 3n×3n symmetric matrices), whose 3×3 block matrices are denoted by Eij, is called a N-Image essential matrix if ∀ i≠j, rank(Eij)=2, and the corresponding two eigenvalues are equal, ∀ Eii=0, where 0 denotes the corresponding zero matrix.
A scale consistent N-image essential matrix thus is a matrix representation of the orientation parameters of the N images, coding them in a similar way that the essential matrix does for two images, so that rays of conjugate points intersect.
Corollary 2. Again following Kasten et al. (2019b), it is possible to determinate all image rotation matrices {Ri}i=1, ..., n, projection centres {ti}i=1, ..., n (in a global coordinate system) and n non-zero scalars{αi}i=1, ..., n, given a scale consistent N-Image essential matrix E, where the camera projection centres are not all collinear, in the following way.
1. Do spectral decomposition of E and obtain the eigenvectors A, B of E together with the corresponding eigenvalues to be found in Σ + and Σ − . SVD decomposition is not used, because a standard SVD method has multiplicity of singular values on E with the corresponding rank being equal to 6 and typically sorts the singular values in a descending order, which doesn't produce the specific SVD form as corollary 1 explains.
2. There are in total eight possibilities of √0.5( + ) with ) , because of the sign ambiguity of each eigenvector which can be solved by equation (6), see below.
4. Eij=̂Σ +̂+̂Σ+̂, as Eii = 0 we see that ̂Σ +̂ is skew symmetric; we can derive the projection centre [ti]×= ̂-1̂Σ + . The N-Image essential matrix can thus be regarded as a tool to estimate rotations and translations simultaneously from pairwise essential matrices. However, three practical difficulties exist: First, we can't compute every essential matrix for each pair, because many image pairs do not overlap; second, calculated essential matrices are typically normalized, e.g., when employing the 5-Point algorithm (Nistér, 2004), thus it is very difficult to guarantee for a N-Image essential matrix to be scale consistent if N>3, because the non-zero scalars cannot be set arbitrarily; third, the case that all projection centres are (or nearly are) collinear does exist in many applications, e.g., images captured by mobile mapping car moving along a straight line or aerial images within one strip.

METHODOLOGY
To solve these three practical difficulties, we investigate triplets instead of larger sets of images, which overcome the first two points and then present a hybrid method to separately deal with collinear and non-collinear triplets to avoid the third difficulty.
We first introduce corollary 3.
Corollary 3. Given a non-collinear triplet, the corresponding scale consistent 3-Image essential matrix is invariant to scales (see our proof in the appendix).

Generation of an optimal minimum cover connected image triplet set
We use three images with mutual overlap and extract all related triplets, a corresponding triplet graph is then built as Fig. 1 shows: triplets denote nodes and two triplets are connected to each other, if they share two common images. An optimal subset of these triplets is selected for better time efficiency and robustness. We select such a subset called optimal minimum cover connected image triplet set (OMCTS) with the following requirements: 1) the selected triplets cover all available images and the three relative orientations should be as compatible as possible; 2) triplets from the selected subset are connected, which guarantees that the photogrammetric block will not break; 3) the minimum number of triplets that fulfil the above two requirements is selected.
To identify the compatibility of each triplet, similar to Wang et al. (2019b) and Kasten et al. (2019a), we compute two triplet closure discrepancies with respect to relative rotations and translations, respectively. Given three relative rotations of a triplet, Rij, Rjk and Rki, RijRjkRki = I3×3 should hold. However, this is not strictly the case because of outliers and noise in relative rotations. We can use ∠ ( , 3×3 ) = arccos (( ( − 3×3 ) − 1)/2) as one indicator of the triplet compatibility. The discrepancy in relative translation can be calculated from the difference of the sum of the angles formed by the three projection centres within a triplet and 180°, i.e., ∠ ( , 180°) = |θi + θj + θk -180°| with θi = arccos || |||| || and ||.|| the L2 norm (see Wang et al., 2019b for details). Based on these two criteria, the triplet compatibility indicator is formulated as max ( ∠ ( , 3×3 ) , ∠ ( , 180°) ). Finally, we employ a greedy triplet deleting scheme: starting with the triplet with the largest indicator, a triplet is deleted as long as the remaining triplets are still connected and no image is deleted from the photogrammetric block (see Appendix for more details on generating the OMCTS), note that we introduce our triplet selection process in a less sophisticated way and a more grounded graph theory based explanation is given by Shah et al. (2018).
The collinearity degree of a triplet is determined by the minimal angle among θi, θj and θk. From the triplets selected for the OMCTS, the ones with that minimal angle larger than a threshold θang are considered to be non-collinear, the others are considered collinear.

Solving image pose for non-collinear triplets
Based on Kasten et al. (2019b), this section focuses on noncollinear triplets which are denoted as { } =1 , K is the number of detected non-collinear triplets and nc is the nc-th non-collinear triplet, the corresponding 3-Image essential matrix is denoted as { } =1 , the elements of { } =1 are the unknowns. As input, we have corresponding estimated essential matrices ̆( e.g., using the 5-point algorithm) for each overlapping image pair, and they can be transformed into estimated 3-Image essential matrices denoted as {̆} =1 . Our goal is to first seek a scale consistent 3-Image essential matrix that is as close as possible to the estimated 3-Image essential matrix for all non-collinear triplets and then estimate exterior pose parameters within each non-collinear triplet by using corollary 2. The constrained problem can be formulated as  (1) is not easy due to the non-convex rank defect and block rotation constraints. The the alternating direction method of multipliers (ADMM, Boyd et al., 2011) is used to solve equation (1) iteratively; we can generate an equivalent constrained optimization problem.
(2) subject to rank ( ) = 6; Σ + ( ) = −Σ − ( ); rank ( ) = 6; and are auxiliary matrices for constraints of rank defect and block rotation, respectively. and are two Lagrange multipliers. Initializations are given at sp = 0 (sp denotes the number of iterations) as We then solve (1) iteratively by alternating between the following steps: This is a convex quadratic optimization problem and can be solved using equation (3) subject to rank ( ) = 6; ( ) + ( ) is a block rotation.
Similar to equation (4), the initial guess of would be − −1 , which may violate the extra constraints. To obtain an eligible solution, we do a spectral decomposition for the initial guess. Σ + and Σ − are eigenvalues, A and B are corresponding eigenvectors, A, B ϵ ℝ 9×9 . Now, the requirement is that , Ai and Bi is the corresponding block matrix of A and B. This is also applied in corollary 2.
Let ̈= [̈1̈2̈3] , where ̈ is the closest scaled rotation of √0.5 ( + * ), which is obtained by first computing a SVD of √0.5 ( + * ) and replacing the diagonal matrix of singular values by an 3×3 identity matrix, the average of original singular values is the scale factor. Let ̈= √0.5 ( − * ), we update A and B by ̈ = √0.5 ( ̈+ ̈) and ̈ = √0.5 ( ̈− ̈) , finally (d) Computing and = −1 + - In our experiments, we set ∆ 1 = 100 and ∆ 2 = 0.01 to weight rank defect and block rotation constraints, respectively, and repeat the above four steps 100 times (more interpretations related to settings of ∆ 1 , ∆ 2 and sp are discussed in our experimental section below) and we obtain the scale consistent 3-Image essential matrix. Rotation and translation of each image within one noncollinear triplet are then estimated using corollary 2. Different to Kasten et al. (2019b), where the authors deleted all (nearly) collinear triplets, both non-collinear and collinear triplets are considered in this paper. To deal with collinear triplets, we choose one image as reference and use the information of relative rotations and translations to estimate the exterior orientation parameters of the other two images. Global rotations within one triplet are straightforward to compute: we assign an identity matrix to one image and obtain the other two rotations by propagating the relative rotations. However, global translations within one triplet are not that easy to compute, because the length of relative translations are typically normalized to 1 when decomposing the essential matrix, and this will normally lead to scale ambiguity as Fig. 2 shows. The projection centres C1, C2 and C3 of images {1, 2, 3} are collinear, which generates a collinear triplet. P12 and P13 represent the same object point, but have different positions after triangulation due to the different scales of the two models. Fig. 2

Solving image pose parameters from collinear triplets
where |.| returns length, 12 and 13 are the corresponding Z values (as object points are always in front of cameras, the Z value is guaranteed to be larger than 0). Each three-ray point contributes one λ, we use the idea of Wang et al. (2019b) to obtain a robust solution ̇. Given ̇ and the relative rotations R12, R13 and relative translation t12, t13, we obtain a triplet of scale consistent exterior orientation parameters by formula (11) with the assumption that image pair (i1,i2) has most correspondences within the image pairs of the triplet (note that 23 and 23 are not used in this solution to reduce the computational complexity, and we assume that the relative orientations within the selected compatible triplets can be considered to be accurate after having checked them before). 1 = 3×3 1 = 2 = 12 2 = 12 3 = 13 , 3 = ̇· 13 (11) For all detected collinear triplets, equation (11) is used to obtain rotation and translation of each image.

Estimating all images pose parameters from triplets
We have now estimated the exterior orientation parameters (three rotations and translations per image) within all triplets, whether collinear or non-collinear, which are uniquely determined up to a similarity transformation. For any two connected triplets which share two common images, there is a possibility to compute a unique similarity transformation between these two triplets by using the two corresponding common image pose parameters calculated from individual triplet (Hartley and Zisserman, 2004). Since a minimum cover connected image triplet set has already been generated and the corresponding pose parameters within the triplets are available, the extracted connected triplets can be traversed and similarity transformations between all connected triplets can be applied to transform all exterior orientation parameters into a common coordinate system (see the Appendix for more details of calculating the similarity transformation between two connected triplets).

EXPERIMENTS
To evaluate our method, we implemented the proposed global hybrid image orientation method as the workflow in Fig. 1 shows. We set the free parameter θang to be 0.17 (in radian) for all experiments 1 . The experiments are first conducted on four terrestrial close range datasets, one of them is a public dataset with 128 images around a building (Zach, et al. 2010) which consists of both (nearly) collinear and non-collinear images. The other three test data are benchmark datasets published by Strecha et al. (2008) which are made up of 11 to 30 images. Each of these three datasets is provided with ground truth exterior orientation parameters, which are used for comparison. Finally, we further explore our method by dealing with one set of oblique quasiaerial images from an open public photogrammetric contest 2 (Özdemir et al., 2019). The bundle adjustment of Wang et al. (2019b) integrated with the open source Ceres-solver (Agarwal et al., 2017) is applied for refining the results.

Analyzing various settings of ∆ , ∆ and sp
To inspect the influence of ∆ 1 , ∆ 2 and sp on solving equation (1), we first investigate the rank constraints (i.e., rank( ) = 6) on castle-P30 by calculating the logarithm of the mean ratio between the 7-th and 6-th singular values 10 ( 7 / 6 ) of all triplets in { } =1 for different settings of ∆ 1 , ∆ 2 and sp. In general, a reliable solution of a 3-Image scale consistent essential matrix from equation (1) can generate a very small value for 10 ( 7 / 6 ). The results shown in Fig. 3 indicate that in our experiment 10 ( 7 / 6 ) decreases as the iteration process runs 1 https://github.com/wx7531774. and it starts to become stable at the 80-th iteration. The case of ∆ 1 > ∆ 2 normally generates much smaller values for 10 ( 7 / 6 ) than that of ∆ 1 ≤ ∆ 2 does. Also, the larger the ratio of ∆ 1 / ∆ 2 is, the smaller 10 ( 7 / 6 ) becomes in general, because only if the rank constraint is fulfilled, can the spectral decomposition be processed for the block rotation constraint. So, we typically set a high weight ∆ 1 . However, as Fig. 3 shows, we can't conclude that an infinitely large ∆ 1 is best, because this will lead to the constraint that √0.5( ( ) + ( )) is a block rotation matrix contributing nothing to equation (1). Thus, it is possible that the estimated rotation matrix is not an element of SO(3). To demonstrate this, we test different values of ∆ 1 by fixing ∆ 2 =0.01 and sp=100. The Frobenius norm between the estimated rotation matrix and its closest element in SO (3) is computed for each image denoted as ∆ , then the logarithm for the largest ∆ is computed, the result is shown in Fig. 4. As can be seen, the estimated rotation matrix tend to be further away from SO(3) as ∆ 1 increases. Based on this evaluation and to obtain a reliable and accurate solution for equation (1), we set ∆ 1 = 100, ∆ 2 = 0.01 and sp=100 in our all experiments.

Building dataset
Our hybrid method classifies the triplets of the OMCTS into collinear and non-collinear ones and processes them separately. To show that this strategy is superior to the idea of considering all detected triplets as either non-collinear or collinear, we conduct experiments on the building dataset using three corresponding pipelines: hybrid, all non-collinear and all collinear (they are indicated by "HM", "ANC" and "AC", respectively, henceforth). As this dataset does not have ground truth exterior orientation and Wang et al. (2019b) was demonstrated to provide a reliable result for it, we use the exterior orientation from Wang et al. (2019b) as reference. AN AC HM Reference Figure 5. Motion trajectories of Building of different pipelines. Fig. 5 shows the set of projection centres using the different pipelines (without refinement of bundle adjustment). The green ellipses denote drifts; the larger the ellipse, the bigger the drift, this in other words implies that "ANC" produces the worse result. The reason is that some triplets of the dataset are (nearly) collinear, which violates the non-collinear constraint described in corollary 1, thus, the corresponding estimated 3-Image essential matrix is not scale consistent. "AC" performs better than "ANC". We find that the method described in section 4.3 can actually also be used for non-collinear triplets. However, errors stemming from inaccurate Z values of object points in equation (10) can accumulate in the process of traversing all connected triplets as described in section 4.4, and this can lead to the drift depicted in Fig. 5. "HM" generates the best result, the detected non-collinear triplets satisfy the non-collinearity constraint of "ANC" and the remaining collinear triplets (less triplets compared to "AC") show less error accumulation. In addition, the method for solving collinear triplets only use two necessary relative orientations, which is not be as robust as solving non-collinear triplets using all the relative orientations. Thus, among these three pipelines, based on the result presented our hybrid method is the best one to deal with datasets consisted of both non-collinear and collinear images.

Three benchmark datasets with ground truth
We also inspected three benchmark datasets with ground truth of exterior pose parameters, namely, fountain-P11, Herz-Jesu-P25 and castle-P30 (Strecha et al., 2008). The interior orientation parameters are extracted from the EXIF information. Similar to the Building dataset, we ran the three pipelines ("ANC", "AC" and "HM") for these three benchmarks. Besides, we further compared our results to the results of several recent global rotation and translation estimation methods.
ANC AC HM (c) Castle-P30 Figure 6. Motion trajectory of three benchmarks with different pipelines, red triangles denote the results computed from corresponding pipelines and blue triangles indicate the ground truth exterior parameters. Fig. 6 shows the results for the exterior orientation parameters of these three benchmarks by using the corresponding different pipelines (without bundle adjustment), where the blue triangles represent ground truth and the red triangles indicate the estimated exterior pose parameters (the estimated exterior pose parameters are transformed into the coordinate system of ground truth using the 3D similarity transformation method presented in Wang et al. (2019b)). From Fig. 6, we find that all three pipelines work very well on fountain-P11and Herz-Jesu-P25, as the blue and red triangles are very close to each other and some almost overlap. However, results of castle-P30 look different, a similar phenomenon as described above for the building benchmark: "AC" is better than "ANC", and the proposed method "HM" is the best. This can be explained by the fact that the images of fountain-P11and Herz-Jesu-P25 are all almost non-collinear and the relative orientations are already rather accurate, so error accumulation is not a major problem, thus, all three pipelines perform very well. However, Castle-P30 is closer to building in that it has both collinear and non-collinear triplets, and outliers of relative orientations exist due to repetitive structures.
Visualizations of the results in Fig. 5 and Fig. 6 are provided for a qualitative comparison of the different pipelines. To generate a numerical analysis, based on the three benchmarks with ground truth we calculate the mean rotation error denoted as mean angle error and the mean translation error which are both listed in Tab. 1. From this table, it can be inferred that the exterior orientation parameters (rotation and translation) estimated by "ANC", "AC" and "HM" achieve nearly the same accuracy on fountain-P11 and Herz-Jesu-P25, respectively. The result of castle-P30 shows a very explicit superiority of "HM": the angle and translation error of our hybrid method are approximately 15 to 20 and 5 to 10 times smaller than those of "ANC" and "AC", respectively. What Tab. 1 implies is consistent with Fig. 6, thus, we can conclude that both "ANC" and "AC" can perform very well on small datasets with very few collinear triplets such as fountain-P11and Herz-Jesu-P25 (as Fig. 6 (a) and (b) illustrate), whereas, for the castle-P30 dataset with not only more images but also both, collinear and non-collinear images (see Fig.6 (c)), "ANC" results are invalid due to the non-collinearity constraint requirement, and the performance of "AC" also decreases because error accumulation increases, when more connected triplets are traversed. As in the first test, "HM" provides the best solution for the problem at hand. A visualization of image orientation and sparse 3D object point result after bundle adjustment is shown in Fig. 7. Figure 7. Visualization of benchmarks' SfM results by "HM" (after bundle adjustment). Colourful triangles denote exterior pose parameters; red dots are estimated 3D object points.
To obtain a deeper understanding of the performance of "HM", we compare rotation and translation results of "HM" with those of several global rotation estimation and global translation estimation methods, respectively. Tab. 2 presents numerical results for the mean rotation and translation errors of different methods. Before bundle adjustment, "HM" outperforms all the other methods listed in Tab.2, specifically, the mean angle errors and mean translation error of "HM" are the smallest on all these three benchmark datasets (except for the translation error of castle-P30, where Wang et al. (2019b) is 2 millimetres better than "HM" which is negligible). This is probably a consequence of the fact that we only use some optimal triplets in the extracted OMCTS, the selection acts as a kind of blunder detection method for the relative orientations, whereas the other methods typically employ more redundant relative orientations, and may thus be negatively influenced by relative orientations not spotted as outliers. After bundle adjustment, both rotation and translation accuracies are improved on all benchmark datasets, and remaining differences are negligible. Table 1. Mean angle error R in degree and mean translation error T in meter for different pipelines. We highlight the best results of each dataset. Table 2. Mean angle error R in degree and mean translation error T in meter for different global estimation methods. We compared our rotation results with Chatterjee and Govindu (2013) (Global_R), Reich and Heipke (2016) (1) and Jiang et al. (2015) (2). GR_L2 adopts the "Global_R" method with L2 norm, their corresponding results are provided by Wang et al. (2018). The translation results are compared with Reich and Heipke (2016) (1), Jiang et al. (2015) (2), Wang et al. (2019b) (3) and Wang et al. (2019a) using L1 and L2 norm denoted as Global_T and GT_L2, respectively. Note that the results of (1), (2) and Wang et al. (2019a) are directly cited from the corresponding papers, and we reimplemented the approach of Wang et al. (2019b). The best results of each dataset are highlighted. Figure. 8 Overall view of the simulated urban scenario. Table. 3 Precision assessment. RMS(x), RMS(y) and RMS are the RMS (root mean square) of reprojection residuals (in pixels) in image space in horizontal direction, vertical direction and Euclidean residual. Table. 4 Accuracy assessment in 10 -1 mm. CH1, CH2 and CH3 are the corresponding check bars showed in Fig. 8.

Experiments on oblique aerial image dataset
To further explore the capability of our method, we test another dataset of oblique quasi-aerial images (Özdemir et al., 2019). This dataset includes a set of 420 nadir and oblique images (6016×4016 pixels each) captured in a controlled environment over an ad-hoc 3D test field which simulates a typical urban scenario, as shown in Fig. 8. Three evaluation criteria are proposed to assess the image orientation results: 1. Precision assessment, the reprojection residuals of 115 targets (red crosses in Fig. 8) are used to evaluate the precision of orientation results in image space; 2. Accuracy assessment, three control bars (shown as blue lines in Fig. 8) and three check bars (showed as yellow lines in Fig. 8) with known length are provided to evaluate the accuracy of the orientation results; 3. Relative accuracy assessment, the errors of translation and rotation are evaluated by taking the provided exterior pose parameters as a reference. More information is provided by Özdemir et al. (2019).  Table. 5 Relative accuracy assessment. Taking   is indicated as (III), "-" means the corresponding items are not available. Before bundle adjustment, among these methods we find that "HM" typically generates the best results (as the highlighted items show in these three tables) with only small discrepancies; this finding is basically identical with the results of castle-P30 shown in Tab. 2. "ANC" and "AC" were also tested, however, "ANC" failed when solving equation (1) due to the collinearity of projection centres, see. Fig. 9, and "AC" is also not reliable as we have explained in the last section. The rotation error of (II) and (III) are identical because both of them apply the same method of Chatterjee and Govindu (2013) to estimate global rotations. After bundle adjustment, "HM", (II) and (III) achieve nearly the same precision. (I) is not included in Tab. 5, because the exterior orientation parameters of (I) are the reference for the relative accuracy assessments. The results after bundle adjustment have been published in a contest of image orientation 3 . Fig. 9 is the visualization of image orientation and 3D object points using "HM" from two different perspectives. Figure. 9 Visualization of SfM results by using "HM".

CONCLUSIONS
In this paper, we present a novel hybrid global image orientation method which can solve global rotation and translation simultaneously. Specifically, an optimal minimum cover connected triplet set (OMCTS) is extracted, among which noncollinear and collinear triplets are first solved individually and global exterior pose parameters are then estimated by traversing all these solved connected triplets. Comparisons with several recent global SfM methods on different benchmarks demonstrate that our method can normally provide the best initial estimation of exterior orientation parameters for bundle adjustment. In the future, we will test larger and more interesting datasets, such as images downloaded from Internet (Wilson and Snavely, 2014), as these images are normally unordered which can create additional challenges for extracting the OMCTS. Also, the comparisons before applying final bundle adjustment and the time efficiency of the proposed hybrid method needs to be further investigated.

APPENDIX
1. Corollary 3. Given a non-collinear triplet, the corresponding scaled consistent 3-Image essential matrix is invariant to scales.
2) N ≥ 4, we assume N = 4, similar to N =3. We can also set up equations similar to PM when N = 3 and the corresponding matrix PM = and rank (PM) = 4, there is an infinite number of solutions and we cannot obtain a unique closed form solution like the case of N = 3, this is also true when N > 4.
Hence, the new 3-Image essential matrix ̅ is also scale consistent, which means that a scale consistent 3-Image essential matrix is invariant to scalars.

Algorithm for generating the minimum cover connected image triplet set
Input Original exhaustive triplet set, each triplet's quality indicator, corresponding set of images = {1, 2, 3, … , }.
Output Optimal minimum cover connected image triplet set 1. Build a triplet graph Gτ = {τ, ε }, where τ is the original exhaustive triplet set denoted as nodes and ε are the edges between triplets (two triplets are connected only if they share two common images). 2. Sort all triplets by their quality indicators in descending order, obtain corresponding triplet index set Ind. 3. Start with the triplet of largest quality indicators: Do { Remove τIndj and its corresponding edges from Gτ, then, check that: a. The remaining Gτ is connected; b. The images' number of remaining Gτ doesn't reduced.
If both a and b fulfil, Gτ is successfully reduced by removing the corresponding triplet τIndj, otherwise, we keep Gτ unchanged and try the next iteration by considering j=j+1; }while ( j = {1,2,3,…size of (Ind)}) Finally, the triplets which exist in the remaining Gτ consist of the triplet set that we desire. Figure 10. Two connected triplets.