ROBUST ESTIMATION IN ROBOT VISION AND PHOTOGRAMMETRY: A NEW MODEL AND ITS APPLICATIONS

Robust estimation (RE) is a fundamental issue in robot vision and photogrammetry: it is the theoretical basis of geometric model estimation with outliers. However, M-estimations solved by iteratively reweighted least squares (IRLS) are only suitable for cases with low outlier rates (<50%), and random sample consensus (RANSAC) can only obtain approximate solutions. In this paper, we propose an accurate and general RE model that unifies various robust costs into a common objective function by introducing a "robustness-control" parameter. It is a superset of the typical least-squares, l1-l2, Cauchy, and Geman-McClure estimates. We introduce a parameter-decreasing strategy into the IRLS to optimize our model, called adaptive IRLS. The adaptive IRLS begins with a least-squares estimate for initialization. Then, the "robustness-control" parameter is decreased along with the iterations, so that the proposed model acts as different robust loss functions with different degrees of robustness. We also apply the proposed model to several important tasks of robot vision and photogrammetry, such as line fitting, feature matching, image orientation, and point cloud registration (scan matching). Extensive simulated and real experiments show that the proposed model is robust to more than 80% outliers and preserves the advantages of M-estimations (it is fast and optimal). Our source code will be made publicly available at https://ljy-rs.github.io/web.


INTRODUCTION
Robust estimation (RE) is an important technique for model fitting, whose goal is to recover the true geometric model from contaminated observations. Almost all geometric problems involving gross errors (outliers) are inseparable from RE, since it is the theoretical basis of outlier detection. RE has been widely applied in robot vision and photogrammetry, e.g., in line fitting, feature matching, image orientation, image odometry, bundle adjustment, point cloud registration (scan matching), and simultaneous localization and mapping (SLAM). Various robust estimation methods have been developed, among which the random sample consensus (RANSAC) family (Fischler, Bolles, 1981) and M-estimations solved by iteratively reweighted least squares (IRLS) (Holland, Welsch, 1977) are the most widely used.
RANSAC is a hypothesize-and-verify technique that alternates between minimal-subset sampling and estimated-model verification until a stopping criterion is reached. The model with the maximum consensus set is accepted as the correct one. RANSAC has many variants, e.g., fixed locally optimized RANSAC (FLO-RANSAC) (Lebeda et al., 2012), universal RANSAC (USAC) (Raguram et al., 2012), and marginalizing sample consensus (MAGSAC++) (Barath et al., 2020). Despite these efforts, RANSAC-type methods suffer from several limitations caused by the basic hypothesize-and-verify framework. First, RANSAC-type methods cannot obtain optimal solutions: they use minimal subsets instead of all observations for model estimation, which is sensitive to noise owing to the lack of redundant observations. Second, the number of sampling trials increases exponentially with the outlier rate; as a result, RANSAC becomes slow at high outlier rates (Chin, Suter, 2017). Third, RANSAC-type methods are difficult to integrate into adjustment systems such as bundle adjustment and pose graph optimization.
M-estimations, which originated in statistics, have been applied in various fields. The Huber, l1-l2, Fair, Cauchy, Geman-McClure, Welsch, and Tukey estimators are well-known M-estimators. In addition, there are many variants, such as GM-estimators, S-estimators, and MM-estimators; more comprehensive surveys can be found in (Huber, 2004). The basic idea of M-like estimators is to use robust functions to penalize outliers and to optimize the cost via the IRLS method. In the optimization, outliers are given small weights (close to 0) while inliers are given large weights (close to 1). Thus, the effect of outliers on the total energy is largely discounted. M-estimations overcome some of the shortcomings of RANSAC. First, they can reach globally optimal solutions if good initializations are provided. Second, they are very efficient (1∼2 orders of magnitude faster than RANSAC-type methods). More importantly, they can be easily adapted in an adjustment system for joint optimization. Therefore, M-estimations are the preferred methods in robot vision and photogrammetry, especially for gross error detection in adjustment. Nowadays, many observations are obtained automatically by algorithms and contain a large fraction of outliers, e.g., in the feature matching and point cloud registration tasks. However, M-estimations cannot handle cases with more than 50% outliers (Chin, Suter, 2017). Methods such as the weighted q-norm estimator (Li et al., 2020b) further improve the robustness by introducing an lq-norm (0 < q < 1) model and are robust to more than 70% outliers. However, they involve non-convex optimization problems, which decrease their efficiency and make them almost impossible to use in an adjustment system.
In this paper, we aim to develop an estimator with the following properties: fast, optimal, robust to high outlier rates, and suitable for adjustment systems. We propose a new robust model that is a superset of the typical least-squares, l1-l2, Cauchy, and Geman-McClure estimates. This model introduces a "robustness-control" parameter to unify various robust costs into a common framework. We propose a parameter-decreasing strategy in the IRLS (called adaptive IRLS), where the "robustness-control" parameter is decreased along with the iterations. In the adaptive IRLS optimization, the proposed model acts as different robust loss functions with different degrees of robustness. We use several important tasks of robot vision and photogrammetry, including line fitting, feature matching, image orientation, and point cloud registration, to show the effectiveness and versatility of the proposed model. Extensive simulated and real experiments show its great potential, i.e., it is robust to more than 80% outliers and inherits all the advantages of M-estimations. Our contributions are as follows:
• We propose a robust cost function that unifies a set of loss functions with different degrees of robustness. Our cost function can be generalized to different dimensions.
• We propose an adaptive IRLS for optimization, so that the model acts as different robust loss functions.
• We provide several applications in photogrammetry for readers to understand and use the proposed model.

Motivation
The least-squares cost is widely used in robot vision and photogrammetry because of its optimality for Gaussian noise. However, it is very sensitive to outliers: a single outlier can make the solution highly suboptimal, since outliers violate the Gaussian noise assumption. As shown in Figure 1, the least-squares cost (the red curve) increases quadratically and gives the same weight to all observations; it is not a robust cost. In contrast, typical M-estimation costs (such as Geman-McClure, Tukey, and Welsch) are bounded functions. In the IRLS, they give small weights to observations with large residuals and large weights to the others, so that the observations with large residuals contribute little to the total energy. The problem of these costs is their sensitivity to high outlier rates. If the outlier rate is higher than 50%, the initial estimated model is largely biased from the ground truth. In this case, inliers may have large residuals and be given very small weights (observations with residuals larger than 4 are given ≈ 0 weights in this figure). This is so strict that many inliers may not participate in the optimization at the beginning.
So, if we can bridge the gap between the least-squares cost and M-estimation costs, the above-mentioned problem is largely alleviated. At the beginning, the cost should let all observations participate in the optimization, like the least-squares cost. Then, the cost gradually increases its robustness and becomes a robust function like the M-estimation costs (the change process is shown by arrows in Figure 2). Although the initial model may not be accurate, true inliers with large residuals still contribute to the total energy. In each change of the cost function, a small portion of the observations with the largest residuals are excluded from the optimization (assigned ≈ 0 weights). Generally, these observations are outliers. Therefore, as the cost function changes, many outliers are discarded and the true outlier rate decreases. By the time the cost becomes an M-estimation-like cost, the true outlier rate can be smaller than 50%.

Cost Function
Algorithm 1: Robust estimation based on our robust model
  Output: parameters δ of model F
  1: Initialize w(0) = 1, δ = δ(0), α(0) = 2, β = 10;
  2: while ∆δ > ε do
  3:   Variable update: estimate δ(t) via WLS (Eq. (9));
  4:   Weight update: compute the weights w(t) via Eq. (7);
  ...
  7: Compute the final δ via WLS with the newest weights;
  8: Output δ and select inliers based on residuals.

The basic cost function of our proposed robust estimation model is given in Eq. (1), where r is the residual of an observation; α reflects the robustness of the function and is called the "robustness-control" parameter; and β > 0 is a scale parameter. Our cost yields the classic least-squares loss when α = 2, i.e., f(r, 2) = (1/2) r². The least-squares cost is very sensitive to gross errors; even one outlier can lead to an arbitrarily bad solution. When α = 1, the proposed cost becomes the l1-l2 estimate. In our cost, α = 0 is a singularity, i.e., f(r, 0, β) is undefined; taking the limit as α approaches 0, function (1) turns into the Cauchy estimate. As is well known, the l1-l2, Cauchy, and Geman-McClure estimates are robust cost functions. Our proposed cost is a superset of these functions, and the shape parameter α controls its robustness. Considering the singularity, our final general cost is given in Eq. (6).

Our cost can also be optimized by the IRLS, like M-estimations. In the IRLS, Eq. (6) is reformulated as a weighted least-squares (WLS) problem, i.e., minimizing w(r) r², where w(r) = (∂ρ/∂r)/r is the weight function and ∂ρ/∂r is the derivative of ρ with respect to r. The resulting weight function used in the IRLS is given in Eq. (7).

Let {(xi, yi)}ni=1 be a set of outlier-contaminated observations. The geometric relationship between xi and yi can be modeled by a function F, i.e., yi = F(xi, δ) (noise and outliers are not considered), where δ denotes the parameters of F. The goal is to estimate the parameters δ. This problem can be formulated with our robust model, where ri = r(xi, yi, δ) = yi − F(xi, δ). We then rewrite it as an IRLS problem (Eq. (9)), where wi = w(ri, α, β). Problem (9) can be optimized by the adaptive IRLS, whose details are summarized in Algorithm 1. At each inner iteration t, three main steps are performed:

1. Variable update: estimate δ(t) via WLS (Eq. (9)). This is a simple WLS problem that can be easily solved.
2. Weight update: compute the weights w(t) via Eq. (7).
3. α update: decrease the "robustness-control" parameter α by a step-size τ to change the weight function.

Compared with the traditional IRLS, our adaptive IRLS has one additional step, the α update. Along with the iterations, this step changes the cost function and increases the robustness. The adaptive IRLS has the same time complexity as the traditional IRLS: if the WLS optimization in the variable-update step costs O(n) time, then the complexity of the proposed robust estimation method is O(n).
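To make the three steps concrete, the sketch below implements a Barron-style general loss whose special cases match those stated above (α = 2 least squares, α = 1 l1-l2, α → 0 Cauchy, α = −2 Geman-McClure), together with the adaptive IRLS loop for a generic linear model y ≈ Aδ. The exact scaling of the paper's Eq. (1) is not reproduced here, so the formulas, function names, and the convergence test are illustrative assumptions; τ = 0.2 and β = 10 follow Algorithm 1.

```python
import numpy as np

def robust_cost(r, alpha, beta):
    """Barron-style general cost with the stated special cases:
    alpha = 2 -> least squares, alpha = 1 -> l1-l2,
    alpha -> 0 -> Cauchy, alpha = -2 -> Geman-McClure.
    (A sketch; the paper's exact Eq. (1) scaling may differ.)"""
    x = (r / beta) ** 2
    if alpha == 2:
        return 0.5 * x
    if abs(alpha) < 1e-8:              # Cauchy limit at the singularity
        return np.log(0.5 * x + 1.0)
    b = abs(alpha - 2.0)
    return (b / alpha) * ((x / b + 1.0) ** (alpha / 2.0) - 1.0)

def robust_weight(r, alpha, beta):
    """IRLS weight w(r) = rho'(r)/r of the cost above (the constant
    1/beta^2 factor is dropped since it cancels inside the WLS solve)."""
    x = (r / beta) ** 2
    if alpha >= 2.0:
        return np.ones_like(x)         # least-squares stage: all weights 1
    b = abs(alpha - 2.0) if abs(alpha) > 1e-8 else 2.0
    return (x / b + 1.0) ** (alpha / 2.0 - 1.0)

def adaptive_irls(A, y, tau=0.2, beta=10.0, alpha_min=-2.0,
                  eps=1e-8, max_iter=200):
    """Algorithm 1 for a linear model y ~ A @ delta: start from least
    squares (alpha = 2), then decrease alpha by tau each iteration so
    the loss anneals toward a Geman-McClure-like robust cost."""
    alpha = 2.0
    w = np.ones(len(y))
    delta = np.linalg.lstsq(A, y, rcond=None)[0]   # least-squares init
    for _ in range(max_iter):
        Aw = A * w[:, None]
        new_delta = np.linalg.solve(A.T @ Aw, Aw.T @ y)  # 1. variable update
        r = y - A @ new_delta
        w = robust_weight(r, alpha, beta)                # 2. weight update
        alpha = max(alpha - tau, alpha_min)              # 3. alpha update
        if alpha == alpha_min and np.linalg.norm(new_delta - delta) < eps:
            delta = new_delta
            break
        delta = new_delta
    return delta
```

For the line-fitting application below, A = [x, 1] and δ = (a, b), and the same loop applies unchanged.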

Line Fitting
The goal of this problem is to fit a 2D straight line y = ax + b from outlier-contaminated 2D points, where δ = (a, b). The residual is ri = yi − axi − b, and the weight function of our model for line fitting is obtained by substituting this residual into Eq. (7).

Image Feature Matching
ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume V-1-2021, XXIV ISPRS Congress (2021 edition). This contribution has been peer-reviewed; the double-blind peer review was conducted on the basis of the full paper.

In the image feature matching problem, our goal is to eliminate mismatches from the initial correspondences {(xi, yi)}ni=1 extracted from an image pair (I1, I2), where xi, yi ∈ R². We choose an affine transformation to model the relationship between (I1, I2), i.e., we estimate an affine model F : R² → R² that aligns I1 and I2, where δ = (A, t), A is a 2 × 2 affine matrix, and t is a 2 × 1 translation vector. In this problem, ri = yi − (Axi + t), and the weight function used in Algorithm 1 becomes the corresponding instance of w(ri, α, β).
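With the IRLS weights fixed, the variable-update step for this affine model reduces to a weighted linear solve. The following sketch (the function name and parameter stacking are illustrative, not the paper's notation) estimates (A, t) from 2D correspondences given the current weights:

```python
import numpy as np

def weighted_affine(X, Y, w):
    """One WLS step of Algorithm 1 for feature matching: estimate the
    affine model y ~ A @ x + t from 2D correspondences X, Y (n x 2)
    with per-correspondence IRLS weights w (sketch of the inner solve)."""
    n = len(X)
    M = np.zeros((2 * n, 6))
    M[0::2, 0:2] = X
    M[0::2, 2] = 1.0        # rows for the first image coordinate
    M[1::2, 3:5] = X
    M[1::2, 5] = 1.0        # rows for the second image coordinate
    b = Y.reshape(-1)
    ww = np.repeat(w, 2)    # same weight for both residual components
    Mw = M * ww[:, None]
    p = np.linalg.solve(M.T @ Mw, Mw.T @ b)   # weighted normal equations
    A = np.array([[p[0], p[1]], [p[3], p[4]]])
    t = np.array([p[2], p[5]])
    return A, t
```

At least three non-collinear correspondences are needed for the 6-parameter system to be well posed.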

Image Orientation
The goal of exterior orientation is to recover the image pose from n contaminated 3D-2D correspondences {(Qi, pi)}n1, where {Qi}n1 are 3D object points and {pi}n1 are their 2D counterparts on the image. Assume the camera internal parameters are given by a matrix K; then, Qi can be projected onto pi by the camera projection model.
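The residuals that drive the weight function can be computed from this projection model. The sketch below assumes the standard pinhole form di [pi; 1]ᵀ = K (R Qi + T); the function name is illustrative, and the angular parameterization of R used by the paper is not reproduced here:

```python
import numpy as np

def reprojection_residuals(K, R, T, Q, p):
    """Reprojection residual norms for exterior orientation: project the
    3D object points Q (n x 3) with pose (R, T) and intrinsics K, then
    compare with the observed 2D points p (n x 2). These norms feed the
    weight function in Algorithm 1 (a sketch of the residual model)."""
    Qc = Q @ R.T + T          # object space -> camera space
    uvw = Qc @ K.T            # homogeneous image coordinates d*[u, v, 1]
    proj = uvw[:, :2] / uvw[:, 2:3]   # perspective division by the depth
    return np.linalg.norm(p - proj, axis=1)
```

Inside the Gauss-Newton WLS step, each correspondence's rows of the Jacobian would be scaled by the IRLS weight of its residual norm.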

Point Cloud Registration
In the point cloud registration problem, we aim to estimate a rigid transformation that merges a point cloud pair (H1, H2) into the same coordinate system. Given a set of 3D initial correspondences {(Xi, Yi)}n1 extracted from (H1, H2), the model F for the registration problem is Yi = RXi + T, where δ = (R, T), R is a 3D rotation matrix, and T is a 3 × 1 translation vector. The weight function for point set registration then follows from Eq. (7). Note that if we add a scale factor s in Eq. (18), the problem becomes an absolute orientation problem.
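With fixed IRLS weights, the WLS step for (R, T) admits a closed-form solution. The paper does not spell out its inner solver, so the following weighted Kabsch/SVD solve is one standard choice (function name illustrative):

```python
import numpy as np

def weighted_rigid(X, Y, w):
    """Weighted closed-form solve of Y ~ R @ X + T (the WLS step of
    Algorithm 1 for registration) via the weighted Kabsch/SVD method.
    X, Y are n x 3 correspondences and w are the IRLS weights (sketch)."""
    w = w / w.sum()
    mx = w @ X                                 # weighted centroids
    my = w @ Y
    H = (X - mx).T @ ((Y - my) * w[:, None])   # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])  # avoid reflections
    R = Vt.T @ D @ U.T
    T = my - R @ mx
    return R, T
```

The determinant correction keeps R a proper rotation; for the absolute orientation variant, a scale factor can be recovered from the weighted variances in the same closed form.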

EXPERIMENTS AND EVALUATIONS
This section evaluates the proposed robust model on both simulated and real experiments. We compare our method with Cauchy M-estimator, Welsch M-estimator, RANSAC, FLO-RANSAC, Q-norm estimator, and MAGSAC++ on line fitting, feature matching, image orientation, and point set registration tasks. We use root-mean-square error (RMSE), success rate, and running time for quantitative evaluations. Table 1 summarizes parameter and implementation information of each method. All experiments are performed on a laptop with CPU Core i7-8550U @ 1.8GHz, and 8 GB of RAM.

Parameter τ Study
Parameter τ is important for the proposed model. We conduct a simulated feature matching experiment to study the sensitivity to τ. The simulation details are as follows. Given the number of observations n and an outlier rate η, the number of inliers n1 is computed as n1 = n(1 − η). We first randomly generate n1 2D feature points X1 = {xi}n1 1 from a normal distribution N(0, 500²). Then, we obtain their correspondences Y1 = {yi}n1 1 via the transformation y^t_i = A^t xi + t^t, where A^t = [sx cos θ, sx sin θ; −sy sin θ, sy cos θ] and t^t = [tx, ty]^T. The rotation angle θ ∈ (−π/2, π/2), the anisotropic scales sx, sy ∈ (0.5, 1.5), and the translations tx, ty ∈ (−500, 500) are all randomly generated. We also add Gaussian noise N(0, 2²) to Y1 to make the simulation realistic, obtaining an inlier set H1 = (X1, Y1). Finally, two sets of points of the same size n2 = n − n1 are generated from a normal distribution N(0, 500²) to obtain the outlier set H2 = (X2, Y2). The inlier set and outlier set are merged to obtain the outlier-contaminated feature correspondences {(xi, yi)}ni=1. In our experiment, we set n = 1000 and η = {10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%}. For each (n, η) configuration, we perform 100 independent tests and report the average results. In a test, if the RMSE of a method is smaller than three times the noise level (σ = 2 in this experiment), the estimation is counted as successful; the success rate is thus the fraction of successful estimations over the 100 independent tests. Figure 3 shows the results of the proposed method with different τ. If the outlier rate is lower than 80%, our model is not sensitive to τ: it achieves impressive performance for τ within [0.05, 0.5], i.e., the success rate is always close to 100% (Figure 3(a)). However, a larger τ results in worse performance when the outlier rate exceeds 80%. We also report the required number of iterations.
As can be seen in Figure 3(b), smaller τ requires more iterations for convergence and costs more time. Thus, we make a trade-off and fix τ = 0.2 for all following experiments.
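The simulation protocol above can be sketched as follows; the function name, the seed argument, and the returned ground-truth tuple are illustrative additions for reproducibility:

```python
import numpy as np

def simulate_matches(n, eta, noise=2.0, seed=None):
    """Generate outlier-contaminated 2D correspondences as in the tau
    study: n1 = n*(1-eta) inliers related by a random affine transform
    plus Gaussian noise, and n - n1 random outlier pairs (sketch)."""
    rng = np.random.default_rng(seed)
    n1 = int(round(n * (1 - eta)))
    theta = rng.uniform(-np.pi / 2, np.pi / 2)
    sx, sy = rng.uniform(0.5, 1.5, 2)
    A = np.array([[sx * np.cos(theta), sx * np.sin(theta)],
                  [-sy * np.sin(theta), sy * np.cos(theta)]])
    t = rng.uniform(-500, 500, 2)
    X1 = rng.normal(0, 500, (n1, 2))
    Y1 = X1 @ A.T + t + rng.normal(0, noise, (n1, 2))   # noisy inliers
    X2 = rng.normal(0, 500, (n - n1, 2))                # random outliers
    Y2 = rng.normal(0, 500, (n - n1, 2))
    X = np.vstack([X1, X2])
    Y = np.vstack([Y1, Y2])
    inlier_mask = np.r_[np.ones(n1, bool), np.zeros(n - n1, bool)]
    return X, Y, inlier_mask, (A, t)
```

An estimate is then scored as successful when its RMSE over the true inliers stays below three times the noise level.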

Line Fitting Experiment
The simulation process is similar to that in Section 4.1. The differences are that the observations are 2D points {(xi, yi)}n1 and the transformation is yi = a^t xi + b^t, where the slope angle arctan(a^t) ∈ (−π/2, π/2) and the offset b^t ∈ (−100, 100) are randomly generated. The standard deviation of the Gaussian noise is σ = 1. The results are summarized in Figure 4.
As shown, the M-estimators (Cauchy and Welsch) work well at low outlier rates (η ≤ 50%). However, their performance drops dramatically if the outlier rate is larger than 50%; thus, their success rate curves show a "cliff-like" drop. The q-norm estimator performs much better than the M-estimators and is robust against 70% outliers. Compared with the above-mentioned estimators, the RANSAC-type methods (RANSAC, FLO-RANSAC, and MAGSAC++) and our method can deal with 90% outliers; their success rates are always 100%. RANSAC and FLO-RANSAC have slightly lower estimation accuracies (larger RMSE) than MAGSAC++ and our method, since they only estimate an approximate solution. Although MAGSAC++ is also a RANSAC variant, it uses multiple strategies, such as an M-estimation stage, to refine the model obtained by the RANSAC-like stage. Our method is much faster than MAGSAC++, e.g., 60+ times faster at an outlier rate of 90%, even though MAGSAC++ is implemented in C++.

Feature Matching Experiment
Simulation: The simulation process is the same as that in Section 4.1. Figure 5 reports the results. Again, the M-estimators break down at an outlier rate of 50%. The reason is that M-estimators use the median absolute deviation (MAD) for residual scale estimation; the median operator only considers half of the observations and yields an inaccurate residual scale when the outlier rate is higher than 50%. In this task, our method is still very robust, although its performance slightly decreases when the outlier rate reaches 90%. Compared with RANSAC-type methods, our main advantages are two-fold. First, our model is much more efficient; for example, our method is about 90 times faster than MAGSAC++ at an outlier rate of 80%. Second, our method is suitable for residual adjustment, since the adaptive IRLS is almost the same as the traditional IRLS. As is well known, M-estimators solved by IRLS are used in bundle adjustment for outlier removal. Our method can also be extended to the multi-image feature matching problem, which will be our future work.
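The MAD-based scale estimate mentioned above, and its breakdown beyond 50% outliers, can be illustrated with a short sketch (the 1.4826 factor makes the MAD consistent with the Gaussian standard deviation):

```python
import numpy as np

def mad_scale(r):
    """Residual scale via the normalized median absolute deviation,
    sigma ~ 1.4826 * median(|r - median(r)|). Because the median only
    'sees' half of the data, the estimate is reliable below 50% outliers
    and breaks down above that rate."""
    return 1.4826 * np.median(np.abs(r - np.median(r)))
```

With fewer than half outliers the median deviation falls among the inliers, so the scale tracks the inlier noise; with more than half it falls among the outliers and the scale explodes, which is exactly why plain M-estimators lose their inliers.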
Real data: Six real multimodal image pairs are used for evaluation. We apply the radiation-variation insensitive feature transform (RIFT) (Li et al., 2020a) algorithm to extract initial feature correspondences. Owing to severe nonlinear radiation differences, the outlier rates η of these image pairs range from 69.8% to 96.5%. Figure 6 shows our matching results. The RMSEs are smaller than 2 pixels, and the average running time is only 4 ms, while MAGSAC++ costs 176 ms.

Image Orientation Experiment
Simulation: Given the camera internal parameter matrix K, we first randomly generate n 3D points {Q^c_i}n1 in the camera frame and transform them into object space, where the rotation matrix R^t and translation vector T^t are randomly generated. Then, the object space points are projected onto the image via di [p^t_i; 1]^T = K Q^c_i, where di is the depth and p^t_i is the image correspondence of Qi. Gaussian noise with a 1-pixel deviation is added to {p^t_i}n1. Finally, n2 image points are selected and corrupted with gross errors (outliers), obtaining the observed image points {pi}n1; {(Qi, pi)}n1 are the contaminated 3D-2D correspondences. We use the Gauss-Newton method in the WLS optimization to solve the nonlinear image orientation problem. It requires an initialization of the rotation and translation parameters: we obtain initial rotation angles by adding a random disturbance within [-10°, 10°] to the ground truth and generate initial translations within [70% × T, 130% × T]. For fairness, this initialization is used for all compared methods. The image orientation results are displayed in Figure 7. With appropriate initializations, our method achieves a 100% success rate at an outlier rate of 90% and performs better than the RANSAC-type methods in terms of RMSE.
Real data: We use two aerial images (see Figure 8) for evaluation, which were obtained over Pingdingshan City, Henan, China. The left image was captured by the central camera of an SWDC system with a focal length of 12102.1 pixels and a size of 5406×7160 pixels, and the right one was acquired by an oblique camera with a focal length of 14671.5 pixels and a size of 7160×5406 pixels. We measured 24 3D-2D correspondences (containing 12 inliers) for Figure 8(a) and 75 (containing 15 inliers) for Figure 8(b). We estimate six exterior orientation parameters (three angular parameters [ϕ, ω, κ] and three location parameters [Xs, Ys, Zs]); the results are reported in Table 2. As shown, our results are very close to those obtained by the commercial software PATB using only the inlier correspondences. Although the outlier rate of the second image is 80%, the proposed method still achieves an accuracy of 1.05 pixels.

Point Cloud Registration Experiment
Simulation: The simulation process is similar to that in Section 4.1. The 3D points are generated from a normal distribution N(0, 100²). The rotation angles are randomly generated within (−π/2, π/2) and the translations within (−100, 100). The noise level is 0.1 m. Figure 9 provides the comparison results. Again, our method and MAGSAC++ achieve the highest success rates and accuracies (RMSE), and our method is much faster than MAGSAC++. We note that MAGSAC++ becomes slow when the estimated model is complicated, as in this task and the image orientation task.
Real data: Four LIDAR scan pairs are collected from the ETH dataset (the ground truth transformation of each scan pair is provided) for evaluation. The point clouds are downsampled to a resolution of 0.1 m. We use the ISS (Zhong, 2009) feature detector and the FPFH (Rusu et al., 2009) descriptor to establish initial 3D correspondences. 3D feature matching is much more difficult than its 2D counterpart; therefore, the outlier rates of the initial 3D correspondences are very high (> 96% in these scan pairs). We first use an edge voting strategy (Li et al., 2020b) to filter some outliers before applying the proposed method for rigid model estimation. The 3D matching results are shown in Figure 10. The registration accuracy of our method is better than 0.2 m (twice the resolution).
We also show the potential of the proposed method in the 3D laser SLAM task. We use three sequences of the KITTI dataset (with ground truth) for evaluation: sequence 5 consists of 2761 laser scans, and sequences 6 and 7 each contain 1101 laser scans. For each sequence, we regard every 20 consecutive scans as a group. First, each group is registered by our method to obtain a submap. Then, we register consecutive submaps with our method to obtain the whole scene map. We use the translational (localization) error of KITTI as the evaluation metric. The reconstructed 3D scene maps and the estimated vehicle trajectories are displayed in Figure 11. The localization accuracy of our laser odometry is better than 0.5% on these three laser sequences.

Limitations
Our method relies on the assumption that outliers are randomly distributed or approximately uniformly distributed. If this assumption is violated, our method may fail. Thus, our method is not suitable for cases with multiple geometric models.

CONCLUSIONS
We propose a general objective function by introducing a "robustness-control" parameter. It is a superset of the typical least-squares, l1-l2, Cauchy, and Geman-McClure costs. We present an adaptive IRLS for optimization, which uses different weight functions across the iterations and thus attains different degrees of robustness. We also provide applications to several important tasks of robot vision and photogrammetry. Extensive simulated and real experiments demonstrate that our model is robust to more than 80% outliers and is much faster than RANSAC-type methods. In the future, we will integrate our robust model into visual and laser SLAM systems.