Leveraging Dynamic Objects for Relative Localization Correction in a Connected Autonomous Vehicle Network

High-accurate localization is crucial for the safety and reliability of autonomous driving, especially for the information fusion of collective perception that aims to further improve road safety by sharing information in a communication network of ConnectedAutonomous Vehicles (CAV). In this scenario, small localization errors can impose additional difficulty on fusing the information from different CAVs. In this paper, we propose a RANSAC-based (RANdom SAmple Consensus) method to correct the relative localization errors between two CAVs in order to ease the information fusion among the CAVs. Different from previous LiDAR-based localization algorithms that only take the static environmental information into consideration, this method also leverages the dynamic objects for localization thanks to the real-time data sharing between CAVs. Specifically, in addition to the static objects like poles, fences, and facades, the object centers of the detected dynamic vehicles are also used as keypoints for the matching of two point sets. The experiments on the synthetic dataset COMAP show that the proposed method can greatly decrease the relative localization error between two CAVs to less than 20cmas far as there are enough vehicles and poles are correctly detected by bothCAVs. Besides, our proposed method is also highly efficient in runtime and can be used in real-time scenarios of autonomous driving.


INTRODUCTION
Localization accuracy, robustness, and reliability are important criteria for the operation and the safety of autonomous vehicles, which need localization accuracy in the order of decimeters (∼30cm) to stay in a specific lane (Levinson et al., 2007). This is not an easy task. The localization provided by the Global Navigation Satellite System (GNSS) can be easily influenced by weather and environmental situations such as buildings, trees, and tunnels. Its accuracy ranges from a few meters to over 20 meters (Tan and Huang, 2006). Solutions like Differential GPS (DGPS) can reduce the localization errors to less than two meters or more, but they need additional infrastructures and are not reliable all the time (Kuutti et al., 2018). IMU measures accelerations and angular velocities of the vehicle body which can be used to calculate the trajectory of the vehicle in a Dead Reckoning manner. However, it suffers from the error drifting effect. By integrating GNSS and IMU, (Zhang et al., 2012) reduced the localization error to several meters which is less than half of the result of the standalone solution with GNSS or IMU. Similarly, (Wahyudi et al., 2018) tested the localization errors of the integration system of GPS and a low-cost IMU (MPU6050). It shows an average error of 2.67m and 0.96°. However, they tested in an open space where good GPS signals can be obtained. In urban areas, the performance of this solution can be worse when not enough satellites are in sight. To reduce the errors caused by GNSS signal interruptions, Tightly-Coupled (TC) fusion of GNSS and IMU uses filtering techniques to improve high-rate positioning results. The TC IMU/GNSS fusion framework proposed by (Kourabbaslou et al., 2019) can reduce the Root-mean-square error (RMSE) to less than 2m when at least four satellites are visible and to 8m when no satellite is available at a travel distance of 450m. * Corresponding author To further improve the localization accuracy to the order of decimeters or even centimeters, more reference infrastructures such as base stations with know positions or onboard sensors such as cameras, Radars, or LiDARs are needed.
On the one hand, Real-Time Kinematic (RTK) techniques improve the GNSS accuracy by the GNSS correction information received from GNSS base stations. With multiple GNSS receivers, it can reach an accuracy of centimeters. However, because of the limited communication range and power strength of base stations, it is hard or impossible to use this solution in some urban areas such as tunnels, tree canopies, or multi-path effect areas that are close to the buildings (Uradziński, 2011). Besides, RTK-GNSS solutions also rely on the good visibility of GNSS signals. The accuracy decreases along with the outage duration of GNSS signals (Li et al., 2018).
On the other hand, methods that are only based on the onboard sensors either match the information of consecutive frames or register the local environment information to the global map to get the global localization. Using the first method, (Ward and Folkesson, 2016) integrates short-range Radar with GPS/IMU and has reached an RMSE of 7.3m and 37.7cm laterally and longitudinally, respectively. But in the worst case, the longitudinal error has reached more than one meter. By registering the local radar reflection of underground features to the global underground feature map, a new method of using Localizing Ground-Penetrating Radar (LGPR) proposed by (Cornick et al., 2016) has shown a RMS error of 4cm in different weather conditions and driving speed. However, the experiments in (Ort et al., 2020) of using LGPR have reported a mean total error of 0.34m in clear weather. In rainy weather, the result has degraded to 0.77m. Different from LGPR and Radar, camera and LiDAR not only can be used for localization but also are necessary for the perception of the autonomous driving system. Therefore, it is convenient and efficient to use any stage of information from the perception process for localization. This information can both be used for local matching and global registration.
An early work by (Parra et al., 2009) has proposed a visual odometry method with a mono-camera to estimate the trajectory of vehicle motion. It estimates the relative locations of the same vehicle in consecutive frames by matching SIFT feature points with RANdom SAmple Consensus (RANSAC) algorithm. The corrected trajectory is visually improved. However, no error metric is reported in this work. Instead of using RANSAC, (Li et al., 2013) first matches GIST (Oliva and Torralba, 2006) descriptor representations of images to get coarse localization, which is then utilized as an initial localization for the fine localization step to find the nearest neighbors of SIFT (Lowe, 1999) feature points. This work has achieved a localization error of 74cm. To further improve the performance of image-based visual odometry, (Suhr et al., 2017) integrates GPS and IMU with a camera sensor. It registers the detected lanes and symbolic road markings in the images to these in the digital High-Definition (HD) map. The average result of their experiments shows that the mean lateral error is about 0.5m and the longitudinal error about 1.2m. Furthermore, driven by deep-learning (DL) based object detection and semantic segmentation, more high semantic level of objects or points can be regarded as landmarks for the localization. (Nicholson et al., 2019) represents object-wise landmarks with 3D quadrics which are estimated by fusing the detected objects of images from different views. (Tian et al., 2021) extended this work by decoupling the quadric parameters to improve the robustness against the observation noise. It reported an average ellipsoid central translation error of 2.13m.
In general, image-based visual odometry can reduce the localization error to less than one meter in some specific scenarios and can offer low-cost solutions. However, the accuracy of this method depends on the resolution of images. Higher resolution leads to higher accuracy but also to higher computational cost. Besides, small environmental changes such as illumination can introduce huge domain gaps to the captured images. This can greatly deteriorate the performance of the developed algorithms. In comparison to cameras, LiDARs actively sense the 3D environment by projecting 3D laser beams and collecting reflection information. The generated 3D point clouds are more accurate and more robust against weather changes than images. Therefore, they are more suitable for autonomous driving. Although they are currently high-cost sensors, the prices have a downward trend as many manufacturers are putting a big effort into decreasing the cost of LiDAR sensors.
Based on LiDAR sensors, (Hata and Wolf, 2016) localizes vehicles by registering the curbs and road marking detected from point clouds data into the global feature map of the environment. It reduces the lateral and longitudinal errors to less than 30cm. By the integration of GPS/IMU, filtering algorithms and more 3D features, the accuracy of LiDAR-based localization can be further improved. (Wei et al., 2020) uses 3D curb features, the intensity of road marker points as well as the height information of the surrounding high constructions to implement such an integration. It has achieved an accuracy of 9ccm and 18cm for lateral and longitudinal directions, respectively. (Levinson and Thrun, 2010) further improved the robustness of the localization against the dynamic changes of the environment by creating a probabilistic infrared intensity map for the road surface. Their result shows a positional RMSE of 9cm. Similar to images, LiDAR-based localization also has many DL-based achievements (Elbaz et al., 2017;Wang et al., 2019;Yin et al., 2018), they all achieved better accuracy than the minimum required accuracy of 30cm for autonomous driving. In summary, with better data, LiDAR-based methods can easily surpass the performance of image-based ones.
Both image-and LiDAR-based methods discussed previously suffer from the presence of dynamic objects such as cars, pedestrians, and the occlusions introduced by these objects. They attempt to reduce the negative influence of the dynamic objects either by predicting them and then removing them (Li et al., 2019) or by selecting the most relevant features for localization (Lu et al., 2019). Nevertheless, in a CAV network, these dynamic objects can be very useful information for localization because they are spatially better-distributed landmarks than most of the static features that might have unfavorable repetitive patterns for matching or registration, such as facades, curbs, and road markings.
With CAV networks, or in other words, Vehicle-to-Vehicle (V2V) communications, some works (Fujii et al., 2011;Golestan et al., 2012;Altoaimy and Mahgoub, 2014) have achieved the accuracy of several decimeters based on the travel properties of radio signals, such as time-of-arrival (TOA) and time-difference-of-arrival (TDOA). However, this is still not accurate enough for the lane-keeping standard or for fusing the perception data coming from different vehicles that are supposed to improve the perception performance. Such collective perception can effectively reduce occlusion and increase road safety. However,  show that data fusion among vehicles is very sensitive to localization errors. Especially, for object-wise fusion, a position error of only 0.4m has led to a large performance drop in average precision of vehicle detection from about 85% to 20%. Therefore, high accurate localization is one of the key challenges for the perception system of autonomous vehicles to benefit from V2V communications.
In this paper, we propose an efficient method to correct the relative locations of vehicles in a CAV network. This correction can not only improve the information fusion performance of collective perception but also has the potential to improve the absolute localization by registering the fused data that has fewer occlusions, larger coverage, and more distinguishable patterns, to the global map. The proposed method uses the RANSAC algorithm to match the keypoints that are extracted from the result of the perception module of the autonomous system, specifically, the LiDAR-based object detection and semantic segmentation. It then uses the matched consensus to calculate the relative transformation between two vehicles.

Problem formulation
In a mixed traffic scenario of collective perception which contains both autonomous vehicles and normal vehicles, we assume an ego CAV C0 in the CAV networks. Other Ncav CAVs (Ci, i = 1, ..., Ncav) that are in the communication range Rc of the ego vehicle C0 can share information to the ego vehicle. The ground truth poses of these CAVs are notated as Xi = (xi, yi, θi), where xi and yi are the coordinates in the global coordinate system and θi is the orientation of the vehicle relative to the global x-axis. Correspondingly, we use the symbol ∼ on the top of variables to represent the erroneous observations of these variables. Therefore, the observed erroneous poses by GPS/IMU system are noted as ∼ Xi= (xi,ỹi,θi). All vehicles in the CAV network are collecting point cloud data which are notated as P Ci (i = 0, ..., Ncav). Based on each point cloud P Ci, object detection, and point-wise semantic segmentation are implemented in the perception module of the autonomous system. From this module, we extract three kinds of point information for localization. First, the bounding box centers of the detected vehicles P b , then the points of poles Pp and the points of big planar structures P f , such as facades and fences. Each point in the set P b , Pp and P f contains a vector of 2D x and y coordinates. We call P b and Pp as anchor points for the simplicity of later explanation because they are the keypoints to be matched and then used for calculating the final transformation between the coordinate systems of the CAVs in the network.
The goal of the relative localization error correction is to find the optimized transformation parameters ∆x, ∆y, ∆θ that transform the point set f } from the local coordinate system CSi of the cooperative vehicle to the coordinate system CS0 of the ego vehicle so that the transformed point set P i,0 has the maximum nearest neighborhood consensus with the point set of ego vehicle P 0,0 = {P 0,0 b , P 0,0 p , P 0,0 f }. The superscripts of P * , * represents the index of vehicles and the coordinate systems. For example, P i,0 is the points that are observed by Ci and represented in CS0. The objective function of this optimization problem can be formulated as where is the threshold for counting nearest neighbors and are the homogeneous representations of the original erroneous 2D points in set

Properties of relative errors
Since the true global location is not known, choosing the local coordinate system of any of the two CAVs as the reference for calculating the relative localization parameters will lead to an error propagation through a non-linear transformation from the global to the local coordinate system. The proof of this error propagation can be found in the Appendix. Assume the localization error of both ego CAV C0 and cooperative Ci has the Gaussian distribution of x, y ∼ N (0, σ 2 x,y )m and θ ∼ N (0, σ 2 θ )°. In order to fuse the point cloud data of the ego and cooperative CAV, the relative localization errors between the CAVs should be estimated. We accomplish this estimation in the coordinate system of ego vehicle CS0, in which the three parameters ∆x, ∆y, ∆θ should be estimated so that P Ci can be correctly transformed to CS0 and better match P C0. The derivation of the error propagation from the global to the local coordinate system can be found in the appendix A.
Taking the error of (−0.5m, −0.5m, −5°) for C0 and (0.5m, 0.5m, 5°) for C1 as an example, the relative errors along x and y axis are shown in figure 1, in which the relative translation error ∆x, ∆y is in a range of about -6m to 6m. The magnitude of ∆x, ∆y is positive correlated to the distance between C0 and Ci and is enlarged by this distance in comparison to the original error of 0.5m. Besides, the translation errors oscillate as the orientation of the ego vehicle changes. Different to the translation errors, the rotation error ∆θ keeps constant and equals to the summation of the absolute global localization error of both C0 and Ci.
As a summary, the optimum relative errors ∆x, ∆y, ∆θ should be searched in a large 3D space. It is time-consuming to conduct a refined search using a method like maximum consensus (Axmann and Brenner, 2021), although it can provide the global optimum if the maximum search space is previously known. For the example shown in figure 1, the search space is about 12m × 12m × 10°. If the searching resolution is set to 0.1m and 0.1°, the result of consensus should be counted for 1.44 (120 × 120 × 100) million times. Therefore, a more efficient method using RANSAC is proposed in this paper.

Error correction with RANSAC
To obtain the classified points P b , Pp and P f for localization error correction, we use the pre-trained network that is proposed in (Yuan et al., 2022) to generate bounding box predictions and the down-sampled points of poles, facades and fences. We use all the detected bounding boxes to generate P b and all points of poles for Pp. However, only a subset of N f points of facades and fences are selected using Furthest Point Sampling (FPS) for P f because these points can be any point on the planes and provide less accurate matches than P b and Pp, in which each point corresponds to the center of a specific object. A reduction in the number of points P f not only reduces the computational overhead but also makes the algorithm rely more on P b and Pp so that it is more robust against the variation of point density in the point cloud data.
The workflow of proposed relative localization error estimation is described in algorithm 1. Line 1 and 2 define the notations of the algorithm. A and B are the set of the anchor points (center points of the detected vehicles and pole points) that belong to the ego point cloud P C0 and the cooperative point cloud P Ci, respectively. Line 3 defines the initial values that need to be updated during the workflow. For each point, b in set B, at most n Algorithm 1 Relative localization error estimation Ensure: , 2: c * : category label (box, pole or fence/facade) of point * , 3: Ncons,max = 0, M = ∅. 4: for each b ∈ B do 5: T ← CalT F ((ai1, bi1), (ai2, bi2)) 16: 18: if Ncons, max < Ncons then 19: if |M | ≤ 2 then 21: end if 26: end for 27: return Tout nearest neighbors are found in set A with a maximum Euclidian distance of 1, because the correct matching points might not have the smallest distance among the n nearest neighbors.We set n to two to ensure the efficiency of the algorithm. Besides, the neighbors should have the same category label as point b as described in line 5. The two neighbors (α, β) of b are then put into set M as dictionary format as shown in line 6. α and β can be None if not enough neighbors are found.
Based on the neighborhood matchings in set M , we start our RANSAC matching iterations from line 10. The total number of iterations is defined by the minimum value between the predefined maximum total number of iterations Nransac and the number of the combinations C 2 |M | (line 9), where |M | is the cardinality of M (line 8).
In each iteration, two indices of the matches in set M are randomly selected (line 11). The resulting matches chosen are given in line 12. Since the two nearest neighboring points (α, β) are found for each point b, only one from these two is selected for the current iteration (line 13-14) to calculate the relative transformation between the set A and B (line 15). Afterwards, the full selected localization points are transformed with the resulting transformation matrix T (line 16). The outcome B as well as the full localization point set of ego vehicle A = are used for counting the consensus for this iteration of RANSAC matching (line 17), for each point b k in B , if it has a nearest neighbor a k whose distance to b k is less than the threshold 2, the counting variable Ncons will be increased by 1. If the counting result Ncons is larger than the last iteration, the matching correspondences M of set A and B and the refined transformation matrix Tout should be updated (line 18-25). Different to T which is calculated only based on two matched point pairs, Tout uses all consensus anchor point pairs found between A and B to calculate a refined transformation. There are three parameters that need to be determined by CalT F with at least two point pairs. Therefore, it is possible that not enough matchings are found for the refined error correction. In this case, we return an identity matrix as shown in line 21.
The detailed implementation of function CalT F is described in algorithm 2. The input of the function is the N matched 2D point pairs. The mean (a, b) and the reduced coordinates

Dataset
We use a synthetic dataset COMAP (Yuan and Sester, 2021) that is specially designed for collective perception with CAVs to validate the effectiveness of our proposed method. The reasons for this are two folds: first, it is extremely expensive to collect real data with networked CAVs; second, the synthetic dataset can also provide 100% accurate ground truth localization which is beneficial for the evaluation of the localization error estimation algorithm.
COMAP contains 7788 frames of data collected by 20 CAVs and from 21 different junctions scenarios. Since 4155 frames are used for training the network proposed as in (Yuan et al., 2022), we use the other 2148 frames from the test set to evaluate our method. In each frame, we select one ego vehicle and at most five cooperative vehicles that are in the communication range Rc = 40m of the ego vehicle for evaluation. The relative localization error between each pair of ego and the cooperative vehicle is estimated using algorithm 1.

Experiment settings
To generate efficient number of labeled points for localization, we empirically set N f = 50 which ensures that algorithm 1 Figure 2. Sample result before (left) and after (middle and right) the relative localization error correction finds the correct consensus for P f with the lowest number of points. For nearest neighbor searching, the distance threshold 1 is set according to equation where Rc is the communication range and σr the standard deviation of the relative rotation error in degree. Rc · σr · π/180°i s the standard deviation of the relative location error derived from σr. The factor η = 2.58 is the z-score of standard normal distribution at the confidence level of 99%,which means that the relative positin error caused by σr has a confidence of 99% to be less than 1. This setting ensures that RANSAC has a high chance to find correct point matchings in a limited searching range. After the rough error correction with the estimated transformation matrix T , the threshold 2 for the second round of nearest neighbor searching is reduced to 1m. To investigate the influence of the number of RANSAC iterations (Nransac) and different standard deviations for localization errors (σx, σy, σr) on the performance of the proposed algorithm, we configure our experiments by varying Nransac from 10 to 50 with an offset of 10, and assuming the localization errors are normally distributed and varying the standard deviation of the translation error σx and σy from 0.2m to 1.0m with an offset of 0.2m, and the rotation error σr from 2°to 10°with an offset of 2°. All experiment runs are processed on a 6-Core Intel i7-8700 CPU.

Evaluation metric
The Root Mean Squared Error (RMSE) is used to evaluate localization accuracy after the error correction with the proposed algorithm. In order to have an intuitive comparison with the originally introduced localization errors in the global coordinate system, we take the residuals between the estimated and the ground truth relative errors that are represented in the global coordinate system for calculating RMSE.
However, the randomness of RANSAC might lead to failures in finding the optimal result. Besides, it is also possible that the field-of-view (FOV) of the two vehicles do not have sufficient overlapping detected vehicles and poles to calculate the relative localization error between them. Therefore, we also evaluate the rate of valid error correction results by filtering the results with the consensus numbers Ncons. Specifically, the valid rate is calculated with where Ns here is the total number of sample pairs of ego and cooperative vehicles, between which the relative localization error should be corrected, and thrcons is the threshold to filter the valid sample pairs. Finally, we use the processing frames per second (FPS) to evaluate the runtime of the proposed method.

RESULTS AND EVALUATION
In general, the proposed method can correct the relative errors to less than 20cm (see figure 4) as far as there are enough localization anchor points of the ego and cooperative vehicles can be matched and these matches have a good spatial distribution. An example result is shown in figure 2. All observed information of the ego vehicle is shown in blue, and cooperative vehicle in red. The detected vehicles are notated with rotated bounding boxes (BBoxes). The thick bars of the BBoxes indicate the front of the vehicle. Poles are shown with stars and other small points in the left two figures are the down-sampled points of fences and facades. Besides, all points before down-sampling are shown in the right subplot in order to better illustrate the performance of the error correction algorithm. As shown in the left subplot, the large initial relative error has led to wrong matches between the detected vehicles of the ego and the cooperative CAVs. After error correction, they are well matched as shown in the middle subplot. The right subplot shows that the point clouds of the two CAVs are also well aligned to each other.
To further give some quantitative details about the result, in the following subsections, we first show the relative error properties through error propagation based on the dataset. Then the result of parameter studies of the proposed method is demonstrated.

Error propagation
As discussed in section 2.3, the relative localization error between two vehicles can be amplified along with the error propagation through the non-linear transformation. This is also Figure 3. PDFs of errors in x-axis with initial local error of 0.4m verified with the test dataset. After adding the normally distributed error with σx,y = 0.4m and σr = 4°to the localization of each vehicle, the standard deviation of relative localization errors is greatly increased as shown in figure 3. The blue high peak shows that most vehicles have a very small initial localization error in the global coordinate system (CSg), the addition of the errors of two vehicles is much more flatly distributed with a bigger standard deviation (orange). After transforming this relative error from the CSg to the local coordinate system CS0, the standard deviation of the errors increased again (green). After error correction with our method, the standard deviation of the relative localization is greatly decreased as the example correction result of the initial error of σ = [0.4m, 0.4m, 4°] in figure 4 shows. The black lines indicate the initial error distributions, other lines illustrate the error distributions after correction with a different number of RANSAC iterations Nransac. With all configurations of Nransac, the relative errors for most samples can be reduced by a large margin, especially the rotation errors. As shown in the right subplot of figure 4, the initial distribution of rotation is nearly flat because of the large standard deviation of the rotation angle in degree. However, after correction, these errors are mostly reduced to less than one degree.

Performance on different RANSAC iterations
As a quantitative result, figure 5 shows the RMSEs of different Nransac by averaging over all samples including the outliers. In these outliers, either not enough matchings are found or wrong matchings are used for calculating the final transformation. This will lead to very large errors and therefore result in larger RMSE values. The RMSEs in figure 5 that are averaged over all samples and initial error configurations can reflect an overall performance on the dataset. As Nransac increases, RMSEs of both translation errors (x, y, and the Euclidean norm of x and y) and rotation errors have a decreasing trend. In the beginning, this trend is stronger than that at the end when Nransac > 30. Therefore, 30 to 40 iterations are the most sufficient configuration of Nransac for the proposed method to correct the relative localization errors caused by the global localization errors introduced in the experiments.
Different to the overall RMSEs, the top left subplot of figure 6 shows the valid rates based on different number of iterations (vertical axis) and the thresholds of the number of minimum consensus thrcons (horizontal axis). For all thrcons, the valid rates are increasing as Nransac increases and the improvement at smaller Nransac is more significant. Besides, the valid rates are, as expected, all decreasing in all Nransac configurations as thrcons increases. Because larger thrcons would lead to more exclusions of wrong estimation outliers.

Performance on different errors
Beside Nransac, we also plotted RMSEs (Ex, Ex, Er) of different initial localization errors with respect to the consensus thresholds in figure 6. These plots are all evaluated with Nransac = 30 because it is a sufficient setting regarding the RANSAC iterations and the performance as discussed in the last section. As thrcons increases, the RMSEs after correction have mostly a decreasing trend, especially for those with larger initial errors. When the error reduces to a specific level, it reaches its best performance and stops decreasing. For example, the RMSE in the x-axis (Ex) decreased from 0.19m to 0.15m and then to 0.09m as the consensus threshold increased from 2 to 6, it then fluctuates in a small range. This fluctuation effect is more apparent in the last row of the RMSE plot of Ey and Er. Besides, as we enlarge the initial standard deviation of the errors, the RMSEs after correction are dropping. This is because our proposed method only searches correspondences in a predefined range. Larger errors will lead to a larger searching range, and therefore also needs more RANSAC iterations. Within limited iterations of searching, the best matching result might not be found and this will, therefore, lead to larger RMSEs. This effect can be better observed with the result of thrcons = 10 in figure 7. At the smaller initial errors, the RMSEs after error correction are all less than 20cm in the xand y-axis and the rotation error under 0.25°. The resulting lateral error in y-axis is smaller than the longitudinal error in x-axis because most points of poles and planar objects are regularly aligned along the longitudinal direction which increases the difficulty for matching in this direction. As the error increased to [1m, 1m, 1°], the RMSEs are greatly increased-the Euclidean norm of the RMSE of x and y increased from 15cm to 40cm, and the RMSE of rotation error from 0.25°to 0.4°. In this case, a brute-force search in all possible matchings can be used as only a limited number of detected vehicles and poles are in the scenario. However, the benefit of RANSAC in reducing runtime will be lost.

Runtime
The number of RANSAC iterations is the most correlated variable to the runtime. As shown in the left plot of figure 8, the FPS drops as we increase Nransac. Besides, the dropping slope is becoming more gentle as Nransac increases because Nransac gradually reaches the number of all possible matches C 2 |M | (see alg. 1) in some samples that only have very few anchor points for matching. In the right plot, the FPS with respect to RMSEs of the experiments with initial errors of σ = [0.4, 0.4m, 4°] and thrcons = 10 is shown. At a high FPS rate (∼150), the algorithm generates significantly worse RMSE results than the lower FPS. However, no RMSE reduction is observed in lower FPS. In combination with the previous discussions of valid rate, this reveals that only a small number of RANSAC iterations can already sufficiently find correct matchings between the anchor points of two CAVs, further increasing Nransac with the sacrifice of the runtime can only improve the overall performance by increasing the valid rate but not the by reducing the RMSE of all valid estimations which are filtered with thrcons = 10. In other words, if more error estimations are regarded as valid, more hard samples will be included in computing RMSEs, which, therefore, leads to larger RMSEs as shown in the left bottom corner of the right plot in figure 8.

CONCLUSION
In this paper, we proposed an efficient RANSAC-based method to correct the relative localization error between two CAVs that can share the point-cloud-based object detection and semantic segmentation result with each other via CPMs. Our method takes the center points of the detected vehicles and the pole points as anchor points for the RANSAC algorithm to find matches between the observed anchor points of two CAVs and calculate the transformation between the two CAVs based on these matched anchor points. In addition to the anchor points, the points of fences and facades are also used for counting the consensus of RANSAC algorithm and finding the best matches. The result of our experiments on the synthetic dataset COMAP shows that the proposed method can significantly reduce the relative localization errors. The overall performance of the proposed method is related to the initial error, the number of RANSAC iterations, and the consensus threshold for counting the valid error estimations. As far as the anchor points of the two CAVs have enough overlaps and good spatial distribution, the relative localization error can be reduced to less than 20cm.
Although the proposed method can efficiently and effectively reduce the relative localization error, it has no guarantee to find the best matches because of the randomness of RANSAC. In the worst case, wrong matches can lead to enlarged errors. Therefore, we plan to improve the performance by combining our method, for example, with maximum consensus so that the rotation can be roughly corrected first and therefore the searching range of RANSAC can be reduced. Besides, it is also necessary and meaningful to validate the effectiveness of our method on the real dataset which might have different features on keypoints. For example, the difference of the real traffic pattern might lead to different geometric distributions of the selected keypoints for matching. Moreover, real dataset with more data noise as well as weaker performance quality of object detection and semantic segmentation may result in a lower quality of the detected keypoints. However, it is inauthentic to simulate the patterns and the quality of the detected keypoints, for example, by randomly introducing errors to the keypoints. Instead, it might be more meaningful to test our error correction method on the real dataset and different object detection and semantic segmentation frameworks in the future work.
Zhang, F., Stähle, H., Chen, G., Simon, C. C. C., Buckl, C., Knoll, A., 2012. A sensor fusion approach for localization with cumulative error elimination. 2012 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), 1-6. APPENDIX A: Propagation of the global absolute localization errors of ego vehicle C0 and cooperative vehicle Ci to local relative error.