ANALYTICAL FORWARD-PROJECTION AND SYSTEM CALIBRATION FOR IMAGING THROUGH REFRACTIVE WINDSHIELDS IN AUTOMOTIVE APPLICATIONS

: We study in this paper models to mitigate effect of a windshield-related refraction on imaging systems. The literature shows that the distortions introduced by this curved surface are non-linear in their effect and reduce the performance of image-based analyses and depth estiamtion algorithms. We show that using geometric optics and local approximation of the windshield’s surface to a spherical one, a direct analytical refractive forward projection (RFP) form of a 3-space point onto the image plane can be derived. Howevere, an exact form requires solving a 22-degree polynomial which may become numerically unstable. To stabilize the solution, we demonstrate how the introduction of valid local assumptions on the interface allows reducing the polynomial degree down to 8 and 4. Utilizing these forms we then show that the RFP can be used to jointly estimate the camera pose and the windshield’s surface parameters through minimization of the reprojection error. The proposed models are tested on simulated data and validated on real-world observations. Results show stability and a sub-millimeter level of reconstruction accurecy, alluding to the validity and quality of our representations.


INTRODUCTION
The calibration of imaging systems has been the subject of extensive research as it sets the foundations of a variety of applications, including image-based scene analysis, multi-view 3-D reconstruction, structure-from-motion, visual odometry, and robotic navigation, to name only a few (Weilong and Lizuo, 2022). For autonomous driving applications, proper calibration of the imaging system is vital as it facilitates the localization, mapping, and estimation of distances and relative velocities of the surrounding traffic. In contrast with prototype versions of autonomous self-driving vehicles, where the cameras are mounted on the car roof, their placement in commercial vehicles has to be both functional and aesthetically pleasing, often being positioned behind the windshield (Hanel and Stilla, 2019). Handling refraction introduced by the windshield needs to address the non-linear ray trajectory and the surface model of the interface. The literature shows that while imaging through a flat refractive surface was researched extensively (Chadebecq et al., 2020), little can be found regarding the effect of curved surfaces, in general, and windshields, in particular (Hanel and Stilla, 2019). It has been demonstrated how distortions introduced by a windshield significantly affect the estimation of depth (Verbiest et al., 2020), and it has also been established that the refraction of the ray translates into nonlinear aberrations that are depth-dependent and are difficult to compensate for by direct image correction methods (Telem and Filin, 2010). In addition, Krebs et al. (2021), who analyzed the impact of these geometrical aberrations on classification tasks performed using state-of-the-art convolutional neural networks, demonstrated how the prediction accuracy was lowered by up to ten percent.
An explicit formulation of the refraction effect based on Snell's law was proposed by Chari and Sturm (2009), who analyzed * Corresponding author the multi-view relations when the scene contained a single refractive planar surface that separated two different media. Telem and Filin (2010) introduced an axial modification to the imaging system by developing a varifocal adaptation to handle the depth-dependent distortions introduced by refraction. This modified form was integrated into the collinearity equations and allowed the pose and system parameters to be estimated simultaneously. Agrawal et al. (2012) investigated the multi-layered flat-refractive geometry and demonstrated that such imaging systems correspond to an axial form. Elnashef and Filin (2022) developed a two-view flat-refractive model and proposed an orthogonality constraint that separates the rotational, and translational system-related components. In reference to non-planar interfaces, Kunz and Singh (2008) studied the effect of a misalignment between the camera and port centers when imaging through hemispherical housing ports in an underwater environment. They showed that the non-linearity of the ray's path requires an iterative nonlinear minimization to project a point from object-space onto the image. ? explicitly modeled the refraction through a curved surface. The authors approximated the refractive medium as a thick cone slice, where the inner and outer cones have the same aperture and the centers are such that the medium thickness is constant. To estimate the surface parameters, they used an elaborate minimization cost function, which was based on a backward-projection ray-casting model. In a later work, ? approximated the windshield surface by an ellipsoidal form. A similar minimization was used but the results were of numerical simulations only. Yoon et al. (2020) studied stereo-imaging through a refractive interface with an arbitrary surface form and derived a refraction-aware triangulation model for depth estimation. In order to estimate the surface parameters, targets were set on the inner refractive interface. Experimental results were compared to a baseline method that attempted to mitigate the refraction effect using lens distortions model. A clear improvement when using the refraction-aware triangulation model was reported. Verbiest et al. (2020) who studied the effect of imaging through a windshield, demonstrated that the refraction effect leads to severe reconstruction errors when neglected. To reduce the complexity involved in modeling an arbitrary curved surface form, the authors used a local spherical surface approximation and applied a ray-tracing formulation to model rays trajectory through the system. They derived a polynomial solution as a function of the viewing angle, but no actual information on its degree, robustness, or performance was provided. An elaborate four-step procedure was proposed to estimate the system parameters, involving a non-linear minimization, which was followed by a refinement step using a local spline-based interpolation to accommodate for deviations from an ideal spherical shape. ? identified the axial nature of the imaging through hemispherical ports. An evaluation of the non-linear distortions was provided whose source was related to the camera displacement from the sphere center and the interface thickness. The authors developed a varifocal representation that followed the flat-refractive system modeling by Telem and Filin (2010). They then estimated the pose parameters and the decentering of their system.
As the literature shows, imaging through refractive surfaces introduces severe nonlinear distortions, and as the ray's path through the system becomes non-linear it also necessitates an iterative minimization procedure to project a point from object space onto the image plane (Verbiest et al., 2020). Considering the non-parametric geometric form of the windshield, modeling its consequent effect on the ray path and the pose parameters becomes involved, even when simplifying its shape. To alleviate these modeling-related complexities, we propose an alternative formulation of the refraction-related effect on the system. We show that even when modeling the surface by a simplified spherical form, a direct ray tracing formulation would translate to a 22-degree polynomial in terms of the image coordinates. This complex form is prohibitive for solving and has led many researchers to adopt complex optimization schemes. Instead, we show that with local approximations that have little effect on the outcome, the non-linear imaging system can translate into 8-or 4-degree polynomials for the forward projection. Our modeling facilitates an efficient and direct system calibration solution where all parameters can be estimated with high accuracy. Evaluations of our model demonstrate high levels of accuracy in the estimation of both pose and system parameters, robustness to high levels of noise, and a sub-millimeter level of accuracy in 3-D reconstruction.

THEORY
We consider an imaging system consisting of a camera mounted behind a thick windshield. The imaging system is of a threelayered air-glass-air form, where µ is the index of refraction of the windshield material. Without loss of generality, we set the index of refraction of the air to 1. Following Verbiest et al. (2020), we locally approximate the windshield surface to a spherical form. We define the axis between the camera and the sphere centers as the unit vector a = [0, 0, −1] T and the distance between them as d (Fig. 1). Thus, the sphere and camera centers are set to o = [0, 0, 0] T , and c = o + da, respectively. When d = 0 the camera projection center and sphere center coincide and no refraction occurs. Defining the sphere thickness by ds, the inner and outer radii become r and r + ds, respectively. The image plane coordinates x = [x, y] T , are given in a calibrated camera frame so that an image ray would have the form of v = [x, y, f ] T , and be measured in reference to the principal point, also the following correction to the lens distortion effects. The angular offset between the optical axis and system frame can be expressed by two rotations, off and about the optical axis M = Mz(τ )My(η). However, the rotation about the z-axis is strongly correlated with the camera rotation parameters (Telem and Filin, 2010), reducing the transformation to a single rotation, v0 = My(η)v. Following this transformation, the image ray in the system frame is given by v0 = Mv. The ray sequence v0, v1, v2 from the camera center to an object-space point describes the light traversal through the system (Fig. 1), where q 1 and q 2 are the points of refraction on the inner and outer spheres, respectively. For each refraction, the intersection on the interface can be determined by a ray-sphere intersection. We define Q SU = diag{1, 1, 1, −1} as a unit sphere where all points, q, lying on it fulfill the quadratic form q T Q SU q = 0. The inner and outer spheres Q Si and Q So , can be regarded as a transformed version of Q SU by: where and where both spheres are centered at the origin oi = oo = [0, 0, 0] T . To characterize a ray passing through that system we first seek the position on q 1 . Defining an image ray by q 1 = c + λv0, we wish to find its intersection point, q 1 , by solving the quadratic form: as a function of λ. We select λi+, the positive root, as the scene is defined in our system to lie in front of the camera. With the choice of λi+ the first intersection point, q 1 , is directly derived. In addition, as the center of the sphere coincides with the system origin, the normal to the sphere at q 1 , is given by ni = q 1/∥q 1 ∥. As the ray propagates through the medium, it refracts. To compute the new direction, v1, we use the vector form of Snell's law of refraction: where ξ = 1/µ, and: We compute q 2 using the intersection of v1 and Q So , by first solving for λo+ for q T 2 Q So q 2 = 0, where q 2 = q 1 + λo+v1. In addition, we have no = q 2/r+d s . Finally, the direction of the object-space ray, v2, is given by: We have traced the ray trajectory from image-space to objectspace, and if the distance to the object-space point is known, its coordinates can be fully reconstructed. 2.1 Ray-tracing based system calibration by iterative forward projection (IFP) The ray-tracing form sets the foundation for defining the cost function in reference to the camera pose and the system parameters. To begin, a set of N corresponding points {xi → Pi}, ∀i = 1, . . . , N is given and our objective is to calibrate the system parameters and solve the camera pose, namely π = {R, t, d, r, ds, η} as the unknowns to solve for. The objective is to estimate them optimally by minimizing the reprojection error. A problem that arises in the way of realizing this is the unknown path of the forward projection from an object-space point to its image-space location. This is a well-known iterative procedure, in which the ray is first linearly traced to the image plane. The ray is then backward traced to object-space using our ray-tracing solution and by using the depth of the objectspace point. This iterative procedure continues until convergence is reached. Essentially minimizinĝ whereP is the back-projected 3-D coordinates of the objectspace point, andx is the corresponding image-space point. Next, given a set of N corresponding points, a calibration step commences, as a function of the image-space reprojection error: In each iteration, all N 3-D points are first iteratively being projected to the image plane by using Eq. (10) and the estimated values of π. Then, Eq. (11) is used to estimate a correction to π. This nested minimization form has proven to limit the accuracy of the estimated parameters, introduce instabilities, and is also inefficient. It is common to find algorithms that fix some of the parameters as a means to stabilize the solution and secure convergence. Alternatively, we propose here an efficient solution to project 3-D points directly onto the image plane, thus, eliminating the need for minimization in Eq. (10).

Forward projection through refractive quadratic surfaces
Our objective is to alleviate the need for the iterative raytracing-based forward projection phase in the solution. To de- Data: Pose (R, t) and system parameters (a, r, d, ds, µ) 1 nP = a×P // Compute the image ray, v 0 in 3-D sapce rive a direct form, observe that the surface normal at the point of refraction, the incident ray, and the axis defined by the sphere and camera centers form a coplanar triad that defines the plane of refraction (POR). In addition, all planes of refraction form a pencil of planes about an axis that passes through the camera and sphere centers, a, and we use this axis to define the system frame ( Fig. 1). For convenience, we develop our model within the POR and for that define a planar reference frame within it. Consider the camera center to be positioned at a distance d from the center of the sphere and letẑ2 = N (a) define the system axis, where N is a normalization operator. The orthogonal direction toẑ2 can be defined byẑ1 = N (ẑ2 × (ẑ2 × v0)).
Within this plane, the origin is at the sphere center, o = [0, 0] T , suggesting that the camera center coordinates are c = [0, d] T . The rays, v1, and v2 along the light path can be represented by 2-D vectors vp i = [ẑ T 1 vi,ẑ T 2 vi] T , i = {1, 2} in the POR. At the inner interface, we set q 1 = [x, z] T , as the point of intersection with the sphere. This allows us to express vp 0 and n at q 1 by vp 0 = q 1 − c = [x, y − d] T and ni = N (q 1 ). To compute q 2 and no the intersection between the ray vp 1 and the outer sphere needs to be computed, from which the values of noi and then vp 2 follow. Defining p = RP + t as an objectspace point transformed to the camera system frame, where R, t = [tx, ty, tz] T as the rotation matrix and translation vector, where × denotes the cross product. As q 1 is a point on the inner circle with radius r, we can write z = √ r 2 − x 2 . Substituting this form into Eq. (12) transforms the constraint into a function of only x. Assuming that the pose and system parameters are known, one can project a point P by solving a single variable constraint. However, because of the complex trajectory, this constraint contains a cascade of root terms, observable from the root terms in Snell's law of refraction, (Eqs. 7 & 9) and the two quadratic equations when computing the points q 1 and q 2 (Eq. 5). Agrawal et al. (2012) demonstrated that a solution for the flat-refractive system can be reached by removing the square-root terms through mathematical manipulation leading to root-free constraint with one unknown. The authors then demonstrated that for a 3-flat-layers system with two refractions, Eq. (12) correspondence to 12-degree polynomial. However, in our case and because of the higher dimensionality of the surface compared to the simple planar case, Eq. (12) yields a 22-degree polynomial with the monomials 1, x, x 2 , . . . , x 22 . This form facilitates a forward projection of a point from objectspace onto the image directly for a noise-free system. However, simulations show that this projection is unstable when noise is introduced. To improve the stability of the polynomial, we propose two approximations to the ray-trajectory through the system, which we demonstrate are negligible in effect, and then validate their effect in our real-world experiments.
First, we observe that the change in the normal direction between ni and no results in a change between the directions of the rays v0 and v2 which would be similar if ni = no. Analytically, from Fig. (1), it can be shown that the angle between the two normals is given by: where θ1 is the refraction angle formed between the normal ni and v1. For a windshield, the radius would be much larger than the glass thickness ( r >> ds ). For example, a standard 1 : 100 ratio would amount to ϕ ≈ 0.01 tan θ1 and for a camera with a 45 • filed-of-view, the offset angle will not be greater than ϕ ≈ [−0.01 • , 0.01 • ]. Hence, setting ni = no is a reasonable approximation with a byproduct of v0 = v2. The effect on Eq. (12) is vast, as the polynomial form of the forward projection reduces to the degree 8 with the monomials 1, x, x 2 , x 3 , x 4 , x 5 , x 6 , x 7 , x 8 . This form is stable and facilitates application for the system parameters estimation. Details on the polynomial coefficient can be found in Appendix (A.1).
Another viable simplification (see e.g., Yoon et al., 2020) is to consider the two interfaces to be locally parallel. Computing q 2 entails solving a quadratic equation of the intersection between v1 and the outer sphere, Q So . For a windshield, these computations can be approximated by computing the distance to a parallel interface at a distance ds as follows: Hence, With both approximations, Eq. (12) is further simplified, leading to a 4-degree polynomial, with the monomials 1, x, x 2 , x 3 , x 4 . This form is advantageous as it also allows computing the roots linearly in a closed form by applying Ferrari's solution (e.g., Kurosh, 1980). Details on the polynomial coefficient can be found in Appendix (A.2). Using both approximations translates to a local flat-refractive approximation with ni as the flat-surface normal. Furthermore, if the windshield surface is smooth and has sufficiently low curvature, the approximation errors are practically negligible, regardless of whether the medium is thick or thin. When the medium is sufficiently thin, the error caused by the parallel assumption is also negligible, also confirmed by our experiments. Given a known set of camera pose and system parameters, a point P can be forward-projected onto the image, x, directly and linearly using either the 8-or 4-degree polynomials, respectively. The correct solution (root) is the one that best abides by Snell's law refraction. Algorithm (1) details the polynomial forward-projection procedure.
Calibrations Calibrating the system would require minimizing the function in Eq. (11). However, because of the large number of unknowns, the minimization may prove to be hard to solve. Instead of solving the entire set of parameters, which includes the camera pose and system parameters it was shown by Elnashef and Filin (2019) that the problem can be separated into two parts, refraction dependent and refraction independent parameters in a flat-refractive system. This model can be extended to the refraction through curved surfaces, as it is a coplanar constraint on rays within the POR. Harnessing this, we can solve R, tx, ty linearly and independently from refraction and system parameters with as little as five coplanar points (e.g., checkerboard target). Thus, reducing the parameters to solve for to only six, including tz, r, ds, d, µ, η.

EXPERIMENTS
To test the performance of our proposed calibration solution, experiments were carried out via simulated experiments and validated using real-world data. For the simulations, the gener- ated data followed similar settings as the ones used in the realworld experiments. A Canon PowerShot G9 X camera model was used, with a 5472 × 3648 pixels frame, and a 2.41µm pixel size, while the principal distance was set to 32 mm. The index of refraction was set to µ = 1.5 for the glass-made windshield. All image points were perturbed by a random Gaussian noise with a standard deviation ranging from 0 to 2 pixels, and for each noise level, 100 trials were performed. As for the number of images used in each trial, though a single image suffices to estimate all of the system-and pose-parameters, for redundancy, ten images of a 2-D gridded target were used. The target comprised of 48 points forming a grid of 6×8 spaced by 2.5 cm with a total size of 17.5 × 22.5 cm. This was simulated while positioning at depths ranging between 0.5 and 5 m in front of the camera and rotating randomly by up to 20 • in each test. We randomly placed the camera at a distance dw from the windshield, at a predefined range [2,30] cm along the a, while offaxis deviations were randomly set up to 5 • off a. To generate simulated data, independent of our model, the ray-tracing model described in Kunz and Singh (2008)  Validation metrics: We used the following metrics to evaluate our results: the positional error was defined as the relative difference between the known and estimated pose εt = ∥ttrue − t estimated ∥/∥ttrue∥, and the rotational error is measured by the minimal rotation angle required to go from the estimated rotation matrix to the actual one. For its computation, we note that for the 3-D rotation group SO(3) there is an intrinsic notion of distance, which we present here in a matrix form (Trefethen and Bau, 1997). Let R andR be the true and estimated rotation matrices, respectively. The differencerotation-matrix is defined by: δR ≜ RR T , and the angle of the difference rotation can be retrieved by the trace of δR such that: ε ϕ = cos −1 ( (1−trace(δR)) /2).
Pose estimation Plots of the estimation accuracy of the algebraic and reprojection-error minimizations are presented in Fig. (3). As shown, for a noise level ±0.5 pixels and using a 2-D target, the positional error is as low as 0.075%, and the angular error is 43.2 ′′ (arc seconds). Applying the optimal form improved the results to 0.018% and 10.8 ′′ for the positionaland angular error, respectively. The quality of these results is comparable to the ones obtained in an in-air imaging setup. Additionally, the reprojection error exhibits a linear trend as a function of the noise level increase, an indication of the stability of both the linear and non-linear forms.
Calibration In the calibration process, the estimated pose parameters were fixed and we only solved the remaining system 1 https://grabcad.com/library/mazda-miata-windshield-2002 parameters. To evaluate our proposed polynomial-based calibration, the 4-and 8-degree forward projection (FP) forms were compared to the iterative FP (IFP) counterpart (Sec. 2.1). Fig. (4) plots the results of estimating the system parameters as a function of Gaussian noise. Results show that using our solutions the errors of all the system parameter estimates are on a low fraction of a percent in reference to their actual values. This is while the IFP solution error increases sharply as more noise is added to the system, reaching more than 2% error in the estimation of the camera distance from the sphere center, the windshield thickness, and index of refraction, d, ds, and µ, respectively. For example, while using a 2-D target and a ±0.5 pixel noise, an error of 0.016%, 0.009% and 0.625% for d were recorded using 4-, 8-degree FP, and the IFP solutions, respectively. Similarly, and for the same level of noise, an error of 0.015%, 0.021%, and 0.33% for ds was recorded. In both d and ds, an improvement by an order of magnitude or more was recorded when compared to the IFP solution. The refraction index, µ for the same noise level was estimated with an error of 0.021%, 0.022%, and 0.691%. When estimating the reprojection error, a linear trend was observed through the different levels of added noise, which indicates the stability of the estimation model (Fig. 5). Note that when comparing both 4and 8-degree FP forms comparable results were reached with the 4-degree FP being more efficient.

Real-world experiments
Real-world experiments were performed as a two-step calibration procedure. First, the camera was calibrated outside of the car and the intrinsic calibration parameters were estimated and corrected. Next, the camera was set in the car and in front of the windshield, and images of the checkerboard target were captured through the refractive surface (Fig. 6). To validate these results, a reconstruction experiment followed where our ability to restore intrinsic geometric properties within the scene was evaluated. For that, all 48 corners of the 2.5 cm-spaced, 17.5 × 22.5 cm, grid were measured (Fig. 6). Our intrinsic measures included the coplanarity of the points, collineation, parallelism, and orthogonality, where the measured grid lines facilitated our analyses. External metric measures of the grid dimensions provided a means to evaluate the metric accuracy. In this experiment, the images that were used for calibration were not used for the reconstruction. The working distance for the reconstruction was also different from that used for the calibration. The first test evaluated the coplanarity of the mapped points, and a plane that was fitted to them yielded a 0.08 mm deviation using our 4-degree FP model compared to 3.94 mm using the IFP solution. The collinearity test for points lying along straight lines, and as a consequence the removal of the refraction effect that bends them, yielded a mean deviation of 0.18 mm and a maximal deviation of 0.31 mm from collinearity compared to a mean deviation of 2.88 mm and a maximal deviation of 4.95 mm. Computation of the angles between parallel and orthogonal lines of our solution shows a mean absolute error of 12 ′′ and 15 ′′ in parallelism and orthogonality, respectively, whereas the application of IFP solution was of much lower quality, with 263 ′′ and 347 ′′ . Finally, the dimensions of the target were computed by measuring the distance between edge points along the width and height of the target. The errors were 0.48 mm (0.27% of 175mm), and 0.79 mm (0.35% of 225 mm) with respect to the actual dimensions in our solution, again showing significant improvement compared to the IFPbased solution with 2.31 mm (1.32% of 175mm), and 3.29 mm (1.46% of 225 mm).

CONCLUSIONS
The literature has shown that refraction through a curved nonparametric interface surface introduces non-linear distortions. Attempts to model it has largely been based on the iterative forward projection ray-tracing model. This leads to an inaccurate system and pose parameter estimations, and even divergence in cases where all parameters are estimated simultaneously. Instead, we developed here a direct forward projection representation through a refractive spherical interface that yielded a 22degree polynomial form. However, with slight relaxation on the ray propagation model, this form was reduced to 4-and 8degree forms. Our experiments have demonstrated that these relaxations have little to no effect on the quality of the pose and system parameters. Also, the reconstruction accuracy was on a sub-millimeter level. This suggests that our simplified form offers a viable calibration and modeling framework for automotive applications.