DIRECT ESTIMATION OF THE RELATIVE ORIENTATION IN UNDERWATER ENVIRONMENT

While accuracy, detail, and limited time on site make photogrammetry a valuable means for underwater mapping, establishing reference control networks in such settings is often difficult. In that respect, the coplanarity constraint becomes a valuable solution, as it requires neither knowledge of object-space coordinates nor a reference control network. Nonetheless, imaging in such domains is subject to non-linear, depth-dependent distortions caused by refractive media that alter the standard single-viewpoint geometry. Accordingly, the coplanarity relation, as formulated for the in-air case, does not hold in such environments, and the methods proposed thus far for geometrical modeling of the refraction effect require knowledge of object-space quantities. In this paper we propose a geometrically driven approach that fulfills the coplanarity condition and thereby requires no knowledge of object-space data. We also study a linear model for establishing this constraint. A linear form requires neither first approximations nor an iterative convergence scheme. Such an approach may prove useful not only for object-space reconstruction but also as a preparatory step for bundle block adjustment and for outlier detection, all of which are key features in photogrammetric practice. Results show that no unique setup is needed for estimating the relative orientation parameters using the model and that high levels of accuracy can be achieved.


INTRODUCTION
Relative orientation enables the estimation of the minimal set of parameters necessary to establish coplanarity among corresponding points between two views (Mullen, 2004). Its simplified form, which requires no knowledge of object-space coordinates, makes it useful as a means to obtain a scene reconstruction, up to a similarity transformation, without setting a reference control network (Stewenius et al., 2006; Pollefeys and Van Gool, 1997; Hemayed, 2003; Hartley and Zisserman, 2003). These advantages become paramount in underwater environments, where reference control networks are difficult to establish and where local ones become the more practical solution. Accordingly, relative orientation has been applied in a diverse set of applications, including seabed mapping, archaeological surveys, marine biology studies, vehicle navigation, and industrial equipment inspection, to name only a few (Eustice et al., 2005; Allotta et al., 2015; Ricci et al., 2015; Teixeira et al., 2016; Ludvigsen and Sørensen, 2016; Pergent et al., 2017; Herkül et al., 2017). Establishing a relative orientation in underwater environments is challenged, however, by limited visibility due to scattering and absorption of light, and by refraction, which diverts light rays from their standard 'collinear' path and ultimately alters the coplanarity relation (Menna et al., 2017). The problem of recovering 3-D geometry in the presence of refraction has been studied extensively in recent years. The majority of approaches rely on standard in-air models, assuming that the pinhole camera model with distortion can compensate for refraction (e.g., Pizarro et al., 2003; Singh et al., 2007; Shortis, 2015). These models have proved applicable, but analysis of the actual image-point correction due to refraction shows that for close-range imaging the actual aberration is depth-dependent and three-dimensional (Sedlazeck and Koch, 2012; Shortis, 2015).
Modeling the effect of imaging through refractive media has shown that the standard single viewpoint perspective (SVP) model no longer applies in this setting, and that using it introduces systematic errors that are proportional to the distance between the entrance pupil and the flat port (Menna et al., 2017). To handle these effects, recent studies introduced explicit modeling of the refraction effect. As an example, Chaudhury et al. (2015) implemented a ray-tracing-based model that expresses the underlying refractive geometry as an extension of projective geometry. The authors assumed, however, that refraction occurs only at the camera center, which allowed them to geometrically estimate the relationship between the observed image point and the one that would be obtained had refraction not occurred. The case of a fixed interface and a moving camera was considered by Chen et al. (2011), who assumed that the imaging system is integrated with an inertial measurement unit (IMU) so that the pitch and yaw angles of the camera are known. The authors then proposed a closed-form solution to the absolute orientation estimation, yet required the vertical displacement of the camera to be known in advance. A setup of two cameras embedded in different watertight housings was considered by Kang et al. (2012). The authors neglected the glass thickness and considered the image plane and the glass interface to be parallel. They then estimated the thickness of the air medium for each imaging system and the five relative pose parameters in a non-linear optimization procedure. Telem and Filin (2013) derived a coplanarity constraint based on the varifocal representation. The authors demonstrated that because of refraction, the linear relation between views does not hold when imaging underwater, leading to a non-linear and inhomogeneous coplanarity constraint. Aiming to accommodate the thick glass interface found in deep-sea high-pressure housings, Jordt et al. (2016) proposed a ray-tracing refractive structure-from-motion (SfM) framework that extends previously proposed ones. The authors relied on the coplanarity constraint to estimate the relative camera motion between two successive views while using a non-linear refractive bundle adjustment. The proposed refractive SfM required a good initialization, especially for the two-view pose estimation case, and relied on a computationally demanding genetic optimization. Furthermore, the model exhibited sensitivity to noise, especially when using a thin glass interface or when the distance between the perspective center and the interface was small. Pose and calibration models for an underwater stereo imaging system were proposed by Zhang et al. (2018), who remodeled the underwater imaging system in terms of a light-field parametrization and proposed a forward projection error function that was minimized by non-linear optimization.
The literature shows that because of refraction, underwater imaging suffers from depth-dependent, non-linear distortions and that the imaging system does not follow the standard SVP model. To handle that, recent research followed ray-tracing-based principles and incorporated robust optimization strategies, proposed algorithmic reduction of the measurement noise, or used hardware-driven solutions. While computationally demanding, these approaches still fall short of satisfactory accuracy and are often supplemented by elaborate setups or require a priori knowledge of some of the camera pose parameters. In this paper, we propose a geometrically driven model that accounts for the refraction effect and constitutes an alternative formulation of the relative orientation and the coplanarity constraint. We show that the pose parameters can be estimated linearly and directly, with no need for first approximations or iterations.
The linearity is achieved by separating the relative pose from the underwater-related system parameters, in which the refraction effect is manifested. The advantages of our model are the following: first, the relative pose parameters can be estimated linearly; second, and contrary to the in-air case, the model allows the scale of the system to be estimated; and third, 3-D data can be extracted with no need to establish an explicit reference frame. The model is analyzed by simulated experiments and under actual underwater conditions. Both the inner and the geometric accuracy are evaluated using the reconstructed point sets. Results show that a high level of accuracy is reached while facilitating a flexible orientation strategy for modeling in underwater environments.

Varifocal model
We consider an imaging system made of a camera shielded by a flat-port housing. In the image formation model, a ray propagates from an in-water point until it refracts at the port's glass interface and then propagates through the air, passing through the perspective center, until it reaches the image plane (Fig. 1). The system is defined by the camera's focal length, f, the distance from the housing interface to the perspective center, F, and the refractive indices. Snell's law of refraction implies that the incoming ray, the refracted ray, and the normal to the interface surface all lie on the same plane (Fig. 1). As the refracted rays pass through the perspective center, all planes of refraction revolve about the optical axis as long as it coincides with the normal to the interface. Hence, the axiality of the system. Axiality is also a property of the varifocal model, but here the position of the perspective center is modified along the optical axis to maintain the collinearity of the incoming ray.
Figure 1: Varifocal configuration. The dashed line follows the direction of the underwater ray until it intersects the system axis; kF is the perspective center offset, α is the angle of refraction, and β is the angle of incidence (µ1 and µ2 are the refractive indices of air and water, respectively).
Axial camera modeling - Modifying the imaging system with respect to the interface (Fig. 1), F becomes the principal distance, and the housing system coordinates x̃i and ỹi that correspond to given image plane coordinates xi and yi are given by Eq. (1). The collinearity form in this system (Eq. 2) is written in terms of Xi, the 3-D coordinate of a point in object space; ri, the i-th row of the rotation matrix R; tx, ty, and tz, the camera position in the camera reference frame; a scale factor λi; and ki, the principal distance adaptation to the refraction effect. The latter depends on α, the angle of incidence (Fig. 1), and on µ = µ1/µ2, the ratio between the refractive indices of air and water.
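The dependence of ki on the ray angle can be illustrated with a short numerical sketch. It is not the paper's own expression, which is not reproduced here: the closed form below is derived directly from Snell's law under the assumptions that α is the ray's angle to the interface normal on the air side, that the optical axis coincides with that normal, and that the glass thickness is neglected. The function names and the sample housing coordinates are hypothetical.

```python
import math


def varifocal_factor(x_h, y_h, F, mu):
    """Closed-form k for a housing-system point (x_h, y_h), in the units of F.

    mu = n_air / n_water; alpha is taken as the ray's angle to the
    interface normal on the air side (an assumption of this sketch).
    """
    tan2_alpha = (x_h ** 2 + y_h ** 2) / F ** 2
    return math.sqrt(1.0 / mu ** 2 + (1.0 / mu ** 2 - 1.0) * tan2_alpha)


def varifocal_factor_raytraced(x_h, y_h, F, mu):
    """Same quantity obtained by tracing the ray explicitly (off-axis points only)."""
    r = math.hypot(x_h, y_h)                 # radial distance on the interface
    alpha = math.atan2(r, F)                 # air-side angle
    beta = math.asin(mu * math.sin(alpha))   # water-side angle from Snell's law
    return r / (F * math.tan(beta))          # water ray meets the axis k*F from the interface


F = 50.0            # mm, the value used in the synthetic simulations
mu = 1.0 / 1.333    # air-to-water refractive index ratio
k_axis = varifocal_factor(0.0, 0.0, F, mu)    # on-axis ray
k_edge = varifocal_factor(10.0, 6.0, F, mu)   # hypothetical off-axis point, mm
```

Under these assumptions k grows with the off-axis angle, which is one way to see why a single fixed principal distance cannot absorb the refraction effect.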
Deviation from parallelism - The derivation of Eq. (2) assumed that the optical axis and the normal to the interface coincide. This assumption fails to hold when the image plane is not parallel to the housing interface. In such a case all the planes of refraction still pass through the perspective center. They do not revolve about the optical axis, however, but about the normal to the interface, which is common to all planes of refraction. The offset between the normal to the interface and the optical axis is an image-related term defined by two rotation angles, one off the optical axis and another about it. Telem and Filin (2010) have shown that the rotation about the z-axis is strongly correlated with the exterior orientation angles; it is thus absorbed by them and cannot be estimated independently. Consequently, the transformation between the two systems can be approximated by a single rotation about the y-axis. Hence, the modification is expressed through Ry(η), the rotation matrix from the image plane to the interface by an angle η. The modified expression for k is then written in terms of x, the image point coordinate, and M3 = r3 ⊗ r3, the outer product of the third row of Ry(η), in which the refraction effect is accounted for. As offsetting the perspective center should be performed in reference to the system in which the normal to the interface acts as the optical axis, both image- and object-space-related vectors should be transformed to this system, followed by a modification to the perspective center.
The glass interface effect on the system was not considered up to this point, but Telem and Filin (2010) demonstrated that for a standard underwater imaging system it introduces a lateral shift that is smaller by an order of magnitude or more than the one caused by angular refraction. This shift can be reduced to a constant value, factored by the glass thickness. Hence, it is absorbed by F.

Plücker coordinates of a 3-D line
To establish the relative orientation we define the ray's direction using the Plücker representation of lines, and show that such a representation facilitates a linear estimation of the camera pose parameters. We represent the projected image rays as 3-D lines using Plücker coordinates. Of the definitions for a Plücker 3-D line we use the vector form (Förstner and Wrobel, 2016). Let A and B be the homogeneous coordinates of two 3-D points defining a line. The Plücker line representation is given by:
$$\mathbf{L} = \begin{bmatrix} A_4 B_1 - A_1 B_4 \\ A_4 B_2 - A_2 B_4 \\ A_4 B_3 - A_3 B_4 \\ A_2 B_3 - A_3 B_2 \\ A_3 B_1 - A_1 B_3 \\ A_1 B_2 - A_2 B_1 \end{bmatrix} \tag{7}$$
where Ai, Bi is the i-th ordinate of the respective point. L can be split into two 3-vectors,
$$\mathbf{L} = \begin{bmatrix} \mathbf{a} \\ \mathbf{b} \end{bmatrix} \tag{8}$$
where a and b are the direction and moment of the 3-space line, respectively. To define a proper line in space these two sub-vectors must satisfy the constraint:
$$\mathbf{a}^{\top}\mathbf{b} = 0 \tag{9}$$
Considering two 3-space lines, L and L′, a coplanarity relation between them is obtained if and only if they fulfill the following constraint:
$$\mathbf{L}^{\top} W \mathbf{L}' = 0, \qquad W = \begin{bmatrix} 0_{3\times3} & I_3 \\ I_3 & 0_{3\times3} \end{bmatrix} \tag{10}$$
where W is a 6 × 6 permutation matrix.
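As a minimal sketch of these definitions, the snippet below builds Plücker lines from Euclidean points (homogeneous ordinate equal to 1, so that the direction is B − A and the moment is A × B) and evaluates the Plücker and coplanarity constraints. The function names and the sample points are illustrative only.

```python
import numpy as np


def plucker_from_points(A, B):
    """Plücker 6-vector (direction a, moment b) of the line through Euclidean points A and B."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    return np.concatenate([B - A, np.cross(A, B)])


# 6 x 6 permutation matrix that swaps the direction and moment blocks.
W = np.block([[np.zeros((3, 3)), np.eye(3)],
              [np.eye(3), np.zeros((3, 3))]])


def coplanarity_residual(L1, L2):
    """Zero (up to rounding) if and only if the two lines are coplanar."""
    return float(L1 @ W @ L2)


P = np.array([1.0, 2.0, 5.0])                                # point shared by two rays
L1 = plucker_from_points([0.0, 0.0, 0.0], P)
L2 = plucker_from_points([1.0, 0.0, 0.0], P)                 # meets L1 at P
L3 = plucker_from_points([1.0, 0.0, 0.0], [1.0, 2.0, 6.0])   # skew to L1

plucker_constraint = float(L2[:3] @ L2[3:])                  # a . b for a proper line
res_intersecting = coplanarity_residual(L1, L2)
res_skew = coplanarity_residual(L1, L3)
```

The residual for the skew pair is non-zero, which is the quantity that a coplanarity-based adjustment drives toward zero.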
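Transformations - A metric transformation defined by a rotation matrix R and a translation vector t acts on a point X by:
$$\mathbf{X}' = R\,\mathbf{X} + \mathbf{t} \tag{11}$$
Accordingly, Plücker line coordinates are transformed by:
$$\mathbf{L}' = \begin{bmatrix} R & 0_{3\times3} \\ [\mathbf{t}]_{\times} R & R \end{bmatrix} \mathbf{L} \tag{12}$$
where [t]× is a skew-symmetric matrix (Förstner and Wrobel, 2016).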
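The line transformation can be checked numerically. In the sketch below, the 6 × 6 motion matrix is built from R and t with the line ordered as (direction, moment), and the result is compared with the line rebuilt from the two transformed points. The rotation, translation, and points are arbitrary illustrative values.

```python
import numpy as np


def skew(t):
    """Skew-symmetric matrix [t]x, so that skew(t) @ v equals np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])


def line_motion_matrix(R, t):
    """6x6 matrix that moves Plücker lines (direction, moment) under X' = R X + t."""
    return np.block([[R, np.zeros((3, 3))],
                     [skew(t) @ R, R]])


def plucker_from_points(A, B):
    """Plücker 6-vector of the line through Euclidean points A and B."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    return np.concatenate([B - A, np.cross(A, B)])


angle = np.radians(20.0)                       # rotation about the y-axis
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.3, -0.1, 0.2])
A, B = np.array([0.5, 0.2, 1.0]), np.array([1.0, 2.0, 5.0])

L_moved = line_motion_matrix(R, t) @ plucker_from_points(A, B)
L_direct = plucker_from_points(R @ A + t, R @ B + t)
```

Both routes give the same 6-vector, because the moment of the moved line is R b + t × (R a).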

Underwater relative orientation
Considering L and L′ as lines that represent the rays projected from two cameras and meeting at a point in object space, Eq. (10) can be rewritten as Eq. (13). A ray is described by a point v that lies on it and a unit direction vector x̂, from which the Plücker coordinates of the two lines corresponding to image-pair correspondence i follow (Eq. 14). In our case, x̂i is the direction defined by the corrected perspective center vi = [0, 0, −kiF] and its corresponding housing system point x̃i (Eq. 1) in the first image. Setting the point on the line to be vi, Eq. (14) can be written in terms of the varifocal model (Eq. 15). As the last element of the Plücker rays is zero, the 6 × 6 transformation matrix in Eq. (13) can be reduced to a 5 × 5 matrix (Eq. 16), yielding the constraint in Eq. (17), where L̃i and L̃′i are vectors of length 5. Eq. (17) can also be written in an expanded form (Eq. 18). Eq. (17) provides a form from which the relative pose parameters can be estimated linearly. For that purpose, we use the singular value decomposition (SVD; Golub and Van Loan, 1983) based solution for Er, which yields satisfactory results. However, we propose a two-step approach as an alternative solution scheme to obtain better estimates. First, the rotation matrix is estimated from E = [t]× R, which is computed from the first element in Eq. (18). Second, a non-linear refinement of these estimates is computed directly from the complete form of Eq. (18). To compute the scale, we fix the refined rotation matrix in Eq. (18) and compute the translation vector, t. As t is the only unknown and Eq. (18) is a non-homogeneous set of equations (unless R = I, meaning no motion), t can be computed using a linear least-squares adjustment, with no scale ambiguity.
Figure 2: Histograms of the estimation accuracy using 30 corresponding points, based on 500 randomly simulated tests. In all tests, Gaussian noise with σ = ±0.5 pixels was introduced to the image measurements.
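For orientation, the structure of the constraint can be sketched from the Plücker relations alone. The block below is a reconstruction under stated assumptions and not necessarily the paper's exact Eqs. (13)-(18): it assumes that X′ = RX + t maps the first camera frame into the second, that each ray is written as (direction, moment) with moment v × x̂, and it makes no claim about the paper's ordering or signs.

```latex
% Assumed conventions: X' = R X + t; L_i = [\hat{x}_i ; v_i \times \hat{x}_i],
% L'_i = [\hat{x}'_i ; v'_i \times \hat{x}'_i].
\begin{align}
  \mathbf{L}_i'^{\top}
  \begin{bmatrix} [\mathbf{t}]_{\times} R & R \\ R & 0_{3\times3} \end{bmatrix}
  \mathbf{L}_i &= 0 , \\
  \hat{\mathbf{x}}_i'^{\top} E\, \hat{\mathbf{x}}_i
  + \hat{\mathbf{x}}_i'^{\top} R\, (\mathbf{v}_i \times \hat{\mathbf{x}}_i)
  + (\mathbf{v}_i' \times \hat{\mathbf{x}}_i')^{\top} R\, \hat{\mathbf{x}}_i &= 0 ,
  \qquad E = [\mathbf{t}]_{\times} R , \\
  \mathbf{v}_i \times \hat{\mathbf{x}}_i
  &= k_i F \begin{bmatrix} \hat{x}_{i,2} \\ -\hat{x}_{i,1} \\ 0 \end{bmatrix}
  \quad \text{for } \mathbf{v}_i = [0,\, 0,\, -k_i F]^{\top} .
\end{align}
```

In this form the first term carries E and the remaining two carry R alone. Because the third moment ordinate vanishes, the entry r33 never appears, which leaves 9 + 8 = 17 linear unknowns. The two R terms do not vanish for non-central rays, so the system is non-homogeneous in t once R is fixed.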

EXPERIMENTS
Evaluation of the model was performed by studying two scenarios, one in which the pose parameters were estimated and compared to ground truth, and another in which the reconstruction error was evaluated. The evaluation was carried out using simulations with settings that resemble actual underwater imaging conditions and using real-world experiments. For both real and simulated experiments, the performance of the proposed model was characterized by the following metrics: (i) the angle difference between the rotation axes of R and R̂ (Eq. 19a); (ii) the direction difference between t and t̂ (Eq. 19b); and (iii) the scale error between t and t̂ (Eq. 19c).
The parameters R, t are the ground truth used for the simulations, and R̂, t̂ are their estimated equivalents. Note that all these matrices and vectors are defined in absolute scale rather than up to scale, as in the in-air case.
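Pose-error metrics of this kind might be computed as in the sketch below. Eqs. (19a)-(19c) are not reproduced here, so the definitions are assumptions: the rotation error is taken as the angle of the residual rotation RᵀR̂ (a common substitute, not necessarily the rotation-axis difference used in the paper), the direction error as the angle between t and t̂, and the scale error as the relative difference of their norms.

```python
import numpy as np


def rotation_error_deg(R_true, R_est):
    """Angle (degrees) of the residual rotation R_true^T @ R_est."""
    cos_angle = (np.trace(R_true.T @ R_est) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))


def direction_error_deg(t_true, t_est):
    """Angle (degrees) between the true and estimated translation directions."""
    cos_angle = (t_true @ t_est) / (np.linalg.norm(t_true) * np.linalg.norm(t_est))
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))


def scale_error(t_true, t_est):
    """Relative error of the estimated baseline length."""
    return float(abs(np.linalg.norm(t_est) / np.linalg.norm(t_true) - 1.0))


# Illustrative values: a 10-degree residual rotation about z, a 10 % scale error.
angle = np.radians(10.0)
R_hat = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
t = np.array([0.4, 0.05, 0.1])
t_hat = 1.1 * t

rot_err = rotation_error_deg(np.eye(3), R_hat)
dir_err = direction_error_deg(t, t_hat)
scl_err = scale_error(t, t_hat)
```

The scale metric is meaningful here only because the translation is recovered in absolute scale. For an in-air essential matrix it would be undefined.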

Synthetic simulations
To test the pose parameter estimation model we consider an imaging system with a standard frame camera and a flat-port housing. The settings consisted of a 3000 × 2000 pixel frame camera with a 7.5 µm pixel size; a 24 mm lens; µglass = 1.5; µwater = 1.333; F = 50 mm; and η = 1°. Normally distributed additive random noise with a standard deviation σ ranging from 0.1 to 1 pixels was introduced to the image coordinates.
The first experiment tested the estimation accuracy for rotation, translation, and scale in the presence of Gaussian noise with σ = ±0.5 pixels. Histograms of the estimation accuracy for the different metrics (Eqs. 19a, 19b, and 19c) are plotted in Fig. 2. The second experiment tested the quality and stability of the camera pose parameters (position and orientation) and of the scale estimation in the presence of Gaussian noise. Here the measurement noise varied from 0.1 to 1 pixels. The accuracy estimates are discussed only for σ = ±0.5 pixels; other levels are plotted in the designated graphs in Fig. 3. Applying the proposed model at that noise level yielded an angular error of ±0.012°. The scale estimation accuracy was 5.5 × 10⁻⁴. The reprojection error shows a linear trend with the increase of the noise level (Fig. 3), an indication of the stability of the model. In all experiments convergence was reached within three iterations, one for the estimation of the approximate values and two others for the estimation by the optimal form in Eq. (18).

Real world -Open sea experiment
In addition to the synthetic evaluation, another test was performed under real-world conditions in open sea. This environment provided less controlled and different settings. Here a Nikon D70s with a fixed 24 mm lens shielded by an IKELITE housing, and two 2-D rigid plates were immersed in water; both plates were used for testing the quality of the pose estimation. A dry calibration phase was performed while the camera was in the housing (Elnashef and Filin, 2019), and a wet calibration was then carried out using a total of five images. The a posteriori standard deviation of the adjustment was σ̂0 = ±0.42 pixels. F was estimated with an accuracy of ±0.07 mm, µ with ±1 × 10⁻⁶, and η with ±0.05″. The system parameters were then fixed in the next stage. We used the ORB key-point detector for feature detection and then matched the features to find corresponding points between the two images shown in Fig. 4. Note that the two images used in the relative orientation estimation were different from those used in the calibration stage.
Before approaching the non-linear estimation, approximate pose parameters were computed. To obtain them, we estimated the essential matrix, E = [t]× R, using the housing system coordinates. Hence, we solved x̃′iᵀ E x̃i = 0 for E, which offers an adequate approximation of Eq. (18). We applied the standard 5-point and RANSAC algorithms (Stewenius et al., 2006) to extract the parameters. With these values, we then minimized the geometric error by directly solving for the three rotation angles (ω, φ, κ) and two translation parameters (tx, ty) (Eq. 18). Finally, to estimate the translation vector with scale, we first computed the optimal rotation matrix R̂ from the estimated angles (Eq. 18). Then, fixing it in Eq. (18), we solved only for the three translation parameters t̂ = [tx, ty, tz], using a standard least-squares solution.
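The final step, solving for the scaled translation once the rotation is fixed, can be sketched as a small linear least-squares problem. The sketch assumes the generalized coplanarity form (R a) × a′ · t = −(a′ · R b + b′ · R a) for rays given as (direction a, moment b), with X′ = RX + t. It is an illustration under those assumptions and not the paper's Eq. (18). The synthetic rays use hypothetical, randomly drawn axial offsets in place of the actual ki F values.

```python
import numpy as np


def solve_translation(R, a1, b1, a2, b2):
    """Least-squares t (with scale) from Plücker rays, given the rotation R.

    a1, b1: (n, 3) directions and moments in the first camera frame.
    a2, b2: (n, 3) directions and moments in the second camera frame.
    Assumes X2 = R @ X1 + t and non-central (here axial) ray origins.
    """
    Ra1 = a1 @ R.T
    Rb1 = b1 @ R.T
    A = np.cross(Ra1, a2)                                   # one row per correspondence
    rhs = -(np.sum(a2 * Rb1, axis=1) + np.sum(b2 * Ra1, axis=1))
    t, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return t


rng = np.random.default_rng(0)
n = 30
X1 = rng.uniform([-1.0, -1.0, 3.0], [1.0, 1.0, 6.0], size=(n, 3))   # object points, m
angle = np.radians(10.0)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t_true = np.array([0.4, 0.05, 0.1])
X2 = X1 @ R.T + t_true

# Hypothetical per-ray origins on each camera's z-axis (stand-ins for [0, 0, -k_i F]).
v1 = np.column_stack([np.zeros(n), np.zeros(n), -rng.uniform(0.05, 0.10, n)])
v2 = np.column_stack([np.zeros(n), np.zeros(n), -rng.uniform(0.05, 0.10, n)])

a1 = (X1 - v1) / np.linalg.norm(X1 - v1, axis=1, keepdims=True)
a2 = (X2 - v2) / np.linalg.norm(X2 - v2, axis=1, keepdims=True)
b1, b2 = np.cross(v1, a1), np.cross(v2, a2)

t_est = solve_translation(R, a1, b1, a2, b2)
```

With noise-free rays the estimate reproduces the true baseline including its length. If all ray origins collapsed to a single point per camera, the right-hand side would vanish and only the direction of t would remain determinable, as in the in-air case.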
Testing 3-D point measurement accuracy - We evaluated these parameters by applying them to the measurement of 3-D object-space points using the varifocal model proposed by Elnashef and Filin (2019). Measurements were performed on a test plate, tilted with respect to the camera, using two images. The evaluated measures included coplanarity, collinearity, parallelism and orthogonality of the measured grid lines, and the grid dimensions. All 63 corners of the 42 × 56 cm grid with 7 cm spacing (Fig. 4) were measured. The first test evaluated the coplanarity of the mapped points; a plane fitted to them yielded a deviation of ±0.15 mm. The collinearity test for points lying on straight lines, which reflects the removal of the underwater effect that bends them, yielded a mean deviation of ±0.21 mm and a maximal deviation of 0.36 mm from collinearity. Computation of the angles between parallel and orthogonal lines shows angular errors of 47″ and 27″ in parallelism and orthogonality, respectively. These results are equivalent to deviations of 0.54 mm and 0.42 mm with respect to their actual lengths on the target (56 cm and 42 cm, respectively). Such deviations are comparable in accuracy to those reached in our other experiments.
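A coplanarity check of this kind could be computed as in the sketch below, which fits a plane to the reconstructed corners by total least squares and reports the RMS of the point-to-plane distances. The fitting method is an assumption, since the paper does not state how the plane was fitted, and the grid here is an ideal, noise-free stand-in for the measured corners.

```python
import numpy as np


def fit_plane_rms(points):
    """Total-least-squares plane fit; returns the unit normal and RMS distance."""
    P = np.asarray(points, float)
    centered = P - P.mean(axis=0)
    # The right singular vector of the smallest singular value is the plane normal.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    normal = Vt[-1]
    distances = centered @ normal
    return normal, float(np.sqrt(np.mean(distances ** 2)))


# Ideal 9 x 7 grid (63 corners, 70 mm spacing) on a tilted plane z = 0.3 x, in mm.
gx, gy = np.meshgrid(np.arange(9) * 70.0, np.arange(7) * 70.0)
grid = np.column_stack([gx.ravel(), gy.ravel(), 0.3 * gx.ravel()])

normal, rms = fit_plane_rms(grid)

# Lifting one corner by 1 mm produces a non-zero deviation.
bumped = grid.copy()
bumped[0, 2] += 1.0
_, rms_bumped = fit_plane_rms(bumped)
```

The same singular value decomposition applied to the points of a single grid line gives a line fit, from which collinearity deviations can be evaluated in an analogous way.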

CONCLUSIONS
This paper proposed a geometrically driven, scaled relative-orientation model for the underwater environment. Its main advantage lies in the preservation of the coplanarity constraint with no need for explicit knowledge of object point coordinates, while also separating the pose parameters from the refraction-related ones. Therefore, our model enables the estimation of the underwater relative orientation parameters with the addition of the scale. The experiments demonstrated that estimation of the orientation parameters yielded high levels of accuracy, as well as robustness to first approximations and to high levels of noise. The linear estimation form requires 17 point correspondences, which may limit its applicability in images with a low number of correspondences and would increase the computational demand in the presence of a high fraction of outliers. This was alleviated, however, by the direct estimation methodology proposed in the paper, which required only 5 points and allowed us to solve for the pose parameters in an optimal manner by minimization of the reprojection error. We showed that by applying our proposed relative orientation model, in which scale is also estimable, it was possible to obtain high estimation and reconstruction accuracy in close-range underwater imaging with no domain knowledge given.

ACKNOWLEDGMENT
This work was supported in part by the Israel Ministry of Science, Technology and Space grant # 3-12487.