ODYN: AN ONLINE DYNAMIC NETWORK SOLVER FOR PHOTOGRAMMETRY AND LIDAR GEO-REFERENCING

ABSTRACT: In recent years, many methods have been proposed to improve the quality of the estimated trajectories of airborne or terrestrial mapping platforms by leveraging multi-sensor fusion. One motivating application is, for example, the use of laser scanners on small unmanned aerial vehicles, where the typically employed low-cost MEMS inertial sensors do not allow for satisfactory direct geo-referencing of the laser points. In this work we introduce ODyN, an online Dynamic Network solver that fuses information from cameras, GNSS and inertial sensors in a single adjustment. It can be employed to estimate a high-frequency trajectory for precise direct geo-referencing, to improve photogrammetric reconstructions in challenging scenarios, or to determine several types of system calibration parameters. The presented solver is hosted by the University of Geneva and is free for anybody to use. In this work we present a use case in airborne mapping where the obtained trajectory estimates are improved with respect to the recursive smoothing approaches conventionally used in direct geo-referencing.


INTRODUCTION
The precise determination of the trajectory of an instrument is an essential step in many remote sensing applications. For instance, it is required to perform direct geo-referencing of raw Light Detection and Ranging (LiDAR) measurements, where the determined position and orientation of the laser scanner are used to transform points from the laser frame to the desired mapping frame, e.g., in national coordinates (Zhang, Shen, 2013). In photogrammetry, the trajectory solution derived from Global Navigation Satellite Systems (GNSS) receivers and Inertial Measurement Units (IMUs) can be used in a bundle adjustment to increase the global accuracy of the reconstructed three-dimensional models. This is often necessary in challenging scenarios such as corridor mapping (Rehak, Skaloud, 2016). Other applications can be found in magnetic anomaly mapping (Gailler et al., 2021) and gravimetry (Skaloud et al., 2015). Many such applications require that the trajectory is estimated at high frequency, for example higher than 100 Hz, in order to resolve the fast attitude dynamics or vibrations of the carrier platform. This is particularly relevant in small Unmanned Aerial Vehicles (UAVs) (Nex et al., 2022).
Currently, the only available method that can estimate a high-frequency trajectory in absolute coordinates is the fusion of inertial and GNSS observations. This is typically done by means of recursive (Kalman) filters or smoothers (Titterton et al., 2004), nowadays an established building block of most navigation systems. However, while lightweight multi-frequency GNSS receivers are available at an affordable price, low-grade IMUs must be employed in many applications, e.g., in UAVs, because of space, weight and cost limitations. Such IMUs are characterized by time-correlated errors that have a complex structure and cannot be completely recovered with only the auxiliary information provided by GNSS position/velocity fixes. This often leads to trajectory estimates of unsatisfactory quality.
In recent years, many researchers have attempted to improve the quality of trajectory estimation by leveraging multi-sensor fusion: the idea is that, since further sensors beyond the IMU and the GNSS receiver are available, which are heterogeneous and partially redundant, a more accurate trajectory can be determined if all the available information is somehow considered together. A comprehensive review of the literature on the topic is out of the scope of the current work. We cite approaches originating in Simultaneous Localization and Mapping (SLAM) within the robotics community, e.g., (Cioffi, Scaramuzza, 2020), and cascaded approaches in which the information fusion task is performed in multiple stages, for example by first running a Kalman filter/smoother to fuse inertial and GNSS readings, and then using a bundle adjustment to correct the trajectory based on image measurements (Hussnain et al., 2021). Other approaches also include point-cloud registration steps to fuse LiDAR measurements (Li et al., 2019).
An alternative approach is referred to as Dynamic Networks (DNs) and was first proposed in (Colomina, Blazquez, 2004). This approach is extremely general and allows the fusion of many types of sensor measurements in a single adjustment step, provided that a suitable model can be formulated. The advantages of DNs have been showcased, for example, in photogrammetry (Cucci et al., 2017a) and, more recently, in airborne laser scanning (Brun et al., 2022).
In this work we introduce ODyN, an Online Dynamic Network solver based on the ROAMFREE open-source sensor fusion library (Cucci, Matteucci, 2014). This solver is able to fuse inertial, GNSS and image observations, implementing the very same algorithm employed in (Cucci et al., 2017a).

Figure 1. The reference frames considered in ODyN. The supported measurement types, GNSS position (x), IMU specific force (f), angular velocity (ω) and image measurements (tp), are highlighted in red.

ODyN is
freely available at https://github.com/SMAC-Group/ODyN and runs online in any web browser, thus allowing users to obtain state-of-the-art trajectory solutions for their navigation and mapping problems without any particular setup complexity. A user-friendly graphical interface allows the import of raw measurement data and the configuration of many advanced system calibration parameters, such as complex stochastic models for the inertial sensor readings, as well as conventional system and camera intrinsic calibration parameters typical of professional photogrammetry suites. While not all the features of the underlying sensor fusion library are currently exposed in the graphical user interface, we hope that feedback from the community will help us to direct our efforts towards the most interesting ones. We present ODyN by walking the user through an example where we fuse raw measurements from a low-cost MEMS IMU and a GNSS receiver with image measurements for a challenging corridor mapping project. We employ the raw sensor and reference data previously presented in (Vallet et al., 2020).
This work is organized as follows: a brief introduction to Dynamic Networks is given in Section 2. The concepts and configuration steps necessary to employ ODyN are presented in Section 3, the results are discussed in detail in Sections 4 and 5, and conclusions are drawn in Section 6.

DYNAMIC NETWORKS
Dynamic Networks are a general sensor fusion framework built on an extension of conventional geodetic networks. They were first introduced in (Colomina, Blazquez, 2004) and share many similarities with modern visual-inertial SLAM solutions, e.g., (Cioffi, Scaramuzza, 2020). In the following, we will summarize the main concepts behind DNs from a user perspective, while we refer the reader to the original publications, or to (Cucci et al., 2017b), for a more in-depth discussion.
In DNs, the objective is to determine a high-frequency trajectory solution for the body frame b by fusing measurements of several different sensors moving rigidly with b (see Fig. 1). This sensor fusion problem is formulated as a non-linear, weighted least-squares adjustment in which the unknowns are samples of the body frame trajectory, 3D coordinates of points in object space (such as tie-points) and, optionally, system calibration parameters such as boresights, lens distortion coefficients, IMU biases, etc. Each raw sensor observation, such as the image coordinates of a tie-point, or the specific force measurement at a certain timestamp, forms a constraint between one or more unknowns. All such constraints are formulated as

z_i = f(Θ_i) + e_i,    e_i ~ ξ_i,

where z_i is the i-th sensor measurement, f is a possibly non-linear function defining the measurement model for z_i, Θ_i is the set of unknowns needed to evaluate f, e_i is the residual associated with the measurement z_i and ξ_i is a zero-mean Gaussian noise specifying the measurement uncertainty. A trivial example is given by the constraint for the GNSS position observation at time t, z_{x,t}:

z_{x,t} = b^n_t + R^n_{b,t} Atn^b + e_t,

where b^n_t is the body frame position with respect to the navigation frame n at time t, R^n_{b,t} is its orientation and Atn^b is the GNSS lever-arm. In other words, b^n_t + R^n_{b,t} Atn^b, the predicted position of the GNSS antenna based on the body frame position and orientation (which are unknown), should match the sensor measurement up to the measurement uncertainty. Note that Atn^b can be considered either a constant or an unknown in the case where we would like to estimate the GNSS lever-arm. For the more subtle constraints for inertial sensor readings please see (Cucci, Skaloud, 2019), and (Cucci et al., 2017b) for image observations.
Minimizing the sum of the squared residuals e_i over all available measurements, weighted by the inverse covariance matrix of ξ_i, allows us to obtain the maximum-likelihood estimate for all unknowns. At the present stage, ODyN supports raw specific force and angular velocity observations as measured by an IMU (f and ω in Fig. 1), GNSS position observations (x) and tie-points (tp) found in camera images. Many other types of sensor measurements can be considered in DNs, such as 3D tie-points extracted from LiDAR point clouds, as in (Brun et al., 2022), magnetometers, barometers, etc., and they may be supported by ODyN in the future.
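As a minimal numerical illustration (not the ODyN implementation), the sketch below evaluates the residual of the GNSS position constraint described above and its contribution to the weighted least-squares cost; all variable names and values are hypothetical.

```python
import numpy as np

def gnss_residual(z_x, b_n, R_nb, leverarm_b):
    """Residual of one GNSS position constraint: predicted antenna
    position (body position plus rotated lever-arm) minus measurement."""
    return b_n + R_nb @ leverarm_b - z_x

def weighted_cost(residuals, sigmas):
    """Sum of squared residuals weighted by the inverse covariance
    (diagonal covariance assumed for simplicity)."""
    return sum(float(e @ np.diag(1.0 / s**2) @ e)
               for e, s in zip(residuals, sigmas))

# body frame at the origin with identity orientation, 1 m lever-arm along x
b_n = np.zeros(3)
R_nb = np.eye(3)
leverarm = np.array([1.0, 0.0, 0.0])
z = np.array([1.0, 0.0, 0.02])            # measured antenna position
e = gnss_residual(z, b_n, R_nb, leverarm)  # -> [0, 0, -0.02]
J = weighted_cost([e], [np.array([0.01, 0.01, 0.03])])
```

In the full adjustment, one such term is formed for every sensor observation and the sum is minimized jointly over all trajectory samples and calibration parameters.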
In general, all sensors are displaced and misaligned with respect to the body frame b for which the navigation solution is determined. For one sensor, its displacement and misalignment with respect to b are commonly referred to as lever-arm and boresight, respectively. Three sensor frames are considered in ODyN: IMU, Cam, placed at the optical center of the camera, and Atn, at the L1 or L2 phase center of the GNSS antenna (see Fig. 1). Thus, three reference frame transformations need to be specified: Γ^b_IMU, Γ^b_Cam and Γ^b_Atn. Taking the first as an example, R^b_IMU is the rotation transforming vectors from the IMU to the b frame, the IMU boresight, and IMU^b is the origin of the IMU frame expressed in b, the IMU lever-arm. In the case of Γ^b_Atn, only the lever-arm is relevant. Please note that the definition of the body frame, sometimes referred to as the "user" frame, is arbitrary and is fixed implicitly by the other sensor transformations. Indeed, a common choice is to set b := IMU or b := Cam, depending on the application. In the latter case, we would have Cam^b = 0 and R^b_Cam = I. However, there are cases in which the user would like to determine the navigation solution for another sensor frame, e.g., a laser scanner, for which reason none of the previous choices has been imposed in ODyN. In order to solve a navigation or a photogrammetry problem, at least a coarse initial value for all the reference frame transformations has to be known a-priori. While lever-arms can typically be measured to sub-centimeter accuracy with simple means, this is much more complicated for boresights, and the effect of boresight errors is often underestimated (Nex et al., 2022). In DNs, the initial values can be refined by considering them as unknowns in the least-squares adjustment, possibly taking into account the uncertainty of the initial estimate, as provided by the user.
However, it is well known that not all such parameters are observable in all circumstances, especially if other calibration parameters, such as the camera intrinsic calibration, are estimated simultaneously.

ODYN, AN ONLINE DYNAMIC NETWORK SOLVER
In this section we walk the user through the usage of ODyN by discussing an example. We will employ ODyN to perform tightly coupled fusion of raw IMU, GNSS and image measurements to determine the high-frequency trajectory of a helicopter while also estimating the unknown camera boresight R^b_Cam. We consider the airborne dataset presented in (Vallet et al., 2020). There, reference and lower-cost sensors (LiDARs, IMUs and cameras) were rigidly mounted together on a vibration-dampened assembly and installed in a helicopter. During the flight, the helicopter flew profiles close to the ground to mimic a typical UAV flying altitude and at a speed similar to that of a small multi-copter (around 12 m/s) over an area featuring various terrain types including urban and rural areas, forest, croplands, roads, railroads and power lines. We focus on two successive flight lines depicted in Fig. 2 that are approximately 2 km long, for a total flight time of around 6 minutes. The joint processing of the observations provided by the following sensors is performed with ODyN:

1. the Navchip v1 (Thales) MEMS IMU, whose performance is similar to popular commercial UAV-grade INSs such as the APX15 (Applanix),
2. the Javad Delta TRE-3 GNSS receiver (after post-processing in PPK mode),
3. the IXAR180 (PhaseOne) 80-megapixel camera with a 42 mm lens, such that the Ground Sampling Distance (GSD) is approximately 3 cm.
The required inputs and configuration settings are presented in the following, while the results are discussed in Section 4.

Input measurements
In order to perform the DN adjustment, the user has to supply the sensor measurements. In the case of simple inertial/GNSS fusion, the minimum required inputs are:

1. Raw, timestamped IMU measurements (accelerometer specific force and gyroscope angular velocity readings). The IMU measurements must have a uniform sampling rate and no missing samples are allowed.
2. Timestamped GNSS position measurements in WGS-84 ellipsoidal coordinates, either as provided by the GNSS receiver or after post-processing (e.g., PPP or PPK). Gaps or GNSS outage periods are allowed, the maximum length of which depends on the quality of the IMU.

If the user would like to consider image measurements as well, as in (Cucci et al., 2017a) and in this example, further inputs are required:

1. Image measurements (tie-points), extracted from the raw images with an external tool such as Agisoft Metashape, Pix4D or MicMac.
2. Image timestamps, e.g., mid exposure pulses time tagged by the GNSS receiver.
3. (Optional) coordinates of known ground points, to be used either as Ground Control Points (GCPs) or checkpoints.
The exact format of the input files is not specified here as it is expected to evolve over time as more features are supported by ODyN. We refer the reader to the documentation provided online.
We would like to stress that all sensor measurements need to be timestamped with respect to the same reference time: delays and clock jitter have the same disturbing effect as musicians in an orchestra playing out of tempo. A common solution is to leverage the timing solution provided by the GNSS receiver, either by using its time-tagging feature or by distributing Pulse Per Second (PPS) signals around the payload for synchronization. Accurate payload-wide measurement timestamping can nowadays be easily achieved, even in small UAVs, thanks to the work presented in (Albrektsen, Johansen, 2018).
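The uniform-sampling requirement on the IMU stream can be verified before upload with a few lines of code. This is an illustrative sketch, not part of ODyN, and the 10%-of-period tolerance is an arbitrary choice.

```python
import numpy as np

def check_imu_timestamps(t, tol=0.1):
    """Verify a uniform sampling rate with no missing samples.
    t: array of IMU timestamps [s]; tol: allowed deviation of any
    interval, as a fraction of the nominal period."""
    dt = np.diff(t)
    nominal = np.median(dt)                      # robust period estimate
    gaps = np.where(np.abs(dt - nominal) > tol * nominal)[0]
    return nominal, gaps                         # period, bad-interval indices

t = np.arange(0.0, 1.0, 0.005)                   # 200 Hz, uniform
nominal, gaps = check_imu_timestamps(t)
# a dropped sample shows up as a doubled interval
t_bad = np.delete(t, 50)
_, gaps_bad = check_imu_timestamps(t_bad)
```

A non-empty `gaps` array indicates missing samples or timestamp jitter that should be fixed before the adjustment is attempted.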

GNSS position
In the GNSS configuration panel, depicted in Fig. 2, the user is required to specify the GNSS lever-arm, i.e., the position of the GNSS antenna expressed in the body frame b, Atn^b. ODyN is capable of refining the initial guess for the GNSS lever-arm, taking into account the assumed uncertainty of the value provided by the user. The assumed standard deviation of the error in GNSS position measurements is also required. A map is produced to allow the user to verify that the input data has been loaded correctly. It displays the GNSS trajectory, the GCPs/checkpoints (as red and green dots, respectively) and the coarse image positions (orange dots).

Inertial Measurement Unit
In the IMU configuration panel, depicted in Fig. 3, the user is required to specify the transformation between the body frame b and the IMU frame, Γ^b_IMU, separated into the lever-arm and boresight parameters. Again, the solver can refine the provided initial values. In ODyN all rotations are parameterized with a Hamiltonian unit quaternion, as opposed to, e.g., Euler angles, to avoid possible sources of confusion regarding the rotation order.
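For readers who keep their boresight angles as Euler angles, the conversion to a Hamiltonian unit quaternion (w, x, y, z ordering assumed here) for a z-y-x rotation sequence can be sketched as follows; this is a generic conversion, not code taken from ODyN.

```python
import numpy as np

def quat_from_euler_zyx(roll, pitch, yaw):
    """Hamiltonian unit quaternion (w, x, y, z) from a z-y-x
    (yaw-pitch-roll) Euler sequence; angles in radians."""
    cr, sr = np.cos(roll / 2), np.sin(roll / 2)
    cp, sp = np.cos(pitch / 2), np.sin(pitch / 2)
    cy, sy = np.cos(yaw / 2), np.sin(yaw / 2)
    return np.array([
        cr * cp * cy + sr * sp * sy,   # w
        sr * cp * cy - cr * sp * sy,   # x
        cr * sp * cy + sr * cp * sy,   # y
        cr * cp * sy - sr * sp * cy,   # z
    ])

# identity boresight, as used for the camera later in this example
q_id = quat_from_euler_zyx(0.0, 0.0, 0.0)
# a 1-degree roll-only misalignment
q_roll = quat_from_euler_zyx(np.deg2rad(1.0), 0.0, 0.0)
```

Note that other conventions (x, y, z, w ordering, or different rotation sequences) exist; the convention expected by ODyN should be taken from its documentation.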
A more subtle section of the IMU configuration allows the user to specify the noise model for the inertial sensor readings. ODyN assumes a stochastic model for the inertial measurement errors which is composed of the sum of:

1. A white noise, parameterized by its Power Spectral Density (PSD), in continuous time units.
2. (Optional) A first-order Gauss-Markov process, whose parameters are the correlation time, in seconds, and the PSD of the driving noise, again in continuous time units.
3. An offset, which is constant for every measurement, typically referred to as a Random Constant (RC) and useful to model, for example, the turn-on bias of the device.
Accurate knowledge of the noise properties of the device at hand is known to be fundamental in inertial navigation and is the key to obtaining optimal results with Dynamic Networks. There is a vast literature on inertial sensor calibration and many methods exist to determine appropriate values, the state of the art being the Generalized Method of Wavelet Moments (Guerrier et al., 2013). All such methods work by analyzing long sequences of static inertial data, e.g., collected by placing the device on a stable and vibration-dampened surface and leaving it to record for long time periods (hours). If such sequences are not available, the user can refer to the device datasheet, where at least the white noise PSD is reported. The IMU configuration panel plots the theoretical Allan variance (El-Sheimy et al., 2007) implied by the specified noise parameters, which can be compared with the one provided by the manufacturer.
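For reference, the theoretical Allan variance implied by the first two noise components can be computed in closed form (see, e.g., El-Sheimy et al., 2007). The sketch below uses illustrative parameter values, not those of the Navchip.

```python
import numpy as np

def allan_variance_wn(tau, psd):
    """Theoretical Allan variance of white noise with the given
    continuous-time PSD."""
    return psd / tau

def allan_variance_gm(tau, q, T):
    """Theoretical Allan variance of a first-order Gauss-Markov
    process with driving-noise PSD q and correlation time T [s]
    (standard closed-form expression)."""
    return (q * T**2 / tau) * (
        1.0 - (T / (2.0 * tau)) *
        (3.0 - 4.0 * np.exp(-tau / T) + np.exp(-2.0 * tau / T))
    )

# illustrative parameters: WN PSD 1e-6, GM with q = 1e-8 and T = 100 s
tau = np.logspace(-2, 3, 100)
av = allan_variance_wn(tau, psd=1e-6) + allan_variance_gm(tau, q=1e-8, T=100.0)
```

Plotting `sqrt(av)` against `tau` on log-log axes gives the familiar Allan deviation curve, with the white-noise slope of -1/2 at short cluster times and the Gauss-Markov "bump" around the correlation time.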

Camera
The camera configuration panel, depicted in Fig. 4, allows the user to configure the typical photogrammetric parameters, including camera intrinsic and extrinsic calibration, image measurement precision and ground control points. As in the case of the IMU, the user is required to specify the transformation between the body frame b and the Cam frame, Γ^b_Cam, separated into the lever-arm and boresight parameters.
In this example, we do not know the boresight of the camera, except that its axes are coarsely aligned with the body frame ones. Therefore, we set R^b_Cam = I by specifying an identity quaternion for the camera boresight. Moreover, we flag the option that enables camera boresight estimation and we specify that the initial value is assumed to be correct up to σ = 1 degree, see again Fig. 4.
The camera model employed by ODyN is the traditional pinhole camera model with Brown-Conrady radial and tangential distortion, non-uniform scaling and skewing (Cledat et al., 2020, Section 2.1). The model applies to frame cameras and corresponds to the one considered by many computer vision software packages, such as OpenCV, and photogrammetric software, such as Agisoft Metashape. In particular, the naming conventions and the units have been chosen to match exactly the latter for direct compatibility. An interactive plot of the distortions implied by the supplied parameters is produced by the interface in the right part of the panel.
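For illustration, applying Brown-Conrady radial and tangential distortion to normalized image coordinates can be sketched as below. The coefficient ordering here follows the OpenCV convention; the exact naming and units used by ODyN should be taken from its documentation.

```python
import numpy as np

def distort(x, y, k1, k2, k3, p1, p2):
    """Apply Brown-Conrady radial (k1..k3) and tangential (p1, p2)
    distortion to normalized image coordinates, OpenCV-style."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    yd = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return xd, yd

# with all coefficients zero the mapping is the identity
xd0, yd0 = distort(0.1, -0.2, 0, 0, 0, 0, 0)
# radial-only term: a point on the x axis is pushed outward by k1*r^2
xd1, _ = distort(0.1, 0.0, 1e-2, 0, 0, 0, 0)
```

Scaling and skew terms would then map the distorted normalized coordinates to pixel coordinates via the focal length and principal point.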
In the lower section of the camera configuration panel, a table shows the supplied ground control points. We select four to be used as GCPs, leaving the others as checkpoints. This choice is reflected by the point colors in Fig. 2.

RESULTS
Once all the inputs and configuration have been supplied, the processing can be triggered. The solution time depends on the size of the project and the time complexity is polynomial in the number of IMU measurements and tie-points. However, thanks to the use of a highly efficient solver for least-squares problems (Kümmerle et al., 2011), computing the solution for a typical UAV flight (20-30 minutes) takes only a few minutes; it took only 44 s to solve the presented example. The processing is done on the server side and its duration is not affected by the performance of the user's computer.
After the processing has been completed, several outputs are displayed by ODyN to allow the user to assess the quality of the determined trajectory. Those are presented in the following.

Estimated parameters
The values of all the calibration parameters that the user requested to estimate are presented in a table, together with their uncertainty. The correlation matrix between the estimated parameters is also reported. In this example, three calibration parameters are estimated: the accelerometer and gyroscope random constants, or turn-on biases, and the camera boresight, see Fig. 5. Under the hood, the latent Gauss-Markov noise processes for the gyroscope and the accelerometer were also estimated, as they were enabled in the IMU configuration panel, see Fig. 3; they are not displayed since they have limited meaning for most users. From the output, it is observed that the camera boresight remains correlated with the accelerometer turn-on bias to a large extent, meaning that these two parameters cannot be entirely resolved. Intuitively, a small increment in the boresight roll (or pitch) can be compensated by an appropriate increment in the y (or x) component of the turn-on bias. This behaviour is expected since the flight time is short (6 minutes) and the image geometry is very weak. A dedicated calibration flight, with multiple cross flight lines at different elevations and ground control points, should have been performed in order to better decorrelate the camera boresight from the other unknowns (including camera positions and orientations). The reported uncertainty for the estimated boresight, which corresponds to the roll (x), pitch (y) and yaw (z) components of the boresight, is in this case ≈ 0.25° for roll and pitch and ≈ 0.75° for yaw. Since small rotations are commutative, the specific Euler angle convention is not important.
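The reported correlation matrix is obtained from the a-posteriori covariance of the estimates by normalizing with the parameter standard deviations; a minimal sketch, with made-up numbers:

```python
import numpy as np

def correlation_matrix(cov):
    """Correlation matrix from an a-posteriori covariance matrix:
    divide each entry by the product of the two standard deviations."""
    d = np.sqrt(np.diag(cov))
    return cov / np.outer(d, d)

# made-up 2x2 covariance between two estimated parameters
cov = np.array([[4.0, 3.0],
                [3.0, 9.0]])
corr = correlation_matrix(cov)   # off-diagonal correlation: 3 / (2 * 3) = 0.5
```

Off-diagonal values close to ±1, as between the boresight and the accelerometer turn-on bias here, indicate parameter pairs that the adjustment cannot separate with the given geometry.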

Tie-points coordinates
The positions of the tie-points are displayed on a map along with the estimated trajectory, see Fig. 6. The tie-points are color-coded based on the estimated height. The East portion of the surveyed area is covered by vegetation, and few tie-points could be detected or matched there. In this example, the image coordinates of the tie-points (and only those) were obtained with Agisoft Metashape.

GNSS residuals
The residuals associated with the GNSS position observations are displayed as a function of time, see Fig. 7. In inertial navigation it is customary to inspect this plot, since it enables the discovery of possible inaccuracies in system calibration parameters. For example, an error in the X-Y components of the GNSS lever-arm would result in patterns in the X-Y residuals, which should otherwise be zero-mean, Gaussian distributed and uncorrelated in time. In this example, small oscillations in the Z component are visible; however, their magnitude is consistent with the assumed uncertainty of the GNSS observations, σ_Z = 3 cm.
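A simple whiteness check of this kind can be sketched as follows: the sample autocorrelation of a residual component should be negligible at all non-zero lags. This is an illustrative snippet, not part of ODyN.

```python
import numpy as np

def autocorrelation(e, max_lag):
    """Normalized sample autocorrelation of a residual series.
    For zero-mean, white residuals, all lags > 0 should be near zero."""
    e = e - e.mean()
    c0 = np.dot(e, e)
    return np.array([np.dot(e[:-k], e[k:]) / c0 if k else 1.0
                     for k in range(max_lag + 1)])

# white residuals as a sanity check (simulated)
rng = np.random.default_rng(0)
white = rng.standard_normal(2000)
rho = autocorrelation(white, 5)   # lags 1..5 should be close to zero
```

Pronounced autocorrelation at low lags, or visible oscillations as in the Z component here, would instead suggest a remaining systematic effect such as a lever-arm or synchronization error.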

Checkpoint residuals
If any checkpoints are provided, the differences between the estimated and the given coordinates are displayed as histograms, see Fig. 8. The interface allows the display of the checkpoint residuals in mm or in GSD units. For each point we consider the mean GSD of all images in which the point is visible, calculated in the neighbourhood of that specific point to account for uneven terrain or flight height above ground.
We can observe that unbiased ground coordinates were determined, with X, Y, Z mean errors far smaller than the GSD, despite the weak geometry of the flight, the unknown camera boresight and the poor tie-point distribution. In this example, checkpoints were numbered increasingly from West to East. This foresight allows us to look for dome effects, at least along the direction of the flight lines, for which we found no particular evidence in Fig. 8.
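The conversion of a checkpoint residual into GSD units can be sketched as follows; the pixel pitch value is an assumption for illustration, not a quoted specification of the camera used here.

```python
import numpy as np

def gsd(pixel_pitch, focal, height_agl):
    """Ground sampling distance [m] of one image above a given point,
    for a nadir-looking pinhole camera."""
    return pixel_pitch * height_agl / focal

def residual_in_gsd(res_m, heights_agl, pixel_pitch, focal):
    """Express a checkpoint residual [m] in units of the mean GSD over
    all images in which the point is visible."""
    mean_gsd = np.mean([gsd(pixel_pitch, focal, h) for h in heights_agl])
    return res_m / mean_gsd

# 42 mm lens, assumed 5.2 um pixel pitch, point seen from ~240 m AGL:
# the GSD is then roughly 3 cm, as in this example
r = residual_in_gsd(0.015, [238.0, 242.0, 240.0], 5.2e-6, 0.042)
```

Using per-point heights above ground, rather than a single flight altitude, is what accounts for the uneven terrain mentioned above.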

EVALUATION
For the presented dataset, a navigation-grade AIRINS (iXblue) IMU provides the reference trajectory with an orientation accuracy better than 0.003°. This allows us to quantify the quality of the trajectory estimated with ODyN. We consider the case presented in the previous section (using four GCPs and an unknown camera boresight, referred to as ODyN-1 in the following) and we also repeat the processing employing a value for R^b_Cam determined in a previous calibration flight. This case is referred to as ODyN-2. As the standard solution for direct geo-referencing, we also consider the trajectory obtained by integrating the Navchip v1 IMU with GNSS observations in a Kalman smoother (in our case Posproc, from Applanix), referred to as KS in the following (Tab. 1). The position error is omitted because it is similar for all cases and matches the expected GNSS positioning accuracy. From Tab. 1 it is observed that, by fusing image observations with raw inertial readings in ODyN, it is possible to obtain orientation estimates that are more accurate (mean error) and between two and three times more precise (STD) than with a Kalman smoother (which considers only inertial and GNSS readings). This holds even if the camera boresight needs to be estimated. The residual mean error in ODyN-1 corresponds to the error in boresight estimation, which is difficult in this case because of the weak flight geometry. If the value determined from a previous calibration flight is employed, the mean error drops to levels that are comparable to the reference trajectory accuracy.

CONCLUSIONS
In this work we have introduced ODyN and showcased the advantages of the Dynamic Network approach to sensor fusion over conventional direct geo-referencing methodologies. We encourage feedback from the community to motivate us to maintain and further develop the solver, for example by exposing further features that are already supported by the underlying sensor fusion library. Some of these features include three-dimensional tie-points extracted from LiDAR point clouds, calibration of sensor synchronization delays, and other navigation sensors such as magnetometers and barometers. We also plan to further enrich the suite of analysis tools, including for example the calculation of redundancy numbers, internal and external reliability, etc., which are extremely significant for rigorous uncertainty quantification of all the estimated quantities.