DIGITAL TWINNING IN THE OCEAN – CHALLENGES IN MULTIMODAL SENSING AND MULTISCALE FUSION BASED ON FAITHFUL VISUAL MODELS

ABSTRACT: In engineering, machines are typically built after a careful conception and design process: all components of a system, their roles and the interactions between them are well understood, and often digital models of the system exist even before the actual hardware is built. This enables simulations and even feedback loops between the real-world system and a digital model, leading to a digital twin that allows better testing, prediction and understanding of complex effects. By contrast, in Earth sciences, and particularly in ocean sciences, models exist only for certain aspects of the real world: for certain processes and for some interactions and dependencies between different “components” of the ocean. These individual models cover large temporal (seconds to millions of years) and spatial (millimetres to thousands of kilometres) scales, are underpinned by a variety of field data, and represent their results in many different ways. A key to enabling digital twins in the oceans is fusion at different levels, in particular fusion of data sources and modalities, fusion over different scales and fusion of differing representations. We outline these challenges and exemplify different envisioned digital twins in the oceans involving remote sensing, underwater photogrammetry and computer vision, focusing on the optical aspects of the digital twinning process. In particular, we look at holistic sensing scenarios of optical properties in coastal waters as well as seafloor dynamics at volcanic slopes, and discuss roadblocks for digital twins as well as potential solutions to increase and widen the


INTRODUCTION
Digital twins are well known in engineering (Glaessgen and Stargel, 2012) and help to improve and understand complex systems and phenomena. In the natural sciences, too, real-world phenomena are often described by experts using physical models¹ that encode at least particular aspects of the world, for instance how different organisms contribute to food webs, how ocean currents behave, or how CO2 is exchanged between ocean and atmosphere (Tebaldi et al., 2021, Møttus et al., 2021). Modelling only particular aspects is, of course, a coarse approximation of the world, but models that focus on individual components of larger systems have proved valuable for understanding and predicting certain effects, such as those of fertilizers in sewage water or the flooding of tropical islands due to sea level rise (Delgado et al., 2019, Storlazzi et al., 2018).
With increasing amounts and diversity of sensory data, it becomes more and more difficult to comprehend the patterns in the data and to model the relations manually. At the same time, missing data for some aspects, different sensor modalities and observation scales, as well as incompatibilities between existing models, impede larger twins of the oceans. For instance, satellite-based observations of the water colour can serve as a proxy for primary production (Doernhoefer et al., 2018) or water quality, but they are limited to the upper few metres of the ocean, since satellite sensors cannot see deep into the water. It is difficult to extend these models to deeper water, since underwater imaging with cameras and artificial light sources from submerged platforms relies on entirely different observational tools (Museler, 2003). The model representations of the water column and the resulting resolutions also differ. This makes it difficult to combine observations from the two worlds, and thus to combine different digital models, even for identical physical properties.

¹ In this document we use the expression physical model when a real-world process is described by equations and parameters that are both specific, and often minimal, to the particular type of process. The expert knowledge of that process (e.g. the acceleration of a body) is already encoded in the type of equations (e.g. Newton's laws of motion). In contrast, a statistical model could fit a generic polynomial (or neural network) to a large set of measurements to encode the relation between the acceleration of a body, its mass and the required force. Note that in this document a physical model is not restricted to the discipline of physics, but can also involve chemistry, economics, biology or other disciplines.

Image credits: "Satellite" by Daan van Leeuwen is licensed under Creative Commons Attribution-NonCommercial. "Isle of Mull Beach" by Jamesharmer is licensed under Creative Commons Attribution.
Another example is the shape of the ocean floor. Digital elevation models of the seafloor are typically obtained either by extremely coarse satellite altimetry, by acoustic techniques with moderate resolution, or, only recently, by high-resolution optical observations. Fusing coarse-scale acoustic models with local optical observations could be a fundamental step towards a digital model of seafloor dynamics. The different measurement techniques and resolutions, however, pose major challenges that currently impede fine-scale tracking of seafloor deformation.
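Once an acoustic and an optical elevation model have been brought onto a common grid and vertical datum (itself the hard part, as the Scales section explains), the simplest way to combine them is to weight each estimate by its inverse variance. The sketch below assumes independent Gaussian errors; the function name and the uncertainty values are hypothetical, and it illustrates the principle rather than the fusion method of any particular project.

```python
import numpy as np

def fuse_elevations(z_acoustic, sigma_acoustic, z_optical, sigma_optical):
    """Inverse-variance weighted fusion of two co-registered seafloor elevation grids.

    Cells without an optical observation are NaN in z_optical and keep the acoustic value.
    Returns the fused elevation and its 1-sigma uncertainty (independent Gaussian errors assumed).
    """
    z_acoustic = np.asarray(z_acoustic, dtype=float)
    z_optical = np.asarray(z_optical, dtype=float)
    w_acoustic = 1.0 / np.square(sigma_acoustic)
    has_optical = np.isfinite(z_optical)
    # Zero weight (and a dummy value) where no optical data exists.
    w_optical = np.where(has_optical, 1.0 / np.square(sigma_optical), 0.0)
    z_opt_filled = np.where(has_optical, z_optical, 0.0)
    w_total = w_acoustic + w_optical
    z_fused = (w_acoustic * z_acoustic + w_optical * z_opt_filled) / w_total
    return z_fused, np.sqrt(1.0 / w_total)

# Hypothetical values: 1 m acoustic uncertainty, 5 cm optical uncertainty.
z, s = fuse_elevations([-100.0, -100.0], 1.0, [-100.5, np.nan], 0.05)
```

In the first cell the fused value is dominated by the optical estimate (close to -100.5 m, with just under 5 cm uncertainty); in the second, without optical coverage, the acoustic value is kept. Real data would additionally need outlier handling, since artefacts in both modalities are far from Gaussian.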
For both examples, fusing the visual observations would enable local digital models of a marine environment. This is just one exemplary aspect, and many other aspects (chemistry, economy, biology, ...) have to be added to maximise the use, interconnectivity and ultimately the impact of such models. If such fused models existed, they could be run with different initial and boundary conditions to yield model predictions, as is currently done in climate modelling. Answering such what-if questions provides the added value of an actual digital twin (Jones et al., 2020). Fusion is therefore at the heart of digital twinning in the ocean. Since sight is one of the most important human senses, digital twins that include visual information are intuitively accessible and explorable by various human stakeholders. Hence, we suggest that fusion of visual information is also of key importance for digital twinning in the ocean.

RELATED WORK
While digital twins are well accepted and widely used in engineering (Glaessgen and Stargel, 2012, Kritzinger et al., 2018, Tao et al., 2019), the digital twin concept is an emerging technology in the natural sciences, and in particular in ocean science. The European Space Agency (ESA) is working towards a digital twin of Earth that uses remote sensing data². This twin largely excludes the ocean because of the limited applicability of space-based techniques. In ocean sciences and related fields, several larger data viewers are being explored, such as Digital Earth³, Earth System Modeling⁴ or the future Marispace-X⁵. These can be thought of as predecessors of full digital twins. The Future of Seas and Oceans Initiative of the G7 is targeting a digital twin capability⁶, and the Horizon Europe Missions aim at models for the European Digital Twin Ocean⁷.
Copernicus Marine⁸ envisions integrating data, models and physical ocean observations with digital technologies in the project The Digital Twin of the Ocean. To the best of our knowledge, however, no digital twin of the ocean is currently in productive operation to support ocean science. A key issue is the fusion of the different existing data, representations and scales.
In the next section, we outline these challenges.

CHALLENGES IN OCEAN OBSERVATION, MODELS AND METHODS
While various models, simulations and observations already exist for several aspects of the ocean, bringing them together is a key challenge. In this section we discuss issues for fusion from a photogrammetric point of view, i.e. related to application scenarios involving optical water properties and seafloor deformation monitoring. The following list should therefore be understood as exemplary, not exhaustive. Various satellite systems exist in space, and they provide data products in well-understood formats (Kresse, 2008, Niro et al., 2021). Compared to satellite remote sensing, visual underwater measurements in the ocean are much less standardized.

Sensors
A huge variety of commercial or custom-built, professional or improvised camera systems involving different refraction properties (see Fig. 2), sensors and lights is operated by government agencies, research institutes, companies, private persons and other stakeholders. In shallow water areas, i.e. within the penetration depth of sunlight, optical satellite data can be used to monitor the coastal ocean floor (see Fig. 3). This is, however, restricted to the top few metres, depending on the water properties. To capture the deeper ocean floor or the water column above it, cameras have to be brought very close, e.g. by submerged platforms such as autonomous underwater vehicles (AUVs; see Fig. 4 for an example picture). Cameras can only be used at short distances, and the situation is even more complicated when artificial light sources are required, e.g. in murky waters, in deeper areas where no sunlight penetrates, or when data has to be acquired at night. In this case, scattering effects typically limit visibility to a few metres (Köser and Frese, 2020).
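Why optical sensing ends after a few metres can be made concrete with two textbook simplifications: for passive satellite observations, a common rule of thumb is that about 90 % of the water-leaving signal originates above the depth z90 = 1/Kd (Kd being the diffuse attenuation coefficient); for an artificial light next to the camera, the light has to travel to the object and back, so pure Beer-Lambert attenuation with beam attenuation coefficient c gives a round-trip transmittance of exp(-2cd). The sketch below ignores backscatter and forward-scattered light, and the coefficient values are hypothetical examples for turbid coastal water.

```python
import math

def satellite_penetration_depth(kd_per_m):
    """Rule-of-thumb depth (m) from which about 90 % of the water-leaving signal originates:
    z90 = 1 / Kd, with Kd the diffuse attenuation coefficient of downwelling light."""
    return 1.0 / kd_per_m

def round_trip_transmittance(distance_m, c_per_m):
    """Fraction of artificial light that survives lamp -> object -> camera, assuming the lamp
    sits next to the camera and pure Beer-Lambert attenuation with beam attenuation c."""
    return math.exp(-2.0 * c_per_m * distance_m)

def max_imaging_distance(min_transmittance, c_per_m):
    """Largest distance at which the round-trip transmittance still reaches a threshold."""
    return -math.log(min_transmittance) / (2.0 * c_per_m)

# Hypothetical turbid coastal water: Kd = 0.5 1/m, c = 1.0 1/m.
print(satellite_penetration_depth(0.5))
print(max_imaging_distance(0.01, 1.0))
```

With these values the satellite sees roughly the top 2 m, and a camera with its own light loses 99 % of the signal after only about 2.3 m, consistent with the "few metres" quoted above.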
Given this variety of underwater camera systems, a key challenge is that radiometric and geometric calibration of cameras is not standardized, and often not performed at all. Depending on the instrument used, this can result in observations biased by refraction in the sensing system (She et al., 2022) or in the lighting system (Song et al., 2021). Since there is no common standard, the myriad of different systems and platforms makes comparisons between different underwater images, and also comparisons or inter-calibrations between satellite and underwater images, very difficult.

Figure 3. Sentinel-2 true colour composite of an area north of Kiel (Germany) and derived benthos mapping according to a method published by Kuhwald et al. (2021).

Figure 4. AUV photo taken in several metres' water depth in turbid water in the same area as Fig. 3 (please also cf. these data products and the corresponding scenario in Fig. 1).
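One reason why uncalibrated underwater cameras produce biased geometry is that refraction at a flat port does not act like a simple change of focal length. The sketch below assumes a thin flat port (glass thickness neglected) and a refractive index of 1.333 for sea water, and computes, via Snell's law, the factor by which an in-air focal length would have to be scaled for a pinhole model to predict the correct underwater ray at a given viewing angle. It is an illustration only, not the calibration approach of the works cited above.

```python
import math

N_WATER = 1.333  # approximate refractive index of sea water (assumption)

def angle_in_water(theta_air_rad, n_water=N_WATER):
    """Snell's law for a ray leaving the camera through a thin flat port
    (glass thickness neglected): sin(theta_air) = n_water * sin(theta_water)."""
    return math.asin(math.sin(theta_air_rad) / n_water)

def focal_scale(theta_air_rad, n_water=N_WATER):
    """Factor by which the in-air focal length would have to be multiplied so that a
    pinhole model predicts the same underwater ray direction at this viewing angle."""
    theta_w = angle_in_water(theta_air_rad, n_water)
    return math.tan(theta_air_rad) / math.tan(theta_w)

for deg in (1, 15, 30):
    print(deg, round(focal_scale(math.radians(deg)), 3))
```

The factor grows from about 1.33 near the optical axis to about 1.43 at 30°, so no single focal length fits all pixels. In real housings the camera centre also sits some distance behind the port, which additionally breaks the single-viewpoint assumption (not modelled here).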
Acoustic data, e.g. from multibeam echo sounders, can by contrast be captured from high altitudes above the seafloor, as sound penetrates deep into the ocean. Ideally, the echo sounder ensonifies a corridor using beamforming and records the backscattered signal with several hydrophones. Signal travel times at the different hydrophones are exploited to infer the distance and direction of objects. However, due to refraction at water layers, multi-path propagation, and limits in instrument size and design, the vertical resolution is in practice often limited to 0.5-1% of the instrument altitude.
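The basic range measurement and the rule of thumb above can be written down in a few lines. The sketch assumes a constant nominal sound speed of 1500 m/s (an assumption; the real value varies with temperature, salinity and pressure) and simply applies the 0.5-1 % figure from the text; the function names are hypothetical.

```python
NOMINAL_SOUND_SPEED = 1500.0  # m/s, nominal value (assumption)

def slant_range(two_way_time_s, sound_speed=NOMINAL_SOUND_SPEED):
    """Distance to the reflector: the echo travels there and back, hence the factor 1/2."""
    return 0.5 * sound_speed * two_way_time_s

def vertical_resolution_range(altitude_m, rel_best=0.005, rel_worst=0.01):
    """Practical vertical resolution estimated as 0.5-1 % of the instrument altitude."""
    return rel_best * altitude_m, rel_worst * altitude_m

# An echo after 0.2 s corresponds to roughly 150 m of range.
print(slant_range(0.2))
print(vertical_resolution_range(100.0))
```

An AUV at 100 m altitude thus resolves roughly 0.5-1 m vertically, whereas a surface vessel over 3000 m of water would be limited to roughly 15-30 m.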
The horizontal resolution also depends on the instrument altitude, through the acoustic footprint of the beam on the seafloor (typical beam widths are 0.5°-1.0°). Unlike cameras, acoustic sensors strictly depend on external localization such as GNSS and INS. GNSS signals do not reach into the water, so positions have to be relayed acoustically, which introduces extra uncertainties. Multibeam echo sounders also suffer from side-lobe effects due to beamforming, and sonar images can contain other artefacts from acoustic or electronic noise. Backscatter calibration is therefore difficult but needed, in particular when comparing surveys to detect changes. Geometric calibration requires precise knowledge of the speed of sound in all water layers below the sensor, which can be obtained by capturing sound velocity profiles.
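Both effects named here, the beam footprint and the dependence on the sound velocity profile, can be sketched for the simplest case of a single beam pointing straight down. The layered profile and all numbers are hypothetical; oblique beams bend through the layers and have larger footprints, which real processing handles by ray tracing and which is not modelled here.

```python
import math

def nadir_footprint(altitude_m, beamwidth_deg):
    """Footprint width (m) of a single beam pointing straight down onto a flat seafloor."""
    return 2.0 * altitude_m * math.tan(math.radians(beamwidth_deg) / 2.0)

def nadir_depth(two_way_time_s, thicknesses_m, speeds_m_s):
    """Depth below the transducer for a vertical beam through horizontally layered water.

    The sound velocity profile is given as layers of constant speed from the transducer
    downwards; the last layer is assumed to extend to the seafloor.
    """
    if not speeds_m_s or len(thicknesses_m) != len(speeds_m_s):
        raise ValueError("need one thickness per sound speed layer")
    remaining = 0.5 * two_way_time_s  # one-way travel time not yet accounted for
    depth = 0.0
    for i, (dz, c) in enumerate(zip(thicknesses_m, speeds_m_s)):
        layer_time = dz / c
        if i == len(speeds_m_s) - 1 or remaining <= layer_time:
            return depth + remaining * c  # the echo originates inside this layer
        depth += dz
        remaining -= layer_time

# Hypothetical two-layer profile: 50 m at 1520 m/s over slower water at 1490 m/s.
profile = ([50.0, 1000.0], [1520.0, 1490.0])
t = 2.0 * (50.0 / 1520.0 + 150.0 / 1490.0)   # two-way time for a reflector at 200 m
print(nadir_depth(t, *profile))
print(0.5 * 1500.0 * t)
print(nadir_footprint(100.0, 1.0))
```

Assuming a constant 1500 m/s instead of the profile places this reflector about 35 cm too deep, and a 1° beam at 100 m altitude already covers a patch about 1.75 m wide.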
Seismic imaging is a technique for looking into the seafloor (see e.g. Gross et al., 2016). Somewhat similar to acoustic imaging, seismic waves are created, e.g. by airguns at sea, and the returning signals are recorded by a multitude of hydrophones at different locations in order to infer the subsurface structure of the seafloor, using complex models that depend on the survey application. Here, too, external localization is required. The native representation of the sensed raw data varies between sensor types and can be Cartesian frames, polar representations or Fourier space. Multibeam echo sounders provide the backscatter strength of the seafloor, whose direction dependence varies with the seafloor material. Hard ground reflects the signal better than softer sediments. This "acoustic colour", however, does not relate directly to optical colour.

Geometry
When the seafloor geometry is reconstructed from several camera images using photogrammetry or Structure-from-Motion, the resulting 3D model of the seafloor is an indirect, high-level data product rather than a raw measurement, and obtaining it requires overcoming several challenges (Köser and Frese, 2020). The absolute or relative uncertainty of such indirectly obtained data sets is difficult to specify or obtain. Moreover, these models can be biased if refraction (Jordt et al., 2016, She et al., 2022) is not handled properly, and they can contain artefacts from reflections, moving fauna, smoke (Shivaswamy et al., 2021) or dynamic illumination. Since multibeam echo sounder data comes in swaths of polar coordinates, it is typically resampled and gridded in order to form 2D images such as seafloor maps. These are again indirect high-level products in which inaccuracies from navigation and other errors are already baked in, which makes uncertainty reasoning much more difficult than for raw sensor readings. Additionally, such resampled (gridded) seafloor models can contain artefacts from bubbles (Urban et al., 2017), fish or noise. Fusing geometric products from acoustic sensors and cameras therefore requires robust methods.
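The step from polar swath data to a gridded seafloor map can be sketched as follows, under strong simplifying assumptions: straight rays (no refraction), a level transducer, no lever arms, and a navigation solution taken as exact. The function names are hypothetical. Note how the per-cell mean discards all per-sounding information, which is exactly why uncertainty reasoning on gridded products is harder than on raw readings.

```python
import numpy as np

def ping_to_xyz(beam_angles_deg, slant_ranges_m, ship_x, ship_y, heading_deg):
    """Convert one multibeam ping from polar (beam angle, slant range) to local Cartesian
    coordinates (x east, y north, depth positive down) assuming straight rays,
    a level transducer and no lever arms."""
    a = np.radians(beam_angles_deg)
    r = np.asarray(slant_ranges_m, dtype=float)
    across = r * np.sin(a)          # positive to starboard
    depth = r * np.cos(a)
    h = np.radians(heading_deg)     # heading clockwise from north
    x = ship_x + across * np.cos(h)  # starboard unit vector is (cos h, -sin h)
    y = ship_y - across * np.sin(h)
    return x, y, depth

def grid_mean(x, y, z, x0, y0, cell, nx, ny):
    """Average all soundings falling into each cell of a regular ny-by-nx grid (NaN if empty)."""
    ix = np.floor((np.asarray(x) - x0) / cell).astype(int)
    iy = np.floor((np.asarray(y) - y0) / cell).astype(int)
    ok = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    sums = np.zeros((ny, nx))
    counts = np.zeros((ny, nx))
    np.add.at(sums, (iy[ok], ix[ok]), np.asarray(z, dtype=float)[ok])
    np.add.at(counts, (iy[ok], ix[ok]), 1.0)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts

# One hypothetical ping: three beams at -45, 0 and 45 degrees over a flat seafloor 20 m below.
x, y, d = ping_to_xyz([-45.0, 0.0, 45.0], [20.0 * 2 ** 0.5, 20.0, 20.0 * 2 ** 0.5], 0.0, 0.0, 0.0)
print(grid_mean(x, y, d, -30.0, -10.0, 20.0, 3, 1))
```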

Scales
Satellite resolutions depend on the sensor and range from hundreds of metres (e.g. Sentinel-3) through tens of metres (e.g. Sentinel-2) and metres (e.g. PlanetScope) to sub-metre (e.g. WorldView). The resolution of airborne data depends on flight height and velocity and usually varies between tens of metres and decimetres. Acoustic data is typically captured from the sea surface by a vessel, or by an underwater robot, e.g. at 100 m altitude above the seafloor. Resolutions of such acoustic data range from tens of metres to decimetres. Image data from underwater platforms photographing at moderate altitudes such as 1-5 m has a resolution one to two orders of magnitude higher, which could readily be improved further. For instance, it has been shown that the radius of gas bubbles can be estimated underwater with 1% accuracy (She et al., 2021), and detailed maps and 3D models of the ocean floor can be obtained from robotic platforms (Jordt et al., 2016).
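The resolution and coverage of a downward-looking camera both scale linearly with distance, which a pinhole approximation makes explicit. The camera parameters below are hypothetical, and the focal length is assumed to be the effective in-water value (i.e. already including the flat-port magnification); the nominal pixel footprint is an upper bound on what blur and scattering actually allow.

```python
def ground_sampling_distance(distance_m, pixel_pitch_um, focal_length_mm):
    """Pinhole approximation of the seafloor patch (m) covered by one pixel of a downward-looking camera."""
    return distance_m * (pixel_pitch_um * 1e-6) / (focal_length_mm * 1e-3)

def image_footprint(distance_m, pixel_pitch_um, focal_length_mm, width_px, height_px):
    """Seafloor area (m x m) covered by one image at the given distance."""
    gsd = ground_sampling_distance(distance_m, pixel_pitch_um, focal_length_mm)
    return gsd * width_px, gsd * height_px

# Hypothetical camera: 5.5 um pixels, 8 mm effective focal length, 4000 x 3000 pixels.
for altitude in (1.0, 5.0):
    gsd = ground_sampling_distance(altitude, 5.5, 8.0)
    w, h = image_footprint(altitude, 5.5, 8.0, 4000, 3000)
    print(f"{altitude} m: {gsd * 1000:.2f} mm/px, footprint {w:.1f} m x {h:.1f} m")
```

These values give roughly 0.7 mm and 3.4 mm per pixel at 1 m and 5 m, with each image covering only about 6 m² and 140 m², respectively, which illustrates the trade-off between resolution and surveyed area discussed next.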
The area that can be surveyed shrinks with the image footprint, so only small areas can be mapped at high resolution. Registering small, detailed visual reconstructions of a few square metres with larger-scale acoustic seafloor models with metre uncertainty is difficult (see e.g. Gausepohl et al., 2020), in particular when the 3D data stem from different modalities (acoustic backscatter vs. visual colour). In deeper waters, where divers cannot go, the lack of GNSS makes absolute localization uncertain. Local photogrammetric reconstructions (see Fig. 6 for an example) can therefore have a relative uncertainty of centimetres, while their absolute UTM coordinates depend on the localization capabilities of the underwater robot (typically acoustic, again with extra uncertainties) and can be tens of metres off. The same holds for acoustic maps from submerged robotic platforms. Localization of sonars on surface vessels is typically better, but in deeper waters the sonar is also further away from the seafloor, again leading to a coarser model.
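If corresponding points between a local visual reconstruction and a georeferenced acoustic model were known (say, a few hypothetical features identifiable in both), the rigid transformation between them could be estimated in closed form with the Kabsch algorithm, sketched below. This is the easy part: in practice, finding such correspondences across modalities is the real difficulty, the result inherits the absolute error of the acoustic model, and a reconstruction without metric scale would need a similarity transform (e.g. Umeyama's method) instead.

```python
import numpy as np

def rigid_align(source, target):
    """Least-squares rotation R and translation t such that R @ p + t ≈ q for corresponding
    rows p of `source` and q of `target` (Kabsch algorithm, no scale)."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    src_mean, tgt_mean = source.mean(axis=0), target.mean(axis=0)
    H = (source - src_mean).T @ (target - tgt_mean)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))            # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_mean - R @ src_mean
    return R, t
```

At least three non-collinear correspondences are required; with noisy correspondences the estimate is the least-squares optimum, so gross mismatches should be removed first, e.g. with RANSAC.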

Representations
Visual data from satellites looking into the ocean is routinely analysed with radiative transfer schemes and corrected for atmospheric effects, sun glint and surface roughness with models based on Cox and Munk (Cox and Munk, 1954, Jin et al., 2006). Underwater imaging employs the Jaffe-McGlamery models, which under very strong assumptions (clear shallow water in daylight without waves) simplify to the fog model. Colour restoration using differentiable raytracing (Nimier-David et al., 2019, Jakob, 2019) requires knowledge of the attenuation and scattering properties of an infinitesimally small water volume (Nakath et al., 2021). Finally, water properties and effects can also be trained into neural networks (Jiang et al., 2018, Pahlevan et al., 2021), but all these different representations make it difficult to bring all aspects together.

We eventually target two applications for digital twins. The first digital twin addresses "seafloor dynamics" at volcanic slopes. The second envisioned twin is an "optical digital twin of coastal waters" and should include all data and models that relate to water quality and environmental status in a coastal region. In the following section, however, we start with the description of a generic digital twin of a sensor and its environment, which serves as a prerequisite.

Optical Digital Twins as Virtual Instrument Testbeds
The goal of this joint digital twin of the environment and the sensing hardware (cf. Fig. 7) is to simulate measurement setups, survey strategies and mission behaviour, e.g. for planning field campaigns, where ship time is extremely expensive and the possibilities for tuning and modifying hardware and software are very limited. A thorough understanding of the employed sensing hardware is a prerequisite for appropriate data acquisition. If field data is not as expected, one has to be able to rule out sources of error or noise in order to interpret it reliably. Furthermore, most sensing systems have to be calibrated, and their parameters have to be tuned to the particular expected environmental conditions. These tasks can be carried out either through simulations and test runs of increasing complexity, which rule out certain problems at simple stages, or based on available ground truth data. If the sensor is deployed without these preparations, all problems must be solved in a highly entangled ex-post approach. However, the latter is often too tedious, too expensive or even plainly impossible. Hence, the development of remote sensing instruments typically comprises different phases, ranging from simulation and hardware-in-the-loop approaches to actual deployments (Benninghoff et al., 2014). This approach benefits sensors to be employed on satellites as well as underwater sensing systems (Nakath et al., 2022, She et al., 2022).
To introduce a bi-directional intermediate step between pure numerical simulations and the real world (cf. Fig. 8), we seek to provide a fully operational digital twin that jointly represents sensors and an environment. First steps in this direction have already been made in the underwater domain. Nakath et al. (2022) present a geometrically verified digital twin of a hardware-in-the-loop setup, which can be used to develop underwater vision algorithms as well as to tune and finally test them (see Fig. 8). Furthermore, a physically proper and comprehensive description of camera-light systems allows the light poses to be optimized (Song et al., 2021), thus preparing the setup for certain environmental conditions. It is also possible to calibrate the light source as well as the environment (here, the water body) based on parameters extracted from real imagery (Nakath et al., 2021) (see Fig. 10).
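Calibrating the water body from imagery amounts to identifying model parameters by minimizing the mismatch between rendered and observed images. As a toy illustration of that idea (not the method of Nakath et al.), the sketch below fits the unattenuated radiance J and the attenuation coefficient c of the hypothetical forward model I = J·exp(-c·d), i.e. direct signal only without backscatter, to intensities of one target seen at several known distances. The gradient is derived by hand; in an end-to-end differentiable twin, a differentiable renderer would supply it automatically.

```python
import math

def fit_attenuation(distances_m, intensities, steps=5000, lr=0.05):
    """Estimate J and c of the forward model I = J * exp(-c * d) by gradient descent on the
    mean squared residual of log I (hand-derived gradient, noise-free toy setting)."""
    a, c = 0.0, 0.0  # a = log(J); start from "unit radiance, no attenuation"
    n = len(distances_m)
    for _ in range(steps):
        # Residuals of the log-space forward model: log I = log J - c * d.
        residuals = [math.log(obs_i) - (a - c * dist)
                     for dist, obs_i in zip(distances_m, intensities)]
        grad_a = -2.0 * sum(residuals) / n
        grad_c = 2.0 * sum(r * dist for r, dist in zip(residuals, distances_m)) / n
        a -= lr * grad_a
        c -= lr * grad_c
    return math.exp(a), c

# Synthetic check: the same target imaged at 1-5 m with J = 0.8 and c = 0.4 1/m.
dists = [1.0, 2.0, 3.0, 4.0, 5.0]
obs = [0.8 * math.exp(-0.4 * x) for x in dists]
print(fit_attenuation(dists, obs))
```

The fixed step size is tuned to distances of a few metres; for longer ranges it must be reduced (or a proper optimizer used), otherwise the iteration diverges.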
In the future, we strive to devise an integrated digital twin comprising a water body as well as sensors and lights for in-situ and airborne measurements. It shall (i) synthesize measurements in a physically proper fashion and (ii) be fully, i.e. end-to-end, differentiable. The former requirement ensures computations with physically interpretable quantities and an increased level of realism (as opposed to pure numerical simulations). The latter allows for the computation of gradients, which in turn opens up the opportunity for optimization (i.e., parameter identi-

Digital Twin Seafloor Dynamics
Volcanoes are fast-growing geological structures, often with mechanically unstable edifices. Collapses of unstable flanks can form destructive landslides. On ocean island volcanoes, whose flanks are covered by water, these events pose an even greater threat, since they can trigger tsunamis, as in the case of the collapse of Anak Krakatau (Indonesia) in 2018. Monitoring and modelling ground deformation helps to assess the stability of volcanic slopes, but is inherently difficult at the seafloor.
Seafloor geodesy can provide information on deformation but is usually limited to a few discrete points.Repeated multi-beam echosounder surveys provide larger spatial coverage, but resolution and localization impose limits.Detection of change in optical data is a potential bridge between these two approaches.
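Whether a repeated survey can detect a given deformation depends on the uncertainties of both surveys. A common approach in geomorphic change detection, sketched below, differences two co-registered elevation models and masks changes below a minimum level of detection k·sqrt(σ₁² + σ₂²); it assumes independent Gaussian vertical errors and ignores horizontal misregistration, which on steep slopes itself produces apparent vertical change. The uncertainties are hypothetical.

```python
import numpy as np

def significant_change(dem_before, dem_after, sigma_before, sigma_after, k=1.96):
    """Elevation difference of two co-registered surveys; changes smaller than the minimum
    level of detection k * sqrt(sigma_before**2 + sigma_after**2) are masked as NaN."""
    dod = np.asarray(dem_after, dtype=float) - np.asarray(dem_before, dtype=float)
    lod = k * np.sqrt(np.square(sigma_before) + np.square(sigma_after))
    return np.where(np.abs(dod) > lod, dod, np.nan)

# A hypothetical 0.3 m subsidence seen by two surveys of each type.
print(significant_change([-500.0], [-500.3], 0.5, 0.5))    # repeated acoustic surveys
print(significant_change([-500.0], [-500.3], 0.02, 0.02))  # repeated optical surveys
```

With 0.5 m per-survey uncertainty the detection threshold is about 1.4 m and the subsidence is masked, whereas with 2 cm optical uncertainty it is detected, which is why optical change detection could bridge point-wise geodesy and repeated multibeam surveys.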
The goal of this twin is to understand and predict seafloor dynamics at underwater volcanic slopes and the associated landslides (Urlaub et al., 2015). Models and data sources that contribute to this goal will be integrated into a future digital twin. A key aspect is bringing together remote sensing data of different modalities, such as seismic scans, multibeam echo sounder data and visual data, with their different uncertainties and scales (see Fig. 9 for an example). Seafloor monitoring at discrete locations will provide additional constraints. Together with other aspects (earthquakes, currents etc.), the resulting digital twin is expected to provide new insights into the dynamic system of a changing seafloor, its controlling factors, and its impacts on the stability of submarine slopes as well as on other ocean systems, such as the quality of coastal waters.

Optical Digital Twin of Coastal Waters
Marine ecosystems are subject to long-term transformations driven by natural and anthropogenic influences (Halpern et al., 2008). As a large share of the world population lives close to coastal areas or to rivers that drain into near-shore waters, these regions are especially exposed to man-made environmental changes. Hence, their continuous surveillance is crucial to enable timely reactions to changing ecosystem conditions. Compared to deep-sea areas, their accessibility facilitates detailed analyses with wide coverage, especially in the benthic zone. Their monitoring therefore delivers insights into these vulnerable parts of the ocean and indicates trends in general ocean health.
Coastal marine areas face several harmful factors such as pollution, acidification and deoxygenation driven by an increased inflow of nutrients (such as nitrogen and phosphorus) and organic matter (Lotze et al., 2006). These factors have a significant impact on water clarity and colour, phytoplankton productivity and algal blooms. As many of the monitored environmental health parameters have a direct or indirect visual impact on coastal areas, their visualization is a valuable tool to emphasize ecosystem changes and to give decision-makers insights into potential challenges, threats and opportunities. Therefore, we aim at developing an optical digital twin of coastal waters that visualizes their ecosystems in an easily interpretable way.
For this purpose, we want to investigate the visual properties of coastal water to establish an optical digital twin in the spectral domain based on multi-scale imagery. By combining inverse rendering and machine learning techniques with physical constraints, we aim to quantify environmental parameters that identify the state of the underlying ecosystem. Forward rendering enables analyses in the opposite direction, generating imagery to simulate and explore what-if scenarios that highlight the impact of parameter changes.
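The forward/inverse pairing can be illustrated with the simple fog model mentioned under Representations, in which an observed colour is the attenuated object colour plus veiling light: I = J·t + B∞·(1 - t) with transmission t = exp(-β·d), per colour channel. The symbol names and RGB values below are assumptions for illustration; the envisioned twin would use spectrally resolved, physically richer models, and the inversion requires β and B∞ to be known or estimated first.

```python
import numpy as np

def fog_forward(J, distance_m, beta, B_inf):
    """Render an observation with the simplified fog model, per colour channel:
    I = J * t + B_inf * (1 - t), with transmission t = exp(-beta * d)."""
    t = np.exp(-np.asarray(beta, dtype=float) * distance_m)
    return np.asarray(J) * t + np.asarray(B_inf) * (1.0 - t)

def fog_inverse(I, distance_m, beta, B_inf):
    """Restore the unattenuated colour J by inverting the same model (needs beta and B_inf)."""
    t = np.exp(-np.asarray(beta, dtype=float) * distance_m)
    return (np.asarray(I) - np.asarray(B_inf) * (1.0 - t)) / t

# Hypothetical RGB values: red attenuates fastest, the veiling light is blue-green.
J = np.array([0.6, 0.5, 0.4])
beta = np.array([0.5, 0.1, 0.15])
B_inf = np.array([0.05, 0.2, 0.3])
observed = fog_forward(J, 3.0, beta, B_inf)
murkier = fog_forward(J, 3.0, 2.0 * beta, B_inf)   # what-if: attenuation doubles
restored = fog_inverse(observed, 3.0, beta, B_inf)
```

In the what-if run every channel moves closer to the veiling light B∞, i.e. the scene washes out, while the inverse recovers J exactly here only because the data are noise-free and the parameters are known.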
We seek to conduct measurements on multiple spatial scales and to combine satellite-based and airborne imagery with in-situ imagery from camera-mounted AUVs. While the latter provides very detailed but sparsely covered visual information, the former are a low-cost and time-efficient solution to cover wide areas and obtain an outline of the composition of euphotic zones. By aligning and registering this optical information spatially and temporally in a digital twin, the advantages of both scales may be combined in a mutually beneficial model. This would enable cross-verification of local measurements against measurements in satellite footprints, and vice versa, across scales that differ by orders of magnitude. Exemplary related multi-scale coastal images are shown in Fig. 3 and Fig. 4. On the one hand, their registration and evaluation in a digital twin could lead to a more robust and accurate combined approach. On the other hand, the digital twin could apply its joint insights from areas covered by both to extrapolate model assumptions to areas that have only been covered remotely.
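Cross-verifying local AUV measurements against satellite pixels requires bringing both to the same footprint. A minimal sketch, assuming the fine raster is already registered and aligned with the satellite pixel edges, averages the fine cells over each coarse pixel (e.g. a factor of 1000 for a 1 cm mosaic against a 10 m Sentinel-2 pixel). Real satellite pixels have a point spread function and pass through the atmosphere and water column, none of which this box average models; the function name is hypothetical.

```python
import warnings
import numpy as np

def aggregate_to_footprint(fine, factor):
    """Average a fine raster over non-overlapping factor x factor blocks, e.g. to compare an
    AUV-derived map with a coarser satellite pixel grid. NaN cells (no AUV coverage) are
    ignored; blocks without any valid cell become NaN."""
    fine = np.asarray(fine, dtype=float)
    ny, nx = fine.shape
    if ny % factor or nx % factor:
        raise ValueError("raster size must be a multiple of the aggregation factor")
    blocks = fine.reshape(ny // factor, factor, nx // factor, factor)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", RuntimeWarning)  # "mean of empty slice" for empty blocks
        return np.nanmean(blocks, axis=(1, 3))

print(aggregate_to_footprint(np.arange(16.0).reshape(4, 4), 2))
```

Because AUV tracks rarely cover a satellite pixel completely, the fraction of valid fine cells per block is worth keeping alongside the mean as a simple indicator of how representative each comparison is.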

CONCLUSION
Digital twins are considered promising and useful tools for ocean sciences; however, current approaches are either still in development or only cover certain aspects. In this paper, we have discussed what we believe are key issues that need to be resolved to facilitate digital twins that relate to visual aspects: to establish more comprehensive versions, fusion has to happen not only across different modalities, but also across vastly differing scales. Beyond multi-scale and multi-modal fusion, different representations, such as physical models and generic models, also have to be brought together. We pointed out promising directions for tackling these issues and introduced three ideas for digital twinning in the ocean that we will pursue in the future.

Figure 1. Multi-scale digital twinning of coastal waters across different sensing modalities, comprising airborne and underwater sensing systems as well as a comprehensive representation of the environment.

Figure 7. Joint physically faithful simulation of UP: scenario, and DOWN: camera view, based on different parametrizations g of the Henyey-Greenstein (Henyey and Greenstein, 1941) scattering function. Figure reprinted from (Nakath et al., 2022).

Figure 8. To enable a physically proper simulation, a proper verification of the twin is desirable. Here, a geometric verification (refraction effects) of a digital twin (MIDDLE) against a numerical simulation (LEFT) and real data (RIGHT) is depicted. Figure adopted from (Nakath et al., 2022).

Figure 9. Resolution difference between visual and acoustic data captured at the slope of Mt. Etna. Left: overview map showing part of the coastline at Mt. Etna. Centre: magnification of an underwater area with higher-resolution bathymetry obtained from acoustics. Right: underwater seafloor photo.