SYNCHRONIZATION OF PICAM CAMERAS FOR THREE-DIMENSIONAL STUDY OF DYNAMIC MULTI-DOMAINS NATURAL SCENES

In this article, we study the interest of the PiCam and the possibilities it offers for building a light payload (small and inexpensive) to perform the 3D reconstruction of dynamic scenes (underwater or aerial) in close-range remote sensing. At these observation scales, movements of the scene due to flora and fauna cannot be ignored if we want these objects to be part of the final model. We review the sensors used in the literature for 3D reconstruction and then present the arguments in favor of the PiCam with regard to the constraints posed by the use of light and agile vectors. The main issue is the synchronization of these low-cost sensors, which is not native: we explain the different steps taken to obtain a satisfactory synchronization rate with regard to the dynamism of the studied scenes and present the results obtained.


INTRODUCTION
In close-range remote sensing, when we study natural environments, being very close makes us sensitive to the dynamism of the scene. Problems arise mainly for photogrammetric studies that perform 3D reconstructions of these dynamic scenes. Indeed, if the observed objects have moved between the acquisitions from the two stereoscopic points of view, their local geometry no longer conforms to the global geometry, and it becomes impossible to recover their three-dimensional position correctly. Considered as erroneous pairings, these points are therefore eliminated during geometric filtering.
With small GSDs (Ground Sample Distances), even small movements in the scene are very quickly visible: the smaller the GSD, the more sensitivity to dynamism becomes a real problem. This is illustrated in Figure 1. We took images with sub-centimetric GSDs in aerial and underwater environments. They represent typical scenes of an acquisition campaign in good conditions, that is to say with good visibility and low wind or current. The sensor is static so that its own motion does not bias the observed movements. The displacement maps are calculated over an interval of up to one second for each of the examples. They show that movements are omnipresent and non-negligible: of the order of several tens of pixels, which, depending on the GSD, represents up to 10 cm.
GSDs are proportional to the viewing distance for a given sensor. In the marine environment, due to the rapid limitation of visib-
Figure 1. Sensitivity to dynamism for several typical scenes of an acquisition campaign: an image of a scene (left) and its displacement map (right) calculated over an interval of 1 s. The magnitude of the displacements is up to several tens of pixels.
To reconstruct dynamic objects (fauna or flora for example), we need to freeze the movement. In other words, the local relative displacement taking place between two shots should not exceed the size of the GSD (Avanthey et al., 2016). This is possible if the following two conditions are fulfilled: several sensors are required and they must be synchronized.
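The freezing condition stated above can be turned into an upper bound on the admissible delay between two shots. The following is a minimal sketch, assuming an object moving at a constant speed parallel to the image plane; the function name and the example values (1 mm GSD, 0.1 m/s) are illustrative assumptions, not measurements from this study.

```python
def max_sync_delay_s(gsd_m: float, speed_m_per_s: float) -> float:
    """Largest delay (in seconds) between two shots for which an object
    moving at speed_m_per_s is displaced by at most one GSD (gsd_m, metres)."""
    if gsd_m <= 0 or speed_m_per_s <= 0:
        raise ValueError("gsd_m and speed_m_per_s must be positive")
    return gsd_m / speed_m_per_s


# Hypothetical example: 1 mm GSD and an object moving at 0.1 m/s.
delay = max_sync_delay_s(gsd_m=0.001, speed_m_per_s=0.1)
print(f"maximum admissible delay: {delay * 1000:.1f} ms")
```

Under these assumed values the bound is of the order of ten milliseconds, which gives an idea of why synchronization delays of several hundred milliseconds cannot freeze the movement.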
The use of a single sensor (temporal stereoscopic pairs) becomes de facto insufficient, because its maximum acquisition frequency, constrained by the stereoscopic base, is then too low. Furthermore, in the case of a multi-view reconstruction, at least three synchronized sensors are necessary so that the points are preserved in the global reconstruction.
In section 2, we present the different sensors used in the literature to perform 3D aerial or underwater reconstructions, as well as the selection criteria that led us to work with PiCam sensors. Then, in section 3, we describe the strategy which allows us to obtain a good synchronization rate; the results obtained are presented and discussed in section 4. Finally, section 5 concludes this work and discusses the envisaged perspectives.

PICAMS FOR 3D RECONSTRUCTION IN CLOSE-RANGE REMOTE SENSING

Sensors used for aerial or underwater 3D reconstruction in close-range remote sensing
The choice of a sensor is a matter of compromise.
In terms of quality, the points of interest are the resolution (number of pixels per unit of length, or density), sensitivity and noise, but also the ratio between the focal length and the physical size of a pixel: the larger this ratio for a given sensor size, the farther away the scene can be observed for a given GSD. However, too long a focal length will result in a narrower field of view.
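The relation between focal length, pixel size, viewing distance and GSD can be made explicit with the usual pinhole camera approximation, GSD = distance × pixel size / focal length. The sketch below assumes this model; the function name and the 3 mm focal length used in the example are illustrative assumptions, while the 1.4 µm pixel size is the value quoted later in this article for the PiCam v2.

```python
def gsd_m(distance_m: float, pixel_size_um: float, focal_length_mm: float) -> float:
    """Ground sample distance (metres) under the pinhole approximation:
    footprint of one pixel on a surface located at distance_m from the camera."""
    pixel_size_m = pixel_size_um * 1e-6      # micrometres -> metres
    focal_length_m = focal_length_mm * 1e-3  # millimetres -> metres
    return distance_m * pixel_size_m / focal_length_m


# Hypothetical example: 1.4 um pixels, assumed 3 mm focal length, scene at 2 m.
print(f"GSD: {gsd_m(2.0, 1.4, 3.0) * 1000:.2f} mm")
```

Doubling the focal length to pixel size ratio allows the viewing distance to be doubled for the same GSD, which is the compromise discussed above.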
In terms of control, the important functions for photogrammetric work are the ability to deactivate all automatic adjustments, the possibility of acquiring images at regular intervals with a high frequency, the ability to synchronize with other devices, the ability to be controlled by a computer and the possibility of retrieving images on the fly so that they can be processed in real time if necessary.
Finally, in terms of physical constraints, important criteria are size and weight, as well as the possibility of choosing and changing the optics. As for the capture, it comes in two types: video or photography. The first mode offers a much higher acquisition frequency in return for a lower resolution, and its recording format requires frames to be extracted before they can be processed.
The sensors used in work relating to our subject in the literature can be mainly grouped into three categories: professional cameras, consumer high-end cameras and consumer entry-level cameras.
• Professional cameras: controlled by a computer, these are the sensors that offer the widest spectrum of settings: acquisition parameters, durations and delays, electronic triggers, etc. The acquired data can be transmitted to a computer for real-time processing.
• Consumer DSLR (Digital Single-Lens Reflex) cameras: EOS 5D Mark II, M, 600D and 550D from Canon (Nicosevici, Garcia, 2008, O'Byrne et al., 2015, Rossi et al., 2017, Germanese et al., 2019), K5 from Pentax (Burns et al., 2015), A700 from Sony (Diamanti, Vlachaki, 2015) or D70, D200, D300, D700, D750 and D7000 from Nikon (Barazzetti et al., 2010, Bianco et al., 2011, Drap, 2012, Gintert et al., 2012, Menna et al., 2018). These sensors offer a wide choice of optics and are widely used in photogrammetry for the quality of the images produced. Intended for advanced users among the general public, they allow some adjustments, but are rarely designed to be controlled electronically. The images are saved on a memory card and are not available for real-time processing. The intervalometer functionality is essential (shots at regular intervals without the need to press the shutter button), although it often exhibits a certain drift over time (non-constant interval between two shots) and its rate is often limited to one frame per second.
• Consumer entry-level cameras: the range of compacts offers fixed-optics sensors with a good ratio between quality, price and size. Intended for the general public, they leave little room for maneuver on their settings, especially compact cameras, although the latest models are starting to expand their options. We again find the essential intervalometer function, with the same regularity problem, but this time offering intervals shorter than one second. In this category of low-cost sensors, we have also seen the arrival on the market of small high-definition camera modules that are fully electronically controllable (such as the PiCam).

Choice of the camera
To choose a camera, our main criteria are its size, its weight and its price to be compatible with the use of light platforms. Indeed, in close-range remote sensing, the size of the studied areas is very small compared to those of aerial or space remote sensing. The main interest of these studies therefore does not lie in the spatial coverage but rather in the on demand acquisition capacity enabled by the operational flexibility of the vectors used. These vectors, and by extension their payload, must be as agile as possible, that is to say, compact and light. Low cost is also an important criterion as we have to duplicate the cameras in our case (2 or 3 at least).
These criteria de facto eliminate DSLR cameras as well as the vast majority of professional sensors, and lead us to opt for entry-level sensors in the professional or general public categories.
Among them, the GoPro, uEye and PiCam v2 sensors fit our constraints particularly well. The PiCam v2 from Raspberry Pi, released in 2016 (3280 × 2464 px, 1.4 µm pixel size, ∼30 €), seems to offer a good compromise between the two other sensors tested by (Avanthey et al., 2016) for dynamic environments: the uEye camera from IDS (entry-level professional sensor, 1280 × 1024 px, 5.3 µm pixel size, ∼500 €) and the Hero 2 camera from GoPro (a sports camera in the consumer category, 3840 × 2880 px, 1.6 µm pixel size, ∼200 €, similar today to a GoPro Hero 7 Silver Edition). In terms of cost and weight, the PiCam is far below these two cameras, even when adding a board for control and data storage (Raspberry Pi Zero or 3+ for example: 10 to 40 €).
In terms of image quality, the PiCam is better than the uEye and close to the GoPro, which is good enough to give satisfactory results in 3D reconstruction (Bernardina et al., 2016), even if the adjustments of the acquisition parameters and especially the native post-processing are not as advanced. As an example, figure 2 presents images taken at the same time with GoPros and PiCams in water, the medium that poses the most problems for image quality. Studies carried out by (Venkataraman et al., 2013, Santise et al., 2017, Piras et al., 2017) show that the PiCam sensor is suitable for photogrammetric work.
Simultaneous acquisitions (triggered mechanically or, better, by an electronic signal), which lead to synchronizations of 500 ms at best, are rarely sufficient with regard to the dynamism of natural areas, as can be seen in (Avanthey et al., 2016) with the GoPro. In terms of control, the PiCam is closer to the uEye camera and gives hope for a synchronization delay as good as, or even better than, the one obtained for the latter (5 ms) by (Beaudoin et al., 2015), without the software instability problems related to its driver. In practice, obtaining sufficient and stable synchronization on inexpensive light sensors is problematic because it is not a native function. In the following section, we present an original architecture implemented to precisely synchronize several PiCam cameras.

SYNCHRONIZATION OF ACQUISITIONS MADE BY SEVERAL PICAMS
The PiCam cannot work alone and needs to be driven by a program on a computer, a Raspberry Pi in our solution. We use as many computers as there are cameras. The problem is therefore to synchronize all the acquisition programs running on these different computers.
In a program, each task will have a variable launch delay, called jitter, which corresponds to the time elapsed between the moment when the task is placed in the execution queue and the moment when it is actually executed. Launching the same task at the same time on different computers will thus result in different delays of execution. Two conclusions are drawn from this. First, the greater the number of tasks, the more the accumulated variations in these delays will contribute to creating a high difference between two shots launched simultaneously (desynchronization). To minimize these effects, bypassing intermediate libraries to access the nearest registers of the sensor without unwanted additional processing is needed. This point is discussed in section 3.1. Second, the greater the variations in delay, the greater the desynchronization. We must therefore try to minimize them as much as possible. This other point, which directly concerns the behavior of the OS (Operating System), is addressed in section 3.2. Finally, the communication part, which allows the cameras to trigger their shots at the same time is discussed in section 3.3.
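To make the notion of jitter concrete, the following minimal sketch measures how late a periodic task is actually woken up compared with its requested deadline. It is only an illustration written in Python with the standard library: the function name, the number of samples and the 2 ms period are assumptions, and the values it reports include the overhead of the interpreter, so they are not comparable with the figures given in this article.

```python
import statistics
import time


def measure_wakeup_lateness(n_samples: int = 50, period_s: float = 0.002) -> list:
    """Return, for each periodic deadline, the time (seconds) elapsed between
    the requested wake-up date and the moment the task actually resumed."""
    lateness = []
    next_deadline = time.perf_counter() + period_s
    for _ in range(n_samples):
        remaining = next_deadline - time.perf_counter()
        if remaining > 0:
            time.sleep(remaining)  # the OS decides when we really resume
        lateness.append(time.perf_counter() - next_deadline)
        next_deadline += period_s
    return lateness


if __name__ == "__main__":
    samples = measure_wakeup_lateness()
    print(f"mean lateness: {statistics.mean(samples) * 1e3:.3f} ms")
    print(f"max lateness:  {max(samples) * 1e3:.3f} ms")
```

Running such a measurement on two different computers generally gives two different distributions, which is the source of the desynchronization described above.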

Choice of appropriate libraries to control the driver
The PiCam is connected to the GPU (Graphics Processing Unit) of the Raspberry Pi, called VideoCore. All software solutions therefore need to deal with this hardware level. Among "off-the-shelf" solutions, the most used is RaspiCam, which provides an API (Application Programming Interface) to use the PiCam. This library raises a major problem by its behavior: it does not actually perform photographic captures but recovers frames from a video stream. The function which requests the acquisition of an image (grab) only signals to the acquisition thread that the next available frame must be saved. Either the frame has just been taken and is ready for saving, or it is still being acquired and will be saved after a short delay (see figure 3). Launching the camera streams at the same time therefore does not guarantee that the acquisitions are synchronized. This approach is not suited to our constraints.
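The effect of grabbing frames from free-running video streams can be illustrated with a small simulation. This is a simplified model written for this explanation, not the RaspiCam API: each stream is assumed to deliver a complete frame at regular intervals from its own start date, and a grab returns the first frame completed at or after the request. The function names and the numerical values (30 frames per second, streams started 12 ms apart) are hypothetical.

```python
import math


def next_frame_time(grab_time_s: float, frame_period_s: float, phase_s: float) -> float:
    """Date at which a free-running stream delivers its first complete frame
    at or after grab_time_s. Frames complete at phase_s + k * frame_period_s."""
    k = math.ceil((grab_time_s - phase_s) / frame_period_s)
    return phase_s + k * frame_period_s


def grab_desync_s(grab_time_s: float, frame_period_s: float,
                  phase_a_s: float, phase_b_s: float) -> float:
    """Time difference between the frames returned by two streams
    when the same grab request reaches both at grab_time_s."""
    frame_a = next_frame_time(grab_time_s, frame_period_s, phase_a_s)
    frame_b = next_frame_time(grab_time_s, frame_period_s, phase_b_s)
    return abs(frame_a - frame_b)


# Hypothetical example: two 30 fps streams whose frame clocks are 12 ms apart.
period = 1 / 30
for grab_time in (1.005, 1.020):
    desync = grab_desync_s(grab_time, period, phase_a_s=0.0, phase_b_s=0.012)
    print(f"grab at {grab_time:.3f} s -> desynchronization {desync * 1e3:.1f} ms")
```

In this model the desynchronization depends on the date of the request and is only bounded by the frame period, even though both grab requests are issued at exactly the same time.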
If we go a step deeper in the software architecture, there are two main APIs that provide hardware abstractions to access and control the data of the camera in a standardized way via the GPU (see figure 4). The first one, used by RaspiCam, is MMAL (Multi-Media Abstraction Layer), a proprietary API created and implemented by Broadcom. The second one is OpenMax, a semi-open API (BSD license) created by the Khronos Group (an industrial consortium which also produced the OpenGL and Vulkan APIs for 3D). Although older and less maintained, OpenMax is better documented than MMAL thanks to its open-source nature and is therefore easier to use in a manner suited to our needs.
Figure 3. Capture process via the RaspiCam library: grab is requested simultaneously during the capture process and is actually carried out for each sensor when it has just finished acquiring a frame, hence a certain delay between the two recovered acquisitions.
Our software solution uses a wrapper of OpenMax, called Omxcam, which allows us to finely control the shooting with a particular attention to avoid unnecessary or superfluous tasks that contribute to widening the synchronization gap.

A real-time OS to control the jitter
As seen before, precise control of the launch date and execution delay (jitter) of the acquisition tasks is essential. These parameters can only be controlled by the OS, because it alone has an overview of all the active processes and their priorities. To facilitate maintenance and security, we chose to stay with open-source Linux-type solutions, which are widely supported by the community. OSs can be classified into two main families: classic OSs and real-time OSs.
The latter are distinguished from the former by the importance given to precisely managing the jitter and the order of tasks. To respect our constraints, the choice of a real-time OS is therefore essential, because in a classic OS this jitter can easily reach several milliseconds per task within a program. In the real-time family, there is a distinction between hard and soft real-time OSs, depending on the tolerance chosen for respecting the jitter. A hard real-time OS guarantees in all cases a launch delay bounded by a constant (maximum jitter). If this condition is not met, the task is automatically killed because it is considered to have failed. This type of OS is widely used in critical systems, in aeronautics for example. These OSs include FreeRTOS and VxWorks.
A soft real-time OS does not guarantee the launch delay, and tasks are therefore not killed even if their launch takes a relatively long time. It is however designed so that the mean jitter is comparable to that of a hard real-time OS, and so remains much shorter than the average on a classic OS.
Another pragmatic difference, and not the least: a hard real-time OS is minimalist compared to a soft real-time OS, which natively offers many additional features and is very close to a classic OS, in particular because it implements the POSIX API (used among others by the libc, the standard C library). In other words, developing a program on a soft real-time OS is much simpler and faster, and the result is easier to maintain, than on a hard real-time OS. With regard to our needs, the performance offered by a soft real-time OS is sufficient, and its ease of use, especially when working with images, fully justifies this choice.
Among the OSs of this soft real-time family, two solutions emerge: Linux PreemptRT (standard Linux kernel to which the PREEMPT-RT patch has been applied) and Xenomai. To run a program in real time, the developers of Linux PreemptRT directly modified the Linux kernel in depth: as a result, all programs run in real time. The developers of Xenomai made another strategic choice: making two Linux kernels co-exist, one real-time and the other classic. Real-time programs only run on the former. For our problem, Xenomai is not a good choice because, for our capture program to be executed in real time, it would be necessary to recode the driver of the camera so that it can be integrated into the real-time kernel. This is particularly complicated because drivers are often proprietary and very poorly documented. This coding step is unnecessary with Linux PreemptRT since everything runs natively in real time.
Its maximum jitter is also compatible with our use (Arthur et al., 2007, Dias et al., 2014). We therefore chose this OS for the final solution.
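On a Linux kernel, a task that must be launched with a low and stable delay is usually given a real-time scheduling policy. The article does not describe how the acquisition task was configured, so the following is only a hedged sketch of that common practice, using the standard Python `os` module; the function name and the priority value of 80 are assumptions, and the call normally requires root privileges (or the CAP_SYS_NICE capability).

```python
import os


def request_realtime_priority(priority: int = 80) -> bool:
    """Try to move the calling process to the SCHED_FIFO real-time class.
    Returns True on success, False if the platform or the permissions
    do not allow it (the process then keeps its current policy)."""
    if not hasattr(os, "sched_setscheduler") or not hasattr(os, "SCHED_FIFO"):
        return False  # scheduling API not available on this platform
    try:
        lowest = os.sched_get_priority_min(os.SCHED_FIFO)
        highest = os.sched_get_priority_max(os.SCHED_FIFO)
        clamped = max(lowest, min(highest, priority))
        # pid 0 designates the calling process
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(clamped))
        return True
    except OSError:  # includes PermissionError when privileges are missing
        return False


if __name__ == "__main__":
    granted = request_realtime_priority()
    print("real-time scheduling granted" if granted else "kept default scheduling")
```

A production acquisition program would more likely be written in C or C++ against the same POSIX scheduling interface; Python is used here only to keep the example short.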

Communication between cameras to synchronize shots
Finally, the last key point of synchronization is the physical transmission of the acquisition order between the cameras. In a first attempt, we used a high/low synchronization signal (trigger). In addition to the ground and the trigger, we added a third control line used by the cameras to signal when they have finished their acquisition (sensor ready). If one camera fails, the images acquired by the others at that moment are not saved.
However, with three lines per camera, we can also use another communication protocol: I2C (Inter-Integrated Circuit). This protocol respects our constraint of a fixed communication time between the cameras and allows a speed of up to 400 kbit/s. The advantage is that, with this protocol, other useful information can be transmitted bi-directionally between the cameras, such as the internal parameters of each sensor (shutter speed, ISO, color balance, camera ID, etc.). This is the solution that we have chosen (see figure 5).
To summarize, our solution is based on the use of the PreemptRT OS, Omxcam and the I2C protocol between cameras.
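The fixed communication time offered by I2C can be estimated from the size of the message. The sketch below is an illustration only: the message layout (camera ID, shutter time, ISO, frame counter) and the function names are hypothetical and are not the format used in this work. The timing estimate relies on the fact that each byte on an I2C bus, including the address byte, takes nine clock cycles (eight data bits plus one acknowledge bit); start and stop conditions are neglected.

```python
import struct

# Hypothetical little-endian message layout:
# camera ID (uint8), shutter time in microseconds (uint32),
# ISO (uint16), frame counter (uint32).
PARAMS_FORMAT = "<BIHI"


def pack_params(camera_id: int, shutter_us: int, iso: int, frame_counter: int) -> bytes:
    """Serialize the parameters of one camera into a fixed-size payload."""
    return struct.pack(PARAMS_FORMAT, camera_id, shutter_us, iso, frame_counter)


def unpack_params(payload: bytes) -> tuple:
    """Inverse of pack_params."""
    return struct.unpack(PARAMS_FORMAT, payload)


def i2c_transfer_time_s(n_payload_bytes: int, bus_hz: int = 400_000) -> float:
    """Approximate duration of one I2C write of n_payload_bytes:
    (payload + 1 address byte) * 9 clock cycles at bus_hz."""
    return (n_payload_bytes + 1) * 9 / bus_hz


payload = pack_params(camera_id=2, shutter_us=500, iso=100, frame_counter=42)
duration_us = i2c_transfer_time_s(len(payload)) * 1e6
print(f"{len(payload)} bytes, about {duration_us:.0f} us at 400 kbit/s")
```

Because the payload has a fixed size, the transmission time is the same for every shot, which is the property needed for the synchronization.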

RESULTS
To assess the obtained synchronization, measurement campaigns were carried out with 3 synchronized PiCams. The first one uses a digital timer, accurate to the thousandth of a second, displayed on a screen. It shows that the use of RaspiCam (see section 3.1) on a non-real-time OS (Linux) gives a synchronization delay of 50 ms (reference time), which is already better than for GoPro cameras by a factor of 10. After switching to the Omxcam library, the timer values captured in the shots no longer differ: this means that the synchronization difference has fallen below 20 ms. Indeed, a standard screen refreshes at 60 Hz, that is to say, every 16.7 ms.
In order to measure the synchronization delay more precisely, we completed the experimental bench with a chaser composed of 10 LEDs alternating between red, green and blue and framed by 2 white LEDs (see figure 6). Each RGB LED lights up in turn for 1 ms. The white LEDs only light up every two cycles. In this way we are able to distinguish every millisecond over a period of 20 ms. At the same time, the shutter speed of the sensor must be reduced as much as possible to limit its integration time (the resulting lack of light is not a problem because we observe bright spots). This avoids obtaining images where all the LEDs appear lit (which is the case for our eyes, which have an integration time of 50 ms) and in which it would therefore be impossible to observe an offset in the captured sequence between the synchronized views. Thanks to this bench, we were able to measure that the synchronization difference with the Omxcam library on a non-real-time OS was 8 ms on average: a 6-fold improvement over RaspiCam. Using the soft real-time OS PreemptRT made it possible to go strictly below one millisecond, without drift over time.
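The reading of the LED chaser can be summarized as a small decoding routine. This is a sketch under stated assumptions: the exposure is short enough for exactly one RGB LED to appear lit, LEDs are indexed from 0 to 9 in lighting order, and the white LEDs are lit during the first of the two 10 ms sweeps of each 20 ms period. The names and the convention for the white LEDs are hypothetical.

```python
N_LEDS = 10                # RGB LEDs, each lit in turn for 1 ms
CYCLE_MS = N_LEDS          # duration of one full sweep of the chaser
PERIOD_MS = 2 * CYCLE_MS   # white LEDs are lit on every other sweep


def timestamp_ms(lit_led_index: int, white_on: bool) -> int:
    """Time within the 20 ms period encoded by one image.
    Assumption: the white LEDs are lit during the first sweep of each period."""
    if not 0 <= lit_led_index < N_LEDS:
        raise ValueError("lit_led_index must be between 0 and 9")
    return lit_led_index + (0 if white_on else CYCLE_MS)


def sync_offset_ms(image_a: tuple, image_b: tuple) -> int:
    """Signed offset (b minus a) in milliseconds, wrapped to [-10, 10).
    Each image is described as (lit_led_index, white_on)."""
    diff = (timestamp_ms(*image_b) - timestamp_ms(*image_a)) % PERIOD_MS
    return diff - PERIOD_MS if diff >= PERIOD_MS // 2 else diff


# Hypothetical readings from two cameras triggered together.
print(sync_offset_ms((9, True), (1, False)))  # camera b sees the chaser 2 ms later
```

Offsets larger than half a period cannot be distinguished from their wrapped value, which is why this bench is only used once the timer test has shown that the delay is below 20 ms.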
The objective of the next test campaign is to verify that the quality of the image synchronization is sufficient to deal with the dynamic objects likely to be found in the study areas. If the synchronization is successful and the movement has been frozen, then the stereoscopic pair can be treated as a classic pair by the 3D reconstruction chain: all the elements of the scene can be matched correctly. On the other hand, if local movements remain, the corresponding matched points will contradict the global geometry and will then be eliminated.
We therefore rely on this matching criterion to assess the adequacy of the synchronization in our tests. The multi-domain database used is composed of stereoscopic pairs containing various dynamic objects (birds, fish, anemones, coral reefs, caustics, etc.). Synchronized pairs have a time difference of about 1 millisecond whereas desynchronized pairs have a time difference of about 1 second. The cameras do not move during the acquisition series, so as not to add the speed of movement of the sensor to the analyses. We crossed the successive images taken by the sensors to form the desynchronized pairs (the nth image of sensor 1 with the (n+1)th image of sensor 2). If one chooses instead to use two successive images of the same sensor as a desynchronized pair, care must be taken to remove the non-overlapping portion linked to the base in order to be able to compare the results with the synchronized pairs. We also try to select images where movements are evenly distributed over the image, so that their influence on the matching rate can be assessed meaningfully. Note that anemones, accompanied by clownfish (genus Amphiprion), are a complex case for matching algorithms because the textures are relatively plain and the patterns very similar. On this type of image, the results obtained are around 30% of good pairings (see figure 9).
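The construction of the two sets of pairs described above can be sketched as follows. The function name and the file names are hypothetical; the only assumption is that both sensors recorded their images in lockstep, so that images with the same index were taken at the same time.

```python
def build_pairs(sensor1_images: list, sensor2_images: list) -> tuple:
    """Return (synchronized, desynchronized) stereo pairs from two image
    sequences acquired in lockstep by two static sensors.
    Synchronized: image n of sensor 1 with image n of sensor 2.
    Desynchronized: image n of sensor 1 with image n + 1 of sensor 2."""
    n = min(len(sensor1_images), len(sensor2_images))
    synchronized = [(sensor1_images[i], sensor2_images[i]) for i in range(n)]
    desynchronized = [(sensor1_images[i], sensor2_images[i + 1]) for i in range(n - 1)]
    return synchronized, desynchronized


# Hypothetical file names for a series of three shots.
left = ["cam1_000.png", "cam1_001.png", "cam1_002.png"]
right = ["cam2_000.png", "cam2_001.png", "cam2_002.png"]
sync_pairs, desync_pairs = build_pairs(left, right)
print(sync_pairs[0], desync_pairs[0])
```

Both sets share the same stereoscopic base, so the matching rates obtained on them can be compared directly.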
The results on caustics, which produce very fast movements, are high (very discriminating patterns, good contrast): the number of inliers is close to 60% (see figure 10). However, as we can see in figure 11, the matching systematically fails on the light spots over the surface of the water. Their speed seems too high for our synchronization rate.
Figure 9. Example of matching result on a synchronized pair (left) and a desynchronized pair (right) of anemones: we get more than three times more inliers for the synchronized pair than for the desynchronized pair. Again, most of the inliers of the desynchronized pair are on static parts (ground).
Figure 10. Example of matching result on a synchronized pair (left) and a desynchronized pair (right) of caustics: we get 80% of inliers for the synchronized pair and less than 1% for the desynchronized pair.
Our results also show that if the images are out of synchronization by even one second, we lose 40% of good pairings on moderate movements (anemones, fish, birds, etc.), as we have seen in figures 7 and 9. On fast movements, like those of caustics for example (see figure 10), we lose more than 95% of good pairings with a desynchronization of one second.
Figure 11. Light spots on the surface of the water (top) seem to have a higher speed than that of caustics (bottom) and cannot be matched with our synchronization delay of 1 ms.
These results show that the quality of our synchronization is such that we obtain almost as many inliers on the dynamic scenes as on the static scenes.

CONCLUSION
We have shown that the PiCam is an interesting sensor with regard to the agility and size constraints of the vectors used for close-range remote sensing. Its image quality is close to that of the GoPro camera, and the possibility of finely controlling it by computer makes it possible to obtain a sufficient synchronization rate for the study of dynamic scenes. The synchronization delay achieved with our solution, which allows precise control of the sensor and of the jitter, does not exceed one millisecond, i.e. 50 times smaller than the reference time and 500 times smaller than that of the GoPro or classic intervalometers. In the results, we have shown that this synchronization rate gives a matching rate nearly similar to that obtained on static images of natural scenes. We have also shown that the matching rate is very good on highly dynamic scenes like those with caustics. The limit is reached for the movements of light spots on the surface of the water.
The PiCam sensor therefore presents itself as a very good alternative to the GoPro for this kind of work. However, several points remain to be investigated. On the one hand, it must be checked that the quality of the images is sufficient to carry out dense 3D reconstructions. On the other hand, the stability of the camera in terms of blurring and contrast in relation to variations in speed and visibility must also be checked. Finally, to increase the frequency of shots and allow complete coverage of an area, it will be interesting to dive deeper into the camera control APIs of the Raspberry Pi.