MONITORING MOUNTAIN CRYOSPHERE DYNAMICS BY TIME LAPSE STEREO PHOTOGRAMMETRY

In this paper, we propose a method to monitor surface dynamics in mountain cryosphere environments, based on a device with two fixed cameras at two different locations and with a convergent angle. Computer vision methods are used to derive pixel displacements between images taken at different times and to reconstruct the 3D geometry of the scene to convert pixel displacements into meter displacements. The proposed methods overcome the drawbacks of traditional methods such as lack of time repeatability or lack of spatial resolution. We present the results on two study sites instrumented with the device located in the French Alps: a temperate glacier and a rock glacier. We propose different ways to assess the accuracy of each processing steps solely based on stereo images. The method is validated using traditional measurements (GPS and LiDAR) and shows results comparable or even more robust than these traditional methods.


INTRODUCTION
The methods to monitor mountain cryosphere, such as glaciers and rock glaciers dynamics rely on two classical approaches: repeated in-situ field work to acquire data and permanent monitoring. One way to collect displacement fields at high spatial resolution is to use an expensive laser scanner (LiDAR) or drone in order to construct 3D models by stereo photogrammetry. Nevertheless these methods lack of time repeatability (Bodin et al., 2018). On the other hand, to achieve high temporal resolution, one can instrument the site with a continuous GPS to capture seasonal changes but at only few positions in space (Kenner et al., 2017). To combine both high spatial and temporal resolution, we propose a time lapse stereo photogrammetry device, made up of two single lens reflex (36Mpx) at two different locations capturing several images per day.
This device has been installed on two study sites : the Argentiere glacier (Mont Blanc massif, France) and the Laurichard rock glacier (Ecrins massif, France). The two sites present very different type of dynamics: during summer the velocity of the Argentiere glacier is around 10-20 cm/day on the study area (Vincent, Moreau, 2016), whereas the annual velocity of the Laurichard rock glacier is around 1-1.5 m/year (Bodin et al., 2018).

RELATED WORK
In Geosciences, fixed cameras are commonly used to monitor slope movements. For instance, film cameras were used to monitor the displacements of glaciers in Alaska: displacements of several meters were detected at a distance of 5 km using a focal length of 105 mm with an error around one meter (Krimmel,

Single camera monitoring
The work proposed in (Travelletti et al., 2012) involves the cross-use of a fixed camera and a digital terrain model acquired using an airborne laser scanner to monitor a landslide. The displacement between two images is calculated by cross correlation and the digital terrain model allows the conversion of this pixel displacement into a metric displacement. This work highlights the problems caused by the slight movements of the cameras that induce displacement in the entire image. The extreme environment in which the cameras are installed can cause thermal expansion of the mounting system to which the camera is attached, or even of the camera lenses, which results in a displacement in the images that may be greater than the displacement we want to observe. The other limitation is the use of a single digital terrain model to project pixel displacements, which does not take into account changes in topography over time. To overcome the limitation due to small camera motions, the authors of (Roncella et al., 2014) hypothesize that these movements are solely due to a rotation of the camera and possible changes in its intrinsic parameters. Thus it is possible to correct the effects of these movements by applying a homography to the image. This homography is estimated using the displacements obtained over supposedly fixed areas.

Multi cameras monitoring
To overcome the limitation of the use of a single DEM, the authors of (Roncella et al., 2014) propose to use a second camera to obtain a 3D model of the scene at each image acquisition by photogrammetry methods. However, in their work the displacements are not calculated in the images, but by difference of 3D models. This method suffers from a lack of spatial resolution, since the 3D models produced only have 700 points against the original 21 megapixel images. Thus they measure noisy displacements with a difference of several centimeters compared to displacements measured with a ground based SAR.

Contribution
In this paper, we introduce, a fully automated pipeline that allows the processing of time-lapse pairs of images taken by two fixed cameras. We propose a flexible calibration procedure of the cameras which reduce the number of ground control points needed. For computing pixel displacements in the images, a combination of key-points matching and optical flow is used in order to be robust to large displacements. The methods are tested on two different study sites with different dynamics, terrain characteristics and image features. Special attention is given to the evaluation of the accuracy and dispersion of the results after each processing steps and on the quality of the final displacement vector fields. These analysis are based on the stereo pairs themselves as well as on measurements from other sensors (LiDAR and GPS).

Calibration
The first problem is the calibration procedure: to achieve 3D reconstruction from stereo images, ground control points are often used (Roncella et al., 2014), but the nature of the study sites can make it difficult to measure or to put enough recognizable ground control point in the camera field of view. We propose a flexible method based on image point correspondences across left and right views which can take advantage of ground control points if available. We estimate the relative orientation of the two cameras by extracting correspondences between left and right views. The correspondences come from Oriented FAST and Rotated BRIEF (ORB) keypoints matching (Rublee et al., 2011) with a filtering strategy based on grid motion statistics (GMS) (Bian et al., 2017). If we make the assumption that the two cameras have known intrinsic parameters, we can use these correspondences to estimate an essential matrix. The intrinsic parameters are not estimated but come from the manufacturer's specifications of the camera and the lens. The distortion is assumed to be negligible as the focal length used in the field are in a range of 50 -85 mm. Experiments with several images from different viewpoints have shown that the camera distortion calculated using bundle adjustment is less than one pixel for 80% of the image. The essential matrix is decomposed to get the relative orientation of the cameras and the distance between cameras is defined up to a scale factor, thus with this procedure we are able to compute a 3D reconstruction only up to a constant scale factor (Hartley, Zisserman, 2004). To remove scale ambiguity, in the minimal case where no ground control points are available, we can directly measure the baseline by GPS with a precision of few centimeters (this precision is a combination of the precision of the GPS measurements and the fact that the true position of the optical center of the cameras is unknown). If we consider two cameras with a focal length f , a baseline b and the disparity d (the distance between two corresponding pixels in two views), the depth z is given by z = bf d . We can write the depth error δz in terms of the error on the baseline measurement In practice the term b+δ b b is close to 1: the baseline can be a few hundred meters when the error on the baseline is only a few centimeters when it is measured with a GPS. The depth error due to error in baseline grows linearly with the depth, whereas the depth error due to error on the disparity is quadratic regarding the depth (Chang et al., 1994). In the optimal case where ground control points are available, the scaling can be determined from at least two points. If three or more points are available, it is possible to estimate a similarity transform to geo-reference the 3D models and the displacement vector fields. The similarity transform has 7 degrees of freedom: three rotations, a 3D translation and the scale. If there are more than three ground control points, the similarity transform is estimated by a least squares approach. This georeferencing allows the use of an external point cloud as a first guess for the 3D reconstruction step. The calibration pipeline is summarized in Figure 1.

Registration
One of the main problematic when fixed cameras are used in an outdoor setting is the registration of the images. Even if the camera is fixed to static landscape features (rock cliff or fixed tripod), the images are not still across time due to thermal dilation of the mounting system or even to the lenses in the camera itself (Roncella et al., 2014). The first step is to estimate this shift, by computing image motion over known fixed areas with a method based on ORB features matching and a filtering procedure based on GMS. Then these shifts are used to estimate an homography matrix which is the correct mathematical model if we assume that the image motion is only due to camera rotation and the translation is negligible (Hartley, Zisserman, 2004). This homography (matrix H 3x3) is estimated such that for every pixel on fixed area of coordinates (x, y) undergoing a displacement (∆u c , ∆v c ) due to the camera movement between images taken at t0 and tn we obtain: (2) The displacement vector field being noisy, a RANSAC procedure is used to filter out aberrant displacements. The model used to calculate the RANSAC is based on an estimate using four correspondence points, using the following re-projection error to determine outliers: Since the nature of the scene remains mostly unchanged across time on fixed areas, we chose to register all the images on a reference pair. By doing this, we ensure that all the images share the same calibration, hence the stereo calibration can be done on only one stereo pair of images.

Depth map computation
When the calibration and the registration is done we are able to produce 3D models of the scene by stereo-photogrammetry for every pair of images. To achieve that we use the OpenMVS library which provides methods to perform point cloud generation and mesh from point cloud (Cernea, 2015). To compute 3D point cloud from images, the library implements a version of a patch match based approach (Bleyer et al., 2011). We have chosen this method because it is freely available in the Open-MVS library and the method performs well on the ETH3D high resolution two views stereo data set (Schops et al., 2017). The performance of these methods is improved when the calculation of the dense 3D reconstruction is initialized by a sparse 3D point cloud. In order to generate this scattered point cloud, we can use the correspondences between left and right views used to compute the calibration (section 3.1), in order to triangulate them to obtain a sparse 3D point cloud. In the case where a digital elevation model (DEM) or 3D reconstruction is available (from a LiDAR survey or by terrestrial or drone photogrammetry) and the calibration is in the optimal case where ground control points have allowed georeferencing, it is possible to use these reconstructions as initialization. Then the depth map, which stores the Z components of each pixel in a coordinate system attached to the camera, is obtained by ray tracing between the 3D reconstruction and the camera. The depth map will be used to convert the pixel position of coordinates (u, v) into 3D position of coordinates (X, Y, Z), knowing the depth d with the following equation: With R the rotation matrix given by the essential matrix in the calibration procedure, A the similarity transformation estimated from the ground control points, and K the intrinsic camera matrix. The steps to compute the registration and the depth map calculation are summarized in Figure 2. Depth map pipeline: first the images are registered on the stereo pair on which the calibration has been done. The registration is done by extracting corresponding pixels in fixed areas and use them to estimate an homography. Then structure from motion methods are used to produce 3D models and mesh. The depth map is obtained by ray tracing between the camera and the mesh.

Displacements extraction
To produce 3D displacement vector fields we propose to compute image displacements in one view and then project the start pixels with the depth map computed with the start stereo pair and the end pixels are projected with the depth map corresponding to the end stereo pair. The projection is calculated with Equation 4. Then, the similarity transform calculated during the calibration step is applied to the displacements to get scaled and georeferenced displacements. This strategy leads to a conversion between 2D displacements (pixels) to 3D displacements (meters). Since glaciers are highly textured and do not exhibit strong discontinuities in their movement, a simple calculation of sparse optical flow such as (Lucas, Kanade, 1981), is used to calculate image displacements. Then the displacements are filtered with the assumption that locally the displacements should have the same direction and amplitude. Working with high-resolution images, the choice was made for a sparse optical flow method to save computing time. For the same reason, the optical flow is only calculated in one view and the second view is not used to detect possible errors in flow. An analysis of displacement calculated on fixed areas can fulfill this role. The procedure to calculate the displacement vector field is synthesized in Figure 3.

Laurichard rock glacier
Two single lens reflex (Nikon D800) have been installed at the Laurichard rock glacier in July 2016. This 500m long and between 80m and 200m wide rock glacier is located around 45 deg N between 2400m and 2700m. It is facing north and lies at the bottom of a high rock face. This is one of the most studied rock glacier in the Alps and its interannual displacement has been measured with total station and DGPS (differential GPS) since 1986 (Francou, Reynaud, 1992, Bodin et al., 2009). The cameras are situated 500m away from the glacier, with a baseline of 200m and a 85mm lens is mounted on the cameras. The cameras are recording 7 stereo images per day, and, at the most, one image is selected manually such that there is no cast shadows in the images, which leads to changes in image gradient and can disrupt the registration process and the optical flow algorithm. We are able to process images taken from early July (when the glacier is snow free) to the end of October, before the first snowfall. To calibrate the cameras, five 35cm diameter spheres, easily recognizable in the images, are grounded on blocks around and on the glacier and have been measured by DGPS. To add more ground control points, six temporary targets have been installed in the cameras field of view and have been measured by DGPS. The targets are only visible in the reference images, and have been removed after the calibration procedure.

Argentiere glacier
Argentiere glacier is situated in the Mont Blanc massif (France) and is in a range between 3400m and 1600m with a total extend of around 10km. It is also one of the most studied temperate glacier, with measurements starting in the beginning of the 20th century, and annual mass balance has been carried since 1975 (Vincent et al., 2009). The glacier dynamics has also been studied by combining different methods such as remote sensing, photogrammetry and basal "cavitometer " (Benoit et al., 2015). Except the "cavitometer" which, continuously measures basal velocity at one location, all the methods are not suitable for measuring daily velocity variations. That is why, we installed in 2017 a time lapse stereoscopic device looking at the part of the glacier situated at 2400m. The device consists of two Nikon D810 single lens reflex, with a 50mm lens, situated at about 1km of the glacier with a baseline of 155m. We process images only when the glacier is free of snow and covered by debris or show opened crevasses that leads to texture in the images. The camera calibration is done with 4 temporary targets placed on Left: before registration. Right: after registration. After the registration the displacement over fixed areas is corrected and close to zero. The red rectangles show the areas considered for the numerical analysis of displacement over fixed areas (see Figure 5).
the right bank of the glacier and 3 permanent painted disk on rocks on the left bank.

Registration and pixel displacement accuracy
The first source of uncertainty of our methods comes from the calculation of displacement in the images and their registration. Indeed, without registration, the displacement calculated using the method described in section 3.4 is distorted, as shown for example in Figure 4 where we can see a displacement field in pixels before and after registration on the Laurichard rock glacier. The areas on the periphery of the glacier move before registration, whereas they are supposed to be fixed. After registration the displacements outside the glacier is close to zero, and the direction of displacement on the glacier is closer to the direction of greater slope.
In order to quantitatively evaluate the registration step and the displacement calculation, we have used 114 images taken between July and October 2018 of the Laurichard rock glacier by one of the two cameras. From these images we have calculated the displacement after registration on two fixed zones with the reference image. The selected areas are presented by red rectangles in Figure 4, on which we took the average value of the displacements calculated by the method described in Section 3.4. Then we have calculated the histograms of the displacement for each dimensions, as well as the mean and standard deviation in order to compare the histograms to a probability density function of a normal distribution. These histograms are presented in Figure 5. For both dimensions, we obtain a mean close to 0 pixel and a standard deviation of 0.32 pixels along the x axis and 0.52 along the y axis.The greater standard deviation along y axis is due to large registration errors in this direction for few images. A same analysis has been conducted on 132 images of the Argentiere glacier. The results are also reported in Figure 5. The zones on which the displacements are averaged is illustrated by red rectangles in Figure 12. The accuracy of the registration step is similar as Laurichard but with less outliers. The standard deviation is 0.32 pixels in both directions. In order to discard poorly registered images we have chosen a threshold on the norm of the average diplacement over fixed areas of 1 pixel. This filter rejects 30% of images for Laurichard and 5% for Argentiere. The high percentage of filtered images in the Laurichard case might be due to the low illumination of one of the validation area which is close to the image corner and is located at a bottom of steep cliff (see red rectangles on Figure 4).

3D reconstruction accuracy
To assess the quality of the depth map computed from the stereo images, we can compare the point cloud obtained from our device with one obtained by a LiDAR survey. We used an Optech R Ilris 3D long range scanner to acquire the point cloud from a position close to the left camera of the stereo device installed at the Laurichard rock glacier and with a field of view similar to the camera. Then the point cloud has been georeferenced by identifying 4 spheres in the scene which have been measured by GPS on the same day. The LiDAR point cloud has 7.6 millions points, and the 3D reconstruction from the stereo device on the same date has 4.6 millions. The spatial distribution of point density is similar in both clouds, with a higher density close to the device and when the slope is perpendicular to the device view. The average distance between clouds is around 20cm, and seems mostly due to small georeferencing errors and noise in the stereo point cloud. A profile analysis shows that the point cloud from the stereo device has more noise an is not perfectly registered, but all the features, such as boulders and slope variations, are present. The two clouds, the density, the distance between them and the profiles are illustrated in Figure 6. Unfortunatly we do not have a similar LiDAR survey on the Argentiere glacier to asses our 3D reconstruction.

Depth map dispersion
Laurichard To evaluate the reproducibility of depth map computation step we calculated a standard deviation map from a series of 460 depth maps computed from images taken between July 2018 and October 2018 of the Laurichard rock glacier: standard deviation is computed with depth map within a sliding window of one week. Then all the map are averaged to produce the final standard deviation map (see Figure 7). The size of the temporal window is chosen such that the glacier displacement is   negligible regarding the resolution of the depth map: from previous studies, the glacier velocity is at most 4mm/day (Bodin et al., 2009), which corresponds to a displacement of 2.1cm in one week, which is below the average pixel size of 2.5cm on the glacier. The surfaces with a normal pointing toward the camera have a low standard deviation (1e −3 to 1e −4 meters) whereas the zones next to strong discontinuities in depth or with a normal pointing in an opposite direction to the cameras have a greater standard deviation, synonym of a poor confidence on the depth map in these zones and hence a greater error in the displacement calculation.
Argentiere In the case of Argentiere glacier, the glacier horizontal velocity is around 10-20cm/day and vertical velocity could be in the range of 0-10cm/day (Ponton et al., 2014). Such amplitudes are not compatible with the hypothesis of negligible depth over one week. We cannot therefore apply the same procedure as Laurichard. To asses the dispersion of the depth maps we have chosen one pixel on fixed areas, on which the depth has been calculated for the 132 depth maps during summer 2019. The standard deviation is 36cm. We did the same analysis on a pixel which is on the glacier, and have plotted the corresponding glacier elevation. The results are presented in Figure 8. The result seems coherent with glacier dynamic, with a linear altitude loss during July and August, mostly due to ablation and a decrease during fall when temperature and solar radiation decrease.

Evaluation of 3D displacement vector field
Laurichard The methods described in the section 3, allow displacement vector fields to be computed between any two stereo pair of images. In order to asses the final result of our methods, we compare displacements measured from traditional methods between summer 2018 and 2019. The reference measurements are the positions of 14 rocks measured by DGPS (Differential GPS) on the 18/10/2018 and the 10/09/2019 with a centimeter precision. A second comparison data set comes from correlation of DEM computed from LiDAR surveys from the same position on the 25/09/2018 and the 22/07/2019. The spheres around the glaciers were used to geo-reference the point clouds, then the 2019 point cloud has been registered on the 2018 cloud using common fixed known areas. The co-registration error is estimated as 5cm along x and y and 8cm along z axis. Both point clouds have been re-sampled in a DEM of 50cm resolution. To produce displacement vector field, we have calculated correlation between DEM, with a template window of 64 pixels and a search window of 128 pixels with five meters spacing. More details on the procedure can be found in (Bodin et al., 2018). We calculated a velocity vector field from stereo pairs taken at the same date as the LiDAR surveys. Both vector fields are presented in Figure 9, as well as the position of the rocks measured by DGPS and devices' viewing directions. The stereo result has a higher vector density with more Figure 9. Comparison of two different methods to compute velocity vector field on the Laurichard rock glacier between the 25/09/2018 and the 22/07/2019: left, vector field computed by our stereo device and right, vector field computed by DEM (from LiDAR) cross correlation (Bodin et al., 2018). The stereo displacement vector field is 100 times denser than the LiDAR field. There is no vector on the right side of the glacier with the stereo method due to poor 3D reconstruction in this area (see Figure 7. The red arrows indicates the devices' point of view and the crosses indicate the rocks measured by GPS. than 990000 vectors on the glacier and only around 7000 vectors with the DEM correlation method. The spatial distribution of velocity is similar, with low velocities around the glaciers borders, and high velocity at the top and next to the right side glacier's front. With the stereo methods, no vectors are present on the right side, which is due to the low confidence on the depth maps in those areas (see Figure 7). In order to compare the three measurements we have plotted two profiles passing through GPS measurements, one longitudinal and one transverse. Each vector within one meter around the profile is reported in Figure 10. The stereo profile is less noisy as the DEM profile and is closer to the GPS measurements, even if the velocity seems to be overestimated. In order to conduct a quantitative analysis, the GPS measurement have been directly compared to stereo and DEM measurements. For each GPS measurements, all stereo and DEM measurements in a radius of 0.5m and 2m respectively were extracted and a median vector was calculated for both methods. Those median vectors were directly compared to the GPS measurements, by extracting difference on the three dimension and norm. We also have calculated linear regression of stereo and DEM measurements versus GPS for the 3 dimensions. The results are presented in Figure 11 and Table 1. The overall results are better with our stereo device, with less difference and better correlation (0.98) with GPS measurements. The X-axis displacements, measured by the stereo device, has the least correlation with GPS measurements of the three dimensions. This can be explained by the camera's viewing direction, which is collinear with this axis (see Figure 9), making the device less sensitive to movement in this direction. Conversely, the result according to the Z dimension measured by DEM correlation shows the best correlation of the three dimensions. This can be explained by the method which is based on the correlation of altitudes between DEM.
Argentiere Concerning the Argentiere glacier, it is difficult to obtain displacement vector fields from images taken during different summer seasons. Indeed, the nature of the glacier surface changes drastically between two summers due to ice melt, Figure 10. Velocity profiles of the Laurichard glacier from GPS, DEM and stereo. The profiles follow the GPS measurements which are presented in Figure 9. The stereo profile is closer to the GPS measurements and is less noisy even if the velocity seems overestimated.   snowfall and the glacier flow. Thus we can only qualitatively analyze the field of displacement vectors extracted using images separated by less than a few days. Figure 12 shows the surface velocity calculated by our methods between July 25 and August 11, 2019. The vector direction is given by the pixel displacement vector and the magnitude is proportional to the 3D surface norm. This displacement vector field seems consistent with glaciological interpretations: the velocity is maximal at the glacier center compared to the borders. The average magnitude of 15cm/day is also coherent with previous studies on the same area at different dates , Benoit et al., 2015.

CONCLUSION
We have presented a fully automated pipeline able to process pairs of images taken by a stereoscopic time lapse device, to produce displacement vector field and 3D models. The proposed methods are sufficiently generic to be able to adapt to different sites with different textures and dynamics. The methods were tested on two study sites, a glacier and a rock glacier. The results suggest that this device makes it possible to monitor the dynamics of the study sites with a spatio-temporal resolution unequalled by traditional methods. The comparison with GPS and LiDAR measurements shows that our methods produce similar results as traditional GPS surveys and outperform the DEM correlation methods by providing more accurate vector fields. Finally, we have proposed metrics, only based on stereo pair, to draw a confidence map of our results. Note, however, that the device has some limitations. The environment of the study site does not always allow the installation of cameras. Moreover, in the event of bad weather or snow on the glacier surface, the processing of images is not possible. Future work will focus on assessing the temporal accuracy of displacements calculated with our device in order to measure temporal changes in glacier dynamics.

ACKNOWLEDGMENTS
GM's PhD is partly funded by a CIFRE grant from ANRT. Instrumentation on Laurichard and Argentière sites was funded by Univ. Savoie Mont-Blanc, SFR-VOR and OSUG grants. Devices were designed and mounted by Emmanuel Malet in EDYTEM, who also largely contributed to the their installation on the two sites: we warmly thank him! We are grateful to the many people who helped on the field for installing and maintaining the two pairs of devices. Emmanuel Thibert and the Parc national des Ecrins are also greatly acknowledged for providing the GPS data they annually survey on the Laurichard rock glacier.