“WEBSNOW”: ESTIMATION OF SNOW COVER FROM FREELY ACCESSIBLE WEBCAM IMAGES IN THE ALPS

The alpine snow cover exhibits a high spatial variability in the horizontal and vertical directions even at very small scales, mainly caused by the high variability of alpine terrain. To quantify annual and inter-annual snow dynamics, continuous and reliable measurements of the temporal and spatial variability are required. While remote sensing from satellite and aerial platforms has been used successfully to estimate snow cover at larger scales, its spatial and temporal resolution is often too low, especially in mountain areas, to capture local changes. In the Alpine region, webcams operated for touristic purposes provide freely available images at high temporal frequency. Within the WebSnow project, the feasibility of using such images for the detection of snow was investigated. With the developed workflow, processing times were reduced and satisfactory results were obtained. Our results show that webcam networks have the potential to monitor snow at high spatial and temporal resolution.


MOTIVATION
In the Alpine region, snow cover variability is of high socioeconomic importance, not only as a local water resource and storage, but also as a climate-related hazard and as the basis for winter tourism. Winter tourism in particular dominates the economy of many mountain settlements and depends directly on the presence of a winter snow cover and on snow depth.
The seasonal snow on the ground can be characterized by various metrics, including the snow-covered area, snow depth, snow density and snow water equivalent (SWE) (Fierz et al., 2009). For monitoring snow cover variability, the most important parameters are the extent and duration of the seasonal snow cover and the snow depth, from which, together with snow density, the SWE can be derived. To quantify annual and inter-annual snow dynamics, continuous and reliable measurements of the temporal and spatial variability are required.
Various techniques exist to determine the spatial distribution and temporal evolution of snow. In the Alpine region, remote sensing from satellite and aerial platforms quantifies snow cover extent at large scales. Synthetic Aperture Radar (SAR) satellite data can be used to monitor the extent of melting (wet) snow areas (Nagler et al., 2000). While the wet snow mapping approach is very useful during the melting season, the snow cover extent under dry snow conditions can be retrieved from optical satellite imagery. Despite its capabilities, optical remote sensing has several limitations, especially in the Alpine environment. The main limitation of optical satellite data is cloud cover, which limits the number of useful snow observations and thus reduces data availability and spatio-temporal resolution.
Recent advances in snow depth quantification in mountain areas have been achieved with airborne LiDAR campaigns (Deems et al., 2013). Snow depth is estimated by subtracting a DEM of the snow-free surface from a DEM of the snow-covered surface, assuming that the snow cover is the only cause of surface elevation changes between the observation times (Harder et al., 2016). In mountainous areas, LiDAR provides accurate snow depth measurements (Grünewald et al., 2013), but LiDAR surveys are still costly, which prevents large-area and frequent coverage.
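The DEM differencing principle described above can be illustrated with a minimal NumPy sketch. It assumes two co-registered elevation grids of identical extent, resolution and vertical datum; the function name `snow_depth_from_dems` and the choice to clip negative differences to zero are illustrative assumptions, not part of the cited studies.

```python
import numpy as np


def snow_depth_from_dems(dem_snow_on: np.ndarray, dem_snow_off: np.ndarray) -> np.ndarray:
    """Snow depth (m) as the difference of two co-registered elevation grids.

    Assumes both grids share extent, resolution and vertical datum, and that
    snow is the only cause of elevation change between the two acquisitions.
    """
    if dem_snow_on.shape != dem_snow_off.shape:
        raise ValueError("DEMs must be co-registered grids of identical shape")
    depth = dem_snow_on - dem_snow_off
    # Negative depths are physically implausible (noise, vegetation, co-registration
    # errors); clipping them to zero is one simple, assumed way of handling them.
    return np.clip(depth, 0.0, None)


snow_on = np.array([[1201.5, 1202.0], [1199.8, 1203.1]])
snow_off = np.array([[1200.0, 1200.5], [1200.0, 1201.1]])
print(snow_depth_from_dems(snow_on, snow_off))
```

In practice, negative differences are often analysed rather than clipped, since they reveal co-registration problems.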
Terrestrial photography is a favourable alternative to remote sensing, not only because of its low cost, but especially when scale issues arise (e.g. snow evolution), since its spatio-temporal resolution can be adapted to the scale of the process (Pimentel et al., 2012). In the last decade, terrestrial photography has been used increasingly for snow cover mapping in mountain environments (Bernard et al., 2013; Härer et al., 2013; Pimentel et al., 2012; Arslan et al., 2017; Portenier et al., 2019). More recently, webcam networks have emerged as useful resources for large-scale environmental monitoring and for estimating local weather conditions, owing to their widespread use and up-to-date imagery (Abrams and Pless, 2013; Arslan et al., 2019). In mountain areas, webcam images collected at daily or even hourly intervals are available for touristic purposes, e.g. at ski resorts to judge snow coverage. Properly processed, these images can therefore be considered useful sources for following the seasonal evolution of snow cover in detail. Furthermore, in contrast to point-wise station measurements, they provide area-wide snow information that allows small-scale processes to be monitored (Dizerens, 2015). Apart from the studies by Dizerens (2015) and Fedorov et al. (2016), current approaches rely on one or more cameras set up and positioned ad hoc by researchers (e.g. Arslan et al., 2017; Salvatori et al., 2011; Härer et al., 2016; Giuliani et al., 2016).

Aim of the work
Within the WebSnow project, the feasibility of using webcam images distributed over the Alps for continuously monitoring and detecting snow cover was investigated. Based on two webcams with freely accessible images, a workflow for georeferencing webcam images was developed, and a threshold-based approach for snow detection was developed and evaluated.

Study Areas
Two study areas located in Tyrol (Austria) close to Innsbruck were selected based on the availability of webcam images and reference data. Both test sites represent challenging Alpine environments with respect to the surrounding topography, land cover and viewing geometry. The webcam located in Tuxertal (Figure 1) is oriented towards the southwest and views the whole valley. Besides forest and buildings at lower elevations, it captures high-altitude areas covered by rocks and grassland. Regions in the far range in particular, in combination with the oblique viewing geometry, are strongly affected by haze and fog.
While the Tuxertal webcam has a limited field of view, the webcams provided by Panomax are full 360° panoramic cameras.
Because of the camera's location, the provider crops the Seefeld image to approximately 270° (Figure 2). Like the Tuxertal webcam, this camera captures regions in the close range and the open valley, mainly covered by forest. Due to the larger field of view and the surrounding environment, stronger variations in illumination occur within this image.
In total, 11 images were processed (8 for Tuxertal and 3 for Seefeld), covering four months from January 2018 to April 2018, with one additional image from February 2019 for Tuxertal.

Image orientation
A manual workflow was developed for georeferencing the webcam images. For each webcam position, a snow-free reference image is selected (e.g. Figure 1). The exterior orientation (known only approximately) and the interior orientation (partly unknown) are computed using ground control points (GCPs). The GCPs are selected and digitized in an existing aerial orthophoto (X and Y coordinates), and their Z coordinates are interpolated from an existing digital elevation model (DEM).
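The Z coordinate of each GCP can be obtained by interpolating the DEM at the digitized X/Y position. The paper does not state the interpolation method; the sketch below assumes bilinear interpolation and a hypothetical grid convention (`x0`, `y0` at the centre of the upper-left cell, square cells of size `cell`, rows increasing southwards).

```python
import numpy as np


def dem_height_bilinear(dem, x0, y0, cell, x, y):
    """Bilinearly interpolate a DEM height at map coordinates (x, y).

    Assumed grid convention: dem[row, col] is the height at
    (x0 + col * cell, y0 - row * cell).
    """
    col = (x - x0) / cell
    row = (y0 - y) / cell
    c0, r0 = int(np.floor(col)), int(np.floor(row))
    if not (0 <= c0 < dem.shape[1] - 1 and 0 <= r0 < dem.shape[0] - 1):
        raise ValueError("point lies outside the DEM interior")
    dc, dr = col - c0, row - r0
    z00, z01 = dem[r0, c0], dem[r0, c0 + 1]
    z10, z11 = dem[r0 + 1, c0], dem[r0 + 1, c0 + 1]
    # Interpolate along the rows first, then between the two rows.
    return (1 - dr) * ((1 - dc) * z00 + dc * z01) + dr * ((1 - dc) * z10 + dc * z11)


dem = np.array([[1000.0, 1010.0], [1020.0, 1030.0]])
print(dem_height_bilinear(dem, 0.0, 100.0, 10.0, 5.0, 95.0))
```

Each GCP then becomes an (X, Y, Z) triple, with X and Y from the orthophoto and Z from this interpolation.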

Table 2. Image measurement accuracy of the GCPs for the reference image of both test sites after camera orientation and calibration.
Test site    #GCP    Mean accuracy [px]
Tuxertal     21      2.5
Seefeld      58      1.1
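Table 2 reports a mean image measurement accuracy per site. One common way to obtain such a figure is the mean Euclidean distance between the measured GCP image coordinates and the GCPs reprojected with the estimated orientation. The sketch below assumes a distortion-free pinhole (collinearity) model in a computer-vision axis convention; `project` and `mean_reprojection_error` are hypothetical helpers, and the camera model actually used in the project is not described here. Estimating the orientation parameters themselves (a non-linear least-squares adjustment) is not shown.

```python
import numpy as np


def project(points_xyz, R, X0, f, cx, cy):
    """Project object points to pixel coordinates with a pinhole model.

    Assumptions: no lens distortion, square pixels, R rotates object-frame
    vectors into a camera frame whose +z axis is the viewing direction,
    with x pointing right and y pointing down in the image.
    """
    cam = (np.asarray(points_xyz, float) - X0) @ R.T  # rows are camera-frame vectors
    if np.any(cam[:, 2] <= 0):
        raise ValueError("a point lies behind the camera")
    u = cx + f * cam[:, 0] / cam[:, 2]
    v = cy + f * cam[:, 1] / cam[:, 2]
    return np.column_stack([u, v])


def mean_reprojection_error(points_xyz, measured_uv, R, X0, f, cx, cy):
    """Mean Euclidean distance (px) between measured and reprojected GCPs."""
    residuals = project(points_xyz, R, X0, f, cx, cy) - np.asarray(measured_uv, float)
    return float(np.mean(np.linalg.norm(residuals, axis=1)))


R, X0 = np.eye(3), np.zeros(3)
gcps = np.array([[10.0, 5.0, 100.0], [0.0, 0.0, 50.0]])
measured = [[743.0, 414.0], [640.0, 361.0]]
print(mean_reprojection_error(gcps, measured, R, X0, 1000.0, 640.0, 360.0))
```

The values above are synthetic and only illustrate the computation.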

Monoplotting
Figure 3. Monoplotting principle. Adapted from Kraus (2012).
After the estimation of both the exterior and interior orientation, the object point corresponding to each pixel is calculated in a global coordinate system, resulting in a georeferenced, orthorectified image. This is achieved by monoplotting, i.e. by intersecting the ray defined by the projection center of the camera and each pixel with an existing digital terrain model (Figure 3). To reduce processing times and to avoid monoplotting each image individually, the webcams were assumed to be stable over time. Under this assumption, the image and object coordinates of each pixel do not change; only its attributes (e.g. colour, classification) change over time. Although the assumption of a stable webcam seems reasonable, images from the same webcam taken at different times showed small variations (Figure 5). Possible causes include unstable camera mounting, which induces small camera movements, thermal expansion, refraction and other atmospheric effects related to e.g. humidity or temperature. Only small differences were observed for our test sites.
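The monoplotting step can be sketched as a ray-terrain intersection. The function `monoplot_ray` below is a hypothetical, minimal implementation: it marches along the pixel ray with a fixed step and refines the first crossing by bisection, with the terrain given as a callable `terrain(x, y)` (in practice an interpolated DTM). The per-pixel ray direction would come from the estimated camera orientation. Under the stable-webcam assumption, such a computation only needs to run once per pixel of the reference image, with the resulting object coordinates stored as a lookup table.

```python
from typing import Callable, Optional

import numpy as np


def monoplot_ray(center, direction, terrain: Callable[[float, float], float],
                 step=1.0, max_range=10_000.0, tol=1e-3) -> Optional[np.ndarray]:
    """Intersect the ray center + t * direction (t > 0) with a terrain surface.

    Marches along the ray in fixed steps until it drops below the terrain,
    then refines the crossing by bisection. Returns None if no intersection
    is found within max_range (e.g. pixels showing the sky).
    """
    c = np.asarray(center, float)
    d = np.asarray(direction, float)
    d = d / np.linalg.norm(d)

    def above(t):  # height of the ray point above the terrain at parameter t
        p = c + t * d
        return p[2] - terrain(p[0], p[1])

    if above(0.0) <= 0:
        raise ValueError("projection center lies below the terrain")
    t_prev, t = 0.0, step
    while t <= max_range:
        if above(t) <= 0:  # sign change: the crossing lies between t_prev and t
            lo, hi = t_prev, t
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if above(mid) > 0:
                    lo = mid
                else:
                    hi = mid
            return c + hi * d
        t_prev, t = t, t + step
    return None


# Flat terrain at 1000 m, camera 100 m above it, looking 45° downwards.
print(monoplot_ray([0.0, 0.0, 1100.0], [1.0, 0.0, -1.0], lambda x, y: 1000.0))
```

A fixed step can skip ridges thinner than the step size, so the step length is a trade-off between speed and robustness.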
Assuming an unchanged projection center and small rotations (as observed; cf. Figure 5), these differences can be modelled well by a 2D affine transformation between the reference and snow images. Feature points are extracted in both the reference and the winter image using Harris corner detection (Harris et al., 1988). Since the differences between the reference and the winter image are small, only feature points within a distance of 20 pixels are considered for matching.
Using an affine transformation derived from 120 Harris correspondences, the misalignment between the reference and winter image of Figure 5
Figure 6. Workflow for estimating the 2D affine transformation between reference and winter image for Tuxertal.
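The candidate matching and the affine model can be sketched with NumPy. The paper only states the 20 px search radius; pairing each reference keypoint with its nearest winter keypoint is an assumed matching rule, and `candidate_matches` and `fit_affine` are hypothetical helpers. Harris keypoint extraction itself is not shown (OpenCV's `cv2.cornerHarris` is one existing implementation).

```python
import numpy as np


def candidate_matches(ref_pts, win_pts, radius=20.0):
    """Pair each reference keypoint with its nearest winter keypoint within radius px.

    Assumed matching rule (nearest neighbour by position only); returns an
    array of index pairs (i_ref, j_win).
    """
    ref = np.asarray(ref_pts, float)
    win = np.asarray(win_pts, float)
    dist = np.linalg.norm(ref[:, None, :] - win[None, :, :], axis=2)
    j = np.argmin(dist, axis=1)
    ok = dist[np.arange(len(ref)), j] <= radius
    return np.column_stack([np.nonzero(ok)[0], j[ok]])


def fit_affine(src, dst):
    """Least-squares 2D affine transform dst ≈ A @ src + t (needs >= 3 points)."""
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    design = np.column_stack([src, np.ones(len(src))])     # rows [x, y, 1]
    params, *_ = np.linalg.lstsq(design, dst, rcond=None)  # shape (3, 2)
    return params[:2].T, params[2]                         # A (2x2), t (2,)
```

With all six parameters estimated, the model covers the small shifts, rotations and scale changes described above; the misfit of false matches is handled in the robust step that follows.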
Because of the strong visual change caused by the changing phenology between summer and winter, only a small fraction of the feature points are identified as corresponding pairs, and these still include false matches. Therefore, RANSAC (Fischler et al., 1981) is used to estimate an initial set of transformation parameters from these noisy data. In a final step, the image coordinates transformed with the RANSAC parameters are refined with ICP (Glira et al., 2015) to calculate the final transformation parameters.
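The robust estimation step can be illustrated with a generic RANSAC loop for a 2D affine model: fit minimal samples of three correspondences, count the pairs whose residual is below a pixel threshold, keep the largest consensus set and refit on it. This is a minimal sketch; the threshold and iteration count are illustrative values not reported in the paper, and the subsequent ICP refinement is not shown.

```python
import numpy as np


def fit_affine_params(src, dst):
    """Least-squares affine parameters P (3x2) such that dst ≈ [x, y, 1] @ P."""
    design = np.column_stack([src, np.ones(len(src))])
    params, *_ = np.linalg.lstsq(design, dst, rcond=None)
    return params


def ransac_affine(src, dst, threshold=2.0, iterations=500, seed=0):
    """Estimate an affine transform robustly from correspondences with outliers."""
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    rng = np.random.default_rng(seed)
    homog = np.hstack([src, np.ones((len(src), 1))])
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iterations):
        sample = rng.choice(len(src), size=3, replace=False)  # minimal sample
        params = fit_affine_params(src[sample], dst[sample])
        residuals = np.linalg.norm(homog @ params - dst, axis=1)
        inliers = residuals < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        raise RuntimeError("no consensus set found")
    # Final least-squares fit on the largest consensus set.
    return fit_affine_params(src[best_inliers], dst[best_inliers]), best_inliers
```

The returned inlier mask identifies the correspondences that could then serve as input to a refinement such as ICP.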

DETECTION OF SNOW
The appearance of snow varies strongly throughout the year, as it is influenced by various environmental parameters. In winter, mountain areas are mainly covered with fresh snow, which appears very bright. As temperatures increase during spring, the appearance changes because snow becomes mixed with dirt particles. Depending on the acquisition geometry, slope exposition and time of day, some snow-covered areas are directly illuminated by the sun, whereas others lie in the shadow of the surrounding mountains. Furthermore, because of the oblique acquisition geometry of webcams, atmospheric effects such as fog, haze and refraction additionally influence visibility. Due to these strong variations and influences, the detection of snow in webcam images is a challenging task, and various methods have been developed.
The Gaussian Mixture Model approach proposed by Rüfenacht et al. (2014) is based on the idea that the colour distribution of the pixels of an image follows two Gaussian distributions, one representing snow and one representing not snow. An analysis of our selected webcam images showed that this assumption generally does not hold.
A similar problem occurs with the approach proposed by Pimentel et al. (2012). K-means is a powerful and widely used clustering algorithm, but the number of clusters must be known a priori. While good results could be achieved for some webcams using four clusters, selecting an appropriate number of clusters is challenging for much more complex scenes, such as the one captured by the Tuxertal webcam. Moreover, since k-means is only a clustering approach, a subsequent classification step is still necessary to derive the final snow cover maps.
Hinkler et al. (2002) proposed an interesting method that simulates the near-infrared channel from the available RGB information. Unfortunately, the calculation of the proposed RGBNDSI is described only vaguely, so a reimplementation requires some guesswork. The results obtained with this empirically derived band in our implementation were not satisfactory; it remains unclear whether this is due to our implementation or to the limited discriminative power of the RGBNDSI. Giuliani et al. (2016) and Fedorov et al. (2016) used a supervised classification. Besides incorporating the local neighborhood of each pixel, they use a so-called "daily median image" to reduce the influence of challenging illumination conditions.
Analyzing the selected webcam images showed that, while the appearance of snow changes throughout the year, snow always corresponds to the brightest regions within each image. Based on this observation, a threshold-based snow detection method similar to that of Härer et al. (2016) was developed.
Transforming the colour space from RGB to HSV separates colour from brightness. Brightness is represented by the value channel V, defined as the largest of the R, G and B values. By manually selecting appropriate thresholds for V in each image, bright snow can be separated from not snow. Since two different thresholds for V are used, an upper one for snow and a lower one for not snow, pixels lying between them, such as snow in shadow and brighter snow-free rocks in the far range, remain unclassified (Figure 7, middle). To classify these pixels further, additional thresholds based on all HSV channels were derived empirically. These thresholds in particular vary strongly among the selected webcams, and fine-tuning them is difficult. The final classification result for one webcam image from Tuxertal is shown in Figure 7.
Using the threshold-based approach, four classes are distinguished in total: (bright) snow, snow in shadow, not snow and undefined (Figure 7). The undefined class is the most heterogeneous; depending on the captured scene, it contains various land cover types, e.g. forest and buildings.
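The classification logic can be sketched as follows: convert RGB to HSV, label pixels above an upper V threshold as snow and below a lower one as not snow, and apply an extra HSV rule to the pixels in between. All threshold values and the "bluish, low-saturation pixels are snow in shadow" rule are hypothetical placeholders; the thresholds in the paper were tuned by hand for each image and are not reported here.

```python
import numpy as np

# Class labels for the four classes distinguished in the text.
SNOW, SNOW_SHADOW, NOT_SNOW, UNDEFINED = 0, 1, 2, 3


def rgb_to_hsv(rgb):
    """Vectorised RGB -> HSV for float arrays in [0, 1]; H in degrees."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    v = rgb.max(axis=-1)                 # value channel: largest of R, G, B
    c = v - rgb.min(axis=-1)             # chroma
    s = np.where(v > 0, c / np.where(v > 0, v, 1), 0.0)
    safe_c = np.where(c > 0, c, 1)       # avoid division by zero for grey pixels
    h = np.select(
        [c == 0, v == r, v == g],
        [np.zeros_like(v), ((g - b) / safe_c) % 6, (b - r) / safe_c + 2],
        (r - g) / safe_c + 4,
    ) * 60.0
    return h, s, v


def classify(rgb, v_snow=0.75, v_not_snow=0.35, s_max_shadow=0.25,
             hue_shadow=(180.0, 260.0)):
    """Four-class threshold classification; all thresholds are hypothetical."""
    h, s, v = rgb_to_hsv(np.asarray(rgb, dtype=float))
    labels = np.full(v.shape, UNDEFINED, dtype=np.uint8)
    labels[v >= v_snow] = SNOW
    labels[v < v_not_snow] = NOT_SNOW
    # Illustrative rule for pixels between the two V thresholds:
    # bluish, weakly saturated pixels are treated as snow in shadow.
    between = (v >= v_not_snow) & (v < v_snow)
    bluish = (h >= hue_shadow[0]) & (h <= hue_shadow[1]) & (s <= s_max_shadow)
    labels[between & bluish] = SNOW_SHADOW
    return labels


# White, dark green, bluish grey and brownish pixels (8-bit images: divide by 255).
img = np.array([[[1.0, 1.0, 1.0], [0.1, 0.3, 0.1], [0.45, 0.5, 0.58], [0.6, 0.5, 0.4]]])
print(classify(img))
```

All remaining pixels fall into the undefined class, mirroring its heterogeneity in the text.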

Accuracy assessment
Manually classified control points, selected by stratified random sampling with area-proportional allocation (Olofsson et al., 2014), were used to evaluate the classification results. This guarantees that the number of control points for each class is proportional to the number of pixels belonging to that class. In total, approximately 500 points were manually classified for each image.
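The area-proportional allocation of control points can be sketched as below. `stratified_sample` is a hypothetical helper; the largest-remainder rounding, which makes the per-class counts sum exactly to the requested total, is an assumption not described in the paper.

```python
import numpy as np


def stratified_sample(labels, n_total=500, seed=0):
    """Draw control-point pixel locations with area-proportional allocation.

    `labels` is a 2D class map; returns {class: (rows, cols)}.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    quota = n_total * counts / counts.sum()     # proportional to class area
    alloc = np.floor(quota).astype(int)
    # Assumed rounding: give leftover samples to the largest remainders.
    for k in np.argsort(quota - alloc)[::-1][: n_total - alloc.sum()]:
        alloc[k] += 1
    samples = {}
    for cls, n in zip(classes, alloc):
        rows, cols = np.nonzero(labels == cls)
        pick = rng.choice(len(rows), size=min(n, len(rows)), replace=False)
        samples[int(cls)] = (rows[pick], cols[pick])
    return samples


class_map = np.zeros((100, 100), dtype=int)
class_map[:40] = 1  # 40 % of the pixels in class 1
print({k: len(v[0]) for k, v in stratified_sample(class_map).items()})
```

Each sampled location would then be labelled manually and compared with the automatic classification.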
Figure 7 (top) shows the points used to evaluate the Tuxertal webcam image of 29.01.2018: green points indicate that the classification result is correct, red points that it is wrong. The undefined regions in shadowed areas in particular are mostly misclassified. The detailed accuracy metrics for each webcam image are given in Table 3.
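The per-class precision and overall accuracy in Table 3 can be computed from a confusion matrix of reference (manually classified) versus predicted classes. A minimal sketch, with hypothetical function names; precision here is the fraction of points predicted as a class that truly belong to it.

```python
import numpy as np


def confusion_matrix(reference, predicted, n_classes):
    """Rows: reference class, columns: predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(reference), np.asarray(predicted)), 1)
    return cm


def precision_and_overall_accuracy(cm):
    cm = np.asarray(cm, float)
    predicted_totals = cm.sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        precision = np.diag(cm) / predicted_totals  # NaN for classes never predicted
    overall = np.trace(cm) / cm.sum()
    return precision, overall


cm = confusion_matrix([0, 0, 1, 1, 2, 2, 3], [0, 0, 1, 2, 2, 3, 3], n_classes=4)
print(precision_and_overall_accuracy(cm))
```

A class with very few control points, such as the undefined class, yields a precision estimate based on only a handful of points, which makes it highly variable.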

Monoplotting
The visible areas captured by both webcams are shown in Figure 8. In both cases, the influence of topography on the monoplotting result is evident.
Due to the near-horizontal acquisition geometry, large void areas occluded by mountain slopes can be observed. The changing ground sampling distance (GSD), which depends not only on the distance to the camera but also on the slope and exposition of the underlying topography, poses another limitation. For the Seefeld webcam, flat areas in the far range (Figure 8, top, blue rectangle) are only sparsely covered because of the larger ground sampling distance. In contrast, for Tuxertal (Figure 8, bottom), mainly slopes facing the camera are captured, resulting in denser and more uniform coverage.
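The dependence of the GSD on distance and terrain orientation can be made concrete with a first-order approximation: one pixel covers roughly `distance × IFOV` perpendicular to the ray, stretched by 1/cos(incidence) along the terrain, where the incidence angle is measured between the viewing ray and the terrain normal. The helper `pixel_footprint` below is an illustrative sketch that ignores lens distortion, pixel shape and terrain curvature within the footprint.

```python
import math


def pixel_footprint(distance_m, ifov_rad, incidence_deg):
    """Approximate ground footprint (m) of one pixel along the slope direction.

    distance_m: slant range from the camera to the terrain point.
    ifov_rad: instantaneous field of view of one pixel (pixel pitch / focal length).
    incidence_deg: angle between the viewing ray and the terrain normal.
    """
    if not 0 <= incidence_deg < 90:
        raise ValueError("terrain must face the camera (incidence < 90°)")
    return distance_m * ifov_rad / math.cos(math.radians(incidence_deg))


# 2 km range, 0.5 mrad per pixel: slope facing the camera vs. grazing flat terrain.
print(pixel_footprint(2000.0, 0.0005, 0.0), pixel_footprint(2000.0, 0.0005, 85.0))
```

For these assumed values, a slope facing the camera gets about 1 m per pixel, whereas flat terrain viewed at 85° incidence gets about 11.5 m, which illustrates the sparse coverage of flat far-range areas.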
These are the main limitations for the detection of snow from webcam images compared to images acquired from aerial or satellite platforms. One way to overcome these limitations in the future is to combine multiple webcams capturing the same area from different directions.

Snow Detection
Overall accuracies for the Tuxertal webcam are above 0.84, whereas results for Seefeld are lower, most notably for 22.04.2018, with an overall accuracy of only 0.57.
The precision for snow and not snow is the highest across all images. This supports our basic observation that snow always corresponds to the brightest regions in each image and can be classified using the value channel alone.
Figure 9. Two problematic areas enlarged for Tuxertal on 28.02.2018. Green points indicate that the classification result is true, red points that the classification is wrong.
As the detection of snow in shadow depends on multiple thresholds using all HSV channels, not only is parameter tuning difficult, but the classification results are also worse. This clearly shows the limitations of our approach regarding the detection of snow in shadow.
Table 3. Precision for each class and overall accuracy achieved using the threshold-based snow detection approach.

Figure 9 shows two enlarged, representative problematic areas in the Tuxertal image. With our threshold-based approach, both highlighted regions are mainly classified as undefined, although most points actually belong to the not snow class. While the not snow class is generally classified quite well, these two regions are labelled undefined because bright snow lies on the dark forest and haze blends the dark and bright regions into intermediate values.
The results for the undefined class vary strongly. This is because almost all control points used for evaluation can be assigned to one of the other classes; only in rare cases, e.g. at the borders of rocks or in shadowed regions, was the true class difficult to identify. Therefore, only a few control points represent the undefined class.
The orthorectified snow maps for both webcams and all acquisition dates are shown in the Appendix. For the Tuxertal webcam in particular, the temporal variation of snow cover can be observed in our time series: while on 29.01.2018 snow is also present at lower altitudes, towards spring the snow retreats to higher altitudes and the not snow class becomes dominant.

CONCLUSION
Within the WebSnow project, a workflow for georeferencing freely available webcam images and detecting snow was developed. While the georeferencing worked well for the selected images, the process currently involves manual interaction. To apply such a workflow to an entire webcam network and to combine multiple webcams to fill void areas, an automatic solution similar to the work of Baboud et al. (2011) is necessary. Processing times were reduced by modelling the observed image differences with an affine transformation instead of monoplotting each image individually.
Currently, the main limitation of the proposed workflow is the simple thresholding approach used for detecting snow, as the thresholds are defined manually. While the results for bright snow are very good, all other classes are problematic, which shows the limitations of our approach. Again, the manual input and hand-tuning of the thresholds for each image are the main obstacle to processing images from a complete webcam network.