VEGETATION COVER MAPPING FROM RGB WEBCAM TIME SERIES FOR LAND SURFACE EMISSIVITY RETRIEVAL IN HIGH MOUNTAIN AREAS

Land Surface Temperature (LST) products from thermal infrared imaging rely on information about the spatial distribution of Land Surface Emissivity (LSE). For portable, broadband thermal cameras used in drone- or ground-based measurements with camera-to-object distances of up to a few kilometres and metre-scale resolution, threshold-based retrieval of LSE from Fractional green Vegetation Cover (FVC) can be used. As seasonal changes in vegetation LSE over the year cannot be accounted for by single satellite images or aerial orthophotos, this study evaluates an approach for FVC retrieval via permanently installed RGB webcams and derived Excess Green vegetation index (ExG) time series at a high-mountain test site in the European Alps. Daily ExG values were derived from the imagery of 27 days between 12/07/2021 and 30/10/2021 and projected to a 0.5 m Digital Surface Model (DSM). FVC reference data from 765 in-situ vegetation plots were used to assess the relationship between ExG and the vegetation cover and to determine the thresholds of ExG for no vegetation cover and full vegetation cover. The correlation between ExG and in-field FVC is weak (R² = 0.15), but an approach using a well-tested orthophoto-retrieved NDVI for FVC retrieval performs only slightly better. The comparison of the remotely sensed data and the field measurements therefore remains complex. Time series analysis of both ExG and FVC for highly vegetated areas showed a significant decrease from summer to autumn, which reflects the seasonal changes of LSE for senescent vegetation. Calculated emissivities for vegetated pixels ranged from a minimum of 0.95 to a maximum of 0.985 over the season, while emissivity values for less vegetated pixels stayed constant. The results of this study will be used as input to a correction model for remote LST measurements in the context of micro-scale investigations of the thermal niche of Alpine flora.


INTRODUCTION
Local plant species distribution in a high alpine environment can be traced back to topographically induced differences in microclimatic conditions (Scherrer & Körner, 2011). For micro-scale investigation and monitoring of alpine plant species distribution in relation to thermal niches, land surface temperature (LST) plays a key role. A high spatio-temporal resolution for thermal mapping is therefore necessary to account for the large variety of thermal niches that plants can find in heterogeneous alpine terrain. In contrast to soil and air temperature, LST can be surveyed in a spatially explicit way by remote sensing. Most LST products are satellite-derived and therefore limited in their spatial and temporal resolution. With terrestrial thermal infrared (TIR) thermography, however, it is possible to map changes in environmental conditions from LST with high spatial and temporal resolution, bridging the gap between satellite-derived LST and direct measurements (Morrison et al., 2020; Scherrer & Körner, 2010). Unfortunately, LST retrieval from TIR requires atmospheric and emissivity correction. LST has to be corrected for atmospheric effects even for ground-based measurements with camera-to-object distances of up to a few kilometres (Hammerle et al., 2017). Correct estimation of Land Surface Emissivity (LSE) in the TIR spectral range has an even larger influence in terrestrial thermography, as the brightness temperature measured at the camera can be significantly lower than LST, and correction is especially necessary for heterogeneous surfaces (Morrison et al., 2020). Emissivity values are object-dependent and, for natural surfaces (such as rocks, soil, vegetation, water or snow), range between 0.90 and 0.99 (Kant & Badarinath, 2002). There are different types of methods for the retrieval of land surface emissivity (Li et al., 2013). Multi-channel-based methods are mainly used in satellite-based LSE and LST estimation and rely on several bands in the thermal infrared range.
Physically based methods for LSE retrieval need a wide range of different input data. Due to their complexity and their need for additional data and spectral bands, these methods are hard to apply in a setting with high-resolution terrestrial thermography and limited data access in mountainous areas, such as in the presented study. For these applications, simpler semi-empirical methods based on surface classification or vegetation indices can be applied. Vegetation index-based methods assume that the surface consists of soil and vegetation, and they are usually less effective for the estimation of rock surface emissivity. Valor & Caselles (1996) used the relation between emissivity and the Normalized Difference Vegetation Index (NDVI) of different surfaces to determine LSE. By testing different vegetation indices (VIs), Kodimalar et al. (2020) showed the applicability of the method for satellite data, but also its difficulties: some VIs are less effective than others, and results differ between seasons. A VI-based estimation of LSE relies on prior information about emissivity values for different surfaces, which can be encoded and used via look-up tables (Meerdink et al., 2019; Rubio et al., 2003; Salisbury & D'Aria, 1992). For urban areas with homogeneous surfaces, a simple classification of surface types and an assignment of emissivities to the surface types can be applied. For heterogeneous areas, where mixed pixels containing different surface types and emissivities occur, this approach is not accurate enough, and a more differentiated approach using VIs and a calculation of emissivity for mixed pixels is necessary. Nevertheless, VI-based methods also need information about soil and vegetation emissivity. In most cases, bare soil (~0.94) has lower emissivity values than full vegetation (~0.985). For pixels containing both bare soil and vegetation, these emissivity values cannot be applied directly.
To account for these mixed pixels, a Fractional green Vegetation Cover (FVC) derived from a VI is often used to calculate the emissivity per pixel (Sobrino & Raissouni, 2000; Tang et al., 2015). As green, vital vegetation has much higher emissivities than senescent or dry vegetation or woody parts (Meerdink et al., 2019), LSE is a dynamic variable that changes along with the phenological cycle of the vegetation. Therefore, we strive to retrieve the Fractional green Vegetation Cover at an adequate temporal resolution to match the days on which thermographic time series are acquired. Vegetation indices integrate information about both fractional cover and greenness or vitality of the vegetation, but currently, only permanently installed cameras are capable of delivering data at the spatial and temporal resolution required for our application. Therefore, the Excess Green index (ExG; Woebbecke et al., 1995) is used as a vegetation index that is retrievable from the visible spectrum captured by most webcams. For comparison, FVC is calculated from a monotemporal NDVI based on an aerial orthophoto, but the focus of this study is on an assessment of FVC derived from terrestrial close-range sensing against in-situ field data. Finally, we use this FVC to retrieve LSE and evaluate the results.

Figure 1. Study area below Mt. Schrankogel, with the webcam location (red), all ground control points (of which 11 were accurate enough for image processing) and the terrain view angle of the camera masked for the area of interest and areas in sight of the camera.

Study Site
The webcam used in this survey is installed in the Sulztal Valley (Stubai Alps, Tyrol), mainly for snow cover mapping and vegetation observation as part of the MICROCLIM project (https://www.mountainresearch.at/microclim/). Located northeast of Mt. Wannenkogel at an elevation of 2645 m a.s.l., the camera views the southwestern flanks of Mt. Schrankogel from approx. 2200 m a.s.l. upwards to the summit at 3497 m a.s.l. (Fig. 1 and Fig. 3). The pictured area covers several vegetation types from the subalpine to the nival zone, e.g. subalpine shrublands (Rhododendro-Vaccinion and Loiseleurio-Vaccinion), alpine grasslands (mainly Caricion curvulae, Festucion variae, Oxytropido-Elynion), snowbeds (Salicion herbaceae) and patches of nival vegetation (Androsacion alpinae). Several boulder and scree fields are located in the higher parts of the mountain. Bare rocks are exposed at several small cliffs in all parts of the area. Due to the steep and fractured terrain the study site has heterogeneous surface characteristics, which is typical for a high alpine environment.

Remote Sensing Data
The installed camera is a commercially available Canon EOS 2000D. Every 30 minutes, the camera automatically sends JPEG images to a server. For this survey, 27 days with clear weather between 12/07/2021 and 30/10/2021 were chosen. The basis for the projection of the webcam images is a DSM with 1 m resolution (Dept. of Geoinformation, Province of Tyrol, 2021), which was resampled to 0.5 m via SAGA GIS B-spline interpolation for better representation of the projected images (Conrad et al., 2015). An aerial orthophoto from 2015 (provided by the Dept. of Geoinformation, Province of Tyrol, 2021) was used to calculate the NDVI and, subsequently, a monotemporal FVC for comparison with the webcam-based FVC time series. The 1 m resolution NDVI was resampled to 0.5 m to match the DSM resolution.

In-situ Reference Data
As reference for the ExG-derived FVC, in-situ surveyed vegetation cover is used. At 765 vegetation plots scattered across the study area of the project, vegetation cover was recorded between 1/7/2021 and 27/7/2021. The area covered by the camera includes 506 of the total 765 plots within the study site. The data contain in-situ estimates of the cover of herbs, mosses, bare soil, litter, scree and rocks, which originate from a vegetation survey on 1 × 1 m plots (Dullinger et al., unpubl.). As the influence of different vegetation types on the ExG is unknown, two different approaches were chosen. First, only the cover of the herb layer was used (FVCherbs) and second, the cover sum of the herb and moss layers (FVCherbs+moss), since these are the dominating vegetation types of the study site. Since the herb and moss layers were estimated as independent layers that can overlap, values in the second approach can exceed 100%. From a 2D remote sensing point of view, this is less meaningful because a pixel cannot contain more than 100% vegetation cover. The location of the plots was measured by a differential global navigation satellite system (GNSS) with an accuracy of <1 m for 95% of the plots. Therefore, the vegetation cover data can be easily extracted and compared to projected and georeferenced images.

METHODS
The overall workflow included five steps. After preprocessing the webcam time series, a daily ExG was calculated and monoplotted onto a DSM. The rasterized ExG images were used to calculate the FVC via soil and vegetation thresholds. In the last step, LSE maps were calculated for every time step. A validation was performed with an NDVI-derived FVC (Fig. 2).

Webcam Images and ExG
As shadows have a large impact on calculated vegetation indices (Jiang et al., 2019), the image time series of an entire day was utilized to avoid differences in the vegetation index calculation due to dark areas from shadows. Barbosa et al. (2019) stressed the importance of evaluating shading-affected regions in RGB imagery to be used for VI derivation (including ExG). In this exemplary study, shadows from cloud coverage were assumed to have a similar impact on the VI as terrain-dependent shadows. Therefore, days with bad weather conditions or high cloud cover were excluded by manual selection. To further minimize shadow effects in the calculated VI time series, the daily VI for each pixel was calculated based on the 95th percentile of the RGB intensity values, $I_{95}$, recorded on the respective day (eq. 1; eq. 2),
where $I_{i,j}$ are the sorted intensity values for a given pixel $i,j$ and $n$ is the number of timesteps.
The derived composite images for the timespan examined in this study are nearly shadow-free and are used for the ExG calculation.
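The per-pixel percentile composite described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the nearest-rank percentile definition, the integer `percent` argument and the function names `percentile_intensity` and `daily_composite` are assumptions, and images are represented as plain nested lists for one colour channel.

```python
def percentile_intensity(values, percent=95):
    """Nearest-rank percentile of one pixel's intensities over a day.

    Assumption: rank = ceil(percent * n / 100) in the ascending-sorted
    series of n timesteps; the paper's eq. 1 and eq. 2 may differ in detail.
    """
    if not values:
        raise ValueError("at least one timestep is required")
    ordered = sorted(values)                      # sorted intensities I_ij
    n = len(ordered)                              # number of timesteps
    rank = max(1, (percent * n + 99) // 100)      # integer ceiling, no float error
    return ordered[rank - 1]


def daily_composite(frames, percent=95):
    """Build one composite channel from a list of same-sized frames.

    frames: list of images, each a list of rows of intensity values.
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [percentile_intensity([f[r][c] for f in frames], percent)
         for c in range(cols)]
        for r in range(rows)
    ]
```

Because shadowed observations sit at the low end of each pixel's sorted series, a high percentile picks a bright, most likely sunlit value while still ignoring the single brightest outliers in longer series.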
Another problem occurred during calculation of the vegetation index. Due to the limited bandwidth of the internet connection, webcam images are transmitted in compressed JPEG format. This introduces compression artefacts in homogeneous areas of the images, which are not evident during visual inspection. However, these artefacts emerge as blocks of 16 × 16 pixels when calculating ExG. A full correction of the images could not be performed within this study. As the images are projected to a relatively coarse DSM (compared to the image ground resolution), the images were resampled from 6000 × 4000 pixels to a coarser resolution of 3000 × 2000 pixels prior to monoplotting to reduce the influence of this issue. The artefacts are still visible afterwards, but with smaller differences across the block borders. An example of the resulting images is given in Figure 3. Since the rock in the lower left corner of the images (Fig. 3) is not of interest, this part of the image was masked during data analysis.
Based on the resampled and shadow-reduced RGB composite images, the ExG was calculated for each pixel as suggested by Woebbecke et al. (1995) (eq. 3), where $R$, $G$ and $B$ are the red, green and blue channels of the image.
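As an illustration of the index, the sketch below computes ExG in the form commonly attributed to Woebbecke et al. (1995), ExG = 2g − r − b, using chromatic coordinates normalized by the channel sum. The normalization is an assumption made here because the reported ExG values lie roughly between 0 and 0.2; the exact form of eq. 3 is not reproduced in this text, and the function name `excess_green` is hypothetical.

```python
def excess_green(red, green, blue):
    """Excess Green index from one pixel's R, G, B intensities.

    Assumption: channels are normalized by their sum (chromatic
    coordinates) before applying ExG = 2g - r - b.
    """
    total = red + green + blue
    if total == 0:
        return 0.0            # black pixel: index undefined, treated as 0 here
    r, g, b = red / total, green / total, blue / total
    return 2 * g - r - b
```

With this normalization, a neutral grey pixel gives an ExG of about 0, and the index becomes positive as the green share of the pixel increases.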

Monoplotting and Projection Accuracy
For mapping ExG and FVC and to make these compatible with other georeferenced data sources, a monoplotting procedure had to be applied. The steps were as follows:
1. Differential GNSS measurements of Ground Control Point (GCP) locations (X0,Y0,Z0). Objects that are well visible in the webcam images were chosen as GCPs. Most of the objects are boulders in grassland or the foot of rock pillars and rock walls.
2. Retrieving the GCP locations (x0,y0) in the webcam images manually. Out of 24 measured GCPs, 11 were identifiable in the images and thus suitable as input for the monoplotting. As inaccuracies were introduced to the monoplotting by poor visual identification of GCPs in the images and by the manual coordinate extraction, it was necessary to refine the projection parameters (see step 5).
3. Calculating camera orientation (pitch, yaw, roll) and projecting pixel coordinates (x,y) to raster coordinates (X,Y,Z). Each raster cell on the DSM is given a pair of pixel coordinates from the image. As the camera parameters and lens properties for the installed camera are unknown, a 3rd degree polynomial function was used as a simplified approximation for the lens distortion (Ma et al., 2003).
4. Estimating the effects of projection errors. The offset Δd and offset direction vd for each projected GCP are calculated. As a measure for the effect of projection errors on the projected ExG image, the maximum ExG difference between a pixel and its neighbours within a search radius equal to Δd is calculated as in eq. 4. The variability of the ExG around a given pixel is used as an indicator for the possible error that can occur due to an offset of the projected pixel. In homogeneous areas this variability is small, but in heterogeneous areas the error can be high even for small offsets (Fig. 4).
5. Adjusting camera parameters. The calculated camera parameters and orientation can be adjusted manually according to vd and the results from the accuracy assessment.
6. Projecting images onto the DSM.
Monoplotting was scripted in Python 3.0. In addition to the inaccurate manual extraction of the GCP image coordinates, the incidence angle σ of the webcam below Mt. Wannenkogel remains a problem for the accuracy of the projected image, as the offset of the projected pixel increases exponentially with decreasing σ. σ is calculated as the angle between the camera-to-surface vector and the surface normal vector, which is given by the DTM slope and aspect. For these reasons, a mask was implemented in the monoplotting to avoid large offsets by excluding areas with low σ and coarse ground resolution. As an indicator for ground resolution, the pixel length in viewing direction Δl was calculated from σ, camera resolution and distance to the webcam. As a trade-off between the required ground resolution and areal coverage, the mask was set to Δl <= 3 m; otherwise too much of the area of interest would be excluded. For the projected ExG and later FVC, Q only serves as a quality measure, but for the following data analysis steps, Q was implemented as a mask to exclude areas where large errors due to inaccurate monoplotting are possible. After masking with Δl, the mean GCP projection error was 3.60 m. Within this range, projected ExG images showed a variability between 0.0 and 0.22 (without considering the rock in the lower left corner of the image; Fig. 3 and Fig. 4). Homogeneous areas such as the scree and boulder fields are less error-prone than heterogeneous areas in the lower parts of the mountain. Transitions between different surface types, e.g. snow to rock, show especially high variability.
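The local variability measure from step 4 can be sketched as below. This is a hedged illustration of the idea (largest ExG difference between a pixel and its neighbours within a search radius), not a reproduction of eq. 4: the function name `max_exg_difference`, the circular neighbourhood and the radius expressed in raster cells are assumptions.

```python
def max_exg_difference(exg, row, col, radius_cells):
    """Largest absolute ExG difference between the cell (row, col) and
    all cells whose centres lie within radius_cells of it.

    exg: raster as a list of rows; radius_cells: search radius in cells
    (e.g. a projection offset in metres divided by the cell size).
    """
    rows, cols = len(exg), len(exg[0])
    centre = exg[row][col]
    reach = int(radius_cells)                     # bounding box half-width
    worst = 0.0
    for r in range(max(0, row - reach), min(rows, row + reach + 1)):
        for c in range(max(0, col - reach), min(cols, col + reach + 1)):
            inside = (r - row) ** 2 + (c - col) ** 2 <= radius_cells ** 2
            if inside:
                worst = max(worst, abs(exg[r][c] - centre))
    return worst
```

On the 0.5 m raster used here, the mean GCP projection error of 3.60 m would correspond to a radius of about 7 cells. Homogeneous surroundings yield values near 0, whereas a pixel next to a surface transition yields a large value even for a small radius.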

Calculating Fractional Vegetation Cover and time series
A cloud-free day in mid-summer (10/08/2021) was chosen to determine the thresholds and the relationship between ExG and the vegetation cover. Visually, the relation between ExG and the in-situ vegetation cover data suggests a non-linear relationship, which is analysed as well (Figure 5). The ExG thresholds for bare soil and full vegetation (tsoil and tveg) are hard to determine, even though data from the vegetation plots can be used as reference in this survey. High cover values in particular show a range of ExG values between approx. 0.02 and 0.2 for both datasets (FVCherbs and FVCherbs+moss). Without the measured data, the thresholds would have to be determined manually from the VI (ExG or NDVI) histogram. As a simple automated curve fitting yields too low a soil threshold for the ExG, a different approach was chosen. The soil threshold tsoil (FVC = 0) was determined by taking the mean ExG of all raster values with a measured vegetation cover below 0.05. Then tveg (FVC = 1) was determined by taking the mean ExG of all raster values with FVC above 0.95. With these thresholds, FVCExG,linear was calculated as a linear function between the thresholds for the two datasets FVCherbs and FVCherbs+moss as in eq. 5 (Fig. 5). To account for the nonlinear relationship between ExG and the vegetation cover, a square root function was used as a second approach (FVCExG,sqrt). For this, tsoil was set as a fixed starting point and a square root function was fitted to the data (eq. 6) (Fig. 5). Afterwards, FVCExG,sqrt > 1 was set to 1. The same procedure was then applied to the NDVI from the orthophotos of 2015 to calculate FVCNDVI for use as a reference (Fig. 6). Three highly vegetated and 3 non-vegetated pixels were randomly chosen from the projected images for analysis of ExG
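The threshold determination and the two FVC mappings can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: all function names are hypothetical, the linear mapping is the usual rescaling between tsoil and tveg clipped to [0, 1], and the square root model is assumed to have the one-parameter form FVC = a · sqrt(ExG − tsoil) with a fitted by least squares, since the exact forms of eq. 5 and eq. 6 are not reproduced in this text.

```python
import math


def mean_exg_where(exg_values, cover_values, keep):
    """Mean ExG over the plots whose measured cover satisfies keep(cover)."""
    selected = [e for e, c in zip(exg_values, cover_values) if keep(c)]
    if not selected:
        raise ValueError("no plots match the cover criterion")
    return sum(selected) / len(selected)


def clip01(value):
    """Limit a value to the valid FVC range [0, 1]."""
    return min(1.0, max(0.0, value))


def fvc_linear(exg, t_soil, t_veg):
    """Linear FVC between the soil and vegetation thresholds."""
    return clip01((exg - t_soil) / (t_veg - t_soil))


def fit_sqrt_scale(exg_values, cover_values, t_soil):
    """Least-squares scale a for cover = a * sqrt(ExG - t_soil), t_soil fixed."""
    roots = [math.sqrt(max(e - t_soil, 0.0)) for e in exg_values]
    denom = sum(s * s for s in roots)
    if denom == 0:
        raise ValueError("all ExG values are at or below t_soil")
    return sum(s * c for s, c in zip(roots, cover_values)) / denom


def fvc_sqrt(exg, t_soil, scale):
    """Square-root FVC model; values above 1 are set to 1."""
    return clip01(scale * math.sqrt(max(exg - t_soil, 0.0)))
```

A possible usage, with ExG sampled at the plot locations: `t_soil = mean_exg_where(exg, cover, lambda c: c < 0.05)` and `t_veg = mean_exg_where(exg, cover, lambda c: c > 0.95)`, followed by `fvc_linear` or `fvc_sqrt` applied to every raster cell.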

Calculating emissivity
A threshold-based method was used to calculate LSE from FVC (Sobrino & Raissouni, 2000;Kodimalar et al., 2020), which estimates LSE by a linear combination of soil and vegetation emissivity.
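$$\varepsilon_{i,j} = \varepsilon_{veg}\,FVC_{i,j} + \varepsilon_{soil}\,(1 - FVC_{i,j}) + c \quad (7)$$
where $\varepsilon_{i,j}$ is the emissivity of given pixel $i,j$, $FVC_{i,j}$ is the FVC of given pixel $i,j$, $\varepsilon_{soil}$ and $\varepsilon_{veg}$ are the emissivities of soil and vegetation, and $c$ is the cavity term. In-field measurements of $\varepsilon_{soil}$ and $\varepsilon_{veg}$ were not possible; therefore, the two thresholds were chosen as suggested by Valor & Caselles (1996), with $\varepsilon_{soil} = 0.95$ and $\varepsilon_{veg} = 0.985$. The cavity term $c$ considers the radiation due to internal reflections within the vegetation and is calculated as in eq. 8:
$$c = 4\,\langle d\varepsilon \rangle\,FVC_{i,j}\,(1 - FVC_{i,j}) \quad (8)$$
where $\langle d\varepsilon \rangle$ is the mean value of the cavity effect and $FVC_{i,j}$ is the FVC of given pixel $i,j$. As there are no direct measurements of $\langle d\varepsilon \rangle$ in the field, $\langle d\varepsilon \rangle$ is assumed to be 0.005 as suggested by Kodimalar et al. (2020).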
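A minimal sketch of the threshold-based emissivity calculation is given below. It assumes the widely used form of this method, in which the cavity term is 4⟨dε⟩·FVC·(1 − FVC); the function name `emissivity_from_fvc` and its parameter names are hypothetical, while the default values are those stated in the text (0.95, 0.985 and 0.005).

```python
def emissivity_from_fvc(fvc, eps_soil=0.95, eps_veg=0.985, mean_cavity=0.005):
    """Pixel emissivity as a linear mix of soil and vegetation emissivity
    plus a cavity term.

    Assumption: cavity term c = 4 * mean_cavity * fvc * (1 - fvc), which
    vanishes for pure soil and pure vegetation and peaks at fvc = 0.5.
    """
    if not 0.0 <= fvc <= 1.0:
        raise ValueError("fvc must lie in [0, 1]")
    cavity = 4.0 * mean_cavity * fvc * (1.0 - fvc)
    return eps_veg * fvc + eps_soil * (1.0 - fvc) + cavity
```

With these defaults the result stays between 0.95 (no green vegetation) and 0.985 (full green vegetation cover), and a half-covered pixel evaluates to about 0.9725.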

RESULTS
Both ExG and NDVI show a good correlation with FVC for low FVC values between 0 and 0.2. However, both indices show increasingly more scatter with higher FVC, and ExG has a triangle-shaped relation to FVC that is hard to interpret, especially for high FVC values from 0.6 to 1.0. The R² values showed the best results for a linear relation when compared to FVCherbs (0.35 for ExG, 0.58 for NDVI). FVC correlates more strongly with NDVI than with ExG, likely because vegetation reflectance is stronger in the near-infrared than in the green band (Tucker, 1977). The calculated FVC for non-vegetated boulder fields (Fig. 3) ranges between 0.0 and 0.40, which is probably due to the fact that the boulders are relatively dark and covered with lichen and therefore appear "green". Comparing FVCExG,linear and FVCNDVI,linear independently of the field data shows an R² of 0.65 and an RMSE of 0.041 (Fig. 8). FVCExG,linear tends to underestimate vegetation cover compared to FVCNDVI,linear. The correlation between FVCNDVI,linear and field data (FVCherbs) was higher (R² = 0.42) than that between FVCExG,linear and field data (R² = 0.21).
The mean values of ExG over the surveyed time period range between 0.0 and 0.049, with a slight decrease from summer to autumn. Boulder fields show almost constant ExG, as expected. The remaining fluctuations probably result from automatic camera adjustments to different lighting conditions. In contrast, the observed vegetated areas have high ExG values around 0.17 from the start of the study until the end of August. From then on, they decrease to values similar to those of the boulder fields. The calculated FVC shows a similar response (Fig. 9b). Mean FVC in the study area is around 0.35 in summer and decreases during autumn to around 0.15. Due to the high soil threshold, FVC values for boulder fields with no vegetation apart from lichen are around 0.2. The observed vegetated areas in the FVC time series show the same decrease as in the ExG time series: the values decrease from 100% coverage in summer to 0 to 20% in autumn. Although the cover of the total vegetation remains more or less constant over the year, photosynthetic activity and "greenness" are strongly reduced in late summer and autumn.
Comparing the two maps in Figure 11 reveals two differences. First, compared to FVCNDVI,linear, FVCExG,linear overestimates FVC values on the boulder fields on the higher slopes of the mountain as well as on the two boulder fields in the centre of the image. FVCNDVI,linear has FVC values of 0 in the same areas. Second, as already described above, FVCExG,linear tends to underestimate FVC for vegetated areas. More precisely, the variability of ExG for vegetated areas is higher than the variability of NDVI; therefore the variability of FVCExG,linear is higher, which leads to a mean underestimation of FVC for vegetated areas. An adjustment of the two thresholds could partly solve these problems. Surface emissivity shows a significant decrease from the August mean to the October mean for vegetated areas, which can be attributed to the decrease of emissivity values from green and active vegetation to dry and senescent vegetation. Many areas show the full range between the emissivity of full vegetation cover (0.985) and soil emissivity (0.950) and therefore a decrease of 0.035 (Fig. 11). Depending on the absolute level of emitted thermal radiation, this can result in a correction of measured LST by ca. ±3 K. This emissivity decrease also agrees well with the FVC from NDVI.
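The order of magnitude of this correction can be checked with a simplified worked example. The sketch assumes a broadband grey body and inverts the Stefan-Boltzmann relation, T_surface = T_brightness / ε^(1/4), neglecting reflected background radiation and atmospheric effects; this is not the correction model of the study, and the function name `surface_temperature` and the 300 K brightness temperature are chosen only for illustration.

```python
def surface_temperature(brightness_temp_k, emissivity):
    """Surface temperature (K) from broadband brightness temperature.

    Assumption: grey body, Stefan-Boltzmann law, no reflected or
    atmospheric contribution: T_s = T_b / emissivity ** 0.25.
    """
    if not 0.0 < emissivity <= 1.0:
        raise ValueError("emissivity must lie in (0, 1]")
    return brightness_temp_k / emissivity ** 0.25


# Same measured brightness temperature, summer vs. autumn emissivity.
t_green = surface_temperature(300.0, 0.985)      # full green vegetation
t_senescent = surface_temperature(300.0, 0.950)  # soil-like emissivity
shift = t_senescent - t_green                    # difference in corrected LST
```

Under these assumptions, the 0.035 emissivity decrease changes the corrected LST by somewhat less than 3 K at 300 K, which is consistent in magnitude with the value given above; including reflected background radiation would reduce the effect.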

DISCUSSION
A visual inspection of the relationship between ExG and FVC (Fig. 5) and of the FVC calculated from ExG (Fig. 8) suggests that a nonlinear relation, such as the square root function explained above, would fit the data better. It is also hard to evaluate the quality of the determined soil thresholds, as a visual inspection of the resulting FVCExG,linear indicates that the threshold value of -0.002 should be higher. This is especially noticeable in areas with ExG values close to tsoil, where calculated FVC values appear too high compared to observations in the field. Choosing a higher soil threshold and using a square root function for the FVC calculation would define a clear border between FVC = 0 and FVC > 0 ("no vegetation" and "vegetation fraction"). The comparison of the remotely sensed data and the field measurements therefore seems to be problematic and complex (Fig. 5 and 6). Despite this, both VIs can be attested a moderate performance for calculating FVC (Fig. 6), with the often used NDVI approach still being the more reliable method. The high RMSE for both calculated FVCs is another indicator of the differences and the mismatch to the field data. Although the well-tested NDVI-based FVC matches the field data better, the ExG-based FVC compares surprisingly well to the NDVI-based FVC (Fig. 8), despite its relatively low correlation with the field data. Both the aerial orthophotos and the webcam images contain raw digital numbers (DN). In contrast to satellite images, these DN cannot be transformed into surface reflectance (due to unknown spectral response functions of the sensors and a lack of illumination correction). When DN are used instead of surface reflectance, the consistency of the calculated VIs and all further derivatives therefore suffers from effects of topography, illumination and sensor response. For the NDVI, it is a known issue that it saturates at high biomass or leaf area index (e.g. forest; Riihimäki et al., 2017).
For the time series analysis of webcam images, this uncertainty is aggravated by the automatic exposure adjustments of the camera. They react to varying cloud cover and change several times even over the course of one day, as the camera automatically tries to find the best settings for the given situation. Nevertheless, trends for different surfaces are clearly visible (Fig. 9a). These trends and changes follow the decrease of photosynthetic activity in autumn. The decrease of chlorophyll in plants during autumn leads to a relative increase of reflectance in the red band of the camera and a relative decrease in the green band. This is especially true for grassy slopes and seems less noticeable for higher vegetation such as shrubs. For total vegetation cover mapping, this effect poses a problem, as the fractional cover actually does not change. For LSE retrieval, however, the Fractional Green Vegetation Cover is crucial, as senescent and dry grass have lower LSE values (closer to LSE values of bare soil) than green vegetation. Hence, ExG and the derived Fractional Green Vegetation Cover correctly reflect the decrease of emissivity from "green" grass in summer to dry or senescent grass in autumn. Under these circumstances, the emissivity of grass can even fall below the values of bare soil and rocks (Meerdink et al., 2019). This is reasonable if lichen-covered rocks prevail, as lichen usually have a higher emissivity in TIR than bare rocks (Salisbury & D'Aria, 1992). When FVC is calculated from an orthophoto, the whole area is covered, whereas when FVC is calculated from a terrestrial camera, some areas within the study site are out of sight due to the terrain (Fig. 10). In addition, some areas with high σ were masked as well and contribute to the reduction of the covered area.
As an input to a correction model for LST measurements from a terrestrial thermal camera with the same location and viewing angle, the obtained coverage is sufficient. For FVC mapping of the entire southwestern part of Mt. Schrankogel, many gaps with no data occur. Nevertheless, a large part of the southwestern flanks of Mt. Schrankogel is visible. In future, the spatial coverage could be improved with imagery from two other webcams installed at the study site. So far, the presented approach to LSE retrieval does not take into account that senescent grass can have lower emissivity values than bare soil, as 0.95 is the lower limit in the chosen approach (Kodimalar et al., 2020). Future work could test an approach that estimates soil fractional cover with constant emissivity from summer FVC, combined with a variable vegetation fraction emissivity from ExG (or any other available VI), as both soil emissivity and true FVC will not change significantly during the season (Fig. 12). Emissivity changes during the season should be accounted for if measurements are made over longer time periods. Conversely, in emissivity retrieval from RGB imagery, seasonal changes in vegetation reflectance have to be considered. Using the in-field measurements of FVC as the basis for threshold calculation and as reference seems to be problematic, as the compatibility between field data and remote sensing data is questionable in this case. This is true for both NDVI and ExG. A mismatch of the two data sources occurs for areas with low FVC in the form of too low a threshold (especially for ExG) and for areas with high FVC in the form of large scatter (both ExG and NDVI).

CONCLUSION
The study presents a workflow for the extraction of land surface emissivity (LSE) from webcam imagery and assesses difficulties concerning seasonal changes and technical equipment. The results show that ExG from webcam imagery performs slightly worse than the well-tested NDVI. Despite its relatively good agreement with in-situ derived FVC, the monotemporal nature of orthophoto-based NDVI limits its validity and applicability for retrieving LSE over longer time spans. Using webcam image time series to parameterize LSE makes it possible to track seasonal changes in vegetation emissivity, which can be an advantage if inter-daily variation of emissivity is expected to be a significant factor. The results of this study will be used to correct multitemporal measurements of land surface temperature (thermal infrared) and illustrate the parameterization of LSE based on Fractional green Vegetation Cover derived from a multitemporal vegetation index.