ASSESSMENT OF CANOPY AND GROUND HEIGHT ACCURACY FROM GEDI LIDAR OVER STEEP MOUNTAIN AREAS

: Active remote sensing systems orbiting the Earth are only a small portion of the current constellation of satellites and will increase in number and advance in technology in the future. The launch of the GEDI sensor in December 2018, for an expected life-span period of about 2 years, is a fundamental step of this revolution, as it is the first spaceborne full-waveform lidar specifically designed for measuring the structure of ecosystems, providing information of the vertical profile of forests. Accuracy assessment of GEDI height metrics in the context of an Alpine forest environment in steep terrain scenarios has been conducted in this study. We used discrete return lidar from a recent aerial laser scanner survey as reference to analyse differences of heights of terrain elevation and maximum canopy height of the vegetation detected in each GEDI footprint. The height metrics differences between the discrete lidar and the GEDI data were then analysed to verify any correlation with the following factors: morphology (terrain slope), land cover (land cover type, fraction of canopy cover, vegetation density), GEDI laser beam characteristics (day/night-time acquisition, full power vs coverage laser beam, beam ID, laser sensitivity). Further analysis involved shifting the footprints’ location in 8 different direction and 4 distances to assess the impact of geolocation errors on accuracy and precision. Results show that what most influences accuracy in this study is the terrain slope, very likely linked to the uncertainty of geolocation of the GEDI footprints, suggesting caution in using single GEDI footprints if located in steep environments. Other than slope, terrain height accuracy varies mostly with forest type (conifer vs broadleaves), but not significantly with other factors. Canopy height instead is affected by most factors; high vegetation canopy is overestimated by ~3 m in GEDI, and underestimated by 3 m over heath and bushes (median difference). Higher sensitivity pulses and night-time pulses provide better accuracy. Laser beams with full power also have better accuracy; beams with id 1000 and 1011 provide the most accurate canopy heights. Shifting the footprint position decreased accuracy except at 15 m and 270° with respect to orbit direction (left-looking).


INTRODUCTION
The GEDI mission was selected by NASA in July 2014, as part of the Earth Venture Instrument-2 program; after the various stages of development, on 5 December 2018 it was launched from the Kennedy Space Center aboard a Falcon 9 Block 5 rocket (University of Maryland, 2021). The GEDI instrument provides full-waveform data ( Figure 1) and is composed of three Nd:YAG lasers, each of which emits 242 pulses of electromagnetic energy per second with a near infrared wavelength equal to 1064 nm (Dubayah et al., 2020). One of the lasers is divided into two beams (defined coverage beams), while the other two operate at full power each producing only one beam (power beams), for a total of four laser beams, each of which produces an imprint on the surface with a diameter of about 25 m. On the ground, however, the traces produced are twice in number, because the laser pulses are subjected to dithering at alternating intervals, producing as an effect a very rapid and very small angular deviation (1.5 mrad) of each laser beam and thus obtaining eight tracks on the earth's surface approximately 600 m apart from each other; instead the distance between the footprints along the flight direction is about 60 m (see Figure 2). GEDI is a complete waveform lidar system, for each laser pulse emitted, the instrument records the energy distribution curve returned to the sensor, from which various vertical information is then extracted, such as elevation of the ground plane and the highest return, the vertical profile of the vegetation or the RH (Relative Height, intended for surfaces with vegetation) parameter, i.e. the height relative to the ground at which a certain quantile of returned energy is reached (Dubayah et al., 2020)a representation can be seen in Figure 3. The georeferencing of each GEDI footprint is calculated via orientation and positioning of the emitting source. GEDI uses its own global navigation satellite system (GNSS), an inertial measurement unit (IMU) and three star-sensors (Dubayah et al., 2020). Initial statements reported horizontal geolocation accuracy for the final calibrated products to have horizontally RMSE equal to ~10 m and vertically equal ~50 cm. However, a reduction in the geolocation error up to ~8 m horizontally and 10 cm vertically was foreseen following the post-launch data calibration processes. The reference surface to which all data refer is the WGS-84 ellipsoid.
The overall application goal of GEDI is to allow the scientific community to learn about forests and forest ecosystems, by providing 3D vertical samples of the vegetation. Upcoming space missions with partially overlapping goals are the NASA-ISRO NiSAR (Kellogg et al., 2020), which will use full polarimetric Synthetic Aperture Radar (SAR) at L-band and S-band, which will provide a large range of applications that include ecosystems study. The BIOMASS mission by the European Space Agency (ESA) is also a SAR mission but it is dedicated specifically to accurate biomass mapping globally (Tebaldini et al., 2019). The effort towards improving accuracy and revisit time is quite clear from the investment, but it is important to validate the data and products to understand the reliability of the information in different scenarios (Pirotti et al., 2014;Vaglio Laurin et al., 2016).

Study area
The study area included a series of Veneto mountain municipalities located in the provinces of Vicenza and Belluno. They were chosen on the basis of the availability, in those areas, of discrete high-resolution lidar data that temporally coincided with the GEDI data. The municipalities concerned were divided into 5 blocks whose name derived from discrete lidar data: block 1 (municipality of Rotzo), block 2 (municipalities of Gallio and Enego), block 3 (municipalities of Sovramonte and Feltre), block 5 (municipalities of Livinallongo del Col di Lana, Colle Santa Lucia, Rocca Pietore, Alleghe, Canale d'Agordo, Vallada Agordina, San Tommaso Agordino, Cencenighe Agordino and Taibon Agordino) and block 6 (municipalities of Borca di Cadore, Calalzo di Cadore , Zoppè di Cadore, Valle di Cadore and Cibiana di Cadore)all areas are located in the Alpine region in the north of Italy -see Figure 4. The average height above sea level of the areas ranged from 800 m to 1700 m and average slope from 20° to 36° with an overall average of 29°.

Airborne laser scanner (ALS)
The airborne laser scanner campaign took place after the VAIA storm (Vaglio Laurin et al., 2021) at the end of October 2018, across a period of several days in 2019 with leaf-on conditions and with a lidar sensor that returned discrete echoes. The corresponding point density ranged between 8 and 30 points per square meter, with an average of 12 points per square meter in the areas.

GEDI data
Level 2A GEDI data were used by subsetting a dataset collected for the northern part of Italy and stored and shared in the cloud through an ad-hoc web interface implemented by the authors (see Figure 5). The data were downloaded from NASA's Land Processes Distributed Active Archive Center (LPDAAC). Level 2A data provides the vertical profile of the surfaces inside the footprint through the relative height (RH) metrics. This dataset provides ~15 million GEDI footprints and is available for anyone that requests access. The interface also allows to query the single GEDI footprint and retrieve the RH metrics (see Figure 5).

Figure 5.
GEDI data interfacetop is the area with available data, bottom is data from a single footprint. Colored points are the 100 RH metrics in GEDI data. Black points are first derivative of RH metrics.
For this study a total of 54,626 GEDI footprints that fall inside the areas in Figure 2 were initially selected and further filtered to remove points falling in urban areas.

GEDI and discrete lidar processing
Each of the 54,626 footprints were analysed and compared with the discrete lidar dataset. Each footprint was used to clip the corresponding discrete lidar point cloud, which was then normalized to the ground points (see Figure 6). These two versions of the clipped data provided the necessary means to compare the GEDI RH metrics, as we have the ground plane, the canopy heights and the surface height. Intersection of each footprint with a land-cover map (LC) allowed to filter and keep only footprints that belong to three land cover categories: forests, heaths and bushes, pastures and bare rocks. The LC map was created in 2012 at 1:10,000 scale, thus with acceptable tolerance with respect to spatial location accuracy with respect to the 25 m diameter GEDI footprintsthe source was the Vento Region Cartographic Agency.
Further filtering on the three LULC categories was applied for each category: (i) broad-leaves, conifer and mixed forestsonly footprints with canopy heights between 5 m and 50 m. The lower limit was defined as significant to remove shrubs and other low vegetation.
The higher was defined from the canopy height data by calculating the third quartile and adding the interquartile rangethis removed outliers with exceeding canopy heights; (ii) heaths and bushessame as the previous category (forests) but without any forest LC intersecting the footprint area; (iii) pasturesmaximum canopy height is 2 m and no forest cover intersect the footprint; (iv) bare rockno canopy presence.
This last filtering phase avoided footprints with "mixed" categories and removed outliers. It led to the final identification of 29,377 GEDI footprints, divided into the different LC categories as follows: 13,409 for coniferous forests, 6,117 for broad-leaved forests, 1,335 for mixed forests, 1,847 for heaths and bushes, 3,320 for pastures and 3,349 for bare rocks (see table  A.1 in appendix).
The RH metrics were compared to the terrain, surface and canopy heights (h) from the discrete lidar data. The GEDI heights are referred to the WGS84 ellipsoid whereas the ALS points are in orthometric height with reference to the ITALGEO2005 geoid. The GEDI height metrics were transformed to orthometric heights using the ITLGEO2005 geoid to be consistent with ALS data. The difference between the GEDI and discrete lidar was analysed through the following metrics, two of them medianbased as the residuals did not have a normal distribution.
Where MdAD is the median of absolute deviations, Δhj are the height differences (j = 1, …, n) and mΔh is the median of the differences (Höhle and Höhle, 2009). The factor b is a constant value necessary to for MdAD to be consistent for a given parameter of interest. In this case it is the standard deviation and it therefore equal to 1.4826 (Rousseeuw and Croux, 1993). Other statistics calculated were the median absolute error (MdAE): which represents the threshold above and below which 50% of the errors are found (from the point of view of their magnitude, not of the real distribution). The last metric used is the root mean square of errors (RMSE). In this context, ther term "errors" refers to the difference GEDI -ALS as ALS is considered the control measure as its accuracy is two orders of magnitude higher than GEDI footprints for horizontal component and one order of magnitude for the vertical component.

Geolocation accuracy validation
To test for the effect of geolocation accuracy, each of all 29,377 GEDI footprints was shifted in magnitude and direction, with respect to their central theoretical position. Shifts magnitudes were 5, 10, 15 and 20 m (i.e. respectively at 0.5σ, 1σ, 1.5σ and 2σ of geolocation accuracy) along eight directions, i.e. 0, 45,90,135,180,225,270 and 315 ° with respect to the direction of flight of the ISS (see Figure 7). The latter was deduced from the progression of the shot numbers in the GEDI data and ranged approximately from W-NW to E-SE or from W-SW to E-NE. In practice, 32 alternative positions have been identified for each footprint. Height metrics from ALS-derived height rasters were again extracted in each footprint and compared with GEDI RH values. Height rasters consisted in a 0.5 m ground sampling distance (GSD) digital terrain model (DTM) and digital surface model (DSM). Median height in footprint was used from the DTM and highest value in footprint for DSM to match respectively ground return and highest reflecting surface return ( Figure 3).

Figure 7.
In red the original footprint, and shifts of the hypothetical location. Yellow arrow is the orbit direction.
The objective of replicating the measures of differences between the GEDI RH values with the newly calculated surface heights from the discrete lidar is to test how accuracy metrics change with the shift in position. It would be expected that geolocation errors, if not biased in direction, should randomly change with these shifts with zero average.

Height differences and slope
The results reported focus on terrain height above sea level and canopy height of vegetation with respect to the ground. Seven tables for terrain and canopy height GEDI vs ALS comparisons are reported in the appendix. Plots in Figure 8 and Figure 9 show the distribution of errors, i.e. height (h) differences between GEDI and ALS (∆h = GEDI -ALS) with respect to slope.  Overall metrics are reported on the table belowmore detailed error metrics are in tables in appendix.

Height differences over multiple variables
The ∆h values, for terrain and canopy, were then separated by different characteristics, to assess if there is an evident relationship with one of the following six factors: (i) land cover category; (ii) canopy cover percentage, i.e. how much of the footprint is covered by low or high vegetation; (iii) vegetation density, calculated by the ratio of number of points above ground to total number of points in the ALS point cloud inside the footprint; (iv) laser sensitivity as reported in GEDI data; (v) day vs night-time GEDI laser shots, to check if sunlight background noise influences the GEDI information; (vi) laser beam ID (see Figure 2).  Tables in appendix report error metrics using the median, interquartile, MdAD, MdAE and RMSE for all six variables for both terrain and canopy heights comparisons. GEDI vs ALS differences for terrain height and canopy height are summarized in Figure 10 and Figure 11.
The accuracy metrics were then calculated for each slope class and plotted in Figure 12 and Figure 13 along with the hypothetical vertical component of the 1σ and 2σ horizontal misplacement at each slope category.

Artificial footprint shift accuracy change
As reported in the methods section, the GEDI footprint location was changed in magnitude and direction with respect to the orbit direction. The change in the accuracy metrics is reported in Figure 14. Only changes in terrain height differences are reported as canopy height differences follow the same trend.

Terrain height estimation
The results clearly show a strong correlation between slope and accuracy of both terrain and canopy heights. The land cover categories ( Figure 10) show that in conifer forest GEDI terrain height estimate is more accurate than in broadleaves. This is likely due to the canopy structure of broadleaves that obstructs more the laser photons. Rock land cover has the higher error, but it must be noted that this might be related to slope as rocks are a land cover category in high mountain areas where slope values are usually higher. Also, the much lower error dispersion in the pastures class is to be related to the relationship with slope, as pastures are preferably in low slope scenarios and never over a certain slope value. These hypotheses should be further tested by multivariable analysis. Other factors like vegetation density, canopy cover, day vs night footprints and laser sensitivity do not impact error distribution significantly for terrain heights.

Canopy height estimation
Regarding canopy heights from GEDI data it is a bit different story from terrain heights. In Figure 11 it can be seen that the land cover category boxplot shows that high vegetation (conifer, broadleaves, mixed) scenarios tend to underestimate canopy heights by ~3 m. Opposite for low vegetation that has overestimation by ~3 m. Canopy heights are very sensible to the ability of detecting the terrain below the canopy, so these different accuracies can be caused from GEDI laser either hitting the real ground or, in case of thick canopy, have the last photons reflected by surfaces above the real ground plane. In the first case the error is very low, in the second case only underestimation can occur, thus the canopy height errors are biased toward negative values.
Canopy cover has an effect on canopy height estimation (Pirotti et al., 2017) that reflects the land cover category, as a lower canopy cover ratio in the footprint overestimates canopy heights and higher cover has the same error distribution as conifer land cover. Laser sensitivity changes the accuracy of GEDI canopy heights, with higher sensitivity with higher accuracy. Also day and night-time differ slightly, suggesting that night-time laser provides better canopy height estimation. Last box showing the different laser beams show that the full power beams (last four) have better canopy height accuracy then the first four that represent coverage beams Figure 2).

Comparison with existing literature
Several investigations concern the use of simulated GEDI data combined with other datasets to calibrate sensors and estimate the accuracy of the mission  or to monitor key attributes of forest structure (Qi et al., 2019). At present, there are few investigations found in literature with similar approaches to this presented study. Accuracy estimation of the ground elevation and canopy height in temperate forests in Germany was carried out by Adam et al. (2020) concluding that median difference of GEDI-ALS canopy height is 2.11 m with MdAD equal to 2.98 m, thus higher accuracies than the ones resulting in this investigation where these values are respectively 1.32 m and 13 m. It must be noted though that the mean slope in this investigation is significantly higher (29° in our study area and 16° in Adam et al.). According to Figure 12 and Figure 13 this difference can well account for the increase in MdAD as more footprints in this study fall in steeper slopes. Regarding the type of land cover, in their work there is a greater difference in terms of MdAD between coniferous and mixed forests (low in the former, greater in the latter) than what is reported in our study. Quiros et al., (2021) report on a very similar investigation as the one presented here, as GEDI terrain height accuracies in southeastern Spain were compared to ALS. In their work an RMSE of 6.13 m was obtained, lower then our results that reached 20 m in slopes up to 20°. More interestingly the authors shifted the footprint like in our study; they had an accuracy improvement by shifting 270° and 10 m, very similarly to our results (see Figure  14), only that in our case the best improvement was at 15 m at the same 270° direction. This deserves more in-depth investigation to define the causes. One possible cause of accuracy improvement preferentially happening at a certain direction could be due to anisotropy of the terrain morphology, i.e. mountain ridges and sides with preferential orientation, which, in combination with the orbit directions, provide a geometry that enhances errors only at a certain angle. In our study the distribution of the slope direction (aspect) for all footprints did not show any significant preferred direction, therefore this hypothesis was discarded and the cause is still an open question.
An investigation in 40 sites in the United States led to GEDI RMSE values over canopy height differences changing from 5.02 m to 3.56 m when using night-time footprints (Liu et al., 2021). The paper also reports higher accuracies when using full power laser beams (0101,0110,1000,1011) with respect to coverage laser beam (Figure 2). This is consistent with Figure 11 were it can be seen that the first four beams (coverage) have higher dispersion and low median values, whereas the last four have better accuracy. It must be noted that in Figure 10 and Figure 11 both show a different behaviour of the last two beams (1000,1011), seemingly providing higher accuracies also with respect to the other full power beams. This is also an open question that is worth further investigation.

CONCLUSIONS
Studies like this one, in a mountain environment, are important as they allow to test and explore the potential use and limitations of GEDI data in very complex and varied terrain scenarios. GEDI is distributed as open data and provides great support to researchers around the world to investigate on forest vertical structure. The results obtained here suggest caution in using GEDI data in an alpine forest environment, more generally in any steep slope environment. Vertical forest structure can be generalized over areas by aggregating multiple footprints, but the single footprint metric has to be considered with care unless over a relatively flat area.
Terrain height accuracy varies mostly with forest type (conifer vs broadleaves), but not much with other factors. Canopy height instead is affected by most factors, suggesting to use correction factors (~+3 m for high forests and ~-3 m for low forest). Higher sensitivity pulses and night-time pulses should be used for best accuracy. Laser beams with full power have better accuracy, the ones with id 1000 and 1011 provide the most accurate canopy heights. The accuracy improvement from shifting 15 m at 270° must be further investigated to address possible causes.
A last important note is that at time of finalizing this work, version 2 of GEDI data was available. It is likely that the accuracy and precision of the GEDI data have improved, but this work was finalized with version 1. Future work will use version 2 and address the open questions mentioned in the discussion, as well as test GEDI over known forest volume loss such as the recent VAIA storm (Piragnolo et al., 2021).