ANALYSIS OF CORRELATION BETWEEN FULL-WAVEFORM METRICS, SCAN GEOMETRY AND LANDCOVER: AN APPLICATION OVER FORESTS

For a correct use of metrics derived from processing the full-waveform return signal of airborne laser scanner sensors, any correlation which is not related to properties of the reflecting target must be known and, if possible, removed. In this article we report on an analysis of the correlation between several metrics extracted from the full-waveform return signal and scan characteristics (mainly range) and type of land-cover (urban, grasslands, forests). The metrics taken into consideration are the amplitude, normalized amplitude, width (full width at half maximum), asymmetry indicators, left and right energy content, and the cross-section calculated from width and normalized amplitude considering the range effect. The results show that scan geometry in this case does not have a significant impact on scans over forest cover, except for range, which affects the amplitude and width distributions. Over complex targets such as vegetation canopy, other factors such as incidence angle have little meaning; therefore corrections of the range effect are the most meaningful. A strong correlation with the type of land-cover is also shown by the distribution of the values of the metrics in the different areas taken into consideration.


INTRODUCTION
Airborne LiDAR (or Airborne Laser Scanning, ALS) has seen rapid growth in various applications in the past ten years. It is of particular interest when applied to forest cover, due to the fact that gaps in the canopy allow penetration of the laser signal, sometimes all the way to ground level. Canopy structure can thus be represented in terms of multiple surfaces which act as separate targets, each with its own reflection properties. Interest in forest applications is such that dedicated events (e.g. the Silvilaser conferences) have been organized on the topic. LiDAR data is used to measure and infer various forest parameters of interest. In the case of discrete return (DR) data the available information is the point position in space, its ordinal position in terms of echoes (unique or first, intermediate or last), and its radiometric information, commonly referred to as intensity (Shan & Toth 2008). The vegetation layer is differentiated from other classes (e.g. buildings and ground) by using height-from-ground methods and removing non-vegetation with classifiers which use LiDAR-derived features both from raster products and from the distribution of point characteristics (Höfle et al. 2012). First and last echoes play an important role in discriminating vegetation, trees in particular, from other above-ground elements such as buildings, power lines etc. The spatial distribution of echoes and their return ordinal number are used as input in descriptors, such as the slope adaptive echo ratio, which is correlated with canopy structure and density (Höfle et al. 2008, Rutzinger et al. 2008, Eysn et al. 2012). Brandtberg (2007) uses a different approach, but the same characteristic of multiple return data, to define criteria to separate vegetation from other elements, using digraphs (Ross & Wright 1992) to classify points accordingly.
Several LiDAR sensors are able to digitize full-waveform (FW) information of the return signal. Processing FW data does pose a challenge due to the size of the data and the increase in computation time required. It also has advantages which have been proven in recent research and presented in the literature. One is the possibility to extract more intermediate returns with respect to DR data; depending on canopy density and the laser characteristics, three times more intermediate returns are potentially detectable (Reitberger et al. 2008a, 2008b). The processing methods require first detecting peaks and then decomposing the energy distribution around each peak to extract metrics such as the amplitude and width of the Gaussian-like shape. The literature reports numerous methods, ranging from the relatively simple application of a zero-crossing derivative filter followed by calculation of the width with the full-width-at-half-maximum (FWHM) criterion, up to more complex Gaussian-fitting methods or other modelling approaches (Hofton et al. 2000, Roncat et al. 2010, Mallet et al. 2011). These methods differ in accuracy of peak detection and speed of calculation (Laky et al. 2011); see Parrish et al. (2010) for an empirical study of methods. Metrics derived from the waveform shape are not limited to the amplitude (or so-called intensity) and the width; other features can be extracted to describe the distribution of the return energy to the sensor. An informative example is found in Neuenschwander et al. (2009), where the following metrics have been tested for extracting land-cover classes, compared against classification using only high-resolution imagery: Gaussian amplitude, Gaussian standard deviation, canopy energy, ground energy, total waveform energy, ratio between canopy and ground energy, rise time to the first peak, fall time of the last peak, and height of median energy (HOME). The FW-derived metrics significantly improved the supervised classification accuracy.

ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume II-5/W2, 2013. ISPRS Workshop Laser Scanning 2013, 11-13 November 2013, Antalya, Turkey.
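The simplest peak-detection approach mentioned above, a zero-crossing of the first derivative, can be sketched as follows. This is only an illustrative sketch and not the processing chain used in this work: the function name `find_peaks`, the `min_amplitude` noise threshold and the use of a plain first difference (no smoothing) are assumptions made for the example.

```python
def find_peaks(samples, min_amplitude):
    """Return indices of local maxima in a digitized return waveform.

    A peak is taken where the first difference changes from positive to
    non-positive (a zero-crossing of the discrete derivative) and the
    sample value is at least `min_amplitude` (hypothetical noise threshold).
    """
    peaks = []
    for i in range(1, len(samples) - 1):
        rising = samples[i] - samples[i - 1] > 0        # derivative before the sample
        falling = samples[i + 1] - samples[i] <= 0      # derivative after the sample
        if rising and falling and samples[i] >= min_amplitude:
            peaks.append(i)
    return peaks


# Synthetic waveform with two echoes (values are arbitrary digitizer counts).
waveform = [0, 1, 3, 1, 0, 2, 5, 2, 0]
echo_indices = find_peaks(waveform, min_amplitude=2)
```

Real waveforms are noisy, so in practice a smoothing step or a Gaussian fit around each candidate peak would normally follow this detection.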
An important point to consider when using FW metrics is their independence from factors other than what is being analysed. Correlation between metrics and flight characteristics has been a topic of research, especially in terms of radiometric calibration. To normalize a metric, ideally all other dependencies have to be removed. That is the reason many methods make use of either manually positioned targets or "natural" targets such as asphalt roads (Wagner et al. 2008) or flat roofs. Jutzi & Gross (2009) have reported on different methods proposed in recent literature. Höfle & Pfeifer (2007) have reduced radiometric systematic errors by applying specific corrections depending on the type of influence which determined the error.
In the following work we analyse the correlations between seven metrics extracted from the waveform and scan geometry (scan angle and range) and land-cover (forests, urban, low vegetation/grasslands).

Study area
The survey was carried out on 20 June 2011 with a helicopter equipped with Optech's ALTM 3100 sensor and a Rollei AIC modular P45 digital metric camera.

Principles of laser backscatter behaviour
The radar equation relates the energy incoming to the receiver of the laser sensor to the scattered energy as a function of the energy incident on the target and other parameters as below (from Wagner et al. 2008):

$$P_s = \frac{P_i D_r^2}{4 \pi R^4 \beta^2} \sigma$$

where $P_i$ and $P_s$ are respectively the incident and scattered energy, $D_r$ is the diameter of the receiver aperture, $\beta$ is the emitter beam width, $R$ is the target range from the emitter and $\sigma$ is the target's backscatter cross-section, which can in turn be derived (from Wagner et al. 2006) as:

$$\sigma = C_{cal} R^4 \hat{P} s_p$$

where $C_{cal}$ is the calibration parameter, $R$ is the range between emitter and target, and $\hat{P}$ is the width and $s_p$ is the amplitude of the backscatter cross-section.
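As a minimal numerical sketch of the cross-section described above, the function below multiplies the echo amplitude and width by the fourth power of the range and by a calibration constant. The function name, the argument names and the default `c_cal=1.0` are assumptions for illustration only; an actual calibration constant has to be estimated from reference targets.

```python
def backscatter_cross_section(amplitude, width, range_m, c_cal=1.0):
    """Range-corrected cross-section: c_cal * R^4 * amplitude * width.

    `c_cal` defaults to 1.0 only as a placeholder (assumption); with an
    uncalibrated constant the result is a relative value, comparable
    between echoes of the same survey but not an absolute quantity.
    """
    return c_cal * range_m ** 4 * amplitude * width


# Same echo shape recorded at 10 m and at 20 m range (illustrative numbers).
near = backscatter_cross_section(amplitude=2.0, width=3.0, range_m=10.0)
far = backscatter_cross_section(amplitude=2.0, width=3.0, range_m=20.0)
ratio = far / near  # the R^4 factor makes this 2**4
```

The example makes the weight of the range term explicit: an unchanged amplitude and width observed at twice the range correspond to a cross-section sixteen times larger.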

Metrics
In the following work we analyse the correlations between scan geometry and metrics extracted from the waveform. The metrics taken into consideration are listed below, referencing Figure 2 and with an overview of the calculation method:
- Normalized Intensity (I); a decimal value calculated as the ratio between the amplitude at the detected peak (Peak Energy -

Each metric has been extracted for six strips: four over areas almost completely covered with forests (strips 7-9 and 11) and two over flat areas with agricultural and urban land (strips 2 and 3).
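The kind of per-echo metrics discussed in this work can be sketched on a single digitized echo as below. This is a hedged illustration, not the extraction code used here: the names `half_max_crossings` and `echo_metrics`, the linear interpolation of the half-maximum crossings, the discrete sums used for the energies, and the definitions of LW and LE (taken by symmetry with RW and RE) are all assumptions. The normalized intensity is taken as peak value over total energy above the baseline, and WAI as LW/RW; swap the ratio if the other convention is intended.

```python
import math


def half_max_crossings(samples, peak, baseline=0.0):
    """Interpolated positions where the echo crosses half of its peak height."""
    half = baseline + (samples[peak] - baseline) / 2.0
    i = peak
    while i > 0 and samples[i - 1] >= half:   # walk left while still above half maximum
        i -= 1
    if i > 0:
        left = (i - 1) + (half - samples[i - 1]) / (samples[i] - samples[i - 1])
    else:
        left = 0.0                            # echo truncated at the start of the record
    j = peak
    last = len(samples) - 1
    while j < last and samples[j + 1] >= half:  # walk right while still above half maximum
        j += 1
    if j < last:
        right = j + (samples[j] - half) / (samples[j] - samples[j + 1])
    else:
        right = float(last)                   # echo truncated at the end of the record
    return left, right


def echo_metrics(samples, peak, baseline=0.0):
    """Metrics for one echo; assumes samples[peak] is above the baseline."""
    left, right = half_max_crossings(samples, peak, baseline)
    above = [max(s - baseline, 0.0) for s in samples]   # energy above baseline
    lw = peak - left                                    # left (rise-side) width
    rw = right - peak                                   # right (fall-side) width
    return {
        "I": above[peak] / sum(above),                  # normalized intensity
        "LW": lw,
        "RW": rw,
        "FWHM": lw + rw,
        "WAI": lw / rw if rw > 0 else None,             # asymmetry ratio (assumed LW/RW)
        "LE": sum(above[math.ceil(left):peak + 1]),     # samples from left half max to peak
        "RE": sum(above[peak:math.floor(right) + 1]),   # samples from peak to right half max
    }


symmetric = echo_metrics([0, 1, 2, 3, 4, 3, 2, 1, 0], peak=4)
skewed = echo_metrics([0, 4, 3, 2, 1, 0], peak=1)
```

In this sketch a symmetric echo gives a WAI of one, while the echo with a sharp rise and slow fall gives a WAI well below one.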

Waveform and metrics extraction
Laser scanner data recorded during the flight are stored by the system in Optech data file formats. One file (CSD) contains the sensor position, orientation and scan angle at each emitted laser pulse, as well as range and intensity for up to four return echoes defined by on-board processing. The waveform data itself is stored in a database file (NDF) which holds variable-length records, with a GPS timestamp, the outgoing pulse waveform and up to 7 segments of the corresponding return waveform. Two files are used for indexing, using GPS time and the index of position in the CSD and NDF files. The process for extracting the metrics is depicted in figure 3. The positioning of the return pulses is calculated using transformation matrices between scan world and real world coordinates, with a correction factor to compensate for the missing information on sensor internal geometry.

2). This would bring larger amplitude readings and larger widths; the four topmost plots represent the size of the upper half of the waveform peak, not normalized by range. In fact this result would seem to be opposite to what one would expect to be the effect of vegetation on the waveform, namely a broader width of the energy around the peak. The same result can be seen from the FWHM (the width), which is represented in the third row, right plot, in figure 4. Also in this case no normalization was applied, unlike the CS value, to which normalization has been applied as discussed later in this section.
The intensity metric is instead normalized by the method of extraction, as reported in section 2.4. In this case the plot of the frequency distribution of intensity values shows most points with high intensity values for strip 3 (urban and grassland), a bi-modal distribution for strip 2, and very similar distributions among the remaining strips, which are all in the forested areas. The bi-modal distribution in strip 2 is explained by the presence of fields at different stages of crop growth (fields with crops and bare fields), as shown in figure 7. Half of the points are over fields with crops, the other half over fields without crops. This bi-modal shape is also seen in row 2 of figure 4, where LE and RE are reported, and in RE, but not in the rest of the metrics plotted. The width-related metrics LW and FWHM are not influenced significantly by low vegetation such as crops, whereas RW is, as can be seen in the plot in figure 4. This suggests that, in this specific case, the so-called "fall time" (Neuenschwander et al. 2009) of the first pulse is a significant discriminator of low vegetation against grassland, along with the intensity value, in flat areas.
No correlation was found between sensor/target range and intensity values in any of the strips analyzed. This supports the use of the normalization procedure of the peak intensity, namely assigning as intensity the value at the peak divided by the total energy values for the waveform. This automatically compensates for the range effect, which is considered to act on the return energy with the square of the distance: of two identical targets, the one at twice the distance from the sensor will return one quarter of the intensity in experimental conditions. To check this we have plotted in figure 6 the amplitude values without normalization. It is clear that there is a strong dependence which follows, almost but not exactly, the square of the distance. Some points show a deviation from the main distribution, almost all below it. These are echoes after the first echo, where energy loss is related more to obstruction by previous targets than to range.
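One way to quantify how closely un-normalized amplitude follows an inverse-square dependence on range is to fit a power law in log-log space and inspect the exponent. The sketch below is an assumption-laden illustration rather than the analysis performed here: the function name `fit_power_law`, the ordinary least-squares fit and the synthetic input values are all hypothetical, and only first echoes should be fed to it, since later echoes lose energy to preceding targets.

```python
import math


def fit_power_law(ranges, amplitudes):
    """Least-squares fit of amplitude = k * range**exponent in log-log space.

    Returns (exponent, k). Inputs must be strictly positive.
    """
    xs = [math.log(r) for r in ranges]
    ys = [math.log(a) for a in amplitudes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    exponent = sxy / sxx                      # slope of the log-log regression line
    k = math.exp(mean_y - exponent * mean_x)  # intercept converted back to linear scale
    return exponent, k


# Synthetic first-echo amplitudes following an exact inverse-square law.
ranges_m = [500.0, 750.0, 1000.0, 1500.0]
amplitudes = [1.0e6 / r ** 2 for r in ranges_m]
exponent, k = fit_power_law(ranges_m, amplitudes)
```

An exponent close to -2 would indicate an inverse-square range effect; a value that departs from it would be consistent with the "almost but not exactly" behaviour noted above.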
Figure 6. Amplitude values as function of range in strip 8

The WAI indicator, as the bottom-left plot of figure 4 shows, has a distribution of values below one for all strips except strip 3 (table 3). Values below one indicate that the left width is larger than the right width, thus the width of the waveform around the peak is shifted towards the right, and vice versa for values above one. Results show that the WAI is not correlated with range and only weakly with the type of land cover. It is interesting to note that most values (up to the third quartile) are below one for most strips except 2 and 3. This suggests that vegetation cover increases the rise time of the first peak to a greater degree than its fall time. The CS metric, the normalized cross-section, when plotted shows significant differences between strips. Forest cover returns more variance in the values of the cross-section than the two non-forested strips. In this case we have used the backscatter cross-section and not the backscatter coefficient as described in Wagner et al. (2008), as the latter is indicated for single-echo returns and not for multiple targets with small areas, such as in the case of vegetation, which is the objective of this study.

CONCLUSIONS
Metrics from FW can be used in various applications as they are related to the texture of the targets which they intercept and to the structure of elements which only partially obstruct the laser cone (e.g. vegetation). Careful consideration is needed when using such metrics as feature vectors in unsupervised or supervised classifiers for segmenting the area into classes of interest. The results show that scan geometry does not have a significant impact on scans over forest cover, except for range, which affects the amplitude and width distributions. From a practical point of view, and in the specific case of vegetation land-cover, scan geometry correlation is removed by calculating the cross-section with a range correction as in equation 3 and by using the normalized intensity value, which considers the peak energy over the total return energy above the baseline energy.
Several factors which in the literature have been shown to directly or indirectly influence components of FW metrics are not interpretable (e.g. incidence angle) over complex targets such as vegetation canopy. The cross-section is the optimal metric to use over forest cover to investigate the possibility of further class specification, for example the ability to differentiate upper forest structure or dead trees vs. live trees. This will be the topic of further investigation.

Figure 2. Schema of FW and metrics' graphical representation

- Right Energy (RE), the total energy from the peak value time to half maximum of falling energy;
- Waveform Asymmetry Indicator (WAI); this metric represents the shape of the waveform by reporting its asymmetry as the ratio between left and right width, LW/RW. Values above one

Figure 3. Work-flow of data extraction

Figure 4a. Metrics' values frequency distributions for each strip

Figure 7. Strip 2 -aerial image of area (left) and intensity map (right)

As reported in figure 1, the area is located in the north-east part of Italy. Six survey strips are taken into consideration for data processing and result assessment, four in forested mountainous areas and two in flat crop/urban areas for comparison. Table 1 summarizes technical characteristics. Table 2 provides a short summary listing the number of the flight strip, the number of laser pulses emitted, and the mean and standard deviation of the distances between sensor and target in each strip considered.

Table 2. Characteristics of analysed strips

Figure 1. Study area with data strips representation