NON-LINEAR METHODS FOR INFERRING LIDAR METRICS USING SPOT-5 TEXTURAL DATA

Although many studies have demonstrated the utility of airborne lidar for forest inventory, the acquisition and processing of the data can be cost prohibitive for small areas. In such cases, it may be possible to emulate lidar metrics using more affordable optical data. This study explored processing methods for predicting lidar metrics using SPOT-5 textural data. Multiple-linear regression (MLR) was compared with non-linear machine learning techniques including multi-layer perceptron (MLP) artificial neural networks (ANN), radial basis function (RBF) ANN and regression tree (RT). For this purpose, 11 grey level co-occurrence matrix (GLCM) indices were calculated for bands, band ratios and principal components (PCs) of a SPOT-5 multispectral image. SPOT-5 metrics were correlated with 25 lidar metrics collected over a Pinus radiata plantation. After dimensionality reduction, random forest feature selection was applied to select the most relevant SPOT-5 textural attributes for inferring each lidar metric. The results showed that the non-linear MLP and RBF methods are more promising for modelling lidar metrics using SPOT-5 data than MLR and RT.


INTRODUCTION
The quantification of forest structure is used for many forest management purposes, including the assessment of productivity and wood volume based on parameters such as basal area, stand volume, and stocking (Wulder, 1998). Accurate quantification of forest structure variables and their characteristics assists local or global decisions on forest harvesting, management and protection (Boyd and Danson, 2005).
Airborne lidar has been shown to be an efficient data source for quantification of forest variables (Popescu et al., 2002; Bortolot and Wynne, 2005) as well as for fusion with other types of data, especially optical data (McCombs et al., 2003; Popescu et al., 2004); however, the use of lidar data is constrained by the high data acquisition cost, time-consuming data processing, and limited existing coverage (Sexton et al., 2009). This can be a particular issue for small isolated estates which cannot take advantage of the economies of scale afforded by large lidar projects. To overcome these limitations, recent remote sensing studies have aimed to predict lidar metrics using optical data with different resolutions (Hilker et al., 2008; Chen and Hay, 2011) for areas which lack lidar coverage.
Lidar-derived mean and maximum canopy heights are commonly the main lidar metrics modelled using a multiple-linear regression (called MLR hereafter) modelling approach (Wulder and Seemann, 2003; Hilker et al., 2008). According to other studies, although mean and maximum canopy heights are useful parameters for quantifying the structure of pine forests or plantations (Pascual et al., 2008), other lidar-derived height metrics such as variance (Zimble et al., 2003), median (Pascual et al., 2008), coefficient of variation (Ritchie et al., 1993), skewness, and percentiles of lidar-derived canopy heights (Shamsoddini et al., 2013b) are also useful for estimating structural parameters when lidar data is used individually or in synergy with other remotely sensed data. Various studies have shown that spectral derivatives and textural information extracted from optical data can be correlated with lidar metrics (Pascual et al., 2010; Chen and Hay, 2011). The grey level co-occurrence matrix (GLCM) method is one approach that has been frequently used in forest structure mapping (Ota et al., 2011). This method is used in this study to extract the textural information of SPOT-5 multispectral data.
ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume II-5/W2, 2013. ISPRS Workshop Laser Scanning 2013, 11-13 November 2013, Antalya, Turkey.
Several non-linear machine learning methods, such as multi-layer perceptron ANN (called MLP hereafter), radial basis function ANN (called RBF hereafter), and decision or regression trees have been commonly used for different remote sensing applications, especially for land cover classification (Atkinson and Tatnall, 1997; Keramitsoglou et al., 2005; Hsieh, 2009), as well as for estimating different structural parameters of forests (Shamsoddini et al., 2011; Gómez et al., 2012). However, most lidar metric prediction studies have focused only on MLR, and hence there is a need to compare this approach with alternatives such as non-linear modelling methods, including machine learning techniques such as regression tree (called RT hereafter), MLP, and RBF. To undertake such an analysis, a plot-based method was used to investigate the performance of textural attributes of SPOT-5 multispectral data for predicting 25 lidar metrics using non-linear and linear methods. Hence the aims of this paper are to:

• Evaluate the utility of SPOT-5 textural data for predicting lidar metrics.

• Compare MLR as a common modelling method with non-linear machine learning techniques including MLP, RBF, and RT for predicting lidar metrics using optical data.
In the following sections, the study area and the remotely sensed data used for this study are described. The methodology and results are then presented and discussed, and the final section concludes the paper.

Study Area
The study area covers a 5000 ha Pinus radiata plantation from 35° 23′ 35″ S to 35° 29′ 58″ S latitude, and 147° 58′ 48″ E to 148° 04′ 02″ E longitude, located within Green Hills State Forest (SF), near the town of Batlow in New South Wales, Australia. Green Hills SF includes 835 compartments with a net planted area of 20,400 ha. Figure 1 shows the study area on a false colour composite image of SPOT-5.
Figure 1. The study area shown on a false colour composite SPOT-5 image; the yellow rectangle shows the boundary of the 5000 ha study area used for this research, while the green line shows part of the Green Hills SF boundary.

Remotely Sensed Data and Pre-processing
Multispectral SPOT-5 imagery, including green, red, near infrared (NIR) and shortwave infrared (SWIR) bands, was acquired on 5 April 2008. The orthorectified SPOT-5 data was provided with a spatial resolution of 10 m; the SWIR image, whose pixel size was originally 20 m, was resampled to 10 m. The orthorectified SPOT-5 multispectral image was registered to an orthorectified WV-2 image (2 m pixel resolution) based on 50 identified common points, using a first order polynomial function followed by nearest neighbour resampling.
Registration accuracy was estimated to be better than half a pixel. A dark object subtraction method (DOS3) was used to atmospherically correct the optical data, and digital numbers were converted to reflectance values. The value of cos(i), where i is the incident angle between the sun and a horizontal surface, was calculated according to Riano et al. (2003). There was no need for topographic correction, as the relationship between cos(i) and the radiance of each band showed no significant correlation after removal of path radiance.
Airborne lidar data supplied by FNSW was acquired in July 2008 using a HARRIER 56/G3 fully-integrated sensor with an LMS-Q560 laser scanner (Riegl, Austria). The acquisition parameters were set to achieve a pulse rate of 88,000 Hz, a 60 cm footprint size and 2 pulses per m² with a maximum scan angle of 15°. Following the collection of the lidar data, a 0.5 m resolution DTM was generated by applying a standard triangular irregular network (TIN) modelling technique. A digital surface model (DSM) with matching pixel resolution was generated by selecting the highest lidar point elevation value per cell. Finally, the DTM was subtracted from the DSM to construct a canopy height model (CHM). The quality of the CHM was further improved by removing canopy pitting based on a new method developed by Shamsoddini et al. (2013b), which incorporated an adaptive mean filter (AMF) within a 7×7 search window.
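The DSM and CHM construction described above can be illustrated with a minimal sketch. It assumes lidar returns are given as coordinate arrays, that the grid origin is at (0, 0), and that row indices increase with y; the function names are hypothetical and the TIN-based DTM and the pit-removal filter are not reproduced.

```python
import numpy as np

def highest_point_dsm(x, y, z, cell_size, n_rows, n_cols):
    """Grid lidar returns by keeping the highest elevation in each cell."""
    dsm = np.full((n_rows, n_cols), np.nan)
    cols = (np.asarray(x) // cell_size).astype(int)
    rows = (np.asarray(y) // cell_size).astype(int)
    for r, c, h in zip(rows, cols, z):
        if 0 <= r < n_rows and 0 <= c < n_cols:
            # Cells start empty (NaN); keep the largest elevation seen so far.
            if np.isnan(dsm[r, c]) or h > dsm[r, c]:
                dsm[r, c] = h
    return dsm

def canopy_height_model(dsm, dtm):
    """CHM is the per-cell difference between surface and terrain elevation."""
    return dsm - dtm

# Three returns on a 1 x 2 grid of 0.5 m cells (illustrative values only).
dsm = highest_point_dsm([0.1, 0.2, 0.7], [0.1, 0.3, 0.1], [5.0, 7.0, 6.0], 0.5, 1, 2)
chm = canopy_height_model(dsm, np.array([[1.0, 2.0]]))
```

Cells that receive no return stay as NaN in this sketch, so a real workflow would need an interpolation or masking step that the text does not describe.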

Sample Design
Sampling data was required for model training and testing purposes and to produce a validation dataset for adjusting the machine learner parameters and the feature selection method. The sample population within the plantation estate was defined using existing Geographic Information System (GIS) vector layers supplied by FNSW. Stand information such as ground slope, age class and thinning condition was used to stratify the forest based on a previous FNSW study (Stone et al., 2010; Turner et al., 2011). Irrelevant areas such as eucalyptus stands, bare land and grassland were also masked out using GIS data. Lidar samples were collected in three different strata using the GIS map of the plantation: slope (less than 10°, between 10° and 20°, and more than 20°); thinning condition (unthinned, first thinning and second thinning); and tree age (less than 20 years and more than 20 years).
The potential issue of spatial autocorrelation violating the assumption of sample independence (Congalton and Green, 2009) has been shown to be a function of the spatial resolution of remotely sensed data, the scale of study, and the age classes of coniferous stands (Cohen et al., 1990; Hyppänen, 1996). Therefore, spatial autocorrelation had to be examined for all the remotely sensed datasets and strata to determine the minimum separation between samples that prevents autocorrelation. Semi-variograms are the most common method for determining the minimum distance at which spatial autocorrelation is expected to occur among pixels of remotely sensed data (Hyppänen, 1996; Popescu et al., 2004). Thinning conditions and age classes are factors affecting the autocorrelation distance over pine plantations (Atkinson and Danson, 1988; Cohen et al., 1990; Mason et al., 2007). For this reason, three sites including three age classes were considered along with two extreme thinning conditions, unthinned and second thinning, selected by the random-systematic method. The semi-variograms need to be calculated for the optical image band which reveals the most information about the plantation structure (Pascual et al., 2010). According to the findings of Shamsoddini et al. (2012), the NIR band of SPOT-5 is relatively highly correlated with the structural parameters of the pine plantation, especially mean height. For this reason, this band was selected along with the lidar-derived CHM for calculating a semi-variogram over the selected areas.
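As an illustration of how a semi-variogram is obtained from raster data, the sketch below computes the empirical semi-variance for increasing pixel lags. It is a simplified example that assumes lags are taken along image rows only; the function name is hypothetical and no variogram model is fitted.

```python
import numpy as np

def empirical_semivariogram(values, max_lag):
    """Semi-variance along image rows for pixel lags 1..max_lag."""
    values = np.asarray(values, dtype=float)
    gammas = []
    for h in range(1, max_lag + 1):
        # Differences between every pixel and the pixel h columns to its right.
        diffs = values[:, h:] - values[:, :-h]
        gammas.append(0.5 * np.mean(diffs ** 2))
    return np.array(gammas)

# A linear ramp: the semi-variance keeps growing with lag (no sill).
gamma = empirical_semivariogram([[0.0, 1.0, 2.0, 3.0]], max_lag=2)
```

In practice, the lag at which the curve levels off (the range), multiplied by the pixel size, gives the separation beyond which samples can be treated as spatially independent.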

Attribute Extraction
Statistical attributes were calculated for the collected samples within 30 m radius plots as discussed in Shamsoddini et al. (2013a). Lidar metrics included mean (ME); median (M); maximum (MAX); standard deviation (ST); variance (VAR); coefficient of variation (CV); relative range, i.e. range divided by the mean (RRA); standard error of the mean, i.e. standard deviation divided by the square root of the number of pixels (SEM); skewness (SK); and kurtosis (KU). Moreover, all pixels representing heights above the mean height within each plot were used to calculate the mean, median, standard deviation, variance, range (R_AM), coefficient of variation, skewness and kurtosis values above mean (denoted by the subscript AM). In addition, the 10th, 20th, 30th, 40th, 60th, 80th, and 90th percentiles of height were calculated. For SPOT-5, 11 GLCM indices, as used in Shamsoddini (2012), were calculated for four window sizes (3×3, 5×5, 7×7, and 9×9) and four orientations (0°, 45°, 90° and 135°). According to Shamsoddini et al. (2013a), bands, band ratios and principal components (PCs) provide different types of information. For this reason, the textural indices were calculated for bands, band ratios and PCs derived from the SPOT-5 multispectral image.
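To make the GLCM step concrete, the following sketch builds a symmetric, normalised co-occurrence matrix for one window and one pixel offset, and derives three indices from it. The 11 indices used in the study are not listed in this section, so the three shown here (mean, contrast, homogeneity) and their commonly used Haralick-style formulas are assumptions, as are the function names and the requirement that the window is already quantised to integer grey levels.

```python
import numpy as np

def glcm(window, levels, dx, dy):
    """Symmetric, normalised co-occurrence matrix for one pixel offset (dx, dy)."""
    window = np.asarray(window, dtype=int)
    counts = np.zeros((levels, levels), dtype=float)
    rows, cols = window.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = window[r, c], window[r2, c2]
                counts[i, j] += 1
                counts[j, i] += 1  # count each pair in both directions
    return counts / counts.sum()

def glcm_indices(p):
    """Mean, contrast and homogeneity of a normalised GLCM (assumed formulas)."""
    i, j = np.indices(p.shape)
    return {
        "mean": float((i * p).sum()),
        "contrast": float((((i - j) ** 2) * p).sum()),
        "homogeneity": float((p / (1.0 + (i - j) ** 2)).sum()),
    }

# Offsets (dx, dy) for the four orientations, with rows increasing downwards.
OFFSETS = {0: (1, 0), 45: (1, -1), 90: (0, -1), 135: (-1, -1)}

p = glcm([[0, 1], [0, 1]], levels=2, dx=OFFSETS[0][0], dy=OFFSETS[0][1])
indices = glcm_indices(p)
```

Applying these functions in a moving 3×3 to 9×9 window over each band, band ratio and PC would produce one texture layer per index, window size and orientation, which explains the large number of candidate attributes.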

Dimensionality Reduction and Feature Selection
Due to the large amount of redundancy among the generated textural attributes of SPOT-5 data, the number of attributes was reduced based on the absolute value of the Pearson correlation coefficient calculated for each possible pair of attributes. For each attribute, the sum of the absolute correlation coefficients between that attribute and all the others was then calculated. Each pair of attributes whose correlation coefficient was higher than 0.90 was considered redundant, and the attribute of the pair with the higher total correlation was removed. This reduction process identified 403 attributes for SPOT-5 data.
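The correlation-based reduction can be sketched as follows. The order in which pairs are visited, and the choice to skip pairs involving an attribute that has already been removed, are assumptions, since the text does not specify them; the function name is hypothetical.

```python
import numpy as np

def prune_correlated(X, threshold=0.90):
    """Return column indices kept after removing redundant attributes."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    # Total absolute correlation of each attribute with all the others.
    total = corr.sum(axis=0) - 1.0
    n = X.shape[1]
    keep = np.ones(n, dtype=bool)
    for a in range(n):
        for b in range(a + 1, n):
            if keep[a] and keep[b] and corr[a, b] > threshold:
                # Drop the member of the pair with the higher total correlation.
                drop = a if total[a] > total[b] else b
                keep[drop] = False
    return np.flatnonzero(keep)

# Synthetic check: column 1 is almost a copy of column 0, column 2 is independent.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
z = rng.normal(size=200)
X = np.column_stack([x, x + 0.01 * rng.normal(size=200), z])
kept = prune_correlated(X)
```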
A random forest feature selection (RFFS) method was used before applying the machine learning methods in this study. To implement the RFFS method, the following steps were used (Svetnik et al., 2004):
1. Fit a random forest (RF) for the dependent variable using all independent attributes and calculate the mean square error (MSE) of the fitted model on the validation dataset.
2. Calculate the absolute values of the residuals between the predicted and measured values.
3. Calculate the variable importance of each independent attribute using permutation.
4. Remove the least important attributes. The number removed is the total number of attributes multiplied by the drop fraction, which was set to 0.2 (i.e. 20% of the attributes).
5. Repeat steps 1 to 4, but without repeating step 3.
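The elimination schedule in steps 4 and 5 can be sketched independently of the random forest itself. In the example below the attribute ranking is assumed to be computed once (step 3) and passed in, the scoring function is a hypothetical callable standing in for the RF fit and validation MSE, and rounding the number of dropped attributes down (with a minimum of one) is an assumption.

```python
import math
from typing import Callable, List, Sequence, Tuple

def backward_elimination(
    ranked: Sequence[int],
    score: Callable[[Sequence[int]], float],
    drop_fraction: float = 0.2,
) -> List[Tuple[List[int], float]]:
    """Record (attribute subset, score) while dropping the least important attributes.

    `ranked` lists attribute indices from most to least important.
    """
    current = list(ranked)
    history = []
    while current:
        history.append((list(current), score(current)))
        # Remove 20% of the remaining attributes, but at least one per epoch.
        n_drop = max(1, math.floor(drop_fraction * len(current)))
        current = current[: len(current) - n_drop]
    return history

# Toy scorer that simply returns the subset size, to show the schedule.
history = backward_elimination(list(range(10)), lambda subset: float(len(subset)))
sizes = [len(subset) for subset, _ in history]
```

With ten ranked attributes the subsets shrink as 10, 8, 7, 6, 5, 4, 3, 2, 1; in the study the score for each subset would be the validation MSE averaged over repeated RF fits.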
To smooth out the results derived from each RF, step 1 was repeated 10 times for each epoch and the average of the MSEs from the 10 trials was used. Traditionally, the smallest set of attributes providing the minimum error rate is selected as the suitable attributes. It is also recommended to use the minimum MSE value plus one standard deviation of the MSEs, called the error range, for selecting the final number of attributes. In this study, the second condition was applied together with the first for selecting the suitable number of attributes. After calculating the absolute residuals for each random forest (step 2), a paired-samples t-test was used to examine whether the difference between residual sets derived from different numbers of attributes was significant, provided the MSEs were within the error range. If the difference was significant for all residual sets, the number of attributes presenting the minimum error rate was selected. Otherwise, among the attribute sets presenting a lower error rate without a significant difference in prediction error, the set containing the fewest attributes was selected.
The number of iterations was the only parameter adjusted prior to using RF. Consequently, to find a suitable number of trees for each random forest, the number of iterations was set to 10, 50, 100, 200, and 300. After the selection of the suitable attributes for each number of iterations, the selected attributes were tested on a validation dataset to calculate the prediction accuracy of each set of attributes for each lidar metric. The set of attributes which provided the most accurate predictions over the validation samples for each lidar metric was selected as the suitable attributes for that lidar metric. The required program was coded in MATLAB 7.9.0.

Modelling
The textural attributes selected by the RFFS method for each lidar metric were provided to the four modelling methods: MLR, MLP, RBF, and RT. As mentioned, MLR has been the common method for modelling lidar metrics using optical data. MLP is a form of artificial neural network (ANN) applied for supervised and unsupervised learning. Predictor data (x_1, x_2, ..., x_n) and response data (y_1, y_2, ..., y_n) are provided for supervised learning with the aim of minimising an objective function (also called a cost function or error function). The MLP architecture is formed by a number of interconnected nodes (or neurons) which are simple processors of weighted inputs. In general, the MLP architecture contains three layers: input, hidden and output (Atkinson and Tatnall, 1997). Different types of activation function, such as Gaussian, sigmoid, hyperbolic tangent and sinusoidal functions, are used in the hidden layer of MLP (Tan et al., 2011). In this paper, a Gaussian activation function was used (Tan et al., 2011). The implementation of MLP requires five parameters to be adjusted before training begins: the number of epochs, the desired value of error, the number of neurons, the learning rate, and the momentum (Atkinson and Tatnall, 1997).
RBF is a common neural network method (Benmokhtar and Huet, 2006) which approximates an unknown function using a weighted sum of h-dimensional radial activation functions (Benoudjit and Verleysen, 2003). A gradient descent algorithm proposed by Karayiannis (1999) was used for RBF training and for determining the centres and widths of the RBF along with their weights and bias. Before starting the RBF training, six parameters were adjusted: the learning rate of the weights and bias, the learning rate of the centroids, the learning rate of the widths, the number of centroids, the number of iterations, and the desired error. The MLP and RBF parameters were adjusted using a validation dataset in a recursive process which alters the value of each parameter until the prediction error reaches its minimum. This process was repeated 10 times for each value of each parameter, and the average root mean square error (RMSE) was used for selecting the appropriate values.
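The idea of approximating a function by a weighted sum of radial activation functions can be shown with a short forward-pass sketch. A Gaussian radial function is assumed here, and the function and parameter names are hypothetical; the gradient-descent training of centres, widths, weights and bias is not reproduced.

```python
import numpy as np

def rbf_predict(X, centres, widths, weights, bias):
    """Output of an RBF network: bias plus a weighted sum of Gaussian units."""
    X = np.asarray(X, dtype=float)
    # Squared distance from every sample to every centre: (n_samples, n_centres).
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * widths ** 2))
    return phi @ weights + bias

# One Gaussian unit centred at the origin (illustrative values only).
centres = np.array([[0.0, 0.0]])
widths = np.array([1.0])
weights = np.array([2.0])
pred = rbf_predict([[0.0, 0.0], [10.0, 10.0]], centres, widths, weights, bias=0.5)
```

A sample at the centre receives the full weight of the unit, while a distant sample falls back to the bias, which is the local behaviour that distinguishes RBF units from the global response of a linear model.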
RT aims to recursively partition the input samples in a binary manner (Steinberg and Colla, 1997). The algorithm starts from the root node, where all samples are provided for training. The root node is split into two child nodes, which are further split into child nodes. The splitting continues until no further splitting is allowed due to the lack of samples (Steinberg and Colla, 1997). After constructing the maximum-size tree, a 'pruning phase' is required to improve the generalization capability of the tree. The method proposed by Drucker (1997) was used to prune the constructed trees.
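A single binary split of the kind applied at each node can be sketched as below. The split criterion, minimising the summed squared error of the two child nodes, is the usual choice for regression trees but is an assumption here, as the text does not state it; the function name is hypothetical and pruning is not shown.

```python
import numpy as np

def best_split(x, y):
    """Find the threshold on one attribute that minimises the children's squared error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_threshold, best_sse = None, np.inf
    for k in range(1, len(xs)):
        if xs[k] == xs[k - 1]:
            continue  # no threshold can separate identical attribute values
        left, right = ys[:k], ys[k:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            # Place the threshold midway between the two neighbouring values.
            best_threshold, best_sse = (xs[k - 1] + xs[k]) / 2.0, sse
    return best_threshold, best_sse

threshold, sse = best_split([1, 2, 3, 10, 11, 12], [1, 1, 1, 5, 5, 5])
```

Growing a tree repeats this search over all attributes at every node, which is why the unpruned tree can fit the training samples very closely and needs the pruning phase.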

Assessment and Comparison
After developing different models for predicting 25 lidar metrics based on the training dataset, the RMSE and coefficient of determination (R²) were calculated to determine the accuracy of the lidar metric predictions for 161 test samples. The RMSE derived for each lidar metric was divided by the mean of the measured values of that lidar metric and multiplied by 100 to give the relative percentage error. To compare different methods, the absolute residuals between the predictions and field measurements were calculated, and the paired-samples t-test was applied to examine whether the difference between the predictions derived from different algorithms was significant.
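The accuracy measures described above can be written out as a short sketch. R² is computed here as one minus the ratio of residual to total sum of squares, which is one common definition and an assumption about the one used in the study; the function names are hypothetical, and the paired t statistic is returned without a p-value.

```python
import numpy as np

def rmse(measured, predicted):
    measured, predicted = np.asarray(measured, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((predicted - measured) ** 2)))

def relative_rmse(measured, predicted):
    """RMSE divided by the mean of the measured values, in percent."""
    return 100.0 * rmse(measured, predicted) / float(np.mean(measured))

def r_squared(measured, predicted):
    measured, predicted = np.asarray(measured, float), np.asarray(predicted, float)
    ss_res = ((measured - predicted) ** 2).sum()
    ss_tot = ((measured - measured.mean()) ** 2).sum()
    return float(1.0 - ss_res / ss_tot)

def paired_t_statistic(abs_res_a, abs_res_b):
    """t statistic for the paired difference of two sets of absolute residuals."""
    d = np.asarray(abs_res_a, float) - np.asarray(abs_res_b, float)
    return float(d.mean() / (d.std(ddof=1) / np.sqrt(len(d))))

measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
err = rmse(measured, predicted)
rel = relative_rmse(measured, predicted)
r2 = r_squared(measured, predicted)
t = paired_t_statistic([2.0, 2.0, 3.0, 1.0], [1.0, 1.0, 1.0, 1.0])
```

The t statistic would then be compared with the critical value of Student's t distribution with n - 1 degrees of freedom to decide whether two methods differ significantly for a given lidar metric.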
To determine which algorithm was more efficient than others, lidar metrics with errors of prediction significantly different for two machine learning algorithms were utilized to calculate the scoring matrix equation (Shamsoddini et al., 2013b).

RESULTS AND DISCUSSION
As stated in Section 3.2, RFFS was applied to the textural attributes derived from SPOT-5 data, and the suitable attributes for predicting each lidar metric were selected. Mean (ME), homogeneity (HOM), and contrast (CON) were selected by RFFS more often than the other textural attributes derived from SPOT-5. According to Shamsoddini et al. (2012) and Shamsoddini et al. (2013a), the performance of the GLCM indices is a function of the spectral layer from which these attributes are calculated. The attributes selected from SPOT-5 data were mostly band ratios, especially those derived from the ratio of SWIR to the other bands. In addition, the red band and its band ratios were selected more often than the other bands of SPOT-5. Pascual et al. (2010) showed that the SWIR-related indices of Landsat ETM+ are strongly correlated with lidar metrics. Also, Donoghue and Watt (2006) showed the usefulness of the red band of medium resolution data for predicting lidar-derived heights using Landsat ETM+.
The performance of MLP, RBF, and RT was compared with the more commonly used MLR method. The same attributes as those used for the non-linear machine learners were provided to the MLR. The R² and relative error results derived from individual machine learners, for those lidar metrics which were predicted more accurately than the others using SPOT-5 textural data, are shown in Figure 2. To determine the best method among these individual machine learners, the paired-samples t-test was used to examine whether the difference in prediction error between two methods was statistically significant for each lidar metric. The scoring matrix was then calculated using the lidar metrics with significant differences in prediction error, to compare the relative performance of each method.
Table 1. Scoring matrix results (in percentage) for comparing individual machine learners over the lidar metrics predicted using SPOT-5 textural data.
As Table 1 shows for SPOT-5 data, among the non-linear methods, MLP performs statistically significantly better than RBF and RT, by up to 9% and 19% respectively. Also, RBF is the only non-linear machine learner which performs significantly better (by 13.9%) than MLR. Regarding overall performance, as shown in the last column of Table 1 in bold and italic text, the best performing method among the individual machine learners is MLP, whereas the weakest is RT. Also, the overall performance of MLP and RBF is significantly better than that of multiple-linear regression when SPOT-5 data are used. The relationship between the textural attributes derived from SPOT-5 data and lidar metrics therefore appears to be non-linear rather than linear. For this reason, the non-linear MLP and RBF methods performed better than MLR; however, the inherent weakness of RT, which is also a non-linear regression method, should not be ignored.

CONCLUSION
In this study, the performance of different types of machine learners, including MLP, RBF, and RT, was compared with MLR for predicting 25 different lidar metrics using textural information from a SPOT-5 multispectral image. Prior to using these machine learning approaches, the RFFS method was applied to select the most suitable textural attributes of the optical data for predicting lidar metrics. Correlation, contrast, and mean, selected as suitable textural attributes by both RFFS and stepwise methods, are the most useful GLCM indices. Also, the SWIR and red bands of SPOT-5 data and their ratios with the other bands are useful for predicting lidar metrics. Among the 25 lidar metrics, KU_AM and MAX were predicted with higher accuracy than the others. Overall, the MLP method performed better than the other individual machine learners, including MLR, for predicting lidar metrics using SPOT-5 textural data; however, in the pairwise comparison RBF was the only method which performed significantly better than MLR.
Figure 2. (a) R² and (b) relative error results derived from individual machine learners using SPOT-5 textural data. The colour of the box surrounding the values refers to the method used to derive the value inside the box.
According to Figure 2, the performance of the different methods varies for each lidar metric; however, Figure 2(a) demonstrates that the variation of most lidar metrics is explained better by the MLP and RT methods, while Figure 2(b) shows that most lidar metrics predicted by the MLR method have lower relative error values than those derived from the other methods. Figure 2(a) indicates that the highest R² values among the different lidar metrics were derived from MLP for ST_AM and VAR. Moreover, according to Figure 2(b), the lowest relative errors of 11.6% and 16.8% pertain to KU_AM and MAX when MLR is utilised. Table 1 shows the results of the scoring matrix analysis for each method.
After determining the required distance for sample collection of 70 m, a systematic-random sampling approach was used to select sample plots. The collection of training and test samples was conducted with the following considerations: