OPTIMAL WAVELENGTH SELECTION ON HYPERSPECTRAL DATA WITH FUSED LASSO FOR BIOMASS ESTIMATION OF TROPICAL RAIN FOREST

Above-ground biomass prediction of tropical rain forest using remote sensing data is of paramount importance to continuous largearea forest monitoring. Hyperspectral data can provide rich spectral information for the biomass prediction; however, the prediction accuracy is affected by a small-sample-size problem, which widely exists as overfitting in using high dimensional data where the number of training samples is smaller than the dimensionality of the samples due to limitation of require time, cost, and human resources for field surveys. A common approach to addressing this problem is reducing the dimensionality of dataset. Also, acquired hyperspectral data usually have low signal-to-noise ratio due to a narrow bandwidth and local or global shifts of peaks due to instrumental instability or small differences in considering practical measurement conditions. In this work, we propose a methodology based on fused lasso regression that select optimal bands for the biomass prediction model with encouraging sparsity and grouping, which solves the small-sample-size problem by the dimensionality reduction from the sparsity and the noise and peak shift problem by the grouping. The prediction model provided higher accuracy with root-mean-square error (RMSE) of 66.16 t/ha in the crossvalidation than other methods; multiple linear analysis, partial least squares regression, and lasso regression. Furthermore, fusion of spectral and spatial information derived from texture index increased the prediction accuracy with RMSE of 62.62 t/ha. This analysis proves efficiency of fused lasso and image texture in biomass estimation of tropical forests.


INTRODUCTION
Tropical rain forests store large amount of carbon in plant material and soil (Jaenicke et al., 2008).Disturbance of those forest, such as deforestation, forest degradation, forest fire, and illegal logging, provides emission of carbon dioxide (CO2) which is a major driver of climate changes (Werf et al., 2009;Hansen et al., 2013).Especially tropical peatland forest is a notable carbon dioxide source, which sinks and stores huge amount of carbon consisted of dead and decomposed plant material accumulated over thousands of years (Jaenicke et al., 2008).Carlson et al. (2012) estimated that peatland carbon emissions amounted to 35% of gross carbon emissions from 1990 to 2010.Therefore, successive monitoring tropical peat forests is important for controlling and/or mitigating the global climate changes (DeFries et al., 1999;Foster et al., 2003;Patenaude et al., 2005;Morehouse et al., 2008).A key driver for those monitoring activity is the United Nations Reducing Emissions from Deforestation and Degradation (UN-REDD+) programme, which needs for the development of robust and replicable methods for net accounting of carbon emissions (Cutler et al., 2012).In practical cases, government and environmental scientists have depended on official forest statistics to calculate the gross emission, which has poor temporal coverage and various definitions of forest degradation (Grainger, 2010).However, those monitoring demands quantifying of the carbon emission with accurate and precise methods considering forest carbon dynamics (Brown, 2002;Le Toan et al., 2011).Forest inventory data, which is one of those quantified data, are collected by two types of activity: follow-up surveys of forest resources over large areas, such as at the national level, and the * Corresponding author accumulation of detailed forest data at the ground level.To achieve high accuracy, such detailed information has been mainly collected in visual field surveys by experts (Hyyppä, 2001).However, considering the acquisition of large-area forest inventories, these field measurement activities require a large amount of time, money, and human resources (Segura et al., 2005;Seidel et al., 2011;Wang et al., 2011).In contrast, since remote sensing data, which is obtained by wide-area observation at a single time, has become popular in the last few decades (Rosenqvist et al., 2003;Masek et al. 2008), estimating forest biomass from remote sensing is expected to contribute the monitoring activity.Several studies have already reported the benefits of remote sensing data in reducing the total survey cost and improving the estimation accuracy by frequent observation (Dalponte, 2014).Over the last decade, the remote sensing data used in many studies on biomass estimation were acquired from multispectral sensor, synthetic aperture radar (SAR), and LiDAR (Lu et al., 2014).In recent years, airborne hyperspectral sensors with around 100-200 observation bands and high spatial/spectral resolution have become common.Those hyperspectral sensor data with continuous spectral information enable detailed analysis for distinguishing of tree species (Clark et al., 2005;Martin et al., 1998;Dalponte et al., 2009) and biomass estimation (Axelsson et al., 2013), with LiDAR data (Anderson et al., 2008;Clark et al., 2011;Vaglio et al., 2014).Furthermore, fusion of not only those spectral data but also spatial information, such as morphological profiles and texture information, has enough potential to improve the accuracy in case of non-parametric models (Fauvel et al., 2008;Feret et al., 2013;Fauvel et al., 2013).
To handle high-dimensional feature spaces of hyperspectral data, many studies have used support vector machines (SVM) as a classification method (Mountrakis et al., 2011) in the nonparametric model and Partial Least Squares (PLS) regression as a biomass prediction model (Vaglio et al., 2014); however, there is a concern about a small sample size problem which is widely exist as overfitting in using high dimensional data where the number of training samples is smaller than the dimensionality of the samples due to limitation of require time, cost, and human resources for field surveys in practical monitoring cases.Therefore, the prediction models often result in the poor generalization capability (Pappu, 2014).Considering sensor mechanism and practical measurement, energy acquired in each band is not enough to generate high signal-to-noise ratio (S/N) due to the very narrow band interval of typical hyperspectral imaging spectrometers (Gao et al., 2013).The noise in the hyperspectral data can be categorized in two types: periodic noise and random noise.The periodic noise can be removed using suitable procedures considering its fixed pattern.However, the random noise cannot be removed completely due to unpredictable pattern (Landgrebe et al., 1986;Corner et al., 2003) Therefore spectral features in hyperspectral imagery can provide the poor performance of the prediction model as a result of noise influence.Also, mass spectrometry data, in not only remote sensing but also other fields, usually have unwanted local or global shifts of peaks due to instrumental instability or small differences in experimental conditions.This peak shift may weaken the strength of rich information from hyperspectral data in statistical analyses.Our contribution of this study is three-fold: we 1) introduce a pixelwise biomass prediction model with hyperspectral data which have strength to the small-sample-size problem; 2) propose a biomass prediction methodology mitigating the effects of noise and peak shift; 3) apply combination of those spectral model and spatial information to the prediction models to improve the prediction accuracy.These proposed methodologies were applied to airborne hyperspectral data collected over Hampangen in Central Kalimantan, Indonesia.After preprocessing consisting of atmospheric correction and mitigation of radiometric distortion, the corrected hyperspectral data and biomass data from field surveys are subjected to our proposed regression models to obtain a pixelwise biomass prediction model.To evaluate these performance, we compared the prediction accuracy with other ordinary regression model in 5-th fold cross-validation.

Study Area
The study area is located in Hampangen in Central Kalimantan, Indonesia (113º 30' 18" E, 2º 6' 43" S) (see Figure 1), which has typical and many kinds of peat swamp forests.Hampangen also has disturbed forests damaged by wild fire in multiple years, especially in 1997 and 2002 when forest fire on a broad scale was occurred.The north part of Hampangen still has large nondamaged forest area with high biomass in which many trees diameter at breast height (DBH) are much large-size with more than 50 cm.

Remote Sensing Data
Hyperspectral data were obtained by HyMAP on July 16th 2011.The HyMAP sensor comprised 124 bands covering wavelengths of 436-2485 nm, with average spectral resolutions of 15 nm (436-1313 nm), 13 nm (1409-1800 nm) and 17 nm (1953-2485 nm).The ground sampling distance (GSD) was 4.5 m. Figure 2 shows an enlarged RGB image from the hyperspectral data.The study site was covered by five image strips.

Field Survey Data
Field surveys were conducted in the study area in 2011 and 2012 to assist with the gathering of training and validation data for the biomass prediction.31 plots were selected for survey of tree species, tree stem diameter at breast height (DBH), and tree height for every tree inside a small quadrat which has a 20m square.Coordinates of each quadrat were determined with a handheld Global Positioning System (GPS) unit.The total above biomass (t/ha) of each tree was estimated with using those observations, specific density of each tree species (S, g cm -3 ), and the appropriate moist life zone allometric equation provided by (Brown et al, 1989) as where  (kg/tree) is biomass, (cm) is DBH, and (m) is tree height.

Pre-processing
The image strips acquired by HyMAP were atmospherically and geometrically corrected.The atmospheric correction was carried out using ATCOR-4 (Richter et al., 2002), which is based on MODTRAN code (Berk, 1989).However, there are still radiometric distortions along cross-track direction because of a sun-sensor-target geometry and airborne sensor with wide field of view, which can cause classification error and poor prediction in building statistical model (Galvão et al., 2013).There are some researches that solve the distortion (Müller et al., 1990).In this study, our proposed methods based on the former method (Palubinskas, 2003) combined with supervised classification.In a general radiation theory, the total radiance detected at sensor is shown as where L  () is the total radiance at sensor,   is the radiance of the target, which is not depend on the view angle  and is based on Lambertian surface,   is an atmospheric transmittance and   () is the path radiance between the target and the sensor.
In our assumptions, the equation ( 2) is valid in all classes (e.g.forest, bare soil) and the coefficient is depend on the each class.
To normalize the radiance at the various viewing angles  to the radiance at the reference angle  0 , the following equation is applied to the image shown as where  and  are unknown coefficients to be estimated.The equation ( 3) describes the linear dependence at some viewing angle.Since the coefficients are different between each class, we use supervised or unsupervised classification to the radiance image before finding out coefficients of each class.In considering capture condition from an airplane, nadir angle from the sensor position is not suffer from the radiometric distortion.Therefore, we assume that the reference angle  0 is set to nadir direction.Based on these theoretical conditions, we build the empirical image-based radiometric normalization method for correcting the radiometric distortion from sensor viewing angle effects and BRDF effects.This correction of the radiometric distortions consists of several steps.Firstly, support vector machine (SVM) is applied to the atmospheric corrected data for extracting only forest area which is our main target of the following statistical analysis.Then, both an average line profile and a variance line profile were calculated.Two polynomial regressions were built for fitting the average line profile and the variance line profile based on the equation (3).Acquired correction formula is applied to the atmospheric corrected data for mitigating radiometric distortion.

Texture Information:
Existing researches (Lu 2005;Fauvel et al., 2008;Feret et al., 2013;Fauvel et al., 2013) show that combination of spectral data and spatial data is effective for high accurate forest classification models and biomass prediction models.In this study, texture information from the HyMAP data is calculated with Grey Level Co-occurrence Matrix (GLCM) texture measures (Haralick et al., 1973) describing spatial dependences in which two neighboring pixels separated by a given distance and a given angle occur within a moving window.We select 9 x 9 pixels as window size for the GLCM and 8 kinds of GLCM (Mean, Variance Homogeneity, Contrast, Dissimilarity, Entropy, Second Moment, and Correlation) are applied to the first principal component of all the hyperspectral bands.

Lasso Regression for Small-Sample-Problem:
While high dimensionality of hyperspectral data includes a large amount of rich information for biomass estimation, the small sample size problem due to the limited number of field survey plots results in risk of estimation model being overfitted to the training data.The common approach to solving this problem is to reduce the dimensionality of the dataset.Several techniques have been proposed to reduce the number of dimensions highdimensional data, such as transformation of the dimensionality, compression, or feature selection maintaining original dimensional space.As the predictors for feature transformation, PLS regression has been previously used in spectral analysis of tropical forest for biomass estimation (Peerbhay et al., 2013).In contrast, band selection as a method of feature selection maintaining original dimensional space is preferable owing to not only avoiding the overfitting but also its interpretability from the view point of end-users.In this study, lasso regression (Tibshirani, 1996) as the band selection method is applied as a pixelwise biomass prediction model owing to its robust and accurate model as well as its interpretability.
Given a set of training data  = {(  ,   ),  = 1,2, ⋯ , ,  = 1,2, ⋯ }, where x are spectral data, y are value of biomass acquired from field survey in this study, n is the number of bands, and p is the number of training data in this study.The ordinary least-squares (OLS) method is used to minimize the empirical error in the following least-squares to estimate the coefficient vector w: (4) To decrease the dimensionality and obtain an interpretable model without sacrificing the prediction accuracy, sparse regularization is applied to the solution of the least-squares problem in (4) with a L1 penalty term attached as shown in ( 5), which is called lasso regression: Here  is a non-negative trade-off parameter, and w is coefficient vector.The L1 penalty term, the second term in (5), introduces sparsity to the optimal coefficient w, where the increasing  shrinks the coefficient w towards zero.

Fused Lasso Regression for Noise and Peak Shift:
The lasso regression can select the limited number of important spectral bands for biomass prediction by increase coefficients as zero.Although those selected bands with sparsity have strength to the small-sample-size problem, the random noise due to the very narrow bandwidth of the hyperspectral sensor negatively affects to the prediction model.One simple noise estimation algorithm uses the mean of standard deviations of several visually homogeneous regions as noise estimate (Gao et al., 2013).However, the homogeneous areas within an image need to be manually selected in this method.This is not suitable for large images, such as remote sensing data.In contrast, reflectance of neighbour bands can be distinguished as homogeneous spectral value in the same pixel because each bandwidth of hyperspectral data is very narrow.As another problem, peak shift due to instrumental instability or small differences in experimental conditions may have a risk of poor prediction performance in the case that the selected bands by lasso regression don't cover the shifted peaks.Therefore, selecting successive bands for the biomass prediction have potential for contributing to increasing S/N and avoiding poor prediction from peak shift.In this study, fused lasso regression is applied for the prediction model to select those successive bands which has sparsity in both the coefficients and their successive differences (Tibshirani et al., 2005).Also, the fused lasso regression simultaneously selects a group of strongly correlated adjacent covariates.In this fused lasso, sparse regularization is applied to the solution of the leastsquares problem: Here  1 is a non-negative trade-off parameter for L1 penalty term, and 2 is also a non-negative trade-off parameter for selecting groups of strongly correlated adjacent coefficient w and encouraging sparsity in differences of the coefficient w, in the third term of in (6).

Preparation of Training and Validation Data
Estimated biomass from field inventory measures in 31 field survey plots ranged from 11.57 to 349.69 t/ha.Since band 121-124 of acquired hyperspectral data contained much noise, we used band 1-120 for our study.Figure 3 shows the comparison between without and with correction of radiometric distortion by our proposed method.The corrected image shows the enough performance for following processing.

Comparison between Biomass Estimation Models
Optimal band selection in lasso regression needs to be fix the parameter λ in equation ( 5).5th fold cross-validation was conducted to estimate optimal λ value.The effect of λ value on the overall prediction accuracy is shown in Figure 4(a), and the number of selected bands in each λ value shown in Figure 4(b).The minimum root-mean-square error (RMSE) was 68.24 t/ha when λ=2.57×10 -4 was used.In that case, 13 bands were selected for the prediction model by the lasso regression.Table 1 shows a comparison of the prediction performances between the lasso regression, multiple regression analysis (MRA) and PLS regression.The lasso regression provides much higher accuracy than other two methods, which confirms that the method of selecting bands have high performance in this case under the small-sample-size problem.To assess the selected spectral bands for biomass prediction model, we counted their appearances derived from the lasso regression in 50 times 5th fold cross-validation.Figure 5 shows the accumulated selected number of each band by the lasso regression in the 50 times cross-validation.Those selected bands had sparsity and the number was limited.This result indicates that the important bands for the prediction are limited and have sparsity.These sparsity has strength to the small-sample-size problem; however, it is susceptible to the effects of the peak shift and doesn't have strength to the random noise.

Successive Band Selection for the Biomass Estimation
In the process of fused lasso regression for optimal band selection, the parameters  1 and  2 are necessary to be fixed to minimize the RMSE between prediction value and validation data value in equation ( 6).5th fold cross-validation was conducted to estimate optimal  1 and  2 value.The effect of  1 and  2 values on the overall prediction accuracy is shown in Figure 6(a), and the number of selected bands in each  1 and  2 value shown in Figure 6(b).The minimum RMSE was minimum 66.16 t/ha when λ 1 =2.68×10 -4 and λ 2 =1.89×10 -4 were used.In that case, 29 bands were selected for the prediction model by the fused lasso regression.The prediction performance of the fused lasso was higher than other methods, as shown in Table 1.This    results proved that the fused lasso can provide high accurate biomass prediction model in considering practical use under the effects of the small-sample-size problem, the random noise, and peak shift.Figure 7 shows the accumulated selected number for the biomass prediction model in the 50 times cross-validation by the fused lasso to assess the selected bands.Comparing with that of the lasso regression, the selected bands by the fused lasso are successive and the values of coefficients are stable in neighbor bands.

Comparison between Estimated Biomass Maps
We evaluated the estimated biomass maps by applying our proposed methods with all the 31 samples to the hyperspectral data.Figure 8 shows the coefficient values of selected band by the lasso and the fused lasso regression, and Figure 9 is the estimated biomass map by each method (PLS, lasso, and fused lasso).Since this study site had limited area with very high biomass over 300 t/ha, the estimated map by PLS (Figure 9 (a)) is not reflected from local conditions.In contrast, the maps by lasso (Figure 9 (b)) and fused lasso (Figure 9 (c)) regression are very similar and describe the local condition accurately.Figure 10 shows spectral curves acquired from all the 31plots by the field survey plotted on spectral ranges of selected bands by the fused lasso and spectral bands of existing multispectral sensors.Although spectral ranges of the existing multispectral sensor don't match the selected wavelength by the fused lasso, some spectral bands cover those selected wavelength excepting in 1200-1600nm.Therefore, the comparison of the prediction performance between the existing sensor and the fused lasso is important as the next steps.Qualifying the effect of absence of 1200-1600nm for biomass estimation is also necessary for the future practical use with existing multispectral sensors.

Fusion of Spectral and Spatial Data
We carried out the fusion of spectral and spatial data for improving the performance of biomass prediction.GLCM indexes was applied to the hyperspectral images for calculating spatial information, and 5th fold cross-validation in 50 times was conducted for the prediction performance assessment.In case of the lasso regression, 120 spectral bands and 8 GLCM indexes were used as training and validation data.Figure 11 shows the accumulated selected number in each band and GLCM index.This result shows "Variance", "Mean", "Correlation", and "Contrast" of GLCM indexes are important for biomass prediction.In the procedure of the fused lasso regression, the selected bands by the fused lasso with only spectral data shown in Figure 8 Table 2 shows comparison of prediction performances between each method with fusion of spectral and texture information.This result proves the fused lasso with spectral and spatial information has enough potential to provide most accurate biomass prediction model.

CONCLUSIONS
We presented the efficiency of the fused lasso regression for forest biomass prediction with spectral and spatial information from hyperspectral data.MRA, PLS, and lasso regression were used for comparison of prediction performance.The hyperspectral data were captured over the Hampangen in Central Kalimantan, Indonesia, which has typical and many kinds of peat swamp forest and diversity of forest biomass caused from disturbance by forest fire, logging, etc.Our results indicated that the lasso regression and the fused lasso regression had a high accuracy and generalization performance without overfitting from the small-sample-size problem.Especially the fused lasso proved most accurate prediction performance, which has strength against random noise and peak shift on hyperspectral data in considering practical use.The estimated biomass map by the fused lasso proved the high prediction performance and high consistency with local condition.Moreover, addition of spatial information by GLCM to spectral data improved the prediction accuracy in all the methods.Especially, the performance of the fused lasso with combination of spectral data and GLCM indexes achieved highest accuracy (RMSE=62.62t/ha).
The number of samples acquired from the field survey in this study may not be enough for existing biomass prediction methods; however, our proposed method proved high performance under this condition.It seems that the prediction accuracy can be improved depending on the increasing the number of samples from field survey.Since hyperspectral data and field survey data may contain potential problems as discussed above for practical forest monitoring, the high biomass prediction methodology as shown in this study considering practical monitoring conditions has enough potential to be used widely for future monitoring activity of tropical rain forest and contribution to the REDD+ activities.
Those selected bands by the fused lasso have some similarity to the sensor bands of WorldView-3 and other kinds of existing sensors.Therefore, fusion of those sensors and upcoming sensors may have possibility to provide high performance models for biomass estimation.Furthermore, the selected successive bands by the fused lasso can be important information for designing spectral configuration of new type multispectral sensors focusing on biomass estimation with limited number of spectral bands.Since this small number of spectral bands contributes to decreasing the total size and weight of sensor devices, it may enable to be mounted on small-and/or nano-satellites and light weight unmanned aerial vehicles (UAVs).The use of those satellites and UAVs with the new sensors, designed based on the information from the fused lasso focusing on biomass estimation, facilitates forest monitoring in terms of cost and accuracy.Therefore this high accurate monitoring method and devices can be deployed and used widely for contribution to conservation of forest condition.

Figure 3 .Figure 4 .
Figure 3. Reflectance data in this study site (a) without, and (b) with correction of radiometric distortion

Figure 5 .
Figure 5. Accumulated selected number of each band by the lasso regression in 50 times cross-validation

Figure 6 .
Figure 6.Comparison of the results in fused lasso regression (a) effect of  1 and  2 on the overall prediction accuracy, (b) the number of selected bands in  1 and  2 1 10 -5 10 -1 10 -2 10 -3 10 -4 λ 1 (b) and the selected GLCM indexes by the lasso regression shown in Figure 11(b) were applied to MRA as training and validation data.

Figure 7 .Figure 9 .
Figure 7. Accumulated number of each spectral bands selection in 50 times cross-validation by fused lasso regression

Figure 10 .
Figure 10.Spectral curves acquired from all the field survey plots with spectral ranges of selected bands by fused lasso and spectral bands of existing multispectral sensors

Figure 11 .
Figure 11.Selected number of spectral bands (a) and texture index (b) in 50 times cross-validation by LASSO regression

Table 1
Comparison of prediction accuracy in 50 times crossvalidation