MODELLING ABOVE GROUND BIOMASS OF MANGROVE FOREST USING SENTINEL-1 IMAGERY

Many studies have been conducted in the estimation of forest above ground biomass (AGB) using features from synthetic aperture radar (SAR). Specifically, L-band ALOS/PALSAR (wavelength ~23cm) data is often used. However, few studies have been made on the use of shorter wavelengths (e.g., C-band, 3.75 cm to 7.5 cm) for forest mapping especially in tropical forests since higher attenuation is observed for volumetric objects where energy propagated is absorbed. This study aims to model AGB estimates of mangrove forest using information derived from Sentinel-1 C-band SAR data. Combinations of polarisations (VV, VH), its derivatives, grey level cooccurrence matrix (GLCM), and its principal components were used as features for modelling AGB. Five models were tested with varying combinations of features; a) sigma nought polarisations and its derivatives; b) GLCM textures; c) the first five principal components; d) combination of models a – c; and e) the identified important features by Random Forest variable importance algorithm. Random Forest was used as regressor to compute for the AGB estimates to avoid over fitting caused by the introduction of too many features in the model. Model e obtained the highest r of 0.79 and an RMSE of 0.44 Mg using only four features, namely, σ ̊VH GLCM variance, σ ̊VH GLCM contrast, PC1, and PC2. This study shows that Sentinel-1 Cband SAR data could be used to produce acceptable AGB estimates in mangrove forest to compensate for the unavailability of longer wavelength SAR.


INTRODUCTION
Mangrove forest only represent roughly 0.7% of the total world forest.Nevertheless, it stores about 20 petagrams of Carbon in its ecosystem (Jones et al., 2014).Mangrove forests also play an important role of providing habitat for marine organisms in addition to the goods and services they deliver for humans.In spite of this, there have been global records stating that there is a significant mangrove forest cover loss in the past five decades.The rate of loss is estimated to be ~1-2% annually, which is greater than the deforestation rate in non-mangrove forest classes (Alongi, 2002).The loss is mainly attributed to anthropogenic activities such as land conversions (mangrove forests to agriculture/aquaculture), over extraction of timber products, and coastal population increase, among others.These land cover changes lead to the emission of the stored carbon in the above and below ground carbon deposits.Hence, there is a need to monitor these changes which can be done by accurate and large-scale monitoring schemes such as the use of remote sensing systems.
SAR or synthetic aperture radar is an active sensor producing its own energy to transmit microwave (radio) signals to image a particular scene.It is independent of sun illumination, hence, it can operate both day and night.Also, it is considered as an allweather imaging sensor since cloud cover does not pose an issue.Radar sensors use the microwave portion of the electromagnetic spectrum ranging from 0.3 GHz to 300 GHz or 1 m to 1 mm in wavelength (Toan et al., 2007).
The European Commission (EC) and European Space Agency (ESA) initiated the Sentinel-1 mission (launched on April 2014), which aims to provide information services for security and environmental applications.Information is derived from earth observation satellites in conjunction with ground based data.Sentinel-1 mission uses a C-band frequency (8 -4 GHz or 3.8-7.5 cm).It comprises of two satellites with the same orbital plane, namely, SENTINEL-1A and SENTINEL-1B with a revisit time of 12 days.Moreover, Sentinel-1 provides a much higher spatial and temporal resolution compared to its predecessors such as the ERS-1, ERS-2, JERS, SIR-C/X-SAR, RADARSAT, SRTM, EnviSAT-ASAR, RADARSAR-II, LIGHTSAR, ALOS-PALSAR, TerraSAR-X (Attema et al., 2007).
Advantages of using Sentinel-1 images is its short revisit time which is favourable in forest change detection applications.Also, the said images are free of use and can be easily accessed.However, SAR with longer wavelengths (e.g., ALOS PALSAR products) is more suitable for forest applications since it is more penetrating than shorter wavelengths (e.g., Sentinel-1 SAR products).In the case of tropical forests, information below canopies such as tree trunks/stem and the underlying undergrowth vegetation, deadwood (including standing deadwood), and soil organic matter have significant contribution to the amount of the above ground biomass and carbon pools (Vashum and Jayakumar, 2012).These are considered as the primary scatterers in radar applications in forests such that elements with half the size of the wavelength (λ) have little backscatter produced and induce signal attenuation (Walker, 2016).Therefore, it is more appropriate to use longer wavelengths SAR as it is able to penetrate through vegetation canopies.
To compensate for this limitation, more information were derived aside from the provided VV and VH polarisation of Sentinel-1.Thus, the objective of this study is to extract several statistical information from the default information (VV and VH) such as the grey level co-occurrence matrix (GLCM textures) and the principal components.This is done to gather more explanatory power to be able to derive an accurate model to estimate mangrove above ground biomass (AGB).Moreover, Random Forest (RF) algorithm was used to model AGB rather than the traditional linear regression techniques due to its robustness in complex and nonlinear machine learning problems.

Field data
There are two mangrove sites considered in this study.One is located in San Juan, Batangas which lies in the southern part of Luzon Island, Philippines and the other is in Masinloc, Zambales in the northern part of the Philippines.The former site is considered to be a dense mangrove forest while the latter is sparse.Figure 1 show the two study sites and the inventoried trees (red circles) are superimposed in the corresponding Google Earth imagery while Figure 2 shows the location map.Field inventory surveys were conducted by the Phil-LiDAR 2 -Project 3: Forest Resources Extraction from LiDAR Surveys, which was funded by the Department of Science Technology (DOST) of the Philippines.Every mangrove tree species with a height of ≥ 1.3 m and DBH of ≥ 5 cm were considered.This threshold was based on minimum DBH of sampled trees done by Komiyama et al. in 2005.The x and y coordinates of each tree were determined using a total station with point reference taken using a survey grade GNSS equipment.Field surveys were conducted in March and November of 2015 for dense and sparse mangrove forests, respectively.

Sentinel-1 Data
Two corresponding Sentinel-1A products with a sensing period of March 24, 2015 and November 12, 2015 for dense mangrove forest and sparse mangrove forest, respectively, were used in this study.The product is a level 1 GRD type which has been detected, multi-looked, and projected to ground range using WGS84 ellipsoid model.The sensor is in interferometric wide swath mode with swath width of (250 km) with a geometric resolution of (5 m by 20 m).The polarisations available are VV and VH.

METHODOLOGY
The methodology is categorized into the following procedures: Sentinel-1 pre-processing, computing for the GLCM textures, creating the main data matrix, extracting new information from Principal Components Analysis (PCA), selecting important features, and modelling AGB using RF.

Sentinel-1 pre-processing
The Sentinel-1 data were pre-processed using the Sentinel-1 Toolbox (S1TBX) embedded in the SNAP (Sentinel Application Platform) software version 5 (SNAP Development Team, 2016).The steps included were the application of the orbit file to provide accurate satellite and velocity information, radiometric calibration for backscatter representation of the reflecting object (conversion from DN values to sigma nought values), speckle filtering for speckle suppression using Refined Lee adaptive filter, terrain-correction using SRTM 3 sec grid to correct for SAR geometric distortions, and re-projection from WGS84 to UTM Zone 51 N.

GLCM textures
The grey level co-occurrence matrix or GLCM textures are arrays of values that determines the varying combinations of pixel brightness values (grey levels) present in an image (Beyer, 2017).These textures are represented using a moving window (9x9) to determine the spatial co-occurrence of the pixel grey levels (Dorigo, 2012 as cited by Deus, 2016).For each polarisation the sigma nought VV (˚  ) and sigma nought VH (˚  ), ten (10) textures were computed: ASM (angular second moment), contrast, dissimilarity, energy, entropy, correlation, mean, variance, homogeneity, and max.GLCM textures are generated using SNAP (Sentinel Application Platform) software version 5 (SNAP Development Team, 2016).

Creating the feature matrix
Features included in the main data matrix were the aggregated raster values of σ˚ polarisation, GLCM textures, and σ˚ polarisation derivatives such as: Hence, the resulting feature matrix is: ℎ  ∈ 0      ℎ

Extracting new information from PCA: Principal
Component Analysis (PCA) is a multivariate technique wherein correlated dependent variables are represented in lower dimension but retaining the largest of information possible.This is done by transforming the dataset into new variables (known as principal components) by rotating the axes orthogonally in which the transformed data contains maximum variation possible.The principal components (PC) are the eigenvectors of a covariance matrix (Shlens, 2003).Aside from reducing the original dimension of the data, new information can be derived from the principal components since the rotation of the axes provides a new perspective of the original data.Hence, the resulting principal components will be added as additional features.
The PCA was done in matrix X.First, elements of the said matrix were normalized (scaled from 0 to 1) using sklearn's MinMaxScaler version 0.19.1 (Pedregosa et al. 2011) using the formula: (5) The covariance matrix was computed using the transpose of the normalized data.Eigenvectors and eigenvalues were then computed from the resulting covariance matrix.The amount of variation explained by each PC was determined by dividing the respective eigenvalue to the total eigenvalues to get the explained variance ratio.These computations were all done using numpy module version 1.14.0 (Walt et al., 2011).Figure 5 shows the graph of the explained variance ratio of the PCs as well as the cumulative explained variance.The said graph said was generated using matplotlib module version 2.1.1 (Hunter, 2007).For the field data, AGB values per tree were computed using AGB allometric equation (Komiyama et al., 2005as cited by Gevana et. al., 2008) for mangroves: AGB = 0.247 ρ (DBH 2 ) 1.23  (6) where ρ = species wood density (g/m 3 ) DBH = diameter at breast height Information regarding the species wood density was obtained at the Tree Functional Attributes and Ecological Database accessed through db.worldagroforestry.org.In cases where the species wood density is not listed in the database, the mean wood density of the genus was used instead.The target column vector y with total AGB as elements was constructed: ( ∈    10   10  ) (7) Appending matrix X and column vector y, the main data matrix, M, was created:

Modelling AGB using Random Forest
Random Forest (RF) is an ensemble of decision trees where ensemble learning is done by combining weak learners to create a strong learner that is robust to over fitting and provides a higher accuracy.A bootstrap sample will be drawn that is 70% of the total size of matrix M. Random selection is done with replacement and will be assigned in a node.At each node, features will be selected with size of the square root of the total number of features.This will be the training dataset.Coefficients of the features will be used in the remaining 30% of the sample size (test dataset or the out-of-bag).The node will split to the left if the mean square error (objective function) is lower than the previous test dataset otherwise, it will split to the right.Hence, the out-of-bag of RF is similar to cross validation such that it serves as the left out dataset in the leave-one-out method.These steps will be iterated up to k times where k is the number of trees grown in the forest.Several terminal nodes will be created if its objective function cannot be lowered furthermore and the final prediction will be the average of these terminal nodes (Raschka, 2016).Averaging all the predictions of the terminal node cancels out the biases since each sample is independent from each other and the total sample size is large enough to follow the law of large numbers.
Hyper parameters of the RF algorithm were tuned to obtain the best settings of the algorithm.Specifically, hyper parameters tuned were the optimum number of trees to build when averaging the prediction nodes, the minimum samples splitting required for each node, and the number of features to be used in a node.
The model that yielded the highest r 2 and lowest RMSE was the final model proposed in this study.Moreover, the important features of the final model were examined such that explanations were provided to give insights on how mangroves forest structures (dense vs. sparse) affect the features of the data matrix (polarisation, textures, etc.) in estimating AGB.

Selecting the important features
Important features were determined using the variable importance algorithm embedded in the Random Forest algorithm (Liaw and Wiener, 2015).The variable importance of the said algorithm measures the difference of accuracy before and after random permutation of a feature.A feature having a high variable importance suggests its contribution to the model's accuracy decreases when it was permutated while those features having low variable importance show its contribution to the model's accuracy had no significant difference as that of a randomly permutated feature (Strobl and Zeileis, 2008).Values in the variable importance are unitless and the higher the values, the greater the contribution.

Site
Total  The figure above shows that both sites are skewed to the right since majority of values are concentrated on the lowest range.The total and average AGB with respect to the area is higher in a dense mangrove forest as compared to the sparse mangrove forest.The overall statistics shows there is a large difference of values between the two areas thus, the inclusion of GLCM textures could provide significant explanatory power in estimating the AGB values for two statistically different areas.
Mangrove species observed are Avicennia marina, Avicennia officinalis, Bruguierra cylindrica, Excoercaria agallocha, Rhizophora mucronata, Sonneratia caseolaris, and Xylocarpus granatum.For the two field sites, Rhizophora mucronata has the highest count with 2450 individuals while Excoercaria agallocha has the lowest with 95 individuals.Figure 7 shows the graph of count for all species observed in the two field sites.
Figure 7. Count of mangroves species A correlation matrix was computed for matrix M which is composed of polarisation and its derivatives, GLCM textures, the first five principal components, and the total AGB.A heat map generated using seaborn module version 0.8.1 in Python (Waskom, 2017) is shown in Figure 8, illustrating the multicollinearity among features as well as its correlation to the target (AGB).

Figure 8. Correlation heat map
It can be noticed that most textures derived from ˚  polarisations are correlated with each other.Establishing a linear relationship using these features will not provide new information and could lead to over fitting the model.However, the capability of RF to randomly select features assigned for each node takes care of this information redundancy.Moreover, there is a noticeable difference of correlation between AGB and the textures derived from ˚  and ˚  such that, in general, the latter has a higher correlation to AGB.

Hyper parameter tuning and accuracy assessment
For every model, an intensive hyper parameter tuning was done by setting initial value and for each iteration (with increasing or decreasing parameter values), the highest accuracy using the corresponding hyper parameter settings was determined.The optimum number of trees in the RF algorithm is determined to be within 900 -1000 trees.Using number of trees greater than the upper limit produces accuracy that has no significant difference with the cost of slower computing time while number of trees lower than the lower limit leads to an unstable model with significant variations in accuracies.The optimum number of trees is determined by inspecting Figure 9 which shows that the outof-bag (oob) error rate (datasets which is independent from the original dataset) levels off as the number trees increases.Aside from the optimum number of trees to be grown in an RF model, the number of features to be considered when looking for the best split was determined.Possible values for this hyper parameter are summarized in Table 2.

Max features Hyper parameter value Description
A "auto" = n features B "sqrt" sqrt(n features) C "log2" log2(n features) The hyper parameter value for max features was determined to be "sqrt".This was determined by testing which of the possible hyper parameter values yield the highest accuracy.This is consistent with the recommendation of Brieman and Cutler when doing regression using RF.
The RF's variable importance scheme was used to determine the important features of matrix X and is shown in Figure 10.The said graph shows a scree plot that is superimposed on the bar chart to show where the slope of the curve is levelling off, indicating the number of significant features to be considered.For each region, features with the highest variable importance will be selected.
All features in R1 were disregarded due to its low importance.For R2, PC2 is selected (black circle) since it is the highest and represents the rest of the features in its group.The same thing can be inferred for R3, R4, and R5.Hence, four features were included in this model.
In total, there were five models created and tested in this study and each model has unique set of features.This is summarized in Table 4.  5 shows that there is little difference of r 2 among the five models and this only shows the robustness of RF in handling multivariate regression problems.Model e is the best model with an r 2 of 0.79 and an RMSE of 0.44 Mg where it uses only four out of the available 30 features namely; PC2, PC1, ˚  GLCM contrast, and ˚  GLCM variance.Figure 11   Figure 11 shows that large errors comes from extremely highs and lows with respect to the line of best fit.This is shown in Figure 12 where these extremes are present in a dense mangrove forest.Dense mangrove forest has wider range of values as compared to the sparse mangrove forest and the wider the fluctuations of values, the harder it is for the model to predict its values.

Making sense of the important features
Features included in model e as well as the observed AGB have values with varying magnitudes.Such that, differences of magnitude hinder in understanding the relationship of the important features to the observed AGB.To solve this, values of features considered as well as the observed AGB is scaled or normalized as before using Equation 5.
The selected four features are examined and shown in Figure 13.
It can be noticed that, trend-wise, GLCM contrast follows the behaviour of the AGB for both dense and sparse mangrove forests.This is attributed to the fact that contrast tells the dispersion between the reference (GLCM diagonal) and the neighbour pixels which is in correlation with the observed AGB values for both sites.This is also shown in Figure 8 (correlation heat map) wherein positive correlation to AGB is 0.33.The most evident change is the divergence of the trend for GLCM variance and PC1.There is a higher GLCM variance and lower PC1 for a dense mangrove forest while the opposite is true for sparse mangrove forest.The negative relationship of GLCM variance and PC1 are also reflected in Figure 8 where its direct and indirect relationship to AGB are 0.37 and -0.31 respectively.The higher standard deviation of AGB values from the dense mangrove forest contributes to a higher GLCM variance.Hence, the GLCM contrast actually describes the complexity of the structure of the two mangrove forest.PC1 explains 42% of the variance of the 25 features (σ˚ polarisations and its derivatives, and GLCM textures).As what can be observed in Figure 13, this explained variance ratio is enough to differentiate the AGB values of the two mangrove forest sites as it changes according to the characteristics of the two areas.Though nothing much can be inferred from PC2 as to its relationship with AGB (which accounts 27% of the explained variance ratio), it represents features of R2 such as PC3, PC4, sigma nought derivatives and some GLCM textures (see Figure 10).
Figure 13.Line graph of selected features and AGB for both sites The orientation of the incoming linearly polarised field at 45˚ to the vertical will yield a maximum cross-polar response (Richards, 2009).This is evidenced by examining Figure 10 where the features that contributes significantly to the model are mainly composed of GLCM textures that were derived from ˚  .Therefore, vegetation parameters is sensitive to ˚  , a cross polarisation, due to the occurring depolarization.This observation is also in line with the findings of Thiel (2016) where the r 2 between ˚  and stem volume is 0.65.
Mangrove forest structure is considered a dihedral corner mechanism wherein there is a double bounce in the backscatter values due to the trunk to water/ground bounce.This characteristic will yield stronger backscattering responses.However, foliage in the forest canopies attenuates the backscatter produced by the double bounce response.Such that the radar wave may not penetrate deeper into the canopy due to the size of the wavelength and the high moisture condition.Another factor Dense Sparse that influence signal attenuation in mangrove forests is the size of the leaves.Objects that are smaller than the wavelength (Cband is 3.75 cm to 7.5 cm) produces less backscatter (Walker, 2016).In general, leaf size from the sites ranges from 3 cm to 12 cm.Since the sparse mangrove forest is mainly composed of Rhizophora mucronata with a leaf size of 3 cm to 5 cm (Durst, 2007), lesser ˚  is observed as compared to the dense mangrove forest.Difference between the ˚  of the two mangrove forests is shown in Figure 14.
Figure 14.˚  for dense and sparse mangrove forest

CONCLUSION
Sentinel-1 C-band was used to model the total AGB per 10 m x 10 m grid.Since the energy propagated by C-band SAR (3.75 to 7.5 cm wavelength) is attenuated by forest canopies due to its short wavelength and the information gathered is mainly on the leaves and small branches, the cost of obtaining an acceptable r 2 is to extract several statistical features will aid in modelling the AGB values.These statistical features are mainly the grey level co-occurrence matrix (GLCM) and the first five principal components for a total of 30 features.Since more features are added, the model is prone to over fitting, thus, Random Forest, which is robust to these kind of complex machine learning problems, was used as a regressor.RF was also used to identify the important features that contributed to the accuracy of the model where the best model (model e) which only uses four variables, namely, ˚  GLCM variance, ˚  GLCM contrast, PC1, and PC2, produced an r 2 of 0.79 with an RMSE of 0.44 Mg.The significance of these features were mainly due to the sensitivity of cross polarized radar to vegetation.In future studies, it is recommended that additional samples should be added to include wider representations of field observed AGB values.Hence, it is very likely that there will be an improvement of the model given that the hyper parameters of the RF algorithm is exhaustively tuned.

Figure 5 .
Figure 5. Explained variance ratio of the principal components Based on Figure 5, using only the first five principal components will yield a cumulative explained variance of 95%.Hence, from the original 25 features (matrix X), five new features containing new information based on the original features can be added.

Figure 6 .
Figure 6.Histogram of AGB of the two field sites

Figure 9 .
Figure 9. Out-of-bag error rate

Figure 10 .
Figure 10.Variable importance plot.Boxes represent regions and circles represents selected important variables Based on Figure 10, important features can be grouped into five regions (R1 -R5) where each region is determined by the similarities of values as shown by a relatively flat slope of the line.An abrupt break from the line/slope signifies new region.For each region, features with the highest variable importance will be selected.
and 12 show the scatter plot and line graph, respectively, of the predicted and observed AGB values..

Figure 11 .
Figure 11.Observed versus predicted using model e

Table 1 .
Total area and total AGB area (Ha) Total AGB (Mg)

Table 4 .
Summary of models and respective featuresUsing the optimum hyper parameters of the accuracies (r 2 ) and the RMSE for each model were determined (see Table5).

Table 5 .
Summary of accuracies and error for each modelTable