MULTI-SENSOR APPROACH TO LEAF AREA INDEX ESTIMATION USING STATISTICAL MACHINE LEARNING MODELS: A CASE ON MANGROVE FORESTS

Leaf Area Index (LAI) is a quantity that characterizes canopy foliage content. Because leaf surfaces are the primary sites of energy and mass exchange and of primary production in terrestrial ecosystems, many important processes are directly proportional to LAI. LAI can therefore be considered an important parameter of plant growth. Multispectral optical images have been widely used for mangrove-related studies, such as LAI estimation. With Sentinel-2, for example, LAI can be estimated using the biophysical processor in SNAP or using various machine learning algorithms. However, multispectral optical images are limited by their weather dependence and limited canopy penetration. In this study, a multi-sensor approach was implemented using freely available multispectral optical images (Sentinel-2) and synthetic aperture radar (SAR) images (Sentinel-1) to estimate LAI. SAR images can compensate for these limitations and can pave the way for regular mapping and assessment of LAI regardless of weather conditions and cloud cover. LAI models were generated using linear, non-linear, and decision-tree modelling algorithms that relate Sentinel-1 derivatives to Sentinel-2 LAI. The Random Forest model was the most robust, with the lowest RMSE of 0.2845. This result establishes a concrete relationship between a biophysical quantity derived from optical parameters and RADAR derivatives, which opens the opportunity to integrate both systems so that each compensates for the other's disadvantages and LAI is quantified more efficiently.


Background
Leaf Area Index (LAI) is a dimensionless quantity used to characterize canopy foliage content, defined as the total one-sided area of leaf tissue per unit area of ground surface (Breda, 2008). It is commonly used in studies of vegetation and ecosystems because leaf surfaces are the primary sites of energy and mass exchange and of primary production in terrestrial ecosystems. Many important processes, such as canopy interception, evapotranspiration, and gross photosynthesis, are directly proportional to it (Liang and Wang, 2020). LAI quantifies the photosynthetically active part of forest canopies and allows the examination of rapid responses to stress factors (Stankevich et al., 2017); it is therefore an effective indicator of vegetation status. LAI is currently measured through ground-based and remote sensing methods.
Ground-based LAI measurements can be made either directly or indirectly (Liang and Wang, 2020). LAI can be measured directly by harvesting leaves, using either destructive sampling or litter traps, and then measuring their area (Liang and Wang, 2020). In contrast, indirect measurement either exploits the allometric relationship between leaf area per tree and diameter at breast height (DBH) or measures canopy transmittance and converts it to LAI (Liang and Wang, 2020). Ground-based methods, however, can be labor intensive for large areas, which need many samples to adequately characterize their spatial variability (Liang and Wang, 2020). Nevertheless, ground-based methods are often considered suitable for a specific study site or small patches of vegetation (Liang and Wang, 2020).
A more practical method to obtain LAI measurements, especially over large areas with multi-temporal coverage, is through the use of remotely sensed data. However, LAI is not directly accessible from remotely sensed images because of the possible heterogeneity in leaf distribution within the canopy volume. Note that LAI is an intrinsic canopy characteristic; hence, it should not depend on the observation conditions of remotely sensed data. Thus, LAI from remotely sensed observations corresponds to the 'effective LAI', i.e. the value that would produce the same remote sensing signal as that actually recorded, while assuming a random distribution of leaves (Weiss and Baret, 2016). Still, using remotely sensed data to estimate LAI should be advantageous over large areas that are difficult to access, such as mangrove forests.
LAI estimation from optical satellite images has been explored in numerous studies with varying degrees of success (Phinn et al., 1999). In the case of Sentinel-2, LAI can be computed using a SNAP tool that applies tested, generic algorithms based on specific radiative transfer models. One study validated Sentinel-2 derived LAI against ground-measured LAI of winter wheat in Poland (Bochenek et al., 2018). Its results showed that agreement between the two depends strongly on vegetation phase and that the SNAP tool tends to underestimate LAI. The authors further concluded that the less reliable LAI results could be attributed to the less straightforward method, i.e. the use of artificial neural networks (ANN). In contrast, another study assessed the accuracy of Sentinel-2 derived LAI of plant canopies and other land use/cover types in tropical mangrove areas in Palawan, Philippines, and reached a positive conclusion (Apan et al., 2018). It compared Sentinel-2 derived LAI with in-situ LAI measured using a hand-held CI-110 Plant Canopy Imager. Although the range of field-based LAI, 0 to 2.71 (mean = 0.86), differed from that of Sentinel-2 derived LAI, 0.19 to 4.32 (mean = 1.64), the regression models demonstrated high correlation between observed and predicted LAI values, with 0.9419 for linear regression and 0.9435 for support vector machine (SVM).
However, several limitations arise in retrieving LAI from multispectral optical images such as Sentinel-2, specifically (1) the requirement of cloud-free daylight conditions; (2) the capture of information mainly from the top of the canopy rather than from the vegetation structure; and (3) the saturation of surface reflectance at moderate to high vegetation cover (Wang et al., 2019). These limitations, however, can be addressed by using Radio Detection and Ranging (RADAR) images.
RADAR has become more commonly used because of its all-weather and all-day capability (Filho and Paradella, 2002). RADAR uses microwave signals whose longer wavelengths allow canopy penetration and therefore the acquisition of under-canopy vegetation parameters and surface characteristics (Bourgeau-Chavez et al., 2001). Additionally, the sensitivity of RADAR to the dielectric constant, which is directly associated with water content (Baghdadi et al., 2001), makes it suitable for mangrove studies, since mangroves thrive in tidal inundation zones where water quantity varies readily. Studies have reported the effectiveness of Synthetic Aperture RADAR (SAR) for crop monitoring. Such studies explored EUMETSAT's Metop Advanced SCATterometer (ASCAT), JAXA's Advanced Microwave Scanning Radiometer 2 (AMSR2), ESA's Soil Moisture Ocean Salinity (SMOS) mission, and NASA's Soil Moisture Active Passive (SMAP) mission (Wagner et al., 1999; Kerr et al., 2012; Parinussa et al., 2015; Liu et al., 2011), but encountered the problem that the spatial resolution of these products is relatively coarse, with pixels covering tens of kilometers. Another study suggested a relationship between VH backscatter and LAI, with both increasing at rapeseed sites in Italy and Sweden (Macelloni et al., 2001). Additionally, another study explored the fusion of optical and SAR imagery for LAI gap filling and showed it to be effective during cloudy periods, when optical satellites suffer (Pipia et al., 2019).
In 2014, the European Space Agency (ESA) launched Copernicus Sentinel-1, which provides backscatter observations with a revisit time of 1.5 to 4 days and a spatial resolution of 20 m (Vreugdenhil et al., 2018). These advantages, however, do not make Sentinel-1 an entirely superior sensor. SAR signals are affected by soil background and topography. Furthermore, unlike with optical sensors, vegetation indices and biophysical parameters cannot be derived directly from Sentinel-1; hence the need to establish relationships through models and other methodologies. The use of backscatter for LAI continues to be studied since applications of RADAR data are still rare (Stankevich et al., 2017). Because LAI is not directly available from Sentinel-1, various methodologies and models have arisen for establishing LAI relationships (Stankevich et al., 2017; Wang et al., 2019).
This study explores the potential use of Sentinel-1 for LAI estimation, using a number of statistical machine learning algorithms to establish a relationship between Sentinel-2 optical parameters and Sentinel-1 RADAR derivatives. Such a relationship opens an opportunity to integrate multispectral optical and SAR satellite images for better analysis of vegetation dynamics.

Study Area
The Philippines used to have over 400,000 hectares of mangrove cover before it declined to 120,000 hectares in 1994 (Primavera, 2000). The largest threat to mangrove forests is the establishment of fishponds for commercial fishing and shrimp farming (Primavera, 2000). Despite these sustained threats, mangrove cover in the country is increasing, considering the 2010 estimate of 310,593 hectares by the Department of Environment and Natural Resources (FMB, 2010). This increasing trend is also observed in the study area, located in Noveleta and Kawit in Cavite, Philippines.
The mangrove cover in Kawit, Cavite increased from 33.65 hectares to 133.75 hectares between 2003 and 2010 (Salmo et al., 2017). Visual inspection of satellite imagery (see Figure 1) shows that the mangrove cover continued to increase through the years. However, in early 2019 mangroves and trees were cut down on Animal Island in Brgy. Binakayan in Kawit, Cavite to give way to the construction of the 20-hectare Philippine Amusement and Gaming Corporation (PAGCOR) Philippine Offshore Gaming Operations (POGO) Hub Covelandia. The site of the PAGCOR POGO Hub Covelandia used to be part of the mangrove-enclosed Island Cove Resort and Leisure Park (Ronda, 2019). Because LAI is an important parameter of plant growth (Trimble, 2019), such changes can be observed through LAI values as well. Due to mangrove phenology, LAI increases drastically during a mangrove's early stages and then decreases as the stand undergoes a process called self-thinning. As the mangroves mature, though, LAI values become more stable. This study analyzes mangrove patches found in four locations, namely (1) Brgy. San Rafael, Noveleta, (2) Brgy. Poblacion, Kawit, (3) Brgy. Pulvorista, Kawit and (4) Brgy. Binakayan, Kawit. The areas surrounding these mangrove patches have been subjected to active urbanization, and some mangroves were cleared out; LAI values in the cleared areas were expected to approach zero, but the observations presented a different case. Given the sustained threats to mangrove forests, updated information is important for the development of cost-effective resource management approaches (Go, 2019) that can aid policy makers in creating relevant interventions. One possible way to obtain updated information on mangrove conditions is through regular mapping and assessment of LAI.

Objectives
In this study, the relationship between derivatives of Sentinel-1 (backscatter and interactions) and Sentinel-2 (LAI) is modeled for LAI estimation.

Significance
Leaf Area Index (LAI) is one of the most common parameters used to quantify vegetation attributes, specifically to characterize canopy foliage content. Although multispectral images are effective and widely used for mangrove-related studies, they have drawbacks (Wang et al., 2019), notably that they are not independent of weather conditions. In contrast, the capability of RADAR signals to penetrate the canopy and their sensitivity to the dielectric constant, which is linked to water availability, make them suitable for mangrove studies. Thus, Sentinel-1 SAR images were used here to establish the relationship between their derivatives and those of multispectral images, which should be helpful for areas with persistent cloud cover. Furthermore, the use of SAR images for LAI estimation should compensate for the disadvantages inherent to multispectral images and could pave the way for regular mapping and assessment of LAI.

Scope and Limitation
This study was conducted during a global pandemic. Physical access to the mangrove forests was impossible; consequently, no field-based data were used to validate the resulting maps or to calibrate the LAI models.
Figure 2. Methodology overview for generating a model that establishes a relationship between Sentinel-2 derived LAI and RADAR derivatives extracted from Sentinel-1 SAR.

Data and Materials
The satellite imagery used in this study comprised Sentinel-2 MultiSpectral Instrument (MSI) Level 1-C images and Sentinel-1 Synthetic Aperture RADAR (SAR) Level 1 Ground Range Detected (GRD) images whose extents cover the study area. One image per year was collected from 2016 to 2020.

Pre-processing
Sentinel-2 L1-C products are already orthorectified, georeferenced, and radiometrically calibrated to top-of-atmosphere (ToA) reflectance. Atmospheric correction was applied using the Sen2Cor standalone tool to convert L1-C to Level-2A products, i.e. ToA reflectance data to surface reflectance data. This atmospheric correction uses image-based retrievals from a set of look-up tables generated with the libRadtran model to correct single-date Sentinel-2 L1-C ToA products for atmospheric effects and to arrive at the L2-A surface reflectance product (Main-Knorn et al., 2017). All L2-A bands were resampled to a 10 m pixel size using the SNAP v. 7.0 geometric operation tool.
The Sentinel-1 GRD products contain SAR data that have been detected, multi-looked, and projected to ground range using the WGS84 ellipsoid model. The polarisations included were VV and VH. Pre-processing was done in the Sentinel Application Platform (SNAP) software in the following order: (1) application of a precise orbit file, (2) thermal noise removal, (3) border noise removal, (4) calibration using the Sentinel-1 Toolbox (S1TBX) in SNAP, (5) terrain correction using an external Digital Elevation Model (DEM) extracted from Interferometric Synthetic Aperture RADAR (IFSAR) data acquired in 2013 and obtained from the National Mapping and Resource Information Authority (NAMRIA), and lastly, (6) conversion of the unitless backscatter coefficient to dB using a logarithmic transformation, yielding sigma 0 for each polarization.
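The logarithmic transformation in step (6) can be illustrated with a short Python sketch. This is not the SNAP implementation; the function name `sigma0_to_db` and the choice to return NaN for non-positive inputs are assumptions made for the example, and the sample values are hypothetical.

```python
import math

def sigma0_to_db(sigma0_linear):
    """Convert a linear (unitless) backscatter coefficient to decibels.

    Assumption: non-positive values (e.g. no-data pixels) map to NaN,
    because the logarithm is undefined there.
    """
    if sigma0_linear <= 0:
        return float("nan")
    return 10.0 * math.log10(sigma0_linear)

# Hypothetical linear backscatter values, for illustration only.
for value in (1.0, 0.1, 0.01):
    print(value, "->", sigma0_to_db(value), "dB")
```

Each tenfold decrease in linear backscatter lowers the result by 10 dB, which is why vegetation backscatter is usually reported as negative dB values.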

Imagery Derivatives Data Preparation
The vegetation index considered in this study was the Leaf Area Index (LAI) which was generated for all pre-processed Sentinel-2 images using the SNAP biophysical processor.
Meanwhile, the primary RADAR parameters considered in this study were the sigma 0 backscatter coefficients of each polarization (VV and VH) in the Sentinel-1 imagery. From these coefficients, RADAR derivatives were computed using Equations 1, 2, 3, and 4.

Cross Polarised Ratio
Unlike Sentinel-2, Sentinel-1 has no LAI included in its product. To obtain LAI from Sentinel-1, this study generated a model that establishes a relationship between LAI and the six (6) RADAR parameters. Modeling was done for each year from 2016 to 2020 to monitor the consistency and robustness of the model and to characterize fluctuations in both the LAI and the RADAR variables.
Since Sentinel-2 and Sentinel-1 are different products, scaling, aligning, resampling, and normalizing were conducted so that the Sentinel-1 and Sentinel-2 images have the same resolution and pixel grid. A mangrove mask generated from the Mangrove Vegetation Index (MVI) (Baloloy et al., 2020) was applied to both sets of imagery to isolate the mangrove patches.
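The masking and normalizing steps can be sketched in Python on flattened pixel lists. This is a minimal illustration only: min-max scaling is assumed because the text does not name the normalization used, and the function names and sample values are hypothetical.

```python
def apply_mask(values, mask):
    """Keep only the pixels where the mangrove mask is truthy."""
    if len(values) != len(mask):
        raise ValueError("raster and mask must have the same number of pixels")
    return [v for v, keep in zip(values, mask) if keep]

def min_max_normalize(values):
    """Rescale values linearly to [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant band carries no contrast; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical VH backscatter (dB) and a co-registered mangrove mask.
vh_db = [-18.0, -15.0, -22.0, -12.0]
mangrove_mask = [1, 1, 0, 1]

masked = apply_mask(vh_db, mangrove_mask)
normalized = min_max_normalize(masked)
```

The sketch presumes that both rasters were already resampled to the same pixel grid, so that position `i` in each list refers to the same ground location.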
Feature selection was then done to verify that the chosen predictor variables are significant for estimating LAI. More variables do not necessarily imply a better model; rather, they raise the possibility of overfitting, and removing unnecessary variables helps prevent bias. The Boruta algorithm, readily available in RStudio, was used since it accounts well for multi-variable relationships and considers all features with respect to a given outcome variable.

Statistical Machine Learning Algorithms
Since we are dealing with per-pixel values, with more than about 1000 pixels per image, machine learning-based models were deemed applicable for analyzing data of this volume. The effectiveness of the model fit, however, depends on the specific algorithm used. In this study, three algorithm types were explored in order to identify the best estimation (see Figure 3). All modeling was done in RStudio. For the linear models, Multiple Linear Regression was used since it accepts more than one predictor variable. Ridge regression was also considered since it can be used with data that suffer from multicollinearity.
For the non-linear category, Generalized Additive Models (GAM) were considered because they have been used in environmental science studies; they are grounded in the idea of applying a smoothing function to each contributing variable to make the fit more flexible and adaptive. Support Vector Machine (SVM) models, on the other hand, do not depend on the distribution of the data and estimate a continuous-valued multivariate function while limiting the error to an acceptable range.
Similarly, Random Forest (RF) is primarily used as a classifier but can be used for prediction as well. RF is grounded in the use of a number of decision trees, each of which forms nodes containing a high proportion of the data samples. Each tree repeatedly divides the data, and the estimates are narrowed down across the number of trees programmed. This is done through random sampling with replacement, which provides an element of randomness that helps prevent overfitting.
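The role of random sampling with replacement can be illustrated with a small Python sketch. It is a deliberately simplified stand-in, not the R model used in this study: it bags one-split regression trees (stumps) on a single predictor and omits the random feature subsetting of a full Random Forest. All function names and the toy data are hypothetical.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression tree by minimizing squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # All x values identical: no split possible, predict the mean.
        mean = sum(ys) / len(ys)
        return (float("inf"), mean, mean)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def fit_bagged_stumps(xs, ys, n_trees=100, seed=0):
    """Fit each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps

def predict_bagged(stumps, x):
    """Average the predictions of all stumps."""
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)

# Toy data: backscatter-like predictor and an LAI-like response.
xs = [-20.0, -19.0, -18.0, -17.0, -16.0, -14.0, -13.0, -12.0, -11.0, -10.0]
ys = [1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 3.0, 3.0]
stumps = fit_bagged_stumps(xs, ys, n_trees=100)
pred_low = predict_bagged(stumps, -20.0)
pred_high = predict_bagged(stumps, -10.0)
```

Because every stump sees a slightly different resample of the data, the individual splits vary, and averaging them smooths out the idiosyncrasies of any single tree.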

Accuracy Metrics and Model Evaluation
Accuracy of the models is measured by the closeness of the predicted estimates to the derived values. In this study, the accuracy measures used were the R-squared value and the RMSE.
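Both metrics can be computed in a few lines of Python. This is a minimal sketch: R-squared is taken here as the coefficient of determination, 1 - SS_res / SS_tot, which is an assumption since the text does not state the exact definition used, and the sample values are hypothetical.

```python
import math

def rmse(derived, predicted):
    """Root mean square error between derived and predicted LAI."""
    n = len(derived)
    return math.sqrt(sum((d - p) ** 2 for d, p in zip(derived, predicted)) / n)

def r_squared(derived, predicted):
    """Coefficient of determination (assumed definition of R-squared)."""
    mean_derived = sum(derived) / len(derived)
    ss_res = sum((d - p) ** 2 for d, p in zip(derived, predicted))
    ss_tot = sum((d - mean_derived) ** 2 for d in derived)
    return 1.0 - ss_res / ss_tot

# Hypothetical Sentinel-2 derived LAI and model predictions.
derived_lai = [1.0, 2.0, 3.0, 4.0]
predicted_lai = [2.0, 3.0, 4.0, 5.0]
print(rmse(derived_lai, predicted_lai), r_squared(derived_lai, predicted_lai))
```

A lower RMSE and a higher R-squared both indicate predictions that lie closer to the Sentinel-2 derived values.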

Feature Selection
Figure 4 shows the result of running the Boruta algorithm for feature selection. Parameters labeled in blue are the shadow features (min, max, and mean), while features labeled in green were considered significant since they exceeded the maximum Z score among shadow attributes (MZSA). All six parameters from Sentinel-1 were deemed significant predictors of the Sentinel-2 derived LAI, with Polar Mean and VH backscatter being the top variables affecting LAI.
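The shadow-feature comparison behind this decision can be sketched in Python. This is a simplified, single-pass illustration and not the Boruta package itself: Boruta scores features with Random Forest importance Z scores over repeated runs and applies a statistical test, whereas this sketch swaps in the absolute Pearson correlation as a stand-in importance. The function names and toy data are hypothetical.

```python
import random

def pearson_abs(xs, ys):
    """Absolute Pearson correlation, used as a stand-in importance score."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5

def shadow_screen(features, target, seed=0):
    """Keep features that score above the best shuffled (shadow) copy."""
    rng = random.Random(seed)
    shadow_scores = []
    for column in features.values():
        shadow = list(column)
        rng.shuffle(shadow)  # shuffling breaks any link with the target
        shadow_scores.append(pearson_abs(shadow, target))
    best_shadow = max(shadow_scores)
    return [name for name, column in features.items()
            if pearson_abs(column, target) > best_shadow]

# Toy data: one informative predictor and one unrelated predictor.
target = [float(i) for i in range(30)]
features = {
    "strong": [2.0 * t + 1.0 for t in target],
    "noise": [float((i * 7) % 11) for i in range(30)],
}
confirmed = shadow_screen(features, target)
```

The shadow copies estimate how high a score can arise by chance alone, so only features that beat the best shadow score are retained.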

LAI Modelling
Figures 5 and 6 compare the derived values with the values predicted by the models for the years 2016 and 2020. Overlaying the predicted values on the derived values shows how the predicted values are distributed. Algorithms under linear models exhibit a concentration around the median values, as do the non-linear models, but are more constrained as they approach a certain value. RF, in contrast, presents a distribution that is flexible and mimics the behavior of the LAI values. Extremum and mean values were also tabulated for analyzing the distribution, as seen in Tables 1 and 2. Additionally, Quantile-Quantile Plots (QQ Plots) were generated to analyze and compare the probability distributions of the data sets by plotting "fractions of the data", or quantiles. The QQ plots are shown in Figure 7. Algorithms under linear and non-linear models again exhibited similar traits but are far from the trend of the derived values. RF, on the other hand, follows the trend most similar to that of the derived values. From this, it can be inferred that Random Forest is effective in LAI prediction. A closer analysis of the model revealed that its residual error decreases exponentially as the number of decision trees increases, as seen in Figure 8. The maximum number of decision trees set in the model was 100 for computational efficiency.
The RF model demonstrated that Sentinel-1 derivatives can be used to establish a relationship with LAI. This identifies the potential of Sentinel-1 for determining a mangrove biophysical parameter that an optical satellite is capable of providing, and is a possible indication that the two satellites are compatible for integration in further studies.

CONCLUSION
A robust model for LAI estimation from Sentinel-1 SAR parameters was generated by exploring diverse model types, whose varying prediction results provide insight into LAI behavior. Of these models, Random Forest showed the highest accuracy.

RECOMMENDATIONS
This study was conducted during a global pandemic, so acquiring ground samples was not physically possible. Once fieldwork is allowed again, the use of LAI measurements from ground-based methods for calibrating the LAI models, and furthermore for validation, is highly recommended. It is also suggested that other machine learning algorithms, or an extension to deep learning, be explored for model generation.

ACKNOWLEDGEMENTS
We would like to acknowledge the JICA-funded project entitled Comprehensive Assessment and Conservation of Blue Carbon Ecosystems and their Services in the Coral Triangle (BlueCARES), which has produced studies and reports on our country's mangrove forests that were utilized in this research. Additionally, we extend our thanks to the Environmental Systems Applications of Geomatics Engineering (EnviSAGE) laboratory of the UP Department of Geodetic Engineering.