STABILIZATION OF SENTINEL-1 SAR TIME-SERIES USING CLIMATE AND FOREST STRUCTURE DATA FOR EARLY TROPICAL DEFORESTATION DETECTION

In this study we analyse the factors of variability of Sentinel-1 C-band radar backscattering over tropical rainforests, and propose a method to reduce the effects of this variability on deforestation detection algorithms. To do so, we developed a random forest regression model that relates Sentinel-1 gamma nought values with local climatological data and forest structure information. The model was trained using long time-series of 26 relevant variables, sampled over 6 undisturbed tropical forests areas. The resulting model explained 71.64% and 73.28% of the SAR signal variability for VV and VH polarizations, respectively. Once the best model for every polarization was selected, it was used to stabilize extracted pixel-level data of forested and non-deforested areas, which resulted on a 10 to 14% reduction of time-series variability, in terms of standard deviation. Then a statistically robust deforestation detection algorithm was applied to the stabilized time-series. The results show that the proposed method reduced the rate of false positives on both polarizations, especially on VV (from 21% to 2%, α=0.01). Meanwhile, the omission errors increased on both polarizations (from 27% to 37% in VV and from 27% to 33% on VV, α=0.01). The proposed method yielded slightly better results when compared with an alternative state-of-the-art approach (spatial normalization).


INTRODUCTION
Weather and seasonal-related conditions of the surface can considerably affect SAR measurements modifying SAR timeseries characteristics (Benninga et al., 2019). SAR variability can be either due to long-term factors, like seasonal droughts and rain scarcity and to short-term events, such as storm surges. Several authors have studied susceptibility of C-band measurements to dense rain cells (Atlas et al., 1993;Kasilingam et al., 1997;Lin et al., 1997), to intercepted precipitation water in the canopy (Dobson et al., 1991;Henderson and Lewis, 1998;De Jong et al., 2000;Cisneros Vaca and Van Der Tol, 2018), and to canopy humidity (see for example Quegan et al., 2000).
Reportedly, C-band SAR backscattering can suffer attenuations between -2 and -2.4 dB when crossing dense storm cells (Moore et al., 1997;Danklmayer et al., 2009) and can increase 1 to 1.5 dB (Dobson et al., 1991) due to intercepted rain. Benninga et al. (2019) found increases of 0.5 dB after medium to severe rain showers (1.8-4.5 mm/12 h). Seasonal changes on water content of the canopy can lead to oscillations of 2.5 dB (Quegan et al., 2000, on ERS-C) to 1.5 dB (Benninga et al., 2019, on Sentinel-1) over mature temperate forests. Frolking et al. (2011) found a strong negative anomaly on SeaWinds active microwave Kuband backscatter data collected over the Amazon Basin during the 2005 drought and detected a striking correlation between water deficit measurements and Ku signal.
This reported instability can affect several kinds of techniques that make use of radar backscattering. In this work we will focus on deforestation detection on an Early Warning context. Early Warning Systems (EWS) are defined as a collection of algorithms and procedures able to identify tree loss or disturbance, on a periodic (monthly, weekly or daily) basis . EWS has been a crucial element to reinforce public policies that have led to significative * Corresponding author deforestation rates decrease in Brazil (Soares-Filho et al., 2010;Assunção et al., 2013;Nepstad et al., 2014) and in Peru (Finer et al., 2018). A recent survey among users pointed to cloud cover as the most important effectiveness limiting factor to the actual EWS (Weisse et al., 2019). Hansen et al. (2016), reported 80% cloud cover during the wet season on Peru Landsat-7/8 data. In Brazil, frequent observations over Amazonian basin are seriously affected, as mean annual cloud cover on the Brazilian part of the biome is approximately 74%. This observational gap caused by cloud cover can be filled by orbital active microwave sensors, namely Synthetic Aperture Radar (SAR) satellites. However, the referred variability in backscattering time-series can mislead a potential automatic deforestation detection algorithm based on time-series analysis, decreasing its accuracy. Here we have reviewed some successful mitigation strategies: (1) Lin et al. (1997) were able to accurately model attenuation due to storm rain cells using meteorogical radar measurements and an attenuation model proposed by Atlas, Rosenfeld and Wolff (1993). (2) While searching for an optimal method to retrieve soil moisture using SAR data, Benninga et al. (2019) developed a complete set of rules and algorithms to deal with Sentinel-1 (S1) variability. Regarding seasonal trends, they proposed an 80-day window moving average computed over σ 0 values. Additionally, they propose a masking threshold based on the accumulated rain on the last 12 hours before SAR acquisition. (3) Reiche et al. (2017) proposed a different approach, using the regional backscatter P95 percentile value to define a mean forest response for every recorded date and then normalizing all the image pixels by this value. This approach has an added advantage: it will deal with sensor oscillations as those reported by El Hajj et al. (2016). And (4) Reiche et al. (2018) fitted a harmonic function to S1 γ 0 VH time series to allow detrending of long-term variations.
Lin's approach, as any climatological modelling strategy dealing with rapid, small scale events such as storm surges should use climatological data with very high spatial and temporal resolution, such as those made available by meteorological radars. Unfortunately, this kind of data is far from being widely available over entire tropical basins. Regarding Benninga et al. approach, the suggested 80-day moving average is unfeasible on an EWS context, as we can't foresee future's SAR measurements. Reiche's spatial normalization seem very suited to EWS and would be used as a benchmark for our results. Reiche's harmonic fitting strategy can be very valuable (and, in fact, has inspired this work), but it can fail and produce significant errors as it doesn't account for inter-annual variability of rain and temperature regimes. As this, a harmonic fit trained on a humid year would probably overestimate backscatter values on the next, drier year. Climate change should potentialize this issue, as erratic rain patterns, altered dry seasons and extreme events gets more frequent (IPCC, 2014). Additionally, harmonic fit approach would not deal with short-term variations of backscattering due to heavy-rain episodes.
In this work we will investigate a novel approach to SAR variability mitigation. We will use precipitation, evapotranspiration, forest structure and drought indexes to model backscatter signal over intact forest tracks on the Amazonian basin, and then we will use this model to predict and stabilize SAR signal. The main advantage of a model built with such kind of data is that it won't depend on any optical measurement of land surface, and it will be sensible to seasonal changes and abrupt alterations of climate patterns.

MATERIALS AND METHODS
All data collection and pre-processing tasks were performed using the Google Earth Engine (GEE) platform (Gorelick et al., 2017).

Study area and sampling space definition
The selected Area of Interest (AOI) covers a 67.000 km² tilted rectangle over the Brazilian Eastern Amazon (Figure 1). The area follows a ~500 km stretch of the Transamazonian highway , between the cities of Altamira and Itaituba, on the Brazilian state of Pará. The AOI belongs to a tropical region, characterized a wet season from November to May, followed by a drier (but not completely dry) season, from June to October. Total precipitation in Altamira varies around 1800 mm/yr. Regarding vegetation, dense ombrophylous forest covers all the intact vegetation areas. Flooded forests are not practically not present in the studied area.
The main drive of deforestation in the AOI is land clearing for cattle-ranching in small to medium patches, corresponding to a low productivity, unsustainable production framed by land ownership concentration by largeholders (Brondizio et al., 2009). Figure 2 shows a classic deforestation setup, with patches of forest being cleared on rectangular shapes next to previous deforested areas, along a secondary road. The resulting 1203 polygon dataset was thinned, by removing all the areas which weren't observed, due to cloud cover, on the observation that predated the polygon creation date. This procedure, which retained 407 polygons, assured that the date of the polygon would be close to the date of the real deforestation; and (iii) the resulting polygons were intersected with the PRODES 2018 dataset. This final operation resulted on 198 fine-tuned, manually confirmed deforestation polygons, with a mean size of 0,92 km² and a total size of 182 km². These sets are illustrated in Figure 1 by yellow and black areas, respectively.

Input SAR data
The SAR data used on this study were delivered by ESA's Cband Sentinel-1A satellite. Sentinel-1 data are made available in the GEE platform as pre-processed, terrain-corrected σ 0 values, on VV and VH polarizations, and were converted to γ 0 values using the following expression (Woodhouse, 2006): where represents the local incidence angle (LIA) and 0 , known as gamma naught, is the backscattering coefficient normalized by the incidence angle.
In order to reduce speckle noise, SAR data were filtered using a time-series filter (Quegan and Yu, 2001) and then a Frost 5x5 adaptative spatial filter (Frost et al., 1982). The adopted spatial resolution of the Sentinel-1 pixel was 20 meters.

Climatologic and hydrological stress information
Daily precipitation data was obtained from the Climate Hazards Group Infrared Precipitation with Stations (CHIRPS) data set (Funk et al., 2015). Hourly data, used to compute the 12 hours accumulated precipitation, was adquired from NASA's Integrated Multi-satellitE Retrievals for GPM (IMERG), on its version 06 (Huffman et al., 2019). CHIRPS delivers global calibrated daily precipitation data on 0.05° (roughly 5x5 km) resolution. IMERG systems produce 0.1° (~10x10km) global precipitation data with a cadence of 30 minutes.
Water Deficit (WD) was computed as the accumulated negative difference between precipitation and evapotranspiration (ET), and represents a measure of the meteorologically-induced water stress (Aragão et al., 2007). For every location and S1 acquisition date, WD was computed following the rule: , … , } where WDi and Pi represent the n-sized 180-day daily time-series of WD and precipitation before the date of interest. For every location and acquisition date, WD will be considered as the WDn value of the WD time-series computed following eq. 2. Daily ET was supposed to be constant and equal to the mean value of the PML-V2 (Zhang et al., 2019) ET time series for every location.
Dry periods (DP) were computed by counting the number of days without a significative rain before the date of interest, for every location. Up to 5 thresholds were used to define what would be considered as a significative rain: 0, 1, 2, 5 and 10 mm·d -1 , generating 5 different indexes: DP0, DP1, DP2, DP5 and DP10.

Forest structure data
Above ground biomass density (AGBD) information was obtained from two different sources: (1) the 500 m. resolution WHRC Pantropical National Level Carbon Stock Dataset (Baccini et al., 2012), which was assembled from a combination of co-located field measurements, orbital LiDAR observations, and imagery recorded from the Moderate Resolution Imaging Spectroradiometer (MODIS), and (2) the 250 meter resolution EBA 2 dataset, computed using 836 LiDAR airborne transects randomly distributed across 3.5 million km 2 of the Amazon forests, and data from MODIS, SRTM, TRMM and ALOS-PALSAR datasets. It's worth noting that, although both datasets refer to the same variable, they can have significative differences, due to the different dates of reference (2010 for WHRC, 2016 for EBA), data, resolution and methods employed. Mean canopy height (CH) was extracted from the 2005 Global Forest Canopy Height data (Simard et al., 2011).

Backscatter modelling
To estimate backscattering on both VV and VH polarizations using the cited precipitation, drought and forest structures data we developed two different models. Firstly, we built a multivariate linear regression model. Due to the large number and strong collinearity of some of the predictors, an exploratory analysis was performed in order to reduce the number of variables and to evaluate eventual transformations to be performed on the variables before modelling. Non-normality of some of the selected variables lead us to build and use a second, non-parametric model using the Random Forest model (Breiman,2

Pixel-wise time-series stabilization and deforestation detection
Once an optimal model was defined for VV and VH backscattering, we used a GEE script to extract filtered backscatter values and all the selected climatological and structural indexes on a set of 2000 points inside the AOI, being half of them intact forest and the other half deforested areas. The sampling areas for each of the forest/non-forest categories were built following the procedure described in 2.1. The sampled data were then injected into the optimized random forest VV and VH models to build a predicted backscattering series (̂0). Finally, the predicted values and the observed values were combined to form a stabilized time-series. The expression used to compute the k element of a n sized time-series was: where 0 is the stabilized backscattering value, 0 is the original filtered backscatter and ̂0 =̂0 − 1 ∑̂0 =1 , is the mean-normalized predicted value.
After stabilization we applied a change detection procedure to the data (detailed below) and compare the results with the results of the same methodology applied to the original backscatter series. The adopted deforestation pixel-wise detection procedure follows the steps detailed below: -Fitting of a statistical distribution for every time-series using a designated training period (in our case, Nov/16 to Jul/17. -Determination of a change threshold based on the fitted distribution and a significance level (for this study we tested 1%, 2% and 5% significance levels). -Thresholding of every pixel's time-series during the detection period (in our case, Aug/17 to Jul/18). -A 'direct alert' will be flagged if a backscatter value falls below the threshold. -Two consecutives 'direct alerts' will trigger a 'confirmed alert'.

Predictors characterization
Regarding statistical characteristics of the predictor variables, none of the considered rain and drought indexes can be considered as normally distributed (Figure 3). This fact will affect modelling results.
All the time-dependent studied variables (backscattering, precipitation, WD and DP) show a periodic behaviour (see for example, VH in Figure 4). Sentinel-1 γ 0 values show a minimummaximum range of 1.08-1.53 dB, with a mean of 1.27 dB for VV Biome-MSA/Amazon Fund -Subproject 7-Estimating Biomass in the Amazon (EBA), to be available on Pangaea. and 1.01-1.38 dB, with a mean of 1.17 dB for VH polarizations, slightly more stable than the temperate forests studied by Benninga et al. (2019), which showed mean ranges around 1.5 dB for both polarizations.

SAR backscatter modelling
Although precipitation indexes showed a strong linear correlation with backscattering values (120-day accumulated rain had ρ=0.67 correlation with γ 0 VV, for example), and γ 0 VH values had also significative linear correlation with short-term rain indexes (12 hours had a ρ=0.49), non-linearity of the main predictor variables led to a biased multivariate linear regression model. The results of the linear regression modelling lead us to test a nonparametric approach, more suited to deal with the non-normality of the used predictors.  In this sense, we tested a Random Forest (RF) approach (Breiman, 2001), following the methodology detailed in 2.5. The optimal mtry parameter (or number of variables available for splitting at each tree node1) was determined by iterative modelling using R² as a measure of model quality and was fixed to half of the used predictors. The optimal number of predictors was fixed to 22 and 25 for VH and VV respectively. Figure 5 lists the selected predictors and their relative importance for each polarization. The values of explained variance of the final models were 71.64% and 73.28% for VV and VH respectively.

Forest SAR measurements stabilization
Once defined the final random forest models, we use them to predict the Sentinel-1 measures corresponding to a whole year after the training period, as a validation process. Figures 6 and 7 detail the results of the validation. The predicted series showed a good overall accord with the actual values, being able to follow the seasonal trends and eventually to model peaks of backscattering due to short-term variations of precipitation.
Extreme backscattering values were not so faithfully modelled as the moderate ones, probably due to the difficulties that random forest regression algorithms have to predict values out of the range of the training dataset. We then measured the variability of the stabilized time-series, that is, the variability of the model residuals, as a measure of the performance of the stabilization process (Table 2). Overall, there is a reduction of 39% of the variability of both VV and VH polarizations, in terms of standard deviation.

Application to deforestation detection on stabilized timeseries
To test our method on operational conditions, we applied the stabilization procedure detailed on section 2.6 to time-series extracted on 2000 points of the AOI, half of them being recently deforested areas and the other half intact forests. For benchmark purposes, we applied the same methodology using the 0 P95 value of the 2 km neighbourhood of the pixel as the stabilization value, instead of the climate predicted value.
This last technique, called spatial normalization, was successfully used by Reiche et al. (2017) in Bolivian tropical forest to improve deforestation detection. We will use "SpNorm" to refer to this methodology, and "ClSt" (Climate Stabilization), for the methodology proposed here. Figure 8 shows the initial result of the climate stabilization procedure, showing the original time γ 0 series, the filtered values and the predicted values of 8 randomly chosen samples.
Overall, the stabilization procedure lowered the γ 0 VH standard deviation values from 6.86·10 -2 to 6.16·10 -2 , and the γ 0 VV standard deviation values from 0.029 to 0.025, which represents a 10% and 14% reduction on VH and VV instability, respectively. After stabilization, we followed the procedure described in 2.6 to determine the accuracy of deforestation detection on the stabilized time-series. The first step (statistical modelling) yielded the following results: -None of the pixel-based backscatter time-series, in linear or logarithmic scale, can be considered as normal, having Shapiro-Wilk normality test p-values consistently above 0.4. -Kolmogorov-Smirnoff tests against gamma and lognormal distributions are substantially more significant for the linear-scaled filtered and stabilized time-series, with p-values around 0.08. -As this, for every pixel, thresholds were fixed based on a fitted lognormal distribution, with levels of significance of 1%, 2% and 5%, based on the predetection period (10/2016 to 07/2017) time-series.
Once the thresholds were fixed, a simple detection procedure was performed, being the alert triggering criteria the occurrence of two consecutive values under the threshold. Figure 9 depicts the deforestation detection procedure results on 8 random points. Note how point 841 deforestation wasn't detected, causing an omission error. Remark how, in point 48, the proposed methodology triggered an SAR alert (red line) more than two months before the optical alert date (in blue). The results of the detection procedure using observed, filtered, spatially normalized and climatic stabilized values are detailed in the following table, in terms of commission error (CE), omission error (OE) and weighted overall error (WOE). WOE was computed as: The rationale behind WOE computation is to penalise the occurrence of commission errors, which are to be absolutely avoided on an operational EWS. For brevity purposes, only the results for a significance level of 1% are shown in Table 3. Higher significance values showed inferior results.  The results of the stabilized series are clearly superior to the nonstabilized ones, showing that the use of raw or filtered S1 backscattering for deforestation detection is non-advisable. The bigger improvement occurs on the commission errors rate, that had reductions of 9% and 19% in VH and VV respectively.
Finally, we computed the delay of detection compared to the optical DETER system. We wanted to determine if the more frequent but cloud-impacted optical observations could overcome our proposed method, in terms of rapidity of response.
The results for the nearly 1000 sampled points in that the mean difference for stabilized alerts is -11 days, and the median is 11 days (positive values imply fastest SAR detection) (figure 10). The benchmark method, spatial normalization, showed similar values: a mean of -12 days and a median delay advantage of 11 days. Thus, we can say that most of the SAR detections came before the optical ones.

DISCUSSION
The performance of random-forest regression, with R² values superior to 0.7, proves that non-parametric modelling is well suited to deal with the proposed problem. The computed precipitation indexes have a strong bimodality, and WD and DP variables have a complex distribution. In this particular situation, non-parametric models, such as the one used here, should overpass parametric models like linear regression. Random trees optimization did not reduce significantly the number of used variables (from 26 to 23 in the case of VH and 25 for VV), showing the complexity of the modelled system. Also, the random forest model had difficulties predicting backscattering values lower or higher than the ones used for training. Other types of algorithms, like support vector machines (SVM), should be tested against this specific issue.
The most important information for the models was the above ground biomass (AGB). The optimal random forest models used two different AGB datasets (WHRC and EBA), bringing information of both datasets, indicating that they respond for different forest structure characteristics, as pointed out by Longo et al. (2016). Other than AGB, long term precipitation and water deficit were useful on VV modelling. The 3 to 4 months periodicity of VV-used rain variables is consistent with the seasonality detected by Benninga et al. (2019). VH was more susceptible to short-term precipitation (12 hours accumulated rain) and local angle of incidence (LIA), indicating that intercepted rain possibly enhances depolarization of the radar signal thus increasing volume scattering. The overall evaluation of the random forest variable importance ( Figure 5) suggests that co-polarized backscattering is more sensible to forest long-term phenological changes, represented by the long-term precipitation indexes, while cross-polarized backscattering variability is mostly dominated by short-term precipitation. Drought indexes, namely WD (water deficit) and DP (dry period) were important as well on both polarizations, confirming the influence of phenology cycles on SAR backscattering variation.
Regarding detection after stabilization results, the most striking result of the stabilization procedure is the reduction of the false deforestation alerts, especially on the VV polarization, that goes from 21% to 2% for a 0.01 level of significance. This decrease should be ascribed to the seasonality reduction after stabilization. Meanwhile, this improvement comes with a price, as omissions grew 5% and 10% for VH and VV, respectively. These increases can be explained by the reduction on backscattering range following the stabilization, especially during the dry season (as precipitation diminish, backscattering signal decrease, increasing γ 0 values after stabilization, and making slightly more difficult to detect deforestation during this period). Luckily, during the dry season most optical sensors are active, and the consequences of the increased error can be controlled using combined EWS information.
It's worth noting that the thresholds of deforestation detection were based on pixel-wise statistical modelling of the original and stabilized series over a short period of time (10 months, from the start of regular acquisition of Sentinel-1A satellite over the AOI until the start of the detection period). Longer series should increase performance of the detection.
Regarding to the state-of-the-art alternative approach to the problem (spatial normalization), we consider that both approaches are equally efficient, in terms of precision and computational costs. Spatial normalization will spend more effort looking for reference forest areas, while climatic normalization will be costlier during the model development phase.

CONCLUDING REMARKS
SAR measurements instability can have different origins, among others: electronic noise, ionospheric activity, tropospheric attenuation, presence of water droplets on the canopy, changes on water content and structure of the vegetation and soil moisture. In this work we have tried to model and reduce the variability related to short-term and seasonal precipitation and the lack of it, making use of globally available climatological and forest structure data.
Modelling results were satisfying and contributed to reduce the variability of the S1 time-series, thus improving the deforestation detection accuracy by drastically reducing the commission errors. Additionally, the results of the deforestation detection procedures indicate that deforestation detection using stabilized SAR time-series will anticipate the alert trigging time by approximately 11 days, if compared to the stablished manual, optical-based approach. This value will be significantly higher for deforestation produced at the beginning of the rainy season, that will remain undiscovered until the dry season, several months later, if only optical sensors are used for deforestation detection.
The results of our work suggest as well that filtering and stabilization of backscattering time-series are effective and should be a premise of any SAR-based deforestation detection method that seeks operational reliability.
Regarding future developments, longer SAR time-series, new forest structure measurements coming from the Global Ecosystem Dynamics Investigation (GEDI) and refined algorithms should improve future models built upon the proposed methodology. More research is needed to determine the performance of the proposed climatological stabilization method in other regions of the Amazon basin having different rain regimes. Analogous modelling of non-forest land covers (grass, crops, deserts) can be an interesting way to gain insight on SAR backscattering variability.