EVALUATION OF DEEP LEARNING TECHNIQUES FOR DEFORESTATION DETECTION IN THE AMAZON FOREST

Deforestation is one of the main causes of biodiversity reduction and climate change, among other destructive phenomena. Thus, early detection of deforestation processes is of paramount importance. Motivated by this scenario, this work presents an evaluation of methods for automatic deforestation detection, specifically the Early Fusion (EF) Convolutional Network, the Siamese Convolutional Network (S-CNN) and the well-known Support Vector Machine (SVM), taken as the baseline. These methods were evaluated in a region of the Brazilian Legal Amazon (BLA). Two Landsat 8 images acquired in 2016 and 2017 were used in our experiments. The impact of training set size was also investigated. The Deep Learning-based approaches clearly outperformed the SVM baseline in our experiments, both in terms of F1-score and Overall Accuracy, with a superiority of S-CNN over EF.


INTRODUCTION
The Amazon Rainforest harbors great biodiversity. It is home to a large number of species, including endemic and endangered flora and fauna. It contains 20% of the planet's fresh water (Assunção, Rocha, 2019) and produces more than 20% of the world's oxygen (Butler, 2008). Therefore, the Amazon provides essential resources for the maintenance of our planet (De Souza et al., 2013), (De Souza, Junior, 2018), and its preservation is of paramount importance.
For many years, the Amazon region has faced several threats as a result of unsustainable economic development, such as the extension of agricultural activities at industrial scale (e.g., soybeans, cattle), slash-and-burn land grabbing by underprivileged rural communities, forest fires, illegal gold mining and logging, expansion of informal settlements, and infrastructure construction (roads and train tracks) (Goodman et al., 2019), (Malingreau et al., 2012), (Barreto et al., 2006). Therefore, it is imperative to promote sustainable development to achieve an ecological balance and to contribute to the mitigation of climate change (Sathler et al., 2018). Controlling and monitoring this ecosystem is fundamental to enforcing public policies and preventing illegal activities in the region. Remote sensing has proven to be a cost-effective information source for attaining such objectives.
Given the dynamics and complexity of the Amazon region, there have been large government investments aimed at controlling, preventing and combating illegal deforestation (Diniz et al., 2015). The Brazilian National Institute for Space Research (INPE) has developed and maintained a number of projects to provide surveillance reports over the Brazilian Legal Amazon (BLA). The best-known action is the Amazon Deforestation Monitoring Project (PRODES) (Valeriano et al., 2004), which has monitored deforestation in areas of BLA with native vegetation since 1988.
The near real-time deforestation detection project (DETER) (Shimabukuro et al., 2007) was developed to support land use policies in BLA and to control illegal deforestation and forest degradation. The Brazilian Amazon Forest Degradation Project (DEGRAD) (Shimabukuro et al., 2015) measures areas in the process of deforestation where the forest cover has not yet been completely removed.
Finally, the Land Use and Land Cover Mapping of Amazon Deforested Areas (TerraClass) project (De Almeida et al., 2016) is responsible for qualifying deforestation in BLA and investigating the possible causes of logging. These projects, however, adopt methodologies that involve many manual operations. There is, therefore, a demand for automatic procedures that can improve accuracy, reduce the human workload, and shorten the time needed to generate results.
Numerous change detection techniques have been proposed thus far. Some of the traditional unsupervised methods are based on image algebra, such as Image Differencing (Jensen, Toll, 1982), Image Ratioing (Howarth, Wickware, 1981), Regression Analysis (Ludeke et al., 1990) and Change Vector Analysis (CVA) (Nackaerts et al., 2005). In addition, techniques based on transformations, such as Principal Component Analysis (PCA) (Deng et al., 2008) and Tasselled Cap (KT) (Han et al., 2007), have also been used for this purpose. However, these methods require the selection of a proper threshold to identify the changed regions, and the features adopted by these conventional algorithms are hand-crafted, which may lead to poor image representations (Zhan et al., 2017).
Support Vector Machine (SVM) is one of the most popular supervised algorithms used in satellite image classification (Dhingra, Kumar, 2019), (Kranjčić et al., 2019) due to its good performance and robustness when labeled samples are scarce. Additionally, random forest (Pal, 2005) and methods based on artificial neural networks (ANN) are also widely used (Maxwell et al., 2018).
Recently, Deep Learning (DL) techniques have been successfully applied to Remote Sensing (RS) image analysis. Using Deep Neural Networks (DNNs), it is possible to learn multiple levels of data representation and to extract more robust and abstract features (Zhan et al., 2017), which usually provide more meaningful information than hand-crafted ones. In this sense, DNN variants, such as Convolutional Neural Networks (CNNs) and Siamese Networks, are potential candidates for automatic deforestation detection.
In (Zagoruyko, Komodakis, 2015), the authors proposed and explored different CNN architectures to learn similarity functions between image pairs that had implicitly undergone transformations and other effects (e.g., rotation, translation, illumination changes). These algorithms performed well in comparison to methods based on hand-crafted feature descriptors. Examples of such algorithms are the Early Fusion and the Siamese CNN approaches, which were also used by (Daudt et al., 2018) to detect changes in urban areas. Similarly, (Zhang et al., 2018) successfully applied a Siamese CNN to identify building and tree changes, and also to distinguish real changes from false ones caused by misregistration errors or false matches.
Motivated by the success of DL methods in change detection applications, in this work we adapt and evaluate Early Fusion and Siamese networks for deforestation detection in the Amazon rainforest. We take a binary SVM classifier as the baseline for comparison purposes.
The remainder of this paper is organized as follows. Section 2 presents the change detection methods considered in this work. Section 3 describes the dataset and the adopted experimental protocols. The experimental results are presented in Section 4, and some concluding remarks, which also point to future work, are included in Section 5.

CHANGE DETECTION METHODS
In this section, we briefly describe the methods evaluated in this work for deforestation detection: Early Fusion (EF) and Siamese Convolutional Network (S-CNN).

Early Fusion (EF)
The EF method is inspired by the CNN model proposed in (Daudt et al., 2018), which demonstrated good performance for change detection in urban areas. It is composed of several convolutional and pooling layers, followed by a fully connected (FC) layer and a softmax layer that carries out the final classification.
The name Early Fusion refers to the concatenation of the images from two different dates before the CNN model is applied. The images are stacked along their spectral dimension to generate a single input image for patch extraction. The patches are extracted with a sliding-window procedure, and the class label of each patch is assigned to its central pixel. The procedure is illustrated in Figure 1.
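The stacking and patch extraction steps can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function name `early_fusion_patches` and the random toy arrays are hypothetical, while the patch size of 15 and the stride of 3 are the values reported later in the experimental setup.

```python
import numpy as np

def early_fusion_patches(img_t1, img_t2, labels, patch=15, stride=3):
    """Stack two co-registered images along the band axis and cut patches.

    img_t1, img_t2: arrays of shape (H, W, B); labels: array of shape (H, W).
    Returns patches of shape (N, patch, patch, 2 * B) together with the
    reference label of the central pixel of each patch.
    """
    fused = np.concatenate([img_t1, img_t2], axis=-1)  # early fusion
    half = patch // 2
    patches, centers = [], []
    for r in range(0, fused.shape[0] - patch + 1, stride):
        for c in range(0, fused.shape[1] - patch + 1, stride):
            patches.append(fused[r:r + patch, c:c + patch, :])
            centers.append(labels[r + half, c + half])
    return np.stack(patches), np.array(centers)

# Toy data standing in for two 8-band images and a binary reference map.
rng = np.random.default_rng(0)
t1 = rng.random((30, 45, 8))
t2 = rng.random((30, 45, 8))
ref = rng.integers(0, 2, size=(30, 45))

X, y = early_fusion_patches(t1, t2, ref)
print(X.shape, y.shape)
```

Each patch carries the bands of both dates, so the network sees the two acquisitions jointly from its first layer onward.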

Siamese Network (S-CNN)
The Siamese CNN is an adaptation of a traditional CNN that comprises two identical branches sharing the same hyperparameters and weight values (Zhang et al., 2018).
The architecture adopted in this work is inspired by (Daudt et al., 2018), where it was also used for urban change detection. Both input images are treated independently. Each branch of the Siamese network receives as input one patch cropped from the co-registered image pair. The two outputs are concatenated to produce the final feature vector (Zhang et al., 2018), (Zagoruyko, Komodakis, 2015). This vector is the input to a classifier that assigns it to one of two classes: deforestation or no-deforestation. Similar to EF, the class label is assigned to the central pixel of each patch. This process is summarized in Figure 2.
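The weight sharing and late concatenation can be illustrated with a deliberately simplified NumPy forward pass. In this sketch each convolutional branch is replaced by a single linear layer with ReLU, which is an assumption made purely for brevity; the names `branch` and `siamese_forward`, the feature size of 32 and the random weights are all hypothetical and do not reproduce the architecture used in the paper.

```python
import numpy as np

def branch(patch, W, b):
    """Stand-in for one CNN branch: flatten, linear map, ReLU."""
    return np.maximum(patch.reshape(-1) @ W + b, 0.0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def siamese_forward(patch_t1, patch_t2, params):
    """Apply the SAME branch weights to both dates, then classify."""
    W, b, W_cls, b_cls = params
    f1 = branch(patch_t1, W, b)        # features of date T1
    f2 = branch(patch_t2, W, b)        # features of date T2, shared W and b
    fused = np.concatenate([f1, f2])   # final feature vector of the pair
    return softmax(fused @ W_cls + b_cls)

rng = np.random.default_rng(0)
in_dim, feat_dim = 15 * 15 * 8, 32     # feat_dim is an arbitrary choice
params = (
    rng.normal(0.0, 0.01, (in_dim, feat_dim)),
    np.zeros(feat_dim),
    rng.normal(0.0, 0.01, (2 * feat_dim, 2)),
    np.zeros(2),
)
p1 = rng.random((15, 15, 8))
p2 = rng.random((15, 15, 8))
probs = siamese_forward(p1, p2, params)  # [P(class 0), P(class 1)]
print(probs)
```

The key difference from EF is visible in `siamese_forward`: the two dates are fused only after feature extraction, and a single set of branch weights is used for both.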

Data Set Description
The study area is located in BLA, more specifically in Pará State, Brazil, centered on coordinates 03° 17' 23" S and 050° 55' 08" W. This area has been facing a significant deforestation process that has been tracked and monitored by PRODES (Valeriano et al., 2004). Figure 3(c) shows the reference change map of the deforestation that occurred between December 2016 and December 2017. This data is freely available at the PRODES database (http://terrabrasilis.dpi.inpe.br/map/deforestation). However, some polygons of the reference were disregarded because they had been deforested in previous years.
The dataset comprises a pair of Landsat 8-OLI images with 30 m spatial resolution. We applied an atmospheric correction to each scene and then clipped them to the target area. The final images have 1100 × 2600 pixels and seven spectral bands (Coastal/Aerosol, Blue, Green, Red, NIR, SWIR-1, and SWIR-2). The first image is from August 2nd, 2016 (Figure 3(a)) and the second one from July 20th, 2017 (Figure 3(b)). These dates were chosen due to the lower presence of clouds, a common problem over the entire BLA region.

Experimental Setup
Our experiments relied on a pair of optical images acquired approximately one year apart from each other.
In addition, the Normalized Difference Vegetation Index (NDVI) was calculated for every pixel as in Equation 1:
NDVI = (NIR − Red) / (NIR + Red)    (1)
This index quantifies the presence and quality of vegetation. For Landsat 8, it is calculated from bands 5 and 4, which correspond to the spectral reflectance measurements acquired in the near-infrared (NIR) and red regions, respectively.
The NDVI was stacked along the spectral dimension of the corresponding images, resulting in images with eight bands.
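As an illustration, the standard NDVI definition, (NIR − Red) / (NIR + Red), and the stacking step can be written as follows. This is a sketch under stated assumptions: the helper `add_ndvi` is hypothetical, the zero-based band indices follow the band order listed in the data set description (Red at index 3, NIR at index 4), and the small `eps` term is added here only to avoid division by zero.

```python
import numpy as np

def add_ndvi(image, nir_index=4, red_index=3, eps=1e-8):
    """Append NDVI as an extra band to an (H, W, 7) reflectance image."""
    nir = image[..., nir_index].astype(np.float64)
    red = image[..., red_index].astype(np.float64)
    ndvi = (nir - red) / (nir + red + eps)  # eps guards against 0 / 0
    return np.concatenate([image, ndvi[..., None]], axis=-1)

# Toy 2 x 2 image with seven bands; only Red and NIR are filled in.
toy = np.zeros((2, 2, 7))
toy[..., 3] = 0.2   # Red reflectance
toy[..., 4] = 0.6   # NIR reflectance

stacked = add_ndvi(toy)
print(stacked.shape)  # eight bands after stacking
```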
The spectral bands of each image were normalized to zero mean and unit variance. The input to EF was a tensor of size 15 × 15 × 16, and the input to each branch of S-CNN was a tensor of size 15 × 15 × 8. We used an SVM classifier as the baseline, whose input was a vector of dimension 15 × 15 × 16. In all cases, the patches were extracted using a sliding-window procedure with a stride of three. The window size for each method and the stride were chosen empirically. Similar to (Zhang et al., 2018), we divided the input images into tiles. We obtained 15 tiles, as shown in Figures 3(a) and 3(b). Tiles 1, 7, 9 and 13 were used for training, tiles 5 and 12 for validation, and tiles 2, 3, 4, 6, 8, 10, 11, 14 and 15 for testing.
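The normalization and the tile partition can be sketched as below. The helper `standardize_bands` is hypothetical, and computing the statistics over the whole image (rather than, say, over the training tiles only) is an assumption, since the text does not specify it. The tile identifiers are the ones listed above.

```python
import numpy as np

# Tile partition as reported in the text.
TRAIN_TILES = [1, 7, 9, 13]
VAL_TILES = [5, 12]
TEST_TILES = [2, 3, 4, 6, 8, 10, 11, 14, 15]

def standardize_bands(image):
    """Scale each spectral band of an (H, W, B) image to zero mean, unit variance."""
    mean = image.mean(axis=(0, 1), keepdims=True)
    std = image.std(axis=(0, 1), keepdims=True)
    return (image - mean) / np.where(std > 0, std, 1.0)  # skip constant bands

rng = np.random.default_rng(0)
image = rng.random((20, 30, 8)) * 5.0 + 3.0   # toy 8-band image
z = standardize_bands(image)

# Every tile belongs to exactly one subset.
all_tiles = sorted(TRAIN_TILES + VAL_TILES + TEST_TILES)
print(all_tiles == list(range(1, 16)))
```

Splitting by tile rather than by randomly sampled pixels keeps overlapping patches from the same neighborhood out of both the training and the test sets.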
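The number of available samples of the no-deforestation class was much higher than that of the deforestation class. We therefore performed data augmentation on the samples of the deforestation class: each training pair was rotated by 90° and flipped along the horizontal and vertical axes. In addition, we applied an under-sampling technique to the majority class (no-deforestation) to balance the number of training pairs for both classes. This way, we obtained 8,118 training pairs for each class. To assess the influence of the number of training samples, we also considered three additional scenarios: using only training samples from a single tile (13), from two tiles (1, 13) and from three tiles (1, 7, 13), yielding 717, 2,127 and 5,421 samples per class, respectively.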
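A minimal sketch of the class balancing step is given below, covering the augmentation of minority-class pairs by a 90° rotation and by horizontal and vertical flips, and the random under-sampling of the majority class. The function names are hypothetical, and the exact set of transformations (for example, whether rotations beyond 90° were also used) is an assumption, because the text does not detail it.

```python
import numpy as np

def augment_pair(p1, p2):
    """Return the original pair plus a 90-degree rotation and two flips.

    The same transformation is applied to both dates so that the pair
    stays spatially aligned.
    """
    ops = [
        lambda a: a,                              # original
        lambda a: np.rot90(a, k=1, axes=(0, 1)),  # 90-degree rotation
        lambda a: np.flip(a, axis=1),             # horizontal flip
        lambda a: np.flip(a, axis=0),             # vertical flip
    ]
    return [(op(p1), op(p2)) for op in ops]

def undersample(majority_indices, n_keep, rng):
    """Randomly keep n_keep majority-class samples, without replacement."""
    return rng.choice(majority_indices, size=n_keep, replace=False)

rng = np.random.default_rng(0)
p1 = rng.random((15, 15, 8))
p2 = rng.random((15, 15, 8))
augmented = augment_pair(p1, p2)

majority = np.arange(1000)            # toy indices of no-deforestation samples
kept = undersample(majority, 100, rng)
print(len(augmented), kept.shape)
```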
We selected the Radial Basis Function (RBF) as the SVM kernel, with the γ parameter set to 0.00027 based on the relationship γ = 1/d, where d is the number of features, as proposed in (Gola et al., 2019). The parameter C was set to 10, a choice based on a k-fold cross-validation procedure with k = 5. The experiments were implemented and carried out in Python using the SVM implementation of the Scikit-Learn library (Pedregosa et al., 2011).
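The following sketch shows how such a configuration might look in Scikit-Learn. With d = 15 × 15 × 16 = 3600 features, γ = 1/d ≈ 0.00028, which agrees with the reported 0.00027 up to rounding. The candidate grid for C and the synthetic data are assumptions made for illustration; only the RBF kernel, γ = 1/d and the five-fold cross-validation come from the text.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

n_features = 15 * 15 * 16      # flattened 15 x 15 x 16 patch
gamma = 1.0 / n_features       # gamma = 1 / d

# Synthetic stand-in for flattened, normalized patches of two classes.
rng = np.random.default_rng(0)
n_per_class = 30
X = np.vstack([
    rng.normal(0.0, 1.0, (n_per_class, n_features)),
    rng.normal(0.3, 1.0, (n_per_class, n_features)),
])
y = np.repeat([0, 1], n_per_class)

# Five-fold cross-validation over a hypothetical grid of C values.
search = GridSearchCV(
    SVC(kernel="rbf", gamma=gamma),
    param_grid={"C": [0.1, 1, 10, 100]},
    cv=5,
)
search.fit(X, y)
print(n_features, round(gamma, 5), search.best_params_)
```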
The CNN architecture used for the EF approach is illustrated in Figure 4. It was composed of three convolutional layers (Conv) with ReLU as the activation function, two max-pooling layers (MaxPool) and two fully connected layers (FC) at the end, where the last one is a softmax with two outputs, one associated with the deforestation class and the other with the no-deforestation class. In the S-CNN model, the two branches comprise the same network architecture (Figure 4), but in this case the network has only one fully connected layer at the end. The vectors at the output of each CNN branch were then concatenated to compose a new feature vector, which represented the image pair.
The CNN training setup was as follows: the batch size was set to 32 and the number of epochs to 100. To avoid overfitting, we used early stopping, halting training after 10 epochs without improvement, and dropout with a rate of 0.2 in the last fully connected layer. In contrast to (Daudt et al., 2018), where Averaged Stochastic Gradient Descent (ASGD) was used, we employed the Adam optimizer with a learning rate of 10^-3 and a weight decay of 0.9, which performed better in our preliminary experiments.
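The early stopping rule can be made concrete with a small, framework-independent sketch. The function below is hypothetical and assumes that the monitored quantity is a validation loss, which the text does not state; the limit of 100 epochs and the patience of 10 epochs are the reported values.

```python
def run_early_stopping(val_losses, max_epochs=100, patience=10):
    """Return (last_epoch_run, best_epoch) for a sequence of validation losses.

    Training stops once `patience` consecutive epochs bring no improvement
    over the best loss seen so far, or when `max_epochs` is reached.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return min(len(val_losses), max_epochs), best_epoch

# The loss improves for three epochs and then stagnates.
losses = [1.0, 0.8, 0.7] + [0.75] * 20
print(run_early_stopping(losses))  # (13, 3)
```

In practice the weights from the best epoch would be restored after stopping, although the text does not say whether this was done.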

RESULTS
Figure 5 summarizes the results of our experiments in terms of the F1-score of the deforestation class achieved by the three methods described in Section 2. The figure shows the performance obtained by each method for different numbers of tiles used for training.
S-CNN achieved the best performance in terms of F1-score in all experiments. As expected, the methods improved their performance as the number of training samples increased. When just one tile was used for training, we recorded F1-scores of 46%, 44% and 48% for SVM, EF and S-CNN, respectively. SVM outperformed EF but was still below S-CNN. This was not unexpected because SVM tends to generalize well when labeled data are scarce. In contrast, when two, three and four tiles were taken for training, EF and S-CNN performed better than SVM. With four training tiles, EF and S-CNN outperformed SVM by 10% and 13%, respectively, in terms of F1-score. Clearly, the DL methods benefited more from the increase in training samples than SVM did.
We should bear in mind that, in the target application, the classes are highly unbalanced, with a predominance of the no-deforestation class. Under these conditions, the F1-score of the deforestation class tends to decrease.
The results in terms of Overall Accuracy (OA) are presented in Figure 6. As with the F1-score, the results improved when the number of training tiles increased. In all scenarios, scores above 90% were achieved. The scores went from about 95% for one tile to 97% when four tiles were used. In comparison to the F1-score results, the higher values of OA are related to the higher number of no-deforestation samples that were correctly classified.
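The gap between the two metrics under class imbalance can be reproduced with a small worked example. The confusion-matrix counts below are hypothetical and chosen only to illustrate the effect; they are not results from the experiments.

```python
def f1_and_overall_accuracy(tp, fp, fn, tn):
    """F1-score of the positive class and overall accuracy from pixel counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    oa = (tp + tn) / (tp + fp + fn + tn)
    return f1, oa

# Hypothetical map with 1000 pixels, of which only 50 are deforestation.
f1, oa = f1_and_overall_accuracy(tp=30, fp=20, fn=20, tn=930)
print(round(f1, 2), round(oa, 2))  # 0.6 0.96
```

Because the 930 correctly classified no-deforestation pixels dominate the total, OA stays high even though 40% of the deforestation pixels are missed.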
Figures 7, 8 and 9 show the RGB compositions of tiles 2, 6 and 14, respectively, for the models trained with four tiles. They show the tiles at both dates as well as the change maps delivered by each method.
The maps show that S-CNN identified deforested areas better (Figures 7-e, 8-e, 9-e). It achieved the highest true deforestation rate and presented a lower false deforestation rate than SVM and EF, demonstrating a more accurate result in these three tiles. On the other hand, EF produced the lowest number of false detections, but it failed to identify many areas that suffered deforestation, as revealed in Figures 7-d, 8-d, 9-d. Notably, much of the false deforestation (reddish) and false no-deforestation (bluish) occurred at the borders of correctly detected deforested areas (yellowish). This type of error might have resulted from inaccuracies in the delimitation of the deforestation polygons. Figures 7-c, 8-c, 9-c show that SVM performed poorly for deforestation detection. The false deforestation rate was relatively high in all cases: many pixels were incorrectly identified as deforested areas. We can also observe a salt-and-pepper effect in the SVM outcomes. The same trends are observed in the rest of the test area.

CONCLUSIONS
This work reported an evaluation of recently proposed deep-learning-based methods for the detection of deforestation in the Amazon forest. Three methods were tested: Early Fusion (EF), Siamese Convolutional Neural Network (S-CNN) and the Support Vector Machine (SVM), the last one taken as the baseline. Our study area was a region of the Brazilian Legal Amazon that has suffered intense deforestation in the last few years. In our experiments, S-CNN was consistently superior to its counterparts in terms of F1-score and Overall Accuracy. The difference to the second-best approach, EF, was in the range of 3% in terms of F1-score. In just one experimental setup, with a single training tile, SVM outperformed EF by a small margin, although it remained below S-CNN. In all other tested configurations, S-CNN and EF were much superior to SVM in terms of F1-score.

Figure 1. EF approach. Images at different dates (T1 and T2) are concatenated to produce an image pair; then, patches are extracted and fed to the CNN model.
change map: from December 2017 to December 2018.

Figure 3. RGB composition of the selected Amazon Forest region at dates T1 (a) and T2 (b); and the deforestation reference set (c). The study area is divided into 15 tiles.

Figure 4. Parameters of the EF and S-CNN architecture.