CLASSIFICATION OF VEGETATION CLASSES BY USING TIME SERIES OF SENTINEL-2 IMAGES FOR LARGE SCALE MAPPING IN CAMEROON

: Sentinel-2 satellites provide dense image time series exhibiting high spectral, spatial and temporal resolutions. These images are in particular of utter interest for Land-Cover (LC) mapping at large scales. LC maps can now be computed on a yearly basis at the scale of a country with efﬁcient supervised classiﬁers, assuming suitable training data are available. However, the efﬁcient exploitation of large amount of Sentinel-2 imagery still remain challenging on unexplored areas where state-of-the-art classiﬁers are prone to fail. This paper focuses on Land-Cover mapping over Cameroon for the purpose of updating the Very High Resolution national topographic geodatabase. The ι 2 framework is adopted and tested for the speciﬁcity of the country. Here, experiments focus on generic vegetation classes (ﬁve) which enables providing robust focusing masks for higher resolution classiﬁcations. Two strategies are compared: (i) a LC map is calculated out of a year long time series and (ii) monthly LC maps are generated and merged into a single yearly map. Satisfactory accuracy scores are obtained ( > 94% in Overall Accuracy), allowing to provide a ﬁrst step towards ﬁner-grained map retrieval.


INTRODUCTION
The increasing needs in food supply in the near future will require higher agricultural yields (Foley et al., 2011). Other factors increasing the pressure on agricultural lands are urban sprawl and densification as well as the production of bio-fuels (Godfray et al., 2010, Rounsevell et al., 2005, Searchinger et al., 2008. These pressures will also have negative consequences on natural ecosystems (Green et al., 2005, Tilman, 2001. Indeed, agricultural activities are a major cause of ecosystem degradation at the global scale (Benayas, Bullock, 2012), and therefore, land use change monitoring related to farming is crucial for sustainable land management (Schwilch et al., 2010). Vegetation classes extent estimates and vegetation classes mapping provide crucial information for vegetation classes monitoring and management. Remote sensing imagery in general and, in particular, high spectral, spatial and temporal resolution data as the ones which are available with spaceborne optical systems such as Sentinel-2, are suitable for this kind of application.
The current Sentinel-2 for Vegetation Class Mapping project at the National Cameroonian Mapping Agency aims at fully exploiting the Sentinel-2 observational capabilities and its offered possibilities for vegetation monitoring through the development of open source processing chains such as the ι 2 chain, capable of large-scale land-cover map production (Inglada et al., 2017). Three Earth Observation products -dynamic vegetation class masks, vegetation type, and vegetation class status -have been identified for this processing chain and will be demonstrated at the national scale.
The national Cameroonian mapping agency aims at implementing such a fully operational automatic Vegetation Class map production system using both time series of Sentinel-2 optical images and available Very High Resolution (VHR) aerial images. So far, the vegetation classes are established on the * Corresponding author basis of the visual interpretation of the latter images complemented with field surveys. The nomenclature of the vegetation classes is not yet exhaustive (5 classes). It is targeted at investigating how Sentinel-2 and aerial images can be jointly beneficial for this task since they exhibit complementary strengths. This paper focuses on the first step of this exploratory workflow: the automatic analysis of time series of Sentinel-2 images for yearly LC map generation. It aims at assessing the relevance of a state-of-the-art operational classifier to this specific context (Mallet, Le Bris, 2020). Indeed, a first difficulty lies in the lack of reference data at the spatial resolution of Sentinel images for the training of a supervised classifier. Most datasets exhibit higher spatial resolution which leads often to spatial discrepancies between both sources, detrimental for the learning step. The proposed solution consists in a direct use of existing Very High Resolution topographic database within the existing ι 2 LC classification framework. Another challenge consists in working on areas with numerous cloudy days and significant intra-annual variability (Mertens, Lambin, 2000). In this paper, we assess various solutions which consists in performing either direct yearly classification of time series of images or first classifying monthly datasets followed by a fusion decision step. Section 2 introduces the area of interest and the datasets. Section 3 presents the adopted workflow and the experimental setup, while Section 4 provides both a quantitative and qualitative assessment of the results. Conclusions are drawn in Section 5.

DATA AND STUDY AREA
In order to design a reliable fully operational automatic Cameroon LC map system out of Sentinel-2 time series, experiments are conducted over a study area in the Extreme-North (EN) of Cameroon.

Study area
Current experiments and analyses are limited to one specific site, located over Maroua3, Mora and Petté in the EN of Cameroon, (see Figure 1). Covering 189.072 km 2 , it was selected both for its challenging conditions and the availability of reference and Copernicus data. It has a short rainy season from June to September and a long dry season from October to May. The slope in the whole scene greatly varies with some hilly areas and rocks. The soil is sandy in the dunes with sandy clay at the median part levels, and essentially clayey at the shallow levels. The drainage is almost nonexistent. The essential waterways, Mayel Tchilé and Mayel Petté, are loosing their activities in September. The principal LC categories are vegetation (principally accacias, balanites, and goods of thorns), water areas, buildings and roads.

Data
2.2.1 Remote sensing data Sentinel-2 is the only image data source used in this work: 10 spectral bands from 10 to 60 m, (see Table 1). All Levels 2A and 2B data available through the Copernicus Open Access Hub 1 over the study area for the year 2018 were used. For each month, 4-7 images were available (Table 1). More specifically, a focus was made on a sub-area of 10 km × 10 km.

Reference data
The proposed approach relies on existing geodatabases available over the study area to build the reference datasets, needed for the supervised classification of the LC maps. The topographic LC database of the Cameroonian National Cartographic Institute (BDINC, simplified for our purpose) was the only one data source used in order to build a unique reference dataset, without a tedious harmonization step. In further experiments, five generic vegetation classes are considered for the classification: Tree Savannah (TS), Shrub Savannah (SS), Crop Area without Trees (CA), Crop Area with Trees (CAT), and Floodable Crop Area (FCA).

ι 2 framework
Land-cover map generation is cast as a supervised classification task as routinely adopted in the literature. The iota2 or ι 2 versatile framework (Inglada et al., 2017) is adopted for generating land-cover maps from generic remote sensing and reference data source. Indeed, it is specifically dedicated to process Landsat or Sentinel time series and it has proven to be an efficient and highly parametrizable open-source solution. ι 2 has already demonstrated its high efficiency in performing country wide LC classification out of multi-temporal high resolution Landsat and Sentinel imagery at pixel level (David et al., 2021). It can run either on standard desktop computers or High Performance Computing clusters. A per-pixel machine learning classification scheme is retained, on which simple post-processing steps can be plugged. Contextual classifiers or object-based approaches (Derksen et al., 2018) would have been interesting but have been discarded yet.

Random Forest
A Random Forest (RF) classifier is used (Breiman, 2001). This classifier is an aggregation of a set of decision CART trees. Indeed, to improve the performance of single CART classifier, RF classifier is based on two main principles: bagging and randomsubspace. A fusion of the different basic classifiers is carried out by majority vote: each decision tree predicts a label for a new sample to be classified and the label finally assigned to it is the one which receives the greatest number of votes. This study uses the RF algorithm as base classifier as it has proven to be efficient for land-cover discrimination at large scales under noisy labels, both in terms of computation times and classification quality (Rodriguez-Galiano et al., 2012, Pelletier et al., 2016. Several parameters come into play during the construction of a RF classifier: the number K of trees (arbitrarily set to a large value), the number m of attributes drawn during each cutting of the node of one of the trees (usually set to the value of the square root of the number of attributes), the maximal depth max depth of each tree, and the minimal number of samples min samples per node.

Process workflow
The proposed workflow can be decomposed into the next steps, detailed afterwards: 3.3.1 Data preparation This step ( Figure 2) consists in adapting the Copernicus Sentinel-2 data to fit the input format requirements of ι 2 . First, images bands are converted from .jp2 to .tif format using GDAL translate functionalities. Then, for each Sentinel-2 image, three different masks are computed in a manual way using the r.mask.rast command of the Raster(r. * ) library of GRASS in QGIS. They correspond to unusable image areas because of clouds (CLM R1*), saturation (SAT R1*) or diverse reasons (EDG R1*).

Feature computation
After data preparation, the feature vector used for the classification of each pixel has to be built. For each Sentinel-2 image, four standard features are computed out of the original bands: • Normalized Difference Vegetation Index (NDVI) (Tucker, 1979) to highlight vegetation cover; • Normalized Difference Water Index (NDWI) (Gao, 1996) to enhance water and wet areas; S2 L2A: Sentinel-2 Level-2A; I: total number of images per month; IS: images size (in pixel); TR: temporal resolution; B: spectral bands; SpeR: spectral resolutions (in µm); SpaR: spatial resolutions (in m); F: features variables; SB: spectral band; NDVI: normalized difference vegetation index; NDWI: normalized difference water index; SAVI: soil-adjusted vegetation index; NDRI: normalized difference ratio index; M: total number of features depending on the month (for January here). Selecting batchProcessing parameter in ι 2 configuration file improves feature computation times. At the end, each pixel is characterised by the original spectral bands (limited to the 10 bands exhibiting a native 10 or 20 m GSD) and the spectral indices corresponding to the different dates of the time series. For a yearly time series, it amounts to 760 features: 10 spectral bands plus 4 indices on 70 dates (Table 1). For monthly and yearly classifications, all features are stacked into a single set.

3.3.3
Training the classifier As the proposed approach relies on a supervised RF classifier, training is an important step of the workflow. It consists in (i) sampling data to obtain a training set and (ii) building a classification model per image that will be applied in the next step (Section 3.3.4).
Training set design. Five generic vegetation classes Tree Savannah (TS), Shrub Savannah (SS), Crop Area without Trees (CA), Crop Area with Trees (CAT) and Floodable Crop Area (FCA) are considered in these experiments. Training samples are extracted for these classes from the reference dataset (Section 2.2.2), i.e., the existing Very High Resolution national topographic database. Using existing databases as training data has proved to be a suitable solution (Gressin et al., 2014, Postadjian et al., 2017, even if noisy labels may appear due to misregistration, changes and geodatabase specifications. Pixels are randomly taken as a 20% subset of each class, keeping the same ratio between initial reference and training sets. In practice, a maximum sampling of 1,051,919 pixels is performed over the area covered by the (month or year long) Sentinel-2 time series. The number of training samples selected for each class is proportional to the area covered by this class over the useful parts of the image. These areas are different (see Figure 4) depending on classes and so the class samples are unbalanced.
Building the RF model. In supervised classification, the balance between class samples is important. Data augmentation is used here, generating synthetic samples with jitter (strategy jitter standard factor of 10) and smote (strategy smote neighbors of 5) methods by using the minNumber samples strategy to set the minimum number of samples by class required. RF is not very sensitive to the parameter choice (Pelletier et al., 2016). Thus, it is here applied with the standard ι 2 configuration (see Table 1): a number K of 1,000 trees, a number m of features randomly selected at each node equals to the square root of the total number of features, a maximum depth max depth for each tree of 25 levels and a minimum number min samples of 5 samples per node have been used. RF was training with the polygons of classes in the study area sub-tile and testing on the entire T33PVN tile.

Prediction This step simply consists in classifying
Sentinel-2 data according to model obtained at the previous step. Here, a LC map was generated for year 2018 at a spatial resolution of 10 m over the areas of interest for the five generic vegetation classes listed above.  Bris et al., 2019). It consists in merging classifications by letting each of them vote for its label and choosing the final label as the winner of this majority vote. Such vote can be applied to hard label classification results, but it can also exploit confidence information associated to these maps. This strategy is here applied to the twelve monthly hard label classifications (assuming a perfect coregistration).

Experiment set up
This work aims at demonstrating the ability of state-of-the-art RF classifier to classify time series of Sentinel-2 for large scale land cover mapping in the specific Cameroonian context. Several strategies are considered and compared. On one hand, a LC map is calculated out of the complete year long time series for 2018. On the other hand, the year is split into epochs (months), and one map is computed for each epoch. The quality of these twelve monthly classifications is then assessed to identify whether some periods are more prone to deliver good results. At the end, these monthly classifications can be merged into a unique map. One interest of such per month strategy compared to year-long time series analysis states in its ability to handle smaller amount of data for each classification.

Quality assessment
Results are all assessed quantitatively and qualitatively. 3.5.1 Qualitative assessment A qualitative and visual assessment of obtained results is done. Indeed, it may sometimes be useful to assess the quality of the classifications locally (here in a given yellow squared area of the study area ( Figure 5)). For this aim, a classification indicating the confidence of the RF classifier on the decision for each pixel of the classification can be provided. Finally, a visual evaluation in order to highlight anomalous characteristics will also be presented. 3.5.2 Quantitative evaluation Classification results are compared to the reference data (section 2.2.2). Then, several classic metrics are derived from the confusion matrix : Overall Accuracy (OA), Kappa coefficient (K) and F-Score averaged (Cohen, 1960), (Fleiss, Cohen, 1973) for global assessment, as well as F-Score, producer's and user's accuracies for per class analysis.

RESULTS AND DISCUSSION
The different classifications described in Section 3 were calculated over the study area for the year 2018 and evaluated as explained in Section 3.5. A visual assessment of a yellow squared particular area displayed was also proposed to highlight some phenomena which can not be detected with the metrics. Figure 4 presents the mean F-Score and Kappa coefficients as global metrics as well as the per-class F-Score, recall and precision quality metrics for the two classification modes (monthly and yearly) and for the fusion scheme. These metrics were calculated over one run using the same training and validation sets (of 1,051,919 pixels each). Kappa coefficients are very good with values above 0.81 (0.90-0.99 for monthly classifications, 0.99 for the yearly one and 0.97 for the fusion). Table 15 shows the confusion matrix for the fusion of monthly classifications. It can be highlighted that the Overall Accuracy and the Kappa coefficient values are 98.10% and 0.97, respectively. One can observe that the May and yearly classifications yield the same better results, followed by the September and November ones, than the other monthly classifications and the fusion of the twelve monthly classifications for both metrics. The fusion classification results are also better than the January, July, October and December ones, which validates the necessity of the yearly synoptic view of Sentinel time series.

Quantitative evaluation
When considering per-class metrics, the F-Score values are very good for the classes Tree Savannah (TS), Shrub Savannah (SS) and Crop Area without Trees (CA) for all the classifications modes. The results are just good for the class Crop Area with Trees (CAT) for January, July, October and December classifications. The results are moderate with the Floodable Crop Area (FCA) for October and December, respectively, mainly due to the limited training samples of these two last vegetation classes (see Figure 3). It can also be shown from Figure 4 that the improvement for these minority classes (CAT and FCA) mostly states in the precision value, i.e., underdetection is reduced.
Tables 2, 5, 7, 11, and 13 show the confusion matrices for the January, April, June, October and December classifications, respectively. They allow a detailed analysis of the errors for monthly classifications. It can be highlighted that:   Table 14 shows the confusion matrix for the full year 2018 classification. Several aspects of this improvement can be highlighted: • the confusion between Crop Area with Trees (CAT) and Crop Area without Trees (CA) is considerably reduced to only 1.18%; • the confusion between Floodable Crop Area (FCA) and Crop Area without Trees (CA) decreases drastically to 0.18%; • the confusion of Shrub Savannah (SS), Crop Area with Trees (CAT) and Floodable Crop Area (FCA) with Tree Savannah (TS) decreases to 0.20%, 1.39% and 0.40% respectively.
In parallel, Table 15 shows the confusion matrix for the fusion of the twelve monthly classifications. One can note that: • the confusion of Crop Area with Trees (CAT) and Floodable Crop Area (FCA) with Crop Area without Trees (CA) decreases to 0.42% and 0.20% respectively; • the confusion of Shrub Savannah (SS) and Crop Area with Trees (CAT) with Tree Savannah (TS) decreases to 0.25% and 0.42% respectively; • no confusion between Floodable Crop Area (FCA) and Tree Savannah (TS) in both directions.

Classification confidence
The RF classifier associates a confidence (unsupervised margin) calculated out of the distribution of the labels predicted by the trees (Frenay, Verleysen, 2014) of the forest. Figures 5 show the predicted yearly LC map and the yellow squared areas with its corresponding confidence map. One can observe that the confidence is influenced by the proportion occupied by the vegetation class samples in the reference data (see Figure 3). For example, the Crop Area without Trees (CA) class in that specific yellow zone is easily recognised, with high confidence. Besides, the Tree Savannah (TS) class generally exhibits high confidence, which is encouraging for a use of such information.
It is useful to study how confidence values are related to correct and incorrect classifications. For that, the distribution of confidence values was calculated for only the learning sample categories for yearly, (January-April 2018), (May-August 2018) and (September-December 2018) classifications, respectively. One observes that all the learning samples were correctly classified so there was not incorrectly classified learning samples for each of these classifications. However, these correctly classified samples have different behaviours for confidence values lower than 40% and higher than 60% and the same behaviour between 50% and 60%. We have more pixels (<500) which have confidence values lower than 40% in the January, July, October and December classifications compare to February, March, April, June, August, September and November ones, where we have less and less pixels (<200) having these confidence values respectively. May and yearly classifications do not have pixels having confidence values lower or equal to 40%. The number of pixels with confidence values between 60% and 100% is decreasing for all classifications. We have much more pixels with low confidences values in January, July, October and December classifications compare to other months with the exception of May with the yearly classifications where most of the pixels have high confidence values. These results are consistent with the quantitative evaluation demonstrating that the May and yearly classifications exhibit better results than the other monthly ones. October also exhibits the worst accuracy among the per month results. This difficulty to classify the October month could be due to the presence of clouds and cloud shadows associated with bad landscape in our study area. Thus, introducing confidence to weight the majority vote would probably improve fusion results.

Visual analysis
Quantitative assessment is not able to reveal alone several kinds of errors present in the classifications. Conversely, a visual analysis makes it possible to identify other phenomena which are analysed in this section. Indeed, important parts of the study area are not labeled in the reference databases, and thus not included in the quantitative evaluation.
For the considered nomenclature, the study area is globally not difficult to classify using 10 m

CONCLUSION
This paper presented multiple experiments for the fully automatic production of vegetation maps at regional scale out of Sentinel-2 images time series using the ι 2 supervised classification chain. This approach uses all available Sentinel-2 image ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume V-3-2022 XXIV ISPRS Congress (2022 edition), 6-11 June 2022, Nice, France data (no scene selection in terms of appropriate dates or cloud cover) and uses existing geodatabases as reference training and validation data for a supervised classification process robust to errors in the reference data (e.g., out of date databases). The process is efficient (enabling a fast delivery of the classifications after the acquisition of the satellite images) and not . To conclude, the use of fusion of monthly classifications is a good trade-off between accuracy et computational time. This work was conducted in the context of defining a joint use of airborne and free satellite images to update and enrich existing LC data. The advantage of the proposed approach resides in the regular availability of new images provided by Sentinel-2 to reduce the uncertainty in the detection of study areas where airborne acquisitions and advanced analysis are mandatory. The considered legend was rather simple, including Tree Savannah, Shrub Savannah, Crop Area without Trees, Crop Area with Trees, and Floodable Crop Area vegetation classes. Despite the obtained accuracy is higher than most comparable (in terms of areas covered and number of classes) state-of-the-art approaches, Crop Area with Trees and Floodable Crop Area (corresponding to thin objects) suffered from poor recognition and advocates for the integration of VHR imagery. On the opposite, better results (90%) were obtained with the Crop Area without Trees class.