A COMPARATIVE STUDY OF MACHINE LEARNING CLASSIFIERS FOR CROP TYPE MAPPING USING VEGETATION INDICES

ABSTRACT: Timely and accurate mapping of crops is crucial for agricultural management, policy-making, and food security. Because crop calendars differ among crops, crops can be classified by examining remote sensing Vegetation Indices (VIs) over the growing season. This study developed a VI-based mapping approach for identifying crop types from phenological and spectral metrics derived from Sentinel-2 images. We used six spectral VIs (ARVI, CVI, EVI, LAI, GLI, and NDVI) with three supervised machine learning methods, namely Random Forest (RF), GBoost (GB), and K-Nearest Neighbors (KNN), for crop mapping. Field data covering wheat, barley, canola, vegetables, and a bare land class were collected as the training and testing data sets. Evaluation on the test samples showed high overall accuracy (OA) and satisfactory class accuracies for the most dominant crop types across different fields, despite the variability of planting and harvesting dates. Among the VIs used for crop mapping, the Atmospherically Resistant Vegetation Index (ARVI) achieved the best results with all three classification methods. The overall accuracy of the RF, GB, and KNN models with the ARVI index was 95%, 88%, and 90%, respectively.


INTRODUCTION
According to a statistical report published by the Food and Agriculture Organization (FAO) of the United Nations, the cereal harvest in Asia in 2020 was 1448 million tons. Iran produces 1 percent of the cereal harvested in Asia (FAO, 2020). The most widespread crops in Iran, especially in provinces with limited water resources such as Qazvin, are wheat, barley, and canola. In addition to these crops, vegetables such as green peas, cabbage, and carrots are cultivated in small fields (Ahmadi et al., 2020). A crop type map is a crucial instrument for yield estimation. With the growing availability of free satellite imagery, the material and time costs of crop monitoring, health assessment, yield estimation, and cultivated-area mapping have been substantially reduced. One of the challenges facing food-management organizations worldwide, and especially in Iran, is estimating the area under cultivation of each crop in order to satisfy the annual needs of communities.
Many studies have therefore addressed these issues. For example, Amani et al. (2020) used the Google Earth Engine (GEE) cloud platform to handle the large volume of data, applied a neural network algorithm to Sentinel-1 and Sentinel-2 images over part of Canada, and achieved 77% overall accuracy (OA). Moreover, Yan et al. (2021) proposed a method based on discrete grids with machine learning (ML) to integrate GaoFen-1 and Sentinel-2 imagery. The proposed method obtained 86% and 88% accuracy for crop type classification in Northeast China. A comprehensive study compared different ML and deep learning algorithms with a dual attention deep neural network for crop mapping in Iran; the outcomes demonstrated a best OA of 98.54% for the Aq-Qala agricultural area (Seydi et al., 2022).
Any data that correlate with vegetation growth can serve as a source for monitoring the seasonal growth cycle of crops. Optical satellite sensors with spectral bands in the red edge, such as the Landsat series, Sentinel-2, and MODIS, allow us to study the crop growth cycle. Images acquired during the growing season have been used to generate various vegetation indices, which enhance crop type mapping results. These spectral indices can be supplied as input to ML algorithms (Rostami et al., 2022a). For example, Liu et al. (2018) used eight coarse-resolution MODIS images to compute EVI over the growing season in Henan province, China, and achieved an OA of 84% with a global threshold model. ML and deep learning algorithms can build a model that captures the relation between an output and different input variables (Aghdami-Nia et al., 2022; Rostami et al., 2022b). Nevertheless, ML algorithms have parameters that must be optimized (Ansari and Akhoondzadeh, 2020; Ranjbar et al., 2021). Methods for optimizing ML parameters include genetic algorithms, empirical formulas (Arabi et al., 2022), and grid search. Grid search can explore a wide search space and yield satisfactory results (Hosseini et al., 2012).
As mentioned above, most studies on cropland mapping were conducted over traditional spectral index time series and used dispersed vegetation classes. This study classified the essential crops in the study area and merged the different vegetable classes, which are similar in growing season and spectral reflectance, into one class in order to obtain the most accurate crop type results. The primary purpose of this study is to map crop types in the Qazvin plain region using different VIs obtained from Sentinel-2 images. The proposed method includes the following parts: (1) finding the most efficient spectral indicators for crop mapping; (2) investigating the 8-month spectral indices of each crop for early detection of crops; (3) comparing several ML methods (i.e., Random Forest, GBoost, and KNN) and optimizing their parameters with grid search; and (4) providing an accurate multiclass map of crop types and evaluating it with in situ data.

Study area
The Qazvin plain comprises fertile fields located 150 km west of Tehran, between latitudes 36° 00' 00" N and 36° 00' 20" N and longitudes 49° 40' 00" E and 50° 35' 35" E (Figure 1). The study area has arid and semi-arid climates with an average annual precipitation of 234.1 mm. Hence, it consists mostly of irrigated agricultural systems. The irrigation network of the Qazvin plain covers 43,000 acres and supports wheat, barley, colza (canola), and alfalfa, and, in smaller areas, tomatoes, cucumbers, watermelons, eggplants, and onions, which are grouped as the agricultural-vegetable class in the classification. This study compares different classification approaches for winter crop mapping across the Qazvin plain.

Field Data
We used a handheld global positioning system (GPS) receiver with a positional accuracy of <5 m to record the locations of the samples. The crop types of 105 fields in the study area were recorded. The surveyed fields fall into five classes: wheat, barley, rapeseed (canola), agricultural-vegetable, and bare fields. The collected data were then divided into two parts: training data (70%) and testing data (30%). We used only the training data to train the algorithms and the test data for accuracy assessment. The number of pixels per class used for training the ML algorithms is given in Table 1.
Table 1. The number of training and test pixels collected for each class using in situ data.
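The 70/30 partition can be illustrated with a short sketch. It assumes a class-stratified split at sample level; the text does not state whether the split was made per field or per pixel, and the function name, seed, and sample counts below are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(samples, train_fraction=0.7, seed=0):
    """Split (sample_id, class_label) pairs into train/test lists, class by class."""
    by_class = defaultdict(list)
    for sample_id, label in samples:
        by_class[label].append((sample_id, label))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        n_train = round(train_fraction * len(members))
        train.extend(members[:n_train])
        test.extend(members[n_train:])
    return train, test

# Hypothetical example: 10 samples per class (not the counts of Table 1).
classes = ["wheat", "barley", "canola", "vegetable", "bare"]
samples = [(f"{c}_{i}", c) for c in classes for i in range(10)]
train, test = stratified_split(samples)
```

Stratifying by class keeps every crop represented in both parts, which matters when some classes have few surveyed fields.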

Satellite Data
This study employed Sentinel-2 optical satellite images to generate spectral indicators suitable for studying crop types. Sentinel-2 is a European multispectral satellite mission developed by the European Space Agency within the European Commission's Copernicus initiative. The launch of the B-series satellite reduced the revisit time to 3-5 days, which is ideal for precise monitoring of agricultural lands. We considered the winter crop year from the beginning of October 2021 to the end of May 2022, which is the harvest time for winter crops. Accordingly, eight images with less than 5% cloud cover were selected at intervals of roughly one month. Furthermore, we computed six VIs for the eight selected images: the Atmospherically Resistant Vegetation Index (ARVI), Chlorophyll Vegetation Index (CVI), Enhanced Vegetation Index (EVI), Leaf Area Index (LAI), Green Leaf Index (GLI), and Normalized Difference Vegetation Index (NDVI), which are described in Table 2.
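As an illustration, the band-ratio indices can be computed per pixel as sketched below. The formulas are the commonly published formulations (with the ARVI weighting factor set to 1) and are an assumption here; they may differ in detail from those listed in Table 2. LAI is omitted because it is a biophysical variable rather than a simple band ratio. The function names and reflectance values are hypothetical.

```python
def safe_ratio(num, den):
    """Return num/den, or 0.0 when the denominator is zero (e.g. no-data pixels)."""
    return num / den if den != 0 else 0.0

def vegetation_indices(blue, green, red, nir):
    """Compute five spectral VIs from surface reflectance values in [0, 1]."""
    rb = 2.0 * red - blue  # red-blue term used by ARVI (weighting factor = 1)
    return {
        "NDVI": safe_ratio(nir - red, nir + red),
        "EVI": safe_ratio(2.5 * (nir - red), nir + 6.0 * red - 7.5 * blue + 1.0),
        "ARVI": safe_ratio(nir - rb, nir + rb),
        "GLI": safe_ratio(2.0 * green - red - blue, 2.0 * green + red + blue),
        "CVI": safe_ratio(nir * red, green * green),
    }

# Hypothetical reflectances of a dense green canopy (Sentinel-2 B2, B3, B4, B8).
vi = vegetation_indices(blue=0.03, green=0.06, red=0.04, nir=0.45)
```

Repeating this for each of the eight acquisition dates gives an eight-value profile per pixel and per index.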

METHODOLOGY
Crop-type mapping is possible by investigating the phenology of each parcel of agricultural land. For this purpose, we compared five vegetation-related spectral indices and a biophysical variable, the leaf area index (LAI). The 8-month images of these indicators were produced for the whole study area and used as the input dataset. Several ML classifiers, namely RF, GBoost, and KNN, were used for the classification task.
The optimum parameter values of each ML algorithm were determined by grid search, and the selected parameters were used in all trained models. The crop type mapping framework is illustrated in Figure 2. The output maps of each model are presented in the results section.
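The idea of grid search can be sketched as an exhaustive loop over all parameter combinations. The parameter names, candidate values, and scoring function below are hypothetical stand-ins; in practice the score would be the validation accuracy of a classifier trained with the given parameters.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Evaluate score_fn on every parameter combination and return the best one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical stand-in for "train the classifier and return validation accuracy".
def toy_validation_accuracy(n_trees, max_depth):
    return 0.9 - abs(n_trees - 200) / 10000 - abs(max_depth - 8) / 100

grid = {"n_trees": [50, 100, 200, 400], "max_depth": [4, 8, 16]}
best_params, best_score = grid_search(grid, toy_validation_accuracy)
```

The cost grows with the product of the candidate list lengths (here 4 x 3 = 12 evaluations), so the grid is usually kept coarse.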

Classification methods
Mapping is one of the most important motivations for acquiring satellite imagery. Classification, which can be accomplished by several methods, is the first step in producing a map. In recent years, various ML models have been developed to classify satellite images using massive, highly complex data. As discussed, three state-of-the-art ML methods for classifying agricultural fields were selected to compare mapping results in this study; they are explored in the following subsections.

Random Forest
RF is a bagging method that combines several sub-classifiers, each of which classifies a subset of data selected randomly from the input data (Scornet et al., 2015). The sub-classifiers in the RF algorithm are decision trees. A vote is taken over the predictions of all the trees, and the class with the most votes is returned as the prediction (Ghorbanian et al., 2020).
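The voting step alone can be sketched as follows, assuming the per-tree predictions are already available. The function names and labels are hypothetical, and tree training and bootstrap sampling are not shown.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the class predicted by the most trees (ties are broken arbitrarily)."""
    return Counter(tree_predictions).most_common(1)[0][0]

def forest_predict(per_tree_predictions):
    """per_tree_predictions[t][i] is the label that tree t assigns to pixel i."""
    n_pixels = len(per_tree_predictions[0])
    return [
        majority_vote([tree[i] for tree in per_tree_predictions])
        for i in range(n_pixels)
    ]

# Hypothetical predictions of three trees for three pixels.
per_tree = [
    ["wheat", "barley", "bare"],
    ["wheat", "wheat", "bare"],
    ["barley", "barley", "bare"],
]
labels = forest_predict(per_tree)
```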

GBoost
Boosting algorithms use several learners to obtain sequential hypotheses, each focusing on the data that were difficult or incorrectly predicted in the previous step. The GBoost algorithm, based on the boosting methods developed by Friedman, can also be used in multiclass problems (Friedman, n.d., 2001). In each training stage, a new weak learner is added and then held fixed in order to reduce the loss function. Generally, GBoost involves three elements: (1) a loss function to be optimized; (2) a weak learner (decision trees are used in gradient boosting); and (3) an additive model, in which trees are added one at a time and the existing trees in the model are not changed. A gradient descent procedure minimizes the loss when adding trees (Zarei et al., 2021).
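The three elements can be illustrated with a deliberately simplified sketch: squared loss, one input feature, and single-split trees (stumps) as weak learners. This is not the multiclass classifier used in the study, which would need a classification loss. All names and values are hypothetical, and the input must contain at least two distinct feature values.

```python
def fit_stump(x, residuals):
    """Find the threshold split of a 1-D feature that best fits the residuals."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # drop the maximum so both sides are non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum(
            (r - right_mean) ** 2 for r in right
        )
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean

def gradient_boost(x, y, n_stages=20, learning_rate=0.5):
    """Additive model for squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    stumps = []

    def predict(xi):
        return base + learning_rate * sum(s(xi) for s in stumps)

    for _ in range(n_stages):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [yi - predict(xi) for xi, yi in zip(x, y)]
        stumps.append(fit_stump(x, residuals))  # earlier stumps stay unchanged
    return predict

# Hypothetical 1-D data with a step between x = 3 and x = 4.
model = gradient_boost([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
```

Each stage shrinks the remaining residual by the learning rate, so the prediction approaches the targets gradually rather than in one step.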

K-Nearest Neighbors
KNN is a supervised ML method that can solve regression and classification problems (Peterson, 2009). In classification, the number of neighbors, k, must be supplied to the algorithm. For each new sample, the algorithm calculates the distance, commonly the Euclidean distance, between the features of the new sample and those of every training sample. The class that occurs most often among the k nearest training samples is returned as the prediction. Different versions of this algorithm use different methods of calculating distance and similarity.
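A minimal sketch of the standard k-nearest-neighbor rule with Euclidean distance is given below. The feature vectors are hypothetical three-date VI profiles, not data from this study, and the function name is illustrative.

```python
import math
from collections import Counter

def knn_predict(train_features, train_labels, query, k=3):
    """Label a query vector by majority vote among its k nearest training samples."""
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical training profiles (one VI value per acquisition date).
train_features = [
    (0.20, 0.50, 0.80),
    (0.25, 0.55, 0.75),
    (0.10, 0.10, 0.10),
    (0.12, 0.08, 0.11),
    (0.30, 0.60, 0.30),
]
train_labels = ["wheat", "wheat", "bare", "bare", "canola"]
prediction = knn_predict(train_features, train_labels, (0.22, 0.50, 0.78))
```

Because each pixel is labeled independently from its own neighbors in feature space, isolated misclassified pixels can appear in the output map.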

EXPERIMENT AND RESULT
Because in-season vegetation changes affect the spectral bands, VIs computed from various band-ratio combinations over the crop growing season are a suitable basis for classifying in-season crops. The most critical point in this study is to select the VI that yields the highest classification accuracy in all scenarios.

Crop mapping results
The results of crop mapping based on the selected VIs and ML algorithms are illustrated in Figures 3 and 4. In addition, the in-situ data collected in the study area are provided as a ground truth map for comparative analysis. The results showed that the map produced by RF delineated land boundaries more precisely than the other ML methods. The KNN classifier, by contrast, could not delineate the different classes with a high level of accuracy, and the map it produced contained salt-and-pepper errors. Based on visual analysis, ARVI generally provided the best results among the VIs. The leading cause of the superiority of ARVI is the atmospheric correction built into this index, which resulted in better discrimination of the various crop types.

Within class comparison
As is clear from the ground truth data and the generated maps (Figures 3 and 4), barley and wheat are the dominant crops cultivated in the region. Hence, distinguishing these two crops is crucial in generating crop-type maps for the study area. The red boxes in Figures 3 and 4 mark a specific area containing adjacent barley and wheat fields. RF and KNN offer the most accurate results in distinguishing barley from wheat, even with GLI, which shows the lowest mapping accuracy (Table 3). All maps generated by GB omit all or part of the barley fields, especially with the GLI input.

Assessment of the mapping accuracy
The statistical accuracy assessment of the crop mapping using different accuracy measures is summarised in Table 3. The effect of the different indices on crop classification was also investigated.
As is clear from Table 3, the lowest accuracy was obtained with GLI (GBoost-OA = 80%, RF-OA = 81%, and KNN-OA = 80%), and the highest accuracy for RF and KNN was obtained with ARVI (RF-OA = 95% and KNN-OA = 90%). Three indices (NDVI, EVI, and LAI), which rely on bands 4 and 8 of Sentinel-2, showed satisfactory results, although slight differences remain among them. Overall, all ML classifiers obtained an OA of 80% or higher with every VI. In particular, the RF classifier outperformed the other methods in the statistical assessment. Moreover, consistent with the visual interpretation, ARVI provided the best performance among the VIs.

CONCLUSION
Accurate crop mapping is essential for food security and land use management. However, it remains a challenging task in remote sensing because of the complicated phenological features of croplands. This study compared six spectral vegetation indices for mapping the crop types of the Qazvin plain in Iran and evaluated the results against in-situ data. Moreover, we investigated the efficiency of ML methods for mapping the four most common crop types in the study area. The visual and statistical analyses indicated that all VIs give satisfactory mapping results, with an OA of 80% or higher in every ML model. Furthermore, ARVI raised the OA to 90% or higher with the RF and KNN classifiers. Salt-and-pepper noise is quite evident in the crop maps produced by KNN, although its classification accuracy is comparable with that of RF. Future studies can improve the performance of ML methods by applying spatial filters for noise reduction and morphological post-processing for more reliable boundary extraction in croplands.

Figure 1. Location of the Qazvin plain in Iran.

Figure 3. Comparison of the crop mapping results of three algorithms (RF, GB, KNN) with CVI, NDVI, and EVI inputs. The red boxes indicate the same locations for further visual analysis. a) Collected in-situ data as ground truth, b) crop maps produced using ML methods and VIs.

Figure 4. Comparison of the crop mapping results of three algorithms (RF, GB, KNN) with ARVI, LAI, and GLI inputs. The red boxes indicate the same locations for further visual analysis. a) Collected in-situ data as ground truth, b) crop maps produced using ML methods and VIs.

Table 3. Accuracy assessment of different ML methods for crop mapping with various inputs. The bold values show the highest accuracies (OA: overall accuracy, KC: kappa coefficient).
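For reference, the two measures reported in Table 3 can be computed from a confusion matrix as sketched below, using the standard definitions of overall accuracy and Cohen's kappa. The function name and the two-class matrix are hypothetical and are not results from this study.

```python
def overall_accuracy_and_kappa(confusion):
    """confusion[i][j] = number of test pixels of true class i predicted as class j."""
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa

# Hypothetical two-class confusion matrix (rows: reference, columns: prediction).
oa, kc = overall_accuracy_and_kappa([[45, 5], [10, 40]])
```

Kappa discounts the agreement expected by chance, so it is lower than OA whenever the classifier is imperfect.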