A SPECTRAL-SPATIAL AUGMENTED ACTIVE LEARNING METHOD FOR HYPERSPECTRAL IMAGE CLASSIFICATION

ABSTRACT

In this paper, a new classification technique for hyperspectral images (HSIs) based on augmented active learning (AL) is introduced. The proposed method consists of two main steps. First, a 2-D non-subsampled shearlet transform (NSST) is applied to each spectral band of the HSI to extract the spatial features, after which the kernel minimum noise fraction (KMNF) is used to reduce the spectral dimension. Second, the classification task is performed using an augmented active learning technique. For this purpose, an iterative process is considered. At each iteration, discriminative sample selection and augmentation are used to create the training set. Then, the support vector machine (SVM) is iteratively applied to the training set. In the proposed method, the most informative samples are selected by a new query function that combines a posterior probability-based uncertainty criterion and an angle-based diversity criterion. The augmentation strategy during the training process is chosen by the two-sample Kolmogorov-Smirnov test, and the existing outliers are removed by k-means clustering. Finally, the proposed algorithm is applied to real datasets and compared with three state-of-the-art AL algorithms. The obtained results show that the proposed method significantly increases accuracy by considering the most informative samples.


INTRODUCTION
Recently, hyperspectral image classification, especially with supervised approaches, has gained particular attention in many practical applications such as agriculture, mineralogy, and environmental studies (Camps-Valls et al., 2014). The accuracy of supervised techniques is highly dependent on the quality of the annotated dataset provided by the user. Preparing enough labeled samples is a time-consuming and expensive process (He et al., 2017).
Active learning (AL) approaches can significantly increase classification accuracy and decrease the labeling effort by detecting the most informative pixels during training. In fact, AL selects the most informative unlabeled samples from a data pool to refine the learning performance (Persello and Bruzzone, 2014). In recent years, AL has been extensively studied in the field of hyperspectral image classification. At first, a semi-supervised multinomial logistic regression model combined with an entropy (EP)-based active selection strategy (Li et al., 2010) was presented. Then, AL strategies combined with a Bayesian classification method and the loopy belief propagation technique (Li et al., 2011b, Li et al., 2011a) were investigated. After that, an AL framework based on the Markov random field (MRF) (Sun et al., 2015) was introduced. In addition, some works based on the combination of the AL strategy and deep learning have been studied for HSI classification (Li, 2015, Liu et al., 2016, Haut et al., 2018). Particularly, a technique that integrates a multiclass level uncertainty (MCLU) active learning criterion with a stacked autoencoder (SAE)-based neural network (Li, 2015) was designed. In (Liu et al., 2016), a strategy to join the restricted Boltzmann machine (RBM) with a weighted incremental dictionary learning criterion was proposed. A method that utilizes six AL sampling criteria, such as maximum EP, breaking ties (BT), random acquisition, and mutual information (MI), with a Bayesian convolutional neural network (BCNN) was presented (Haut et al., 2018). In (Paoletti et al., 2020), the performance of capsule networks (CapsNets) was enhanced, leading to better HSI classification results, by utilizing a new AL method based on the BT criterion. Additionally, a decoupled network with an active learning strategy (DCN-AL) was introduced (Bai et al., 2020). This technique considers both intra-class and inter-class variations and extracts features more efficiently.
Based on the number of informative samples that are selected at each iteration, two kinds of strategies for AL algorithms are considered: the single mode and the batch mode. At each iteration, the single mode selects the single most informative sample, while the batch mode selects a batch of the most informative ones. The batch mode strategy usually achieves higher classification accuracy (Tuia et al., 2009).
Recently, two main criteria called uncertainty and diversity have been introduced. They are integrated to select the most informative samples (Patra and Bruzzone, 2011).
The uncertainty criterion aims at selecting a batch of unlabeled samples that have the lowest classification confidence. The uncertainty-based batch mode approaches are divided into three heuristic categories (Tuia et al., 2009). The first category comprises posterior probability-based techniques, such as best versus second best (BvSB) (Li and Zhang, 2016), BT (Ahmad et al., 2019), and the Kullback-Leibler (KL)-Max strategy (Jun and Ghosh, 2008, Rajan et al., 2008), which calculate the class confidence to evaluate the classification uncertainty of each unlabeled sample (Yu et al., 2015). A well-known AL method in this category is MPM-LBP-BT (Li et al., 2013), which considers both spectral and spatial information using maximum a posteriori marginal (MPM) and loopy belief propagation (LBP) and then utilizes BT to select informative samples. The second category comprises large margin-based approaches, for example margin sampling (MS) (Tuia et al., 2009) and MCLU (Demir et al., 2011), which rely on SVM specificities and measure the distance of samples to the hyperplane to determine the uncertainty values. The third category comprises committee-based approaches, such as maximum disagreement (MD)-based criteria (Zhou et al., 2016), which calculate sample uncertainty by considering the disagreement among the committee members.
Recently, different diversity approaches based on the closest support vector (Wang et al., 2017), angle (Demir et al., 2011), and clustering (Demir et al., 2011) have been considered to reduce the correlation among the uncertain samples. One method that utilizes both MCLU uncertainty and angle-based diversity (ABD) to select diverse informative samples is MCLU-ABD (Demir et al., 2011).
In this paper, a new multi-criteria AL method called spectral-spatial augmented AL (SSAAL) is proposed. It includes two main steps:
1) The spectral-spatial features are extracted by applying the 2-D non-subsampled shearlet transform (NSST) (Lim, 2010) and the kernel minimum noise fraction (KMNF). The 2D-NSST is applied to each spectral band of the HSI to extract the spatial features. After that, KMNF reduces the spectral dimension.
2) A new multi-criteria batch mode augmented AL algorithm is applied to the selected spectral-spatial features in order to determine the most informative samples. These samples are selected by a query function that combines a new posterior probability-based uncertainty criterion and an angle-based diversity criterion. Data augmentation (DA) is also considered to increase the number of informative samples for classes with insufficient labels. DA is iteratively applied to the selected samples during the training process and significantly increases the supervised classification accuracy. The outlier samples are identified by k-means clustering and removed.
The proposed method is compared with well-known state-of-the-art techniques. The obtained results demonstrate the superior classification performance of the proposed method.
The remainder of this paper is organized as follows: section 2 presents the proposed algorithm. The experimental results are discussed in section 3. Concluding remarks are provided in section 4.

Spectral-Spatial Feature Extraction Using NSST and KMNF
The NSST contains two kinds of non-subsampled filter banks that are iteratively applied to extract spatial features: pyramid and shearing filters. Pyramid filters divide the image into approximation and detail images, which are the same size as the original image, for a predefined level of decomposition. Shearing filters decompose the detail images into a number of directional sub-bands. Also, at each iteration of the process, the obtained low-frequency sub-band is again divided into lower-scale high-frequency and low-frequency sub-bands (Soleimanzadeh et al., 2018).
In this paper, 2D-NSST is applied to each band of the HSI $X \in \mathbb{R}^{I_1 \times I_2 \times I_3}$, which has length $I_1$, width $I_2$, and $I_3$ bands. Shearlet transformation of the $b$th band of the image is defined as:

After that, in order to reduce the spectral dimension, one of the most popular nonlinear dimensionality reduction techniques, KMNF (Gao et al., 2017), is used. It consists of two consecutive principal component analysis (PCA) transformations. The first estimates the covariance matrix of the noise in the data, based on the strong relationship between adjacent pixels, in order to decorrelate and rescale the noise in the data. The second is a standard PCA transformation, which is applied to the resulting matrix and arranges the bands with respect to the signal-to-noise ratio (Priyadarshini et al., 2019). KMNF is applied to $X_{NSST}$ as follows:

In the proposed method, the important shearlet coefficients that contain 99% of the energy are preserved and the rest of them are discarded.
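The 99% energy rule can be illustrated with a short sketch. It assumes that "energy" means the sum of squared coefficient magnitudes and that coefficients are retained in order of decreasing magnitude; the function name `keep_energy` and the toy values are hypothetical and are not taken from the paper.

```python
def keep_energy(coeffs, fraction=0.99):
    """Return the indices of the largest-magnitude coefficients whose
    squared magnitudes sum to at least `fraction` of the total energy.
    (Assumed definition of energy: sum of squared coefficients.)"""
    energies = [c * c for c in coeffs]
    total = sum(energies)
    if total == 0:
        return []
    # Visit coefficients from most to least energetic.
    order = sorted(range(len(coeffs)), key=lambda i: energies[i], reverse=True)
    kept, running = [], 0.0
    for i in order:
        kept.append(i)
        running += energies[i]
        if running >= fraction * total:
            break
    return sorted(kept)


# Toy coefficient vector (hypothetical values).
coeffs = [10.0, -0.5, 3.0, 0.1, -0.2]
kept = keep_energy(coeffs)
```

In this toy case the two largest coefficients already carry more than 99% of the energy, so only their indices are retained. The same rule could be applied per sub-band rather than per coefficient by passing sub-band energies instead.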

Augmented AL Algorithm
The spectral-spatial features obtained in the previous step are used as the input of the augmented AL algorithm. These features are randomly divided into three sets: the initial training set L, the unlabeled pool U, and the testing set T. During the augmented AL process, at each iteration of the algorithm, a batch of the m most informative samples in U is selected by a query function and added to the augmented initial training set LA, and the classifier is retrained until the maximum number of iterations is reached. In the following, the algorithm is explained in more detail.

Uncertainty Criterion
A one-against-all SVM is used both to determine the uncertainty criterion and to perform the supervised classification (Melgani and Bruzzone, 2004). The class membership probability-based uncertainty (CMPU) criterion is applied to the unlabeled samples in U to identify the samples whose class membership is ambiguous. The uncertainty measure of each unlabeled sample $x \in U$ is related to its classification confidence. For this aim, $c$ binary SVM classifiers corresponding to the $c$ information classes are iteratively applied to the labeled samples. After training, based on the $c$ obtained SVM decision hyperplanes, $c$ functional Euclidean distances $f_q(x)$, $q = 1, 2, \ldots, c$, from each sample to the $c$ hyperplanes are calculated. By utilizing Platt scaling (Lin et al., 2007) and fitting a sigmoid model to these functional distances, the probability of class membership for each $x$ is calculated. The class probabilities produced in this way can be denoted as:

where $P(y' \mid x)$ defines the best class membership probability of sample $x$ and $P(y'' \mid x)$ denotes its second best class membership probability. They are the class probabilities of the two most confused classes $y'$ and $y''$. $a$ and $b$ are two scalar parameters.
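As an illustration of this step, the sketch below maps decision values to probabilities with the standard Platt sigmoid, $P = 1 / (1 + \exp(a f_q(x) + b))$, and picks the best and second best classes. The standard form is assumed here because the paper's own expression is not reproduced in the text; the function names, the shared pair (a, b) for all classifiers, and the toy decision values are likewise assumptions.

```python
import math


def platt_probability(f, a, b):
    """Standard Platt sigmoid: maps an SVM decision value f to a probability.
    a and b are the fitted scalar parameters (a is typically negative)."""
    return 1.0 / (1.0 + math.exp(a * f + b))


def best_two(decision_values, a, b):
    """Return (class index, probability) for the best and second best classes.
    Assumes, for simplicity, one shared (a, b) pair for all c classifiers."""
    probs = [platt_probability(f, a, b) for f in decision_values]
    ranked = sorted(range(len(probs)), key=lambda q: probs[q], reverse=True)
    best, second = ranked[0], ranked[1]
    return (best, probs[best]), (second, probs[second])


# Hypothetical decision values f_q(x) of one sample for c = 3 classes.
(best, p_best), (second, p_second) = best_two([1.2, -0.3, 0.9], a=-2.0, b=0.0)
```

In practice each one-against-all classifier would usually have its own fitted pair of parameters; a single pair is used here only to keep the example short.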
By measuring the ratio of these values, the CMPU criterion is obtained. This criterion defines the classification confidence of the samples, and the ones that have the lowest value are selected as the most uncertain samples. CMPU is defined as:

Diversity Criterion
The diversity criterion is used to select the samples with low redundancy among the $n$ uncertain samples. In this paper, ABD in the kernel space (Demir et al., 2011) is used to calculate the diversity of the uncertain samples as follows:

where $K(x_i, x_j)$ is a nonlinear kernel function between two samples $x_i$ and $x_j$.
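The two criteria can be sketched as follows. The CMPU form used here (best probability divided by second best) is an assumption consistent with the description of a ratio whose lowest values mark the most uncertain samples. The kernel cosine, $K(x_i, x_j) / \sqrt{K(x_i, x_i) K(x_j, x_j)}$, is the usual kernel-space angle measure; the RBF kernel, the value of `gamma`, and the function names are illustrative choices.

```python
import math


def cmpu(p_best, p_second):
    """Assumed CMPU form: ratio of the best to the second best class
    probability. Values close to 1 indicate an uncertain sample."""
    return p_best / p_second


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel between two feature vectors (gamma is illustrative)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_cosine(x, y, kernel=rbf_kernel):
    """Cosine of the angle between two samples in the kernel space.
    A larger cosine means a smaller angle, i.e. more redundant samples."""
    return kernel(x, y) / math.sqrt(kernel(x, x) * kernel(y, y))


near = kernel_cosine([0.0, 0.0], [0.1, 0.0])
far = kernel_cosine([0.0, 0.0], [3.0, 0.0])
```

With an RBF kernel, $K(x, x) = 1$, so the cosine reduces to the kernel value itself; the normalization matters for other kernels.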

Query Function
In the proposed method, a weighted query function that integrates the CMPU and ABD criteria is considered for defining a batch of the $m$ ($m < n$) most informative samples.

By exploiting objective function (6), the most informative samples $(x_1, x_2, \ldots, x_m)$ are extracted at each iteration of the SSAAL algorithm. In this function, the parameter $\lambda$ ($0 \le \lambda \le 1$) controls the relative importance of the two terms, and its optimal value is used for each dataset. The details of selecting the optimal $\lambda$ are discussed in section 3.2.
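Since the exact form of objective function (6) is not reproduced in the text, the following is only a sketch of one plausible weighted selection rule, modeled on the greedy scheme used by MCLU-ABD-style methods: at each step, the candidate that minimizes $\lambda \cdot \text{confidence} + (1 - \lambda) \cdot \text{max cosine to the already selected samples}$ is added to the batch. The function name, the assumption that confidence scores are rescaled to [0, 1], and the toy values are hypothetical.

```python
def select_batch(candidates, confidence, similarity, m, lam):
    """Greedily pick m candidates that are both uncertain and diverse.

    candidates: list of hashable sample ids (the n most uncertain samples)
    confidence: dict id -> confidence score in [0, 1], lower = more uncertain
                (assumed to be a rescaled CMPU value)
    similarity: function (id, id) -> kernel cosine in [0, 1]
    lam:        weight between the uncertainty and diversity terms
    """
    if m <= 0 or not candidates:
        return []
    remaining = list(candidates)
    # Start from the single most uncertain candidate.
    first = min(remaining, key=lambda c: confidence[c])
    selected = [first]
    remaining.remove(first)
    while remaining and len(selected) < m:
        def score(c):
            redundancy = max(similarity(c, s) for s in selected)
            return lam * confidence[c] + (1.0 - lam) * redundancy
        nxt = min(remaining, key=score)
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


# Toy example: "b" is uncertain but nearly identical to "a".
conf = {"a": 0.10, "b": 0.15, "c": 0.40}
sim = {frozenset(("a", "b")): 0.95,
       frozenset(("a", "c")): 0.10,
       frozenset(("b", "c")): 0.10}
cosine = lambda x, y: sim[frozenset((x, y))]

balanced = select_batch(["a", "b", "c"], conf, cosine, m=2, lam=0.5)
uncertainty_only = select_batch(["a", "b", "c"], conf, cosine, m=2, lam=1.0)
```

With $\lambda = 0.5$ the redundant sample "b" is skipped in favor of the more distinct "c", whereas $\lambda = 1$ ignores diversity and keeps the two most uncertain samples.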

Data Augmentation (DA)
Transformations such as scaling, flipping, and rotating are used to increase the number of labeled samples when they are insufficient during the training process. In the proposed technique, to reduce the running time and prevent overfitting, a new DA approach is applied to the initial training samples and to the most informative samples that are selected at each iteration by query function (6). For this purpose, the non-selected pixels of the HSI are set to zero and a new hypercube is created. Then, different augmentation methods are applied to the hypercube, and the resulting non-zero pixels are extracted and added to the training samples. In addition, the original and augmented samples should follow the same distribution. Therefore, the two-sample Kolmogorov-Smirnov test (Labadi et al., 2014) is used to select the most effective augmentation methods. It measures the difference between the cumulative distribution functions (CDFs) of the training and augmented sample vectors and uses its maximum absolute value, $d$. It can be defined as follows:

where $x_t$ and $x_a$ are the training and augmented sample vectors with sizes $t$ and $a$, respectively, and $F_1(x_t)$ and $F_2(x_a)$ are their corresponding CDFs.
If $d$ is lower than the critical value $d_{a,t}$ (Knuth, 2014) at the significance level $\alpha = 0.05$, $x_t$ and $x_a$ belong to the same distribution and the augmented samples can be added to the training samples. This critical value can be denoted as:

Some of the augmented samples are outliers. These samples have a high uncertainty, but they can decrease the classification accuracy. Therefore, in the proposed method, k-means clustering is used to reduce the outlier effect (Wu and Prasad, 2016). This method allocates the samples to $k$ clusters and removes the samples that belong to the smallest cluster by considering equation (9):

In this equation, $C = [C_1, C_2, \ldots, C_k]$ is the set of cluster sizes for the $k$ clusters, sorted in ascending order, and the members of the smallest cluster are outliers only if $th < 0.5$.
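The Kolmogorov-Smirnov acceptance check described above can be sketched in a few lines. The statistic is the maximum absolute gap between the two empirical CDFs. For the critical value, the sketch assumes the commonly used large-sample approximation $d_{a,t} = \sqrt{-\tfrac{1}{2}\ln(\alpha/2)} \cdot \sqrt{(a + t)/(a t)}$, which may differ from the expression used in the paper; all function names are hypothetical.

```python
import math


def ks_statistic(x_t, x_a):
    """Maximum absolute difference between the empirical CDFs of two samples."""
    xs, ys = sorted(x_t), sorted(x_a)
    t, a = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < t and j < a:
        v = min(xs[i], ys[j])
        # Advance both CDFs past every value equal to v.
        while i < t and xs[i] <= v:
            i += 1
        while j < a and ys[j] <= v:
            j += 1
        d = max(d, abs(i / t - j / a))
    return d


def ks_critical(t, a, alpha=0.05):
    """Large-sample approximation of the two-sample critical value (assumed)."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha * math.sqrt((a + t) / (a * t))


def same_distribution(x_t, x_a, alpha=0.05):
    """True if the augmented samples pass the test and may be added."""
    return ks_statistic(x_t, x_a) < ks_critical(len(x_t), len(x_a), alpha)
```

An augmentation method whose output fails this check would be skipped for that iteration. For small sample sizes, exact tables are generally more appropriate than the approximation used here.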
The pseudo-code of the SSAAL algorithm is summarized in Algorithm 1.

Datasets
The proposed technique has been applied to three real datasets.

Indian Pines Scene (IP)
It was acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor in 1992. This dataset is composed of 220 spectral bands with wavelengths ranging from 0.4 to 2.5 µm and 145 × 145 pixels with a spatial resolution of 20 m × 20 m. In our experiment, 20 bands affected by noise and water absorption are eliminated, resulting in 200 bands. This dataset contains 10249 available samples belonging to 16 classes (see Table 1). The false color composition of this dataset as well as the ground reference classification map are shown in Fig. 1.
Figure 1. Hyperspectral Indian Pines image and its ground-truth map.

Kennedy Space Center (KSC)
It was collected over the KSC, Florida, USA, in 1996 (see Fig. 2). This dataset covers 512 × 614 pixels with a spatial resolution of 18 m and includes 224 spectral bands in the wavelength range from 0.4 to 2.5 µm. By removing water absorption and low signal-to-noise bands, the number of spectral bands is reduced to 176. This dataset contains a total of 5211 samples belonging to 15 classes (see Table 2).

Pavia University Dataset (PU)
It was acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor over the urban area of the University of Pavia, Italy, in 2003. This dataset consists of 610 × 340 pixels with a spatial resolution of 1.3 m (see Fig. 3). The number of spectral bands is 115, with wavelengths ranging from 0.43 to 0.86 µm. Twelve noisy bands are removed, and the remaining 103 spectral bands are used in the experiment. Table 3 lists the total number of available pixels in each class of the Pavia University dataset.

Experimental Setup
The results of the proposed method are compared with two batch mode state-of-the-art AL algorithms, MCLU-ABD (Demir et al., 2011) and BvSB (Li and Zhang, 2016), and one well-known spectral-spatial AL technique, MPM-LBP-BT (Li et al., 2013).
For a fair comparison, the same NSST-KMNF-based spectral-spatial feature extraction is used for MCLU-ABD, BvSB, and the proposed method. In this experiment, NSST with five levels of decomposition is used. Also, the important shearlet sub-bands that contain 99% of the energy are preserved. For Indian Pines, KSC, and Pavia University, the number of shearing directions at four scales and the number of remaining shearlet sub-bands after applying KMNF are presented in Table 5.
The effect of the sampling weight parameter λ in (6) is evaluated in terms of overall classification accuracy (see Fig. 4). As shown, the optimal value of λ for each dataset is selected from the range λ = [0, 0.1, 0.2, ..., 1]. The optimal values show that, although both the uncertainty and diversity criteria play important roles in the obtained classification accuracy, their relative importance differs from one dataset to another.
The AL classification algorithms were implemented in MATLAB (R2020a) on a computer with a dual-core processor (2.60 GHz), 40 GB of memory, and a 64-bit operating system.
LIBSVM (Chang and Lin, 2011) is adopted to implement the SVM. A one-against-all (OAA) SVM with a radial basis function (RBF) kernel has been used. The SVM hyperparameters C and γ were optimized by applying a grid search with threefold cross-validation. To obtain better results, these parameters were updated once during the AL iterations. The overall accuracy (OA), average accuracy (AA), Kappa coefficient (Kappa), and classification accuracy of each class are calculated to evaluate the classification performance.
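For reference, the evaluation measures named above can be computed from a confusion matrix as in the following sketch. The function name and the toy matrix are hypothetical; the formulas are the standard definitions of OA, AA, and Cohen's Kappa.

```python
def accuracy_metrics(confusion):
    """Compute OA, AA, Kappa, and per-class accuracy.

    confusion[i][j] = number of samples of true class i predicted as class j.
    Assumes every class has at least one sample.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))

    # Overall accuracy: fraction of all samples classified correctly.
    oa = correct / total

    # Average accuracy: mean of the per-class accuracies.
    per_class = [confusion[i][i] / sum(confusion[i]) for i in range(n_classes)]
    aa = sum(per_class) / n_classes

    # Kappa: agreement corrected for the agreement expected by chance.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n_classes))
                  for j in range(n_classes)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    kappa = (oa - expected) / (1.0 - expected)
    return oa, aa, kappa, per_class


# Toy two-class confusion matrix (hypothetical counts).
oa, aa, kappa, per_class = accuracy_metrics([[45, 5], [10, 40]])
```

For this toy matrix, OA and AA are both 0.85 and Kappa is 0.7, which shows how Kappa discounts the 50% agreement expected by chance.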

Experimental Results
Figure 5 depicts the overall classification accuracy for the three datasets. In this experiment, the initial accuracy achieved by SSAAL is higher than that of the other techniques for the Indian Pines and University of Pavia datasets, while on the KSC dataset it attains the best accuracy after three iterations. Tables 6-8 summarize the classification accuracy, overall accuracy, average accuracy, Kappa coefficient, and computation time. The classification maps of all methods are shown in Figs. 6-8. From Tables 6-8, it can be observed that SSAAL achieves the best performance in terms of the AA and OA measures in comparison with the MCLU-ABD (Demir et al., 2011), BvSB (Li and Zhang, 2016), and MPM-LBP-BT (Li et al., 2013) methods. In most of the classes, the classification accuracy of SSAAL is higher than that of the other techniques. The computation time of the SSAAL method is lower than that of MPM-LBP-BT and close to that of the MCLU-ABD and BvSB techniques.
The obtained results show that the combination of NSST and KMNF in the proposed method effectively extracts spectral and spatial features of the HSI with low redundancy. Additionally, the integration of the extracted features, the query function that exploits the most informative and distinct samples, the data augmentation during the training process, and the outlier elimination leads to accurate and consistent classification results.

CONCLUSIONS AND FUTURE WORK
In this paper, a new augmented AL algorithm for spectral-spatial HSI classification with limited labeled samples has been presented. The proposed method extracts the spatial features using NSST and reduces the dimensionality of the spectral features using KMNF. After that, a multi-criteria batch mode AL method based on a new query function, which integrates the CMPU uncertainty and ABD diversity criteria, is applied to the extracted spatial-spectral features to select the most informative ones. The determined samples are iteratively trained by SVM. It is noteworthy that the selected training samples are augmented using the methods determined by the two-sample Kolmogorov-Smirnov test, and the existing outliers are removed by k-means clustering. Experiments on three real HSI datasets are performed for validation. Based on the visual and quantitative results, the proposed method significantly enhances the classification accuracy in comparison with three state-of-the-art AL algorithms.

Algorithm 1. Pseudo-code of the SSAAL algorithm.
Input: HSI dataset X = {x1, x2, ..., xh}, number of shearing directions, batch size (m), number of iterations (it), number of clusters (k), CMPU parameters (a, b)
I) Extract spectral-spatial features:
   a) Apply 2D-NSST to each spectral band of the HSI.
   b) Apply KMNF to reduce the spectral dimension.
II) Augmented AL process:
   a) Generate the initial labeled training set L, unlabeled pool U, and testing set T.
   b) Augment L and preserve the original and augmented samples in LA.
   c) For i = 1:it
      1: Train the SVM on LA.
      2: For each x ∈ U, compute its CMPU value.
      3: Select the n samples from U that have the lowest CMPU.
      4: Select a batch of the m most informative samples from the n (= 10m) samples using query function (6).
      5: Assign labels to the m selected samples.
      6: Augment the m obtained samples using the method selected by the two-sample Kolmogorov-Smirnov test.
      7: Allocate the m selected samples and their augmentations to k clusters and remove the outliers considering th.
      8: Include the non-outlier samples in LA.
      9: Remove the samples selected in step 4 from U.
   d) end
Output: Classification results of the HSI Ŷ = {ŷ1, ŷ2, ..., ŷn}

Figure 4. Overall classification accuracy versus the sampling weight parameter (λ).

The same initial training set (L), unlabeled pool (U), testing set (T), number of iterations (it), and batch size (m) are considered for all AL algorithms. The total samples of each dataset are considered as T. L is constructed by randomly selecting three samples per class, and the remaining samples of each dataset are kept in U. At each iteration of the AL process, m samples from U are selected, augmented by the DA methods, and added to the augmented initial training set LA. Also, the number of clusters k is equal to the number of classes of each dataset plus one. The SSAAL parameter settings for the three datasets are given in Table 5.

Figure 5. Overall classification accuracy versus the number of training samples.

Figure 7. Classification maps obtained by different AL techniques on KSC dataset.

Figure 8. Classification maps obtained by different AL techniques on Pavia University dataset.

Table 1. Indian Pines dataset: class numbers, class names, and number of observations for each class.

Table 2. KSC dataset: class numbers, class names, and number of observations for each class.

Table 6. Classification accuracies and computation time obtained by different AL techniques on Indian Pines dataset.

Table 7. Classification accuracies and computation time obtained by different AL techniques on KSC dataset.

Table 8. Classification accuracies and computation time obtained by different AL techniques on Pavia University dataset.

ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume X-4/W1-2022. GeoSpatial Conference 2022 - Joint 6th SMPR and 4th GIResearch Conferences, 19-22 February 2023, Tehran, Iran (virtual)