AN EXTENDED SPECTRAL–SPATIAL CLASSIFICATION APPROACH FOR HYPERSPECTRAL DATA

In this paper, an extended classification approach for hyperspectral imagery based on both spectral and spatial information is proposed. The spatial information is obtained by an enhanced marker-based minimum spanning forest (MSF) algorithm. Three different methods of dimension reduction are first used to obtain the subspace of hyperspectral data: (1) unsupervised feature extraction methods, including principal component analysis (PCA), independent component analysis (ICA), and minimum noise fraction (MNF); (2) supervised feature extraction methods, including decision boundary feature extraction (DBFE), discriminant analysis feature extraction (DAFE), and nonparametric weighted feature extraction (NWFE); and (3) the genetic algorithm (GA). The spectral features obtained are then fed into the enhanced marker-based MSF classification algorithm. In the enhanced MSF algorithm, the markers are extracted from both the support vector machine (SVM) classification map and the watershed segmentation map. To evaluate the proposed approach, the Pavia University hyperspectral dataset is used. Experimental results show that the proposed approach using GA achieves an overall accuracy approximately 8% higher than that of the original MSF-based algorithm.


INTRODUCTION
Hyperspectral imaging concerns the measurement, analysis, and interpretation of spectral imagery acquired from either a given scene or a specific object by a satellite, airborne, terrestrial, or laboratory sensor, over the visible, infrared and sometimes thermal regions of the electromagnetic spectrum (Shippert, 2004). Recent technological improvements in the spatial, spectral, and radiometric characteristics of imaging spectrometers create the need to develop new methods for land cover mapping. There are two major approaches to the classification of hyperspectral images: spectral (pixel-based) approaches and spectral-spatial approaches. While pixel-based techniques, such as the classic Maximum Likelihood or Support Vector Machine (SVM) classifiers, treat each pixel independently, spectral-spatial frameworks such as Geographic Object-Based Image Analysis (GEOBIA) (Blaschke et al., 2014) or Minimum Spanning Forest (MSF) (Tarabalka et al., 2010a) classifiers exploit both the spectral characteristics and the spatial context of the pixels. Many researchers have demonstrated that using spectral-spatial information improves hyperspectral classification results compared to using spectral data alone (Plaza et al., 2009; Li et al., 2010; Fauvel et al., 2012; Heras et al., 2014; Xu et al., 2014).
Segmentation techniques are powerful means for defining the spatial dependencies among pixels and for finding homogeneous regions in an image (Gonzalez and Woods, 2002; Chen et al., 2012). The advantages of using segmentation to distinguish spatial structures from one another are also discussed in (Tarabalka et al., 2009; Tarabalka et al., 2010; Bitam and Ameur, 2013). An alternative way to improve the accuracy of segmentation is to use a marker-based technique (Soille, 2003; Tarabalka et al., 2010). The idea behind this approach is to select one or several pixels for every spatial object as the seed, or marker, of the corresponding region. Marker-based segmentation considerably reduces oversegmentation and, as a result, leads to more reliable classification accuracies.
The classification of high-dimensional hyperspectral images faces a major problem: the high number of spectral channels relative to the small number of labelled samples. Many algorithms have been reported to be effective in reducing the dimensionality of the input space and achieving better performance. These include unsupervised feature extraction methods, such as principal component analysis (PCA) (Srivastava and Liu, 2005; Saegusa et al., 2004), independent component analysis (ICA) (Zheng et al., 2006) and maximum noise fraction (MNF) (Green et al., 1988), and supervised feature extraction methods, such as decision boundary feature extraction (DBFE) (Lee and Landgrebe, 1993), discriminant analysis feature extraction (DAFE) (Landgrebe, 2003) and nonparametric weighted feature extraction (NWFE) (Kuo and Landgrebe, 2004).
In this paper, we propose an extended spectral-spatial classification approach based on subspace analysis of hyperspectral remote sensing data. In the proposed approach, three different methods are employed to extract the optimal hyperspectral features: unsupervised feature extraction, supervised feature extraction, and the genetic algorithm (GA). Afterwards, the enhanced marker-based MSF (MMSF) spectral-spatial algorithm is used to classify the optimal features. To select markers in the enhanced MMSF algorithm, the pixels belonging to the class with the largest population are kept for each region of the segmentation map. Finally, the most reliable labelled pixels among them are selected as the markers of each region. In this way, a segmentation algorithm is used to integrate spatial information into the marker selection process.

LITERATURE REVIEW
Automatic marker selection has previously been used in the literature mostly for greyscale and colour images. Markers are often chosen by searching for flat zones (i.e. connected components of pixels with a constant grey level value) or zones of homogeneous texture (Soille, 2003). Noyel et al. (2007; 2008) classified hyperspectral images using different methods, such as Clara (Kaufman and Rousseeuw, 1990) and linear discriminant analysis (Duda et al., 2001), and then filtered the classification maps using mathematical morphology operators to select large spatial regions as markers. Tarabalka et al. (2010a) proposed an efficient approach for spectral-spatial classification using an MSF grown from automatically selected markers. It uses a pixel-wise SVM classification to select, as markers, the pixels with the highest probability estimates for each class. In this approach, connected-component labelling is first applied to the classification map. Then, for large regions, the markers are taken to be the p% of pixels with the highest probability estimates, and for small regions, the pixels with an estimated probability higher than a pre-defined threshold. The disadvantage of this approach is that it does not use spatial or neighbourhood information in the marker selection process.
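The connected-component marker selection summarized above can be illustrated with the following minimal Python/NumPy sketch. It assumes the SVM output is available as a class map plus a map of per-pixel maximum class probabilities. The function names are hypothetical. The defaults (20 pixels, 5%) are the values reported for the original MMSF in the experiments, and the small-region threshold `tau` is left as a free parameter. It is an illustration of the idea, not the reference implementation of Tarabalka et al. (2010a).

```python
import math
from collections import deque

import numpy as np


def label_components(class_map):
    """Label 8-connected components of pixels sharing the same class label.

    Returns an int array of component ids (0..K-1) and the count K.
    """
    rows, cols = class_map.shape
    comp = np.full((rows, cols), -1, dtype=int)
    n_comp = 0
    for r in range(rows):
        for c in range(cols):
            if comp[r, c] != -1:
                continue
            # Breadth-first flood fill from an unlabelled seed pixel.
            comp[r, c] = n_comp
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and comp[ny, nx] == -1
                                and class_map[ny, nx] == class_map[y, x]):
                            comp[ny, nx] = n_comp
                            queue.append((ny, nx))
            n_comp += 1
    return comp, n_comp


def select_markers_cc(class_map, prob_map, min_size=20, top_frac=0.05, tau=0.5):
    """Return a marker map: class label at marker pixels, -1 elsewhere."""
    comp, n_comp = label_components(class_map)
    markers = np.full(class_map.shape, -1, dtype=int)
    for k in range(n_comp):
        ys, xs = np.nonzero(comp == k)
        probs = prob_map[ys, xs]
        if len(ys) > min_size:
            # Large component: keep the top p% most probable pixels (at least one).
            n_keep = max(1, math.ceil(top_frac * len(ys)))
            chosen = np.argsort(probs)[::-1][:n_keep]
        else:
            # Small component: keep only pixels whose probability exceeds tau.
            chosen = np.nonzero(probs > tau)[0]
        markers[ys[chosen], xs[chosen]] = class_map[ys[chosen], xs[chosen]]
    return markers
```

Note that this rule only ever looks at the classification map. A component that is spectrally misclassified as a whole can still produce a confident marker, which is the weakness the enhanced MMSF algorithm addresses by bringing in a segmentation map.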
A review of the existing literature on subspace extraction shows that it focuses mostly on pixel-based classification, without considering the spatial relationships of neighbouring pixels. Recent studies show that exploiting spatial information is necessary for the classification of hyperspectral imagery, but few such approaches have been proposed (Huang and Zhang, 2009), partly because of the high dimensionality of the data and the spectral and spatial heterogeneity of remote sensing images (Duarte-Carvajalino, 2008).

THE PROPOSED APPROACH
The flowchart of the proposed spectral-spatial classification approach is presented in Figure 1. As shown, the proposed approach consists of two blocks:
(1) Subspace extraction for pre-processing: it aims to reduce the dimensionality and extract the spectral subspace from hyperspectral data. For this purpose, three different techniques are employed: unsupervised feature extraction, supervised feature extraction, and GA. The PCA, ICA and MNF algorithms are used for unsupervised feature extraction, and the DBFE, DAFE and NWFE algorithms for supervised feature extraction. The GA is a general adaptive optimization search method based on a direct analogy to Darwinian natural selection and genetics in biological systems (Huang and Wang, 2006). It starts from an initial population composed of a set of possible solutions called individuals (chromosomes), and then evaluates the quality of each individual with a fitness function. We use the accuracy of the SVM classification obtained on a subset of the training samples as the fitness function. Fitter solutions have a better chance of surviving and reproducing in the next generations, so the population evolves over consecutive generations to better fit the problem's conditions. Selection, crossover, and mutation are the main GA operators used to produce future generations. The evolutionary process continues until the termination conditions are satisfied (Zhuo and Zheng, 2008).
(2) The enhanced MMSF spectral-spatial classification of the spectral subspace: the SVM classifier and the watershed segmentation algorithm are first applied, in parallel, to classify and to segment the extracted spectral subspace, respectively. Watershed segmentation is an efficient morphological algorithm for image segmentation that combines region growing and edge detection techniques (Vincent and Soille, 1991). Afterwards, for each region of the segmentation map, all the pixels belonging to the class with the largest population are kept (see Figure 2). Lastly, the most reliable labelled pixels among the kept pixels of each region are selected as markers. The markers are then used to build the MSF.
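The GA used in block (1) can be sketched as follows. The sketch assumes, as is common for GA-based dimension reduction but not spelled out in the text, that each chromosome is a binary mask over the spectral bands (1 = band kept). It uses binary tournament selection, single-point crossover, bit-flip mutation and elitism, and stops after a fixed number of generations. The function name `run_ga` and all default parameter values are hypothetical and are not the values of Table 2. In the paper's setting, `fitness` would train an SVM on the selected bands and return its accuracy on a subset of the training samples; here it is any callable on a mask.

```python
import random


def run_ga(n_bands, fitness, pop_size=20, n_generations=30,
           crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Evolve binary band masks (1 = band kept); return (best_mask, best_score)."""
    rng = random.Random(seed)
    # Initial population: random binary chromosomes.
    pop = [[rng.randint(0, 1) for _ in range(n_bands)] for _ in range(pop_size)]
    best, best_score = None, float("-inf")

    def evaluate(population):
        # Score every individual and track the best mask seen so far.
        nonlocal best, best_score
        scored = [(fitness(ind), ind) for ind in population]
        for score, ind in scored:
            if score > best_score:
                best, best_score = list(ind), score
        return scored

    def tournament(scored):
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    scored = evaluate(pop)
    for _ in range(n_generations):
        children = [list(best)]  # elitism: carry the best mask forward
        while len(children) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_bands)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = list(p1)
            # Bit-flip mutation: each gene flips with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        scored = evaluate(children)
    return best, best_score
```

A toy fitness, such as the number of bits matching a target mask, is enough to check that the population improves. With an SVM-based fitness, each evaluation is a full training run, which is why population size and number of generations dominate the cost.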

EXPERIMENTAL RESULTS AND DISCUSSIONS
The Pavia University hyperspectral image is used for our experiments. It was acquired by the ROSIS-03 sensor over the city of Pavia, Italy.

Experimental results
In this study, for the PCA, NWFE and DBFE algorithms, we kept the spectral subspace that accounts for more than 90% of the total variance.
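For PCA, the 90% rule can be illustrated with a short NumPy sketch. The function name `pca_reduce` is hypothetical. The cube is assumed to be a `(rows, cols, bands)` array, and components are kept until their cumulative share of the total variance first exceeds the threshold.

```python
import numpy as np


def pca_reduce(cube, var_threshold=0.90):
    """Project a (rows, cols, bands) cube onto the leading principal components
    whose cumulative explained variance first exceeds var_threshold."""
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands).astype(float)
    X -= X.mean(axis=0)                       # centre each band
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # First index whose cumulative share is strictly greater than the threshold.
    k = int(np.searchsorted(cumulative, var_threshold, side="right")) + 1
    k = min(k, bands)
    return (X @ eigvecs[:, :k]).reshape(rows, cols, k), k
```

For strongly correlated bands, as in hyperspectral data, very few components usually pass this threshold.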
The MNF eigenimages with near-unity eigenvalues can be regarded as noise-dominated features and are hence removed. It should be noted that at most an $(N-1)$-dimensional subspace is available for DAFE, since the maximum rank of $\Sigma_b$, the between-class scatter matrix, is $N-1$ for an $N$-class classification problem (Landgrebe, 2003). Table 2 presents the values of the parameters used in the GA.
Table 2. The GA's parameters for the dataset used.
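The $N-1$ bound used for DAFE above follows from the definition $\Sigma_b = \sum_{i=1}^{N} p_i (m_i - m)(m_i - m)^T$. Because the overall mean is $m = \sum_i p_i m_i$, the $N$ vectors $m_i - m$ satisfy $\sum_i p_i (m_i - m) = 0$ and are therefore linearly dependent, so $\Sigma_b$ has rank at most $N-1$. The NumPy check below illustrates this on synthetic random data (not the Pavia data), with priors $p_i = n_i / n$ estimated from the sample counts; the function name is hypothetical.

```python
import numpy as np


def between_class_scatter(X, y):
    """Sigma_b = sum_i p_i (m_i - m)(m_i - m)^T with priors p_i = n_i / n."""
    m = X.mean(axis=0)
    d = X.shape[1]
    S_b = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        diff = (Xc.mean(axis=0) - m)[:, None]   # column vector m_i - m
        S_b += (len(Xc) / len(X)) * (diff @ diff.T)
    return S_b
```

With 9 classes in a 20-dimensional space, for example, the matrix has rank 8, so DAFE can supply at most 8 features however many bands are available.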
In the experiments, the Gaussian radial basis function (RBF) kernel is used for the SVM classifier (Camps-Valls and Bruzzone, 2005). The SVM parameters, i.e. the penalty parameter C and the RBF kernel parameter, are chosen by five-fold cross-validation. To create the marker map in the enhanced MMSF algorithm, as described in the section on the proposed approach, all the pixels belonging to the class with the largest population are first kept for each region of the segmentation map. Then, if it contains more than 40 pixels, the 9% of its pixels with the highest estimated probabilities are selected as the marker. Otherwise, the region marker is formed by the pixels whose estimated probability is higher than a threshold equal to the lowest probability within the highest 6% of the probabilities over the whole image. Next, the image pixels are grouped into an MSF built from the selected markers using the spectral angle dissimilarity measure (Van der Meer, 2006). To compare the results of the proposed approach, we have also implemented the original and enhanced MMSF algorithms on all image bands. In the original MMSF algorithm, connected components are labelled using eight-neighbourhood connectivity. For each connected component, if it contains more than 20 pixels, the 5% of its pixels with the highest estimated probabilities are selected as the marker for this component. Otherwise, the region marker is formed by the pixels with an estimated probability higher than 2%.
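The enhanced marker selection and the MSF step described above can be sketched together in NumPy. Several assumptions apply. The segmentation map (e.g. from watershed), the SVM class map and the per-pixel maximum probability map are given as arrays. The 40-pixel test is applied to the pixels kept for the majority class of a region. The MSF graph uses the 8-neighbourhood. The forest is grown with Prim's algorithm, treating all markers as roots joined to a virtual root by zero-weight edges, and every pixel takes the label of the marker at the root of its tree. All function names are hypothetical.

```python
import heapq
import math

import numpy as np

# 8-neighbourhood offsets for the MSF graph (an assumption of this sketch).
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def global_threshold(prob_map, top_frac=0.06):
    """Lowest probability within the highest `top_frac` of all probabilities."""
    flat = np.sort(prob_map.ravel())[::-1]
    n_top = max(1, math.ceil(top_frac * flat.size))
    return flat[n_top - 1]


def enhanced_markers(seg_map, class_map, prob_map,
                     min_size=40, top_frac=0.09, global_frac=0.06):
    """Return a marker map: class label at marker pixels, -1 elsewhere."""
    tau = global_threshold(prob_map, global_frac)
    markers = np.full(class_map.shape, -1, dtype=int)
    for region in np.unique(seg_map):
        ys, xs = np.nonzero(seg_map == region)
        labels = class_map[ys, xs]
        # Keep only the pixels of the most populated SVM class in the region.
        values, counts = np.unique(labels, return_counts=True)
        majority = values[np.argmax(counts)]
        ys, xs = ys[labels == majority], xs[labels == majority]
        probs = prob_map[ys, xs]
        if len(ys) > min_size:
            # Large set: the top_frac most probable pixels form the marker.
            n_keep = max(1, math.ceil(top_frac * len(ys)))
            chosen = np.argsort(probs)[::-1][:n_keep]
        else:
            # Small set: pixels more probable than the image-wide threshold.
            chosen = np.nonzero(probs > tau)[0]
        markers[ys[chosen], xs[chosen]] = majority
    return markers


def spectral_angle(a, b, eps=1e-12):
    """Spectral angle (radians) between two pixel spectra."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def msf_classify(cube, markers):
    """Grow a minimum spanning forest from all markers at once and give every
    pixel the label of the marker whose tree reaches it."""
    rows, cols, _ = cube.shape
    labels = np.full((rows, cols), -1, dtype=int)
    # Markers enter with weight 0, i.e. via zero-weight edges to a virtual root.
    heap = [(0.0, int(y), int(x), int(markers[y, x]))
            for y, x in zip(*np.nonzero(markers >= 0))]
    heapq.heapify(heap)
    while heap:
        _, y, x, lab = heapq.heappop(heap)
        if labels[y, x] != -1:
            continue  # already attached to the forest by a cheaper edge
        labels[y, x] = lab
        for dy, dx in NEIGHBOURS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols and labels[ny, nx] == -1:
                w = spectral_angle(cube[y, x], cube[ny, nx])
                heapq.heappush(heap, (w, ny, nx, lab))
    return labels
```

Because edges are popped in order of increasing spectral angle, a tree grows across spectrally similar pixels before it can cross a boundary between dissimilar spectra. This is what lets a few reliable markers label whole homogeneous regions.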
The accuracy of the classification maps is assessed by the overall accuracy (OA), the Kappa coefficient of agreement (κ), and the class-specific producer's accuracy (PA). The OA is the percentage of correctly classified pixels, κ is the percentage of agreement corrected by the amount of agreement that can be expected by chance alone, and the PA is the percentage of reference samples of a given class that are correctly classified.
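The three measures can all be computed from a confusion matrix, as in the NumPy sketch below. The function name is hypothetical. It assumes integer class labels `0..n_classes-1`, with rows of the matrix indexed by the reference label and columns by the predicted label. It returns fractions rather than percentages, and a class absent from the reference data would make its PA undefined.

```python
import numpy as np


def accuracy_measures(y_true, y_pred, n_classes):
    """Return (OA, kappa, per-class PA) as fractions."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                                   # rows: reference, cols: predicted
    n = cm.sum()
    oa = np.trace(cm) / n                               # observed agreement
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2 # agreement expected by chance
    kappa = (oa - pe) / (1 - pe)
    pa = np.diag(cm) / cm.sum(axis=1)                   # correct / reference total per class
    return oa, kappa, pa
```

For example, with reference labels `[0, 0, 0, 1, 1, 1]` and predictions `[0, 0, 1, 1, 1, 0]`, OA is 4/6, the chance agreement is 0.5, and κ is therefore 1/3.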

Discussions
In the Pavia University dataset, a total of 3921 and 40002 pixels were available as training and test data, respectively (see Table 3). Table 4 presents the number of features and the accuracy values obtained for the different methods. The results show that all subspaces gave higher accuracies than the Original-MMSF, which uses 104 channels. Among the unsupervised and supervised feature extraction methods, the PCA and DAFE algorithms, respectively, provide the best results. However, as can be seen from Table 4, the best results are achieved using the GA subspace features. The kappa coefficient of the GA-MMSF algorithm is approximately 9% higher than that of the Original-MMSF algorithm.
As Table 4 shows, the class-specific producer's accuracies of the proposed GA-MMSF algorithm are higher than those of the Original-MMSF approach for all classes.

CONCLUSION
In this study, an improved spectral-spatial classification approach for hyperspectral images based on subspace analysis techniques has been proposed. Subspace analysis techniques are used to reduce the computational cost, since spectral-spatial classification is time-consuming and impractical for hyperspectral data with hundreds of channels. In addition, subspace analysis reduces the information redundancy in hyperspectral data, as the many spectral channels are highly correlated. Therefore, we proposed to integrate subspace analysis and spectral-spatial classification for hyperspectral image interpretation. In the proposed approach, the dimensionality of hyperspectral images is reduced using three different approaches: unsupervised feature extraction, supervised feature extraction, and the GA. The enhanced MMSF spectral-spatial algorithm is then used to classify the extracted features. In the proposed MMSF algorithm, the pixels of the class with the largest population are kept for each region of the watershed segmentation map. Then, the most reliable labelled pixels are selected as markers and used to build the MSF. Experimental results show that the subspace images are effective in extracting spectral information from the hyperspectral data.
The proposed methodology was able to take advantage of spectral and spatial information simultaneously for accurate classification of hyperspectral images. Further work is needed to improve the proposed approach; for example, the available data should be exploited to automate the entire classification process.

Figure 1. Schema of the proposed approach.

Figure 2. An example of the segmentation map superimposed on the SVM classification map.

Figure 3 shows the colour composite image, the reference data and the classification maps of the Original-MMSF, the Enhanced-MMSF and the proposed approach. In this figure, the classification map obtained by GA-MMSF contains much more homogeneous regions than the maps obtained by the other algorithms (see Figure 3(k)).

Table 1 describes the main characteristics of this dataset.

Table 1. The main characteristics of the dataset used.

Table 3. The training and test samples.

Table 4. The number of features and accuracy values obtained.