A NEW SPECTRAL-SPATIAL SUBSPACE CLUSTERING ALGORITHM FOR HYPERSPECTRAL IMAGE ANALYSIS

In the past decade, hyperspectral imaging techniques have been widely used in various applications to acquire high spectral-spatial resolution images from different objects and materials. Although hyperspectral images (HSIs) are useful tools to obtain valuable information from different materials, the processing of such data is challenging due to several reasons such as the high dimensionality and redundancy of the feature space. Therefore, advanced machine learning algorithms have been developed to analyse HSIs. Among the developed algorithms, unsupervised learning techniques have become popular since they are capable of processing HSIs without having prior knowledge. Generally, unsupervised learning algorithms analyse HSIs based on spectral information. However, in many applications, spatial information plays an eminent role, in particular when the input data is of high spatial resolution. In this study, we propose a new clustering approach by utilizing the sparse subspace-based concept within the hidden Markov random field (HMRF) structure to process HSIs in an unsupervised manner. The qualitative analyses of the obtained clustering results show that the proposed spectral-spatial clustering algorithm outperforms the sparse subspace-based clustering algorithm that only uses spectral information.


INTRODUCTION
Recent advances in airborne hyperspectral imaging technology allow users in different applications, to acquire very fine spectralspatial resolution data in a wide range of the spectrum. Usually, spectral channels (also known as bands) of hyperspectral images (HSIs) cover a spectrum range from the VNIR ( 0.4 -1 µm), SWIR ( 1 -2.5 µm), to the LWIR ( 8 -13 µm). Spectral bands usually contain valuable information which can be used to detect and track different objects that are being investigated (Goetz et al., 1985, Ghamisi et al., 2017. However, the analysis of hyperspectral data is a challenging task since such data contain a high number of bands. Additionally, depending on the sensor's characterization, the pixels of captured HSIs can be highly mixed. Therefore, the analysis procedure of HSIs tends to be more complicated (Iordache et al., 2011, Koirala et al., 2019. In order to overcome the aforementioned problems, several machine learning algorithms (e.g., supervised and unsupervised) have been proposed to analyse the hyperspectral images. Among the proposed algorithms, unsupervised learning techniques have become a useful tool in particular when there is no training data available in the analysis procedure (You et al., 2018). In the computer vision field, subspace-based clustering algorithms were designed to cope with the high dimensionality and mixture problems of HSIs. In such algorithms, the high dimensional data can be projected into a union of lower dimensional subspaces. In this way, each data point of the hyperspectral images is assigned to a specific subspace (cluster) (Elhamifar, Vidal, 2013).
One of the widely used subspace clustering algorithms is sparse subspace clustering algorithm (SSC) (Zhang et al., 2016, You et al., 2018, Hinojosa et al., 2018, Shahi et al., 2019. Sparse subspace clustering algorithm (SSC) was introduced in 2013 by Elhamifar and Vidal., (Elhamifar, Vidal, 2013). In SSC, the above-mentioned problems (e.g., high dimensionality and highly mixed pixels) can be addressed to some extent. The main idea of SSC is to write down each data point as a linear combination of other data points from the same subspace which is also known as the Self-expressiveness property of the data. In SSC, the entire data is used to cluster the HS data into meaningful groups. Nevertheless, using the entire data to calculate the sparsest coefficients demands a complex computational process and can be extremely time-consuming. Since 2013, the concept of SSC has been explored in more details in several studies (Elhamifar et al., 2015, Wu et al., 2015, You et al., 2018. In (You et al., 2018), exemplar-based subspace clustering algorithm (ESC) is proposed. ESC considerably reduces the time and computational process by searching for a subset of HS data and representing the whole data based on this subset. Recently, the effect of spatial information on SSC was studied (Zhang et al., 2016, Hinojosa et al., 2018. In (Zhang et al., 2016), the obtained results show that their proposed spectralspatial subspace clustering algorithm performs well compared to the original SSC algorithm. Nonetheless, the proposed spatialspectral sparse subspace-based clustering algorithms use the entire data points of hyperspectral images in order to calculate the sparse representation matrix. In addition, to our knowledge, the effect of spatial information in the ESC algorithm has not been investigated so far.
There are several approaches to include spatial and contextual information in segmentation and classification problems (Zhang et al., 2001, Benediktsson et al., 2005, Dalla Mura et al., 2011, Ghamisi et al., 2014, Khodadadzadeh et al., 2014, Benediktsson, Ghamisi, 2015, Ghamisi et al., 2018. One of the approaches, which has been successfully used in the literature, is the hidden Markov random field (HMRF) (Zhang et al., 2001, Ghamisi et al., 2014. In (Zhang et al., 2001), authors intro-duced HMRF as a probabilistic model to segment magnetic resonance images by incorporating spatial information. Theoretically, HMRF is a special case of Markov random fields (MRFs). Based on MRFs' idea, there is a higher chance that a pixel has the same class label as of its neighbourhood pixels compared to the other pixels in hyperspectral images (Li et al., 2012, Ghamisi et al., 2014. However, an HMRF is different from an MRF in which, an HMRF is defined based on a pair of random variables (i.e., the clustering result, and HSIs), while an MRF is merely defined based on the clustering result (Zhang et al., 2001).
In this paper, we propose a new spectral-spatial subspace-based clustering algorithm to analyse hyperspectral images. The proposed algorithm consists of two phases; the first phase is utilized to capture spectral information using an advance sparse subspace clustering algorithm (i.e., ESC). ESC is able to handle high-dimensional and highly-mixed data in an efficient manner. The second phase of the algorithm is based on the use of the HMRF algorithm to capture the spatial information of hyperspectral images. In the HMRF, Maximum a posteriori (MAP) and expectation-maximization (EM) algorithms are used to iteratively estimate class labels and set of parameters, respectively. In addition, the edge-preserving step is employed to avoid HMRF to over smoothing the clustering results.
The rest of the paper is organized as follows. In Section 2, the proposed algorithm is elaborated in details. Afterwards, the experimental results are presented and discussed in Section 3. Section 4 is devoted to the conclusions and remarks.

METHODOLOGY
The following section is allocated to describe the proposed spectralspatial subspace clustering algorithm. In addition, the workflow of the proposed algorithm is presented in Figure 1.

Notation
Let us assume that the hyperspectral images can be presented where mi ∈ L, and L = [1, 2, 3, ..., l ] which L is the class label indices.

Sparse subspace-based clustering algorithms
In order to explain the sparse subspace clustering algorithm (SSC), it is better to start with the sparse dictionary learning concept. In which, we can formulate the dictionary learning problem as follows ∈ R F ×P is the spectral dictionary, and P is the number of atoms. C = [c1, c2, ..., cN ] ∈ R P ×N is the coefficient matrix that can represent a collection of data points. In order to extract the coefficients, one can write each data point of X as a linear combination of some points from D. However, obtaining a spectral dictionary D is a challenging task in most applications. Therefore, self-expressiveness property is introduced by Elhamifar et al.,, in (Elhamifar, Vidal, 2013). The aforementioned concept implying that each data point can be presented as a linear combination of other data points from the same subspace. The optimization problem in equation (1) can be reformulated as below where diag(C) = 0 is a constraint that prevents adding a data point to its linear combination from the other data points in the same subspace. Alternating direction method of multipliers (ADMM) can be employed to solve the above-mentioned minimization problem. For more details on ADMM solver, we refer you to (Boyd et al., 2011, Benzi, Olshanskii, 2006. As a next step, after the sparsest coefficient calculation, the similarity graph needs to be computed through W = |C| + |C| T . Such a symmetrisation on W shall be carried on, to assure that all points from the same subspace are connected. In the end, spectral clustering can be applied on W to cluster the data points. However, SSC has difficulties to perform efficiently on the large data sets (You et al., 2018).

Exemplar-based subspace representation (ESC)
In the ESC algorithm, a subset of samples is chosen as atoms for a spectral dictionary. In ESC, a farthest first search (FFS) algorithm is used to select atoms for the dictionary. The FFS is designed to work based on minimizing the self-representation cost. The first sample in the FFS is randomly chosen. Consequently, the remaining samples in the subset are selected based on the first random initialization (You et al., 2018). Also, the number of atoms are predefined. Therefore, the optimization problem in ESC can be written as where X0 ≡ [x1, x2, x3..., xP ] ∈ R F ×P is a subset of selected samples from X. λ is a trade-off parameter between the penalty and the fidelity term. Since X0 is used as the spectral dictionary to solve the minimization problem in (3), the analysis of hyperspectral images is significantly decreased. However, in ESC, there is no spatial information included while in most cases, spatial information can be used to improve the segmentation/clustering results.

Hidden Markov random field (HMRF)
One of the algorithms that incorporates spatial information in the segmentation/clustering algorithm is the HMRF. HMRF is applicable to 2D/3D images since it is based on hidden Markov random models (HMMs). More specifically, in HMRF, the main process is based on Markov random field rather than Markov random chains which are constrained to 1D images. Therefore, HMRF becomes a desirable segmentation method to retrieve spatial information from images (Zhang et al., 2001). Basically, HMRF is a modified version of finite mixture model (Ghamisi et al., 2014). Therefore, to have a comprehensive understanding of HMRF, we start with the finite mixture which can be summarized in the following two steps: 1) Computing the joint probability of m and z depending on a set of parameters Θ as follows where Θ = [θ l , l ∈ L] is a set of parameters which needs to be estimated. In equation 4, f (.) is a probability density function (pdf) with parameter θ l = (µ l , σ 2 l ), where µ and σ are, respectively, the mean and standard deviation of xi with class label l. In addition, ωm i = P (mi = l) is a conditional probability mass function (PMF) for the class label l that is assigned to each pixel mi.
2) The marginal distribution of Zi = z depending on Θ can be formulated as The above-mentioned model is known as the finite mixture model (FM). Although, FM is used in order to have exploratory statistical analyses on data, it has a lack of including spatial information in its model structure (Zhang et al., 2001). Therefore, in (Zhang et al., 2001) HMRF is proposed by Zhang et al., to capture the spatial structure of the data through including the information of surrounding pixels of each pixel. In this paper, it is assumed that each pixel can maximum have four neighbours which are denoted by mN i . Thus one can rewrite the equation (5) as where p(l, mN i ) is the conditional PMF on the class label l. However, to fitting the HMRF model, there are initial and iterative estimation procedures which are needed to be taken into account as below 1) The initial clustering labels M (0) and the initial set of parameters (Θ (0) ) are needed to be estimated. Here, we propose to use an advanced sparse subspace-based clustering algorithm ESC to estimate the initial labels and parameters. Consequently, two iterative procedures are run to simultaneously update the class labels and the set of parameters using Maximum a posteriori (MAP) and expectation-maximization (EM). MAP estimation can be formulated as the following optimization problem: where which is the fit term of the model, and U (m) = c∈C Vc(mi, mj) is the penalty term to have the spatial information. In addition, Vc(mi, mj) is known as clique potential, and C is the set of all possible cliques. Vc(mi, mj) can be calculated as Vc(mi, mj) 2) The second initial and iterative procedure is to estimate the parameter θ. The initial θ (0) is calculated based on the initial clustering result M (0) . To iterate the estimation procedure of θ, we used an expectation-maximization (EM) algorithm as below where k is the number of iterations, and the EM functional can be defined as Q(θ|θ k ) = E[logp(m, z|Θ), z|θ (k) ]. The above-mentioned maximization problem is used to maximize the EM functional. For more detailed explanation we refer you to (Zhang et al., 2001).
As the last step, an edge detection algorithm is employed to preserve the edges, and to prevent HMRF algorithm to over smooth the clustering results. Principle component analysis (PCA) is applied to transform the data. The first PC is used to extract the edges, and for the edge detection technique, Canny edge detection technique is applied (Canny, 1986). Therefore, one can rewrite the MAP minimization problem as follows where b ∈ B, B is the binary output of the edge-preserving step. If bi = 0, it means that the ith pixel is not located at edge, otherwise bi = 1.

Data acquisition
The hyperspectral image data set was acquired during a field campaign on sites in central Finland during September 2018. An unmanned aerial vehicle (UAV) was deployed to scan surfaces and obtain HSIs for mineral exploration. The used HSIs for this study were acquired over an outcrop of the Archean Siilinjärvi glimmerite-carbonatite complex, where is currently mined for large apatite occurrences used in fertilizer production (O'Brien et al., 2015). Here, the hyperspectral image scenes were collected during two UAV flights over two rocky outcrops, roughly 260 m apart. A hyperspectral frame-based camera (0.6 Mp Rikola Hyperspectral Imager) was deployed on a hexacopter UAV (Aibotix Aibot X6v2) that flew over targets along with a pre-programmed stop-scan-motion flight plan to capture a complete set of HSIs for the subsequent image mosaicking. Flight altitudes were 40 m (Scene No.1, 13.5 x 23.5 m) and 30 m (Scene No.2, 10.5 x 16 m) for the two flights, both taken during sunny and low-wind conditions. Surface samples with a handheld spectrometer and rock specimen were taken for ground validation in parallel during the UAV flights. Additionally, ground

Data description
Out of multiple HSI scenes, two scans encompassing a mixture of surface features, such as different rocks types, water and vegetation were selected. Scene No.1 and No.2 both feature 50 images bands, covering the electromagnetic spectrum between 504-900 m, taken with image integration times of 1-2 s and final image pixel resolutions of 2.6 cm and 2.2 cm, respectively. Pre-processing as well as geometric and radiometric data corrections were achieved with a Python-based toolbox MEPHysTo (Jakob et al., 2017), using workflows tailored for UAV-borne HSIs. For more details on pre-processing of acquired hyperspectral data, we refer the readers to (Jakob et al., 2017).
Calibration to reflectance was done with ground-based PVC panels of white, grey and black shade and with known spectral signature, using the empirical line method (Adão et al., 2017, Poncet et al., 2019, which showed promising results for UAV campaigns centring around geologic mapping (Jackisch et al., 2019). The principal lithological surface composition of our scenes are carbonatite veins that occur in glimmerite host-rock, surrounded by clay, soil, low vegetation, small water ponds and debris. The whole surface appears as rather dark and flat.

Qualitative analyses of clustering results
In order to evaluate the performance of the proposed spectralspatial clustering algorithm, K-means, ESC, and subspace-based HMRF are applied on the two real hyperspectral image data sets. The number of class labels l is defined by a geologist. For the scene No.1, and scene No.2, seven and six number of classes are defined respectively. In addition, we use the default parameters that are proposed by You et al., and Ghamisi et al., for both ESC and HMRF respectively (You et al., 2018, Ghamisi et al., 2014. However, in order to choose the number of samples in ESC, we propose to use a close number of samples to the mineral targets in each scene. As it can be observed from the obtained clustering results in Figures 4-5, primary features of interest are the centi-to-decimeter wide carbonatite veins inside the glimmerite bodies. With a vertical NS striking trend of the veins, they follow the main direction of the regional carbonatite-glimmerite intrusion. Evidence of shearing and folding of rock units can be visually identified on the RGB, as well as in the classified scenes. The NS trend of the carbonatite is visible in Figure 2, were a broad feature is observed, while the folding of the carbonatite class is apparent in Figure 3. Improvement of quality between ESC and subspace-based HMRF can be qualitatively evaluated by comparing both. Arguably, the water body in the top of Scene No.1 is not completely captured by ESC, and carbonatite is featured as thin veinlets, that not fully represent this unit. Subspace-based HMRF, on the other hand, encloses the carbonatite fully, but tends to over-estimated its occurrence in Scene No.1, while in Scene No.2 clustering compares better to the real geological structures.

Quantitative analyses of clustering results
In order to have a quantitative comparison between the performance of the studied clustering algorithms, ground truth data points are produced for each scene which are shown in Figures 2-3. The performance of each clustering algorithm is evaluated using overall accuracy (OA), average accuracy (AA), and kappa coefficient. The obtained quantitative results on scene No.1 is presented in Table 1. As can be seen in Table 1, our proposed algorithm (OA: 70.45) significantly outperforms the ESC (OA: 50.32) and K-means (OA: 34.32) algorithms. In scene No.1, although K-means performs well to capture cluster A and B, it is not capable of capturing cluster E and F at all. ESC classifies cluster F and G better than K-means and our proposed method. In the second scene, the proposed algorithm (OA: 79.35) outperforms ESC (OA: 57.23) and K-means (OA: 33.48). The quantitative assessment on the second scene shows that the proposed algorithm classifies all the clusters better than its competitive clustering algorithms.

CONCLUSION AND REMARKS
In this paper, a new spectral-spatial subspace-based clustering algorithm for hyperspectral image analysis was proposed. Due to the presence of high dimensionality and highly mixed pixels in hyperspectral data, an advanced sparse subspace clustering algorithm, the so-called ESC, is used. However, ESC does not consider spatial information, and it is more likely to happen that, surrounding pixels have the same class label as the central pixel. Therefore, we proposed to fuse ESC with the HMRF algorithm that extracts spatial information. The qualitative and quantitative analyses of the obtained clustering results show that the proposed subspace-based HMRF algorithm performs well in terms of grouping the geological features compared to ESC and K-means algorithms. As the future work, we will investigate the influence of the optimization parameters in ESC and HMRF. Besides, the proposed framework will be tested and compared with different subspace clustering algorithms. Moreover, the proposed method will be applied on different data sets.