SMOOTHING PARAMETER ESTIMATION FOR MARKOV RANDOM FIELD CLASSIFICATION OF NON-GAUSSIAN DISTRIBUTION IMAGE

In the context of remote sensing image classification, Markov random fields (MRFs) have been used to combine both spectral and contextual information. The MRFs use a smoothing parameter to balance the contribution of the spectral versus spatial energies, which is often defined empirically. This paper proposes a framework to estimate the smoothing parameter using the probability estimates from support vector machines and the spatial class co-occurrence distribution. Furthermore, we construct a spatially weighted parameter to preserve the edges by using seven different edge detectors. The performance of the proposed methods is evaluated on two hyperspectral datasets recorded by the AVIRIS and ROSIS and a simulated ALOS PALSAR image. The experimental results demonstrated that the estimated smoothing parameter is optimal and produces a classified map with high accuracy. Moreover, we found that the Canny-based edge probability map preserved the contours better than others.


INTRODUCTION
Hyperspectral imaging sensors provide a huge amount of data with rich spatial, spectral and temporal resolution information.These images have attracted much attention in the remote sensing community and have opened the doors to variety of new applications and challenges, which need to employ both spectral and spatial information for accurate data analysis and classification (Camps-Valls et al., 2014).Markov random fields (MRFs) as undirected graphical models are common methods for incorporating both spectral and contextual information (Chen et al., 2010a).They are formulated as the minimization of an energy function which consists of spectral and spatial energy terms and an important term which plays a key controlling role known as smoothing parameter (Aghighi et al., 2014).Larger values of the smoothing parameter result in over-smoothed classified maps and too small values do not fully utilize the available spatial information (Tolpekin and Stein, 2009).Several attempts have been made to estimate the smoothing parameter, which would maximize the image classification accuracy.Derin and Elliott (1987) employed the least squares method, but the performance of their method was limited by the choice of suitable data samples (Derin andElliott, 1987, Tso andMather, 1999).A number of studies have examined heuristic optimization algorithms to estimate the smoothing parameters, such as iterative conditional estimation (Salzenstein and Pieczynski, 1997), genetic algorithms (Tso and Mather, 1999), simulated annealing (Li et al., 2012), and Ho-Kashyap optimization method which was used for automatic weight parameters determination in the context of supervised classification by using training data (Serpico and Moser, 2006).In the recent years, Tolpekin and Stein (2009) demonstrated a new smoothing parameter estimation technique for super-resolution mapping based on class separability, the neighbourhood system size and the configuration of class labels.Then, Li et al. (2012) added local properties in estimating the optimal smoothing parameter and developed their spatial adaptive method.However, both meth-ods introduced another parameter, which was set as a constant empirical value.Moreover, both methods suffered from similar covariance assumptions for the classes.Thus, we proposed a robust smoothing parameter estimation framework to overcome their limitations (Aghighi et al., 2014).One of the limitations of the described methods is that they were developed based on the assumption of the Gaussian class conditional distribution (Aghighi et al., 2014), (Tolpekin and Stein, 2009) and (Li et al., 2012).Although the Gaussian distribution is widely applied in many of image labelling applications, this assumption may not be tenable for remotely sensed mixed pixels (Xu et al., 2005).Moreover, some practical data such as polarimetric synthetic aperture radar (PolSAR) data are non-Gaussian (Doulgeris et al., 2008).
Another common drawback of contextual based classification approaches is that they generate unreliable classification results near edges between the land covers (Moser et al., 2013), or remove the edges of small features.Therefore, several attempts have been made to develop the edge-preserving methods, which have been used during the MRF optimization process.For instance, line processes (Moser et al., 2013) and (Solberg et al., 1996) and adaptive neighborhood systems (Hegarat-Mascle et al., 2007), are both based on the assumption of using an ideal edge map.However, each image band may provide different or even conflicting information based on its wavelength.Therefore, some other methods such as fuzzy no-edge/edge function using Sobel mask (Tarabalka et al., 2010), edge probability map using Canny operator (Aghighi et al., 2014) and graduated increase edge penalty (Yu and Clausi, 2008) were introduced in previous studies.This article presents a novel robust framework for the smoothing parameter estimation which is not dependent on assumptions of a specific statistical distribution of the image data (Section 2.1).This contextually adaptive smoothing parameter estimation method is proposed on the basis of the balance of spatial and spectral energies and the global spatial frequency distribution of a co-occurrence class label.For this purpose, we have introduced a new spectral energy change function and two new concepts called the class label co-occurrence matrix of the categories (CLCMC) and global class label co-occurrence matrix of the categories (GCLCMC), which can all be computed using pairwise coupling of probability estimates from support vector machines (SVM) one-versus-one classification outputs.Furthermore, we compared seven different edge filters to incorporate the edge information into the MRF framework: Canny, Sobel, Roberts, Prewitt, Laplacian of Gaussian, curvature edge indicator, graduated edge penalty.The performance of the proposed method is evaluated using two hyperspectral images collected by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) and the Reflective Optics System Imaging Spectrometer (ROSIS) (Section 3).Finally, conclusions are drawn in Section 4.

PROPOSED METHOD
In the development of our new smoothing parameter estimation framework, we denote an image by where B is a number of spectral channels, and m is a number of pixels.Let Ω = {ω 1 , ω 2 , . . ., ω M } be a set of M thematic classes of interest.The classification task consists in assigning for each pixel Y i a class label l j , yielding the classification map

The Potts MRF model
The optimal classification map L * given the image Y can be generated by solving the maximization problem for the a posteriori probability (MAP) decision rule (1).
where, p(Y | L) is the class-conditional distribution and p(L) is the prior probability distribution.Based on the complexity of (1) which involves the optimization of a global distribution model of the image and due to the equivalence of MRF and Gibbs random field (Duggin and Robinove, 1990), (Geman and Geman, 1984), this optimization problem can be simplified and resolved by minimizing the sum of local posterior energies (2) (Geman and Geman, 1984): In this research, we applied expectation-maximization (EM) algorithm (Levitan and Herman, 1987) to compute the MAP.In Equation (2), U(Y | L) and U(L) denote spectral and spatial energy terms, respectively.The spatial term U(L) is defined by using the Potts model, which penalizes different class labels for neighboring pixels (3): where, j is the class label of a pixel Y j which is a member of the symmetric neighborhood for pixel Y i denoted by N i .In this equation, δ i , j is the Kronecker delta function (δ i , j = 1 i f i = j and δ i , j = 0 i f i = j .In Equation (3), W ( j ) denotes the weight of contribution from pixel j ∈ N i to the spatial energy term and can be modeled as: where, 0 ≤ q < ∞ and ∑ j ∈N i φ j = 1.In these equations, q controls the overall magnitude of weights and consequently the spatial energies; thus, larger values of q leads to smoother solutions (Tolpekin and Stein, 2009).Here, φ j inversely depends to d i , j which is a geometric distance between the pixel i and its spatial neighbors j ( 5) (Li et al., 2012).
where, r is the spatial resolution, η is a normalization constant to ∑ j ∈N i φ j = 1 and g is a power-law index which is assumed to be one.By substitution (3-5) in ( 2) we can write (2) as ( 6).
In order to normalize U(L | Y ), we multiply ( 6) by 1/(1 + q): We call q/(1 + q) smoothing parameter and denote it by λ ; therefore, 1/(1 + q) can be written as 1 − λ .As mentioned, the optimal classified map L * depends on the maximization of the posterior probability (1) or the minimization of local posterior energies (2 or 7), thus the absolute value of U(L | Y ) in ( 7) is not important.From Equation ( 8) and on, U(L | Y ) is referred to as the normalised energy.
The spectral energy U(Y | L) can be computed as ( 9) (Tarabalka et al., 2010): In order to estimate the smoothing parameter, consider that a given pixel i with the true label i = α is assigned to an incorrect class label i = β .Therefore, based on (1) we can infer that: Which is same as ( 11) By substituting the corresponding terms in (11) and solving this inequality equation, we will have the local likelihood energy change ∆U ι αβ and the change of a local prior energy which is simplified as ∆U P αβ = qψ αβ .Furthermore, λ for each pair of classes (α and β ) can be estimated by ( 12) (Aghighi et al., 2014): Due to the assumption of Gaussian class conditional densities, the value of ∆U ι αβ was defined as a Mahalanobis distance using the equal covariance matrix for all the classes in the case of (Tolpekin andStein, 2009, Li et al., 2012); or using the mean of the covariance of each pair of classes in the case of (Aghighi et al., 2014).However, this assumption may not be tenable for remotely sensed mixed pixels (Xu et al., 2005).Thus, in order to avoid the Gaussian distribution assumption we propose a new equation to compute the change in the likelihood energy as: where, P(Y i | i ) can be estimated by pairwise coupling of probability estimates from one-versus-one SVM outputs (Wu et al., 2004).In order to compute ∆U ι αβ , the image pixels are catego-Table 1: Global matrix of ∆U ι αβ for the image's selected pixels rized based on the computed class labels and the pixels of each category are sorted in descending order based on the maximum probability in each pixel.Then, by employing the SVM accuracy assessment results, the minimum value between user and producer accuracies of each class are used to select the proportion of pixels with the higher probability in each category.These selected pixels in each category are assumed to be reliably classified.
For a given pixel amongst the selected pixels of each category, the maximum probability should belong to the P (Y i | i = α) and other probabilities of that pixel belong to Then the means of pixels in each category were computed using ∆U ι αβ (13).In the next step, each mean vector of ∆U ι αβ is normalized and a square matrix of ∆U ι αβ with size M was generated (see Table 1).This matrix is a zero diagonal square matrix, because each element on its diagonal indicates ∆U ι αα is zero.
Then, the class label of the neighbours of each pixel in each category are extracted to compute the CLCMC and GCLCMC, which are square matrices of size M. Let N P be the number of pixels for a class ω i ; since the second order MRF neighboring system is chosen, each pixel Y i is surrounded by N S = 8 pixels Y i .The CLCMC index is calculated for pixels of each class ω i as: CLCMC ω i ,ω j shows the probability of co-occurrence of class ω i with other classes ω j .In this equation, c is the class label of a given pixel (i) of the category and s is the class label of surrounding pixel s.Due to the MRF neighboring concept which says that 1) a site cannot be a neighbor with itself i / ∈ N i ; and 2) the neighborhood relationship is mutual (i ∈ N j ⇐⇒ j ∈ N i ) (Levitan and Herman, 1987), CLCMC is converted to GCLCMC to show the global spatial frequency distribution of each pair of classes in the image: The index of GCLCMC indicates the probability that a given pixel with true label (α) is misclassified as a false label (β ) due to the spatial energy.Therefore, ψ αβ for each pair of classes (ψ αβ ) is the corresponding element of GCLCMC matrix for classes α and β .Although the general concepts of the CLCMC and GCLCMC indexes are similar to the concepts of class label co-occurrence matrix of the blocks (CLCMB) and global class label co-occurrence matrix of the blocks (GCLCMB) which we proposed earlier (Aghighi et al., 2014), for this paper both CLCMC and GCLCMC are normalized and indicate the probability of co-occurrence of class ω i with another classes ω j in each category over the whole image.
In the next step, the global matrix ∆U ι αβ and ψ αβ are used to compute λ αβ for each pair of classes using Eq. ( 12).Finally, the optimized smoothing parameter λ * is computed by averaging the estimated smoothing parameter λ αβ for each pair of classes ( 16) (Aghighi et al., 2014):

Edge preserving
The first step for preserving the edges is to determine the edge locations.Hence, we employed five of the best known edge detection methods including to Canny (Canny, 1986), Sobel (Roushdy, 2006), Roberts (Jain et al., 1995), Prewitt (Prewitt, 1970), Laplacian of Gaussian (Marr and Hildreth, 1980) to extract the binary edge map I i , as well as utilize the different curvature indicators to generate the curvature edge indicator D i in each case (21) (Chen et al., 2010b).Since conflicting information is derived from the different wavelengths in hyperspectral images, I i and D i were computed for each image band (B-band).Then a one-band edge probability map from the B-band image (n b ) is computed by using (17) for the Canny, Sobel, Roberts, Prewitt and Laplacian of Gaussian as well as (18) for different curvature indicators.17) In order to preserve the small structures and edges in the classified map, the spatial energy component (3) can be formulated as (19 or 20) (Aghighi et al., 2014).
where the superscript E refers to the edge probability map and w (i) can be computed using (17 or 18) based on edge detection method and g (∇ s ) can be calculated using (24).

Different curvature indicator
This edge indicator was developed by Chen et al (2010) and called different curvature indicator.It can effectively distinguish edges from areas with flat and ramp intensity distributions in the image data (Chen et al., 2010b).The difference curvature D i for a given pixel i of the image is defined as: where u ηη and u εε represent the second derivation of the gradient ∇u and perpendicular to ∇u, respectively, and |.| denotes the absolute value In these equations, u x and u xx denote the first and second derivation in x, respectively, u y and u yy are the first and second derivation in y, respectively, and u xy indicates the first derivation in y of the first derivation in x.Table 2 summarizes the behaviour analysis of the different curvature edge indicator.

Graduated edge penalty
In order to use the graduated edge penalty, we defined the edge penalty term g (∇ s ) which can be any monotonically decreasing function of edge strength.In this function, the penalty decreases large small large Flat and ramp pixel small small small Noise pixel large large small as the edge strength increases between two groups of pixels assigned to different classes; thus all the edges pixels are not penalized equally.The penalty function is formulated as (Yu and Clausi, 2008): where ∇ s represents the normalized gradient magnitude (∇ s ∈ [0, 1]) as the edge strength measurement on site s.K is a positive value that controls the strength of the edge penalty term.By increasing K to infinity, all the edges penalties are equal to 1; and by gradually increasing K, the edges are penalized differently based on the strength or weakness of the edge gradient magnitude (Yu and Clausi, 2008).In order to calculate ∇ s , Equation ( 25) which is based on the gradient magnitude √ ϑ on site s was used (Yu et al., 2012): where ϑ can be computed using (Lee and Cok, 1991): Each of the variables of ( 26) can be estimated as: In order to compute ϑ , its items can be calculated using: 30) Therefore, p s q s − t s 2 ≥ 0 and p s + q s ≥ λ .Since

EXPERIMENTAL RESULTS AND DISCUSSION
In order to compare the performance of the proposed method with our previous method (Aghighi et al., 2014), the same sets of training and test pixels for each image were selected by the stratified random method (Foody, 2004), which can be found in (Aghighi et al., 2014).In this research, two benchmark hyperspectral datasets and all classes were collected to evaluate the proposed smoothing parameter estimation method.
1) The Indian Pines image: This hyperspectral image was recorded by the AVIRIS sensor from Indian Pines test site which was an agricultural area in North-western Indiana, USA.The image comprises 145 by 145 pixels with 20 m/pixel spatial resolution and 200 spectral bands within a wavelength range of 0.4 to 2.5 m.The reference map contains sixteen classes, namely Alfalfa, Cornnotill, Corn-min, Corn, Grass-pasture, Grass-trees, Grass-pasturemowed, Hay-windrowed, Oats, Soybean-notill, Soybean-mintill, Soybean clean, Wheat, Woods, Buildings-Grass-Trees-Drives and Stone-Steel-Towers.
2) The Pavia University image: The Pavia University hyperspectral image was recorded by ROSIS sensor from Pavia district in north Italy (Figure 2(a)).This dataset comprising of 610 by 340 pixels with spatial resolution of 1.3m and 103 spectral bands.The reference map (Figure 2(b)) contains sixteen classes, namely asphalt, meadows, gravel, trees, painted metal sheets, bare soil, bitumen, self-blocking bricks and shadows.
In order to evaluate the procedure on non-Gaussian data, we also simulated a 210 by 250 pixel test image using fully polarized L-band data from Phased Array type L-band Synthetic Aperture Radar (PALSAR) sensor at the Japanese Advanced Land Observing Satellite (ALOS) (Pantze et al., 2014).The quad polarizations data (HH, HV, VH and VV) was selected to reach the best classification results (Doulgeris et al., 2011).After pre-processing of the data using software package (NEST version 5.0.12)provided by ESA (European Space Agency), some samples of five class, namely water, forest, urban, farm and open field were extracted.Then, they utilized to produce a simulated PALSAR data (Fig. 1(b)), such that each class has at least on boundary to every other classes (Fig. 1(a)) (Doulgeris et al., 2011).The produced data is non-Gaussian distributed due to using high resolution image and highly textured regions (Doulgeris et al., 2011).In this research, the probabilistic one-versus-one SVM classification method was adopted as a nonlinear classifier through the use of Gaussian radial basis function (RBF) kernel for hyperspectral data (Tarabalka et al., 2010), as well as for simulated ALOS PALSAR data (Sambodo and Indriasari, 2014).The optimal SVM parameter C and γ were chosen by fivefold cross validation (Aghighi et al., 2014).The SVMLIB library (Chang and Lin, 2011) was used to estimate the probability of individual classes for each pixel and produce the classification map (see Fig. 1(c) and Fig. 1(c)).Then, the results of SVM classification were used to estimate the smoothing parameter λ * for both hyperspectral dataset and classify the image using MRF (Table 3).The smoothing parameters for simulated PALASAR data were estimated based on the current method and (Aghighi et al., 2014) (Table 4).In order to compare the efficiency of the proposed parameter estimation method with (Aghighi et al., 2014) and (Tarabalka et al., 2010), both hyperspectral datasets were used.Then, the λ * value derived in this paper was employed to manage the contribution of spectral and spatial energies for the both non-edge based MRF method (Fig. 1(d)) and the edge-based MRF (Fig. 1(e)) by using our previous edge probability map (Aghighi et al., 2014) (see Table 5).
The difference between results obtained by applying the proposed method and (Aghighi et al., 2014) are about -0.48 precent and with (Tarabalka et al., 2010) are +2.8 precents.Furthermore, comparisons of the accuracy of maps produced using λ * and those produced by (Tarabalka et al., 2010) show that the overall accuracy of the classification by SVMMRF E for the University of Pavia data have increased by 6.27 percent and 0.1 percent for the Indian Pines data (Table 5).Then we evaluated the statistical significance for each pair of corresponding classification maps in terms of accuracy by using McNemars test with the 5% significance level (Zhang et al., 2011).Due to the calculated χ 2 and Z values, the null hypothesis (H 0 ) of no significant difference between two corresponding maps accuracy values of this paper and (Aghighi et al., 2014) is not rejected for both Indian Pines and University of Pavia images.However, the null hypothesis (H 0 ) of no significant difference between two corresponding maps accuracy values of this paper and (Tarabalka et al., 2010) for the University of Pavia is rejected.
In another experiment for non-Gaussian dataset, we utilized SVMMRF to classify PALSAR simulated image using both estimated smoothing parameters (Table 4).The difference between results derived by λ * and (Aghighi et al., 2014) is about 1.9 percent (Table 6).
We evaluated the statistical significance for both Pines and University of Pavia classified maps in terms of accuracy by using Mc-Nemars test with the 5% significance level (Zhang et al., 2011).
According to the calculated χ 2 and Z values, the null hypothesis (H 0 ) of no significant difference between the classified map accuracy values using λ * and (Aghighi et al., 2014) is not rejected.Moreover, the classification accuracy in percentage for each class of for PALSAR images is presented in Table 7).
As mentioned in Section 2.2, different edge preserving maps are produced using Canny, Sobel, Roberts, Prewitt, Laplacian of Gaussian operators, different curvature indicator and the graduated  (17,18,24).Then, the new edge preserving information was incorporated in the MRF using the spatial energy term (19, 20) to produce the SVMMRF E maps (Table 8).(Zhang et al., 2011) is used to evaluate the statistical significance between SVMMRF E using Canny based edge probability map and other edge preserving method.Due to the calculated χ 2 and Z values, the null hypothesis (H 0 ) of no significant difference between Canny based edge preserving method and others is not rejected for the University of Pavia image.However, the performance of the edge preserving method depends on the datasets, the land cover classes and the size of misclassified regions in the initial pixelwise classification map.

CONCLUSION
In this article, we adress the issue of the automatic smoothing parameter estimation to manage the contribution of the spectral and contextual information in the context of MRF classification.An innovative MRF smoothing parameter estimation method has been proposed, which is not dependent on assumptions of a specific statistical distribution of the image data.This method consists of employing SVM classification to produce the probability of individual classes for each pixel, followed by two new indexes named CLCMC and GCLCMC to estimate the smoothing parameter.In addition, the performance of seven edge preserving techniques using Canny, Sobel, Roberts, Prewitt, Laplacian of Gaussian operators, different curvature indicator and the graduated edge penalty were evaluated.Experimental results have demonstrated that the proposed method can estimate the optimal smoothing parameter for both non-Gaussian and Gaussian distributed data.Furthermore, the edge probability map using Canny filter yields accurate classification maps for both hyperspectral datasets.

∂
y denote the first partial derivatives of the b th univariate band of image Y on site i with respect to vertical and horizontal directions.

Table 2 :
Behaviour analysis of the different curvature edge indicator(Chen et al., 2010)

Table 5 :
Overall accuracy assessment of the classified maps for SVMMRF method with different smoothing parameters and SVM for AVIRIS and ROSIS images

Table 8
(Aghighi et al., 2014)ssification accuracies for all the datasets using seven edge preserving methods.The computed overall accuracies and Kappa coefficients in this research show that SVMMRF E using the edge probability map, in particular the Canny edge method(Aghighi et al., 2014)results in the best overall classification accuracies in these experiments.Then, Mc-Nemars test with the 5% significance level

Table 6 :
Overall accuracy assessment of the classified maps for PALSAR images

Table 7 :
Overall accuracy assessment of the classified maps for PALSAR images Class label SVM (Aghighi et al., 2014) λ *