SUPERPIXEL SEGMENTATION FOR POLSAR IMAGES WITH LOCAL ITERATIVE CLUSTERING AND HETEROGENEOUS STATISTICAL MODEL

Superpixel segmentation has the advantage of preserving target shapes and details well. In this research, an adaptive polarimetric SLIC (Pol-ASLIC) superpixel segmentation method is proposed. First, the spherically invariant random vector (SIRV) product model is adopted to estimate the normalized covariance matrix and texture for each pixel. A new edge detector is then used to extract PolSAR image edges for the initialization of the central seeds. In the local iterative clustering, multiple cues, including polarimetric, texture, and spatial information, are considered to define the similarity measure. Moreover, a polarimetric homogeneity measurement is used to automatically determine the trade-off factor, which can vary from homogeneous areas to heterogeneous areas. Finally, the SLIC superpixel segmentation scheme is applied to airborne Experimental SAR and PiSAR L-band PolSAR data to demonstrate the effectiveness of the proposed segmentation approach. The proposed algorithm produces compact superpixels that adhere well to image boundaries in both natural and urban areas, and the detail information in heterogeneous areas is well preserved.


INTRODUCTION
Object-based segmentation and classification are promising in the remote sensing field and significantly outperform pixel-based image processing (Niu and Ban 2013; Ban and Jacob 2013). Object generation, in which the images are segmented into many homogeneous regions, therefore plays a key role in this kind of processing. A superpixel is defined as a local region that preserves most of the object information and adheres well to the object boundaries (Xiang et al. 2013). To better preserve the polarimetric and statistical characteristics of the images while also overcoming the influence of speckle noise, superpixel generation and segmentation with regular size and shape seem promising for PolSAR data. Numerous superpixel segmentation algorithms have been proposed for optical images; among them, the simple linear iterative clustering (SLIC) method (Achanta et al. 2012) is popular and shows good performance in superpixel generation. In contrast, very few superpixel generation and segmentation approaches have been proposed for SAR and PolSAR images. Xiang et al. (Xiang et al. 2013) developed a superpixel generation algorithm based on pixel intensity and location similarity, which modified the similarity measure of SLIC to make it applicable to SAR images. For PolSAR data, Liu et al. (Liu et al. 2013) incorporated the revised Wishart distance and an edge map into the Normalized cuts algorithm to produce superpixels. On the basis of SLIC, Feng et al. (Feng, Cao, and Pi 2014), Song et al. (Song et al. 2015), and Qin et al. (Fachao, Jiming, and Fengkai 2015) used the symmetric revised Wishart distance, the Bartlett distance, and the revised Wishart distance, respectively, as similarity measures in place of the original one to generate superpixels.
These methods are all designed under the assumption of the Wishart distribution, which describes the backscatter of natural areas well. However, for heterogeneous urban areas, this assumption is usually violated (Wenjin, Huadong, and Xinwu 2015; Soergel 2010), so the superpixels neither adhere well to urban boundaries nor preserve the polarimetric features. Moreover, it is quite difficult to find a particular distribution that describes the backscatter of urban areas, since they are extremely complex (Wenjin, Huadong, and Xinwu 2015). Therefore, superpixel generation and segmentation for PolSAR images, especially in heterogeneous urban areas, remains unsolved. Another drawback of these methods lies in the nonadaptive selection of the trade-off factor, which balances the polarimetric similarity and spatial proximity while also controlling the shape and compactness of the superpixels. This parameter is usually set manually to a constant value by trial and error, which might not be suitable in some areas.
Recent studies show that higher scene heterogeneity leads to non-Gaussian clutter modelling (Vasile et al. 2010). The spherically invariant random vector (SIRV), proposed by Yao (Yao 1973), is a class of non-Gaussian processes with random variance and has already been used in PolSAR data classification (Doulgeris, Anfinsen, and Eltoft 2008) and segmentation (Bombrun et al. 2011). Inspired by SLIC and the SIRV product model, this paper proposes an adaptive superpixel segmentation algorithm (Pol-ASLIC) for PolSAR images. The key points of this approach are the definition of the pixel similarity measure and the adaptive setting of the trade-off factor. The edge map is also essential to the final superpixel generation. The main contributions of this paper are as follows:
1) We use a new edge detector based on the SIRV model to detect PolSAR image edges. The traditional CFAR detector is designed on the basis of the Wishart distribution and cannot detect edges well in complicated scenes. In contrast, our method obtains a better edge map, especially in heterogeneous urban areas.
2) We define an effective similarity measure that contains multiple cues, including polarimetric, texture, and spatial information, for superpixel generation in PolSAR images. Therefore, more local information can be considered.
3) We propose an adaptive trade-off factor to control the shape and compactness of the superpixels. Similar to (Fengkai, Jie, and Deren 2015), the equivalent number of looks (ENL) and the new edge map are combined to produce an effective polarimetric homogeneity measurement, which is then incorporated into the trade-off factor, making it flexible in homogeneous and heterogeneous regions. Thus, the number of parameters in traditional SLIC is reduced to only one, i.e., the superpixel number. Furthermore, there is very little over- or under-segmentation in the final superpixel map compared with other methods.

Parameter Estimation
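In the SIRV product model, the $m$-dimensional complex measurement $\mathbf{k}$ is defined as (Vasile et al. 2010)
$$\mathbf{k} = \sqrt{\tau}\,\mathbf{z} \tag{1}$$
where $\mathbf{z}$ is an independent complex circular Gaussian vector with zero mean and normalized covariance matrix $\mathbf{M} = E\{\mathbf{z}\mathbf{z}^{\dagger}\}$. The superscript $\dagger$ denotes the conjugate transpose operator. $\tau$ is a positive random variable whose PDF is not explicitly specified. For PolSAR data, the normalized covariance matrix $\mathbf{M}$ characterizes polarimetric diversity, while the random variable $\tau$ can be considered as spatial texture, which represents the randomness of spatial variations and only affects the scattering power. The PDF of $\mathbf{k}$ is denoted as (Vasile et al. 2010)
$$p(\mathbf{k}_1,\ldots,\mathbf{k}_N \mid \mathbf{M},\tau_1,\ldots,\tau_N) = \prod_{i=1}^{N}\frac{1}{\pi^{m}\tau_i^{m}|\mathbf{M}|}\exp\left(-\frac{\mathbf{k}_i^{\dagger}\mathbf{M}^{-1}\mathbf{k}_i}{\tau_i}\right) \tag{2}$$
where $N$ represents the number of independent samples used in the estimation. For PolSAR applications, the target vector $\mathbf{k}$ can be formed using the linear basis as
$$\mathbf{k} = [S_{HH},\ \sqrt{2}S_{HV},\ S_{VV}]^{T} \tag{3}$$
where $S_{HH}$, $S_{VV}$, and $S_{HV}$ are the elements of the complex scattering matrix and $T$ is the transpose operator. Therefore, in this paper, $m$ equals 3. For a given $\mathbf{M}$, the texture estimator $\hat{\tau}_i$ can be obtained by maximizing the log-likelihood function of (2) (Vasile et al. 2010):
$$\hat{\tau}_i = \frac{\mathbf{k}_i^{\dagger}\mathbf{M}^{-1}\mathbf{k}_i}{m} \tag{4}$$
Replacing $\tau_i$ in (2) with (4), we obtain the maximum likelihood estimator of the normalized covariance matrix as (Vasile et al. 2010)
$$\hat{\mathbf{C}} = f(\hat{\mathbf{C}}) = \frac{m}{N}\sum_{i=1}^{N}\frac{\mathbf{k}_i\mathbf{k}_i^{\dagger}}{\mathbf{k}_i^{\dagger}\hat{\mathbf{C}}^{-1}\mathbf{k}_i} \tag{5}$$
which can be obtained by a recursive algorithm as
$$\hat{\mathbf{C}}_{i+1} = f(\hat{\mathbf{C}}_i). \tag{6}$$
Note that the convergence of (6) is assured for an arbitrary initialization of $\hat{\mathbf{C}}_i$ (Vasile et al. 2010). Therefore, this algorithm can be initialized with the identity matrix, $\hat{\mathbf{C}}_0 = \mathbf{I}_m$. The span for the SIRV case is given by (7). The texture estimator in (4) can then be linked directly to the total scattering power (span) according to (7). Hence, the maximum likelihood estimator of the span, $\hat{P}$, is defined as in (8). It can be observed that the estimated covariance matrix is independent of the total scattering power and contains only the polarimetric information. Finally, according to (1), the conventional covariance matrix $\mathbf{C}$ can be derived as the product of the estimated scalar texture term and the estimated normalized covariance matrix $\hat{\mathbf{C}}$ (9).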
Regarding the number of samples $N$ used in the estimation, existing studies have shown that the span driven adaptive neighborhood (SDAN) (Vasile et al. 2008) achieves a good trade-off between preserving signal characteristics and collecting a large number of samples. According to the above estimation scheme, the conventional covariance matrix $\mathbf{C}$ of a PolSAR data set is decomposed into two parameters, i.e., the normalized covariance matrix $\hat{\mathbf{C}}$, which contains the polarimetric information, and the span $P$, which contains the scalar texture information. In the following sections, for simplicity, $\mathbf{M}$ and $P$ are represented by their estimates $\hat{\mathbf{C}}$ and $\hat{P}$, respectively. These two parameters are used for PolSAR edge detection and superpixel segmentation.
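The recursion described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes NumPy, takes the neighborhood samples as an `(N, m)` complex array, and uses a fixed iteration count instead of a convergence test. The function name `sirv_fixed_point`, the parameter `n_iter`, and the final trace normalization to $m$ are assumptions made for this sketch.

```python
import numpy as np

def sirv_fixed_point(samples, n_iter=20):
    """Fixed-point estimate of the normalized covariance matrix and textures.

    samples: (N, m) complex array, one target vector k_i per row.
    Returns (c_hat, tau_hat) with c_hat of shape (m, m) and tau_hat of shape (N,).
    """
    k = np.asarray(samples, dtype=complex)
    n, m = k.shape
    c_hat = np.eye(m, dtype=complex)  # initialization with the identity matrix
    for _ in range(n_iter):
        c_inv = np.linalg.inv(c_hat)
        # Quadratic forms q_i = k_i^H C^{-1} k_i (real and positive).
        q = np.einsum("ni,ij,nj->n", k.conj(), c_inv, k).real
        # (m / N) * sum_i k_i k_i^H / q_i
        c_hat = (m / n) * np.einsum("ni,nj->ij", k / q[:, None], k.conj())
    # Assumed convention for this sketch: fix the scale so that trace(c_hat) = m.
    c_hat = c_hat * (m / np.trace(c_hat).real)
    c_inv = np.linalg.inv(c_hat)
    # Texture estimate per sample: k_i^H C^{-1} k_i / m.
    tau_hat = np.einsum("ni,ij,nj->n", k.conj(), c_inv, k).real / m
    return c_hat, tau_hat
```

Because each term $\mathbf{k}_i\mathbf{k}_i^{\dagger}$ is divided by its own quadratic form, rescaling any sample by a positive factor leaves the matrix estimate unchanged, which is why the estimated matrix carries only polarimetric information while the power variation ends up in the texture estimates.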

Distance Measure between Covariance Matrices
Measuring pixel similarity in PolSAR images is essential to our proposed framework. It is used in both the edge map calculation and the superpixel segmentation that follow. This subsection introduces the SIRV distance between two normalized covariance matrices.
Since normalized covariance matrices do not follow the Wishart distribution, the conventional Wishart distance cannot be applied directly. Similar to (Liu et al. 2013), a likelihood ratio test with respect to the texture $\tau$ and the normalized covariance matrix $\hat{\mathbf{C}}$ is adopted to derive the corresponding SIRV distance. The new hypothesis test is
$$H_0: \hat{\mathbf{C}}_i = \hat{\mathbf{C}}_j, \qquad H_1: \hat{\mathbf{C}}_i \neq \hat{\mathbf{C}}_j$$
where $\hat{\mathbf{C}}_i$ and $\hat{\mathbf{C}}_j$ are the center normalized covariance matrices of two regions $\Theta_i$ and $\Theta_j$, respectively. The generalized likelihood ratio then leads to the SIRV distance in (12). It can be observed that this distance measure considers the polarimetric information from neighborhood pixels, which reduces the effect of speckle noise to some extent. However, this distance measure is not symmetric, which makes it unsuitable for superpixel generation. Similar to the definition of the symmetric revised Wishart distance in (Anfinsen, Jenssen, and Eltoft 2007), we define the symmetric SIRV distance as in (13), in which the first term of (12) is removed and the symmetric distance is dominated by the second term, which takes into account the neighboring observed samples. This is beneficial for some distributed targets, such as the complex buildings in heterogeneous urban areas.

SLIC Superpixel Segmentation
The basic idea of SLIC is to perform local k-means clustering, in which image pixels are iteratively assigned to neighboring superpixels with similar gray value and spatial location (Achanta et al. 2012). The procedure includes three steps: 1) initialization of the cluster centers; 2) k-means clustering in a local region; and 3) post-processing.

Let $N_p$ be the total number of pixels and $K$ the desired number of superpixels. Initially, $K$ cluster centers are sampled on a regular grid with uniform step size $S = \sqrt{N_p/K}$. The centers are then moved to the lowest-gradient position in a $3 \times 3$ neighborhood, which avoids centering a superpixel on an edge and reduces the chance of seeding a superpixel with a noisy pixel. Next, in the assignment step, each pixel is assigned to the nearest cluster center whose search region overlaps its location, and the superpixel centers are then updated. For the distance measure $D_{SLIC}$, the pixel spectral information and the spatial location information are combined into a single distance measure as (Achanta et al. 2012)
$$D_{SLIC} = \sqrt{\left(\frac{d_p}{\lambda}\right)^2 + \left(\frac{d_s}{S}\right)^2}, \qquad d_s = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \tag{14}$$
where $d_s$ is the Euclidean distance between two superpixels with center locations at $(x_i, y_i)$ and $(x_j, y_j)$, and $d_p$ represents the pixel spectral distance, which is the Euclidean distance in the three-dimensional CIELAB color space. It should be noted that if $D_{SLIC}$ is simply defined as the five-dimensional Euclidean distance, the clustering behavior becomes inconsistent across superpixel sizes. For example, for large superpixels, spatial distances outweigh color proximity, giving more relative importance to spatial proximity than to color. This produces compact superpixels that do not adhere well to image boundaries. For smaller superpixels, the converse is true. Therefore, it is necessary to normalize color proximity and spatial proximity by their respective maximum distances within a cluster. The maximum spatial distance within a given cluster should be the sampling interval $S$. However, since color distances can vary significantly between clusters, it is not easy to determine the maximum color distance $\lambda$ in (14). This problem can be avoided by modifying (14) into (15), in which a manually set weighting factor $\beta$ controls the shape and compactness of the superpixels. Finally, after the clustering procedure, the pixels in the same cluster may be disjoint in space. Such pixels should be assigned the label of the nearest cluster center using a connected components algorithm.
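As a small worked illustration of the grid step and of the normalized distance described above, the following sketch computes $S$ from the pixel and superpixel counts and combines a spectral distance and a spatial distance after dividing them by $\lambda$ and $S$, respectively. The function names are chosen for this example only, and the spectral distance `d_p` is assumed to be computed elsewhere.

```python
import math

def grid_step(n_pixels, n_superpixels):
    """Uniform sampling interval S = sqrt(N_p / K)."""
    return math.sqrt(n_pixels / n_superpixels)

def spatial_distance(p, q):
    """Euclidean distance between two (x, y) locations."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def slic_distance(d_p, d_s, step, lam):
    """Spectral distance normalized by lam and spatial distance normalized by step."""
    return math.sqrt((d_p / lam) ** 2 + (d_s / step) ** 2)

# Example: a 100 x 100 image with 100 superpixels gives a step of 10 pixels.
step = grid_step(100 * 100, 100)
d_s = spatial_distance((0, 0), (3, 4))
combined = slic_distance(d_p=0.2, d_s=d_s, step=step, lam=1.0)
```

With both terms normalized, changing the superpixel size changes `step` but not the relative weight of the two cues, which is the consistency argument made in the text.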

Edge Detector
In the CFAR edge detection algorithm, a set of filters with different orientations is applied to each pixel of a PolSAR image to calculate the edge map. The filter is displayed in Fig. 1 (a) and is controlled by four parameters, i.e., the length $l_f$, the width $w_f$, the spacing $d_f$ between the two rectangular regions, and the angular increment $\theta_f$ between two orientations. These filters estimate the average covariance matrix within the rectangular window on both sides of the center pixel and then calculate the Wishart distance as a measure of the probability of an edge pixel. The edge strength of each pixel is represented by the maximum distance over the differently oriented filters at this pixel. This method has been used in various PolSAR image applications (Lang et al. 2014; Liu et al. 2013); however, it has two limitations: i) Rectangular window functions are poor 2-D smoothing filters. Strong speckle in PolSAR data degrades the accuracy of the averaged covariance matrix because all pixels are given equal weights; ii) The Wishart distribution is not suitable for heterogeneous urban areas, resulting in incorrect covariance matrix estimation and an incorrect corresponding distance measure.
Filter banks have been proven effective for edge detection since they can extract directional intensity variations. Inspired by this idea, we replace the rectangle-shaped filter with a Gauss-shaped filter to overcome the first limitation of the traditional CFAR edge detector, as shown in Fig. 1 (b). The horizontal Gauss-shaped window function $W$ is defined in (16), where $\sigma_x$ and $\sigma_y$ control the window length and width, respectively. $W$ is the Gauss weight for each pixel, which is used to average the local covariance matrix on both sides of the center pixel. From (16) it can be observed that the pixels near the center pixel have larger weights than other pixels. This accords with the fact that the information at pixels near the center pixel is more important than that at other pixels when deciding whether the center pixel is an edge pixel. At each orientation, the local average center covariance matrix for a PolSAR image can be computed as in (17). After that, we use the SIRV distance given in (13) to calculate the similarity of the center normalized covariance matrices on both sides of the central pixel.
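The effect of Gauss-shaped weighting can be illustrated with a short sketch. It assumes a standard separable 2-D Gaussian centered on the pixel of interest and normalized to sum to one; the exact window of (16), including the two-sided layout around the center pixel, may differ. The names `gaussian_weights` and `weighted_mean` and the half-size parameters are hypothetical.

```python
import math

def gaussian_weights(half_len, half_wid, sigma_x, sigma_y):
    """Normalized Gaussian weights on a (2*half_wid+1) x (2*half_len+1) grid.

    Assumed form: exp(-(x^2 / (2 sigma_x^2) + y^2 / (2 sigma_y^2))), then
    divided by the sum of all weights so that they add up to one.
    """
    raw = [
        [math.exp(-(x * x / (2 * sigma_x ** 2) + y * y / (2 * sigma_y ** 2)))
         for x in range(-half_len, half_len + 1)]
        for y in range(-half_wid, half_wid + 1)
    ]
    total = sum(map(sum, raw))
    return [[v / total for v in row] for row in raw]

def weighted_mean(values, weights):
    """Weighted average of a 2-D grid of scalars (e.g. one covariance entry)."""
    return sum(v * w
               for v_row, w_row in zip(values, weights)
               for v, w in zip(v_row, w_row))

weights = gaussian_weights(half_len=3, half_wid=1, sigma_x=2.0, sigma_y=1.0)
center_weight = weights[1][3]  # the largest weight sits at the window center
```

For covariance matrices, the same weights would be applied entry by entry, so that pixels close to the center contribute more to the local average than pixels near the window border.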

Homogeneity Measurement
The new edge map introduced in the previous subsection is not only beneficial to the initialization of the cluster centers in SLIC, but also helps to analyze the homogeneity of PolSAR images. Inspired by (Lang et al. 2014) [...] Based on (18), the trace moment-based estimator of the ENL is derived as in (19), where $\mathbf{X} = \mathbf{Z}/L$. The detailed derivation of this estimator can be found in (Anfinsen, Doulgeris, and Eltoft 2009).
Since the ENL and the edge map are both related to the target polarimetric information and, more importantly, their values follow opposite trends in homogeneous and heterogeneous regions, combining them can significantly improve the discrimination between homogeneous and heterogeneous areas. According to (Fengkai, Jie, and Deren 2015), the homogeneity measurement can be represented as
$$\mathrm{HoM} = \frac{\mathrm{ENL}}{\mathrm{EDGE}} \tag{20}$$
where EDGE denotes the proposed edge map.
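The ratio of the ENL to the edge strength can be computed pixel by pixel as in the sketch below. The small floor `eps` that guards against division by zero is an assumption added for this example and is not part of the original definition; the function name is likewise hypothetical.

```python
def homogeneity_map(enl, edge, eps=1e-6):
    """Pixel-wise ratio ENL / EDGE for two 2-D grids of equal shape.

    eps is an assumed safeguard against zero edge strength.
    """
    return [
        [e / max(g, eps) for e, g in zip(enl_row, edge_row)]
        for enl_row, edge_row in zip(enl, edge)
    ]

# A homogeneous pixel (high ENL, weak edge) next to a heterogeneous one.
hom = homogeneity_map([[4.0, 1.0]], [[0.5, 2.0]])
```

Because the ENL is high and the edge strength is low in homogeneous regions, while the opposite holds in heterogeneous regions, the ratio separates the two cases more strongly than either quantity alone.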

Similarity Measure with Multiple Cues
This subsection gives the distance measure for PolSAR superpixel generation and segmentation, which considers polarimetric, texture, and spatial information at the same time.
The homogeneity measurement is incorporated into the distance measure, making the trade-off factor adaptive so that it balances the shape and compactness of the superpixels. We first introduce the polarimetric similarity cue and the texture similarity cue, and then give the complete distance measure, which incorporates the homogeneity measurement.
As discussed before, the conventional covariance matrix can be decomposed into two parts, i.e., the normalized covariance matrix $\hat{\mathbf{C}}$, which contains the polarimetric information, and the span $P$, which contains the scalar texture information. Since the distance measure in (13) mainly considers the polarimetric information, we define a texture distance $D_T$ based on the estimated span $\hat{P}$ as
$$D_T = \frac{|\hat{P}(x_i, y_i) - \hat{P}(x_j, y_j)|}{\max(\hat{P})} \tag{21}$$
where $\max(\hat{P})$ denotes the maximum value of $\hat{P}$ and $|\cdot|$ represents the absolute value operator. It should be noted that since the estimated span contains only the scalar texture information without polarimetric information, a simple subtraction can be applied to $\hat{P}$ directly.
In (15), $\beta$ is used as a weighting factor to balance the spectral similarity and spatial proximity. Similarly, in (Song et al. 2015; Fachao, Jiming, and Fengkai 2015; Feng, Cao, and Pi 2014), this parameter is chosen as a constant to balance the polarimetric and spatial similarity. It is usually set manually by trial and error, which might cause over- or under-segmentation in some spatially complicated areas. In this paper, the parameter is set adaptively according to the local spatial complexity of the scene, which can be defined as
$$\beta_{adp} = \beta \cdot \frac{1}{2}\left[\mathrm{HoM}(x_i, y_i) + \mathrm{HoM}(x_j, y_j)\right]. \tag{22}$$
It can be seen that this adaptive parameter considers the homogeneity measurement of the two compared pixels. The complete adaptive distance measure for superpixel generation is then defined as in (23). It is worth noting that since $D_T$ is already normalized, there is no need to set another trade-off factor; $\beta_{adp}$ balances the spatial proximity against the other two similarity measures. Specifically, in homogeneous areas, where $\beta_{adp}$ is high and there is little edge information, the spatial proximity outweighs the other two similarity measures, leading to compact superpixels. In contrast, in heterogeneous areas, $\beta_{adp}$ is low and suppresses the spatial proximity, so the superpixels are generated mainly on the basis of polarimetric and texture information and adhere well to image boundaries. In (22), since the homogeneity measurement is quite large in homogeneous areas and low in heterogeneous areas, $\beta$ can be set to 1 in all cases. Thus, the number of parameters in the original SLIC algorithm is reduced from two to only one, i.e., the number of superpixels, making the proposed method easy to use.
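The two scalar ingredients of the adaptive distance can be sketched as follows. The texture distance is the absolute span difference divided by the maximum span, as stated in the text. The adaptive factor is written here as $\beta$ times the mean homogeneity of the two compared pixels, which is an assumption about the exact form of (22); the way both enter the complete distance (23) is not reproduced. Function names are hypothetical.

```python
def texture_distance(span_i, span_j, span_max):
    """Normalized absolute difference of two estimated span values."""
    return abs(span_i - span_j) / span_max

def adaptive_beta(hom_i, hom_j, beta=1.0):
    """Assumed form: beta times the mean homogeneity of the two pixels."""
    return beta * 0.5 * (hom_i + hom_j)

# Homogeneous pair: large factor, so spatial proximity dominates.
beta_field = adaptive_beta(8.0, 6.0)
# Heterogeneous pair: small factor, so polarimetric and texture cues dominate.
beta_urban = adaptive_beta(0.5, 0.3)
d_t = texture_distance(2.0, 5.0, 10.0)
```

Since `texture_distance` is bounded by one, it needs no weight of its own, and the only remaining user-set parameter is the number of superpixels.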
The implementation procedure of Pol-ASLIC is similar to that of SLIC except for the steps before the local iterative clustering. It covers the following steps.
Input: original PolSAR image
1) Normalized covariance matrix and span estimation.
2) Edge map calculation based on the normalized covariance matrix.
3) ENL estimation using (19).
4) Homogeneity measurement calculation using (20).
5) Set the number of superpixels K and initialize the cluster centers.
6) Local iterative clustering with the adaptive distance calculated using (23).
7) Post-processing to eliminate the disjoint pixels.
Output: superpixel map.
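Step 6 follows the usual SLIC pattern of a window-limited assignment followed by a center update. The sketch below shows one such pass in plain Python. The `distance` callable is a placeholder for the adaptive distance of (23), which is not reproduced here, and the function names and the square search window of half-size `step` are assumptions of this example.

```python
def assign_labels(height, width, centers, step, distance):
    """One local assignment pass.

    centers:  list of (row, col) cluster centers.
    distance: callable (pixel, center_index) -> float, a stand-in for the
              adaptive distance used by the method.
    Each center only competes for pixels inside its own search window.
    """
    best = [[float("inf")] * width for _ in range(height)]
    labels = [[-1] * width for _ in range(height)]
    r = int(round(step))
    for idx, (cy, cx) in enumerate(centers):
        cy, cx = int(round(cy)), int(round(cx))
        for y in range(max(0, cy - r), min(height, cy + r + 1)):
            for x in range(max(0, cx - r), min(width, cx + r + 1)):
                d = distance((y, x), idx)
                if d < best[y][x]:
                    best[y][x] = d
                    labels[y][x] = idx
    return labels

def update_centers(labels, centers):
    """Move each center to the mean position of its pixels (kept if empty)."""
    sums = [[0.0, 0.0, 0] for _ in centers]
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            if lab >= 0:
                sums[lab][0] += y
                sums[lab][1] += x
                sums[lab][2] += 1
    return [
        (sy / n, sx / n) if n else old
        for (sy, sx, n), old in zip(sums, centers)
    ]
```

In a full implementation the two functions would alternate until the centers stop moving, with the feature part of each center (normalized covariance matrix and span) updated alongside its position.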

Dataset Description
Here we choose one PolSAR dataset to demonstrate the effectiveness of our proposed method through visual presentation and quantitative evaluation and comparison. This dataset, shown in Fig. 2 (a), was acquired by the ESAR L-band system over a study area located in Oberpfaffenhofen, Germany. The number of looks is four. Fig. 2 (a) contains many man-made buildings and is a heterogeneous urban area. For the visual presentation and quantitative evaluation, manual segmentations are used as the ground truth. We select one area from the Pauli image and depict the ground truth segmentations as yellow lines in Fig. 2 (b).
In our experiment, two superpixel segmentation methods for PolSAR data, i.e., Qin's method in (Fachao, Jiming, and Fengkai 2015) and Liu's method in (Liu et al. 2013), are used for comparison. The former is a modified version of SLIC and the latter is based on Normalized cuts. In both compared approaches, the edge maps produced by the traditional CFAR edge detector are incorporated into the superpixel segmentation.

Comparison of the Superpixel Segmentation Results
In this subsection, we compare the superpixel segmentation results of the three methods by visual assessment and quantitative evaluation. Fig. 3 gives the superpixel segmentation results of the ESAR dataset using Liu's, Qin's, and our proposed methods, respectively. The number of superpixels is set to 2200 in all cases. To compare the results in detail, two areas marked with yellow rectangles A and B are selected from Fig. 3 (c)-(f) and are shown in Fig. 4. Area A mainly includes buildings, while Area B covers natural targets as well as some man-made targets. Fig. 3 (a), (c), and (e) present the final superpixel maps of the three methods, where the red lines superimposed onto the Pauli images depict the superpixel boundaries. Fig. 3 (b), (d), and (f) give the corresponding representation maps, in which the coherency matrix of each pixel is replaced by the average coherency matrix of the superpixel this pixel belongs to.
From Fig. 3 (a) and (b), we can see that the edges of the superpixels produced by Liu's method are very smooth and their shapes are quite regular. In natural areas, the results are acceptable. However, in urban areas, these superpixels do not adhere well to image edges, the points and lines in the image are not preserved, and most of the urban information is lost. There are two reasons for this result. On the one hand, the edge map calculated by the traditional CFAR edge detector does not work well in heterogeneous urban areas. On the other hand, unlike SLIC, the Normalized cuts algorithm does not consider the local pixel information. Compared with Fig. 3 (a), the result in Fig. 3 (c) seems much better, with most of the edges and points preserved. The superpixels adhere well to image boundaries and capture the local information. However, the shapes of the superpixels are very irregular and the edges are not smooth, even in homogeneous natural areas, as shown in Fig. 4 (c) and (g). In this method, to preserve the edges and points well, the trade-off factor that balances the polarimetric similarity and spatial proximity is set to 1.0. Therefore, the polarimetric similarity outweighs the spatial proximity, leading to irregular superpixels.
The results in Fig. 3 (e) and (f) indicate that our proposed algorithm generates promising superpixels for PolSAR images. The target points and edges are preserved very well. Moreover, the compactness of the superpixels is adaptive. In homogeneous areas, the edges of the superpixels are very smooth and the superpixel shapes are quite regular, as can be seen in Fig. 4 (d) and (h). This is because in such areas the homogeneity measurement is high, making the spatial proximity outweigh the other two similarities, so the superpixels are compact and regular. In contrast, within heterogeneous areas, the homogeneity measurement is low. To preserve the detailed information, spatial proximity is no longer as important as the polarimetric and texture similarities. Therefore, the superpixels have irregular shapes and preserve the image edges and points well. From Fig. 4, we can also see that in heterogeneous areas our method achieves better results than Qin's method: the building edges are clearer and the man-made targets are better extracted. This is because our new edge detector based on the SIRV product model detects more accurate edges. In addition, the proposed distance measure considers more local information for superpixel generation, such as the span information. Compared with the other two methods, the proposed method provides smoother approximations in homogeneous areas and keeps better details in heterogeneous areas.

Quantitative Evaluation
To perform a quantitative comparison of the different methods, we adopt two commonly used evaluation metrics, i.e., boundary recall (BR) and achievable segmentation accuracy (ASA). BR is defined as the fraction of ground truth boundaries correctly recovered by the superpixel edges. If a true boundary pixel falls within 2 pixels of at least one superpixel edge, it is regarded as correctly recovered. Therefore, a high BR indicates that the superpixels adhere well to image edges and very few true boundaries are missed. ASA is defined as the highest achievable accuracy of object segmentation when the superpixels are taken as units. By labeling each superpixel with the ground truth segment of the largest overlapping area, ASA is obtained as the fraction of labeled pixels that do not leak across the ground truth boundaries. Thus, a high ASA means that the superpixels comply well with the objects in the PolSAR image. These two indicators are used to evaluate the final superpixel maps.
Fig. 5 and Fig. 6 depict the BR and ASA of the three methods for different numbers of superpixels, respectively. This number is varied from 250 to 2500 with different step sizes. According to these two figures, Liu's method performs the worst in terms of both boundary adherence and achievable segmentation accuracy. Another drawback is its extremely low time efficiency. Qin's method and our proposed method have similar BR values when the superpixel number does not exceed 500. However, as this value increases, our method shows better boundary adherence than Qin's method. In Fig. 6, these two methods have similar results, but our method still performs slightly better than Qin's approach. It is worth mentioning that the number of superpixels in these methods is set to 2200. Although increasing this value achieves better results for all three methods, it is not advisable to produce too many superpixels for PolSAR data. The main advantage of superpixel generation is that superpixels can substantially speed up subsequent processing, since the number of superpixels in an image is significantly lower than the number of pixels. Therefore, the goal of our approach is to obtain better segments for PolSAR data with a limited number of superpixels.
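Both metrics are straightforward to compute from label maps. The sketch below is one possible reading of the definitions above: boundary pixels are given as sets of `(row, col)` tuples, "within 2 pixels" is interpreted as a square (Chebyshev) neighborhood, and ASA is the sum over superpixels of their largest overlap with any ground truth segment, divided by the total pixel count. These interpretations and the function names are assumptions of this example.

```python
from collections import Counter, defaultdict

def boundary_recall(gt_boundary, sp_boundary, tol=2):
    """Fraction of ground truth boundary pixels with a superpixel boundary
    pixel inside a (2*tol+1) x (2*tol+1) neighborhood (assumed shape)."""
    sp = set(sp_boundary)
    gt = list(gt_boundary)
    offsets = range(-tol, tol + 1)
    hits = sum(
        any((y + dy, x + dx) in sp for dy in offsets for dx in offsets)
        for y, x in gt
    )
    return hits / len(gt)

def achievable_segmentation_accuracy(sp_labels, gt_labels):
    """Label each superpixel with its best-overlapping ground truth segment
    and return the fraction of pixels that end up correctly labeled."""
    overlap = defaultdict(Counter)
    total = 0
    for sp_row, gt_row in zip(sp_labels, gt_labels):
        for s, g in zip(sp_row, gt_row):
            overlap[s][g] += 1
            total += 1
    return sum(max(c.values()) for c in overlap.values()) / total

# Two superpixels of four pixels each; the second straddles a true boundary.
sp = [[0, 0, 1, 1], [0, 0, 1, 1]]
gt = [[0, 0, 0, 1], [0, 0, 0, 1]]
asa = achievable_segmentation_accuracy(sp, gt)
```

In the toy example the first superpixel lies entirely in one segment, while the second is split evenly, so two of the eight pixels leak across the true boundary.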

CONCLUSION
This paper proposes an adaptive superpixel segmentation method for PolSAR images. The whole framework is designed on the basis of local iterative clustering and a heterogeneous statistical model, and has three main contributions, i.e., edge map calculation, homogeneity measurement evaluation, and adaptive distance measure definition. The Gauss-shaped filter and the SIRV model are used to improve the traditional CFAR edge detector for PolSAR images, so that the edges of urban areas are effectively extracted and accurately located. Multiple cues are combined in the distance measure for superpixel segmentation, making the clustering of pixels more effective since more local information is considered. Moreover, this distance measure balances the different similarities adaptively according to the homogeneity measurement. The performance of the proposed method is demonstrated on one PolSAR dataset from the ESAR platform, with visual and quantitative evaluation. The superpixel generation and segmentation results show that the proposed method performs better and balances over-segmentation and under-segmentation more effectively than the other methods.

Fig. 1. Filter configuration. (a) Rectangle-shaped filter. (b) Gauss-shaped filter.

Fig. 2. ESAR L-band image from the Oberpfaffenhofen area in Germany. (a) Pauli RGB image. (b) The yellow lines are superimposed onto the Pauli RGB image, depicting the ground truth segments.

Fig. 3. Superpixel generation results of Liu's, Qin's, and our proposed approaches with K = 2200 for the ESAR image. The first column denotes the final superpixel maps of the different methods. The red lines superimposed onto the Pauli images depict the superpixel boundaries. The second column gives the representation maps, where the coherency matrix of each pixel is replaced by the average value of the superpixel this pixel belongs to.

Fig. 4. Comparison of detailed superpixel generation results in areas A and B. The first row denotes the final superpixel maps. The green lines superimposed onto the Pauli images depict the superpixel boundaries. The second row gives the corresponding representation maps. (a) and (b) are the results of area A in Fig. 3 (c) and (e), respectively. (c) and (d) are the results of area B in Fig. 3 (c) and (e), respectively.