UNSUPERVISED SAR CHANGE DETECTION METHOD BASED ON REFINED SAMPLE SELECTION

In deep learning based synthetic aperture radar (SAR) change detection, selecting samples of high quality is a crucial step. In this work, we propose a refined sample selection algorithm for unsupervised SAR change detection. The proposal and incorporation of volume control factors and a multi-hierarchical fuzzy c-means (MH-FCM) algorithm generate samples of large diversity and high confidence, thus satisfying the need for high quality samples. The method includes two phases. Firstly, an enhanced difference image is constructed according to the difference consistency between single pixels and their neighbourhoods, a triangular threshold segmentation method is proposed to determine the volume control factors for sample selection, and MH-FCM is developed to classify the log mean ratio difference image into 4 classes. Secondly, a dual-channel convolutional neural network with an adaptive weighted loss is adopted to learn and predict the input and to obtain the change detection result. Experimental results on the Gaofen-3 dataset in Beijing validate the effectiveness of the proposed method.


INTRODUCTION
Synthetic aperture radar (SAR) is not influenced by insufficient light or climatic conditions, making it the optimal acquisition method for certain scenes in remote sensing (Zhang et al., 2016). As one of the most representative research topics in SAR, SAR change detection has been applied in forest monitoring (Marshak et al., 2019; Pantze et al., 2014), crop monitoring (Khabbazan et al., 2019; Teimouri et al., 2019), urbanization research (Ban and Yousif, 2012; Hu and Ban, 2014), and especially disaster detection, e.g. forest fire detection (Wei et al., 2018; Zhou et al., 2019) and flood detection (Lu et al., 2014; Schlaffer et al., 2015).
The unique imaging mode of SAR makes manual interpretation difficult; thus unsupervised methods, either traditional or deep learning based, have become the mainstream in SAR change detection. Bruzzone and Prieto (2002) summarised the traditional unsupervised SAR change detection method into a classic paradigm: image preprocessing, difference image (DI) construction, and DI analysis. Of these, generating and analysing the DI are the primary research directions. Unsupervised deep learning methods, in turn, usually proceed in two steps: selecting samples in an unsupervised way and constructing deep models for learning and prediction.
As a crucial step in traditional change detection algorithms, DI generation is closely related to the quality of detection, and the logarithmic ratio (LR) (Dekker, 1998) is the most widely used operator. The likelihood ratio algorithm (Xiong et al., 2012) utilises the statistical characteristics of the pixel neighbourhood to construct the likelihood ratio and to reduce the noise effect caused by isolated pixels. Besides, Zhang et al. proposed to use the shearlet transform and Gaussian LR to build differential expressions for saliency detection and Gabor feature extraction.
* Corresponding author: bincui@njupt.edu.cn.
Commonly used DI segmentation methods include thresholding and clustering methods. Thresholding methods are widely used for their simplicity and efficiency. Typical examples include the generalised Gaussian model adaptive minimum error algorithm (Bazi et al., 2005) and the Gaussian model expectation maximization algorithm (Bazi et al., 2007). Commonly adopted clustering methods include the fuzzy local information C-means (Krinidis and Chatzis, 2010) and the Markov random field fuzzy C-means (FCM) (Gong et al., 2014).
Compared to traditional methods, deep learning based SAR change detection possesses better learning and feature extraction abilities. To reduce human interference, several unsupervised methods have been proposed. Gong et al. (2016) used an FCM-based joint pre-classification method to generate labels, and then selected samples and performed feature learning through deep models to obtain predictions. A dual-channel convolutional neural network (DCCNN) has also been developed for change detection. Gao et al. (2017) utilised hierarchical FCM (H-FCM) segmentation to determine the changed, unchanged, and intermediate samples. The selected unchanged and changed samples were then used to train the network for predicting intermediate pixels. Furthermore, various deep networks have also been used in change detection, such as self-step learning (Shang et al., 2018), Gabor principal component analysis net (GaborPCANet), PCANet, and stacked autoencoders.
There are two main issues in sample selection for unsupervised SAR change detection. Firstly, algorithms may produce unstable results when analysing the DI to construct sample data, causing failures in high-confidence sample selection and in subsequent model learning. Secondly, unchanged areas often occupy most of the image, leading to imbalanced sample classes. To solve these issues, this paper presents an unsupervised SAR change detection method based on refined sample selection (flowchart shown in Figure 1). Based on volume control factors and hierarchical clustering, a refined sample selection method is developed to avoid instability and imbalance in sample selection and to produce high quality samples. A DCCNN with an adaptive weighted loss (AWL) is then trained to detect changes, further balancing the contributions of the sample classes.

Refined sample selection
To limit the number of pixels in each class, a volume control factor is proposed for refined sample selection. Firstly, we propose an enhanced DI generation method that combines the difference consistency of single pixels and their neighbourhoods. According to the statistical characteristics of the DI histogram, a triangular threshold segmentation method is then proposed to calculate the volume control factor.

Enhanced difference image:
The enhanced detector is constructed from LR and the logarithmic likelihood ratio (LLR), which reflect the single-pixel and neighbourhood differences, respectively. Assuming that the two images are $I_1$ and $I_2$, the LR difference $D_{LR}$ is computed at each pixel $(i, j)$, while the LLR difference $D_{LLR}$ is computed over $\Omega_{i,j}$, the neighbourhood of $(i, j)$, with $I_{1,2}(m, n)$ denoting the pixel intensities in $\Omega_{i,j}$. Given the consistency of the two measures, the enhanced DI is constructed by multiplying $D_{LR}$ and $D_{LLR}$.
Figures 2(a), (b) and (c) display the histograms of LR, LLR and the enhanced DI, respectively. Unlike the other two methods, in Figure 2(c) the unchanged grey levels are much closer to 0, and the changed area is loosely distributed along the long tail. Thus, the homogeneous pixels are more concentrated, which helps to ascertain the volume of changed pixels more accurately.
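The product construction can be illustrated with a minimal sketch. It assumes the per-pixel term is the plain log ratio $\log(I_2/I_1)$ and, since the exact LLR expression is not reproduced here, uses the log ratio of neighbourhood means as a stand-in for the neighbourhood term. The function names, the `radius` parameter and the `eps` guard are hypothetical.

```python
import math

def log_ratio(i1, i2, eps=1e-6):
    """Per-pixel log ratio log(I2 / I1); eps guards against zero intensities."""
    return [[math.log((b + eps) / (a + eps)) for a, b in zip(r1, r2)]
            for r1, r2 in zip(i1, i2)]

def local_mean(img, radius=1):
    """Mean over a (2*radius+1)^2 window, clipped at the image border."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [img[m][n]
                    for m in range(max(0, i - radius), min(h, i + radius + 1))
                    for n in range(max(0, j - radius), min(w, j + radius + 1))]
            out[i][j] = sum(vals) / len(vals)
    return out

def enhanced_di(i1, i2, radius=1):
    """Multiply the single-pixel term by the neighbourhood term.

    Assumption: the neighbourhood term is the log ratio of local means,
    used here only as a stand-in for the LLR of the text.
    """
    d_pixel = log_ratio(i1, i2)
    d_neigh = log_ratio(local_mean(i1, radius), local_mean(i2, radius))
    return [[a * b for a, b in zip(r1, r2)] for r1, r2 in zip(d_pixel, d_neigh)]
```

With this stand-in, a pixel receives a large value only when both its own ratio and that of its neighbourhood indicate a difference, which is the consistency idea behind the product.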

Triangular threshold segmentation:
Exploiting the characteristic shape of the DI histogram, a triangular threshold segmentation method based on the Douglas-Peucker method (Douglas and Peucker, 1973) is proposed, and its result is further used to derive the volume control factors of the different classes.
The threshold on the left side of the histogram peak is obtained using similar triangles, where $L_{max}H_{max}$ is the maximum frequency, $L_1L_{max}$ represents the grey level corresponding to $L_{max}H_{max}$, $L_1L_n$ is the grey level of $L_n$, and $L_nH_n$ denotes the frequency corresponding to $L_1L_n$. In this case, $L_n$ becomes the segmentation threshold.
Similarly, a grey level $L_m$ can be derived as the threshold on the right side of the histogram peak. Therefore, pixels with grey levels between $L_n$ and $L_m$ are unchanged, and pixels with grey levels greater than $L_m$ or less than $L_n$ are changed pixels.
Assuming $Num$ is the number of pixels in the changed area, the value of $Num$ is used to calculate the volume control factor, and the thresholds that limit the numbers of changed and intermediate pixels are set as $T_{low}$ and $T_{high}$, respectively.
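As an illustration, the sketch below implements the classic triangle thresholding rule, which shares its maximum-distance criterion with the Douglas-Peucker method: the threshold is the bin farthest from the line joining the histogram peak to the end of one side. It is an assumption that this matches the proposed variant in detail. The function names are hypothetical, and the count of changed pixels follows the rule stated above (grey levels below $L_n$ or above $L_m$).

```python
import math

def triangle_threshold(hist, side="right"):
    """Return the bin farthest from the line joining the histogram peak
    to the far end of the chosen side (classic triangle method)."""
    peak = max(range(len(hist)), key=lambda k: hist[k])
    end = len(hist) - 1 if side == "right" else 0
    if end == peak:
        return peak
    # Line through (peak, hist[peak]) and (end, hist[end]): a*x + b*y + c = 0
    a = hist[end] - hist[peak]
    b = peak - end
    c = end * hist[peak] - peak * hist[end]
    norm = math.hypot(a, b)
    lo, hi = sorted((peak, end))
    best, best_dist = peak, -1.0
    for k in range(lo, hi + 1):
        dist = abs(a * k + b * hist[k] + c) / norm
        if dist > best_dist:
            best, best_dist = k, dist
    return best

def changed_volume(hist, l_n, l_m):
    """Num: pixels with grey level below l_n or above l_m."""
    return sum(hist[:l_n]) + sum(hist[l_m + 1:])

hist = [0, 2, 10, 2, 1, 0, 0]
l_n = triangle_threshold(hist, "left")
l_m = triangle_threshold(hist, "right")
num = changed_volume(hist, l_n, l_m)
```

In this sketch the bins are histogram indices; mapping them back to grey levels of the enhanced DI depends on how the histogram was binned.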

LMR difference image:
The segmentation result is not used to accurately obtain the change area but to evaluate and limit the number of pixels in different classes. Although the enhanced DI has better threshold separability, the DI itself is not suitable for clustering analysis because of the large degree of difference between the potential classes. Thus, a multi-hierarchical FCM (MH-FCM) based on the log mean ratio (LMR) is proposed for sample selection. The LMR operator is computed from the logarithm of the ratio between the local mean intensities of the two images.
Moreover, most of the unchanged regions are easy to classify, leading to imbalanced sample classes and model learning. If the FCM binary segmentation result of the DI is directly used to determine the upper limit of the number of changed pixels (CPs), many falsely detected pixels will appear and decrease the sample quality. Therefore, an MH-FCM algorithm is developed by introducing $Num$ and the thresholds $T_{low}$ and $T_{high}$ (derived from section 2.1.2) as volume control factors to control the pixel volume in different classes. Pixels in the DI are classified into 4 groups: CPs, intermediate pixels (IPs), unchanged pixels (UPs), and high-confidence unchanged pixels (HUPs), which reduces the effect of UPs that are easily distinguished and meanwhile balances the sample set. The process of the MH-FCM algorithm is as follows:
(1) The number of CPs is bounded by an upper limit derived from $Num$ and $T_{low}$, where $T_{low}$ is set to 0.8.
(2) The DI is clustered by FCM, the classes are ordered by their cluster centres, and the numbers of pixels in classes 1, 2, ..., 6 are counted.
(3) The number of CPs is then determined under this limit, where the upper limit of the number of IPs is governed by $T_{high}$, which is set to 1.2. The number of classes assigned to IPs is $t_i$.
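The clustering and volume-limited assignment can be sketched as follows. This is a simplified illustration under stated assumptions, not the MH-FCM algorithm itself: it runs a standard one-dimensional FCM on DI values, assumes that larger DI values are more likely to be changed, and walks the classes from the highest centre downwards, labelling them CP, then IP, then UP as the cumulative pixel count passes two limits. The names `fcm_1d`, `assign_by_volume`, `cp_limit` and `ip_limit` are hypothetical, and the separation of HUPs is omitted.

```python
def fcm_1d(values, c, m=2.0, iters=100, tol=1e-6):
    """Standard fuzzy c-means on scalar values; returns (centers, hard labels)."""
    lo, hi = min(values), max(values)
    # Deterministic initialisation: centres spread evenly over the data range
    centers = [lo + (hi - lo) * (k + 0.5) / c for k in range(c)]
    power = -2.0 / (m - 1.0)
    for _ in range(iters):
        memberships = []
        for x in values:
            dist = [abs(x - v) for v in centers]
            if min(dist) == 0.0:
                # Point coincides with a centre: full membership there
                hits = [1.0 if d == 0.0 else 0.0 for d in dist]
                row = [h / sum(hits) for h in hits]
            else:
                inv = [d ** power for d in dist]
                row = [w / sum(inv) for w in inv]
            memberships.append(row)
        new_centers = []
        for k in range(c):
            weights = [row[k] ** m for row in memberships]
            total = sum(weights)
            new_centers.append(
                sum(w * x for w, x in zip(weights, values)) / total
                if total > 0 else centers[k])
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:
            break
    # Highest membership corresponds to the nearest centre
    labels = [min(range(c), key=lambda k: abs(x - centers[k])) for x in values]
    return centers, labels

def assign_by_volume(labels, centers, cp_limit, ip_limit):
    """Assumed rule: accumulate classes from the highest centre down and
    label them CP while the running count stays within cp_limit, IP while
    it stays within ip_limit, and UP afterwards."""
    order = sorted(range(len(centers)), key=lambda k: centers[k], reverse=True)
    counts = [labels.count(k) for k in range(len(centers))]
    role, running = {}, 0
    for k in order:
        running += counts[k]
        if running <= cp_limit:
            role[k] = "CP"
        elif running <= ip_limit:
            role[k] = "IP"
        else:
            role[k] = "UP"
    return [role[k] for k in labels]
```

One plausible reading of the volume control factors is `cp_limit = 0.8 * Num` and `ip_limit = 1.2 * Num`; this pairing of $T_{low}$ and $T_{high}$ with $Num$ is an assumption of the sketch.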

DCCNN
In this section, the input is constructed, and a DCCNN with AWL for calculating the difference between the corresponding input patches is utilised.

Data construction:
Dual-channel patches are retrieved as blocks centered at pixels of the different classes in the segmented image. Overlapping patches of size $n_1 \times n_1$ are collected using a sliding window with a step size of 1, and dual-channel images are formed by concatenating, across channels, the two patches extracted at the same position of the multitemporal images. The number of patches therefore equals the number of pixels in the original image. Thus, the relationship between the dual-channel patch and the class in the sample selection result is established, and the dual-channel patch is used as the input data. Patches centered at CPs and UPs are used as training samples, and patches centered at IPs are used as test data. As the class attribute of patches centered at HUPs is easily determined and offers weak learnability, these patches are ignored in network training. It is noted that a large gap between the numbers of samples in the different training classes may still remain.
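The data construction can be sketched as below. The sketch assumes an odd patch size and zero padding at the image border (the padding scheme is not stated in the text), and the class map is given as strings "CP", "UP", "IP" and "HUP". All function names are hypothetical.

```python
def dual_channel_patch(img1, img2, i, j, n):
    """Stack the n x n patches centred at (i, j) in both images.

    Assumption: n is odd and pixels outside the image are zero-padded.
    """
    r = n // 2

    def crop(img):
        h, w = len(img), len(img[0])
        return [[img[m][k] if 0 <= m < h and 0 <= k < w else 0.0
                 for k in range(j - r, j + r + 1)]
                for m in range(i - r, i + r + 1)]

    return [crop(img1), crop(img2)]

def build_sets(img1, img2, classes, n):
    """Split patches into training samples (CP/UP) and test data (IP);
    patches centred at HUPs are skipped."""
    train, test = [], []
    for i, row in enumerate(classes):
        for j, cls in enumerate(row):
            if cls == "HUP":
                continue
            patch = dual_channel_patch(img1, img2, i, j, n)
            if cls == "IP":
                test.append(((i, j), patch))
            else:
                train.append((patch, 1 if cls == "CP" else 0))
    return train, test
```

Keeping the pixel coordinates with each test patch allows the predicted labels of IPs to be written back into the final change map.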

Network architecture and loss function:
Taking the input size into account, a two-layer DCCNN is utilised to evaluate the degree of difference between patches. Figure 4 displays the network architecture, in which 3×3 and 2×2 kernels are used in the convolutional and pooling layers, respectively. Although patches centered at HUPs are discarded, the imbalance between samples in the two classes still affects the performance of the proposed method. In deep networks, each sample is generally given the same weight by default; therefore, the model automatically tilts towards the majority class and ignores the minority class if there is a huge gap between them. In this paper, the AWL is applied, allocating larger weights to changed samples. If the numbers of unchanged and changed samples are $Num_{\Omega_u}$ and $Num_{\Omega_c}$, respectively, the weight factor is determined from these two counts.
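A class-weighted loss of this kind can be sketched as follows. The weighting used here is an assumption for illustration, not the weight factor of this paper: each class is weighted by the share of the other class, so that the minority (changed) class receives the larger weight, and the weights enter a standard binary cross-entropy. The function names are hypothetical.

```python
import math

def class_weights(num_unchanged, num_changed):
    """Assumed inverse-frequency weights: each class is weighted by the
    share of the other class, so the rarer class gets the larger weight."""
    total = num_unchanged + num_changed
    return {"unchanged": num_changed / total, "changed": num_unchanged / total}

def weighted_bce(probs, labels, w_changed, w_unchanged, eps=1e-12):
    """Mean weighted binary cross-entropy; label 1 = changed, 0 = unchanged."""
    loss = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        loss -= w_changed * y * math.log(p) + w_unchanged * (1 - y) * math.log(1.0 - p)
    return loss / len(probs)

w = class_weights(num_unchanged=90, num_changed=10)
loss = weighted_bce([0.8, 0.3, 0.1], [1, 0, 0], w["changed"], w["unchanged"])
```

Under such a weighting, misclassifying a changed sample costs more than misclassifying an unchanged one, which counteracts the tilt towards the majority class.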

Datasets
The original Gaofen-3 SAR images of Beijing, approximately 13,000×22,000 pixels in size, were acquired in April 2017 and May 2018, respectively. The seasons of the two acquisition dates are similar, so vegetation changes have little effect on the backscattering. Several pre-processing steps, such as calibration, registration, geocoding, and cropping, were completed. After subsequent 2×3 multi-look processing, the azimuth and range resolutions are approximately 7.8 m. Owing to the considerably large size of the entire image, three representative sub-regions were selected for evaluation (Figure 5).

Experimental design
To validate the effectiveness of the algorithm, we compared the proposed method with several state-of-the-art algorithms, including the FCM algorithm based on LR (LR-FCM), the extreme learning machine based on neighbourhood ratio (NR-ELM), and GaborPCANet. Moreover, to verify the advantage of the AWL, results obtained with a non-adaptive weighted loss (NAWL) are also presented as a baseline.
For comprehensive analysis, several evaluation criteria were applied to assess the detection accuracy: (1) false positives (FP), the number of UPs erroneously identified as CPs; (2) false negatives (FN), the number of CPs incorrectly rejected; (3) overall error (OE), the number of wrongly classified pixels, i.e. the sum of FP and FN; (4) percentage correct classification (PCC), the ratio between the number of correctly detected pixels and the total number of pixels; and (5) the Kappa coefficient (KC).
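These criteria can be computed from a predicted and a reference change map as in the following sketch, which uses the standard definition of the Kappa coefficient based on observed and expected agreement. The function name and the flat 0/1 input format are assumptions made for illustration.

```python
def change_metrics(pred, ref):
    """FP, FN, OE, PCC and Kappa for flat binary maps (1 = changed)."""
    tp = fp = fn = tn = 0
    for p, r in zip(pred, ref):
        if p == 1 and r == 1:
            tp += 1
        elif p == 1 and r == 0:
            fp += 1
        elif p == 0 and r == 1:
            fn += 1
        else:
            tn += 1
    n = tp + fp + fn + tn
    pcc = (tp + tn) / n
    # Expected agreement by chance, from the marginal totals
    pre = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (pcc - pre) / (1.0 - pre)
    return {"FP": fp, "FN": fn, "OE": fp + fn, "PCC": pcc, "KC": kappa}

scores = change_metrics([1, 1, 0, 0, 0, 0, 1, 0, 0, 0],
                        [1, 0, 0, 0, 0, 0, 1, 1, 0, 0])
```

Because unchanged pixels dominate SAR scenes, PCC can remain high even when many changed pixels are missed, which is why KC is reported alongside it.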

Yamenkou dataset:
Results on the Yamenkou dataset are shown in Figure 9, with the quantitative comparison presented in Table 1. The LR-FCM result was severely influenced by noise owing to the lack of neighbourhood information (Figure 9(a)), which explains why the three-class FCM result was abandoned in sample selection. The results of NR-ELM and GaborPCANet shown in Figures 9(b) and (c) were slightly less affected by noise than that of LR-FCM. Figure 9(d) shows the result of the non-adaptive method, which is similar to the result of the proposed method in Figure 9(e). According to Table 1, the non-adaptive result was better than the proposed method in PCC, whereas the proposed algorithm was better in KC. Overall, the proposed method greatly decreased the occurrence of false detections, with a PCC of 98.49% and a KC of 59.03%.

Luchengxiang dataset:
Visual results of the various methods on the Luchengxiang dataset are shown in Figure 10, with quantitative results displayed in Table 2. Changes within the region are mainly caused by transitions between buildings, vegetation, and bare land. As shown in Figures 10(a), (b) and (c), the results of LR-FCM, NR-ELM and GaborPCANet were to some extent influenced by noise. Compared with the reference image, the proposed method (Figure 10(e)) could detect essentially all changed areas with little noise, indicating a better balance between FP and FN, with false and missed detections kept within 0.5%. When the NAWL was used, only the sample pixels were detected as CPs, largely because of the imbalance between the numbers of changed and unchanged samples; this tilted the model weight towards the unchanged samples and prevented the model from correctly classifying the input. Therefore, the AWL is necessary.

Weishanzhuang dataset:
Figure 11 shows the results on the Weishanzhuang dataset; the evaluation indicators are displayed in Table 3. Changes in this area all arise from buildings and vegetation. As displayed in Table 3, all the methods except LR-FCM achieved good results. LR-FCM (Figure 11(a)) missed many changed pixels, primarily due to its poor robustness to noise. As shown in Figures 11(b) and (c), NR-ELM and GaborPCANet could detect most of the main change area. The non-adaptive method showed more missed areas, caused by the imbalanced sample classes. Figure 11(e) shows that the proposed method could detect essentially all change areas, with a KC (91.69%) slightly better than that of the other methods.
Table 3. Results (%) and comparison of various methods on Weishanzhuang dataset.

Parameter sensitivity analysis
Three aspects are discussed in this section: (1) the quality of selected samples, (2) the neighbourhood size of input patch, and (3) the sample imbalance.

4.2.1 The quality of selected samples:
In sample selection, two issues must be addressed: (1) determining the number of selected samples; (2) selecting samples that achieve a high accuracy. However, when samples are selected in an unsupervised manner, wrong samples will inevitably be generated. For method validation, we compared the quality of the samples selected by the proposed method with that of six other sample selection methods: NR-FCM3, LMR-FCM3, LR-FCM3, LR-HFCM, H-LMR-HFCM, and NR-HFCM, where FCM3 represents the three-class FCM applied to the DI. The parameters in Table 4 were used to calculate the sample selection accuracy. The vertical axis denotes the corresponding class in the reference image (the first letter), while the horizontal axis represents the pixel class in sample selection (the second letter). The number of pixels correctly classified in sample selection (initial correct volume, ICV), i.e. the sum of CC and UU, and the sample accuracy (SA) were calculated:
$$SA = \frac{CC + UU}{CC + UU + UC + CU}$$
The Yamenkou dataset was chosen to evaluate the quality of sample selection. Figure 12 shows comparisons of the various sample selection methods. The proposed method splits the original UPs into UPs and HUPs. Figures 12(g) and (h) show the two classes of pixels before and after merging. In Figure 12(g), the black, grey, and white pixels represent UPs, IPs, and CPs, respectively. In Figure 12(h), the black, red, green, and white pixels represent HUPs, UPs, IPs and CPs, respectively. The proportions of each class obtained using the different sample selection methods are shown in Table 5.
ICV and SA of various methods were calculated (shown in Figure 13). The proposed method obtained the largest number of correct samples, and the number of correctly identified pixels in the sample selection accounted for 94.15%; SA reached 98.82%, and both were the highest among all the methods.
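As a small worked example of the two indicators, the sketch below computes ICV and SA from the four confusion counts, where the first letter is the reference class and the second the class assigned in sample selection. The counts are hypothetical and serve only to show the arithmetic; taking ICV as the sum of CC and UU is an assumption based on its definition as the number of correctly classified sample pixels.

```python
def sample_selection_scores(cc, uu, uc, cu):
    """Return (ICV, SA) from confusion counts of the selected samples."""
    icv = cc + uu                      # correctly selected changed + unchanged
    sa = icv / (cc + uu + uc + cu)     # share of selected samples that are correct
    return icv, sa

# Hypothetical counts, for illustration only
icv, sa = sample_selection_scores(cc=50, uu=900, uc=5, cu=45)
```

With these counts, 950 of the 1,000 selected pixels are correct, giving an SA of 0.95.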

4.2.2 Neighbourhood size of input patch:
To find the optimal neighbourhood size of the input patch, we varied the neighbourhood size from 9×9 to 17×17. Figure 14 shows KC against the neighbourhood size on the Yamenkou, Weishanzhuang and Luchengxiang datasets. For all three datasets, the 13×13 neighbourhood achieved the highest accuracy, indicating that a moderate neighbourhood size is more suitable. The neighbourhood size can to some extent influence the detection results, and the optimal size can be chosen empirically. Considering the network structure, the optimal patch size was 13×13.
Figure 11. Comparisons of KC with varying neighbourhood sizes.

4.2.3 The sample imbalance:
The imbalance between changed and unchanged samples cannot be ignored, as it largely decreases the detection accuracy or even invalidates the model. Therefore, multi-hierarchical clustering and the AWL were used to address sample imbalance. The MH-FCM algorithm was used to separate most of the HUPs from the UPs to reduce the degree of imbalance and the training cost. The sample volume proportions of the different classes in the three datasets are shown in Table 6. If HUPs were not separated from UPs, the ratio between unchanged and changed samples would reach 164:1 in the Luchengxiang dataset, causing the network to fail. Figure 15 shows the proportional relationship before and after HUP separation. It can be observed that the proposed refined sample selection greatly reduced the sample imbalance.
Results of using the AWL and NAWL were also compared. As shown in Figure 16, the use of the AWL effectively improved the accuracy, especially on the Yamenkou dataset.
Figure 13. Comparisons of AWL and NAWL on three datasets.

CONCLUSIONS
This paper introduces a refined sample selection method for unsupervised SAR change detection, which uses volume control factors and MH-FCM to select samples of high quality. A DCCNN with AWL is then constructed to alleviate the imbalance between the changed and unchanged samples and to produce the change detection result. The proposed refined sample selection method not only optimises the process of selecting samples but also reduces the effect of imbalanced sample classes. Moreover, the incorporation of volume control factors and the MH-FCM algorithm generates high quality samples of large diversity and high confidence.
Experimental results indicate the effectiveness of the proposed method. However, the generation of the DI relies heavily on the ratio operation and may be sensitive to low scattering regions, causing false alarms.