COLLAPSED BUILDING CLASSIFICATION WITH OPTICAL AND SAR DATA BASED ON MANIFOLD LEARNING

Building collapse is a major cause of casualties and economic losses in earthquake disasters, and the degree of building collapse is an important indicator for disaster assessment. To improve the classification of collapsed building cover (CBC), a new fusion technique is proposed that integrates optical and SAR data at the pixel level based on manifold learning. Three typical manifold learning models, namely Isometric Mapping (ISOMAP), Local Linear Embedding (LLE) and principal component analysis (PCA), were used, and their results were compared. Features were extracted from SPOT-5 and RADARSAT-2 data. Experimental results showed that 1) the most useful features of the optical and SAR data were contained in manifolds with low intrinsic dimensionality, while the CBC classes were distributed differently throughout the low-dimensional spaces of manifolds derived from different manifold learning models; 2) in some cases the performance of ISOMAP was similar to that of PCA, but PCA generally performed best in this study, yielding the highest accuracy for all CBC classes and requiring the least time for feature extraction and learning; and 3) the LLE-derived manifolds yielded the lowest accuracy, mainly by confusing soil with collapsed building and rock. These results show that manifold learning can improve the effectiveness of CBC classification by fusing optical and SAR data features at the pixel level, which can be applied in practice to support the accurate analysis of earthquake damage.


INTRODUCTION
Large-scale earthquakes cause severe loss of life and property. Fast, accurate, and effective earthquake disaster monitoring and evaluation using airborne and spaceborne remote sensing provides an important scientific basis and decision-making support for government emergency command and post-disaster reconstruction.
Matsuoka et al. (2004) analyzed intensity changes and coherence coefficients in ERS SAR data acquired before and after the 1995 Kobe earthquake and found that combining the two gave a higher extraction accuracy for earthquake-damaged buildings than either type of information alone. Gamba et al. (2007) jointly detected changes in the intensity and coherence of SAR images acquired before and after an event and then added GIS information for the study area; the extraction accuracy was clearly higher than that obtained from a single type of information. Gong Lixia et al. (2016) combined intensity change information from ALOS PALSAR data acquired before and after the 2008 Wenchuan earthquake with the coherence coefficient to detect and analyze building damage in the Dujiangyan urban area, and the extraction results were highly consistent with field survey results. Chini et al. (2009) integrated high-resolution optical and SAR images to measure seismic damage to buildings: a building mask was extracted by vectorizing the optical image and then superimposed on the SAR intensity difference map, which improved the accuracy of building damage extraction. Wang Tianlin et al. (2012) took the Port-au-Prince area in the 2010 Haiti earthquake as an example. Buildings were identified on the pre-earthquake IKONOS optical image, and SAR simulation was carried out to produce an image of intact buildings before the earthquake, which was then used for change detection against post-earthquake spaceborne COSMO-SkyMed and RADARSAT-2 SAR images of the area. Comparison of the building damage extraction results with the post-earthquake GeoEye optical image showed that the method can detect damaged and undamaged buildings with high accuracy.
Xue Tengfei et al. (2012) used ENVISAT ASAR data acquired before and after the 2008 Wenchuan earthquake over the central Dujiangyan area, together with a post-earthquake IKONOS optical image. The change detection results from the intensity correlation of the pre- and post-earthquake SAR images were combined with the post-earthquake IKONOS image as a band combination for classification, and damaged buildings were extracted with an accuracy of 81.3%.
Since ERS-1 SAR data became available in the early 1990s, researchers have been studying the fusion of optical and SAR data. Various pairs of optical and SAR data, such as Landsat TM/ETM+ and ERS-1/2, SPOT-4/5 and ERS-1/2, and Landsat TM/ETM+ and ALOS PALSAR, have been fused using a variety of methods to support land use/land cover classification and forest and farmland monitoring (Joshi et al., 2016; Pohl & Van Genderen, 2016; Zhang & Xu, 2018). Although many studies have used both optical and SAR data, there is still no consensus on the appropriate fusion level (i.e., pixel level, feature level or decision level) for combining these two data sources (Pohl & Van Genderen, 2016; Zhang & Xu, 2018). Given the differences between the two data sources, conventional fusion approaches such as MNF and PCA, which have been used to fuse various optical data, may not behave well when fusing optical and SAR data (Tupin, 2010). Manifold learning, which is usually used to reduce the dimensionality of hyperspectral remote sensing data, attempts to learn the low-dimensional structure of data in a high-dimensional space (Verveer & Duin, 1995). In the fusion of optical and SAR data, the various features extracted from the two sources also form a high-dimensional data set, and learning its low-dimensional, distinctive information can be regarded as the fusion of the two data sources. The underlying ideas of manifold learning can therefore potentially provide an approach to fusing optical and SAR data to improve collapsed building cover (CBC) classification. This study proposes a new pixel-level methodological framework, based on manifold learning, that fuses the features of optical and SAR data to improve CBC classification.

METHODOLOGY
The general methodological framework for fusing optical and SAR data for CBC classification based on manifold learning is shown in Figure 1. The framework has two main parts. The first part is data preprocessing and feature extraction, in which the optical and SAR data are each preprocessed with the appropriate techniques. For the SAR data, focusing and multi-looking are applied, and polarimetric features are extracted using various decomposition models. The optical and SAR data are then co-registered using manually or automatically selected ground control points (GCPs), with a root mean square error (RMSE) of less than one pixel. Texture features are then extracted from the co-registered SAR data, and spectral and texture features are extracted from the optical data. The second part uses manifold learning models to fuse the optical and SAR data. First, all data sets from the optical and SAR data (including the derived features) are normalized, and the intrinsic dimensionality of the manifolds is estimated. The intrinsic dimension is then used to analyze the intrinsic geometry of the whole data set. This is followed by a manifold learning process that creates manifolds using different manifold learning models. A support vector machine (SVM) is then applied to the manifolds for CBC classification. Finally, validation and accuracy assessment are carried out on the CBC classification results to determine the effectiveness of the proposed framework. All features extracted from the optical and SAR data are placed in stacks in which each layer represents a different feature. These layer-stacked data sets form a high-dimensional feature space that integrates the information of the optical and SAR data. The fusion of optical and SAR data can therefore be studied and analyzed in this high-dimensional feature space.
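As an illustration of the normalization and layer-stacking step described above, the following Python sketch rescales each co-registered feature layer and stacks the layers into a pixel-by-feature matrix. The paper does not state which normalization was used, so min-max scaling is an assumption here, and the function names `minmax_normalize` and `stack_features` as well as the toy arrays are hypothetical.

```python
import numpy as np

def minmax_normalize(layer):
    """Rescale one 2-D feature layer to [0, 1] (assumed normalization)."""
    layer = np.asarray(layer, dtype=float)
    lo, hi = np.nanmin(layer), np.nanmax(layer)
    if hi == lo:
        # A constant layer carries no information; map it to zeros.
        return np.zeros_like(layer)
    return (layer - lo) / (hi - lo)

def stack_features(layers):
    """Stack co-registered layers into an (n_pixels, n_features) matrix."""
    normalized = [minmax_normalize(layer) for layer in layers]
    cube = np.stack(normalized, axis=-1)        # (rows, cols, n_features)
    return cube.reshape(-1, cube.shape[-1])     # one row per pixel

# Toy 2 x 2 layers standing in for one optical band and one SAR layer.
optical_band = np.array([[10.0, 20.0], [30.0, 40.0]])
sar_backscatter = np.array([[-12.0, -8.0], [-10.0, -6.0]])
X = stack_features([optical_band, sar_backscatter])
print(X.shape)
```

Each row of `X` is one pixel and each column one feature layer, which is the form expected by the dimensionality estimation and manifold learning steps.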
This section describes which features were extracted from the optical and SAR data in this study and how they were extracted. In general, the spectral characteristics in optical satellite data are not sufficient to distinguish the different CBC classes because of their material diversity and spectral confusion. Contextual information (for example, texture and shape features) calculated from adjacent pixels is therefore usually needed as a supplement. However, even the combination of spectral and contextual information is still insufficient to completely separate the different CBC classes; polarimetric SAR data are therefore fused with the optical data. The original SAR data contain backscatter coefficients in different polarizations as well as phase information. For dual-polarization and fully polarimetric SAR data, decomposition methods can additionally be used to extract further polarimetric features beyond the backscatter coefficients.
For the SPOT-5 data, two categories of features were extracted: spectral features and textural features. The Normalized Difference Vegetation Index (NDVI) and the Normalized Difference Water Index (NDWI) were calculated as spectral features. Textural features, comprising the four measures of homogeneity, dissimilarity, entropy, and angular second moment, were calculated from the gray level co-occurrence matrix (GLCM). Further shape and textural features were extracted using the Shape Adaptive Neighborhood (SAN) technique. Features were also extracted from the SAR data based on their polarimetric characteristics. Fully polarimetric SAR data, such as the RADARSAT-2 images used here, can be described through the backscattering coefficient matrix. Commonly used polarimetric features and decomposition approaches include the polarization ratio, the coherence coefficient, the Cloude-Pottier decomposition, the Freeman-Durden decomposition parameters, and the Yamaguchi four-component decomposition parameters; this study employed the polarization ratio, the coherence coefficient and the Cloude-Pottier decomposition. Finally, by combining the original data with the features from the optical and SAR data, a 31-dimensional data set was obtained.
The intrinsic dimension is the number of dimensions actually needed to describe the data. The feature data sets of the fused optical and SAR data lie on, or at least very close to, a manifold whose dimension is much lower than that of the original feature space.
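The two spectral indices computed from the SPOT-5 bands can be sketched as follows. This is a minimal illustration, not the authors' implementation: NDVI is the standard (NIR - red) / (NIR + red) ratio, while the green/NIR form of NDWI is an assumption, since the paper does not say which NDWI variant was used. The helper name `normalized_difference` and the reflectance values are hypothetical.

```python
import numpy as np

def normalized_difference(a, b):
    """Return (a - b) / (a + b) per pixel, with 0 where the denominator is 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.zeros_like(denom)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return normalized_difference(nir, red)

def ndwi(green, nir):
    """Normalized Difference Water Index (green/NIR form, assumed here)."""
    return normalized_difference(green, nir)

# Hypothetical reflectance values for two pixels.
nir = np.array([0.5, 0.1])
red = np.array([0.1, 0.1])
green = np.array([0.1, 0.3])
print(ndvi(nir, red))
print(ndwi(green, nir))
```

Both indices fall in the range [-1, 1], so they can be stacked with the other feature layers after the same normalization as the rest of the data set.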
In this study, the intrinsic dimension is used as a reference number to explore the underlying regularity of the high-dimensional feature data set, to help understand the manifold, and to improve the accuracy of CBC classification. Intrinsic dimension estimation methods can be divided into two categories: eigenvalue methods and geometric methods. As the understanding of topological dimension theory and fractal geometry has deepened, a number of approximate estimators of the intrinsic dimension have been proposed. These include packing number methods, entropy estimation methods, tensor voting methods for the intrinsic topological dimension of the data, k-nearest-neighbor graph methods that construct local neighbor graphs, and maximum likelihood estimation based on the likelihood function of the distances between near neighbors (Verveer & Duin, 1995). Geometric methods overcome the shortcomings of eigenvalue methods by fully exploiting the intrinsic geometric characteristics of the data set, and they are increasingly applied to estimating the intrinsic dimension of nonlinear data manifolds. In this study, we used the maximum likelihood estimator (MLE), which is a geometric method, to estimate the intrinsic dimension of the fused data.
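A nearest-neighbor maximum likelihood estimator of this kind can be sketched in a few lines. The version below follows the commonly used Levina-Bickel form: for each point, the local estimate is the inverse of the mean log-ratio between the distance to the k-th neighbor and the distances to the k - 1 closer neighbors, and the local estimates are averaged over all points. The paper does not specify which MLE variant or neighborhood size was used, so the choice of this form, the default `k=10`, and the function name are assumptions.

```python
import numpy as np

def mle_intrinsic_dimension(X, k=10):
    """Average Levina-Bickel-style local dimension estimate over all points.

    Assumes X has no duplicate rows (zero neighbor distances would break
    the logarithm) and is small enough for a dense distance matrix.
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))       # (n, n) pairwise distances
    # Sort each row; column 0 is the point itself (distance 0), so skip it.
    knn = np.sort(dist, axis=1)[:, 1:k + 1]          # (n, k) neighbor distances
    log_ratio = np.log(knn[:, -1:] / knn[:, :-1])    # log(T_k / T_j), j < k
    local = (k - 1) / np.sum(log_ratio, axis=1)      # per-point estimate
    return float(np.mean(local))

# Synthetic check: a 2-D sheet embedded in a 5-D space.
rng = np.random.default_rng(0)
plane = np.hstack([rng.uniform(size=(500, 2)), np.zeros((500, 3))])
estimate = mle_intrinsic_dimension(plane, k=10)
print(round(estimate, 2))
```

On the synthetic sheet the estimate comes out close to 2 even though the ambient space has 5 dimensions, which mirrors how a 31-dimensional feature stack can have a much lower intrinsic dimension.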
The fusion of optical and SAR data amounts to analyzing the combined high-dimensional features from the two data sources. Viewed purely as a dimensionality reduction technique, manifold learning maps a data set in a high-dimensional space to a low-dimensional space through a nonlinear mapping, preserving as much as possible of the geometric structure embedded in the high-dimensional data, and thereby obtains a useful low-dimensional manifold subspace. In this study, three typical manifold learning methods were compared: ISOMAP, Local Linear Embedding (LLE) and principal component analysis (PCA). PCA assumes that the feature data set has a linear structure in the high-dimensional space; the low-dimensional manifold of ISOMAP preserves the similarity relationships expressed by geodesic distances in the original data; and the low-dimensional manifold of LLE preserves the relationships between data points within the overlapping local structures of the high-dimensional manifold.
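The comparison of the three methods can be sketched with scikit-learn, assuming that library is available; the paper does not state which implementation was used. The synthetic 31-dimensional matrix, the neighborhood size `n_neighbors=12`, the target dimension `n_components=4` and the function name `learn_manifolds` are all hypothetical choices for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap, LocallyLinearEmbedding

def learn_manifolds(X, n_components=4, n_neighbors=12):
    """Embed X with PCA, ISOMAP and LLE; return a dict of embeddings."""
    models = {
        # Linear projection onto the directions of largest variance.
        "PCA": PCA(n_components=n_components),
        # Preserves geodesic distances along a k-nearest-neighbor graph.
        "ISOMAP": Isomap(n_neighbors=n_neighbors, n_components=n_components),
        # Preserves local linear reconstruction weights.
        "LLE": LocallyLinearEmbedding(
            n_neighbors=n_neighbors, n_components=n_components, random_state=0
        ),
    }
    return {name: model.fit_transform(X) for name, model in models.items()}

# Synthetic stand-in for a normalized (n_pixels, 31) feature matrix:
# a 3-D latent structure mixed linearly into 31 features plus small noise.
rng = np.random.default_rng(0)
latent = rng.uniform(size=(200, 3))
mixing = rng.normal(size=(3, 31))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 31))

embeddings = learn_manifolds(X)
for name, Z in embeddings.items():
    print(name, Z.shape)
```

Each embedding has one row per pixel and `n_components` columns, so the same classifier can be trained on any of the three for a like-for-like comparison.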

RESULTS AND ANALYSIS
The test site is the urban area of Yushu County, Qinghai Province, China, after the magnitude 7.1 earthquake of 2010. The six CBC classes were bare soil (SOI), sparse vegetation (VEG), water (WAT), rock (ROC), collapsed building (CBID), and intact building (BLD). There is minimal topographic relief in the urban area of Yushu. The ascending RADARSAT-2 fine-mode polarimetric data, with 8-m resolution and an incidence angle of 21°, were acquired on April 21, 2010. This was also the case as seen from high-resolution time-series Google Earth images (Figure 2). Finally, all optical and SAR images were co-registered at a 10 m × 10 m resolution in the WGS84 reference system. A cluster sampling scheme was adopted to select sample pixels from each of the six classes in the three study sites. To increase the robustness of the classifiers and to evaluate their effectiveness in real applications, 70% of these samples were used as training data and 30% were used as testing samples to calculate the accuracy of the method.
The learnt manifolds are treated as low-dimensional feature spaces in which the six CBC classes from the training data are distributed. Since their dimensionality is higher than three, it is difficult to visualize these feature spaces in a single figure. A matrix of scatter plots between every combination of two features was therefore drawn for each manifold. Figure 3 shows the scatter plot matrices of the manifolds derived using the ISOMAP, LLE and PCA algorithms. The ISOMAP results indicate that the first three features (B1-B4) yielded more useful information, as the CBC classes can be separated more clearly within these dimensions. Rock overlapped with collapsed building to some extent, but both were well separated from vegetation. Some classes, such as water, showed an especially wide distribution. The manifolds derived from LLE show distinctive characteristics compared with the one derived from ISOMAP: the scatter plots resemble "lasers" emitted from a central source, and collapsed building, soil and rock partially overlapped. The manifolds derived from PCA show that soil, vegetation, and intact building were well separated, apart from some small portions of confusion, while rock and collapsed building tended to be mixed together more than other classes. This suggests that when only a few hard-to-distinguish classes are involved, manifold learning algorithms can better mine the data structure and improve classification; with more classes, however, the overall data structure is more complex and more local structures may be estimated incorrectly, which may lead to an inaccurate overall manifold structure.
The intrinsic dimensionality was calculated as a reference by applying the MLE method described above; it was 3.6723 for the data set. Using the samples and the learnt manifolds, the CBC classes were classified with the SVM algorithm. The overall accuracies obtained using ISOMAP, LLE and PCA were 94.34%, 93.68% and 97.13%, respectively, and the Kappa values were 0.9216, 0.8677 and 0.9639, which represented improvements of approximately 1.13%, 0.47% and 1.8%, respectively, compared with the original data. The producer's accuracy in Figure 4 indicates that rock, soil and collapsed building were more easily misclassified as other classes. PCA manifolds generally performed better than ISOMAP and LLE manifolds, while LLE obtained the lowest accuracy.
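The accuracy measures reported above (overall accuracy, Kappa and producer's accuracy) are all derived from a confusion matrix of reference labels against predicted labels. The sketch below shows the standard definitions on a small made-up example; the labels are hypothetical and are not the study's data, and the function names are illustrative.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are reference classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def overall_accuracy(cm):
    """Fraction of samples on the diagonal."""
    return np.trace(cm) / cm.sum()

def cohen_kappa(cm):
    """Agreement corrected for the agreement expected by chance."""
    n = cm.sum()
    p_observed = np.trace(cm) / n
    p_chance = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / n ** 2
    return (p_observed - p_chance) / (1.0 - p_chance)

def producers_accuracy(cm):
    """Per-class share of reference samples that were classified correctly."""
    return np.diag(cm) / cm.sum(axis=1)

# Made-up labels for three classes (not the study's data).
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 1, 2, 2, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
oa = overall_accuracy(cm)
kappa = cohen_kappa(cm)
pa = producers_accuracy(cm)
print(cm)
print(oa, kappa, pa)
```

Kappa is lower than overall accuracy whenever some agreement would be expected by chance, which is why the two measures are usually reported together.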

CONCLUSION
In this study, a methodological framework is proposed for fusing textural and spectral features of optical and SAR data using manifold learning models to improve CBC classification. The approach first extracts textural and spectral features from the optical and SAR data at the pixel level and then fuses all features using manifold learning models. It produced positive results supporting the effectiveness of this pixel-based fusion when using single and dual-polarimetric SAR data. Three different manifold learning models (ISOMAP, LLE and PCA) were employed. Three different combinations of optical and SAR data were used to test the effectiveness of the proposed framework. SPOT-5 satellite data were used as the optical data, and RADARSAT-2 data were used as the SAR data. Two main findings can be drawn from the experimental results. First, the most useful information in the optical and SAR data was contained in manifolds with low intrinsic dimensionality, while the CBC classes were distributed differently over the feature spaces of the manifolds derived from different learning methods. Second, although ISOMAP performed comparably to PCA in some cases, PCA generally performed best across all the study cases, yielding the best producer's and user's accuracy. The LLE-derived manifolds obtained the lowest accuracy, mainly by confusing soil with collapsed building and rock. These results indicate the applicability and effectiveness of the proposed manifold learning framework for fusing optical and SAR data and improving CBC classification.

REFERENCES
Zhang, H. S., Xu, R., 2018. Exploring the optimal integration levels between SAR and optical data for better urban land cover mapping in the Pearl River Delta. International Journal of Applied Earth Observation and Geoinformation, 64,