Earthquake-damaged Regions Detection from High Resolution Image based on Super-pixel Segmentation and Deep Learning

Accurate detection and automatic processing of earthquake-damaged regions are essential for effective rescue and post-disaster reconstruction. In this study, we proposed a Combined Super-pixel Segmentation and AlexNet Detection approach (CSSAD) for automatically extracting damaged regions from post-earthquake high-resolution images. The Simple Linear Iterative Clustering (SLIC) algorithm was used to segment the high-resolution images into more homogeneous geo-objects. A multiscale sample database, which takes the scale effect of damaged regions into account, was constructed based on the geometric centre of each super-pixel. AlexNet, which automatically extracts high-level features and accurately identifies target geo-objects, was used to detect the damaged regions. To enhance the localization accuracy, the output of AlexNet was further refined using the super-pixel segmentation, and shadow and vegetation were masked out. Compared with traditional methods, the proposed approach reduces the false and missed detection ratios by at least 10 percent.


INTRODUCTION
In recent decades, devastating earthquakes have caused great losses of human life and property (Geiss and Taubenboeck, 2013; Romaniello et al. 2017). It is very important to carry out post-earthquake disaster assessment to obtain post-earthquake disaster information in a timely and effective manner (Romaniello et al. 2017; Tong et al. 2012). Building damage is often considered the most important indicator of a post-earthquake disaster. Accurate and automatic detection of earthquake-damaged buildings is crucial not only for initiating effective emergency response actions, but also for estimating economic losses and managing the resources to be allocated during the reconstruction phase (Tu et al. 2017; Jared et al. 2017).
Remote sensing has long been recognized as an effective technology for earthquake-damaged building detection (Yeom et al. 2017). With the rapid development of remote sensing technology, it is moving towards higher spatial, temporal and spectral resolution. New remote sensing technologies such as Synthetic Aperture Radar (SAR), Light Detection and Ranging (LiDAR) and Unmanned Aerial Vehicle (UAV) photography have further improved the capability of multi-platform, multi-sensor, multi-scale and multi-angle observation (Chen et al. 2016; Chiang et al. 2017). Benefiting from this wealth of data sources, many studies have presented techniques for assessing earthquake building damage using different images (Elsen et al. 2017; Endo et al. 2018). Change detection based on pre- and post-disaster images has been the most commonly used approach to damage detection, as it is convenient and effective (Huang et al. 2018). Because pre-disaster data are often difficult to acquire, more recent studies have focused on damage detection based on post-disaster Very High Resolution (VHR) images. UAVs are flexible, easy to operate, low cost, safe and reliable, and UAV images and high-definition video have been used to achieve accurate damage assessment (Cusicanqui et al. 2018). Owing to its good observation ability, oblique aerial photography has been widely used in recent studies: oblique aerial images allow fine observation of building façades, which helps to evaluate the damage accurately (Gerke and Kerle, 2011). In addition, because each type of remote sensing data has limitations in object recognition, damage detection using multi-source data has become one of the hot spots of current research (Xue et al. 2016). However, the registration and fusion of heterogeneous data remain relatively difficult.
Once abundant and detailed post-earthquake image data have been acquired, machine learning algorithms are regarded as the most reliable methods for detecting building damage (Vetrivel et al. 2016). The common machine-learning approach to damage detection includes the following three steps: manually selecting typical and important damage features, constructing representative training samples, and using machine learning models to classify or detect the targets.
With the development of artificial intelligence and computer vision technology, deep learning, one of the state-of-the-art techniques in machine learning and visual recognition, has been identified as a commonly used and effective way to extract discriminative and representative high-level features (Bai et al. 2018). Deep learning can automatically learn nonlinear spatial filters and build a hierarchy of increasingly complex features, which makes it well suited to damage detection from VHR images. Because it learns features directly from the original data, deep learning is more flexible and capable than traditional classification methods. In particular, the convolutional neural network (CNN), which is composed of multiple nonlinear adaptive layers, has proven to be a more effective image processing model. A CNN is trained end-to-end from the original pixels to the final category, which reduces the need for hand-designed feature extractors.
CNNs have become the most commonly used method in damage detection; however, many problems remain to be solved. In post-earthquake VHR images, damaged buildings vary in size and form (Chen et al. 2013). Consequently, the fixed receptive field and reduced feature resolution of CNNs, together with insufficient training samples, severely limit accurate damage detection. Besides, the CNN is primarily a pixel-based classification method, which causes serious misclassification and a salt-and-pepper phenomenon even though it can make full use of high-level features. Object-Based Image Analysis (OBIA) can effectively avoid the salt-and-pepper phenomenon, but it mainly relies on low-level features within the segmented objects (Chima et al. 2018). Combining OBIA with CNNs is therefore an important topic in current studies.
To solve these problems, we propose a Combined Super-pixel Segmentation and AlexNet Detection approach (CSSAD). On the one hand, AlexNet overcomes the difficulty that traditional OBIA methods have in extracting and utilizing high-level features. On the other hand, OBIA compensates for the serious salt-and-pepper phenomenon of pixel-based classification. Because samples are collected on super-pixels, they do not need to be labelled pixel by pixel, which is a time-consuming and labor-intensive process. Moreover, using super-pixels as the basic analysis element helps to increase the accuracy.

STUDY AREA AND DATA
The study area is located in Port-au-Prince, the capital of Haiti, where an earthquake of magnitude 7.0 on the Richter scale struck on 12 January 2010. The epicentre was located approximately 25 km west of Port-au-Prince. This strong earthquake caused extensive damage to buildings and facilities and heavy losses, with more than 20 million victims, and many buildings in Port-au-Prince were damaged or even destroyed. We chose one large, typical and representative region to construct the damage sample database, and two small regions to test the proposed approach: one subarea of the sampling region and one area in Port-au-Prince far away from it.
Two post-earthquake images, with blue, green and red bands, were acquired from Google Earth on 17 January 2010. As shown in Figure 1, image (a), which covers a large area of 4794 × 3781 pixels, was chosen as the base image for collecting damage samples. Images (b) and (c) were chosen as the test and validation regions for damage detection, respectively. Image (b) is a subarea of image (a), and image (c) is another badly damaged area in Port-au-Prince.

Super-pixel Segmentation:
Super-pixel segmentation algorithms group pixels into perceptually meaningful atomic regions, which can replace the rigid structure of the pixel grid and greatly reduce the complexity of subsequent image processing tasks such as depth estimation, segmentation, body model estimation, and object localization. In this paper, the VHR image is segmented into super-pixels, which are used as the basic units for optimizing the damaged regions. As a widely used super-pixel method, the SLIC algorithm produces good-quality super-pixels that are compact and roughly equal in size. SLIC is an efficient method that considers both the color and the spatial position of pixels, clustering pixels that are similar in color and close in space (Csillik 2017).
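The SLIC clustering step described above can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions, not the implementation used in this study: it clusters in RGB rather than CIELAB space, omits the connectivity-enforcement post-processing of the original SLIC, and the function name `slic_sketch` and its default parameters are hypothetical. Each pixel is compared only with cluster centres inside a 2S × 2S window, using the combined distance $D = \sqrt{d_c^2 + (d_s/S)^2 m^2}$, where $d_c$ is the color distance, $d_s$ the spatial distance, $S$ the grid interval and $m$ the compactness.

```python
import numpy as np

def slic_sketch(image, n_segments=100, compactness=10.0, n_iter=10):
    """Simplified SLIC on an (H, W, 3) RGB array; returns an (H, W) label map.

    Hypothetical helper: RGB distance instead of CIELAB, no connectivity step.
    """
    h, w, _ = image.shape
    img = image.astype(np.float64)
    step = max(int(np.sqrt(h * w / n_segments)), 1)  # grid interval S
    # Initialise cluster centres [R, G, B, y, x] on a regular grid.
    ys = np.arange(step // 2, h, step)
    xs = np.arange(step // 2, w, step)
    centres = np.array([[*img[y, x], y, x] for y in ys for x in xs], dtype=np.float64)
    yy, xx = np.mgrid[0:h, 0:w]
    labels = -np.ones((h, w), dtype=np.int64)
    for _ in range(n_iter):
        dist = np.full((h, w), np.inf)
        # Assignment: each centre only competes for pixels in its 2S x 2S window.
        for k, centre in enumerate(centres):
            cy, cx = centre[3], centre[4]
            y0, y1 = int(max(cy - step, 0)), int(min(cy + step + 1, h))
            x0, x1 = int(max(cx - step, 0)), int(min(cx + step + 1, w))
            d_c2 = np.sum((img[y0:y1, x0:x1] - centre[:3]) ** 2, axis=2)
            d_s2 = (yy[y0:y1, x0:x1] - cy) ** 2 + (xx[y0:y1, x0:x1] - cx) ** 2
            d = np.sqrt(d_c2 + (compactness / step) ** 2 * d_s2)
            window_dist = dist[y0:y1, x0:x1]   # view into dist
            window_lab = labels[y0:y1, x0:x1]  # view into labels
            closer = d < window_dist
            window_dist[closer] = d[closer]
            window_lab[closer] = k
        # Update: move each centre to the mean colour and position of its pixels.
        for k in range(len(centres)):
            mask = labels == k
            if mask.any():
                centres[k, :3] = img[mask].mean(axis=0)
                centres[k, 3] = yy[mask].mean()
                centres[k, 4] = xx[mask].mean()
    return labels
```

A larger `compactness` weights spatial proximity more heavily and yields more regular, grid-like super-pixels; a smaller value lets super-pixels follow color boundaries more closely.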

Collection of Multiscale Damage Samples:
Training samples are the prerequisite and foundation for a CNN. Generally, they are selected manually pixel by pixel. However, this is time-consuming and labor-intensive, which is impractical for rapid earthquake damage mapping. In addition, because of the significant scale variance inherent in post-earthquake objects, objects of different sizes present different damage characteristics, so it is unreasonable to select training samples at a fixed scale. In this study, we used the super-pixel segments to construct a sample database based on image interpretation at three scales (20 × 20, 40 × 40 and 60 × 60 pixels) (Shao et al. 2019). This not only improved the efficiency of collecting training samples, but also greatly increased the number and variety of samples, which helped to improve the classification accuracy. Figure 3 illustrates the multiscale sample selection based on super-pixel segmentation. First, the post-earthquake VHR image is divided into a series of small, uniform regions using a super-pixel segmentation algorithm. Then, for each class, image patches centred at the geometric centre pixel of each super-pixel are extracted as training samples. In detail, samples of each class are first selected at three different sizes, and then all three types of samples are resampled to the smallest size. Finally, all training samples selected at their optimal scales form the training database.
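The sample-collection step can be sketched as below, assuming a super-pixel label map is already available (for example from the SLIC step). The functions `superpixel_centres` and `multiscale_patches` are hypothetical names. Patches are centred on each super-pixel's geometric centre, image borders are handled by edge padding, and resampling to the smallest size is done by block averaging, which works because 40 and 60 are integer multiples of 20; the paper does not state which resampling method was actually used.

```python
import numpy as np

def superpixel_centres(labels):
    """Geometric centre (row, col) of every super-pixel, rounded to the nearest pixel."""
    centres = {}
    for k in np.unique(labels):
        rows, cols = np.nonzero(labels == k)
        centres[int(k)] = (int(round(rows.mean())), int(round(cols.mean())))
    return centres

def multiscale_patches(image, centre, sizes=(20, 40, 60), out=20):
    """Cut square patches of several sizes around `centre` and resample each to out x out."""
    pad = max(sizes) // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    r, c = centre[0] + pad, centre[1] + pad  # centre position in padded coordinates
    patches = []
    for s in sizes:
        top, left = r - s // 2, c - s // 2
        patch = padded[top:top + s, left:left + s].astype(np.float64)
        f = s // out  # block factor; assumes each size is a multiple of `out`
        # Average non-overlapping f x f blocks to downsample to out x out.
        patches.append(patch.reshape(out, f, out, f, -1).mean(axis=(1, 3)))
    return patches
```

Each super-pixel thus yields three co-centred samples that share the same 20 × 20 input size but cover different amounts of context.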

AlexNet Model
As a deep learning method, CNNs have dramatically improved performance on a wide range of computer vision tasks such as image classification, saliency detection, object detection, and super-resolution (Long et al. 2017). AlexNet was first proposed as a new deep learning architecture in the ILSVRC-2012 competition; it differed markedly from the state-of-the-art approaches of the time and achieved high performance in image classification (Krizhevsky et al. 2012). Besides the increased depth of the network, AlexNet successfully applied ReLU, Dropout and LRN, and used GPUs to accelerate computation. Owing to the structure of AlexNet, the input size is fixed at 227 × 227 × 3. The network contains eight layers: five convolutional layers followed by three fully connected layers. The ReLU activation is applied after every convolutional layer, and local response normalization (LRN) follows the first two. The first convolutional layer has 96 kernels of size 11 × 11 × 3 with a stride of 4 pixels, followed by LRN and max pooling with a 3 × 3 window and a stride of 2 pixels. The second convolutional layer has 256 kernels of size 5 × 5 × 48, again followed by LRN and max pooling. The third and fourth convolutional layers are not followed by LRN or pooling. The third convolutional layer has 384 kernels of size 3 × 3 × 256, the fourth has 384 kernels of size 3 × 3 × 192, and the fifth has 256 kernels of size 3 × 3 × 192. The fifth convolutional layer is followed by max pooling. The first two fully connected layers have 4096 neurons each. In this study, we propose a damage detection method based on the AlexNet architecture, trained with our own image database. The AlexNet structure is shown in Figure 4.
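To make the layer description concrete, the sketch below traces the spatial size of the feature maps through AlexNet's convolution and pooling layers using the standard output-size formula $\lfloor (n + 2p - k)/s \rfloor + 1$. The padding values (2 for the second convolutional layer, 1 for the third to fifth) are not stated in this text; they are assumed from the common reference configuration of AlexNet. The names `conv_out`, `ALEXNET_FEATURES` and `trace_shapes` are hypothetical.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1

# (name, kernel size, stride, padding, output channels); paddings are assumed
# from the usual reference AlexNet configuration, not taken from this text.
ALEXNET_FEATURES = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, 96),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, 256),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, 256),
]

def trace_shapes(size=227):
    """Return (layer, height, width, channels) after each layer for a size x size input."""
    shapes = []
    for name, kernel, stride, pad, channels in ALEXNET_FEATURES:
        size = conv_out(size, kernel, stride, pad)
        shapes.append((name, size, size, channels))
    return shapes
```

For a 227 × 227 input the first convolution yields 55 × 55 × 96 maps, and the last pooling layer yields 6 × 6 × 256 = 9216 values, which are flattened and fed to the first 4096-neuron fully connected layer.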
In the training stage, three patches of size 20 × 20, 40 × 40 and 60 × 60 centred at each training pixel are extracted, and their three-channel RGB values are input into AlexNet; the small patch is fed to the first branch, and the large one is resized to 40 × 40 before being input into AlexNet. Through training, a CNN classifier with two output classes (damaged and undamaged) is generated for damage detection.

Classification Statistics within Each Super-pixel:
AlexNet predicts the presence and rough positions of target objects, but it delineates object borders poorly, which causes a serious drop in damage detection accuracy. Within a broad damaged area, small intact buildings may be misclassified. Combining super-pixel segmentation with the coarse classification of AlexNet can improve the localization accuracy of objects. First, we obtained the coarse classification from AlexNet. Then we used the super-pixel boundaries and a majority voting method to obtain the optimized result, which avoids the problems that coarse segmentation tends to neglect small objects while fine segmentation tends to generate spurious regions within geo-objects (Huang et al. 2019). Region and region-context information are well preserved in the super-pixel segmentation results.
In detail, we first conducted damage detection at the pixel level with AlexNet trained on the multiscale sample database, and predicted the damage class of each pixel. Then, the final class of each super-pixel was calculated using the following argmax function:

$$P_k = \arg\max_{c} \frac{1}{S(k)} \sum_{i=1}^{S(k)} \delta(c_i, c) \qquad (1)$$

where $P_k$ is the class that accounts for the majority of the area within super-pixel $k$, $c_i$ is the predicted class of pixel $i$, $\delta(c_i, c)$ equals 1 if $c_i = c$ and 0 otherwise, and $S(k)$ is the number of pixels within super-pixel $k$.
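Equation (1) amounts to a per-super-pixel majority vote over the pixel-level predictions. A minimal NumPy sketch follows, assuming integer class codes starting at 0 (for example 0 = undamaged, 1 = damaged) and super-pixel labels numbered 0 to K−1; the function name is hypothetical. Dividing by $S(k)$ does not change the argmax, so raw counts suffice.

```python
import numpy as np

def superpixel_majority_vote(pixel_classes, labels, n_classes=2):
    """Give every pixel of super-pixel k the majority class P_k of that super-pixel.

    pixel_classes: (H, W) integer predictions in 0..n_classes-1.
    labels: (H, W) super-pixel ids in 0..K-1.
    """
    flat_lab = labels.ravel()
    flat_cls = pixel_classes.ravel()
    counts = np.zeros((flat_lab.max() + 1, n_classes), dtype=np.int64)
    np.add.at(counts, (flat_lab, flat_cls), 1)  # counts[k, c]: pixels of class c in k
    p_k = counts.argmax(axis=1)  # ties go to the lowest class index
    return p_k[labels]
```

`np.add.at` is used instead of fancy-index `+=` because it accumulates correctly when the same (super-pixel, class) pair occurs many times.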

Mask out of Vegetation and Shadow:
Vegetation and shadow were first extracted at the pixel level. Each image segment was then assigned to a class using a majority rule for image objects.
In this study, we used a vegetation index for Google Earth images (GVI) based on the findings of Meyer and Neto (2008), calculated by Equation (2), in which $R' = R/(R+G+B)$, $G' = G/(R+G+B)$ and $B' = B/(R+G+B)$ are the normalized red, green and blue values. A pixel with a GVI value higher than the selected threshold was identified as vegetation; otherwise, it was identified as non-vegetation. Brightness, defined as the average of the red, green and blue values of a pixel, effectively separates shadow from non-shadow, so a thresholding method was adopted for shadow extraction. Because of the significant brightness difference, the optimal threshold was easily identified after a few threshold tests.
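The pixel-level extraction can be sketched as follows. Because Equation (2) is not reproduced here, this sketch uses the Excess Green index $2G' - R' - B'$, a widely used color vegetation index built from the same normalized coordinates, as a stand-in for GVI; it should not be read as the index used in this study. Both thresholds and all function names are hypothetical placeholders that would be tuned by trial as described above.

```python
import numpy as np

def chromatic_coords(rgb):
    """Normalized chromatic coordinates R', G', B' = band / (R + G + B)."""
    rgb = rgb.astype(np.float64)
    total = rgb.sum(axis=2, keepdims=True)
    total[total == 0] = 1.0  # avoid division by zero on pure-black pixels
    return rgb / total

def vegetation_mask(rgb, threshold=0.1):
    """Pixel-level vegetation mask; Excess Green is a stand-in for Eq. (2)."""
    c = chromatic_coords(rgb)
    exg = 2 * c[..., 1] - c[..., 0] - c[..., 2]
    return exg > threshold

def shadow_mask(rgb, threshold=60.0):
    """Pixel-level shadow mask: brightness (mean of R, G, B) below a threshold."""
    brightness = rgb.astype(np.float64).mean(axis=2)
    return brightness < threshold
```

Both masks are simple per-pixel thresholds, so they cost far less than the CNN inference and can be recomputed quickly while the thresholds are being tested.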
After vegetation and shadow were extracted at the pixel level, a majority rule was applied to produce object-level results: an image segment was assigned to the class covering the largest number of its pixels. The extracted shadow and vegetation were then masked out of the final classification result to refine the extraction.
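The object-level majority rule and the final masking step can be sketched as follows, assuming boolean pixel-level maps for damage, vegetation and shadow and a super-pixel label map numbered from 0. The class codes, the function name, and the rule that shadow takes precedence where both pixel masks fire are illustrative assumptions, not details given in the text.

```python
import numpy as np

OTHER, VEGETATION, SHADOW = 0, 1, 2  # hypothetical class codes

def mask_vegetation_and_shadow(damage, veg, shadow, labels):
    """Object-level majority rule for vegetation/shadow, then mask them out of damage.

    damage, veg, shadow: boolean (H, W) maps; labels: (H, W) super-pixel ids 0..K-1.
    """
    pixel_cls = np.full(labels.shape, OTHER, dtype=np.int64)
    pixel_cls[veg] = VEGETATION
    pixel_cls[shadow] = SHADOW  # assumption: shadow wins where both masks fire
    counts = np.zeros((labels.max() + 1, 3), dtype=np.int64)
    np.add.at(counts, (labels.ravel(), pixel_cls.ravel()), 1)
    segment_cls = counts.argmax(axis=1)[labels]  # class covering most pixels of each segment
    # Keep damage only in segments that are neither vegetation nor shadow.
    return damage & (segment_cls == OTHER)
```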

Accuracy assessment
The accuracy of the proposed method was evaluated by visual comparison. Using visual interpretation of higher-resolution (0.14 m) remote sensing images as reference, we used the overall accuracy (OA), false detection ratio (FDR, also called the mistake rate) and missed detection ratio (MDR, also called the miss rate), derived from the error matrix, to evaluate the extraction results. The visual interpretation results are shown in Figure 5.
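The three measures can be computed from the binary error matrix as sketched below. The text does not give explicit formulas, so the definitions here are assumptions: the false detection ratio is taken as the share of detected damage that is undamaged in the reference, FP/(TP+FP), and the missed detection ratio as the share of reference damage that was not detected, FN/(TP+FN). The function name is hypothetical.

```python
import numpy as np

def detection_scores(pred, truth):
    """Overall accuracy, false detection ratio and missed detection ratio."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    tp = np.sum(pred & truth)    # damaged, detected as damaged
    fp = np.sum(pred & ~truth)   # undamaged, detected as damaged
    fn = np.sum(~pred & truth)   # damaged, missed
    tn = np.sum(~pred & ~truth)  # undamaged, correctly rejected
    oa = (tp + tn) / pred.size
    fdr = fp / (tp + fp) if tp + fp else 0.0
    mdr = fn / (tp + fn) if tp + fn else 0.0
    return float(oa), float(fdr), float(mdr)
```

For example, if 10 of 100 reference pixels are damaged and the detector flags nothing, OA is still 0.9 while the missed detection ratio is 1.0, which illustrates why OA alone can be misleading when damaged pixels are rare.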

Image Segmentation and Results Analysis
Super-pixel segmentation has a great influence on both the collection of damage samples and the optimization of detection results. To illustrate and validate the applicability and advantages of the SLIC algorithm, we compared the SLIC segmentation result with a multiscale segmentation using the Fractal Net Evolution Approach (FNEA). During the multiscale segmentation, typical parameters such as the number of segments and compactness were kept unchanged. The segmentation results are shown in Figure 6.
To detect the smallest damaged buildings in the study area, the average side length of the super-pixels should be smaller than the minimum side length of the damaged regions. After repeated segmentation experiments, we set the number of segments to 8067, and both the optimization and iteration processes were repeated 20 times.
Figure 6. The segmentation results of SLIC (a) and FNEA (b).

To evaluate the results of the different segmentation methods, we used the Average Area (AA), the Standard Deviation of Area (SDA), the Average Length-Width Ratio (ALWR) and the Standard Deviation of Length-Width Ratio (SDLWR) as statistical indices of the size and shape of the super-pixel objects. As shown in Table 1, the SDA and SDLWR of the segments produced by the FNEA algorithm were higher than those produced by the SLIC algorithm. Thus, the SLIC algorithm produces segments of more uniform size and shape.
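The four statistics can be computed from a label map as sketched below. How the length-width ratio was measured is not stated, so this sketch assumes the ratio of the longer to the shorter side of each segment's axis-aligned bounding box, and it uses population standard deviations. The function name is hypothetical.

```python
import numpy as np

def segment_shape_stats(labels):
    """AA, SDA, ALWR and SDLWR of an (H, W) segment label map.

    Length-width ratio is approximated from the axis-aligned bounding box
    (longer side / shorter side); this is an assumption for illustration.
    """
    areas, lwrs = [], []
    for k in np.unique(labels):
        rows, cols = np.nonzero(labels == k)
        areas.append(rows.size)                # area in pixels
        h = rows.max() - rows.min() + 1        # bounding-box height
        w = cols.max() - cols.min() + 1        # bounding-box width
        lwrs.append(max(h, w) / min(h, w))
    areas, lwrs = np.array(areas, float), np.array(lwrs, float)
    return {"AA": areas.mean(), "SDA": areas.std(),
            "ALWR": lwrs.mean(), "SDLWR": lwrs.std()}
```

Lower SDA and SDLWR values indicate segments that are more uniform in size and shape, which is the property compared between SLIC and FNEA in Table 1.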

Damage Detection at Pixel Level:
As discussed in Section 3.1.2, we collected damaged and undamaged samples at three different scales. Using the trained model, damage detection at the pixel level was conducted with a 40 × 40 sliding window. The pixel-level detection results of damaged regions using AlexNet are shown in Figure 7. Using the visual interpretation results as reference, the detection accuracy was evaluated with the overall accuracy, false detection ratio and missed detection ratio.
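Pixel-level detection with a 40 × 40 sliding window can be sketched as below. Here `classify` is a placeholder for the trained AlexNet (any function mapping a window to an integer class, including whatever resizing the network requires); edge padding at the image border and the optional `stride`, where each evaluated window labels a stride × stride block, are assumptions for illustration, and the function name is hypothetical.

```python
import numpy as np

def sliding_window_predict(image, classify, window=40, stride=1):
    """Label each pixel with the class predicted for the window centred on it."""
    h, w = image.shape[:2]
    half = window // 2
    # Edge padding so that windows near the border stay full-sized.
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode="edge")
    out = np.zeros((h, w), dtype=np.int64)
    for r in range(0, h, stride):
        for c in range(0, w, stride):
            patch = padded[r:r + window, c:c + window]  # window around pixel (r, c)
            out[r:r + stride, c:c + stride] = classify(patch)
    return out
```

With `stride=1` every pixel is classified from its own window, which is accurate but costly; a larger stride trades resolution for speed, which the later super-pixel voting can partly compensate for.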
Compared with Figure 5, the detection result basically covered the damaged regions extracted by visual interpretation, although it showed a salt-and-pepper appearance throughout the study areas. As Table 2 shows, the overall accuracy was relatively high, mainly because undamaged pixels greatly outnumbered damaged pixels. The overall accuracy therefore does not adequately reflect the detection accuracy, so the false detection ratio and missed detection ratio were used to evaluate the detection result.

Figure 7. The detection results of damaged regions using AlexNet at pixel level for Area1 (a) and Area2 (b).

In conclusion, the false detection ratio and missed detection ratio were high because of the salt-and-pepper appearance: some small undamaged regions were detected as damaged. Moreover, AlexNet delineates intact buildings poorly, which blurred the boundaries between damaged and undamaged regions.

Note: DA = damaged, UNDA = undamaged, OA = overall accuracy, FDR = false detection ratio, MDR = missed detection ratio.

Damage Optimization at Super-pixel Level:
Considering the poor object delineation of damaged regions, we used the argmax function to classify each super-pixel based on the pixel-level detection results. It should be noted that although the false detection ratio at the pixel level was high, the missed detection ratio was relatively low. This allowed the subsequent damage detection at the super-pixel level to improve the extraction accuracy over the pixel-based approach to a certain extent. After vegetation and shadow were masked out, the optimized results are shown in Figure 8. Detection accuracy was again evaluated using the overall accuracy, false detection ratio and missed detection ratio, as shown in Table 3.
Figure 8. The detection results of damaged regions using AlexNet at super-pixel level for Area1 (a) and Area2 (b).

As seen in Figure 8, the salt-and-pepper appearance was well controlled, and the detected damaged regions basically covered the visual interpretation results. Moreover, the final result achieved better object delineation of damaged and undamaged regions. Comparing Table 2 and Table 3, the false detection ratio declined by nearly 10% while the missed detection ratio remained low. The final optimized results were therefore convincing.

Comparison with Other Methods
To illustrate the advantage of the proposed CSSAD approach, we conducted comparison experiments with the traditional SIFT-BoW approach and a conventional CNN model, both widely used for object detection in recent research. The same samples were used to train all models, and damaged regions were detected using a 40 × 40 sliding window. After vegetation and shadow were masked out, the final detection results are shown in Figure 9, and the detection accuracy is given in Table 4. Only Area 2 was used in the comparison experiment.
As seen in Figure 9(a), there was an obvious salt-and-pepper appearance. The false detection ratio and missed detection ratio were relatively high, indicating that the traditional SIFT-BoW approach was not effective enough for accurately detecting damaged regions.
In contrast, the salt-and-pepper appearance was reduced in Figure 9(b), and the detected damaged regions basically covered the visual interpretation results. However, as Table 4 shows, its detection accuracy was still lower than that of the AlexNet model. Given the difference in network structure, this suggests that the deeper network produced more accurate results. In addition, as Figure 7 shows, detection at the pixel level alone was not accurate.

DISCUSSION
Rapid assessment of building damage can not only provide a reliable reference for emergency response teams, but also provide a basis for the reconstruction of disaster areas after an earthquake. In this paper, we proposed the CSSAD approach to detect damaged regions from VHR images.
Considering the rich spatial information contained in high-resolution images and the complexity of the various damaged geo-objects, selecting the right features is a fundamental task in image classification. Traditional methods focused on common spectral and texture features, which need to be designed empirically through many time-consuming and labor-intensive experiments. Moreover, these low-level features are too limited to recognize damaged buildings with complex characteristics, so high-level features need to be explored for damage detection. In this paper, AlexNet, a commonly used CNN model, was chosen for automatic feature learning from high-resolution images. With the hierarchical structure of the CNN, higher-level image features can be extracted automatically. Moreover, CNNs have shown satisfactory robustness and accuracy in detecting complex targets. However, the traditional CNN model detects target objects at the pixel level and requires input images of a fixed size, which ignores the scale variation of geo-objects and makes accurate detection of damaged regions difficult. To address this, we combined super-pixel segmentation with the AlexNet classification results for both efficient multiscale sample selection and better extraction of object boundaries.
As demonstrated in our experiments, the combination of image objects and deep features is quite effective. First, it relieves users of the time-consuming process of training sample selection and allows better training samples to be chosen for each target class at various scales. Second, the combination of super-pixel segmentation and deep learning provides accurate target localization as well as identification in the multiscale classified images. In addition, the final classification is able to capture various targets because multiscale information is taken into account.

CONCLUSION
Efficiently and accurately acquiring information about damaged buildings after an earthquake is crucial for disaster response and rescue. In this paper, we proposed the CSSAD approach, which combines super-pixel segmentation and deep learning, to detect damaged regions. The results showed that the methodology is suitable for rapid damage mapping.
Considering the limitations of low-level features in damage detection, we used a CNN model to extract high-level features.
Taking the scale variation of damaged regions into account, this paper presents a multiscale sample collection method based on super-pixels. To avoid the salt-and-pepper appearance and achieve better extraction of target boundaries, we conducted the optimization using the argmax function at the super-pixel level. Overall, this study verified that combining OBIA with CNNs is an efficient approach to detecting damaged regions, improving accuracy in both localization and classification. Compared with other conventional methods, CSSAD is simple, practical, and appropriate for rapid damage detection from high-resolution images.