A CNN-BASED FLOOD MAPPING APPROACH USING SENTINEL-1 DATA

The adverse effects of flood events have been increasing worldwide due to their increasing frequency and severity, driven by urbanization and population growth. All-weather sensors, such as satellite synthetic aperture radars (SAR), enable the detection of the extent and the analysis of the magnitude of such events under cloudy atmospheric conditions. The Sentinel-1 satellites of the European Space Agency (ESA) facilitate such studies thanks to the free distribution of the data, the regular data acquisition scheme, and the availability of open source software. However, the size and the nature of SAR data make visual interpretation and processing difficult. Supervised machine learning algorithms have increasingly been used for automatic flood extent mapping. However, the use of Convolutional Neural Networks (CNNs) for this purpose is relatively new and requires further investigation. In this study, the U-Net architecture was employed for multi-class segmentation of flooded areas and flooded vegetation, using Sentinel-1 SAR data and altitude information as input. The training data were produced by an automatic thresholding approach using the OTSU method in Sardoba, Uzbekistan and Sagaing, Myanmar. The results were validated in Ordu, Turkey and in Ca River, Vietnam by visual comparison with previously produced flood maps. The results show that CNNs have great potential in classifying flooded areas and flooded vegetation even when trained in areas with different geographical settings. The F1 scores obtained in the study for the flood and flooded vegetation classes were 0.91 and 0.85, respectively.


INTRODUCTION
Floods are among the most common and destructive natural hazards, causing social and economic disruption as well as loss of human lives. Besides the floods caused by heavy rainfall, coastal floods and rapidly melting snow and ice in mountainous areas should also be taken into account as hazard triggering factors. In addition, dam flooding can occur after sudden and heavy rainfall and/or due to infrastructure failure. Although the triggering factors differ, a quick assessment of a flood event followed by a quick response is important in many respects.
Flood extent mapping studies utilize spectral information from optical remote sensing data, synthetic aperture radar (SAR) data, or a combination of the two (Shen et al., 2019a). Although data from optical sensors have long been used for flood monitoring, they have significant limitations in flood assessment studies due to atmospheric conditions, e.g., cloud cover, and their inability to provide data at night (Clement et al., 2018). SAR sensors are a valuable data source for detecting and monitoring floods, as they can provide data in all weather conditions and also at night (Manavalan, 2017). On the other hand, water-like surfaces such as shadows, the speckle effect, and the geometric correction required for SAR data can be limiting factors in flood mapping studies.
Studies aimed at detecting floods from SAR data began to appear in the literature in the 1980s and have developed since then (Lowry et al., 1981). Thanks to the growing number of SAR sensors and advances in remote sensing and computer vision algorithms in recent years, SAR data have been widely used in flood extent mapping and magnitude analysis. Various methods have been used for this purpose in the literature. These can be briefly listed as visual interpretation (Oberstadler et al., 1999), manual and automatic histogram thresholding (Nakmuenwai et al., 2017), supervised classification (Pulvirenti et al., 2013; Tavus et al., 2019, 2020, 2021), automatic segmentation, region growing (Matgen et al., 2011), fuzzy logic (Twele et al., 2016), change detection (Giustarini et al., 2012; Zhao et al., 2019), combined thresholding and change detection, and interferometric SAR coherence (Chini et al., 2019; Li et al., 2019; Pelich et al., 2021).
Recently, there have been significant advancements in supervised machine learning (ML) algorithms, especially in deep learning (DL) methods and Convolutional Neural Networks (CNNs) (Jia et al., 2014). Unlike pixel-based learning approaches, CNNs can take advantage of the spatial structure of the target segment. In addition, their automatic feature representation partitions the feature space while reducing the uncertainties in the data. Owing to these properties, CNNs have been applied successfully to flood mapping in recent years, as in many other application areas. Gebrehiwot et al. (2019) investigated the potential of CNNs to detect floods from high-resolution unmanned aerial vehicle (UAV) images. The study, which used a VGG-based fully convolutional network (FCN-16s), emphasized that the network could successfully detect the flooded regions in the images in comparison to conventional classification methods, such as FCNs and support vector machines (SVMs). Nemni et al. (2020) designed a CNN-based approach for extracting flooded areas from Sentinel-1 SAR data. In that study, the flood masks were created with classical semi-automatic techniques, manual cleaning, and visual inspection, and various CNN architectures were investigated. The methodology significantly reduced the time needed to produce the flood maps, and the CNNs achieved F1 scores of 91% and 92% over the test dataset. Peng et al. (2019) proposed two different CNNs (PSNet-v1 and PSNet-v2) to predict the similarity between PlanetScope multispectral images with 3 m spatial resolution acquired before and after flooding. The architectures achieved F1 scores of approximately 89% and 95% for 2017 Hurricane Harvey and 2018 Hurricane Florence, respectively. Similarly, Potnis et al. (2019) proposed an encoder-decoder neural network (NN) based on the Efficient Residual Factorized ConvNet (ERFNet) for multi-class segmentation to analyse urban floods from WorldView-2 data with 2 m spatial resolution. The ERFNet architecture provided an average Intersection over Union (IoU) score of 0.484 and an overall accuracy of 87%, showing promising results for urban flood assessment with optical satellite images. Rambour et al. (2020) introduced the SEN12-FLOOD dataset, which contains co-registered Sentinel-1 and Sentinel-2 images for flood detection, and used the ResNet-50 network for flood mapping. With this state-of-the-art network, the accuracy achieved with the SAR data was 75%, while the combination of RGB and SAR data provided 90% overall accuracy. Bonafilia et al. (2020) introduced the Sen1Floods11 dataset with Sentinel-1 data covering permanent water and flood water. Permanent water and flood water surfaces were segmented using fully convolutional neural networks (FCNNs). The results indicated that radar data with DL models can outperform threshold-based algorithms for flood detection. In addition, training data with automatic labels obtained from optical images yielded higher accuracy than the scarce hand-labelled data. Konapala et al. (2021) investigated the potential of combining data from Sen1Floods11 (Sentinel-1 and Sentinel-2) with Shuttle Radar Topography Mission (SRTM) data for accurate flood detection. Evaluating the methodology with K-fold cross-validation and a U-Net CNN, they obtained a median F1 score of 0.62 when only radar data were employed, and an F1 score of 0.73 when Sentinel-1 data and altitude information were used together.
The literature review makes clear that Sentinel-1 data have great potential in flood mapping, but also have limitations in comparison to optical data due to the nature of flood events. On the other hand, CNNs have been successfully used in many applications. Here, we applied a modified version of the U-Net architecture for multi-class segmentation to Sentinel-1 and SRTM data with 30 m resolution for accurate flood mapping. At the same time, we focused on further exploring the potential of SAR data in identifying flood and flooded vegetation areas. In this paper, we present and discuss the initial results of the study.
In Section 2, the datasets used here, the pre-processing steps, label/mask generation, and U-Net architecture are explained. Section 3 presents the multi-class segmentation results and their accuracy metrics. Finally, the conclusions of the study and future work are presented and discussed in Section 4.

MATERIALS AND METHODS
In this section, an overall methodological workflow, the study area and the datasets, the details of the CNN architecture and the validation approach are explained.

Overall Methodological Workflow
The overall methodological workflow of the study is given in Figure 1. The study sites are Ordu, Turkey; Sagaing, Myanmar; Ca River, Vietnam; and Sardoba, Uzbekistan (Figure 2). The sites were selected based on the availability of test data and the occurrence of recent major flood events. The input data include pre- and post-event Sentinel-1 (S1) data and the elevation information from SRTM. A number of preprocessing methods were applied to the S1 data to obtain the polarization information, to reduce the noise, and to remove systematic errors caused by the terrain. The input features used in the CNN architecture thus include the S1 polarization data and the SRTM data. For the model training and validation, masks for the flood and flooded vegetation classes were produced using a stepwise automatic thresholding approach with OTSU. The flood maps were produced with the CNN model and an accuracy assessment was performed using the test data in Ordu and Sagaing. Further details are explained in the following subsections.

Datasets
Here, Sentinel-1A C-band Interferometric Wide (IW) swath mode and Level 1 ground range detected (GRD) products were utilized. The products have vertical (V) and horizontal (H) polarization (i.e., VV+VH) information with a ground sampling distance (GSD) of 10 m. Datasets for each area were obtained from the ESA Copernicus Programme (Copernicus, 2020). The S1 data used in the study were chosen based on the acquisition dates considering the flood occurrence (before and after flood). The characteristics of the study data and the ground conditions, such as wet or dry, are summarized in Table 1.

Feature Preparation Workflow
The input features involved in the CNN architecture include the VV and VH polarizations and the SRTM digital elevation model (DEM). The multi-class segmentation approach classifies each pixel as non-flood, flood, or flooded vegetation. In order to determine the flooded area (FL) and flooded vegetation (FV) classes, and thus to form the mask pixels to be utilized in the CNN architecture, the processing steps given in Figure 3 were applied.
The mask data to be used in the model training phase were produced with the approach listed below. This approach is based on the data preparation stages that are part of the work carried out by Nemni et al. (2020). The main difference here is that the threshold values for the classes are obtained automatically from the Multi-OTSU threshold algorithm instead of being determined manually (Liao et al., 2001). In order to generate labelled mask data for model training, the following steps were applied to the data from the Sardoba and Sagaing regions, denoted as DS 1-4 in Table 1.
• Before using the S1 images, the necessary preprocessing steps, such as radiometric correction, speckle filtering, and orthorectification, were applied. Details on these processes can be found in Tavus et al. (2021).
• The OTSU threshold method was applied to each image (pre- and post-event VV and VH) in order to determine the flood pixels.
• Based on the flood-induced change in the field, difference VV and VH images were produced by taking the differences between the thresholded pre- and post-event VV and VH data. At this stage, VV and VH flood masks were obtained, with the value 1 representing flood and 0 representing the background. Afterwards, the pixels labelled as 1 in both masks were recorded as the flood mask.
• In order to determine the FV pixels, the FL pixels were removed from the difference VV and VH images by applying the flood mask produced in the previous step.
• The FV pixels were produced by applying the OTSU threshold method to the difference images, which no longer contain flood pixels.
• As in the generation of the FL pixels, the final FV mask was produced by taking the overlapping pixels of the VV and VH vegetation pixels. In this mask, 0 represents the background while the label 2 represents the flooded vegetation.
• Majority filters were applied to the FL and FV masks, which were then combined into a single mask. Finally, morphological opening followed by closing was applied in order to remove the elements that could not be removed by the majority filter, such as holes, noise, and borders remaining in the combined mask.
In Figure 4, the FL, FV, and merged masks of the Sardoba study area are given.
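The mask generation steps listed above can be illustrated with a minimal NumPy sketch. It is not the implementation used in the study: the function names are hypothetical, the Otsu threshold is a plain single-threshold variant written out by hand, water is assumed to appear as low backscatter, flooded vegetation is assumed to appear as a backscatter increase in the continuous post-minus-pre difference image, and the majority and morphological filters are omitted.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Upper edge of the histogram bin that maximizes Otsu's between-class variance."""
    hist, edges = np.histogram(np.ravel(values), bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(hist).astype(float)        # pixel count at or below each bin
    w1 = w0[-1] - w0                          # pixel count above each bin
    s0 = np.cumsum(hist * centers)            # cumulative intensity sum
    m0 = s0 / np.maximum(w0, 1.0)             # mean of the lower class
    m1 = (s0[-1] - s0) / np.maximum(w1, 1.0)  # mean of the upper class
    between = w0 * w1 * (m0 - m1) ** 2
    return edges[1:][np.argmax(between)]

def band_flood_mask(pre, post):
    """Pixels that are water-like after the event but not before it (one polarization)."""
    # Assumption: water corresponds to the low-backscatter class.
    water_pre = pre <= otsu_threshold(pre)
    water_post = post <= otsu_threshold(post)
    return water_post & ~water_pre

def band_fv_mask(pre, post, flood):
    """Pixels with a strong backscatter increase outside the flood mask (one polarization)."""
    # Assumption: the difference image is the continuous post-minus-pre change.
    diff = post - pre
    threshold = otsu_threshold(diff[~flood])  # flood pixels are excluded first
    return (diff > threshold) & ~flood

def build_mask(pre_vv, post_vv, pre_vh, post_vh):
    """Combine the per-band masks: 0 = background, 1 = FL, 2 = FV."""
    fl = band_flood_mask(pre_vv, post_vv) & band_flood_mask(pre_vh, post_vh)
    fv = band_fv_mask(pre_vv, post_vv, fl) & band_fv_mask(pre_vh, post_vh, fl)
    mask = np.zeros(pre_vv.shape, dtype=np.uint8)
    mask[fl] = 1
    mask[fv] = 2
    return mask
```

Requiring agreement between the VV and VH masks, as in the steps above, keeps only the pixels that both polarizations flag, which trades some recall for fewer false detections.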
Input data for all regions were created by stacking the pre- and post-event VV and VH data, which were produced by the preprocessing at the beginning of the mask production stage, together with the SRTM DEM data corresponding to the area, and then compressing the stack to 8-bit. Finally, the data preparation process was completed by arranging the 5-band input data as 256 × 256 × 5 and the mask data representing 3 classes as 256 × 256 × 1. As a result of this process, a total of 1086 images were generated and randomly allocated to the train, validation, and test datasets, with sample percentages of 72%, 18%, and 10%, respectively (Table 2).
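The data preparation described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of the study: the 8-bit compression is assumed to be a per-band min-max scaling, the patches are assumed to be non-overlapping, and all function names are hypothetical.

```python
import numpy as np

def to_uint8(band):
    """Min-max scale one band to 0..255 (assumed form of the 8-bit compression)."""
    band = band.astype(np.float64)
    lo, hi = band.min(), band.max()
    if hi == lo:
        return np.zeros(band.shape, dtype=np.uint8)
    return np.round(255.0 * (band - lo) / (hi - lo)).astype(np.uint8)

def stack_and_tile(bands, mask, tile=256):
    """Stack bands to H x W x C and cut non-overlapping tile x tile patches."""
    stack = np.stack([to_uint8(b) for b in bands], axis=-1)
    rows, cols = mask.shape
    images, labels = [], []
    for r in range(0, rows - tile + 1, tile):
        for c in range(0, cols - tile + 1, tile):
            images.append(stack[r:r + tile, c:c + tile, :])
            labels.append(mask[r:r + tile, c:c + tile, np.newaxis])
    return np.array(images), np.array(labels)

def split_indices(n, fractions=(0.72, 0.18, 0.10), seed=0):
    """Randomly split range(n) into train, validation, and test index arrays."""
    order = np.random.default_rng(seed).permutation(n)
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    # The test set receives whatever remains after train and validation.
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]
```

Calling `stack_and_tile([pre_vv, pre_vh, post_vv, post_vh, dem], mask)` would return arrays of shape N × 256 × 256 × 5 and N × 256 × 256 × 1, which `split_indices` can then partition.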

CNN Architecture for Pixel-based Classification
In this study, a modified version of the U-Net architecture was used for the class segmentation task. The modifications consisted of using the ResNet-50 model in the encoder part, and removing the upsampling layers in the decoder part and replacing them with transposed convolution layers (He et al., 2015). The input images included 5 channels, and no pre-trained weights were available (Figure 5). Therefore, all layers were initialized with the Glorot uniform initializer (Hanin and Rolnick, 2018). Table 3 shows the model configurations.
ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume V-3-2022 XXIV ISPRS Congress (2022 edition), 6-11 June 2022, Nice, France
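Two of the ingredients named above can be made concrete with a short NumPy sketch. The Glorot uniform initializer draws weights from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)); the example applies it to a 7 × 7 kernel with 64 filters, as in the first convolution of the standard ResNet-50, but with 5 input bands. The kernel size and stride of the transposed convolutions are not given in the text, so the values used in the helper call are assumptions, and both function names are hypothetical.

```python
import numpy as np

def glorot_uniform(shape, seed=0):
    """Sample a kernel of shape (kh, kw, in_ch, out_ch) from U(-limit, limit)."""
    kh, kw, in_ch, out_ch = shape
    fan_in = kh * kw * in_ch     # inputs feeding one output unit
    fan_out = kh * kw * out_ch   # outputs fed by one input unit
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    rng = np.random.default_rng(seed)
    return rng.uniform(-limit, limit, size=shape), limit

def transposed_conv_output_size(size, kernel, stride, padding=0):
    """Spatial output size of a transposed convolution without output padding."""
    return (size - 1) * stride - 2 * padding + kernel

# First encoder convolution with 5 input bands instead of 3 RGB channels.
kernel, limit = glorot_uniform((7, 7, 5, 64))

# Assumed decoder setting: a 2 x 2 kernel with stride 2 doubles the resolution.
upsampled = transposed_conv_output_size(128, kernel=2, stride=2)
```

Unlike a fixed upsampling layer, a transposed convolution has trainable weights, so the decoder learns how to interpolate the feature maps.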

RESULTS AND DISCUSSIONS
Here, the statistical results obtained from the CNN predictor are presented and map results from Sardoba, Ordu and Ca River are discussed. The model trained with the data of Sardoba and Sagaing regions has been tested in the Ordu and Vietnam areas.

CNN Model Accuracy
A combination of Categorical Cross Entropy and the Dice index was used as the loss function, and the Adam method was used as the optimizer. The F1-score was used as the accuracy metric for the performance evaluation. Table 4 shows the model results obtained from the model training and validation samples. Table 5 shows the F1-score of each class calculated using the model predictions obtained from the test data.
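The loss described above can be sketched in NumPy as follows. The text does not state how the two terms are combined, so an unweighted sum is assumed here; the soft Dice formulation, the smoothing constant, and the function names are likewise assumptions.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean cross entropy; y_true is one-hot, y_prob holds class probabilities."""
    y_prob = np.clip(y_prob, eps, 1.0)
    return float(-np.mean(np.sum(y_true * np.log(y_prob), axis=-1)))

def dice_loss(y_true, y_prob, eps=1e-7):
    """One minus the soft Dice coefficient averaged over the classes."""
    axes = tuple(range(y_true.ndim - 1))  # sum over all axes except the class axis
    intersection = np.sum(y_true * y_prob, axis=axes)
    totals = np.sum(y_true, axis=axes) + np.sum(y_prob, axis=axes)
    dice = (2.0 * intersection + eps) / (totals + eps)
    return float(1.0 - np.mean(dice))

def combined_loss(y_true, y_prob):
    """Assumed combination: unweighted sum of the two terms."""
    return categorical_cross_entropy(y_true, y_prob) + dice_loss(y_true, y_prob)
```

The cross entropy term penalizes every pixel equally, while the Dice term is computed per class, which gives small classes such as FV more influence on the loss than their pixel count alone would.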
The results presented in Table 4 show that the training and validation accuracies are similar. However, the test results presented in Table 5 for the individual classes are better. The FL class could be predicted with a higher accuracy than the FV class. The FL class prediction performance is comparable with the results of recent CNN-based flood mapping studies in the literature.
Table 5. F1-Score of each class in the test data.
Figure 6 shows the test data parts used as ground truth and the model predictions in the Sardoba site. It must be emphasized that the prediction results were obtained from the test split, which was 10% of all samples.
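A per-class F1-score of the kind reported for the test data can be computed from label maps as in the sketch below. The class codes 0, 1, and 2 follow the mask convention described earlier (background, FL, FV); the function name is hypothetical, and whether the study aggregates over pixels or over images is not stated, so pixel-level aggregation is assumed.

```python
import numpy as np

def f1_per_class(y_true, y_pred, classes=(0, 1, 2)):
    """Pixel-level F1 = 2TP / (2TP + FP + FN) for each class label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = {}
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        # A class absent from both maps has no defined score.
        scores[c] = float(2 * tp / denom) if denom else float("nan")
    return scores
```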

Ordu, Turkey Test Site Results
The flood maps of the Ordu and Ca River (DS 5-8) test sites were produced with the model trained in the Sardoba and Sagaing (DS 1-4) regions. The Ordu test site was analyzed for the subareas of Terme and the Yesilirmak River region of Samsun Province. Both provinces are located in the northern part of Turkey, in the Black Sea Region. The study area is complex due to its rugged topography and mixed land cover with inland water bodies (i.e., streams and rivers), urban settlements, open terrain, and agricultural and dense forest areas. In the research conducted by Kocaman et al. (2020), the Ordu flood map was generated from Sentinel-1 and Sentinel-2 data with the Random Forest (RF) classification algorithm. In Figure 7, the FL and FV pixels produced by the OTSU threshold algorithm, the CNN, and RF are shown together with zoom-in views. Please note that the FV and FL classes were merged into one class in the RF results and marked as flooded area in Figure 7.
Figure 7 shows the results around the Yesilirmak River. As can be seen from the RF result, the river is densely surrounded by agricultural and forest lands. A comparison of the CNN and RF results shows that the model also labelled the pixels around the river as FV (Figure 7), which indicates the high prediction performance of the proposed method.
The results of Terme, another subarea of Ordu site, are shown in Figure 8. In this subarea, although the OTSU method could not produce useful outputs, the CNN results were similar to the RF results, which employed optical data as input feature as well. Again, the FV and the FL classes were represented with a single class (flood area) in the RF results in Figure 8.

Ca River, Vietnam Test Site Results
Another test was carried out on the data of the flood event that occurred in Vietnam on September 6, 2019. This flood event reached a much larger extent than the one in the Ordu test area and spread over relatively smooth topography.
As there is no previous study or external reference data for this area, the CNN results were visually compared with data from the Flood Mapping Tool (FMT) published by Hamid Mahmood (2022) (Figure 9).

CONCLUSIONS AND FUTURE WORK
In the present study, a CNN architecture was proposed for the mapping of flooded areas (FL) and flooded vegetation (FV) from Sentinel-1 data and SRTM DEM. Four different test sites with major flood events, i.e., Ordu, Turkey; Ca River, Vietnam; Sardoba, Uzbekistan; and Sagaing, Myanmar, were utilized for this purpose. While the model training dataset was produced from the Sardoba and Sagaing test sites, further evaluations were carried out by using external references in Ordu and Ca River. The training dataset was split into train (72%), validation (18%), and test (10%) samples. The U-Net architecture with a ResNet-50 backbone was implemented for the multi-class segmentation.
The results show that the F1 scores obtained from the test samples were 0.91 for the FL and 0.85 for the FV classes. The present study is the first one to detect the FV class with a CNN-based classifier. The visual assessments carried out in Ordu and Ca River also show the high quality of the output of the method. The results from Terme, a subarea of the Ordu site, also show that SAR data have potential for the detection of floods in urban areas.
On the other hand, the Ordu site has rugged topography, which indicates that the use of SRTM DEM as input feature can also be recommended for accurate flood mapping in such areas.
As future work, it is planned to improve the results with higher resolution SAR data and further tuning of the proposed methodology. As an example, data augmentation techniques for SAR data can be employed in order to investigate the influence of such techniques on small datasets. Fine tuning can also be applied to the CNN model trained in this study to assess its performance on different flood areas in the world. In addition, modifying the CNN architecture to utilize features from both SAR and optical sensors may improve the overall results.
Furthermore, different deep learning architectures for segmentation, such as LinkNet and PSPNet, can also be combined with different encoders (e.g., SE-ResNeXt50, seresnet34) in order to assess the impact of the backbones and architectures on the results using the same dataset. Finally, the proposed approach can be applied to more datasets at different geographical locations having diverse characteristics for further validation.