ON THE EFFECT OF DEM QUALITY FOR LANDSLIDE SUSCEPTIBILITY MAPPING

Generating precise and up-to-date landslide susceptibility maps (LSMs) in landslide-prone areas is important for identifying future hazard potential. The data quality and the method selection affect the accuracy of the LSMs. In this context, the accuracy and precision of the digital elevation models (DEMs) used as input are among the most important performance elements. Therefore, the influence of DEM accuracy and spatial resolution in producing LSMs was investigated here. A high accuracy DEM with 5 m grid spacing produced from aerial photographs and the EU-DEM v1.1, freely accessible from the Copernicus Land Monitoring Service with 25 m spatial resolution, were used for producing two different LSMs using the Random Forest (RF) method in this study. The RF method has proven successful for this purpose. A total of eight conditioning factors, which include topographical and geological features, were used as model input. The landslide inventory was derived with the help of aerial stereo images with 20 cm and 30 cm ground sampling distances. The performances of the LSMs were assessed with receiver operating characteristics (ROC) area under curve (AUC) values. In addition, the results were compared by visual inspection. The results show that, although the AUC values obtained from the aerial DEM (0.95) and EU-DEM v1.1 (0.93) were comparable, the LSM obtained from the higher resolution DEM was, based on the visual assessments, more successful in detecting the landslides and thus exhibited better prediction performance.


INTRODUCTION
Landslides are the downward and outward movement of a slope with rock or artificial fill material under the influence of gravity, slope, water and other external forces, and are among the most important natural hazards (IAEG Commission on Landslides, 1990). They can be triggered by other hazards such as earthquakes (Lee and Evangelista, 2006; Karakas et al., 2021a) and heavy rainfalls (Dikshit et al., 2020; Kocaman et al., 2020), as well as by anthropogenic activities (Sevgen et al., 2019; Yanar et al., 2020). The effects of landslides can be far-reaching, including loss of lives and property and damage to roads and transportation, to water, electricity, gas and sewerage infrastructure, and to the landscape and environment. The statistical records published by AFAD (Disaster and Emergency Management Presidency of Turkey) indicate that in total 23,393 landslides occurred in Turkey from 1950 to 2020 (AFAD, 2019). Therefore, considering the losses caused by landslides, it is of major importance to produce landslide susceptibility maps (LSMs) so that human activities can be planned accordingly.
The LSMs are a complex matrix formed by a combination of scale-dependent parameters such as lithology, altitude, slope, aspect and other topographic features (Mahalingam and Olsen, 2016), such as plan and profile curvatures, topographic wetness index (TWI) and stream power index (SPI). These topographic features are often computed by using Digital Elevation Models (DEMs). Therefore, the DEM quality is essential in LSM production, and the height accuracy and the spatial resolution are indicative quality parameters for LSM production studies (Chen et al., 2020; Azeze, 2021). The DEMs obtained with different sensors and techniques can be used for LSM production.
The main purpose of this study was to evaluate the effect of the DEM quality, in particular the spatial resolution, on LSMs. Two DEMs obtained from different sources and having different spatial resolutions were used in the study. The high-resolution DEM (5 m) was produced from aerial stereo images. The lower resolution DEM, the EU-DEM v1.1 with 25 m resolution, is freely available from the Copernicus Land Monitoring Service (CLMS, 2021). It was previously used in another study for LSM production in a large region (Can et al., 2021) and was found adequate for the study purposes. For the LSM production, the random forest (RF) method, which has been frequently used in recent studies and provides high accuracy, was employed. The accuracy of the LSMs was assessed by using model statistical outputs, such as the receiver operating characteristics (ROC) curve and the area under the curve (AUC), and by visual inspection of the output maps by experts.

Study Area and Datasets
The study area is located within the Malatya and Elazig Provinces in Turkey. The site is within the East Anatolian Fault Zone (EAFZ) and has high seismicity and active tectonism (Karakas et al., 2021a). Here, the lithological units have weak shear strength properties; thus, the site is prone to landslides. The site was previously assessed for its landslide inventory (Karakas et al., 2021a) and for the LSM production accuracy of different machine learning (ML) methods (Karakas et al., 2021b, 2022). The location of the study area is illustrated in Figure 1. As can be seen in Figure 2, there are seven lithological units in the study area: alluvium (1), unconsolidated gravel, sand, silt and clay (2), neritic limestone (3), Maden complex (4), magmatic rocks (5), Puturge metamorphites (6) and marble (7).
The aerial photogrammetric products were acquired by the General Directorate of Mapping (GDM), Turkey, during regular mapping campaigns in 2017 and 2018, and as part of a disaster mitigation effort soon after the Elazig Mw 6.8 earthquake of January 24, 2020 (Karakas et al., 2021a), which caused the destruction of buildings and fatalities. These products were used to produce the high resolution DEMs and to prepare the landslide inventory (before and after the earthquake). Since the regular flight campaigns took place in different years for the two provinces, the study area was processed for LSM production in two parts. The total size of the study area is approximately 488 km². The EU-DEM v1.1 was used as the lower resolution DEM. The datasets are described in detail in the following sub-sections.

High Resolution DEMs
Two different sets of aerial stereo images acquired in 2017 and 2018 were used for the generation of the high-resolution DEMs in the study area. The images were taken with 80% and 60% overlaps in forward and lateral directions, respectively, and with 30 cm ground sampling distance (GSD). The interior and exterior orientation parameters were estimated via bundle block adjustment and provide, in principle, a minimum of 22.5 cm planimetric and 30 cm height positioning accuracy based on the adjustment reports (Karakas et al., 2021a). The aerial DEMs were produced as sparse grids with 5 m spacing using Agisoft Metashape Professional (2021). Even though a higher point density was possible, a 5 m grid interval was preferred for the final DEM due to computational limitations. Here, each point in the DEM has a nominal positioning accuracy similar to that in the adjustment report (i.e., 30 cm), since most of the study area is open terrain. The post-earthquake aerial stereo dataset with 20 cm resolution was used for the assessment of the LSM results only. The characteristics of the stereo datasets obtained from two different years are presented in Table 1.
Table 1. The characteristics of the photogrammetric datasets used in the study.

EU-DEM v1.1
The EU-DEM v1.0 was produced within the initiative of European Environment Agency (EEA) member and cooperating countries (EEA, 2021) and is distributed by the Copernicus Land Monitoring Service (CLMS, 2021). The product was obtained by combining the Shuttle Radar Topography Mission (SRTM) and ASTER GDEM datasets with a weighted average method; the EU-DEM v1.1 is an upgrade of the EU-DEM v1.0. It has 25 m spatial resolution and 7 m vertical accuracy (CLMS, 2021). The EU-DEM v1.1 dataset contains 27 tiles. The study area falls into the tile with the ID number E60N20. This tile was clipped to the Malatya and Elazig parts of the study area for use in the subsequent processing steps. It was chosen for this study over the SRTM 30" DEM because of its quality (e.g., higher spatial resolution and elevation accuracy).

Landslide Inventory
The landslide inventory used in the study was identified and manually delineated by experts through visual interpretation of the high-resolution surface models and orthophotos mentioned previously. There were 247 landslide polygons in total in the inventory. The landslides were classified as inactive (Class-1: C1) and active (Class-2: C2) mass movements according to their activity types (Karakas et al., 2021a). While 27% (67) of the landslides in the study area belong to C1, 73% (180) belong to C2. The landslide areas range between 267 m² and 1.82 × 10⁶ m². The landslide inventory of the study area is presented in Figure 1.

Landslide Conditioning Factors
A total of eight conditioning parameters were used in the study. The topographic features, namely altitude, slope, aspect, curvatures (plan and profile), TWI and SPI, were obtained from both DEMs. These features have been used frequently in the literature (e.g., Gokceoglu and Ercanoglu, 2001; Gokceoglu et al., 2005; Nefeslioglu et al., 2012; Sevgen et al., 2019; Can et al., 2021). In addition, the lithological units were used as a conditioning factor. The input features and their data sources are summarized in Table 2. The statistical information of the parameters for the model training area (shown with the red square in Figure 1) and for the landslide polygons in the model training area is given in Tables 3 and 4, respectively.
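The DEM-derived factors listed above can be illustrated with a short sketch. It is a minimal example under stated assumptions, not the processing chain used in the study: slope and aspect are taken from finite differences with NumPy, and TWI and SPI follow the commonly used definitions TWI = ln(a / tan β) and SPI = a · tan β, where the specific catchment area a is assumed to come from a separate flow-accumulation step. The function names, the synthetic 5 m grid and the convention that rows run from north to south are illustrative assumptions.

```python
import numpy as np


def slope_aspect(dem, cell_size):
    """Slope (degrees) and aspect (degrees clockwise from north) of a DEM grid.

    Assumption: row index increases southward, column index increases eastward.
    Flat cells have no meaningful aspect.
    """
    # np.gradient returns the derivative along rows first, then along columns.
    dz_drow, dz_dcol = np.gradient(dem, cell_size)
    slope_deg = np.degrees(np.arctan(np.hypot(dz_dcol, dz_drow)))
    dz_dx = dz_dcol       # elevation change towards the east
    dz_dy = -dz_drow      # elevation change towards the north
    # Aspect is the compass bearing of the steepest descent direction.
    aspect_deg = np.degrees(np.arctan2(-dz_dx, -dz_dy)) % 360.0
    return slope_deg, aspect_deg


def twi_spi(specific_catchment_area, slope_deg, eps=1e-6):
    """Common textbook forms: TWI = ln(a / tan(beta)), SPI = a * tan(beta)."""
    tan_beta = np.tan(np.radians(slope_deg))
    # eps avoids division by zero on flat cells (illustrative choice).
    twi = np.log(specific_catchment_area / np.maximum(tan_beta, eps))
    spi = specific_catchment_area * tan_beta
    return twi, spi


# Synthetic plane rising 1 m per metre towards the east, on a 5 m grid.
dem = np.tile(np.arange(6) * 5.0, (6, 1))
slope, aspect = slope_aspect(dem, cell_size=5.0)

# Hypothetical constant catchment area, standing in for flow accumulation.
area = np.full(dem.shape, 100.0)
twi, spi = twi_spi(area, slope)
```

On this synthetic plane the slope is 45 degrees everywhere and the surface faces west, which gives a quick sanity check of the sign conventions before the functions are applied to a real grid.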

Landslide Susceptibility Mapping Approach
Based on the recent literature, the data-driven ML methods are frequently used for LSM production. Among those, the RF was preferred due to its applicability to different problems and remarkable performance in LSM production. The method uses multiple decision trees for training and prediction. It was proposed by Breiman (2001), and combines bootstrap sampling with random subsampling (Liu et al., 2021). In the RF method, the training set for each tree is selected randomly from the whole sample set. The individually created decision trees are then combined to form an RF.
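The ensemble idea described above can be sketched with scikit-learn, the library named in the paper. This is an illustration on synthetic data, not the model trained in the study: the sample count, the number of trees and the other settings are placeholder assumptions. It shows that the forest's predicted probability is the average of the probabilities predicted by its individual bootstrap-trained trees.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for eight conditioning factors per pixel (assumption).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# bootstrap=True: each tree is fitted on a random resample of the training set.
# max_features="sqrt": each split considers a random subset of the features.
forest = RandomForestClassifier(
    n_estimators=50, bootstrap=True, max_features="sqrt", random_state=0
)
forest.fit(X, y)

# Average the per-tree class probabilities by hand ...
tree_probs = np.stack([tree.predict_proba(X) for tree in forest.estimators_])
mean_tree_prob = tree_probs.mean(axis=0)

# ... and compare with the probability reported by the forest itself.
forest_prob = forest.predict_proba(X)
```

The second column of `forest_prob` plays the role of the landslide probability that is later mapped as susceptibility.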
In order to obtain accurate and reliable results, it is important to select the appropriate parameters in the model. For hyperparameter optimization of the model, the random search method, which is a more effective method than the grid search (RandomizedSearchCV, 2021; Bergstra, 2012), was utilized in this study. The optimization process was performed for both LSM predictions with the two input DEMs. The values obtained in both optimization processes were similar and were jointly applied (Table 5). The algorithms were implemented by using the scikit-learn library (Scikit-learn, 2021) in the Python environment.
The area denoted with the red square in Figure 1 was used for model training and validation. The tests were performed by using the landslide data outside the red square. The landslide polygons within the region inside the red square were used as landslide samples in the model training. The non-landslide samples were randomly selected from the areas outside the landslide polygons within the same red square. For the model training and validation, the samples were split as 80/20. The ratio of landslide to non-landslide samples was selected as 1:1.5. The landslide and non-landslide pixel counts are summarized in Table 6. With the eight input features, a total of 280,424 pixels consisting of landslide and non-landslide pixels were used.
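The sampling and tuning steps described above can be sketched as follows. This is a minimal example on synthetic data: the 1:1.5 ratio and the 80/20 split follow the text, whereas the sample counts, the search space, the number of search iterations, the stratified split and the AUC scoring are illustrative assumptions and do not reproduce the values reported in Table 5.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for pixels with eight features each (assumption).
n_landslide = 400
landslide = rng.normal(1.0, 1.0, size=(n_landslide, 8))
candidates = rng.normal(0.0, 1.0, size=(5000, 8))  # pixels outside polygons

# Randomly draw non-landslide samples at a 1:1.5 ratio, as in the text.
n_non = int(1.5 * n_landslide)
picked = rng.choice(len(candidates), size=n_non, replace=False)
non_landslide = candidates[picked]

X = np.vstack([landslide, non_landslide])
y = np.concatenate([np.ones(n_landslide, dtype=int), np.zeros(n_non, dtype=int)])

# 80/20 split for training and validation (stratification is an assumption).
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical search space; random search samples n_iter combinations from it.
param_distributions = {
    "n_estimators": [25, 50, 100],
    "max_depth": [None, 5, 10, 20],
    "min_samples_leaf": [1, 2, 5],
    "max_features": ["sqrt", "log2"],
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=5,
    cv=3,
    scoring="roc_auc",
    random_state=0,
)
search.fit(X_train, y_train)
print(search.best_params_)
```

Random search evaluates a fixed number of sampled combinations rather than the full grid, which keeps the cost under control as the search space grows.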

Accuracy Assessment and Validation
The performances of the RF classifiers were evaluated by using the test dataset. The predictive performances of the models were investigated with the ROC curve using the AUC value. A visual comparison of the result maps obtained using datasets with different resolutions was also carried out based on the part of the landslide inventory that was not used in model training. In addition, the importance of the input features in model prediction was assessed by feature importance based on mean decrease in impurity (MDI). Here, the feature importances were computed as the mean and standard deviation of the accumulated impurity decrease within each tree (Scikit-learn, 2021). The importance of each feature was obtained as the sum over the splits that include the feature, in proportion to the number of samples they split. This method tends to rank numerical features as the most important features. Finally, the values in the final LSMs were classified into five classes with equal interval probabilities in the ArcGIS software from ESRI.
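The two evaluation measures described above can be sketched with scikit-learn. The example runs on synthetic data, so the numbers it prints have no relation to the results of the study; the feature names are attached to random columns purely for illustration, and the split and forest settings are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Names of the eight conditioning factors, attached to synthetic columns.
feature_names = [
    "altitude", "slope", "aspect", "plan curvature",
    "profile curvature", "TWI", "SPI", "lithology",
]

X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# AUC from the predicted probability of the positive (landslide) class.
auc = roc_auc_score(y_test, forest.predict_proba(X_test)[:, 1])

# MDI: mean over trees (normalised to sum to 1) and spread across trees.
mdi_mean = forest.feature_importances_
mdi_std = np.std(
    [tree.feature_importances_ for tree in forest.estimators_], axis=0
)

ranking = sorted(
    zip(feature_names, mdi_mean, mdi_std), key=lambda item: item[1], reverse=True
)
print(f"AUC = {auc:.3f}")
for name, mean, std in ranking:
    print(f"{name:18s} {100 * mean:5.1f}% (std {100 * std:.1f})")
```

Because MDI is computed on the training data and favours features with many distinct values, it is often cross-checked with permutation importance on held-out samples.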

RESULTS AND DISCUSSIONS
Here, the influences of both DEMs, i.e., the high-resolution aerial DEM and the EU-DEM v1.1, were assessed based on the statistical results obtained from the RF classifier, the feature importance analyses, and the visual assessment of the landslides outside the model training region. Figure 3 shows the predictive performance results for the two datasets with different DEMs. The AUC values were equal to 0.93 and 0.95 for the EU-DEM v1.1 and the aerial high-resolution DEM, respectively. In addition, the overall accuracy values of the models were 85% and 87% for the EU-DEM and the high-resolution DEM, respectively. Based on the statistical results, it was observed that the high-resolution DEM data exhibited slightly higher prediction performance.

Importance of Predictor Features
The relationships between the model predictions and predictor features are shown in the bar plots in Figure 4(a-b) for both datasets. The horizontal axis in the bar plot shows the mean decrease in impurity value of each predictor feature given on the vertical axis. The importance of a feature basically refers to how much a feature is used in each tree of the forest. In Figure 4, the eight predictor features are ranked by feature importance measures based on MDI. The predictor feature with the higher percentage value has more importance in model estimation. Therefore, it can be observed that the altitude (22%), aspect (19.2%), slope (18.8%) and lithology (15.9%) features were found to be more important than the other features (Figure 4a).

The LSM Results and Visual Comparisons
The LSM results produced from both datasets are presented in Figure 5(a-b). The predicted landslide probability values were classified in five groups with equal interval probabilities. The graphs of the landslide probability distributions for all five classes are presented in Figure 6(a-b) for the two sub-regions of the study area. The comparison results in the study provide evidence on how DEM quality and resolution affect the LSM results both qualitatively and quantitatively. In Figure 6, it can be interpreted that the landslides in the study area can be defined better in the 5 m resolution DEM compared to the 25 m resolution EU-DEM v1.1. According to the maps given in Figure 5, it can be said that both models produce similar spatial patterns. The similarities in histogram distributions of the probability values denoted in five classes (Figure 6) also confirm this result. However, with the higher resolution DEM, it was possible to obtain a more detailed LSM, which can be useful for spatial planning. Although similar statistical values (i.e., AUC values) were obtained from both datasets, the map obtained by using the EU-DEM remains a coarser output. Based on the results, it can also be said that the larger landslide regions could be identified from both datasets successfully. On the other hand, the smaller activities could be detected better with the higher resolution (and accuracy) aerial DEM.
A number of landslides are presented at a larger scale in Figures 7 and 8 for visual comparison. In Figure 7, the landslides were selected from the model training region, whereas in Figure 8 only landslides in the test region are shown. Again, as can be seen in the figures, relatively smaller landslides can be better identified in the results obtained with the high-resolution DEM (5 m). Details may be compromised at low resolution (25 m). When the LSMs in areas not seen by the model (i.e., the test area) were analyzed, it was seen that the results obtained with the 5 m DEM are more sensitive (see Figure 8c-d). These findings highlight that higher spatial resolution and accuracy are favourable for LSM production.
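The five-class grouping of the predicted probabilities can be sketched with NumPy. The study performed this step in ArcGIS; the version here is only an approximation that assumes equal intervals over the full probability range [0, 1], and the function name and class labels are hypothetical.

```python
import numpy as np


def classify_equal_interval(prob, n_classes=5):
    """Map probabilities in [0, 1] to classes 1..n_classes of equal width."""
    edges = np.linspace(0.0, 1.0, n_classes + 1)
    # Interior edges only (0.2, 0.4, 0.6, 0.8 for five classes); a value equal
    # to an interior edge falls into the higher class, and 1.0 into the top one.
    return np.digitize(prob, edges[1:-1]) + 1


# Hypothetical class labels, from lowest to highest susceptibility.
labels = ["very low", "low", "moderate", "high", "very high"]

prob = np.array([0.05, 0.25, 0.50, 0.75, 0.99])
classes = classify_equal_interval(prob)
named = [labels[c - 1] for c in classes]
```

For the sample values above, each probability falls in a different class, from "very low" for 0.05 to "very high" for 0.99.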

CONCLUSIONS AND FUTURE WORK
In this study, the effect of DEMs with different spatial resolutions and height accuracy for landslide susceptibility mapping in a landslide-prone area was investigated. For this purpose, the DEMs produced from very high-resolution aerial photographs with 30 cm GSD and EU-DEM v1.1 with 25 m grid spacing were used. In order to compare the influence of the DEM resolutions, the LSMs were produced by the RF method.
In the predictive performance results, the RF method yielded AUC values of 0.95 and 0.93 for the high resolution DEM and the EU-DEM v1.1, respectively. In the visual comparisons, the importance of the DEM resolution and accuracy could be observed better. A DEM resolution and a method suited to the study area and the study purposes are important for achieving reliable and accurate results. In addition, an accurate landslide inventory extracted from high-resolution data is important for more accurate susceptibility maps, as even small-sized landslides can be determined from these data.
As future work, apart from the DEM source and resolution, the other factors that have an impact on the accuracy and reliability of LSMs will be evaluated. These include the LSM method and its parameters, the input features, and the landslide inventory quality.