PREDICTION OF DEFORMATION CAUSED BY LANDSLIDES BASED ON GRAPH CONVOLUTION NETWORKS ALGORITHM AND DInSAR TECHNIQUE

Landslides are among the greatest threats worldwide to human life, property, infrastructure, and natural environments. Despite extensive research on the spatiotemporal dependence of landslide displacements, the factors that control the distribution of displacement within a landslide remain poorly understood because they vary significantly. This paper implements a Graph Convolutional Network (GCN) to predict displacement at the Moio della Civitella landslide in southern Italy and to identify the factors that may affect the distribution of movement. Permanent scatterer interferometry (PSI), an interferometric technique based on Synthetic Aperture Radar (SAR) satellite imagery, is used to derive permanent scatterer (PS) points that represent the deformation of the landslide. A GCN regression model is applied to the PS points and to data reflecting geological and geomorphological factors in order to extract the interdependency between paired data points, resulting in an adjacency matrix built from correlation distances in the interval [0, 0.8). The proposed model outperforms conventional machine learning and deep learning algorithms such as linear regression (LR), K-nearest neighbors (KNN), support vector regression (SVR), decision tree, lasso, and artificial neural network (ANN). The proposed model is evaluated by the absolute error between the actual and predicted deformation, which is less than 2 millimeters for most test set points.


INTRODUCTION
Landslides are among the most common geological hazards in the world. They frequently destroy the structures and infrastructure of villages and towns, endangering residents and causing significant property damage (Miele, 2021), (Del Soldato, 2019). Industry and academic institutions have paid a great deal of attention to monitoring and predicting such disasters (Bozzano, 2011), (Gao, 2022). Moio della Civitella (Salerno Province) is among the sites with the greatest concentration of landslides in the world, and its urban settlement has been damaged by them (Infante, 2019), (Di Martire, 2015). Several factors make it difficult to predict landslides and to map them under settlement cover, including slow movement, human intervention, and neglect of the interdependence between geological and geomorphological features when they are fed as inputs to Machine Learning and Deep Learning Algorithms (MLA & DLA) (Di Luzio, 2022). Analyzing and predicting geological hazards can mitigate these severe effects. Displacement through time is the most valuable dataset for evaluating and predicting the progression of future landslides (Jiang, 2021).
Satellite remote sensing data have overcome many of the challenges associated with detecting and monitoring urban landslides. In recent years, synthetic aperture radar imaging has been widely applied in this context to provide multi-temporal maps of deformation rates, which can be used to identify landslides under settlement cover (Macchiarulo, 2021), (Costantini, 2017). To this end, SAR satellite data (X-band imagery acquired by the COSMO-SkyMed mission) were analyzed with the Differential Interferometric SAR (DInSAR) technique, which was used to identify targets representing landslide deformation over the urban settlement of the case study.
Various types of Machine Learning Algorithms (MLA), such as Logistic Regression, Decision Trees, and Support Vector Machines (SVMs), have been implemented for precise and timely landslide prediction (Hong, 2016), (Liu, 2021). Several Deep Learning-based models, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), have also been implemented to determine the likelihood of landslides occurring at a given location (Hajimoradlou, 2019), (Jiang and Chen, 2016). It has been demonstrated that Graph Neural Networks (GNNs) can be employed for various applications in this field (Hua, 2021).
GNNs can learn the spatial interdependency between the nodes of a graph, which helps researchers predict landslides more realistically. Two recently published articles implement GNNs in different ways. According to (Kuang, 2022), one method for identifying the neighbors of a point is to use K-Nearest Neighbours (KNN) on a set of coordinates. This algorithm makes it possible to detect the actual neighbors of each point as well as a value that represents how much dependency exists between each pair of points. In another study, (Jiang, 2022) determined the adjacency between paired data points in a particular area from their current features and implemented a GCN-GRU for seven sites in the area. In this study, a GCN is used to detect the interdependency between paired points by means of the correlation distance over the timeframe of the SAR data and then to predict future landslide displacements at Permanent Scatterer (PS) points.

CASE STUDY
The Moio della Civitella landslides are located in the southern Italian province of Salerno, within the Cilento and Alburni National Parks, which are European and Global Geoparks. They affect the Crete Nere of the Saraceno Formation, which crops out widely in the region. The main constituents of this formation are argillites with carbonate intercalations and siliciclastic arenites, weathered at the outcrop. The geological characteristics of this area are similar to those of the southern Italian Apennines, which are highly tectonized (diffuse, pervasive discontinuities, intense fractures, extremely variable bedding, etc.). The Quaternary sequences are made up of heterogeneous debris in a silty-clayey matrix (Di Martire, 2015). Several factors contribute to instability at Moio della Civitella, including differences in the lithology and hydrogeological behavior of the rocks forming the slope. The area lies between 600 and 200 m a.s.l. and is characterized by hilly terrain with low-gradient slopes, heavily shaped by erosion and gravitational processes.
As can be seen from the landslide map, the most significant slope movements directly affect populated areas, lifelines, and the main communication routes in the region. According to the classification of (Cruden, 1993), the general typologies are flows and rotational and translational slides (Fig. 1). These slope movements are believed to derive mainly from ancient phenomena that affected large portions of the slopes, if not their entire extent. The landslides that directly affect urban areas are likewise influenced by these ancient phenomena. Because of the presence of such landslides, the area of Moio della Civitella has been extensively investigated using topographic measurements, inclinometers, and GPS networks (Matano, 2019).

SAR data
Recent advances in satellite remote sensing have led to significant progress in identifying and monitoring urban landslides. A wide range of methods based on synthetic aperture radar imagery has been applied in this context, producing multi-temporal deformation rate maps that help identify landslides under settlement cover and support retrospective and operational monitoring (Herrera, 2011). This study analyzes X-band imagery from the COSMO-SkyMed mission. These satellite products are particularly suitable for locating landslides in urban areas because they combine a high spatial resolution with a short revisit period. The DInSAR data used in this study were obtained from (Infante et al., 2019). In detail, a stack of 66 descending COSMO-SkyMed images covering the 2012-2016 time span was analyzed (Infante, 2019) (Table 1). This dataset is suitable for implementing the GCN to obtain the best interval of correlation distance, and for implementing machine learning regression algorithms such as LR, KNN, SVR, decision tree, lasso, and ANN to predict landslide deformation in this case study.

DInSAR
The DInSAR technique (Gabriel, 1989) has been used to measure and monitor ground motions associated with subsidence, landslides, earthquakes, and volcanic phenomena (Di Martire, 2014). However, the DInSAR method is subject to spatial and temporal decorrelation, to delays caused by atmospheric effects, and to orbital and topographic errors (Colesanti, 2003). Since the early 2000s, several algorithms have been developed to track ground deformations with high accuracy and to evaluate historical deformation series. Over time, this development has made it possible to overcome some of the inherent limitations of the technique (the temporal and spatial decorrelation and atmospheric disturbances mentioned above). As a result, the precision of DInSAR results has improved to about 1-2 mm/year and 5-10 mm/year for rate maps and time series of deformations, respectively (Tizzani, 2007), (Trasatti, 2008).

Graph Convolutional Networks (GCNs)
This study compares conventional machine learning and deep learning algorithms with a particular type of graph neural network (GNN), the graph convolutional network (GCN), to find out whether interdependencies exist between paired data points when the velocity is predicted from geological characteristics (Zhang, 2019). A GCN works as a filter passing over the nodes of a graph. It extracts new features on a graph $G = (V, E)$ with an $N \times D$ feature matrix $X$, whose $i$-th row $x_i$ holds the features of node $i$, where $N$ is the number of nodes and $D$ is the number of features. An $N \times N$ zero-one adjacency matrix $A$ represents the interdependency between each pair of nodes, for example:

$$A = \begin{pmatrix} 0 & 1 & \cdots & 0 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 0 \end{pmatrix}$$

Each element of $A$ is either 1 or 0: $A_{ij}$ is 1 if there is a path from $i$ to $j$, and 0 otherwise. The hidden layer of the GCN at layer $l + 1$ can be written as

$$H^{(l+1)} = f\left(H^{(l)}, A\right),$$

where $H^{(0)} = X$ is the matrix of input node features and $L$ is the number of layers.
The forward propagation rule can be written as

$$f\left(H^{(l)}, A\right) = \sigma\left(A H^{(l)} W^{(l)}\right),$$

where $W^{(l)}$ is the trainable weight matrix of the $l$-th layer and $\sigma(\cdot)$ is an activation function such as the ReLU. Finally, the normalized version of the forward propagation equation can be written as

$$f\left(H^{(l)}, A\right) = \sigma\left(\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} H^{(l)} W^{(l)}\right),$$

where the adjusted adjacency matrix $\hat{A} = A + I$ is symmetrically normalized by $\hat{D}$, the diagonal degree matrix of $\hat{A}$. The identity matrix is added to $A$ because multiplying by $A$ sums the feature vectors of all the neighbors of the target node but not those of the node itself; adding $I$ aggregates the information of the target node as well.
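The normalized propagation rule described above can be illustrated with a short NumPy sketch. It is only a minimal illustration under stated assumptions, not the implementation used in this study: the function names `normalize_adjacency` and `gcn_layer`, the five-node ring graph, the hidden size of 8, and the random, untrained weights are all hypothetical.

```python
import numpy as np

def normalize_adjacency(A):
    """Return D_hat^{-1/2} (A + I) D_hat^{-1/2} for a zero-one adjacency matrix A."""
    A_hat = A + np.eye(A.shape[0])        # add the identity so each node keeps its own features
    deg = A_hat.sum(axis=1)               # diagonal of the degree matrix of A_hat (always >= 1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(H, A_norm, W, activation=True):
    """One GCN layer: sigma(A_norm @ H @ W), with ReLU as sigma."""
    Z = A_norm @ H @ W
    return np.maximum(Z, 0.0) if activation else Z

# Toy example (assumed data): 5 nodes on a ring, 11 features per node.
rng = np.random.default_rng(0)
N, D, hidden = 5, 11, 8
X = rng.normal(size=(N, D))
A = np.array([[0, 1, 0, 0, 1],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [1, 0, 0, 1, 0]], dtype=float)

A_norm = normalize_adjacency(A)
W0 = rng.normal(scale=0.1, size=(D, hidden))   # untrained weights, for shape checking only
W1 = rng.normal(scale=0.1, size=(hidden, 1))

H1 = gcn_layer(X, A_norm, W0)                        # hidden node embeddings
y_pred = gcn_layer(H1, A_norm, W1, activation=False)  # one regression output per node
```

The last layer is left linear because the task is regression. In practice the weights would be trained by backpropagation rather than drawn at random.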
The new node embeddings are fed into a loss function to complete the forward pass, and backpropagation is applied to train the weight matrices $W^{(l)}$ using an optimized variant of gradient descent. In this study, GCNs are applied to create a regression model that predicts the velocity from geological features consisting of elevation, slope, general curvature, NDWI, TWI, SPI, geologic map, land use, flow direction, plan curvature, and profile curvature. The interdependency between each pair of points is encoded in a zero-one adjacency matrix, and a new hyperparameter is introduced to find the adjacency matrix that most improves the accuracy of the model predictions. First, the interdependency between each pair of data points is computed as a correlation distance, which takes a value in the interval [0, 2]. A new hyperparameter $c$ is then defined as the upper bound of the interval of distances that are treated as adjacency. This hyperparameter was added to the hyperparameter tuning process, which improved the accuracy of the model. The correlation distance between a pair of points $X$ and $Y$ is given by

$$d(X, Y) = 1 - \frac{(X - \bar{X}) \cdot (Y - \bar{Y})}{\lVert X - \bar{X} \rVert_2 \, \lVert Y - \bar{Y} \rVert_2},$$

where $\bar{X}$ and $\bar{Y}$ are the means of the elements of $X$ and $Y$. The closer the correlation distance is to zero, the stronger the dependency between $X$ and $Y$. Pairs with a correlation distance less than or equal to $c$ therefore receive a 1 in the adjacency matrix, representing significant interdependency, and the rest of the matrix is filled with 0.
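The construction of the adjacency matrix from correlation distances can be sketched as follows. This is a hedged illustration, not the code of the study: the helper names `correlation_distance_matrix` and `adjacency_from_distance` and the four toy feature vectors are hypothetical. The sketch also assumes that the features have already been scaled to comparable ranges and that no point has a constant feature vector.

```python
import numpy as np

def correlation_distance_matrix(F):
    """Pairwise correlation distance between the rows of F (N points x D features)."""
    centered = F - F.mean(axis=1, keepdims=True)            # subtract each point's own mean
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    unit = centered / norms                                 # assumes no row of F is constant
    corr = np.clip(unit @ unit.T, -1.0, 1.0)                # guard against rounding outside [-1, 1]
    return 1.0 - corr                                       # values lie in [0, 2]

def adjacency_from_distance(dist, c):
    """Zero-one adjacency: 1 where the correlation distance is <= c, 0 elsewhere.

    A strict inequality (dist < c) would instead match a half-open interval [0, c).
    """
    return (dist <= c).astype(float)

# Toy feature vectors (assumed values, not data from the case study).
F = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with row 0
              [4.0, 3.0, 2.0, 1.0],    # perfectly anti-correlated with row 0
              [1.0, 3.0, 2.0, 4.0]])   # correlation 0.8 with row 0

dist = correlation_distance_matrix(F)
A = adjacency_from_distance(dist, c=0.8)
```

Because every point has distance 0 to itself, the diagonal of `A` is 1 for any non-negative `c`. Tuning `c` then amounts to repeating the model fit for several candidate thresholds and keeping the one with the best validation metrics.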
Hence, the hyperparameter c is the means proposed in this paper to detect how data points in a particular area affect each other based on the values of their features. Furthermore, some of the most widely used machine learning regression algorithms, namely linear regression (LR) (Maulud and Abdulazeez, 2020), K-nearest neighbours (KNN) (Sarker, 2021), support vector regression (SVR) (Sharifzadeh, 2019), decision tree (Pekel, 2020), lasso, and artificial neural network (ANN) (Lee, 2017), were implemented on the dataset as rivals of the proposed strategy. All conventional machine learning algorithms share a core assumption of independence between training examples (data points) (Sarker, 2021). If these algorithms yield poor evaluation metrics, this assumption helps reveal the existence of interdependency between paired data points. Four commonly used evaluation metrics represent the validity of the regression models. R-squared is the proportion of the variation in the labels that can be explained by the set of features; therefore, the closer it is to 1, the closer the predictions are to the real values of the labels (Miles, 2005). Mean squared error (MSE) is the mean of the squared differences between the predictions and the real values of the labels, and root mean squared error (RMSE) is the square root of MSE (Das, 2004). Mean absolute error (MAE) is the mean of the absolute differences between the predictions and the real values of the labels (Qi, 2022). For all error metrics except R-squared, the optimal level of fit occurs when they are close to zero.
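The four metrics defined above can be computed directly from their definitions, as in the following sketch. The function name `regression_metrics` and the example values are hypothetical and are not results from this study.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return R-squared, MSE, RMSE, and MAE for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)                    # mean of squared differences
    rmse = np.sqrt(mse)                              # square root of MSE
    mae = np.mean(np.abs(residuals))                 # mean of absolute differences
    ss_res = np.sum(residuals ** 2)                  # variation left unexplained
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation of the labels
    r2 = 1.0 - ss_res / ss_tot
    return {"r2": r2, "mse": mse, "rmse": rmse, "mae": mae}

# Made-up velocities, for illustration only.
y_true = [1.0, -2.0, 3.0, 0.0]
y_pred = [1.5, -1.0, 2.0, 0.5]
metrics = regression_metrics(y_true, y_pred)
```

For these made-up values the MSE is 0.625 and the MAE is 0.75. R-squared is about 0.81 because the unexplained variation (2.5) is small relative to the total variation of the labels (13).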

RESULTS AND DISCUSSION
In this study, the GCN method was employed to determine the amount of non-Euclidean interdependency between any pair of points within the study area. The landslide velocity obtained by spatial and temporal processing of the data from 2012 to 2016 was used as the target of the regression model. Taking twelve predisposing factors into account, the GCN was modelled to determine the best correlation distance between data points, which was deemed the most critical hyperparameter for future prediction tasks. After hyperparameter tuning, the interval [0, 0.8) was found to be the most effective range of correlation distance for creating the adjacency matrix, since it provided the best evaluation metrics among the scenarios tested. All MLA, DLA, and GCN models are evaluated with the evaluation metrics mentioned above (Table 2). All evaluation metrics show that the GCN model outperforms the other methods.
The results show that the GCN was by far the best algorithm according to the regression evaluation metrics, which indicates a strong dependency between data points. The conventional machine learning and deep learning algorithms (linear regression, lasso, SVR, decision tree, KNN, and ANN) performed poorly because they share the core assumption of independence between training examples and are therefore unable to detect dependency between data points.
The interdependency between each pair of points in the dataset, expressed as the correlation distance in this paper, is not a Euclidean distance. It refers to the similarity between each pair of points based on the values of the most important geological features, including elevation, slope, general curvature, NDWI, TWI, SPI, geologic map, land use, flow direction, plan curvature, and profile curvature. In other words, two data points that are far from each other in Euclidean terms can have a strong dependency based on the values of their features. Conversely, there may be a weak dependency between two data points that are very close to each other in the area. We therefore conclude that the data points are not independent of one another with respect to their feature values, since the GCN outperformed all the conventional machine learning algorithms that follow the core assumption of independence between points in the area.
In Figure 3, the image on the left shows the velocity displacements between 2012 and 2016. It is placed next to an image of the predicted velocity displacements on the test set data, to help explore and better understand how the proposed prediction model (GCN) performed. As shown in this figure, the GCN, with a mean absolute error of less than 2 mm, was able to correctly predict the amount and location of deformation for both positive and negative displacements in the test set data.

Figure 1. Landslide inventory map (a) (Hydro-geomorphological Setting Plan, South Campania River Basin Authority, 2015) and some examples of damage recorded to infrastructures within the test area (b, c).

Geological data
Globally, landslides are one of the most common natural disasters. In this case study, morphological and geological factors are considered the leading causes of landslides: elevation (a Digital Elevation Model (DEM) from the Shuttle Radar Topography Mission (SRTM) with 30 m resolution), slope, Normalized Difference Water Index (NDWI) (Ammirati, 2022), Topographic Wetness Index (TWI) (Novellino, 2021), Stream Power Index (SPI) (Di Napoli, 2021), geology, land use, flow direction (Di Napoli, 2020a), and total, plan, and profile curvature (Di Napoli, 2021). These predisposing factors have been classified (Figure 2) and used to train the GCN models, to establish relationships and connections between the factors, and to predict landslide deformation.

Figure 2. Classification of the features used for the GCN and the other machine learning algorithms (x-axis: number of clusters)

Figure 3. Location and map of displacement by a) DInSAR technique and b) GCN (unit: cm)

Table 1. Details of the COSMO-SkyMed acquisitions.

Table 2. Evaluation metrics for the proposed approach