MACHINE LEARNING FOR CLASSIFICATION OF AN ERODING SCARP SURFACE USING TERRESTRIAL PHOTOGRAMMETRY WITH NIR AND RGB IMAGERY

Increasingly advanced and affordable close-range sensing techniques are employed by an ever-broadening range of users with varying competence and experience. In this context, a method was tested that uses photogrammetry and machine learning classification to divide a point cloud into surface type classes. The study site is a peat scarp 20 metres long in the actively eroding river bank of the Rotmoos valley near Obergurgl, Austria. Imagery from near-infrared (NIR) and conventional (RGB) sensors, georeferenced with coordinates of targets surveyed with a total station, was used to create a point cloud using structure from motion and dense image matching. NIR and RGB information was merged into a single point cloud, and 18 geometric features were extracted at each of three radii (0.02 m, 0.05 m and 0.1 m). Together with the four spectral values, this gave 58 variables on which to apply the machine learning classification. Segments representing six classes (dry grass, green grass, peat, rock, snow and target) were extracted from the point cloud and split into a training set and a testing set. A Random Forest model was trained using machine learning packages in the R-CRAN environment. The overall classification accuracy and Kappa index were 98% and 97% respectively. Rock, snow and target classes had the highest producer and user accuracies. Dry and green grass had the highest omission errors (1.9% and 5.6% respectively) and commission errors (3.3% and 3.4% respectively). Analysis of feature importance revealed that the spectral descriptors (NIR, R, G, B) were by far the most important determinants, followed by verticality at 0.1 m radius.


INTRODUCTION
In the past decades a step change in close-range remote sensing technologies has allowed techniques such as photogrammetry to be employed by an increasingly diverse range of users, not only the specialist (Eltner et al., 2016; Westoby et al., 2012). The inevitable result of this proliferation has been an abundance of high-quality data for which automated classification has become a practical necessity (Grilli et al., 2017), since manual labelling and classification are costly, time-consuming and unfeasible for large datasets. In this context, at the 2019 Innsbruck Summer School, Obergurgl (Rutzinger et al., 2016, 2018), a team of researchers applied machine learning (ML) to a point cloud derived from dense image matching of a terrestrial photogrammetric survey. This was part of a larger survey in a mountain environment, which also included a remotely piloted aircraft system (RPAS) survey over the whole valley. Near-infrared and RGB information was collected from both RPAS and terrestrial surveys, as previous literature has shown that vegetation can be reliably labelled using spectral features (Alba et al., 2011).
The fields of interest of the participants comprise a diversity of applications that can benefit from close-range sensing: from primary colonisation of recently deglaciated ground, through slope stability and evolution, to the surveying and interpretation of rarely preserved 700-million-year-old landforms. These users represent some of the numerous examples that may benefit from common data manipulation techniques, allowing statistical data to be derived from classifications within point cloud data.
Within this study the aims were to i) classify relevant surface types within a small section of a mountain valley floor, ii) compare the efficacy of optical and geometric properties in distinguishing between key surface types, and iii) evaluate photogrammetric methods and machine learning approaches with respect to the group members' research interests.
A terrestrial photogrammetry survey was undertaken on a partially snow-covered river bank comprising peat, loose soil, rock and vegetation and these different components were each assigned a class. Point cloud segments representative of each were used for training a machine learning model, subsequently used to classify areas within the entire point cloud with a degree of reliability.

STUDY SITE
The study area is located at the main alpine divide of the Austrian Alps at the border between the State of Tyrol (Austria) and the Province of South Tyrol (Italy). The Rotmoos valley (46° 50' 24'' N, 11° 01' 59'' E) extends c. 6 km from SE to NW and covers an area of c. 1 km² with an altitudinal range from c. 2240 m to c. 3400 m. The area is characterised by an inner alpine climate and is surrounded and protected by mountains. The nearby weather station (Obergurgl, 1938 m) shows a low mean annual precipitation of c. 819 mm, with maxima from June to August. Mean annual air temperature is +2.2 °C, with the highest monthly mean of around +16 °C in July and the lowest mean of -8.3 °C in February (data period 1971-2000; ZAMG, Austria's national weather service, 2018). During the last glacial, the valley was shaped by glacial erosion through multiple advances of the Rotmoos glacier. The last glaciation of the valley floor was during the Younger Dryas period. After the retreat of the Younger Dryas glacier, the valley was filled with up to 40 m of sediment (Patzelt, 1995) and remained ice free during the last re-advance of the Rotmoos glacier during the Little Ice Age. The prominent terminal moraine complex attributed to this re-advance is located c. 1 km up valley from the study site. In the distal and central part of the valley, a peat bog developed that covers an area of c. 800 by 120 m. The peat deposits are up to 2.65 m thick, and radiocarbon dates from the base and top of the peat are c. 5994 and 1629 years before present respectively (Bortenschlager, 2010). Today, the peat bog is dissected and eroded by the river Rotmoosache, a tributary of the river Ötztaler Ache. A c. 20 m stretch of its bank is the object of this study (for location see Figure 1, red star). The study section comprises steep peat faces, which are highly water saturated and partly covered by snow and vegetation (Figure 1 bottom).

METHODS
Data acquisition was planned together with a team that acquired UAV imagery. Three of their ground control points (GCPs) were measured with differential GNSS (Global Navigation Satellite System) in order to georeference the final product in a projected coordinate system. A control measurement between points revealed sub-centimetre accuracy of the GCPs. Within the study area of this investigation, eleven GCPs were placed on the eroding scarp (Figure 1 bottom) and georeferenced using a total station positioned on one of the measured UAV GCPs.
Terrestrial photogrammetry was used to survey the eroding scarp surface. Images were acquired using a consumer-grade RGB camera, a Canon EOS 450D (27 mm), and a NIKON D-200 with HOYA R72 filter, modified to operate in the near-infrared (NIR) region of the electromagnetic spectrum (750-1,500 nm). The modification allowed the CCD sensor to record reflected radiation above 720 nm. As shown in Figure 2, the natural sensitivity of the CCD sensor includes wavelengths up to 950 nm, but these are normally blocked by the camera's internal filter. By removing this filter and adopting an external filter, NIR information can be recorded in the image.

Pre-processing
In total 24 images were imported to Agisoft Metashape (AM). The GCPs were automatically detected and located by the software. The coordinates of the GCPs, which were measured in the field using the total station, were loaded into AM. After camera alignment, dense image matching was performed in order to obtain a dense point cloud. The same workflow was applied to the NIR imagery acquired with the modified camera.
The point clouds from the RGB and NIR imagery were imported to CloudCompare software. First, NIR information was merged into the RGB point cloud using three nearest neighbours: for each point in the RGB cloud, the software finds the three closest points in the NIR cloud and appends the average of the three NIR values to the RGB point. Additionally, 18 geometric features were calculated within CloudCompare (see Table 1). A description of the computation of the eigenvalue- and eigenvector-based features is given by Hackel et al. (2016). The geometric features are calculated from the neighbours of each point, which CloudCompare identifies using a user-defined radius. In this work three radii were tested: 0.02 m, 0.05 m and 0.1 m. The mean number of neighbouring points at these radii (standard deviation in parentheses) was 4.8 (3.1), 12 (8) and 24.12 (16) respectively. The final cloud had ~2.16 million points and was exported as a text file with the coordinates (x, y, z), the RGB and NIR values and the 18 geometric features for each radius. The final feature count was therefore 54 geometric features and four spectral features (NIR, R, G, B), for a total of 58 descriptive features that can be used for classification. Further analysis was carried out using the statistical software R and R Studio.
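The nearest-neighbour merge described above can be illustrated with a minimal Python sketch. This is not the CloudCompare implementation: the function name `merge_nir`, the tuple layout and the sample values are assumptions made for illustration, and the brute-force search would be far too slow for a cloud of ~2 million points, where a spatial index such as an octree would be needed.

```python
import math

def merge_nir(rgb_points, nir_points, k=3):
    """Append to each RGB point the mean NIR value of its k nearest NIR points.

    rgb_points: list of (x, y, z, r, g, b) tuples (layout assumed for this sketch)
    nir_points: list of (x, y, z, nir) tuples (layout assumed for this sketch)
    """
    if not nir_points:
        raise ValueError("the NIR cloud must contain at least one point")
    merged = []
    for point in rgb_points:
        xyz = point[:3]
        # Brute-force search: sort all NIR points by Euclidean distance.
        nearest = sorted(nir_points, key=lambda q: math.dist(xyz, q[:3]))[:k]
        mean_nir = sum(q[3] for q in nearest) / len(nearest)
        merged.append(tuple(point) + (mean_nir,))
    return merged

# Made-up sample data: one RGB point, four NIR points (one far away).
rgb = [(0.0, 0.0, 0.0, 120, 110, 90)]
nir = [
    (0.01, 0.0, 0.0, 200),
    (0.0, 0.02, 0.0, 210),
    (0.0, 0.0, 0.03, 220),
    (1.0, 1.0, 1.0, 10),
]
merged = merge_nir(rgb, nir)
print(merged)
```

The distant fourth NIR point does not contribute, so the appended value is the mean of the three closest NIR values.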

Classification
The final cloud was imported in text file format as a table (data.frame) in R 3.6 (R Core Team, 2018). A random forest classifier was used, as previous tests have shown positive results (Pirotti and Tonion, 2019) and initial tests run on the data of this study supported the use of this classifier. This choice is however debatable, as many factors must be taken into consideration, and the issue is expanded upon in the Discussion section. For the random forest model, the number of trees in the ensemble was set to 200 and the number of variables considered when splitting a node was set to 16, after tuning the model over a 6 x 6 grid of reasonable values for the number of trees and the number of variables. Six surface classes were defined: dry grass, green grass, peat, rock, snow and target (Table 2). This last class is represented by the 11 black and white targets used for GCPs.
For classification with machine learning (ML), manually labelled (classified) points were used for training and testing, which made manual labelling a crucial task. For this study, subsets for each of the six classes were extracted from the original point cloud by manually clipping regions in which all points belong to a single, unambiguous class. Table 2 shows that the number of labelled points per class is fairly balanced, except for the "rock" class. The rock class is under-represented in the study area, as the surveyed area is mostly covered with grass, peat or snow. It was nevertheless included as it represents a class of its own and cannot reasonably be merged into the other classes.

Table 2. List of classes with number of points in the labelled subset and colour related to Figure 4.
The labelled point set was further split into training (50%) and testing (50%) subsets. Random Forest was used to create a classification model based on the training data. The efficacy of each variable in creating the model was reported as Mean Decrease Accuracy (MDA). To assess the accuracy of the model, the points in the test dataset were classified and the predicted classes compared to the labelled classes using a confusion matrix and accuracy metrics. Finally, the fitted Random Forest model was used to classify the entire point cloud, to generate a labelled 3D model of the study area (Figure 4).
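A 50/50 split of labelled points can be sketched as follows. The study performed this step in R; the Python function below, its name `stratified_split`, the seed and the toy labels are assumptions for illustration only. Sampling each class separately (stratification) keeps an under-represented class such as rock present in both subsets.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.5, seed=42):
    """Return (train_idx, test_idx), sampling each class separately."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels, deliberately unbalanced (not the class counts of the study).
labels = ["peat"] * 6 + ["snow"] * 4 + ["rock"] * 2
train, test = stratified_split(labels)
print(len(train), len(test))
```

Each index ends up in exactly one subset, and every class contributes half of its points to training.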

RESULTS
The machine learning approach provided a very high overall classification accuracy of 98% across all classes, with a Kappa index of 97%. These figures relate to the independent testing dataset. Predictor accuracy was highest for the following classes: target, peat and snow, followed by rock and dry vegetation. Green and dry vegetation had both the highest commission and the highest omission errors, which points to a likely mutual misclassification. Accuracy was high for snow, rock, peat and targets (> 98%). However, when dividing vegetation into dry and green, the producer's accuracy drops to 98.1% and 94.4% respectively (Table 3).
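For reference, the accuracy metrics quoted above all derive from the confusion matrix. The Python sketch below shows the standard definitions; the function name `accuracy_metrics` and the two-class matrix are invented for illustration and are not the values of Table 3. Rows are assumed to hold the reference (labelled) classes and columns the predicted classes, and every class is assumed to occur at least once in both.

```python
def accuracy_metrics(matrix, classes):
    """Compute overall accuracy, Cohen's kappa and per-class accuracies.

    matrix[i][j] = number of test points of reference class i predicted as class j.
    """
    k = len(classes)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(matrix[i]) for i in range(k)]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]

    overall = sum(matrix[i][i] for i in range(k)) / n
    # Agreement expected by chance, from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    kappa = (overall - expected) / (1 - expected)

    per_class = {}
    for i, name in enumerate(classes):
        producer = matrix[i][i] / row_totals[i]  # 1 - omission error
        user = matrix[i][i] / col_totals[i]      # 1 - commission error
        per_class[name] = {
            "producer": producer,
            "user": user,
            "omission": 1 - producer,
            "commission": 1 - user,
        }
    return overall, kappa, per_class

# Invented example: 100 test points in two classes.
classes = ["dry grass", "green grass"]
matrix = [[45, 5],
          [10, 40]]
overall, kappa, per_class = accuracy_metrics(matrix, classes)
print(overall, kappa, per_class["dry grass"])
```

In this invented example the overall accuracy is 0.85 and kappa is 0.70, which illustrates that kappa is lower than overall accuracy whenever some agreement is expected by chance.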
Feature importance over the whole classification process was also analysed (Figure 3), revealing that spectral descriptors were the most influential in classification. Within these, NIR ranked highest, followed by red and then overall RGB. The most important non-spectral descriptor was verticality at 0.1 m radius. The importance of each feature at class level (Table 4) shows that verticality at the largest radius (0.1 m) was particularly important for the "target" and "green grass" classes, along with point density features (features 5, 6 and 7, i.e. number of neighbours, surface density and volume density; see Table 1). Surface variation (16) and sphericity (17)

DISCUSSION
The primary objective of this investigation was to test the performance of a well-known machine learning algorithm, Random Forest, for classification of point clouds from a terrestrial photogrammetric survey. The following discussion will focus on the strengths and weaknesses of the techniques applied and their resulting outputs, allowing suggestions for future improvements. Moreover, consideration will be given to the replicability of the method used herein, to address the respective research questions of other fields of research.

Effectiveness of methods and outputs
Visual comparison of the classified point cloud and the original images reveals a striking qualitative similarity (Figure 4), which is supported by the high values of the confusion matrix, the Kappa index of agreement and the other accuracy metrics. It must be noted that the accuracy metrics are calculated over an independent set, but still over a small number of points, i.e. ~50 thousand labelled points from a point cloud of ~2 million points (~2.5%). The points were chosen from across the dataset to avoid spatial autocorrelation (see Figure 5), and the further split into training and testing datasets was done with stratified random sampling (strata according to classes), thus maintaining independence. Training and testing data are nevertheless limited to a small dataset. This implies that classification accuracy metrics can be very high without necessarily reflecting the performance over the whole area. A visual analysis of the classified set (Figure 4) shows that some points of snow patches are erroneously classified as targets. This is probably due to similarity in colour (white target and white snow) and in the shape of the object, as a snow patch of around 10 cm radius will appear close to flat, just like a target. Since colour and verticality are the most important features (Figure 3), a similarity in these features will result in class mixing. Moreover, there is a lack of point cloud data within the snow patches (Figure 4). A more quantitative assessment reveals that, whereas discrimination between rock, snow, soil and vegetation was reliable, the distinction between wet and dry vegetation was more problematic (Table 3).
The weakness discriminating between the two classes of vegetation (green and dry vegetation) is likely due to the gradational boundaries between the two classes, where one blends into the other. This is exacerbated by their physical proximity, as they do not occur in discrete areas of dry and wet vegetation. In any case this distinction between wet and dry is somewhat arbitrary, using qualitative colour choices within the image for the selection of training data. Future workers should consider the ground truthing of wet and dry areas by touch or using a more quantitative approach with moisture detection equipment.
The lack of point cloud data within the snow areas results from the high albedo of snow in contrast to the relatively dark remainder of the images, resulting in over-exposure of the former. This could be overcome by taking multiple images from the same position using different exposure settings (i.e. ISO, shutter speed, F-stop), or by using a camera with greater bit depth. The former is potentially labour intensive if images require merging by hand before construction of the point cloud, whereas the latter solution is limited by the available camera. Other options might be taking images in RAW format and using post-processing to recover the over-exposed spots, or using a polarizing filter, which increases contrast and overall colour saturation. These solutions may however be equally labour or cost intensive and have their own limitations.

Descriptors
The importance of NIR and red descriptors suggests that the Normalized Difference Vegetation Index (NDVI) could be used as a proxy for changes along the eroding riverbank. Although generally much less important than spectral features, verticality was the highest-ranking non-spectral descriptor and especially useful in separating rocks from other classes. As shown in Table 4, verticality and density-based descriptors such as surface density, volume density and number of neighbours do have importance for the Random Forest method. It must be noted that correlation does not necessarily indicate causation, and therefore importance might not be related to class-intrinsic information but to a coincidental relationship. For example, most targets were placed vertically, so verticality might support classification, yet this would not be useful in a scenario where targets were placed at different angles. An important rule is that machine learning, and artificial intelligence in general, performs only as well as the data to be classified resemble the training data. Care must be taken when applying a model trained on a dataset whose characteristics differ from those of the dataset to be classified.
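The NDVI mentioned above is the normalized difference of the NIR and red values, (NIR - R) / (NIR + R). A minimal Python sketch is given below; the function name and sample values are assumptions for illustration. Note that values from consumer-grade cameras are uncalibrated digital numbers rather than reflectances, so the result would be an NDVI-like index rather than a calibrated NDVI.

```python
def ndvi(nir, red):
    """Normalized difference of NIR and red, in the range [-1, 1]."""
    total = nir + red
    if total == 0:
        # Both bands are zero (e.g. a black pixel): return 0 by convention here.
        return 0.0
    return (nir - red) / total

# Made-up 8-bit digital numbers for two points.
vegetation_like = ndvi(200, 50)   # strong NIR, weak red
peat_like = ndvi(60, 70)          # similar NIR and red
print(vegetation_like, peat_like)
```

Healthy vegetation reflects strongly in the NIR and absorbs red light, so it yields values closer to 1 than bare or dark surfaces.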

Neighbourhood size
The questions of ideal radius size and ideal number of neighbours are important when considering descriptors that use neighbours to describe shape and morphology (Pirotti and Tonion, 2019; Weinmann et al., 2015). The ideal method would involve finding the best number of neighbours for each point from a range by adopting a minimal entropy approach. Although effective, this requires very intensive calculation, as entropy has to be determined for a range of neighbours for each point. For this study, therefore, three radii were tested as a compromise instead of adopting the minimal entropy method. Considering that a number of geometric features, including verticality, showed highest importance at the largest chosen radius (0.1 m), the study could be extended to test whether even larger radii give better results.
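One common formulation of a minimal entropy approach uses the Shannon entropy of the normalized eigenvalues of the local covariance matrix (often called eigenentropy) and selects, for each point, the neighbourhood that minimizes it. The Python sketch below assumes this formulation and assumes that the three eigenvalues have already been computed for each candidate radius; the function names and eigenvalues are hypothetical.

```python
import math

def eigenentropy(eigenvalues):
    """Shannon entropy of the eigenvalues after normalizing them to sum to 1."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    entropy = 0.0
    for value in eigenvalues:
        share = value / total
        if share > 0:  # a zero share contributes nothing (limit of x*ln x)
            entropy -= share * math.log(share)
    return entropy

def best_radius(eigenvalues_by_radius):
    """Return the radius whose neighbourhood has the lowest eigenentropy."""
    return min(
        eigenvalues_by_radius,
        key=lambda radius: eigenentropy(eigenvalues_by_radius[radius]),
    )

# Hypothetical eigenvalues of one point's neighbourhood at the three radii.
candidates = {
    0.02: (1.0, 1.0, 1.0),     # isotropic scatter: maximum entropy
    0.05: (0.6, 0.3, 0.1),
    0.1: (0.9, 0.09, 0.01),    # one dominant direction: low entropy
}
chosen = best_radius(candidates)
print(chosen)
```

Repeating this selection for every point and every candidate neighbourhood is what makes the approach computationally intensive on clouds with millions of points.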

Class definition
In general, the choice of classes within any point cloud will be guided by the research question at hand. However practical limitations of the data may restrict what can be distinguished by the machine learning process. With this in mind it is suggested that future work could attempt unsupervised classification of the point cloud data. On the one hand this may provide insights into the type of further classifications that could reliably be made and on the other it may reveal patterns that are not otherwise obvious. This would overcome a limitation of the technique employed herein. Specifically, spectral attributes (i.e. examination of the photographs) were used to select classes and training segments therefore it is unsurprising that spectral descriptors are the most influential.

Data acquisition
In terms of practicalities two potential obstacles were encountered, both relating to time. First the terrestrial photography of the survey team took place within the UAV flying area of another team, allowing our data to be placed into a broader context. However, both teams worked on the area at the same time, resulting in significant waiting time during which the area could not be accessed whilst imaged from the air. Improved coordination would significantly reduce time in the field improving efficiency. Second, although only a small area was imaged for terrestrial photogrammetry, it was time consuming in terms of processing time. Caution must be applied in similar surveys in determining the ideal scale for surveying the area of interest as a function of required detail and objects to be classified.

Future work
Several aspects can be further investigated. An interesting aspect is the impact of point density on results. As mentioned in the previous section, processing time has a significant impact on the method. Depending on the size of the study area and the types of classes, it is very likely that there is an ideal density below which the classification accuracy drops unacceptably. Addressing this is relatively simple, as the method can be applied to gradually decimated point clouds. The presented workflow could also be preceded by a point cloud segmentation to enable an object-based classification. As outlined by Vosselman (2013), this would allow for the computation of additional features such as shape, which could be well suited for discriminating targets from other classes. If rocks have a common morphological appearance in the study area due to their transport history, descriptors of size and shape could also improve the classification. As outlined in Section 5.2.1, the importance of some features is likely related to peculiarities of the research area. Therefore, future work should also assess the transferability of the proposed method. This could for example include alteration of class definitions, size of the research area and environmental characteristics. Additionally, the study design should be tested for robustness under different weather conditions. On the day of the study most surfaces were relatively dry. Moist weather could change the surface reflectance, making it harder to distinguish certain classes (for example dry and green vegetation or rocks and peat).
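The gradual decimation suggested above could be set up as in the following Python sketch, which uses simple random subsampling. The function name `decimate`, the keep fractions and the synthetic cloud are assumptions; other decimation strategies, such as enforcing a minimum spacing between points, would be equally valid and might preserve spatial coverage better.

```python
import random

def decimate(points, keep_fraction, seed=0):
    """Return a random subset containing roughly keep_fraction of the points."""
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in the interval (0, 1]")
    if not points:
        return []
    rng = random.Random(seed)  # fixed seed for reproducible subsets
    n_keep = max(1, round(len(points) * keep_fraction))
    return rng.sample(points, n_keep)

# Synthetic cloud of 1000 points along a line (illustration only).
cloud = [(i * 0.01, 0.0, 0.0) for i in range(1000)]
levels = {fraction: decimate(cloud, fraction) for fraction in (1.0, 0.5, 0.25, 0.1)}
for fraction, subset in levels.items():
    print(fraction, len(subset))
```

Geometric features would need to be recomputed on each decimated cloud before retraining, since neighbour counts and density-based descriptors change with point density.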

Relevance to the research interests
The potential applications of these methods are diverse, in terms of both scale and classification type. At the metre to ten-metre scale, classification techniques can distinguish categories of surface features upon boulders, to help elucidate their transport history. At the kilometre scale this method may be used with UAV data to classify bedrock surfaces that may host subglacially striated areas. The advantage of machine learning in this approach is that these surfaces have distinct geomorphological characteristics of roughness and curvature, but are often in remote areas. The resulting classification could allow targeting of the most likely areas for ground examination. Machine learning techniques based on the Random Forest algorithm are used to detect landslides and to produce landslide susceptibility maps for large regions (e.g. Catani et al., 2013; Kim et al., 2018; Stumpf and Kerle, 2011; Taalab et al., 2018). In terms of monitoring and predicting landslide movements, these methods can be of great help for civil protection and risk mitigation.
Distinguishing between biotic and abiotic classes, to better understand their relative influences on a recently deglaciated landscape, could also be a target for a classification framework of point cloud data. Moreover, integration of the classification with spatial data can be used to investigate the relationship between primary succession and relief. Further environmental applications include distinguishing between healthy and damaged vegetation, for example due to icing events in Arctic tundra environments.

CONCLUSIONS
During the course of the Summer School the participants captured, constructed and merged georeferenced point clouds containing NIR and RGB data respectively, and used these to train a machine learning model that segregated the merged cloud into six classes. The target, snow, rock and peat classes were reliably classified, whereas distinguishing between wet and dry vegetation classes was more problematic, likely due to ambiguous training segments. Optical descriptors were by far the most important attributes in classification, although this is predetermined by the selection of training areas based on visible light properties. Shape features from 3D point clouds bring some improvement to the overall classification results, and this can be further investigated by including the laser scanning data (Pirotti, 2019) from the RPAS survey of the area. During data acquisition, processing and analysis each of the participants learned new skills and identified practical applications of those skills in their own study areas.