FAST WEAKLY SUPERVISED DETECTION OF RAILWAY-RELATED INFRASTRUCTURES IN LIDAR ACQUISITIONS

ABSTRACT: Railroad environments are peculiar, as they combine dense urban areas with rural parts. They also display a very specific spatial organization. In order to monitor a railway network at a country scale, LiDAR sensors can be mounted on a running train, performing a full acquisition of the network. Most processing steps are then done manually. In this paper, we propose to improve performance and production flow by producing a classification of the acquired data. However, there exists no public benchmark, and little work on LiDAR data classification in railroad environments. Thus, we propose a weakly supervised method for the pointwise classification of such data. We show that our method can be improved by using the ℓ0-cut pursuit algorithm and regularizing the noisy pointwise classification on the produced segmentation. As production is envisaged in our context, we designed our implementation to be computationally efficient. We evaluate our results against a manual classification, and show that our method can reach an FScore of 0.96 with just a few samples of each class.


INTRODUCTION
Railway networks are widely developed in many countries, allowing for fast and reliable passenger and freight transportation. In order to serve entire countries, railway networks grew rapidly in the 20th century, culminating at 400,000 km in the US, 150,000 km in Russia, 64,000 km in Germany and 42,000 km in France, according to the International Union of Railways. Monitoring such networks is costly both in terms of time and money. This is why the automated monitoring of railway networks is such an important yet complex task.
Usually, the monitoring of railway environments is divided into several tasks, starting with the detection of the key elements in railroad infrastructures. This can be done from camera-based data (Banić et al., 2019) or LiDAR-based data (Stein et al., 2016). Both sensors can be mounted on trains and used for mapping railway networks. However, LiDAR sensors allow for more detailed acquisitions, and do not suffer from light variations or distortion. Also, railway networks are composed of both dense urban areas and rural or forested areas. LiDAR sensors are able to adapt to all these types of environments, and LiDAR acquisitions are less sensitive to the presence of vegetation, as laser beams can pass through it. Hence, LiDAR sensors are preferred in this study.
The detection of key elements along railway networks has already been investigated. Many works focus on the detection of a single type of element. For instance, Elberink et al. (2013) and Lou et al. (2018) focus their study on track detection. Gézero and Antunes (2019) proposed to detect all linear elements in LiDAR scans, which includes rail tracks but also catenaries. Such objects can be detected with a classification approach based on an SVM, with results regularized by a CRF (Jung et al., 2016). Arastounia (2015) proposed to tackle the detection of linear elements in rural areas by combining geometrical properties and topological relationships in rail corridors. Some works also focus on the detection of other elements, such as tunnels (Sánchez-Rodríguez et al., 2018).
In our study, we propose to view the problem of infrastructure detection in rail corridors as a classification problem, where each point of a LiDAR acquisition belongs either to an object of interest or to a dedicated class containing all objects not related to railroad infrastructure. The classification of LiDAR point clouds has been thoroughly investigated (Weinmann et al., 2015; Vicari et al., 2019), with promising algorithms, especially in urban areas. However, there exist only a few studies on the classification of all key elements in rail corridors (Arastounia, 2012). Moreover, most of these works focus either on urban scenes (Arastounia and Oude Elberink, 2016) or on rural areas (Arastounia, 2015), but not both. For a more extensive review of the automated detection of railroad infrastructures from LiDAR data, we refer the reader to Soilán et al. (2019).
Recently, deep learning approaches such as SPG (Landrieu and Simonovsky, 2018) or SegCloud (Tchapmi et al., 2017) have achieved very promising results. However, training a deep learning algorithm would require a large dataset of annotations in railway environments. Such ground truth would be costly to produce, both in terms of time and money, and at the time of writing, no such data is publicly available. Hence, we restrict our study to classification algorithms that work with little ground truth.
The Random Forest algorithm (Breiman, 2001) is able to address the problem of classifying large 3D point clouds with minimal ground truth. It has already been used for classifying LiDAR point clouds in urban scenes (Chehata et al., 2009; Niemeyer et al., 2013) as well as in forested areas (Shen and Cao, 2017). However, LiDAR scans are noisy and have irregular density, which may decrease the classification quality. Hence, as advocated by Lim and Suter (2009) and Shapovalov et al. (2010), we propose to improve the results of a noisy pointwise classification by running a pre-segmentation algorithm. Niemeyer et al. (2016) propose to use a CRF as a post-processing step. A similar approach has been investigated in rail environments, with promising results, but only for specific objects such as wires (Chen et al., 2019). Recently, Guinard and Landrieu (2017) used the ℓ0-cut pursuit algorithm as a pre-segmentation step to improve classification results in urban areas.
Following the approach of Guinard and Landrieu (2017), we design a weakly supervised approach for classifying LiDAR point clouds in railroad environments. We also investigate different regularization methods to improve classification results. We start by presenting the geometric descriptors used as input for the classification. Then, we present the classification framework. The third technical part focuses on the regularization approaches used to improve classification results. Finally, we present experiments on real-world data.

DESCRIPTORS COMPUTATION
Most classification algorithms, including the Random Forest, rely on local and global descriptors, computed at the point scale or at a global scale (Hackel et al., 2016; Xing et al., 2019). Such descriptors are usually easy and fast to compute, while providing meaningful geometrical information. In order to describe the local geometry of each point, we define six descriptors: linearity, planarity, scattering, verticality, omnivariance and curvature, which we represent in Figure 1.
The features are defined from the local neighborhood of each point of the cloud. For each neighborhood, we compute the eigenvalues λ1 ≥ λ2 ≥ λ3 of the covariance matrix of the positions of the neighbors. The neighborhood size is chosen such that it minimizes the eigentropy E of the vector of eigenvalues, in accordance with the optimal neighborhood principle advocated in Weinmann et al. (2015):

$$E = -\sum_{i=1}^{3} \frac{\lambda_i}{\Lambda} \log\left(\frac{\lambda_i}{\Lambda}\right), \qquad \Lambda = \lambda_1 + \lambda_2 + \lambda_3.$$

As presented in Demantké et al. (2011) and Weinmann et al. (2015), these eigenvalues allow us to qualify the shape of the local neighborhood by deriving the following values:

$$\text{linearity} = \frac{\lambda_1 - \lambda_2}{\lambda_1}, \qquad \text{planarity} = \frac{\lambda_2 - \lambda_3}{\lambda_1}, \qquad \text{scattering} = \frac{\lambda_3}{\lambda_1}.$$

The linearity describes how elongated the neighborhood is, while the planarity assesses how well it is fitted by a plane. Finally, high scattering values correspond to an isotropic, spherical neighborhood. The combination of these three features is called dimensionality.
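The eigenvalue-based descriptors and the eigentropy criterion can be sketched as follows. This is a minimal pure-Python illustration, not the authors' C++ implementation: the Jacobi eigen-solver, all function names and the candidate sizes given to `optimal_k` are our own assumptions. Omnivariance and curvature, listed among the descriptors but not defined above, follow the usual normalized-eigenvalue definitions of Weinmann et al. (2015), which may differ from the exact normalization used in the paper.

```python
import math


def covariance(points):
    """3x3 covariance matrix of a neighborhood given as (x, y, z) tuples."""
    n = len(points)
    mean = [sum(p[a] for p in points) / n for a in range(3)]
    cov = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[a] - mean[a] for a in range(3)]
        for a in range(3):
            for b in range(3):
                cov[a][b] += d[a] * d[b] / n
    return cov


def _matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def eigen_sym3(m, max_sweeps=50):
    """Cyclic Jacobi eigen-decomposition of a symmetric 3x3 matrix.

    Returns (eigenvalues, eigenvectors) sorted so that lambda1 >= lambda2 >= lambda3;
    eigenvectors[i] is the unit eigenvector associated with eigenvalues[i].
    """
    a = [row[:] for row in m]
    v = [[float(i == j) for j in range(3)] for i in range(3)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(3) for j in range(3) if i != j)
        if off <= 1e-22 * sum(a[i][i] ** 2 for i in range(3)):
            break  # diagonal up to round-off
        for p, q in ((0, 1), (0, 2), (1, 2)):
            if a[p][q] == 0.0:
                continue
            # Rotation chosen so that the (p, q) entry becomes zero.
            theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
            t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
            c = 1.0 / math.sqrt(t * t + 1.0)
            s = t * c
            rot = [[float(i == j) for j in range(3)] for i in range(3)]
            rot[p][p] = rot[q][q] = c
            rot[p][q], rot[q][p] = s, -s
            rot_t = [list(r) for r in zip(*rot)]
            a = _matmul(rot_t, _matmul(a, rot))
            v = _matmul(v, rot)  # columns of v accumulate the eigenvectors
    pairs = sorted(((a[i][i], [v[0][i], v[1][i], v[2][i]]) for i in range(3)),
                   key=lambda pair: pair[0], reverse=True)
    return [pr[0] for pr in pairs], [pr[1] for pr in pairs]


def eigen_features(points):
    """Dimensionality features plus omnivariance and curvature of one neighborhood."""
    lam, _ = eigen_sym3(covariance(points))
    l1, l2, l3 = (max(x, 0.0) for x in lam)  # clip round-off negatives
    l1 = max(l1, 1e-12)  # guard against degenerate neighborhoods
    total = l1 + l2 + l3
    e1, e2, e3 = l1 / total, l2 / total, l3 / total
    return {
        "linearity": (l1 - l2) / l1,
        "planarity": (l2 - l3) / l1,
        "scattering": l3 / l1,
        "omnivariance": (e1 * e2 * e3) ** (1.0 / 3.0),
        "curvature": e3,
    }


def eigentropy(lam):
    """E = -sum_i (lambda_i / Lambda) log(lambda_i / Lambda), Lambda = sum of eigenvalues."""
    lam = [max(x, 0.0) for x in lam]
    total = sum(lam)
    if total == 0.0:
        return 0.0
    return -sum((x / total) * math.log(x / total) for x in lam if x > 0.0)


def optimal_k(sorted_neighbors, candidates=(10, 20, 40, 80)):
    """Candidate size whose k nearest neighbors (sorted by distance) minimize the eigentropy."""
    usable = [k for k in candidates if k <= len(sorted_neighbors)]
    return min(usable, key=lambda k: eigentropy(eigen_sym3(covariance(sorted_neighbors[:k]))[0]))
```

For example, a flat 5 × 5 grid of points yields a linearity of 0, a planarity of 1 and a scattering of 0, while points sampled along a straight line yield a linearity close to 1.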
In our experiments, the vertical extent of the optimal neighborhood proved crucial for discriminating between ballast and walls, and between poles and catenaries, as they share similar dimensionality. To discriminate these classes, we used a descriptor called verticality, introduced in Guinard and Landrieu (2017). This descriptor can be computed from the eigenvectors and eigenvalues defined above. Let u1, u2, u3 be the three eigenvectors associated with λ1, λ2, λ3 respectively. We define the unit vector of principal direction û in $\mathbb{R}_+^3$ as the sum of the absolute values of the coordinates of the eigenvectors weighted by their eigenvalues:

$$[\hat{u}]_i \propto \sum_{j=1}^{3} \lambda_j \left| [u_j]_i \right|, \quad i \in \{1, 2, 3\}, \qquad \lVert \hat{u} \rVert = 1.$$

The authors show that the vertical component [û]_3 of this vector characterizes the verticality of the neighborhood of a point.
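A minimal sketch of the verticality descriptor, assuming the eigenvalues and unit eigenvectors of the neighborhood covariance are already available and that the third coordinate is the vertical axis. The function name and the idealized neighborhoods below are illustrative, not taken from the paper's code.

```python
import math


def verticality(eigenvalues, eigenvectors):
    """Vertical component of the unit principal-direction vector u_hat.

    eigenvalues: (lambda1, lambda2, lambda3); eigenvectors: matching unit vectors
    (u1, u2, u3), each an (x, y, z) triple with z vertical.
    """
    # [u_hat]_i is proportional to sum_j lambda_j * |[u_j]_i|
    u_hat = [sum(lam * abs(u[i]) for lam, u in zip(eigenvalues, eigenvectors)) for i in range(3)]
    norm = math.sqrt(sum(c * c for c in u_hat))
    return u_hat[2] / norm if norm > 0.0 else 0.0


# Idealized neighborhoods given as (eigenvalues, eigenvectors):
pole = verticality((1.0, 0.0, 0.0), ((0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))      # 1.0
catenary = verticality((1.0, 0.0, 0.0), ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)))  # 0.0
ballast = verticality((1.0, 1.0, 0.0), ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)))   # 0.0
wall = verticality((1.0, 1.0, 0.0), ((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 0.0)))      # 1/sqrt(2) ~ 0.707
```

Ballast and walls are both planar, and poles and catenaries are both linear, yet verticality separates each pair, which is exactly the ambiguity this descriptor is meant to resolve.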
As advocated by Blomley et al. (2016), these descriptors are computed at several scales, from small to large neighborhoods. In our case, we compute these descriptors for each point considering its 10, 20, 40 and 80 nearest neighbors respectively, which yields 6 × 4 = 24 descriptor values per point. The results can be seen in Figure 1.
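The multi-scale stacking can be sketched as below. This assumes a brute-force nearest-neighbor search (a production pipeline would rather use a spatial index such as a k-d tree) and a caller-supplied `describe` function returning the descriptors of one neighborhood; both function names are hypothetical.

```python
def knn_order(points, query_index):
    """Indices of all points sorted by squared distance to the query point (brute force)."""
    qx, qy, qz = points[query_index]
    return sorted(
        range(len(points)),
        key=lambda j: (points[j][0] - qx) ** 2 + (points[j][1] - qy) ** 2 + (points[j][2] - qz) ** 2,
    )


def multiscale_descriptors(points, query_index, describe, scales=(10, 20, 40, 80)):
    """Concatenate describe(neighborhood) over nested k-nearest-neighbor neighborhoods.

    describe maps a list of (x, y, z) points to a fixed-length list of descriptors;
    with six descriptors and four scales, the result has 24 values.
    """
    order = knn_order(points, query_index)  # sort once, then take nested prefixes
    features = []
    for k in scales:
        neighborhood = [points[j] for j in order[:k]]
        features.extend(describe(neighborhood))
    return features


# Toy usage: 100 points along the x axis and a stand-in descriptor that
# repeats the neighborhood's largest x coordinate six times.
points = [(float(i), 0.0, 0.0) for i in range(100)]
vector = multiscale_descriptors(points, 0, lambda nb: [max(p[0] for p in nb)] * 6)
print(len(vector), vector[0], vector[6], vector[18])  # 24 9.0 19.0 79.0
```

Because the neighborhoods are nested, sorting the candidates once and slicing prefixes avoids one search per scale.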

CLASSIFICATION
We now focus on the classification part of our approach. In order to perform a one-step classification over multiple classes with little ground truth, we decided to use the Random Forest algorithm (Breiman, 2001). This algorithm has already proved its efficiency and versatility for classifying LiDAR point clouds in various environments, from urban to forested areas (Chehata et al., 2009; Niemeyer et al., 2013; Li et al., 2019); hence we argue that it is a suitable algorithm for classifying complex rail corridors.
In our case, we work with five classes that make up the vast majority of railroad scenes: vegetation, ballast, walls & fences, linear objects and other. Linear objects include rails, poles and catenaries. The last class comprises elements appearing in railroad acquisitions but not useful for our application, such as nearby buildings or cars.
To train the algorithm, we manually labeled 5691 points: 3162 vegetation points, 843 ballast points, 346 points for walls & fences, 1286 for linear objects and 54 points representing other objects.
Thanks to a small training set containing elements of all classes with various local geometries, we expect the learning process to be fast while still achieving good overall predictions.

REGULARIZATION
One of the main drawbacks of computing a pointwise classification is the noise in the prediction. In order to overcome this noise and produce a more regular prediction, we propose two different approaches: 1. regularization based on the nearest neighbors' predictions; 2. regularization based on a geometrically homogeneous segmentation.

Nearest Neighbors
The first approach for regularizing the pointwise classification consists of assigning to each point the majority label among its neighbors. Such a post-processing step reduces the noise in the classification while being computationally fast. In the experiments, we test this approach at different scales, from 20 to 80 neighbors.
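A minimal sketch of this majority-vote smoothing, assuming the k-nearest-neighbor index lists (e.g. k = 20 or 80) have already been computed and include the point itself. Ties are broken by `Counter.most_common`, which keeps the first-encountered label among equal counts. The function name is ours.

```python
from collections import Counter


def knn_majority_smoothing(labels, neighbors):
    """Assign to each point the most frequent label among its neighbors.

    labels[i] is the predicted class of point i; neighbors[i] lists the indices
    of its k nearest neighbors (assumed here to include i itself).
    """
    smoothed = []
    for nbrs in neighbors:
        votes = Counter(labels[j] for j in nbrs)
        smoothed.append(votes.most_common(1)[0][0])  # ties: first label encountered
    return smoothed


# Toy 1-D chain: each point's "neighbors" are itself and its immediate predecessor/successor.
labels = [0, 0, 1, 0, 0, 2, 2, 2]
neighbors = [list(range(max(0, i - 1), min(len(labels), i + 2))) for i in range(len(labels))]
print(knn_majority_smoothing(labels, neighbors))  # [0, 0, 0, 0, 0, 2, 2, 2]
```

Votes are read from the original predictions rather than from already smoothed labels, so the result does not depend on the order in which points are processed.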

Segmentation
In order to improve the classification process, we also decided to add a pre-segmentation step, as in Guinard and Landrieu (2017). This is motivated by the fact that railway environments are mostly man-made. Thus, we argue that they should display a certain geometric regularity. Such geometric regularity should translate into a regularity of the computed descriptors: points belonging to the same object should display similar descriptor values. A pre-segmentation step, using descriptor values as input and grouping points accordingly, should be able to capture the underlying structure of the scene.

Figure 2. Illustration of the classification of the segments, based on the sum of the points' classification probabilities (legend: ballast, fences, vegetation, catenaries). Each pie corresponds to a single segment, and the adjacency relationship illustrates the segment graph. Each color represents a different class. In our pipeline, the most represented color in a single segment is associated with every point of the segment.

Table 2. Detailed results when a pre-segmentation step has been done. The first column corresponds to the results of Guinard and Landrieu (2017).
As LiDAR point clouds contain millions of points even for small scenes, the segmentation algorithm must be able to handle massive data in a reasonable computation time. Also, railroads are composed of objects of various sizes, from ballast, which makes up the majority of the scene, to traffic signs, which contain a few dozen points. Hence, the segmentation algorithm should produce regions of various sizes and shapes, according to their local geometry.
Let G = (V, E) be the graph representing our data, where each vertex vi ∈ V is a point of the cloud, connected to its 5 nearest neighbors. Let $f_i \in \mathbb{R}^{24}$ be the vector of descriptors associated with point i. We want to find g*, a piecewise constant approximation of the signal $f \in \mathbb{R}^{24 \times V}$, that minimizes the following energy:

$$g^\star = \arg\min_{g \in \mathbb{R}^{24 \times V}} \sum_{i \in V} \lVert g_i - f_i \rVert^2 + \rho \sum_{(i,j) \in E} \delta(g_i - g_j \neq 0),$$

with ρ > 0 the regularization strength and δ(· ≠ 0) the function of $\mathbb{R}^{24} \to \{0, 1\}$ equal to 0 in 0 and 1 everywhere else. The first part of this energy is the data fidelity term, ensuring that the segmentation produces geometrically homogeneous segments. The second part is the regularization term, which forces the segmentation to produce simply shaped segments, since every edge linking two different segments is penalized. This energy is non-convex, non-continuous and non-differentiable, and thus hard to minimize.
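To make the energy concrete, the sketch below evaluates it for a given partition of the points. It relies on the fact that, once the partition is fixed, the fidelity term is minimized by giving each segment the mean descriptor of its points. It only scores candidate segmentations and does not implement ℓ0-cut pursuit itself; the function names, the toy one-dimensional descriptors and the value of ρ are illustrative assumptions.

```python
def piecewise_mean(f, segment_of):
    """Give every point the mean descriptor vector of its segment.

    For a fixed partition, this per-segment constant minimizes the squared fidelity term.
    """
    sums, counts = {}, {}
    for fi, seg in zip(f, segment_of):
        acc = sums.setdefault(seg, [0.0] * len(fi))
        for d, x in enumerate(fi):
            acc[d] += x
        counts[seg] = counts.get(seg, 0) + 1
    means = {seg: [x / counts[seg] for x in acc] for seg, acc in sums.items()}
    return [means[seg] for seg in segment_of]


def l0_energy(f, g, edges, rho):
    """sum_i ||g_i - f_i||^2 + rho * number of edges (i, j) with g_i != g_j."""
    fidelity = sum((gd - fd) ** 2 for gi, fi in zip(g, f) for gd, fd in zip(gi, fi))
    cuts = sum(1 for i, j in edges if g[i] != g[j])  # delta(g_i - g_j != 0)
    return fidelity + rho * cuts


# Toy chain of 4 points with 1-D descriptors (the paper uses 24-D ones).
f = [[0.0], [0.1], [1.0], [1.1]]
edges = [(0, 1), (1, 2), (2, 3)]
rho = 0.5
print(l0_energy(f, piecewise_mean(f, [0, 0, 1, 1]), edges, rho))  # ~0.51: two segments
print(l0_energy(f, piecewise_mean(f, [0, 0, 0, 0]), edges, rho))  # ~1.01: one segment
print(l0_energy(f, f, edges, rho))                                # 1.5: every point alone
```

The two-segment partition wins: it pays for a single cut while keeping each segment geometrically homogeneous, which is the trade-off ρ controls.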
We decided to use the ℓ0-cut pursuit algorithm of Landrieu and Obozinski (2017) to find an approximate solution to this segmentation problem. Indeed, ℓ0-cut pursuit proceeds in a top-down manner, by iteratively splitting the data, and has no prior on the number of regions, nor on their size or shape. This algorithm has already proved its ability to process large amounts of data (such as LiDAR acquisitions), while producing a geometrically homogeneous segmentation of the scene.
Using this pre-segmentation step allows us to reformulate the pointwise classification problem as a segment-wise classification problem, where the final label of each point corresponds to the label of its segment. To do this, we still produce a Random-Forest-based pointwise classification, but then associate with each segment the most popular label among its points, through a voting strategy. The last step assigns to each point the label of its segment. This is illustrated in Figure 2.
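The segment-wise voting can be sketched as follows. The text describes a majority vote over pointwise labels, while the Figure 2 caption mentions summing classification probabilities; both variants are shown. The function names are hypothetical and this is not the authors' implementation.

```python
from collections import Counter, defaultdict


def segment_vote(point_labels, segment_of):
    """Hard voting: each segment takes the most frequent pointwise label of its points."""
    votes = defaultdict(Counter)
    for label, seg in zip(point_labels, segment_of):
        votes[seg][label] += 1
    winner = {seg: counter.most_common(1)[0][0] for seg, counter in votes.items()}
    return [winner[seg] for seg in segment_of]


def segment_vote_soft(point_probs, segment_of):
    """Soft voting: sum the class probabilities of the points in each segment."""
    sums = {}
    for probs, seg in zip(point_probs, segment_of):
        acc = sums.setdefault(seg, [0.0] * len(probs))
        for c, p in enumerate(probs):
            acc[c] += p
    winner = {seg: max(range(len(acc)), key=acc.__getitem__) for seg, acc in sums.items()}
    return [winner[seg] for seg in segment_of]


print(segment_vote([1, 1, 2, 3, 3, 3], [0, 0, 0, 1, 1, 1]))  # [1, 1, 1, 3, 3, 3]
# Two points mildly favoring class 0 against one point certain of class 1:
probs = [[0.6, 0.4], [0.6, 0.4], [0.0, 1.0]]
print(segment_vote([0, 0, 1], [0, 0, 0]))   # [0, 0, 0]
print(segment_vote_soft(probs, [0, 0, 0]))  # [1, 1, 1]
```

The two strategies can disagree, as in the last example: hard voting counts points, whereas soft voting also accounts for how confident each prediction is.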

Dataset
To the best of our knowledge, there exists no public labeled railroad dataset. Hence, we used our own private dataset, which consists of a train-based LiDAR acquisition a few kilometers long on the French railroad network. This acquisition has been cut into 100 m × 100 m tiles, which are processed independently.
We selected and manually labeled a part of this dataset. The composition of the training set is detailed in Section 3. The test set has been manually labeled as well and contains ∼700k points: 191114 vegetation points, 243205 ballast points, 74866 points for walls & fences, 162543 points for linear objects and 29517 points representing other objects. The test set can be seen in Figure 3a. Our approach is evaluated using the unweighted FScore on the labeled part, and by visual validation on the rest of the dataset.
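We read "unweighted FScore" as the macro-average of per-class F1 scores, in which every class counts equally regardless of its number of points; this interpretation and the function names below are assumptions, sketched here for clarity.

```python
def per_class_f1(truth, pred, classes):
    """F1 = 2TP / (2TP + FP + FN) for each class (harmonic mean of precision and recall)."""
    scores = {}
    for c in classes:
        tp = sum(1 for t, p in zip(truth, pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(truth, pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(truth, pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def unweighted_fscore(truth, pred, classes):
    """Macro-average: every class weighs the same, whatever its number of points."""
    scores = per_class_f1(truth, pred, classes)
    return sum(scores.values()) / len(scores)


truth = [0, 0, 1, 1]
pred = [0, 1, 1, 1]
print(per_class_f1(truth, pred, [0, 1]))       # {0: 0.6666666666666666, 1: 0.8}
print(unweighted_fscore(truth, pred, [0, 1]))  # ~0.733
```

Macro-averaging matters here because the test set is unbalanced (243205 ballast points versus 29517 other points), so a point-weighted score would be dominated by the largest classes.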

Results
The results are evaluated using the F1-Score metric. In this paper, we compare a pointwise Random Forest classification to its regularized versions based on 20 and 80 neighbors. We then add a pre-segmentation step to improve the classification results, and further improve them by averaging the prediction over the 20 and 80 nearest neighbors. The Random-Forest-based classification displays good results for vegetation detection, with an FScore higher than 0.9. However, it fails to retrieve the limits between ballast, linear objects and other objects, because the borders between such objects may display the same geometry. This method also suffers from the noise in the original acquisition. Hence, regularizing the classification over 20 and 80 neighbors brings substantial improvements, both in terms of object border detection and in terms of FScore. The detailed results are given in Table 1, and visuals are shown in Figure 3.
In order to improve the results, we added a segmentation step based on the ℓ0-cut pursuit algorithm. This allows us to group points with similar geometries and to decrease the influence of noise. The segmentation itself is shown in Figure 3b. We observe that ballast and vegetation are divided into a limited number of regions, while catenaries and geometrically complex areas are divided into numerous regions. This shows that the segmentation was able to efficiently group large numbers of points when possible, while preserving geometrically complex areas. The improvements in terms of classification can be seen in Table 2. Indeed, the use of a pre-segmentation step increases the FScore of the Random Forest classification by 0.4, and a regularization based on the 20 nearest neighbors improves the FScore to 0.96, with per-class FScores higher than 0.92. More results for the best approach (pre-segmentation + regularization on 20 neighbors) are shown in Figure 4. The scene corresponds to a 100 m × 100 m acquisition along rail tracks; close-ups of rails, catenaries and walls are displayed as well. We see that our classification is highly accurate on geometrically complex objects such as rails and catenaries. Indeed, the FScore for the linear object class rises from 0.45 for the Random Forest classification to 0.98 when aided by a pre-segmentation step and a regularization on 20 neighbors.
Such results are promising and show that a Random-Forest-based classification aided by a pre-segmentation step and a post-classification regularization can efficiently discriminate the most common types of objects in railway environments.

Computational performances
The computational performance for a standard 100 m × 100 m tile containing 3.4 million points is reported hereafter. The code has been executed on a single core of an Intel Core i7 @ 2.80 GHz processor with 16 GB of RAM, and is entirely written in C++. The processing times are displayed in Figure 5. They include the reading and writing operations performed at each step, because each step is independent and can be separated from the pipeline.
For the descriptors computation, using predetermined neighborhood sizes allows for faster computation than finding optimal neighborhood sizes at run time. In our case, a quick benchmark on neighborhood sizes, combined with the use of 4 different neighborhood sizes, allows us to find close-to-optimal neighborhood sizes and to collect enough information to perform a meaningful pointwise classification.
The segmentation step, which is optional in our case, follows a top-down approach, meaning that it scales up more easily than bottom-up approaches. Also, the iterative solving scheme used by the ℓ0-cut pursuit algorithm gives us a good approximation of a non-convex problem that would otherwise be unsolvable in a reasonable amount of time. Last, the training time remains low thanks to a limited ground truth of ∼5000 points, which is extremely little compared to the size of the training datasets needed for deep learning methods. This limited ground truth also ensures that the trained forest is small enough to allow fast prediction.
The overall processing time remains low enough to consider using such a pipeline in production.

CONCLUSION
In this paper, we investigated the classification of LiDAR data in railroad environments. Such environments are very peculiar: they display a specific spatial organization, with pairs of rails in the central part of the scan surrounded by ballast, and they combine dense urban areas near train stations with rural areas between the cities served. In order to improve maintenance workflows, we propose to detect the key elements of railway environments through a classification of the scene.
To the best of our knowledge, there exists no public benchmark, nor large-scale ground truth on railroad environments that could be used for training, so we designed a weakly supervised approach, using only a few hundred samples per class. This approach is based on local descriptors computed at different scales and can be improved by a pre-segmentation step. We also investigated a post-processing regularization based on the nearest neighbors' classification. The computational performance of our method allows us to envisage its use for production purposes.
Further work will focus on country-wide classification and its evaluation. Also, as proposed by Xu et al. (2019), a pseudo-labelling approach could be used to add more data to the training process, enabling the use of deep learning algorithms.