CHANGE DETECTION AND DEFORMATION ANALYSIS BASED ON MOBILE LASER SCANNING DATA OF URBAN AREAS

Change detection is an important tool for processing multiple epochs of mobile LiDAR data in an efficient manner, since it allows to cope with an otherwise time-consuming operation by focusing on regions of interest. State-of-the-art approaches usually either do not handle the case of incomplete observations or are computationally expensive. We present a novel method based on a combination of point clouds and voxels that is able to handle said case, thereby being computationally less expensive than comparable approaches. Furthermore, our method is able to identify special classes of changes such as partially moved, fully moved and deformed objects in addition to the appeared and disappeared objects recognized by conventional approaches. The performance of our method is evaluated using the publicly available TUM City Campus datasets, showing an overall accuracy of 88 %.


INTRODUCTION
Mobile laser scanning (MLS) is a fast and efficient way to collect high resolution 3D measurements of extensive areas. The possible benefits of such data are numerous. It can, for example, be used for disaster assessment, to extract street sign information or for the generation of city models. Regardless of the actual use case, the knowledge and products derived from the data needs to be kept up to date. A re-evaluation of all data once new measurements are available is inefficient. The active search for changed areas within the dataset is a much more suitable approach. In literature, this process is referred to as automatic change detection. The big challenge regarding automatic change detection is the incomplete observation of the environment. While some scenarios allow a complete observation of the area-of-interest, most real-world datasets suffer from occlusions or the fact that some areas are only observed in one epoch, but not in the other. Is a method unaware that some parts of the dataset have not been ob- * Corresponding author served in both epochs, then it will accidently flag objects within these areas as changed. For this reason, it is necessary that a change detection method is able to handle such incomplete observations. One way to do this is to consider the occupancy information embedded within the range measurements.
There are several categories of change detection algorithms that can be classified by runtime and their capability to handle incomplete observations. An illustration can be seen in Figure 1. Approaches solely based on point clouds are not able to handle incomplete observations, but are usually quite fast. Less fast are volumetric methods such as occupancy grids, since they are usually generated using raycasting-like algorithms. They are able to recognize incomplete observations, however, at the cost of discretization errors. Ray-based methods are the most precise since they use all of the available information. Unfortunately, they are required to have a quadratic runtime complexity and therefore are much slower than algorithms based on volumes and point clouds.
Our approach combines the best of two worlds. We use point clouds in order to get a high spatial resolution and apply information extracted from a low-resolution occupancy grid to include occupancy information. Therefore we are able to get fast runtimes in combination with the capability to handle incomplete observations. In addition, our approach allows to easily identify special cases such as partially moved, fully moved and deformed objects, whereas other approaches usually only identify appeared and disappeared objects.
Section 2 gives an overview over the state-of-the-art of occupancy grids, change detection and deformation analysis. In Section 3, we give a short definition of what we define as changes and describe the different classes of changes identified by our approach. A detailed explanation of our method follows in Section 4. The capabilities of our approach are evaluated in Section 5 based on a synthetic dataset especially created for this purpose, as well as the TUM City Campus datasets.

Occupancy grids
Occupancy grids are a memory-friendly way to store information about free, occupied and unobserved areas. An approach for indoor mapping proposed by (Moravec, Elfes, 1985) utilizes a 2D grid with occupancy information. The latter one was represented by a probability stored in each of the grid cells, implying either free, occupied or unseen space. The grid has been generated using ultrasonic sensors. Although the aperture angle of each sensor was wide, the resulting grid turned out to be a quite accurate representation of the environment.
An octree-based approach to represent arbitrary 3D structures has been presented by (Meagher, 1982). Initially only binary occupancy information had been considered, but later probabilistic occupancy information has been added by (Payeur et al., 1997). (Hornung et al., 2013) introduced propability clamping in order to ensure fast adjustment of the representation to a dynamic environment. This also enabled a nearly lossless compression strategy. The framework implemented by the authors has gained popularity within the robotics community and is known by the name OctoMap.
Based on the theoretical foundation provided by this work, we proposed a concept for occupancy representation on a global scale (Gehrung et al., 2016). The problem of artifacts caused by the discretization inherent to occupancy grids has been solved using an iterative refinement algorithm for occupancy octree generation (Gehrung et al., 2018).

Change detection
The meaning of the term change detection depends on the context and the field of research in which it is used. It plays an important role in photogrammetry and remote sensing, where it is used for tasks such as urban area monitoring and updating geo-databases (Du et al., 2016). Approaches similar in effect, but different in name can be found in the field of 3D computer vision. Here it is applied to tasks such as the Detection and Tracking of moving Objects (DATMO) (Litomisky, Bhanu, 2013) or for the distinction of dynamic objects and static background (Azim, Aycard, 2012). However, a far better categorization of methods can be made on the basis of the sensor system used, more precisely on the processing of the measurements generated by it. Since this work deals with mobile laser scanning, only approaches that use 3D data are considered.
Surface points. Point clouds are a common representation for 3D measurements, so using them in change detection approaches is straightforward. (Girardeau-Montaut et al., 2005) proposed several simple cloud-to-cloud comparison algorithms, organizing the point clouds using an octree in order to increase performance. (Zeibak, Filin, 2008) converted point clouds from a stationary terrestrial laserscanner into depth images in order to compare multiple epochs with each other. The downside of these approaches is that they can't distinguish between changes and incomplete observations.
Rays. In addition to the surface points, 3D measurements usually provide information about the free space traversed by the measurement. This information is critical in order to distinguish changes and incomplete observations. A popular approach utilizing this was proposed by (Underwood et al., 2013). A ray comparison strategy based on a spherical grid is used in order to compare point clouds from different epochs or view points. (Hebel et al., 2013) apply the Dempster-Shafer theory of evidence to derive changes based on rays originating from airborne laser scanning. A voxel grid is used for performance reasons. (Xiao et al., 2015) proposed a similar approach for mobile laser scanning data. Methods of this category allow accurate conclusion, but require the evaluation of all rays within the region of interest and are therefore computationally very expensive.
Cluster based. Some approaches utilize segmentation techniques in order to extract clusters, which in turn are used for the actual change detection. This requires well working segmentation algorithms or classifiers for cluster extraction. (Schachtschneider et al., 2017) use an occupancy grid in order to assess the temporal behavior of clusters extracted from point clouds of urban environments. (Aijazi et al., 2013) proposed an approach that classifies clusters into known permanent and temporary classes. A similarity map derived from an evidence grid is used, inter alia, for change detection.
Occupancy grids. Grid based occupancy representations provide similar advantages in terms of information as the raybased ones, albeit in a condensed form. (Pagac et al., 1996) utilized the Dempster-Shafer theory of evidence to generate a 2D occupancy grid that serves as environment representation for autonomous driving. (Wolf, Sukhatme, 2004) use a similar approach for robot navigation that utilizes two grids to determine static and dynamic parts of a scene. (Azim, Aycard, 2012) also divide measurements of an environment into static and dynamic elements by utilizing conflict search on an occupancy grid based on the Octomap framework. We have successfully applied probabilistic occupancy grids for short-term change detection in order to detect and extract moving objects (Gehrung et al., 2017). Additionally we proposed a fast indicator for detecting changes (Gehrung et al., 2019). An octree-based structure utilizing Gaussian kernels has been used in order to represent the occupancy state of the environments. Thereby, we also coined the term delta octree, which describes the differences of occupancy and observation between two epochs.
Most of the approaches mentioned above do not scale to large amounts of sensor data, such as the one generated by a state-ofthe-art mobile laser scanning system. Surface point based approaches as well as cluster-based ones are not able to handle occlusions, ray-based approaches encounter performance issues. Approaches based on occupancy grids either suffer from information loss due to crude resolutions or require great effort in order to generate high spatial resolutions. Also, none of the proposed techniques can correctly recognize partly moved objects out-of-the-box. (Lindenbergh, 2010) distinguishes between change detection and deformation analysis. In contrast to change detection, deformation analysis does not determine a purely binary change, but rather the degree of change. There are different classes of approaches, each one of them working on a different abstraction level. Point-by-point deformation analysis uses individual measurement points or the corner points of a regular grid, like the approach proposed by (Lague et al., 2013). Object-oriented deformation exploits some kind of object representation, such as geometric primitives (Xu et al., 2013) or building footprints (Rutzinger et al., 2010). A middleway between both classes of approaches are morphological maps, where the distances between points and a geometric representation of the object are determined (Pesci et al., 2013).

About the term changes
The scope of this work is on long-term changes that typically appear when comparing epochs that have a timespan of days, weeks or month in between. Short-term changes such as moving objects are not considered. Furthermore, we define change as the result of a semantically meaningful action that is applied to some kind of object instance. Examples for this would be moving a car or breaking a hole into a wall. Usually change detection is limited to determining appeared and disappeared areas by comparing two or more epochs. This leads to the following problems. Whenever an object has been partially moved, it is broken down into an appeared, a disappeared and an unchanged part instead of being recognized as the same object at different locations (see Figure 3). Also, when an object is replaced by another one of about the same size, detecting a change by comparing surface points or occupancy information only could lead to partial or no detection at all.

Class definitions
In order to deal with the problems mentioned above, the change detection approach presented in this work classifies changes into five classes and a residue class. An illustration can be seen in Figure 2. The classes are defined as follows.
Appeared. An object instance appeared. The space encompassed by it is free in epoch t but occupied in epoch t + 1.
Disappeared. An object instance disappeared. The space encompassed by it is occupied in epoch t but free in epoch t + 1.
Partially moved. An object instance has been moved between epoch t and t + 1, but in a way that it overlaps when comparing both epochs with each other.
Fully moved. An object instance has been moved between epoch t and t + 1, but in a way that it does not overlap when comparing both epochs with each other. The object instance has the same appearance in both epochs and the distance between its center points is less than a predefined threshold.
Deformed. An object instance encompasses the same space in both epochs t and t + 1 and has approximately the same dimensions, but a different form. If an object instance exists in both epochs, but with a significantly different size, the object instance is considered to be disappeared and another one to be appeared.
Unchanged Residue class which is applied to object instances that are considered to be unchanged. It is used in case no class of the above applies.
Appeared and disappeared are the classes usually used in change detection. The partially moved class solves the typical change detection related issue mentioned above and is a valuable source of information, as the class fully moved is. The deformed class is required to denote object instances that have significantly changed their appearance or have been replaced by instances of other objects.

Reasons for combining voxels and point clouds
As mentioned in Section 2.2, change detection methods either utilize volumes or some variation of point clouds. An advantage of point clouds is their ability to represent the environment with the highest resolution available, given the sensor measurements. However, point clouds gained from mobile laser scans do only provide occupancy information if the latter ones are processed as rays, but the evaluation of all available rays within an area is a time-consuming process, especially when considering large areas or huge amounts of measurements common for mobile laser scanning. Volume-based data structures like octrees or occupancy grids have the inherent advantage that they are able to represent occupancy as well as explorational information. They enable the derivation of a so called delta octree that encodes information about changes between epochs (Gehrung et al., 2019). It is also able to mark areas for which no statement about a change of occupancy can be made. The challenge with volume-based technologies is that the maximum resolution is limited. For one thing, this is due to performance issues regarding the voxel generation. Even fast algorithms such as the one using iterative refinement previously proposed by us (Gehrung et al., 2018) require large runtimes for resolutions of about 10 cm.
Another limitation is the well known Nyquist-Shannon sampling theorem that describes the condition required for the sampling of a continuous signal. In this case it can be used as an analogy to understand the effects that take place when an environment is reconstructed from laser scanning data. The generation of a voxel grid from laser scans represents such a reconstruction. Figure 4 shows the effects appearing when a voxel grid is constructed using a resolution too fine for the available laser scans. A fragmentation occurs in a way that the comparison of two epochs would compare most voxels with unseen areas. The reason for this is that the laser scanner sampled the environment in a frequence too low to allow for its reconstruction at the given resolution. The correct frequency would be a function of the scanner mount angle, the distance to the surface and the speed of the mobile mapping system. As a result, the actual sampling frequency varies greatly in practice.
This work attempts to combine the advantages of both volumes and point clouds. The goal is to propose a change detection algorithm that is able to utilize occupancy and explorational information from a voxel-grid with a coarse resolution in order to detect and evaluate candidates selected from a point cloud with a higher resolution. This approach is also intended to solve the problems regarding partially moved and deformed objects described in Section 3.1.

Overview
This section provides a quick overview of the approach presented in this paper. A full explanation is given in the following sections. A flow chart illustrating the process can be seen in Figure 5. The input consists of the point clouds of both epochs as well as the delta octree. In a first step, candidates are selected from the point clouds of both epochs. Clusters representing appeared and disappeared volumes are extracted from the delta octree and intersected with the candidates. This results in what we refer to as basic instances. These represent object instances that either appeared or disappeared. This term has been chosen in order to describe the kind of information usually generated by state-of-the-art change detection approaches. Based on the basic instances, the extended instances are determined. These are partially and fully moved object instances, as well as deformed object instances. In a final step, the basic and extended instance lists are checked for plausibility. Each individual step is described in the subsequent sections.

Candidate selection from point clouds
The candidate selection describes the segmentation of the point cloud into clusters. These clusters are considered to have some kind of semantic meaning, they are therefore instances of an object set. They can, for example, correspond to people, cars, vegetation or buildings. For the sake of simplicity, we assume that removing the ground plane and applying a simple Euclidean clustering algorithm will provide said clusters.
Filtering is an important step following the clustering, be it in order to remove clutter or to provide additional boundary conditions. These could be a given cluster dimension when searching for persons or if a focus on large objects is required. The following criteria are used, specified as intervals.
• Number of points • Cluster height, weight and length • Height above ground Other criteria are possible, therefore the above list is not necessarily complete.

Extracting voxel-clusters from the delta octree
The delta octree is a volumetric data structure that encodes the changes within the occupancy and observation of the environment. Note that all ground points had been removed before generating the occupancy grid in order to increase performance and to avoid clutter. The delta octree is segmented into clusters representing appeared and disappeard objects. Starting from a seed list, flood filling is applied in order to get said clusters. As with point clouds, filtering is used. The following criteria are used, specified as intervals: • Volume • Height above ground Other criteria are possible, therefore the above list is not necessarily complete.

Determining basic instances
The lists of appeared and disappeared object instances, the basic instances are determined by calculating the overlap between the candidates extracted from the point clouds and the voxel clusters. All voxel clusters are intersected with all point cloud clusters within a given range. For each correspondence, the proportion of points inside the voxels of the cluster is calculated. Filtering based on the activity threshold ta is applied in order to determine whether or not a candidate is added to the list. This process is executed for appeared and disappeared clusters.

Determining extended instances
The extended instances are derived from the basic instances. They represent partially moved, fully moved and deformed object instances. A search within the local neighborhood between both appeared and disappeared object instances is executed. Therefore, all appeared and disappeared instances are compared with each other. If the volumes of the bounding boxes have about the same order of magnitude, further comparisons as shown in Figure 6 take place. Figure 6. The case discrimination that is applied to determine the extended instances.
An intersection test based on axis-aligned bounding boxes is applied. If it is successful, an advanced intersection test using a k-d tree is used in order to determine whether or not object instances actually intersect. If they do, their similarity is compared using the technique described in Section 4.6. If there is a similarity, the object instance pair is considered to be partially moved. In case they are not similar, the pair is considered to be deformed. If there is a similarity but no intersection, the object is considered to be fully moved.

Comparing candidate similarities
Basic instances are checked for similarity by comparing the point clouds representing them. Two point clouds A and B of different epochs are compared in a point-wise manner. Since it cannot be assumed that there are no occlusions and the environment is completely observed, observability information extracted from the delta octree is incorporated. The difference between two clouds is determined using the error function e. Both clouds are considered to be similar to each other if the function value is less or equal to a similarity threshold ts.
Before both clouds can be compared, cloud A needs to be aligned to cloud B. Therefore, an initial transformation is determined using a centroid-based technique. Based on this, RANSAC-enhanced ICP is applied. Cloud A is then compared to cloud B in a point-wise manner using a variation of the mean square error. It is defined as NA is the number of points in cloud A. The deviation di between two points in cloud A and B is defined as if area unobserved in B. (2) For each point p A i in cloud A, the neighbor point p B i in cloud B is determined. A maximum distance is used for the neighborhood search. If a neighbor is found, di is set to the squared distance between both points, otherwise to a penalty constant. The constant is designed in a way that a comparison to the similarity threshold ts fails as soon as a predefined percentage of all points in cloud A does not have a neighbor in cloud B. The percentage of points is defined by the penalty threshold tp. In case no neighbor is found, the delta octree is checked whether or not the area has been observed in cloud B. If this is the case, di is set to zero, since it is impossible to say whether or not that part of the object instance has changed or not been observed. The error function e combines both comparisons e B A and e A B . It is defined as where the score for a comparison in each direction is weighted by the number of involved points. The logarithm is used to stretch the result space in a way that benefits the selection of the similarity threshold ts.

Plausibility check
The purpose of the plausibility check is it to keep the lists containing the basic instances consistent with the lists of the extended instances. Since the latter ones are derived from the former ones, there are some object instances that are part of both the basic and extended instances. A simple search is applied to remove all of these object instances from the basic instance lists.

Synthetic dataset
In order to verify that our approach is capable to identify the change classes defined in Section 3, we created a synthetic LiDAR dataset using the Blensor plugin for the free software Blender. For each class we created a simple three-dimensional scene, each with two epochs. Each scene consists of a flat surface with one or more geometric primitives and is available in two versions. In a fully observed version, the scene has been scanned from all cardinal directions, therefore four LiDAR scans were generated. In the partially observed version, only a single LiDAR scan showing the scene from one cardinal direction has been created. In addition, a demo scene containing examples for all classes has been created. It serves for illustration purposes only.

TUM MLS datasets
The evaluation with real-world data is based on the TUM City Campus 1 datasets (Gehrung et al., 2017). They consist of two measurement rides that took place in spring 2016 and winter 2018. It is well suited for change detection, since there are a variety of minor and major changes between both epochs. Both datasets are publicly available under a Creative Commons License.
The datasets have been recorded using the measurement vehicle MODISSA (Borgmann et al., 2018). It is equipped with two Velodyne HDL-64E LiDAR sensors mounted at an angle of 25°o n the vehicle front roof. This allows recording both the area in the vicinity of the vehicle as well as building facades. The datasets consist of a sequence of individual scans, with each scan covering approximately 0.1 seconds of data aquisition. All single points in all scans have individually been georeferenced using the on-board Applanix POS LV navigation system that combines navigation data from two GNSS antennas, an inertial measuring unit and a distance measurement indicator (Diehm et al., 2020).
In order to further increase the quality of intra-and interepoch registration of the scans in the dataset, we applied methods known from graph-based SLAM. In an additional fineregistration step, we combined multiple consecutive scans to chunks which in turn were registered against an accumulated point cloud of the first epoch using ICP. From this data we extracted a subset containing the TU Munich inner city campus.
Labels for all moving objects were manually generated and used to remove them from both epochs. In addition, we created labels for all objects that disappeared from the first epoch and appeared in the second one. The labels are used during the evaluation as ground truth for changed objects. Evaluating the plausibility of an object instance class is done by a human observer, due to reasons mentioned in Section 6.2.

Synthetic dataset
The synthetic dataset has been used for developement purposes and to determine initial parameters for the real-world data. For cluster activity threshold and penalty threshold ta = 0.1 and tp = 0.25 were found to be appropriate. The similarity threshold has been defined as ts = 1.5. The maximum search distance for neighbors is set to 0.3 m. The synthetic dataset shows that object instances in every scene are recognized and classified correctly, even if the object instance is only partly observed. It also shows that there is a clear difference in the error functions value for objects that are similar (e < 1.0) and those that are not (e > 4.0). Furthermore, it demonstrates that considering the effects of incomplete observation is important in order to suppress apparent deviations of object instances due to occlusion. Figure 7(a) and 7(b) show the demo scene.

TUM MLS datasets
For reasons explained in (Gehrung et al., 2016), we broke the dataset down into 16 3D tiles, each of them with an edge length of 32x32x32 m. Our change detection method was applied to each of them separately. The parameters are the same as in Section 6.1. The edge length of a voxel is 0.5 m. Figure 7(c) and 7(d) shows pedestrians correctly identified as fully moved. It also demonstrates the effect of segmentation errors. In the first epoch the point cluster of the dumpster is merged with the point cluster of the building. Therefore the dumpster is recognized as appeared, not as deformed. Figure 7(c) and 7(d) demonstrate that an object instance can be part of multiple extended instances. The golden object could either have been deformed (switched with another, unknown object) or been replaced by one of the objects to both of its sides. During the evaluation, it turned out that the main cause of incorrect classifications can be attributed to the above-mentioned segmentation errors. Those distort the results and make the evaluation challenging, since the assessment of such cases requires interpretation by a human observer. This is the reason why the qualitative evaluation was done based on labels, but manually. Table 1 shows the quantitative results of the evaluation. The overall accuracy of our change detection method is 88 %. The exclusion of all misclassifications caused by segmentation errors would lead to an overall accuracy of 93 %.
It should be noted that both the partially moved and deformed cases are too underrepresented to draw meaningful conclusions. This is mostly caused by said segmentation errors, but also due to the nature of the datasets. Table 2 shows precision and recall with and without considering segmentation errors. Both precision and recall are high for classes with many examples. The precision for appeared and disappeared object instances is  high, since those are the simplest cases to detect. The precision for partially moved and deformed object instances is in the medium range, most likely because of the underrepresentation of these cases as mentioned above. Both precision and recall increase drastically once the segmentation errors are excluded.

Runtimes
All reported results were generated on a machine with 64 Gigabyte of RAM and an Intel Core i7 processor with 3.5 GHz and 36 cores. The accumulation of all LiDAR-measurements and and distribution on 3D tiles needs about 10 minutes. Generating the occupancy grid requires 17.9 minutes per 3D tile and epoch on average, that is 4.8 hours per epoch. Creating the delta octrees for all 3D tiles and epochs takes less than 2 seconds. The runtime for our change detection approach varies with the amount of cloud clusters and the number of points within. It is between 30 seconds and 10 minutes. While the overall runtime of our method for the TUM City Campus datasets is in the range of hours, point-based approaches usually require minutes, whereas ray-based methods such as the one presented by (Underwood et al., 2013) require days.

CONCLUSION AND FUTURE WORK
In this paper we presented a novel change detection approach that is able to identify partially moved, fully moved and deformed object instances. Our method is able to handle incomplete observations, but is far from being as computationally expensive as comparable approaches. The overall accuracy is 88 %, but can be improved to at least 93 % by eliminating segmentation errors in the candidate selection. For future work, we propose to solve these segmentation errors using advanced techniques auch as semantic segmentation. Furthermore, we intend to further decrease the runtime by only generating the occupancy grid at locations that contain possible candidates.