3D ONLINE TERRAIN MAPPING WITH SCANNING RADAR SENSORS

Environmental perception is one of the core requirements for autonomous vehicle navigation. When exposed to harsh conditions, commonly deployed sensors such as cameras or lidars deliver poor sensing performance. Millimeter wave radars enable robust sensing of the environment, but suffer from specular reflections and large beamwidths. To account for the sensor noise and lateral uncertainty, a new probabilistic, voxel-based recursive mapping method is presented that enables online terrain mapping with scanning radar sensors. For map accuracy evaluation, test measurements were performed with a scanning radar sensor in an off-road area. The voxel map is used to derive a digital terrain model, which is compared with ground-truth data from an image-based photogrammetric reconstruction of the terrain. The evaluation shows promising results for terrain mapping performed solely with radar scanners. However, small terrain structures still pose a problem due to the larger beamwidth in comparison to lidar sensors.


INTRODUCTION
Over the past years, considerable progress has been made in the broad research field of robotic navigation and mapping. Many proven methods have been developed and published for applications such as automated driving, obstacle avoidance, 2D/3D mapping and target detection. However, most of these methods rely on optimal environmental conditions (e.g. clear visibility) in order to work properly. In practice, visibility conditions are often poor, especially in outdoor applications. Day and night cycles change the illumination conditions. The weather is constantly changing and impedes the visual perception of optical sensors, e.g. due to rain, snow, fog, or dust. In emergency operations, smoke can also compromise the visibility. Nonetheless, optical sensors like laser scanners and visual-spectrum cameras are predominantly used in robotics to sense the environment. Laser scanners are fast, accurate and can sample the environment at very high resolution, but they are sensitive to bad weather conditions. Cameras are versatile and cheap, but the quality of a stereo 3D reconstruction decreases heavily under non-ideal conditions, e.g. poor lighting or dirt on the lenses (Woodside Capital Partners, 2016). Under these conditions, such sensors are prone to erroneous measurements or require constant cleaning. As a consequence, these non-robust sensor technologies cannot be used for autonomous vehicles in the above-mentioned application areas.
Recent developments of scanning radar sensors enable environment measurements similar to those typically known from lidar sensors, but with a more robust performance in harsh environments. On the downside, these radar sensors are quite inaccurate compared to a lidar scanner and prone to erroneous measurements due to their physical principle. While radar distance measurements are rather precise, the angular precision is quite low. This is mainly due to the very large footprint of the emitted wave, with a typical half-power beamwidth between 2° and 8°. Besides that, reflected echoes can also stem from the side lobes of the antenna, but are typically assigned to the main lobe; this causes a large displacement of the echo when georeferencing the radar measurements. Additionally, specular reflections often occur with millimeter wave (mmWave) radars due to the large wavelength of up to several mm (compared to lidars with a wavelength of approx. 1 µm), where even relatively rough surfaces appear like a mirror to a radar wave (Clarke et al., 2013). This can lead to falsely detected targets, so-called ghost objects, as well as missed targets if the wave is reflected away from the sensor. In Figure 2 these two problems are visualized: indirect paths of the emitted radar wave can occur due to specular reflection on objects (blue color). The longer traveling times result in ghost targets at greater distances along the direct path axis.

Figure 1: Wavelength impact on particles (Winkel et al., 2011): Short wavelengths of lidar sensors (compared to particle sizes) tend to scatter back when propagating through dusty mediums; transmitted waves of mmWave radars at 76 GHz have the advantage of traveling through dust particles due to the longer wavelength.
In robotics and the field of autonomous vehicles, it is essential that vehicles can perceive the environment with their own onboard sensors. The latest push in the automotive sector, driven by the demand for autonomous cars, has accelerated research on radar technologies and signal processing methods (Brisken et al., 2018). Autonomous ground vehicles are typically equipped with lidar sensors, cameras and radars (Ilas, 2013). In comparison to the first two sensor technologies, radars are not commonly used for terrain mapping tasks due to their lower measurement accuracies. A significant advantage of radar compared to lidar is its robustness against harsh environmental conditions. This property can be achieved because the transmitted electromagnetic wave of a mmWave radar has a much longer wavelength compared to lidar (Figure 1). At an operating frequency of 76 GHz, a radar wavelength of about 4 mm allows propagation through dust and water particles (Winkel et al., 2011).

Figure 2: Example of possible radar multipath propagation compared to direct path propagation: the black (real) target is measured from path A-B-A; the blue target (ghost target) is measured from path A-B-O1-A.

ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume V-1-2020, 2020 XXIV ISPRS Congress (2020 edition)
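The geometry behind the ghost targets of Figure 2 can be sketched in a few lines. The following snippet is an illustrative example (the point coordinates and the function name are hypothetical, not from the paper): a single specular bounce makes the wave travel sensor → target → reflector → sensor, and since the echo is assigned to the direct-path axis, the ghost appears at half the total path length, i.e. beyond the real target.

```python
import math

def ghost_target_range(sensor, target, reflector):
    """Apparent range of a ghost target created by one specular bounce.

    The wave travels sensor -> target -> reflector -> sensor
    (path A-B-O1-A in Figure 2), but the echo is assigned to the
    direct-path axis, so the ghost appears at half the total path
    length along that axis. Hypothetical 2D geometry for illustration.
    """
    d = math.dist
    total = d(sensor, target) + d(target, reflector) + d(reflector, sensor)
    return total / 2.0  # apparent range, larger than d(sensor, target)

direct = math.dist((0.0, 0.0), (10.0, 0.0))               # 10 m direct path A-B
ghost = ghost_target_range((0.0, 0.0), (10.0, 0.0), (10.0, 5.0))
assert ghost > direct  # the ghost target appears at a greater distance
```

This is why multipath voxels end up below the terrain surface in the later mapping step: the ghost range is always longer than the direct range.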
A sensor analysis for lidar and radar showed that in a test scenario with rain and fog the robustness of the tested lidar sensors dropped rapidly, while radar sensors reached a target acquisition probability of 100 % (Ryde and Hillier, 2009). Brooker et al. (2006) stated that under extreme conditions, where visibility may be reduced to just 1 m by coal dust or water droplets, radar sensors perform nearly undisturbed, noticing only a small signal attenuation. Even radar sensors covered with a layer of clay, mud and coal were still accurate and reliable for operations (Winkel et al., 2011).

RELATED WORK
Most of the available work about perception and mapping of the environment with range finders on vehicles focuses on the full simultaneous localization and mapping (SLAM) problem. This is mainly due to the fact that the pose of a robot is needed to transform sensor measurements into a global frame. However, if the pose of a robot is given at any time, mapping with known poses can be performed (Thrun et al., 2005). In the photogrammetric community, this process is often also denoted as direct georeferencing on the basis of a given vehicle trajectory. Currently, only little work on the topic of radar-only mapping exists, and it needs to be differentiated between the mapping of obstacles in 2D, 2.5D and 3D maps. 2D maps are often used in the automotive area for obstacle detection tasks (Brisken et al., 2018; Werber et al., 2015). Hillier et al. (2015) used a 95 GHz (λ = 3 mm) radar sensor for a comparison with Riegl and Sick laser scanners on the basis of a digital terrain model (DTM). The radar data processing showed that using only the radar's first-point return decreased point quality, because at glancing angles of incidence a nearer target was reported than in reality (due to the large beamwidth of the radar sensor). To reduce clutter in the data, fixed-value thresholding of the signal intensity was performed. They concluded that the used radar sensor had high measurement uncertainties and erroneous measurements due to the side lobe antenna characteristic. It was also shown that radar provided a rough environment map in which big obstacles could be identified, and that the measurements were robust in harsh environmental conditions. They also pointed out that laser scanners have superior performance compared to radars, but radar technology could additionally be used to indicate whether laser range measurements have been degraded by environmental conditions.

Clarke et al. (2012) used horizontally mounted radars at an operating frequency of 24 GHz (λ = 1.2 cm) to map environments in 2D occupancy grids. Hence, the ground itself was assumed to be traversable and only obstacles like cars and buildings were mapped. For occupancy modeling, the whole radar spectrum was used, i.e. the intensity of the backscattered echo for all range bins between the sensor and the environment for each beam. Occupancy probabilities for the modelled beam were calculated as a function of the intensity in the range bin and of the angle to the main beam axis. The grid sizes were chosen as 0.55 m for a car park scene and 1.1 m for an outdoor field with vegetation. These large grid sizes indicate the high measurement uncertainties, as the used sensor had a nominal azimuth beamwidth of 5.5°. The whole radar spectrum returned by the radar sensor was also used to perform ground segmentation (Reina et al., 2011) and obstacle segmentation (Reina et al., 2015). The recursive mapping approach described in the next section uses a voxel structure to represent the environment. In this context, Hornung et al. (2013) published a voxel-based framework named OctoMap to represent a probabilistic 3D environment map. The voxel map was utilized by Maier et al. (2012) for robot perception in indoor scenarios. They used an Asus Xtion depth camera, which allowed the 3D mapping of dense voxel environments. After segmenting the planar floor, a 2D projection of the local map was used for further tasks like path planning or collision avoidance. The OctoMap-based representation was also utilized for terrain mapping and motion planning of a quadrupedal robot equipped with an Asus Xtion depth camera (Mastalli et al., 2018). The chosen voxel size of 2 cm limited its use for mapping large-scale environments due to high memory requirements. Therefore, a robot-centric map was used which only covered the camera's field of view.

RECURSIVE MAPPING APPROACH
In this section, a new 3D mapping method which takes the radar sensor characteristics into account is introduced; we denote this method as the recursive mapping approach. The main goal of the method is to map the relatively high uncertainty of radar measurements (especially in comparison to lidar measurements) into 3D space. Thus, a radar target does not just produce a single point in 3D space, but occupies a larger volume defined by several voxels. To this end, in a first step, a 3D voxel occupancy map is aggregated over time, where the value of each voxel represents its occupancy probability (Section 3.1). Then, in a second step (Section 3.2), the above-mentioned problems of multipath, faulty assignment of radar targets to the main lobe, and the non-consideration of the beam's incidence angle (Section 3.1.1) are mitigated by deriving the final output of this method, a 2.5D height map (DTM).
To provide a general overview, Figure 3 depicts the main outputs of the method and the evaluation strategy chosen for this paper. Due to the noisy sensor characteristics and the sparse point clouds generated with radar sensors, creating a 3D probabilistic voxel map with a standard 2D inverse sensor model (ISM), which considers the range uncertainty only, leads to insufficient results. Therefore, we use a 3D ISM which better reflects the antenna characteristics by additionally considering the lateral uncertainty of the beam. Our approach calculates occupancy probabilities for all voxels that are located inside a simplified 3D radar beam.
The states of these voxels in the global voxel map represent the perceived environment. For the implementation, the probabilistic map framework OctoMap (Hornung et al., 2013) is used, which is a memory-efficient octree structure enabling the creation of 3D voxel maps. The framework provides methods for the insertion and update of voxel states and for the pruning/expanding of nodes to dynamically adjust the voxel map as needed.

Figure 3: Method overview: the recursive mapping approach (radar-based online reconstruction as voxel map), the terrain determination (evaluation map), the image-based offline reconstruction as dense point cloud, and the evaluation method.

3D Voxel Occupancy Map
For each shot, the occupancy probabilities inside the modelled radar beam (Section 3.1.1) are calculated with the newly introduced ISM (Section 3.1.2). By knowing the pose of the robot, the measurements can be inserted into a global voxel map. Each voxel of the map is treated individually with a static state binary Bayes filter (Elfes, 1989). The voxel state for map cell m_i based on all measurements z_1:t can be described by the occupancy probability p(m_i | z_1:t):

p(m_i | z_1:t) = [ 1 + (1 − p(m_i | z_1:t−1)) / p(m_i | z_1:t−1) · (1 − p(m_i | z_t)) / p(m_i | z_t) · p(m_i) / (1 − p(m_i)) ]^(−1)    (1)

Each voxel can have an occupancy probability p(m_i | z_1:t) between 0 (free) and 1 (occupied), while a probability value of 0.5 means the state of a voxel is unknown. The first factor of Equation 1 describes the voxel occupancy probability from the previous measurements z_1:t−1. The second factor of the multiplication corresponds to the ISM. It represents the occupancy probability for map cell m_i from a measurement at time t and is used to update the cell. Finally, the third factor describes the initial cell state. It is typically defined as p(m_i) = 0.5, which indicates an unknown cell state. To avoid truncation errors when computing floating-point multiplications near the values 0 and 1, log-odds representations (Hornung et al., 2013) can be used to update the probabilistic voxel states. The term log-odds is defined as the natural logarithm of the odds of a probability:

L(x) = ln( p(x) / (1 − p(x)) )    (2)

This leads to the final map cell occupancy model in log-odds notation:

L(m_i | z_1:t) = L(m_i | z_1:t−1) + L(m_i | z_t) − L(m_i)    (3)

The relation between occupancy probabilities and their log-odds representations is shown in Table 1. It can be seen that cells which have strong evidence of occupancy converge to +∞, and cells that are evidently free converge to −∞. If a cell state is unknown, the log-odds value is exactly 0. Because the log-odds values have no defined limits, system inertia can occur, since a static map is assumed. To reduce cell state inertia, the maximum and minimum log-odds values can be limited.
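The recursive log-odds update with clamping can be sketched in a few lines. This is an illustrative implementation, not the paper's code; the clamping thresholds are an assumption (chosen close to OctoMap's defaults of roughly 0.12 and 0.97 in probability space):

```python
import math

def logodds(p):
    """Equation 2: natural logarithm of the odds of a probability."""
    return math.log(p / (1.0 - p))

def inv_logodds(l):
    """Inverse of Equation 2: recover the probability from log-odds."""
    return 1.0 - 1.0 / (1.0 + math.exp(l))

# Clamping thresholds to reduce cell state inertia (assumed values,
# close to OctoMap's defaults of p = 0.12 and p = 0.97).
L_MIN, L_MAX = -2.0, 3.5

def update_cell(L_prev, p_ism, p_prior=0.5):
    """One recursive update of a voxel state (Equation 3).

    L_prev: accumulated log-odds from measurements z_1:t-1.
    p_ism:  ISM output p(m_i | z_t) for the current measurement.
    """
    L_new = L_prev + logodds(p_ism) - logodds(p_prior)
    return max(L_MIN, min(L_MAX, L_new))  # limit min/max log-odds

L = 0.0                  # unknown cell: p = 0.5, log-odds = 0
L = update_cell(L, 0.7)  # measurement indicating occupancy
L = update_cell(L, 0.4)  # measurement indicating free space
p = inv_logodds(L)       # back to an occupancy probability
```

With p_prior = 0.5 the prior term vanishes, so each update simply adds the log-odds of the ISM output; the clamping is what keeps a long-occupied cell from becoming immovable.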
p(m_i | z_1:t)  |  log-odds  |  cell occupancy evidence
0               |  −∞        |  cell state is certainly free
0.5             |  0         |  cell state is unknown
1               |  +∞        |  cell state is certainly occupied

Table 1: Relation between occupancy probabilities and their log-odds representation and the corresponding cell occupancy evidence.
An example of a voxel map generated from real radar measurements is shown in Figure 4: on the top, occupied (red color, p(m_i | z_1:t) > 0.5) and free (green color, p(m_i | z_1:t) < 0.5) voxels are visualized, while on the bottom only occupied voxels are marked. Brighter colors indicate a higher evidence that a voxel is occupied/free.

Voxel-based Beam Modelling
In order to update all individual cells that lie in a radar beam, a simplified beam model is defined. The half-power beamwidth (HPBW or θ) of a radar beam is defined as the angle between the two points of the beam at which the power has dropped by 3 dB relative to the maximum beam peak (National Communications System Technology and Standards Division, 1996). Hence, the power at these points is equal to approximately 1/2 of the maximum beam peak of a radar lobe. Under the assumption of a pencil beam and a uniformly illuminated antenna, the beam can be described by a cone bounded by the HPBW (Reina et al., 2015). This simplification of the radar beam can be made because outside of the HPBW the power density decreases sharply, so that the impact on a measurement is negligible. While mapping the sensor measurement into a global map, the incidence angle is usually not known, provided that there is no additional environment information available from which it could be derived, e.g. a DTM. To overcome this issue, an ideal beam footprint that is orthogonal to the main beam axis is assumed. If the radar sensor is tilted as shown in Figure 5 (left), this assumption can be wrong. In the 2D example, a radar sensor is aimed at the ground. The terrain is marked as a rough brown line, the red lines show the real radar beam footprint on the uneven terrain, and the black lines describe the assumed radar beam model used for determining the voxels inside the beam. For scenarios like in Figure 5 (right), the orthogonally modelled beam footprint is almost identical to the real beam footprint on the terrain.

Figure 5: Examples of real (red color) and assumed (black color) beam footprints: assumed radar beam footprint at a non-orthogonal incidence angle (left, Scenario 1) and at an orthogonal incidence angle (right, Scenario 2).

All voxels that lie inside this modelled beam are determined with a voxel traversing algorithm. The input parameters are the HPBW of the deployed radar sensor, the sensor origin and the radar target position. Figure 6 shows radar beams formed by voxels of different voxel sizes. For the visualization of the modelled beams, a HPBW of 4° and a simulated target range of z = 22 m were chosen. It can be seen that for high voxel resolutions, the algorithm performs as intended and finds the voxels inside the modelled radar beam. However, the larger the voxel size, the more the discretization effects become observable. Hence, for larger voxel sizes, radar beams cannot be represented accurately, but the number of voxels in the beam is reduced drastically. Doubling the voxel size results in a decrease of the number of voxels by a factor of 8. This trade-off can be crucial for the deployment on strongly memory-constrained systems. It can also be observed that at ranges near the sensor origin, the discretization effects have a large impact.
Figure 6: Discretization effects for the voxels in a radar beam at different resolutions ((a) 0.05 m, (b) 0.1 m, (c) 0.2 m, (d) 0.3 m); the displayed exemplary radar beam has a beamwidth of θ = 4° from the sensor position to the target position at a distance of z = 22 m.
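Determining the voxels inside the cone can be sketched as follows. This is a simplified brute-force version (the paper uses a voxel traversing algorithm, whose details are not given here): enumerate candidate voxel centers in the cone's bounding box and keep those whose radial distance from the main beam axis stays below the cone radius at their axial position.

```python
import numpy as np

def voxels_in_beam(origin, target, hpbw_deg, voxel_size):
    """Centers of all voxels inside a cone bounded by the HPBW.

    Brute-force sketch: test every voxel center in the cone's bounding
    box; a proper voxel traversal algorithm would avoid this enumeration.
    """
    origin = np.asarray(origin, float)
    target = np.asarray(target, float)
    axis = target - origin
    rng = np.linalg.norm(axis)          # distance to the radar target
    axis /= rng
    half = np.radians(hpbw_deg) / 2.0   # cone half-angle = HPBW / 2
    r_max = rng * np.tan(half)          # cone radius at the target range

    # Candidate voxel centers in the axis-aligned bounding box.
    lo = np.minimum(origin, target) - r_max
    hi = np.maximum(origin, target) + r_max
    grids = [np.arange(l // voxel_size * voxel_size + voxel_size / 2,
                       h, voxel_size) for l, h in zip(lo, hi)]
    pts = np.stack(np.meshgrid(*grids, indexing="ij"), -1).reshape(-1, 3)

    v = pts - origin
    along = v @ axis                                   # axial distance
    radial = np.linalg.norm(v - np.outer(along, axis), axis=1)
    inside = (along > 0) & (along <= rng) & (radial <= along * np.tan(half))
    return pts[inside]

# Parameters of the Figure 6 example: HPBW 4 deg, target range 22 m.
cells = voxels_in_beam((0, 0, 0), (22, 0, 0), hpbw_deg=4.0, voxel_size=0.2)
```

Since the cone radius grows linearly with the axial distance, only very few voxel centers near the sensor origin fall inside the cone, which is exactly the near-range discretization effect visible in Figure 6.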

Inverse Sensor Model
For the voxels inside the modelled beam, an occupancy probability p(m_i | z_t) (cf. the second factor in Equation 1) is estimated. To take the characteristics of radar sensors into account, a 3D ISM is defined. The ISM consists of three individual components that together form the resulting voxel occupancy probability for a measurement z_t, which is used to update the voxel map: a range component, a lateral component and a scaling component.

Range Component

The range component is based on a model from the literature (2016) where a lidar sensor was used instead of a radar sensor. It is a convolution of an ideal sensor model and a Gaussian distribution in order to model the noise of the distance measurement. Within the space between the radar sensor and the reflecting object, low occupancy values are assigned to indicate free space. Around the radar target, the occupancy probability rises to a maximum value to model occupied space. Thus, the range component describing the probability of occupancy for voxels does not correspond to a probability density function (PDF). Instead, each value of the function is an independent voxel occupancy probability and needs to be treated individually. This results from the assumption of conditional independence of each map cell. The functions describing the range component assign P_f along the main beam axis in front of the target, rise (smoothed by the Gaussian distance uncertainty σ_z) to P_o around the measured distance z, and return to the unknown state P_u behind the target, where

P_u = unknown voxel state
P_f = occ. prob. indicating free space
P_o = occ. prob. indicating occupied space
l = distance along the main beam axis
z = measurement distance to radar target
σ_z = distance measurement uncertainty

An example of the range component is shown in Figure 7.

Lateral Component

The lateral component models the radar antenna characteristics, i.e. the probability of a radar target being located at the beam's main axis is the highest and steadily decreases towards the border of the simplified beam model. The lateral component approximates a normed bivariate Gaussian function, i.e. its maximum value is equal to 1. The proposed function is therefore not a PDF.
The lateral component is defined, with ρ = 0 and µ_i = 0, as

C_lat(d_1, d_2) = exp( −1/2 · ( d_1²/σ_1² + d_2²/σ_2² ) )

where
ρ = correlation coefficient (ρ = 0)
d_i = distance in lateral direction i
µ_i = mean (µ_i = 0)
σ_i = lateral uncertainty in direction i

Scaling Component
The scaling component is used to decrease the probability value the longer the measurement distance z between the radar sensor and the radar target becomes. This component has the effect that targets measured at larger distances do not influence the map update as strongly as targets near the sensor. Thus, the increased angular uncertainty at higher distances due to the large beamwidth is taken into account. The chosen function for the scaling component C_scale is defined by the distance z only. In this work, a linear scaling factor between [0, 1] with the parameters minimum distance z_min and maximum distance z_max is used:

C_scale(z) = (z_max − z) / (z_max − z_min)    for z_min ≤ z ≤ z_max

where
z = measurement distance to radar target
z_min = minimum distance
z_max = maximum distance

Update of Cell States

As described above, the first step is to determine all voxels inside a radar beam between the sensor origin and the measured radar target. This step is displayed in Figure 8 (top), where all determined voxels inside the radar beam are visualized as grey voxels. Afterwards, an occupancy probability is calculated for each voxel with the 3D ISM (Figure 8 (mid)), which contains the three individual components: range component, lateral component and scaling component. In Figure 8 (bottom), an example of a radar beam with occupancy probabilities calculated with the inverse sensor model is shown (red = high occupancy probability, green = low occupancy probability).
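The scaling component and one way the three components can be combined are sketched below. Note the combination rule is an assumption (the paper states only that the components "together form" the occupancy probability): here the deviation from the unknown state 0.5 is attenuated, so that weak lateral or distance evidence leaves the cell near unknown, consistent with the non-influencing behavior described above.

```python
def scaling_component(z, z_min, z_max):
    """Linear factor in [0, 1] that down-weights distant targets:
    1 at z_min and below, 0 at z_max and beyond."""
    if z <= z_min:
        return 1.0
    if z >= z_max:
        return 0.0
    return (z_max - z) / (z_max - z_min)

def ism(p_range, c_lat, c_scale):
    """One plausible combination of the three components (assumption,
    not spelled out in the paper): scale the deviation of the range
    component from the unknown state 0.5, so that c_lat = 0 or
    c_scale = 0 yields exactly the unknown state."""
    return 0.5 + c_scale * c_lat * (p_range - 0.5)

p = ism(p_range=0.9, c_lat=0.8,
        c_scale=scaling_component(5.0, z_min=1.0, z_max=21.0))
```

With this form, a strongly occupied range value (0.9) at one sigma off-axis and mid-range is softened toward 0.5, which is what feeds the log-odds update of Equation 3 per voxel.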

Terrain Determination
The 3D voxel map created with the recursive mapping approach described in the previous section can be used as a basis for the derivation of further maps. For evaluation purposes, a 2.5D height map (DTM) is derived here. Radar targets measured due to multipath propagation can lead to voxels being added below the terrain. Therefore, the presence of ghost targets has to be considered. To do so, all vertical voxels inside an xy cell (on the horizontal plane) are analyzed. The height of a cell in the xy plane is derived from the voxels with the same x and y coordinates, i.e. a column of voxels. To take erroneous measurements into account, the occupied voxels are grouped into clusters. A cluster is defined as a series of at least two occupied voxels which are not separated by unknown or free space. Clusters far below the average height of nearby clusters can then be discarded. Voxels of the voxel map that are believed to be occupied contain occupancy probabilities above 0.5. To favor higher occupancy information over values near the unknown voxel state, the height information (z-coordinate of a voxel center point) of the individual voxels is weighted with its occupancy value in a weighted arithmetic mean. To achieve a non-influencing weight w_i for values near the unknown state, the occupancy value for voxel i is scaled between [0, 1] by a linear scaler as in Equation 9, with the voxel occupancy probability p_i, p_o,min = 0.5 and p_o,max = 1.0:

w_i = (p_i − p_o,min) / (p_o,max − p_o,min)    (9)

where
p_i = voxel occupancy probability
p_o,min / p_o,max = minimum/maximum occupancy probability that indicates occupied space

For each analyzed voxel cluster with n voxels, the weighted arithmetic mean h̄ of the voxel center point heights is calculated with Equation 10. With this method, the cell height values are interpolated between the voxel center point heights h_i based on the occupancy probabilities.
h̄ = ( Σ_{i=1..n} w_i · h_i ) / ( Σ_{i=1..n} w_i )    (10)

where
h_i = height of voxel center point
w_i = weight between [0, 1]

An example of a voxel map and the corresponding height map is shown in Figure 9.
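The cell height computation of Equations 9 and 10 can be sketched directly (the voxel values below are illustrative, not from the paper):

```python
def weight(p, p_min=0.5, p_max=1.0):
    """Equation 9: linear scaler mapping an occupancy probability in
    [p_min, p_max] to a weight in [0, 1]; values near the unknown
    state (0.5) get a non-influencing weight near 0."""
    return (p - p_min) / (p_max - p_min)

def cluster_height(voxels):
    """Equation 10: weighted arithmetic mean of the voxel center point
    heights h_i of one occupied-voxel cluster, weighted by the scaled
    occupancy probabilities w_i."""
    ws = [weight(p) for _, p in voxels]
    hs = [h for h, _ in voxels]
    return sum(w * h for w, h in zip(ws, hs)) / sum(ws)

# One column cluster: (voxel center height in m, occupancy probability).
h_cell = cluster_height([(1.1, 0.6), (1.3, 0.9)])
```

Here the voxel at 1.3 m carries four times the weight of the weakly occupied one at 1.1 m, so the cell height lands close to the high-evidence voxel, which is the interpolation behavior described above.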

EXPERIMENTS
In order to evaluate the recursive mapping approach, radar measurements were taken to generate a height map (DTM). Additionally, for evaluation purposes, a reference DTM was generated by applying a standard photogrammetric workflow.
The sensor measurements were acquired with an off-road truck. The truck was equipped with an indurad 360° 2D ScanningDynamicRadar iSDR radar sensor, a front-looking camera and an RTK-GNSS receiver. The sensor setup on the off-road truck is shown in Figure 10. The radar sensor used in this experiment is a 2D mechanically rotating profile scanner. It delivers a single point (echo) for each emitted radar wave, i.e. the full radar spectra cannot be accessed. These echoes correspond to the highest registered intensities for each angular step. The field test ground was an off-road area, where the ground mostly consisted of calcareous soil with large boulders widely scattered throughout the field. The test area used in this experiment has an approximate size of 1000 m². It consists mainly of a road surrounded by soil ridges on the sides. Due to earlier rain showers, the concentration of water in the soil was high, which led to muddy areas and a significant number of puddles. The relatively large wavelength of the radar waves led to specular reflections on the puddles, resulting in no-data areas in the final georeferenced point cloud.

Figure 10: Sensor setup: off-road truck equipped with an indurad 360° 2D ScanningDynamicRadar iSDR radar sensor, an RTK-GNSS receiver and an RGB camera (for the generation of a reference model).
Together with the radar measurements, an image sequence was acquired by the front-looking camera. It served two purposes: (a) to estimate the trajectory of the radar sensor (which enables mapping with known poses), and (b) to generate a reference DTM in order to evaluate the outputs of our method. Both the trajectory and the image-based DTM were generated by applying a standard photogrammetric workflow. The reconstruction is obtained by a bundle adjustment with subsequent dense image matching.
In the bundle adjustment, the accurate RTK-GNSS information was used as a direct observation of the camera positions. The exterior orientations of the images (poses) correspond to the trajectory of the camera on the vehicle. By adding the relative orientation between the camera and the radar sensor to this trajectory, the trajectory of the radar sensor was determined. The main benefit of this approach is that the measurements of both sensors (camera and radar) share the same trajectory; thus, systematic errors of the 3D models due to an erroneous trajectory can be excluded. Therefore, all measurement error metrics are relative ones, since both the radar sensor and the camera are mounted on the same vehicle.
By applying the first step of our method, a 3D voxel map with a voxel size of 0.2 m was created. In the second step, a height map was derived with the same grid size. Figure 11 shows the main results of our evaluation. The top left photo shows the reconstructed scene. The height map generated by applying the recursive mapping approach to this scene is shown in the middle-left image. As mentioned above, the data gaps mainly correspond to the puddles where specular reflection occurred and thus no echo was received. The two upper images on the right side show the image-based reference model as a point cloud and as a DTM. The lower part of the figure shows a comparison of the radar DTM and the reference DTM as a height difference model. The histogram shows a mean difference of −0.103 m and a standard deviation of 0.160 m. It can be seen that the radar DTM corresponds well to the reference DTM on the road. However, larger differences can be observed at the surrounding soil ridges. This effect stems mainly from the relatively large beamwidth of the radar sensor, which prevents a sharp reconstruction of terrain edges. Additionally, some errors stem from the discretization of space through the voxel structure, which does not allow small-scale terrain structures to be represented.
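The height difference model and its statistics can be computed cell-wise as sketched below. The tiny arrays are synthetic illustration data (not the measured DTMs); cells that are no-data in either model, such as the puddle gaps, are excluded from the statistics:

```python
import numpy as np

def dtm_difference_stats(radar_dtm, reference_dtm):
    """Cell-wise height difference model (radar minus reference) and its
    mean/standard deviation; NaN marks no-data cells (e.g. puddles where
    no radar echo was received) and is ignored in the statistics."""
    diff = radar_dtm - reference_dtm
    valid = ~np.isnan(diff)
    return diff, float(np.mean(diff[valid])), float(np.std(diff[valid]))

# Synthetic 2x2 example grids (meters); np.nan = no-data cell.
radar = np.array([[1.0, np.nan],
                  [0.9, 1.2]])
ref = np.array([[1.1, 1.0],
                [1.0, 1.3]])
diff, mean, std = dtm_difference_stats(radar, ref)
```

Masking the no-data cells before computing the histogram statistics matters: treating gaps as zero height would bias the mean difference, which in the real evaluation is −0.103 m with a standard deviation of 0.160 m.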

SUMMARY AND OUTLOOK
This paper presented an online mapping approach for terrain mapping with radar sensors. Radar sensors typically have beamwidths of several degrees, which leads to low angular precision. A 3D probabilistic voxel map consisting of occupied and free voxels is used for the insertion of radar measurements, where each voxel is treated individually with a static state binary Bayes filter. For every measurement, a cone consisting of voxels is formed to model a simplified radar beam. For each of these voxels, an occupancy probability is calculated by a 3D inverse sensor model, which takes the radar antenna characteristics into account. Finally, the voxel occupancy states are added to a global voxel map. Due to specularity effects, where even relatively rough surfaces can appear like a mirror to a radar wave, ghost targets below the terrain have to be considered. Voxel clusters are defined to exclude voxel concentrations far below the surrounding voxel clusters. Afterwards, the occupancy probabilities of the occupied voxels are used to calculate the terrain height. For the method evaluation, a manually steered off-road truck, equipped with a camera, a radar sensor and an RTK-GNSS receiver, was used to conduct test measurements. The ground truth was created from an image-based photogrammetric reconstruction of the terrain. The generated height map was then compared to the reference map by means of a difference grid model. It was shown that promising results for 3D terrain mapping can be achieved with scanning radar sensors; however, small structures still pose a problem due to the larger beamwidth in comparison to lidar sensors. In future work, the camera-derived and the radar-derived results could additionally be compared against a third observation, e.g. from a stable laser scanner. With this, absolute error measures could be derived instead of relative ones.