PERFORMANCE EVALUATION OF A MOBILE MAPPING APPLICATION USING SMARTPHONES AND AUGMENTED REALITY FRAMEWORKS

In this paper, we present a performance evaluation of our smartphone-based mobile mapping application, which builds on an augmented reality (AR) framework, in demanding outdoor environments. The implementation runs on Android and iOS devices and demonstrates the great potential of smartphone-based 3D mobile mapping. The application includes several functionalities such as device tracking, coordinate and distance measurement, as well as the capture of georeferenced imagery. We evaluated our prototype system in four outdoor campaigns by comparing points measured with the tracked device against ground control points. The campaigns consisted of open and closed-loop trajectories over different ground surfaces such as grass, concrete and gravel; two campaigns included a stairway passage. Our results show that the absolute 3D accuracy of device tracking with a state-of-the-art AR framework on a standard smartphone is around 1% of the travelled distance and that the local 3D accuracy reaches sub-decimetre level.


INTRODUCTION
The demand for capturing accurate 3D information is growing in a wide variety of indoor and outdoor disciplines such as BIM (Building Information Modelling), facility management or (indoor) navigation. Until recently, environment mapping was a demanding task, requiring highly specialized multi-sensor systems such as terrestrial or mobile laser scanners. Then, new high-quality portable mobile mapping systems (MMS) were introduced, such as the BIMAGE backpack (Blaser et al., 2017). With such a backpack, 3D point clouds and highly detailed 3D image spaces (Nebiker et al., 2015) of construction sites, tunnels, narrow streets or other areas inaccessible to a car-based MMS can be captured. However, when it comes to keeping the data up to date, using such a high-end MMS would be too costly, and the system would be restricted to a small group of experts. Hence, there should be a simple and cost-effective solution allowing facility or infrastructure managers to keep the digital twin of their site up to date.
In recent years, the computing capacity of mobile devices has increased rapidly, enabling ever more computationally intensive applications. A typical example is Augmented Reality (AR) applications, which are very demanding with respect to real-time scene tracking and augmentation, tasks which were not possible on mobile devices a decade ago. Since Niantic released Pokémon Go in 2016, the number of AR applications has grown rapidly. Although Pokémon Go was a geospatial AR application, most common AR applications place virtual 3D objects in an arbitrary scene using either a smartphone or AR glasses. These 3D objects can be as simple as a toy figure or as complex as a scaled 3D city model. Most often, these AR applications are restricted to a single room or a small area.
With the introduction of the AR frameworks ARCore (Google, 2019) and ARKit (Apple Developers, 2019), developing AR applications has been greatly simplified. These AR frameworks support device motion tracking and scene understanding. Visually distinct features from the camera image, called feature points, combined with inertial measurements from the device's IMU are used to calculate the device's pose relative to the environment. Clusters of feature points that lie on horizontal or vertical surfaces such as tables or walls are detected as planar surfaces. Both ARCore and ARKit require mobile devices with calibrated cameras, and the generated point cloud is at world scale.
At the Institute of Geomatics at the FHNW, we are developing a new AR mapping application, which combines the advantages of the local tracking of an AR framework with referencing the device to a reference image dataset georeferenced in a geodetic reference system (Nebiker et al., 2015). As a first step, we developed an application which is able to track the device's motion, measure points, localize itself in a specific reference system and capture photographs with absolute orientation (Hasler et al., 2019). Once it is possible to align the captured photographs to a georeferenced image database, this application will be ready for various mapping tasks with high global accuracy. In our previous work, we presented our application and conducted accuracy experiments in indoor environments. Here, we expand these accuracy experiments to outdoor areas.
Our paper is structured as follows: in chapter 2, we discuss the related work. In chapter 3, we describe our development and architecture, and in chapters 4 and 5, we outline our accuracy experiments and their results. Finally, in chapter 6, we draw conclusions and give an outlook on future developments.

RELATED WORK
There are different types of mapping systems. On the one hand, there are static mapping systems like terrestrial laser scanners, which scan the environment with high precision at the cost of a time-consuming data collection process. On the other hand, mobile mapping systems (MMS) are becoming more popular since the data collection can be performed while driving or walking through the environment. Different MMS have been proposed, which can be categorized either by the platform type or by the sensors used. Generally, there are backpack-based systems, handheld systems and trolley-based systems. The Würzburg backpack (Lauterbach et al., 2015), for example, consists of a 2D laser profiler and a Riegl VZ-400 laser scanner. The BIMAGE research backpack system by FHNW (Blaser et al., 2018) combines two Velodyne VLP-16 scanners with a multi-head panorama camera. Among the commercially available backpacks is the Leica Pegasus:Backpack. It uses two Velodyne VLP-16 scanners for the SLAM algorithm and has five divergent rear-facing cameras which are primarily used for visualisation and digitising (Leica Geosystems AG, 2017). Such high-end backpack systems generate accurate 3D information, but they are typically expensive and relatively heavy to carry. Handheld LiDAR-based MMS like the Zebedee (Bosse et al., 2012) are easier to carry, even in longer mapping campaigns. The Zebedee combines a lightweight laser scanner and an IMU to generate a 3D point cloud. Numerous comparable commercial products such as Zeb Revo from GeoSLAM or Stencil from Kaarta (Zhang, Singh, 2017) are available. Kaarta's Stencil uses a Velodyne VLP-16 scanner together with an IMU for point cloud generation. A wheel-based MMS such as the NavVis 3D Mapping Trolley has multiple cameras and four laser scanners (NavVis, 2019). Trolleys like this can be equipped with many sensors but are restricted to flat and obstacle-free ground surfaces.
These types of MMS can generate dense and accurate 3D information, but geo-referencing is either done in post-processing or by measuring ground control points (Lehtola et al., 2017).
For measuring and mapping with consumer devices, there are already several AR mapping tools available. Lenherr and Blaser (2020) evaluated some of them in terms of functionality and accuracy. Smartphone applications such as CamToPlan or Magicplan (Magicplan, 2019) allow users to map and create floorplans. However, there are only a few AR applications which support absolute geo-referencing. One approach to overcome the absolute pose estimation problem is attaching a highly accurate GNSS receiver to the smartphone. Schall et al. (2009) combine an RTK receiver with an ultra-mobile PC and fuse it with inertial measurements and a visual orientation tracker. One commercially available product is Trimble SiteVision (Trimble, 2019). This system uses a GNSS receiver attached to a smartphone to achieve centimetre-level position accuracy. The orientation of the device is then calculated from the travelled trajectory combined with visual tracking. Another way to solve the absolute geo-referencing problem is presented by Christen et al. (2020). They compare a local orthophoto produced by the mobile application with a cadastral map or a global orthophoto as a reference. These approaches rely either on a GNSS signal or on an available orthophoto and therefore work only outdoors.
Precise automatic absolute geo-referencing, regardless of the environment, is of great interest for numerous applications. Visual localization is a promising automatic geo-referencing approach with intense research activity. Sattler et al. (2018) distinguish the following visual localization categories: 3D structure-based, 2D image-based, sequence-based and learning-based localization. In our previous work, we introduced a method for visual localization and pose estimation of a single image with respect to georeferenced RGB-D images (Nebiker et al., 2015; Rettenmund et al., 2018). This method works in indoor and outdoor environments and does not require ground control points. However, its success depends on ideal conditions such as up-to-date reference images, similar lighting, seasonality and a similar viewpoint. Since then, new and more robust methods have evolved. DenseSFM proposes a Structure from Motion (SfM) pipeline that uses dense CNN features with keypoint relocalization (Widya et al., 2018). Sarlin et al. (2019) also use learned descriptors to improve localization robustness across large variations of appearance. These approaches are more robust than using classical local features like SIFT and its variants and have the potential to solve the absolute image orientation problem. However, they are computationally heavy, and at the time of writing, only a few of them run in real time. Therefore, we have not yet implemented one of these approaches in our AR mapping framework.

OUR ARCHITECTURE
In this chapter we briefly describe our AR application. Further information can be found in Hasler et al. (2019). The main goals of our development included: a simple operation on a broad range of devices, a compatibility with the two most prominent mobile operating systems Android and iOS, and a real-time capability. The minimal functionality should include the possibility to interactively localize the device in an absolute reference frame using control points, to perform point and distance measurements and to capture georeferenced images.

Underlying Frameworks
Our development is based on the widely used game engine Unity. Unity provides a large number of packages which can be included in a project, and Unity-based applications can be deployed to various operating systems. Our project is developed with Unity's AR Foundation package, which includes built-in multi-platform support for AR applications (Unity, 2018). This makes it possible to develop an application which can run either Google's ARCore or Apple's ARKit, depending on the user's device and operating system.

Device Tracking:
The foundation of our application is device tracking. The underlying AR frameworks support motion tracking by fusing multiple sensors such as accelerometer, gyroscope, magnetometer and camera. Visually distinct feature points from the camera image combined with inertial measurements are used to estimate the device's pose relative to the environment. Furthermore, the framework estimates horizontal and vertical planes from detected feature points, which are mostly located on walls and on the floor.

Implemented Functionality
Once the AR app is started, device tracking starts immediately. The origin of the local AR reference frame coincides with the location where the app was initialised, with the heading of the device defining the direction of the forward axis, the up axis pointing vertically and the right axis perpendicular to both.
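The construction of such a local frame at start-up can be sketched as follows. This is an illustrative sketch only, not the frameworks' actual API; it assumes a right-handed, Z-up convention (ARCore and ARKit use their own internal conventions), with the forward axis obtained by projecting the initial device heading onto the horizontal plane.

```python
import numpy as np

def local_ar_frame(heading_vector, gravity_up=(0.0, 0.0, 1.0)):
    """Hypothetical sketch of the local AR frame axes at app start.

    The forward axis follows the device heading projected onto the
    horizontal plane, up is vertical (from gravity), and right
    completes the orthonormal triad.
    """
    up = np.asarray(gravity_up, dtype=float)
    up /= np.linalg.norm(up)
    h = np.asarray(heading_vector, dtype=float)
    # Project the heading onto the horizontal plane and normalise.
    forward = h - np.dot(h, up) * up
    forward /= np.linalg.norm(forward)
    # Right axis perpendicular to both forward and up.
    right = np.cross(forward, up)
    return forward, up, right
```

A device initially pointing north and slightly tilted upwards, for example, still yields a horizontal forward axis and an east-pointing right axis.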
Since either Google's ARCore or Apple's ARKit only run on calibrated devices and multiple sensors are fused, the AR reference frame is automatically at world scale.

Measurement Functionality:
After the AR app has been initialised, local measurements can be taken immediately. The app supports two measurement modes: point and 3D distance measurements (Figure 1). Other modes such as area, volumetric or slope measurements could additionally be implemented. Both point and distance measurements can be made either directly on individual feature points from the device tracking or on the detected planar surfaces. Measuring on detected planar surfaces has the advantage that measurements can be carried out continuously, even when a surface lacks visual features.
To execute a measurement, a single tap on the screen at the desired location is required. Depending on the measuring mode, a pop-up window with the local or global coordinates or the 3D distance appears. A 2D distance measurement mode (top down, floorplan) could be implemented additionally, if needed. The coordinates of measured points can be saved to a text file with local and, if available, global coordinates for further processing.
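Geometrically, measuring on a detected plane amounts to intersecting the camera ray through the tapped pixel with that plane. The following is a minimal sketch of this idea under stated assumptions; it is not the AR Foundation raycast API (which provides this functionality through its raycast manager), and the function names are our own.

```python
import numpy as np

def measure_on_plane(ray_origin, ray_dir, plane_point, plane_normal):
    """Intersect the tap ray with a detected plane.

    ray_origin, ray_dir: camera centre and viewing ray through the
    tapped pixel (world coordinates). plane_point, plane_normal
    define the detected planar surface. Returns the 3D measurement,
    or None if the ray is (near) parallel to the plane or points
    away from it.
    """
    o = np.asarray(ray_origin, float)
    d = np.asarray(ray_dir, float)
    p = np.asarray(plane_point, float)
    n = np.asarray(plane_normal, float)
    denom = np.dot(n, d)
    if abs(denom) < 1e-9:      # ray parallel to the plane
        return None
    t = np.dot(n, p - o) / denom
    if t < 0:                  # intersection behind the camera
        return None
    return o + t * d

def distance_3d(p1, p2):
    """3D distance between two measured points."""
    return float(np.linalg.norm(np.asarray(p1, float) - np.asarray(p2, float)))
```

A tap towards a horizontal ground plane from a hand-held device about 1.5 m above the floor, for instance, returns a point on that plane regardless of whether any feature point happens to lie at the tapped pixel.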

Global Referencing:
For absolute geo-referencing of captured images and for conducting measurements in a global reference frame, the device needs to be related to a reference frame. For a first version, we realized a 6 degree of freedom (6DoF) transformation using ground control points (GCPs) in order to transfer the local scene into a global reference frame.
To start the referencing process, a list of GCPs can be imported from a file. After a successful import, at least three points need to be measured with the AR app and referenced to a GCP by choosing it from a dropdown list. Again, measurements can be conducted directly on feature points or on detected planar surfaces. Then, the 6DoF transformation is calculated according to Umeyama (1991) and the residuals are displayed (Figure 2 left). The transformation can easily be evaluated based on the residuals and dynamically adjusted by additional point measurements or by the exclusion of points. Once the transformation has been verified, any object in the global reference system can be augmented in the camera feed of the app. For verification purposes, the app overlays the GCPs into the camera feed.
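The core of this referencing step can be sketched as follows: a closed-form rigid-body estimate in the spirit of Umeyama (1991), with the scale fixed to 1 since the AR frame is already at world scale. This is an illustrative sketch, not the app's actual implementation; the function name is our own.

```python
import numpy as np

def rigid_transform_umeyama(local_pts, global_pts):
    """Estimate the 6DoF transform (R, t) mapping local AR points to
    GCP coordinates, following Umeyama (1991) with the scale fixed
    to 1. local_pts and global_pts are (N, 3) arrays of
    corresponding points, N >= 3 and not collinear.
    """
    A = np.asarray(local_pts, float)
    B = np.asarray(global_pts, float)
    mu_a, mu_b = A.mean(axis=0), B.mean(axis=0)
    # Cross-covariance of the centred point sets.
    H = (B - mu_b).T @ (A - mu_a) / len(A)
    U, _, Vt = np.linalg.svd(H)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # guard against a reflection solution
    R = U @ S @ Vt
    t = mu_b - R @ mu_a
    # Residuals at the control points, used to assess the fit.
    residuals = B - (A @ R.T + t)
    return R, t, residuals
```

The residuals returned here correspond to the per-point values the app displays for evaluating and adjusting the transformation.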

Photo Capture with Pose:
Finally, it is possible to capture geo-referenced images. Every time a user takes a photo, the app stores the local pose and, if available, the global pose (position and orientation). With the app, it is also possible to upload the photo with its pose to a web service. For verification purposes, the captured photograph can be displayed in the AR scene at its real location and with its original pose (Figure 2 right).

PERFORMANCE EVALUATION
We carried out accuracy evaluations based on 3D point measurements in order to determine the performance and stability of motion tracking and the subsequent measuring and mapping accuracy in outdoor environments.
In the following experiments, we investigate the deviations along four trajectories in the park of the FHNW campus in Muttenz near Basel (CMU). In the first case, the trajectory forms a loop with identical start and destination points and in the other three cases, the trajectory describes a route with different start and destination points. We then compare the results with the achieved accuracies in the indoor environment. In all the following experiments, we used common high-end smartphones such as Google Pixel 2 and Samsung Galaxy S9.

Test Site
Our test site is located in the park of the new FHNW Campus in Muttenz/Basel (Figure 3). The site has an extension of roughly 70 x 150 meters and consists of different ground surfaces such as gravel, bitumen and grass. Most of the terrain is almost flat and at the same altitude, except for an embankment with a stairway in the east of the park. The height difference of the stairway is roughly 5 meters.
As a reference, we established 46 ground control points (GCPs), which we measured with a Leica GS14 GNSS rover in RTK mode. The overall accuracy of the reference system is < 3 cm. The GCPs are well-defined natural points such as manhole covers, traffic markings and intersections between different ground surfaces.

Trajectories
On this site we conducted experiments with four different trajectories with lengths between 114 and 350 meters (Figure 4).

Full loop:
In the first of the four mapping experiments, we measured a trajectory which forms a closed loop. In this campaign, we measured 20 points in total. At the beginning, we measured five GCPs, and we additionally measured 15 check points (CPs) along the loop. The trajectory length is about 350 meters and first covers concrete surfaces, then gravel and grass, and finally concrete again. The height difference of this track is less than 1 meter. The transformation parameters to the global reference frame were derived using the five GCP measurements at the beginning of the trajectory. Once the transformation parameters were estimated, all measured points were transformed into the global reference frame and the residuals to the GCPs were calculated.

Stairs Down:
In the second mapping experiment, we measured a 114-meter trajectory, which follows a stairway downwards with different start and destination positions. After measuring three points at the beginning, we measured four points along the stairway (one at the top, two in the middle and one at the bottom). We then measured eight additional points below the stairs before finally measuring five GCPs. In total, we measured 20 points. We again used the GCPs to calculate the transformation from the local to the global reference frame and then transformed all points with these parameters. Finally, the residuals were calculated for all measured points. In contrast to the other campaigns, we measured the GCPs at the end, at the bottom of the stairs.

Stairs Up and Down:
The third mapping experiment is similar to the second experiment but in reverse order. We started at the bottom of the stairs, where we measured 13 GCPs. We then measured seven points along and on top of the stairway before descending again and measuring 10 additional points while heading towards the main building. This trajectory is in total 260 meters long with 30 measured points.

Half loop:
The last experiment starts at the same position as the first and ends at the bottom of the stairway. We measured five GCPs at the beginning and then 12 CPs along the 220-meter trajectory.

RESULTS
In this chapter, we show and discuss the results of our four trajectory investigations and compare them to our earlier indoor accuracy investigation (Hasler et al., 2019). First, we examine the tracking quality with statistical analyses of 3D measurements along the four different trajectories, one closed and the other three open. Second, we compare the tracking quality between outdoor and indoor environments.

Trajectories
The accuracy of the trajectories was assessed by comparing the measured check point coordinates along the mapping paths with their reference coordinates. The RMSE in both the horizontal and vertical directions are surprisingly small considering the travelled distances in the range of 114 meters to 350 meters. Table 2 indicates the maximum error in each direction with respect to the travelled distance (both absolute in meters and relative to the total trajectory length). The maximum horizontal error of all campaigns is 3.9 meters after 258 meters along the trajectory in campaign number 3. The maximum vertical error is 2.2 meters after 282 meters in campaign number 1. As can be seen in Figure 5, drifts increase with the distance travelled from the start location. The 2D error of the first trajectory (full loop) first increases, but after 136 meters it starts to decrease again until around 240 meters and then increases once more. This phenomenon is only present in the first trajectory. All errors of the second trajectory seem to decrease; however, the GCPs of this trajectory were measured at the end, and therefore the errors decrease the closer the points are to the end of the trajectory. In the third trajectory, the X component of the position error starts to increase rapidly after roughly 180 meters. This increase happened after the stairway was passed upwards and downwards, and the device probably started to drift in the direction of walking.
The vertical error in both the second and third trajectory is small considering the vertical extent of both trajectories. The maximum vertical error of the second trajectory is only 40 cm, and for the third trajectory it is 90 cm at the very end but only 32 cm after going up and down the stairway. The maximum vertical error at the top of the stairway after going up was 18 cm (Trajectory 3).
The error in the X dimension of the fourth trajectory steadily increases along the travelled path, whereas the Y error remains low at the beginning and then starts to increase rapidly. The vertical error is again smaller than both errors in the horizontal dimension.
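The error statistics discussed above can be computed as follows. This is a sketch under stated assumptions, not the authors' evaluation code: check-point residuals (measured minus reference coordinates) and the cumulative travelled distance at each check point are assumed given.

```python
import numpy as np

def drift_statistics(residuals, travelled):
    """Per-trajectory error statistics of the kind reported in Table 2.

    residuals: (N, 3) array of differences (measured - reference) in
    metres at the check points; travelled: (N,) cumulative distance
    along the trajectory at each check point, in metres. Returns the
    horizontal and vertical RMSE and the maximum errors relative to
    the total trajectory length (in percent).
    """
    r = np.asarray(residuals, float)
    d = np.asarray(travelled, float)
    err_2d = np.linalg.norm(r[:, :2], axis=1)  # horizontal error
    err_v = np.abs(r[:, 2])                    # vertical error
    total = d[-1]                              # total trajectory length
    return {
        "rmse_2d": float(np.sqrt(np.mean(err_2d ** 2))),
        "rmse_v": float(np.sqrt(np.mean(err_v ** 2))),
        "max_2d_percent": float(100.0 * err_2d.max() / total),
        "max_v_percent": float(100.0 * err_v.max() / total),
    }
```

For example, a maximum horizontal error of 3.9 m on a roughly 260-meter trajectory corresponds to about 1.5% of the travelled distance, matching the order of magnitude reported for campaign 3.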

Comparison with indoor results
When comparing our outdoor results to the indoor mapping campaign (Hasler et al., 2019), the 3D RMSE of the GCP measurements and of the overall measurements have increased (Table 3). Only the second outdoor campaign (stairway down) achieved similar accuracies. The maximum horizontal error in the indoor campaign was 1.6% of the total distance. In our outdoor experiments, we achieved a maximum horizontal error between 0.2% and 1.5% of the total distance.
In contrast to the indoor mapping campaign, neither a rapid vertical shift nor a loop closure was detected in any of the four outdoor campaigns. In the first outdoor campaign, the AR device did not recognize its final location as the initial location and therefore did not perform a relocalization. The reason that no rapid vertical shift occurred is probably that there were no repetitive structures along the trajectory. Even the gravel surface provided enough unique features.

CONCLUSION AND OUTLOOK
We carried out performance investigations in a challenging outdoor environment without man-made structures with four different campaigns. The four trajectories covered different surface types and even included vertical displacements along a stairway. In our mapping test campaigns, we showed that AR tools are surprisingly accurate, with a maximum 3D error in the full-loop campaign of 2.89 m or 0.6% over a distance of 350 meters in a very demanding environment (Figure 5, left). The analysis of the difference vectors in Figure 6 and the RMSE of the GCP measurements indicate that the local accuracy is even higher. All this shows that AR tools have a huge potential for accurately tracking mobile devices in outdoor environments without specific and expensive hardware.
In summary, we demonstrated that AR frameworks are an interesting alternative to costly high-end mobile mapping systems in certain application areas.
In the future, AR mapping apps could provide a low-cost frontend to an ecosystem of image-based mobile mapping and visual localization services. As demonstrated in this paper, consumer devices could be used for carrying out relatively accurate 3D measurements and for updating existing image-based infrastructure services, e.g. by providing accurately georeferenced fault or change reports to facility managers.
Future work includes the combination of the high local accuracy of an AR tool with GNSS or a visual positioning service as an absolute positioning system.