EVALUATION AND OPTIMISATION OF CROWD-BASED COLLECTION OF TREES FROM 3D POINT CLOUDS

The term "Crowdsourcing" goes back to Jeff Howe (Howe, 2006) and is a neologism formed from the words "crowd" and "outsourcing". Unlike outsourcing, where companies outsource certain tasks to known third parties, crowdsourcing outsources tasks to unknown workers (crowdworkers) on the Internet. This allows companies to access large numbers of workers who would otherwise not be available. In this paper, we discuss an approach for the crowd-based collection of trees from 3D point clouds by means of minimum bounding cylinders. We demonstrate the web interface used and compare the results with reference data. To improve the quality of the results, we collect the data not only once but multiple times. This enables us to implement a so-called "Wisdom of the Crowd" approach in which we can automatically identify outliers and derive integrated cylinders. We show that this approach significantly increases the quality of the results.


INTRODUCTION
Quality control and improvement is a major challenge of crowd-based data collection (Liu et al., 2018; Leibovici et al., 2017) because data provided by crowdworkers can be erroneous (Vaughan, 2017). The crowd is composed of people with unknown and very diverse abilities, skills, interests, personal objectives, and technological resources (Daniel et al., 2018). Most crowdworkers are unfamiliar with the standards for spatial data collection, and they may feel no need to follow such rules (Hashemi and Abbaspous, 2015).
The more complex the crowdsourcing task, the more heterogeneous the results of different crowdworkers will be. Another problem can be dishonest workers who try to maximize their income by submitting as many tasks as possible, even if they did not complete the tasks or completed them only sloppily (Hirth et al., 2013). Additionally, spam and adversarial workers may exist who could be very harmful to the quality of the collected data (Zhang et al., 2016). Therefore, one of the fundamental challenges in crowdsourcing is inferring the ground truth from noisy data collected by non-experts (Zhou et al., 2012). Quality control and quality improvement is a hot research topic in crowdsourcing (Zhang et al., 2016).
We can distinguish two different approaches for the collection of geospatial data by the crowd: collection by volunteers without payment (Volunteered Geographical Information, VGI; Goodchild, 2007) and collection by paid crowdworkers. Crowdsourcing projects that are based on the work of unpaid volunteers need an active community that has an intrinsic motivation for collaboration. The main reasons why users voluntarily collect geospatial data are that their contributions are made freely available and that other users benefit from the digital maps provided (Budhathoki and Haythornthwaite, 2012). If this is not the case, other incentives, such as monetary payments, must be provided.
The most important VGI project is OpenStreetMap (OSM). OSM quality has been the subject of a considerable amount of research, e.g. (Fonte et al., 2017) or (Degrossi et al., 2018). The basic quality control concept of OSM is that users verify the data of other users, which leads to increasing quality over time (Barron et al., 2014). However, in particular areas the quality may also decrease (Fonte et al., 2017). OSM has no centralized quality control (Degrossi et al., 2018), whereas in paid crowdsourcing quality control is the responsibility of the employer. Fonte et al. (2017) proposed several quality indicators to handle the specific nature of VGI data, such as demographic and socio-economic indicators. These are important when users collect data in the areas in which they live, which is mostly not the case in paid crowdsourcing. Data collection in OSM is an open-ended process, whereas campaigns in paid crowdsourcing are usually limited in time.
Generally, there are two different approaches to control and improve the quality of paid crowdsourced data (Zhang et al. 2016): "Quality Control on Task Designing" and "Quality Improvement after Data Collection". The first approach tries to guide the crowdworkers to provide high quality data. Many methods exist for this, such as qualification tests, reputation systems, task assignment, task and workflow optimization, training, real-time quality assurance, quality checkpoints or incentive payment mechanisms. A comprehensive overview of these techniques can be found in (Daniel et al., 2018).
In the second approach, additional procedures are used to improve the quality after the data has been collected. A common idea is the repeated data collection by different crowdworkers. After data collection, mechanisms are used to filter out noisy data and to infer the truth.
The process of estimating the truth from multiple collected data is called "ground truth inference". A ground truth inference algorithm takes multiple noisy collected data as input and generates the estimated truth as output. If, for example, labels are collected multiple times by different crowdworkers, a straightforward approach is to use the most common label. The employer duplicates the task and n different workers complete it. The result that has the majority is assumed to be correct (Hirth et al., 2013). Salk et al. (2016) examined the use of majority classification for the identification of croplands in remote sensing images. They defined a binary crowdsourcing task, evaluated the accuracy of the results, and compared them with expert validations. Hecht et al. (2017) realized an intrinsic quality control of semantic data to evaluate the results of crowdsourced classifications of building footprint data. In order to reduce noise, they collected the data multiple times and aggregated the results with majority voting. Zhou et al. (2015) proposed to use the information measure minimax conditional entropy. They assumed that labels are generated by a probability distribution. By maximizing the entropy of this distribution, they can estimate true labels from a set of noisy labels.
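The majority-voting idea described above can be illustrated with a minimal sketch. The function name `majority_vote`, the tie handling and the example labels are assumptions made for illustration only; they are not taken from any of the cited works.

```python
from collections import Counter

def majority_vote(labels):
    """Return (label, share) for the most common label.

    On a tie between the most frequent labels, the label is None
    (hypothetical convention: the task stays undecided).
    """
    if not labels:
        raise ValueError("at least one label is required")
    ranked = Counter(labels).most_common()
    top_label, top_count = ranked[0]
    share = top_count / len(labels)
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None, share
    return top_label, share

# Hypothetical answers of n = 5 workers for one binary task.
answers = ["cropland", "cropland", "other", "cropland", "other"]
label, share = majority_vote(answers)
print(label, share)
```

Returning the share of the winning label alongside the label itself makes it possible to flag tasks with weak agreement for additional collection rounds.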
Whereas majority classification can be easily realized for labelling tasks, it is difficult to use it for spatial data collection tasks. The reason is that labels can be classified into a finite number of classes whereas the geometry of a spatial object can have any shape. However, Walter and Sörgel (2018) found out that high quality spatial data can be achieved with paid crowdsourcing by collecting the data not only once but multiple times by many crowdworkers and then integrating the different representations into one common result.
This follows the idea of the "Wisdom of the Crowd". Surowiecki (2005) has shown in his book "The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations", using many examples from very different fields, that averages of multiple guesses are often better than the best individual guess. Large groups of people are smarter and can solve complicated problems even better than specialists can. For this, we need multiple representations as input (which can be easily realized with paid crowdsourcing) and an averaging process that integrates the multiple results.
In this paper, we want to demonstrate the "Wisdom of the Crowd" principle on the crowd-based collection of trees from 3D point clouds. The interpretation of 3D point clouds is a nontrivial task and can be challenging for non-experts. Most of the existing work in the field of crowd-based geospatial data collection is based on 2-dimensional image data. To the best of our knowledge, (Herfort et al., 2018) is the only work that uses 3D point clouds in the context of crowdsourcing. We will demonstrate in this paper that groups of crowdworkers are smarter than individual crowdworkers are and that averaging multiple collected instances into one integrated representation leads to higher quality data.
The rest of the paper is organized as follows. In section 2 we will show the datasets from which the crowdworkers had to collect the data. The web-based interface, which contains the tools for the data collection, is demonstrated in section 3. In section 4 we discuss the quality evaluation approach and in section 5 the integration process is presented. The results of the data collection and data integration are shown in section 6. A discussion of the results can be found in section 7.

DATA
For our tests we used two datasets: the ISPRS Vaihingen 3D Semantic Labelling dataset (Niemeyer et al., 2014) (V3D) and a dataset from the State Office for Spatial Information and Land Development, Baden-Württemberg (Landesamt für Geoinformation und Landentwicklung LGL, Baden-Württemberg) (M3D). Both datasets were collected by airborne laser scanning and coloured using RGB orthophotos. We selected five circular sections with a radius of 50 m from the V3D dataset and 14 sections with the same extent from the M3D dataset. The crowdworkers had to collect the trees only in an inner area of the circle with a radius of 30 m. We carefully collected the reference data for both datasets ourselves.
The V3D dataset has a point density of 4 to 7 points/m² and contains mainly detached houses with surrounding gardens with trees and bushes (see Figure 1). The M3D dataset has a point density of 4 to 32 points/m² and contains the surroundings of the Mercedes Benz Museum, Stuttgart, with single-line and double-line tree rows and groups of trees (see Figure 2).

Figure 3 shows the web-based graphical user interface of the program for the crowd-based collection of trees. The interface was developed with JavaScript and HTML. The 3D visualisation is realized with the JavaScript 3D library Three.js (Cabello, 2019). The functionality on the server is implemented with PHP. The program consists of three parts:
Visualisation of the point cloud: the users can rotate, move and zoom the point cloud with the mouse. It is possible to reset the view or to select between different standard views.
Functionality for the collection of the data: the users must click on New in order to add a new cylinder to the 3D scene. The position of the new cylinder can be changed by dragging the mouse. The radius and the height of the cylinder can be changed by clicking on the corresponding control buttons. The bottom of the cylinder is automatically adjusted to the terrain.
Management of the cylinders: all collected cylinders are shown in a list. Wrongly collected cylinders can be deleted from the list. An already collected cylinder can be activated by clicking on the corresponding element in the list. Only one cylinder is active at any time. If a cylinder is active, its position, size, and radius can be changed. The crowdjob can be finished by clicking on the Submit button. The collected data will then be submitted to the server.

Figure 3. Web-tool for the crowd-based collection of trees from 3D point clouds

All crowdjobs were published on the commercial platform microWorkers (www.microworkers.com), which takes over the recruitment and the payment. According to their website, the platform has access to more than 1,500,000 registered crowdworkers (January 2019). The payment is $0.10 for each job plus $0.01 per collected tree.

QUALITY EVALUATION
A common approach for quality evaluation is to subdivide all collected data instances into the categories True Positive (TP), False Positive (FP) and False Negative (FN). The problem is that not all crowd-based collected cylinders can be matched uniquely 1:1 to a reference cylinder. Figure 4 shows three examples in which 1:n, n:1 and n:m relations occur. Therefore, we evaluate the quality of the crowd-based collected data using an approach of Rottensteiner et al. (2005), which is based on the evaluation of mutual overlaps. In our application, we extend the approach to three dimensions.
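For orientation, the quality measures referred to in this section (Completeness, Correctness and Quality) are commonly derived from the numbers of true positives (TP), false positives (FP) and false negatives (FN) as follows. These are the widely used definitions and are stated here as an assumption about the measures meant in the text:

```latex
\begin{align}
\text{Completeness} &= \frac{TP}{TP + FN} \\
\text{Correctness}  &= \frac{TP}{TP + FP} \\
\text{Quality}      &= \frac{TP}{TP + FP + FN}
\end{align}
```

Completeness penalises missed reference objects, Correctness penalises wrongly collected objects, and Quality penalises both.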
The set of crowd-based collected cylinders is $C_n$ and the set of reference cylinders is $C_r$. For each pair of cylinders $c_n \in C_n$ and $c_r \in C_r$ we calculate the ratios

$q_{nr} = V_{n \cap r} / V_n$ and $q_{rn} = V_{n \cap r} / V_r$

where $V_{n \cap r}$ is the intersecting volume of the crowd-based collected cylinder $c_n$ and the reference cylinder $c_r$, $V_n$ is the volume of cylinder $c_n$, and $V_r$ is the volume of cylinder $c_r$.

Figure 4. Examples of different relationships between reference data and crowd-based collected data: (a) n:1 relationship: three reference cylinders are collected with one cylinder by a crowdworker (b) 1:n relationship: one reference cylinder is collected with two cylinders by a crowdworker (c) n:m relationship: three reference cylinders are collected with two cylinders by a crowdworker

For each cylinder $c_n \in C_n$ and $c_r \in C_r$ we then calculate the ratios of total overlap

$d_n = V_{C_{nr} \cap n} / V_n$ and $d_r = V_{C_{rn} \cap r} / V_r$

where $V_{C_{nr} \cap n}$ is the intersecting volume of $C_{nr}$ with $c_n$ and $V_{C_{rn} \cap r}$ is the intersecting volume of $C_{rn}$ with $c_r$. Based on the ratios $d_n$ and $d_r$, we subdivide all cylinders $c_n \in C_n$ and $c_r \in C_r$ into four categories. Instead of defining TP, FP and FN as the total number of cylinders in each category, we use the total volumes of the corresponding cylinders, so that trees with large volumes are weighted higher than trees with small volumes. This also affects the Completeness and Correctness.
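The pairwise overlap ratios can be computed in closed form if the cylinders are assumed to be upright (vertical axis), which fits the description of the collection tool but is an assumption here. The intersecting volume is then the area of overlap of the two base circles multiplied by the overlapping height interval. The following is a minimal sketch; the `Cylinder` type and all function names are hypothetical.

```python
import math
from typing import NamedTuple

class Cylinder(NamedTuple):
    x: float  # centre x [m]
    y: float  # centre y [m]
    z: float  # height of the cylinder bottom [m]
    r: float  # radius [m]
    h: float  # cylinder height [m]

def circle_overlap_area(d, r1, r2):
    """Area shared by two circles whose centres are d apart."""
    if d >= r1 + r2:
        return 0.0                            # disjoint circles
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2     # one circle inside the other
    # Lens area as the sum of two circular segments (clamped against rounding).
    c1 = max(-1.0, min(1.0, (d * d + r1 * r1 - r2 * r2) / (2 * d * r1)))
    c2 = max(-1.0, min(1.0, (d * d + r2 * r2 - r1 * r1) / (2 * d * r2)))
    a1, a2 = math.acos(c1), math.acos(c2)
    return (r1 * r1 * (a1 - math.sin(2 * a1) / 2)
            + r2 * r2 * (a2 - math.sin(2 * a2) / 2))

def volume(c):
    return math.pi * c.r ** 2 * c.h

def intersection_volume(a, b):
    """Intersecting volume of two upright cylinders."""
    dz = min(a.z + a.h, b.z + b.h) - max(a.z, b.z)
    if dz <= 0:
        return 0.0                            # no overlap in height
    d = math.hypot(a.x - b.x, a.y - b.y)
    return circle_overlap_area(d, a.r, b.r) * dz

def overlap_ratios(cn, cr):
    """Return (q_nr, q_rn) for a collected cylinder cn and a reference cylinder cr."""
    v = intersection_volume(cn, cr)
    return v / volume(cn), v / volume(cr)

# Hypothetical example: the reference cylinder lies completely inside the collected one.
crowd = Cylinder(0.0, 0.0, 0.0, 2.0, 10.0)
reference = Cylinder(0.0, 0.0, 0.0, 1.0, 10.0)
q_nr, q_rn = overlap_ratios(crowd, reference)
print(q_nr, q_rn)
```

In this example the collected cylinder is too large: the reference is fully covered (q_rn is 1), but only a quarter of the collected volume is matched (q_nr is 0.25). This asymmetry is why both ratios are evaluated.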

INTEGRATION OF MULTIPLE COLLECTED CYLINDERS
The multiple collected cylinders are integrated in two steps. First, we use the DBSCAN clustering algorithm to detect clusters and remove outliers by evaluating the x- and y-coordinates of the cylinders. In a second step, we integrate all cylinders of each cluster.

Clustering with DBSCAN
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based algorithm for the detection of clusters and outliers (Ester et al., 1996). The advantages of DBSCAN are that the number of clusters does not have to be specified in advance, as it does for k-means, and that it is robust to outliers.
DBSCAN requires two parameters: (1) Epsilon defines the maximum distance between two points to be considered as neighbours and (2) MinPts defines the minimum number of points in a cluster. Based on empirical tests we use Epsilon = 2.5 m and MinPts = 4 as parameter setting.
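To make the role of the two parameters concrete, the following is a minimal, brute-force sketch of DBSCAN on 2D cylinder positions. It is an illustration, not the implementation used for the experiments. It assumes that a point counts as its own neighbour when the neighbourhood size is compared with MinPts, and the example coordinates are invented.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each 2D point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    # Brute-force neighbourhoods; every point is contained in its own neighbourhood.
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1              # provisional noise, may become a border point
            continue
        cluster += 1                    # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])   # j is a core point: keep expanding
    return labels

# Hypothetical cylinder centres [m]: two trees collected four times each, one stray click.
positions = [(0, 0), (1, 0), (0, 1), (1, 1),
             (10, 10), (11, 10), (10, 11), (11, 11),
             (30, 30)]
labels = dbscan(positions, eps=2.5, min_pts=4)
print(labels)
```

With Epsilon = 2.5 m and MinPts = 4, a tree position has to be confirmed by at least four nearby cylinders to form a cluster; isolated cylinders receive the label -1 and are treated as outliers.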

Integration
DBSCAN removes outliers by evaluating the positions of the cylinders. For the integration of the cylinders of each cluster, we use an iterative approach that removes additional outliers by evaluating also the radius and the heights of the cylinders.
To start the iteration, we calculate an integrated cylinder CIntegrate by averaging x, y, radius and height of the cylinders of each cluster. In the next step, we calculate the intersecting volumes VIntersect_k of the integrated cylinder CIntegrate, which has the volume VIntegrate, with all other cylinders Ck of the same cluster, which have the volumes Vk.
Then, we calculate the two ratios qIntegrate and qk, which describe the mutual overlap: qIntegrate = VIntersect_k / VIntegrate and qk = VIntersect_k / Vk. If qIntegrate or qk is smaller than 50 %, the corresponding cylinder Ck is marked as an outlier and removed from the cluster, and a new iteration is started. Figure 5 shows examples of possible outliers in a bird's-eye view.
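The iteration described above can be sketched as follows. Several details are assumptions made for illustration: the cylinders are upright and share a common bottom height (so the overlapping height is the smaller of the two heights), all cylinders that fail the 50 % test in one pass are removed together before the average is recomputed, and all names and example values are hypothetical.

```python
import math
from typing import NamedTuple

class Cylinder(NamedTuple):
    x: float  # centre x [m]
    y: float  # centre y [m]
    r: float  # radius [m]
    h: float  # height [m]

def circle_overlap_area(d, r1, r2):
    """Area shared by two circles whose centres are d apart."""
    if d >= r1 + r2:
        return 0.0
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2
    c1 = max(-1.0, min(1.0, (d * d + r1 * r1 - r2 * r2) / (2 * d * r1)))
    c2 = max(-1.0, min(1.0, (d * d + r2 * r2 - r1 * r1) / (2 * d * r2)))
    a1, a2 = math.acos(c1), math.acos(c2)
    return (r1 * r1 * (a1 - math.sin(2 * a1) / 2)
            + r2 * r2 * (a2 - math.sin(2 * a2) / 2))

def volume(c):
    return math.pi * c.r ** 2 * c.h

def intersection_volume(a, b):
    # Assumption: both cylinders stand on the same terrain height.
    d = math.hypot(a.x - b.x, a.y - b.y)
    return circle_overlap_area(d, a.r, b.r) * min(a.h, b.h)

def integrate(cluster, min_ratio=0.5):
    """Average a cluster, dropping cylinders whose mutual overlap is below min_ratio."""
    members = list(cluster)
    while members:
        n = len(members)
        # Integrated cylinder: mean of x, y, radius and height.
        mean = Cylinder(*(sum(values) / n for values in zip(*members)))
        kept = []
        for c in members:
            v = intersection_volume(mean, c)
            q_integrate, q_k = v / volume(mean), v / volume(c)
            if q_integrate >= min_ratio and q_k >= min_ratio:
                kept.append(c)
        if len(kept) == n:
            return mean, members        # no outlier left: converged
        members = kept                  # start a new iteration without the outliers
    return None, []

# Hypothetical cluster: four plausible cylinders and one with a far too large radius.
cluster = [Cylinder(0.0, 0.0, 2.0, 10.0), Cylinder(0.2, 0.0, 2.0, 10.0),
           Cylinder(0.0, 0.2, 2.2, 11.0), Cylinder(0.1, 0.1, 2.0, 9.0),
           Cylinder(0.0, 0.0, 5.0, 10.0)]
integrated, inliers = integrate(cluster)
print(integrated, len(inliers))
```

In this example the oversized cylinder inflates the first average, fails the overlap test against it and is removed; the second iteration then converges on the four remaining cylinders.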

Results of the crowd-based data collection
The crowdworkers executed 50 jobs on the V3D dataset and 140 jobs on the M3D dataset. Table 2 shows the average working time per crowd job, the average number of collected trees per crowd job and the average number of trees in the corresponding reference.
The working time needed to execute one crowd job is typically between 6 and 10 minutes. It can be seen that the average number of collected cylinders is smaller than the average number of trees in the reference. This indicates that often not all trees were collected or that several closely neighbouring trees in the reference were collected with only one cylinder by the crowdworkers. The quality parameters of the M3D dataset are significantly higher than those of the V3D dataset. The reason is that the trees in the V3D dataset are often surrounded by other vegetation (bushes and undergrowth), whereas this is not the case in the M3D dataset. Another reason could be the different point densities of the two datasets. The point density of the V3D dataset is between 4 and 7 points/m², whereas the point density of the M3D dataset is between 4 and 32 points/m². A higher point density can help with the visual interpretation of the point clouds.

Results of the integration
Each crowdjob was processed by ten crowdworkers. Altogether, 346 cylinders were collected from the V3D dataset and 1537 cylinders from the M3D dataset. Figure 6 shows the x- and y-positions of all collected trees for one example section of the V3D dataset. The outliers detected with DBSCAN are marked in red. The remaining trees were subdivided into different clusters (Figure 7). The result of the outlier detection based on the evaluation of the heights and the radii is shown in Figure 8 and the final clusters in Figure 9. Table 4 summarizes the results of the clustering and outlier detection of all crowdjobs for both datasets.

Figure 11 shows the input data, the reference data and the result of the integration for one example data section of the V3D dataset.

DISCUSSION
Paid crowdsourcing can be a powerful tool to collect spatial data. However, the main problem is that the quality of the data can be very heterogeneous. This is not only because the objects are collected by untrained individuals but also because of the subjective nature of data collection (Walter and Sörgel, 2018).

Figure 11. Input data (a), reference data (b) and result of the integration (c) of one example data section of the V3D dataset

One method to increase the quality of crowd-based data collection is to collect the data not only once but multiple times and to integrate the multiple instances into one representation. This is the idea of the "Wisdom of the Crowd", which says that groups are smarter than individuals, even if the individuals are experts.
We tested this idea on the crowd-based collection of trees from 3D point clouds. For this, we implemented a method for the integration of multiple cylinders with simultaneous outlier detection and removal. Each crowdjob was duplicated 10 times. We compared the quality of the individually collected cylinders with the quality of the integrated cylinders. The quality of the integrated cylinders is significantly higher than the quality of the individual cylinders. We were able to achieve a Correctness of 100 % and a Completeness and Quality of higher than 97 % for the V3D dataset and higher than 99 % for the M3D dataset.
Methods based on ergonomics and psychology are not part of this research. However, they can be combined with the proposed methods for a further improvement of the quality.