SLIC SUPERPIXELS FOR OBJECT DELINEATION FROM UAV DATA

Unmanned aerial vehicles (UAV) are increasingly investigated with regard to their potential to create and update (cadastral) maps. UAVs provide a flexible and low-cost platform for high-resolution data, from which object outlines can be accurately delineated. This delineation could be automated with image analysis methods to improve existing mapping procedures, which are cost-, time- and labor-intensive and of little reproducibility. This study investigates a superpixel approach, namely simple linear iterative clustering (SLIC), in terms of its applicability to high-resolution UAV orthoimages and its ability to delineate the outlines of roads and roofs. Results show that the approach is applicable to UAV orthoimages of 0.05 m GSD and extents of 100 million and 400 million pixels. Further, the approach delineates the objects with the high accuracy provided by the UAV orthoimages at completeness rates of up to 64%. The approach is not suitable as a standalone approach for object delineation. However, it shows high potential for a combination with further methods that delineate objects at higher correctness rates in exchange for a lower localization quality. This study provides a basis for future work that will focus on the incorporation of multiple methods for an interactive, comprehensive and accurate object delineation from UAV data. This aims to support numerous application fields such as topographic and cadastral mapping.


INTRODUCTION
Superpixel approaches, introduced in (Ren and Malik, 2003), group pixels into perceptually meaningful atomic regions. Superpixels are located between pixel level and object level: they carry more information than pixels by representing perceptually meaningful pixel groups, while not comprehensively representing image objects. Superpixels can be understood as a form of image segmentation that oversegments the image in a short computing time. Comparisons to similar approaches in (Achanta et al., 2012; Csillik, 2016; Neubert and Protzel, 2012; Schick et al., 2012; Stutz, 2015; Stutz et al., 2017) have demonstrated their advantages: the outlines of superpixels have been shown to adhere well to natural image boundaries, as most structures in the image are conserved (Neubert and Protzel, 2012; Ren and Malik, 2003). Furthermore, superpixels reduce the susceptibility to noise and outliers and capture redundancy in images. Because image features are computed for each superpixel rather than for each pixel, subsequent processing tasks are reduced in complexity and computing time. Thus, superpixels are considered useful as a preprocessing step for analyses at object level such as image segmentation (Achanta et al., 2012; Achanta et al., 2010).
In general, the success of image segmentation is highly variable, as it depends on the image, the algorithm and its parameters: an algorithm that performs as desired on one image might yield a lower segmentation quality when applied with the same parameters to another image. This study investigates the applicability of a superpixel approach, namely simple linear iterative clustering (SLIC), in terms of its ability to delineate object outlines of roads and roofs from UAV data. The approach has been shown to accurately delineate object outlines (Achanta et al., 2012). In this study, SLIC is applied to two UAV orthoimages of 0.05 m GSD and extents of 100 million and 400 million pixels.
Object delineation is potentially useful in numerous application fields, such as topographic and cadastral mapping (Crommelinck et al., 2016). Cadastral mapping refers to mapping the extent, value and ownership of land, which is crucial for a continuous and sustainable recording of land rights (Williamson et al., 2010). In this study, cadastral mapping is used as an example application field to investigate the applicability of SLIC superpixels for an automatic delineation of object outlines. Such visible outlines can correspond to cadastral boundaries, as a large portion of cadastral boundaries is assumed to be visible (Zevenbergen and Bennett, 2015). Automatically delineating visible boundaries would thus improve cadastral mapping approaches in terms of cost, time, accuracy and reproducibility. This study investigates SLIC superpixels as part of a boundary delineation workflow; it does not provide a full workflow for the automatic delineation of visible cadastral boundaries. However, when used alongside more conventional mapping techniques, the approach may reduce the time and costs associated with wide-area cadastral mapping projects.

Superpixel Approaches
Superpixels oversegment an image by forming compact and uniform groups of pixels that have similar characteristics, such as color or geometry. Multiple superpixel approaches have been developed in the past. They can be classified into i) graph-based and ii) gradient-ascent-based approaches:


In i), each pixel is considered a node in a graph. Each pair of nodes is connected by an edge whose weight is proportional to the similarity of the two pixels. Then, a cost function defined on the graph is formulated and minimized in order to extract superpixel segments. Examples of graph-based approaches are (Felzenszwalb and Huttenlocher, 2004; Moore et al., 2008; Shi and Malik, 2000).
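To make the graph construction of graph-based approaches concrete, the following Python sketch builds such a graph for a small RGB image given as nested lists. Two simplifying assumptions apply: only 4-connected neighbours are linked instead of all pairs of nodes, and the similarity is a Gaussian of the color difference controlled by a hypothetical parameter `sigma` (similar in spirit to the affinities used in normalized cuts). The subsequent cost minimization, which differs between the cited approaches, is not shown.

```python
import math

def grid_graph_edges(image, sigma=10.0):
    """Return (node_a, node_b, weight) for 4-connected neighbours of an image.

    image: list of rows, each row a list of (r, g, b) tuples.
    Nodes are numbered row-major; weights are Gaussian similarities in (0, 1],
    so identical colors give weight 1 and very different colors approach 0.
    """
    h, w = len(image), len(image[0])
    edges = []
    for y in range(h):
        for x in range(w):
            # only look right and down so that every edge is stored once
            for dy, dx in ((0, 1), (1, 0)):
                ny, nx = y + dy, x + dx
                if ny < h and nx < w:
                    d2 = sum((a - b) ** 2 for a, b in zip(image[y][x], image[ny][nx]))
                    weight = math.exp(-d2 / (2 * sigma ** 2))
                    edges.append((y * w + x, ny * w + nx, weight))
    return edges

# Usage: a 2 x 2 image with one red pixel in the lower right corner.
img = [[(0, 0, 0), (0, 0, 0)],
       [(0, 0, 0), (255, 0, 0)]]
for edge in grid_graph_edges(img):
    print(edge)
```

Edges touching the red pixel receive weights close to zero, so a subsequent graph cut or merging step would separate it from its neighbours first.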


In ii), pixels are iteratively mapped to a feature space to delineate denser regions that represent clusters. Each iteration refines each cluster to obtain a better segmentation until convergence. Examples of gradient-ascent-based approaches are (Comaniciu and Meer, 2002; Levinshtein et al., 2009; Vincent and Soille, 1991).
State-of-the-art superpixel approaches have been compared in (Achanta et al., 2012; Csillik, 2016; Neubert and Protzel, 2012; Schick et al., 2012; Stutz, 2015; Stutz et al., 2017) in terms of speed, memory efficiency, compactness of outlines, their ability to adhere to image boundaries and their impact on segmentation performance. Boundary adherence is often measured via boundary recall, i.e., the fraction of reference boundary pixels that lie within a small distance of a superpixel boundary (a low recall indicates that many true edges are missed), and via undersegmentation error, indicating to what extent superpixels exceed the outlines of the reference data (Achanta et al., 2012; Neubert and Protzel, 2012). The SLIC superpixel approach, belonging to the group of gradient-ascent-based approaches, appears as the best overall performer: the algorithm is low in processing time and produces compact and nearly uniform superpixels that are positively evaluated in terms of boundary recall and undersegmentation error (Achanta et al., 2012; Csillik, 2016; Neubert and Protzel, 2012; Schick et al., 2012; Stutz, 2015; Stutz et al., 2017).
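The two boundary adherence measures can be sketched for label images stored as NumPy arrays. This is a minimal illustration under stated assumptions: boundary pixels are those whose right or lower neighbour carries a different label, the recall tolerance `r` is a square neighbourhood in pixels, and the undersegmentation error follows one common formulation (attributed to Neubert and Protzel, 2012) that sums, for every reference segment, the smaller of the inside and outside parts of each overlapping superpixel. The function names are hypothetical, and the benchmarks cited above use their own tolerances and variants.

```python
import numpy as np

def boundary_map(labels):
    """True where a pixel's label differs from its right or lower neighbour."""
    b = np.zeros(labels.shape, dtype=bool)
    b[:, :-1] |= labels[:, :-1] != labels[:, 1:]
    b[:-1, :] |= labels[:-1, :] != labels[1:, :]
    return b

def dilate(mask, r):
    """Grow a boolean mask by r pixels (square neighbourhood)."""
    h, w = mask.shape
    out = np.zeros_like(mask)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            # shift the mask by (dy, dx) and OR it into the output
            out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] |= \
                mask[max(-dy, 0):h - max(dy, 0), max(-dx, 0):w - max(dx, 0)]
    return out

def boundary_recall(sp, gt, r=1):
    """Fraction of reference boundary pixels within r pixels of a superpixel boundary."""
    gt_b = boundary_map(gt)
    if not gt_b.any():
        return 1.0
    hits = dilate(boundary_map(sp), r) & gt_b
    return float(hits.sum() / gt_b.sum())

def undersegmentation_error(sp, gt):
    """(1/N) * sum over reference segments g and overlapping superpixels s of min(|s & g|, |s - g|)."""
    total = 0
    for g in np.unique(gt):
        in_g = gt == g
        for s in np.unique(sp[in_g]):
            in_s = sp == s
            overlap = np.count_nonzero(in_s & in_g)
            total += min(overlap, np.count_nonzero(in_s) - overlap)
    return total / sp.size
```

For a reference image split into a left and a right half, four quadrant superpixels reach a recall of 1.0 and an undersegmentation error of 0.0, whereas a single superpixel covering the whole image reaches a recall of 0.0 and an undersegmentation error of 1.0.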

SLIC Approach
SLIC was introduced in (Achanta et al., 2010) and later extended to a zero-parameter version called SLICO, which was compared to state-of-the-art superpixel approaches in (Achanta et al., 2012). SLIC considers image pixels in a 5D space, defined by the L*a*b* values of the CIELAB color space and their x and y coordinates. Pixels in this 5D space are clustered with an adapted k-means clustering that integrates color similarity and proximity in the image plane. The clustering is based on a distance measure D that combines the color distance in L*a*b* space (d_c) and the pixel distance in x, y space (d_s). The latter is normalized by a grid interval S, defined as the square root of the total number of image pixels N divided by the number of superpixels k, i.e., S = \sqrt{N/k}. The compactness and regularity of the superpixels are controlled with the constant m, which functions as a weighting criterion between the spatial distance (d_s) and the spectral distance (d_c). For a pixel i and a cluster center j:

$$ d_c = \sqrt{(l_j - l_i)^2 + (a_j - a_i)^2 + (b_j - b_i)^2}, \qquad d_s = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2}, \qquad D = \sqrt{d_c^2 + \left(\frac{d_s}{S}\right)^2 m^2} \qquad (1) $$

A larger m increases the weight of spatial proximity, which leads to more compact superpixels with boundaries adhering less to spectral outlines in the image.
SLICO replaces the constant values m and S, which are used in (1) to normalize spectral and spatial proximity, by iteratively normalizing both proximities. They are dynamically normalized for each cluster by the maximum spectral distance (m_c) and spatial distance (m_s) observed in the previous iteration, i.e., D = \sqrt{(d_c/m_c)^2 + (d_s/m_s)^2}. This leads to a more consistent superpixel compactness and a reduced need to define parameters.
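A small worked example helps to see how m shifts the balance between color and position. The sketch below assumes pixels and cluster centers given as (l, a, b, x, y) tuples, and computes the SLIC distance D = sqrt(d_c² + (d_s/S)² m²), where d_c is the Euclidean distance in L*a*b* space and d_s the Euclidean distance in the image plane, as well as the SLICO variant that divides d_c and d_s by per-cluster maxima from the previous iteration (here passed in as the hypothetical arguments `m_c` and `m_s`). The function names are illustrative, not taken from any of the implementations listed below.

```python
import math

def slic_distance(p, c, m, S):
    """SLIC distance between pixel p and center c, both (l, a, b, x, y) tuples."""
    d_c = math.dist(p[:3], c[:3])  # color distance in L*a*b* space
    d_s = math.dist(p[3:], c[3:])  # spatial distance in the image plane
    return math.sqrt(d_c ** 2 + (d_s / S) ** 2 * m ** 2)

def slico_distance(p, c, m_c, m_s):
    """SLICO-style distance: both terms divided by per-cluster maxima (assumed given)."""
    d_c = math.dist(p[:3], c[:3])
    d_s = math.dist(p[3:], c[3:])
    return math.sqrt((d_c / m_c) ** 2 + (d_s / m_s) ** 2)

# A pixel 5 color units and 5 pixels away from a center, grid interval S = 10.
p = (50.0, 0.0, 0.0, 10.0, 10.0)
c = (53.0, 4.0, 0.0, 13.0, 14.0)
print(slic_distance(p, c, m=1, S=10))    # sqrt(25.25): color dominates
print(slic_distance(p, c, m=20, S=10))   # sqrt(125): spatial term dominates
print(slico_distance(p, c, m_c=10, m_s=10))
```

With m = 1, the spatial term contributes only 0.25 to the squared distance; with m = 20, it contributes 100 and outweighs the color term, which is why larger m values yield more compact but spectrally less homogeneous superpixels.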
SLIC restricts the search for pixels belonging to each cluster center to a 2S × 2S region around that center, which makes SLIC fast and computationally efficient compared to conventional k-means clustering, in which each pixel is compared to all cluster centers. Another advantage of SLIC is that it can be applied to images larger than 0.5 million pixels, as its processing time scales linearly with the number of pixels. Further, it is simple to implement and has low computational and memory costs. Its boundary recall is high compared to other approaches. However, the risk of losing meaningful image edges remains when an edge falls inside a superpixel (Achanta et al., 2012).
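The assignment and update loop of SLIC, including the restricted search window, can be sketched in a few lines of NumPy. This is a minimal sketch, not a substitute for the implementations listed in the next paragraph: it assumes the image has already been converted to CIELAB, places seeds on a regular grid with spacing S = sqrt(N/k), and omits both the perturbation of seeds to the lowest-gradient position and the final connectivity enforcement described by Achanta et al. (2012). The function name and defaults are hypothetical.

```python
import numpy as np

def slic(lab, k, m=10.0, n_iter=10):
    """Minimal SLIC sketch. lab: (h, w, 3) float array already in CIELAB."""
    h, w, _ = lab.shape
    S = max(int(round(np.sqrt(h * w / k))), 1)     # grid interval S = sqrt(N / k)
    ys, xs = np.mgrid[S // 2:h:S, S // 2:w:S]       # regular grid of seeds
    centers = np.array([[*lab[y, x], y, x] for y, x in zip(ys.ravel(), xs.ravel())],
                       dtype=float)
    yy, xx = np.mgrid[0:h, 0:w]
    labels = -np.ones((h, w), dtype=int)
    for _ in range(n_iter):
        dist = np.full((h, w), np.inf)
        # assignment step: each center only inspects a window of about 2S x 2S
        for i, (l, a, b, cy, cx) in enumerate(centers):
            y0, y1 = max(int(cy) - S, 0), min(int(cy) + S + 1, h)
            x0, x1 = max(int(cx) - S, 0), min(int(cx) + S + 1, w)
            d_c2 = ((lab[y0:y1, x0:x1] - (l, a, b)) ** 2).sum(axis=2)
            d_s2 = (yy[y0:y1, x0:x1] - cy) ** 2 + (xx[y0:y1, x0:x1] - cx) ** 2
            D = np.sqrt(d_c2 + d_s2 / S ** 2 * m ** 2)
            closer = D < dist[y0:y1, x0:x1]
            dist[y0:y1, x0:x1][closer] = D[closer]    # slices are views, so this writes through
            labels[y0:y1, x0:x1][closer] = i
        # update step: move each center to the mean (l, a, b, y, x) of its pixels
        for i in range(len(centers)):
            mask = labels == i
            if mask.any():
                centers[i] = [*lab[mask].mean(axis=0), yy[mask].mean(), xx[mask].mean()]
    return labels

# Usage: a 20 x 20 image whose right half is much brighter than its left half.
lab = np.zeros((20, 20, 3))
lab[:, 10:, 0] = 100.0
labels = slic(lab, k=4, m=1.0)
```

Because each center only inspects its local window, the cost per iteration grows with the number of pixels rather than with the product of pixels and superpixels, which is the source of the linear scaling mentioned above.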
SLIC implementations are available in OpenCV (Bradski, 2016), VLFeat (Vedaldi, 2013), GDAL (Balint, 2016), Scikit (Scikit-Learn Developers, 2012), Matlab (MathWorks, 2016) and GRASS (Kanavath and Metz, 2017). They are mostly based on the two SLIC versions proposed in (Achanta et al., 2012). For the first version (SLIC), the parameter k specifies the number of approximately equally sized superpixels. Optionally, the compactness parameter m can be set to control the trade-off between the superpixels' homogeneity and boundary adherence. This version generates regular-shaped superpixels in untextured regions and highly irregular superpixels in textured regions (Figure 1a). For the second version (SLICO), only the parameter k can be defined, while m is adaptively refined for each superpixel. It generates regular-shaped superpixels across the scene, regardless of texture (Figure 1b) (Achanta et al., 2012).

Superpixels in Remote Sensing
The benefits of analyzing groups of pixels instead of single pixels have been verified from a computer vision perspective for multiple applications such as object recognition (Malisiewicz and Efros, 2007; Pantofaru et al., 2008). The same has been done from a remote sensing perspective for object-based image analysis (OBIA) (Blaschke, 2010). The use of superpixels is increasingly popular in computer vision, whereas only a few studies in remote sensing consider superpixels (Acuña et al., 2016; Chen et al., 2016; Csillik, 2016; Ortiz Toro et al., 2015; Sahli et al., 2012; Thompson et al., 2010; Vargas et al., 2015; Zhang et al., 2015).
Nevertheless, superpixels can be expected to be useful in remote sensing: in high-resolution remotely sensed imagery, the local spatial autocorrelation between pixels is high, so one object is often composed of many pixels with similar characteristics (Chen et al., 2012). This has led to the formulation of the OBIA paradigm (Blaschke, 2010). Superpixels, which group pixels of similar characteristics into an oversegmented image, are considered a preprocessing step in conventional OBIA approaches (Zhang et al., 2015).
A comparison of four state-of-the-art superpixel approaches, with SLIC being the best choice considering speed and accuracy, has been conducted on satellite imagery of 0.5 to 0.6 m GSD and an extent of 4 million pixels (Csillik, 2016). In further studies that apply superpixels to remote sensing data, SLIC is likewise considered the most suitable superpixel approach (Csillik and Lang, 2016; Ortiz Toro et al., 2015; Sahli et al., 2012; Vargas et al., 2015). However, SLIC has rarely been applied to UAV data or to object delineation for topographic or cadastral mapping. This study aims to bridge both of these research gaps.

SLIC Superpixels for Object Delineation
In general, SLIC cannot be considered a standalone approach for object delineation. Each superpixel outline is closed, even where no object outline is present in the image. The larger k, the more outlines are generated that do not align with object outlines. In order to eliminate those unwanted outlines, SLIC could be combined with further segmentation methods. Another option, proposed in (Sahli et al., 2012), would be to fuse neighboring SLIC regions of similar color to eliminate non-relevant outlines.
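The second option can be illustrated with a minimal sketch, assuming a superpixel label image (labels 0 to n-1) and a color image as NumPy arrays: 4-adjacent superpixels are fused when the Euclidean distance between their mean colors is below a threshold. The function name `merge_similar_neighbours` and the threshold `max_diff` are hypothetical, and this single-pass union-find merge is an illustration, not the procedure of Sahli et al. (2012).

```python
import numpy as np

def merge_similar_neighbours(labels, image, max_diff):
    """Fuse 4-adjacent superpixels whose mean colors differ by less than max_diff."""
    n = int(labels.max()) + 1
    means = np.array([image[labels == i].mean(axis=0) if (labels == i).any()
                      else np.zeros(image.shape[2]) for i in range(n)])
    parent = list(range(n))

    def find(i):
        # follow parent links to the root, halving the path on the way
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # collect pairs of labels that touch horizontally or vertically
    pairs = set()
    for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
        diff = a != b
        pairs.update(zip(a[diff].tolist(), b[diff].tolist()))

    for i, j in pairs:
        if np.linalg.norm(means[i] - means[j]) < max_diff:
            parent[find(i)] = find(j)

    roots = np.array([find(i) for i in range(n)])
    # relabel merged regions consecutively from 0
    return np.unique(roots[labels], return_inverse=True)[1].reshape(labels.shape)

# Usage: four quadrant superpixels on an image with a dark left and a bright right half.
labels = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 3, 3], [2, 2, 3, 3]])
image = np.zeros((4, 4, 3))
image[:, 2:, :] = 100.0
merged = merge_similar_neighbours(labels, image, max_diff=10.0)
```

Because the mean colors are not recomputed after each merge, chains of gradually changing regions can be fused; an iterative region-merging scheme that updates the means would avoid this.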
Combining information from multiple segmentations has been investigated in other studies that aim to develop a transferable approach with a constant object recognition robustness and a reduced need for parameter optimization. Object outlines delineated through multiple segmentations have been shown to be more reliable and robust than those detected by fewer segmentations (Borenstein and Ullman, 2008; Malisiewicz and Efros, 2007; Pantofaru et al., 2008; Russell et al., 2006). This idea can equally be transferred to superpixels: combining superpixels with the output of a contour detector has been shown to delineate object contours better than a standalone superpixel approach (Levinshtein et al., 2009, 2010, 2012; Yang and Rosenhahn, 2016).
SLIC superpixels are often combined with the Pb (Martin et al., 2004) or the gPb (Maire et al., 2008) contour detector. These approaches combine texture, color and brightness cues to calculate probabilities of boundaries (Pb) and globalized probabilities of boundaries (gPb), respectively. The former considers these cues on a local scale, while the latter considers them on both a local and a global scale. Detected contours and superpixel outlines are mostly combined with cost functions that minimize the inter-class similarity while maximizing the intra-class similarity. The functions are optimized through learning based on computer vision benchmark datasets (Levinshtein et al., 2010, 2012; Yang and Rosenhahn, 2016). These functions are not directly transferable to remote sensing imagery, which has more complex characteristics. However, gPb contour detection has been investigated as a standalone approach for UAV-based cadastral mapping in (Crommelinck et al., 2017).
The study shows that the approach provides a comprehensive initial detection of candidate objects that could be verified and located exactly by integrating SLIC outlines.
When combining SLIC with a further segmentation approach, such as gPb contour detection, moderate errors of omission are acceptable: outlines missed by SLIC might be detected by the second approach. In general, a low error of omission, i.e., a high level of completeness, is of utmost importance for an automated object detection system before user interaction is integrated, which reduces the system's degree of automation (Mayer, 2008). Manually delineating a missed boundary (error of omission) is more time-consuming than deleting an erroneously included boundary (error of commission). The goal is to minimize the summed time for correcting both errors of omission and commission.

UAV Data
Two UAV orthoimages of different extents showing rural areas in Germany and France were selected for this study.Table 1 shows specifications of the data capture.Figure 2 shows the orthoimages of both study areas.

Reference Data
Automatically delineating objects is considered useful for cadastral mapping, as object outlines often align with visible cadastral boundaries (Zevenbergen and Bennett, 2015).
Examples of such objects are roads, fences, hedges and stone walls, as well as outlines of roofs, agricultural fields and tree groups (Crommelinck et al., 2016). From this list, road and roof outlines were selected for this study, as these are the objects with the highest visibility and the most accurately delineable outlines in both study areas. These outlines were manually delineated for those parts where they could be localized exactly. Parts of roads and roofs without a precisely distinguishable outline were not delineated as reference data (Figure 2).
Figure 2. Manually delineated outlines of exactly localizable roads and roofs used for the accuracy assessment, overlaid on UAV orthoimages of (a) Amtsvenn in Germany and (b) Toulouse in France. Outlines in close spatial proximity, such as the two parallel outlines of a road, might appear as a single thicker line, although they consist of two parallel lines in the reference data.

Image Processing Workflow
The image processing workflow consists of the application of SLIC to the UAV datasets (Section 3.3.1) and its accuracy assessment (Section 3.3.2). For the SLIC application, a Matlab implementation was used (MathWorks, 2016), which is based on (Achanta et al., 2012). All further workflow steps were implemented in Python as QGIS processing scripts making use of functionalities from QGIS (QGIS Development Team, 2009), GRASS (GRASS Development Team, 2015) and GDAL (GDAL Development Team, 2016).

SLIC Application:
The Matlab implementation used in this study provides a SLIC and a SLICO version (MathWorks, 2016). SLICO requires a predefined number of superpixels k, while SLIC requires k as well as a compactness parameter m that regularizes the SLIC outlines. k was chosen in accordance with possible sizes of objects of interest in range [1; 400] m². m was chosen in accordance with recommendations from MathWorks in range [1; 20]. Due to the different extents of the two UAV orthoimages (Table 1), this resulted in different numbers for k ranging from 625 to 1,000,000: the smaller the size of one superpixel, the larger the total number of superpixels k (Table 2). SLIC was applied to each entire orthoimage.
Table 2. Varying numbers of superpixels k resulting for the two study areas with a coverage of 1,000,000 m² (Amtsvenn) and 250,000 m² (Toulouse).
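The relation between superpixel size, k and the grid interval S can be checked with a few lines of Python. This sketch assumes square superpixels whose side length (the superpixel GSD) is given in metres and the 0.05 m GSD of the orthoimages; the function name is hypothetical. It reproduces the two ends of the range of k stated above.

```python
import math

GSD = 0.05  # orthoimage ground sample distance in m per pixel

def superpixel_parameters(area_m2, sp_size_m):
    """Return (k, S) for square superpixels of side sp_size_m covering area_m2.

    k is the number of superpixels; S = sqrt(N / k) is the SLIC grid interval
    in pixels, where N is the number of orthoimage pixels.
    """
    k = round(area_m2 / sp_size_m ** 2)
    n_pixels = area_m2 / GSD ** 2
    S = math.sqrt(n_pixels / k)
    return k, S

# Smallest superpixels (1 m x 1 m) on Amtsvenn, largest (20 m x 20 m) on Toulouse.
print(superpixel_parameters(1_000_000, 1.0))   # k = 1,000,000, S of about 20 px
print(superpixel_parameters(250_000, 20.0))    # k = 625, S of about 400 px
```

As expected, S equals the superpixel side length divided by the orthoimage GSD, so the same superpixel size yields the same S in both study areas, while k differs with the covered extent.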

Accuracy Assessment:
In order to decrease the processing time of the accuracy assessment, the SLIC outlines were clipped to a buffer of 0.3 m radius around the reference data.
Then, all lines in the reference data and the clipped SLIC outlines were buffered with a radius of 0.1 m. These datasets were converted to a raster format of 0.05 m pixel size. Then, each SLIC dataset was overlaid with the reference data in order to label each pixel as true positive (TP), true negative (TN), false positive (FP) or false negative (FN). The numbers of pixels per label were summarized in a confusion matrix. From the confusion matrix, the error of omission (3) and the error of commission (4) were calculated in range [0; 100]:

$$ \text{error of omission} = \frac{FN}{TP + FN} \cdot 100 \qquad (3) $$

$$ \text{error of commission} = \frac{FP}{TP + FP} \cdot 100 \qquad (4) $$

The error of omission captures the percentage of pixels erroneously labeled as 'no outline', i.e., the percentage of object outlines that are missed by the SLIC outlines. The error of commission captures the percentage of pixels erroneously labeled as 'outline', i.e., the percentage of SLIC outline pixels that do not correspond to object outlines. These measures are based on (Goodchild and Hunter, 1997) and evaluate to which extent SLIC outlines coincide with actual object outlines.
Figure 3. SLIC outlines derived for compactness parameters (a, b) m = 1, (c, d) m = 20, and (e, f) SLICO, where m is adaptively refined for each superpixel. The first row of images shows superpixels overlaid on the orthoimage of Amtsvenn, while the second row shows superpixels overlaid on the orthoimage of Toulouse, both for k = 10,000.
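A raster-only sketch of this accuracy assessment may help to reproduce the numbers. It assumes that the reference lines and the SLIC outlines have already been rasterized to boolean NumPy arrays at 0.05 m and, unlike the workflow described above, which buffers vector lines in QGIS and GRASS before rasterizing, it approximates the 0.3 m clipping buffer and the 0.1 m line buffers by dilating the rasters with a disk. The errors are then computed as FN/(TP+FN) and FP/(TP+FP), in percent. All function names are hypothetical.

```python
import numpy as np

def buffer(mask, radius_px):
    """Dilate a boolean raster with a disk of the given radius in pixels."""
    h, w = mask.shape
    out = np.zeros_like(mask)
    for dy in range(-radius_px, radius_px + 1):
        for dx in range(-radius_px, radius_px + 1):
            if dy * dy + dx * dx <= radius_px * radius_px:
                out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] |= \
                    mask[max(-dy, 0):h - max(dy, 0), max(-dx, 0):w - max(dx, 0)]
    return out

def omission_commission(ref_lines, slic_lines, gsd=0.05, clip_m=0.3, buffer_m=0.1):
    """Errors of omission and commission (in %) for boolean line rasters."""
    clip_r, buf_r = round(clip_m / gsd), round(buffer_m / gsd)
    # keep only SLIC outlines near the reference, as FP pixels are counted there
    slic_clipped = slic_lines & buffer(ref_lines, clip_r)
    ref = buffer(ref_lines, buf_r)
    slic = buffer(slic_clipped, buf_r)
    tp = np.count_nonzero(ref & slic)
    fp = np.count_nonzero(~ref & slic)
    fn = np.count_nonzero(ref & ~slic)
    omission = 100.0 * fn / (tp + fn) if tp + fn else 0.0
    commission = 100.0 * fp / (tp + fp) if tp + fp else 0.0
    return omission, commission

# Usage: a horizontal reference line and a SLIC outline shifted by one pixel (0.05 m).
ref = np.zeros((40, 40), dtype=bool)
ref[20, :] = True
shifted = np.zeros_like(ref)
shifted[21, :] = True
om, co = omission_commission(ref, shifted)
```

In this example, each buffered line covers five pixel rows, four of which overlap, giving 20% omission and 20% commission. A SLIC outline lying farther than the clipping buffer from any reference line is ignored entirely, which illustrates why the error of commission depends on the chosen buffer size.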
Figure 3 shows that the regularity of the superpixel outlines can be increased by increasing m: the outlines of SLIC are more irregular for m = 1 (Figure 3a,b) than for m = 20 (Figure 3c,d). For m = 1, they run strictly along boundaries of spectral differences, while m = 20 yields SLIC superpixels that are more homogeneous in shape but less homogeneous in spectral content. The shape regularity is increased even further when using SLICO, for which m is defined automatically (Figure 3e,f). The superpixel outlines require further merging steps to delineate objects in the image as closed polygons, which will be investigated in future work. One approach might be to group SLIC superpixels of similar color. Another approach would be to merge SLIC outlines at locations where another method with a higher detection quality, such as gPb contour detection, locates a boundary.
Figure 4 shows the errors of omission for both study areas, which range from 36% to 76%. The error of omission is mostly lowest for m = 1, regardless of the number of superpixels k. This observation holds true for the data of Amtsvenn (Figure 4a) and Toulouse (Figure 4b). The allowed range for parameter m is [0; ∞]. The error of omission is mostly higher for SLICO than for m = 1 and m = 20 across all investigated cases: due to the high shape regularity that SLICO enforces (Figure 3e,f), the superpixels become spectrally more heterogeneous and their outlines delineate the objects less accurately. The lowest errors of omission for both study areas are obtained for superpixel GSDs in range [2; 5] m and amount to 36 to 37% for Amtsvenn and 41 to 44% for Toulouse. For superpixel GSDs of 1 m, the results of both study areas contain small areas in which the superpixel outlines appear regular-shaped, oriented in one direction and unaligned with object outlines. This effect might be caused by a memory problem due to the large number of generated superpixels. The predefined number of superpixels k and the number of superpixels actually obtained differ by 0.3% on average.
Figure 5 shows the errors of commission for both study areas, which range from 42% to 63%. These numbers strongly depend on the chosen buffer size of 0.3 m around the reference data, within which FP pixels are counted: for a smaller buffer size, the errors of commission would be lower. This buffer size does not influence the error of omission, as this error considers only boundary pixels in the reference data. The errors of commission vary less across superpixel GSDs and SLIC parameters than the errors of omission. SLIC outlines need to be closed even where no object outline is present in the image, which prevents the error of commission from being eliminated. The results indicate that this effect, i.e., the relative amount of pixels erroneously labeled as 'outline', occurs equally across all investigated cases.

DISCUSSION
The results indicate that SLIC superpixels delineate object outlines most accurately and completely when SLIC is used with a compactness parameter m = 1 and superpixel GSDs in range [2; 5] m. Depending on the extent covered by the orthoimage, this results in a different number of superpixels k (Table 2). The regularity of the object outlines to be delineated can guide the choice between SLIC with a given m and SLICO: SLICO results in more regular-shaped outlines and can provide more suitable results when the object outlines are regular as well. When applying SLIC, the regularity of the superpixel outlines can be slightly increased by increasing m.
The results of (Csillik, 2016), in which SLIC is applied to satellite imagery of 0.5 to 0.6 m GSD, are closest to those obtained in this study. Csillik suggests using an initial superpixel size of 10 × 10 pixels and 10 iterations for the clustering and refinement of the superpixels. The same number of iterations, which MathWorks proposes as the default, was used in this study. The superpixel size proposed by Csillik would correspond to superpixels of 0.5 m GSD for the data of this study (k = 4,000,000 for Amtsvenn; k = 1,000,000 for Toulouse). As the error of omission increased for superpixel GSDs below 2 m and the corresponding values of k, these superpixel sizes were not analysed in this study. Furthermore, UAV data can be analysed by considering 3D information in addition to the orthoimage. Future work will investigate the usability of SLIC on digital surface models (DSM), as proposed by Csillik. This could be done by applying gPb contour detection and SLIC superpixels to a DSM. This would allow identifying strong height gradients, which indicate objects such as fences or walls. Incorporating such information could help to localize missed outlines and to erase shadow outlines that are erroneously captured as object outlines (Figure 3a,c,e).
The accuracy assessment applied in this study is based on (Goodchild and Hunter, 1997) and is similarly employed in numerous further studies (Kumar et al., 2014; Shi et al., 2003; Wiedemann, 2003; Wiedemann et al., 1998). It provides a comprehensive and widely used measure of positional accuracy. Its disadvantages include its dependency on the applied buffer size and its sole focus on positional accuracy. For this study, it adequately measures to which extent SLIC outlines coincide with actual object outlines. More extensive accuracy assessment approaches suitable for the described application are listed in (Crommelinck et al., 2016). Furthermore, the manually delineated object outlines can contain errors. However, the applied buffer of 0.1 m partly smooths inaccurately delineated outlines. Inaccuracies might be further reduced by averaging the manually delineated outlines of multiple human operators (Martin et al., 2004). In general, manually drawn reference data are accepted for measuring the degree to which an automated system outperforms a human operator (Mayer, 2008).
Even for a workflow that accurately and completely delineates objects from UAV orthoimages, future work is required to determine the amount of cadastral boundaries that are visible and can thus be extracted automatically.However, even a partial extraction of cadastral boundaries could improve the mapping procedure in terms of cost and time.Furthermore, an accurate and complete delineation of objects can be useful in further application fields such as topographical mapping, road tracking or building extraction.

CONCLUSION AND OUTLOOK
This study investigates automatic object delineation from optical UAV data. This supports multiple application fields, such as recent endeavors in cadastral mapping, which aim to automatically delineate objects that demarcate cadastral boundaries from high-resolution optical sensor data. In this application field, a suitable workflow is assumed to consist of multiple feature extraction methods (Crommelinck et al., 2016). This study has investigated the potential of SLIC superpixels to delineate objects as part of such a workflow: SLIC was found to be applicable to UAV orthoimages and to accurately delineate object outlines at the high resolution of 0.05 m provided by the UAV orthoimages.
However, the method generates a large number of outlines that do not demarcate object outlines.Future work will investigate the combination of SLIC with the contour detection method proposed in (Crommelinck et al., 2017).This contour detection method has shown to provide a comprehensive initial detection of candidate objects that could be verified and located exactly by integrating SLIC outlines.In addition, information from DSMs is intended to be incorporated along with the information from RGB orthoimages.
The goal is a tool for cadastral boundary delineation that is highly automatic, generic and adaptive to different scenarios. The tool will be most suitable for areas in which objects are clearly visible and coincide with cadastral boundaries. Once the design and implementation of such a tool have been tested, its transferability to real-world scenarios will be investigated. This will be done in countries like Kenya, Rwanda and Ethiopia, where concepts like fit-for-purpose (Enemark et al., 2014) and responsible land administration (Zevenbergen et al., 2015) are accepted or in place.
Figure 1. (a) SLIC (m = 20) and (b) SLICO applied to a UAV orthoimage of Toulouse with 0.05 m ground sample distance (GSD) and k = 625. SLIC generates regular-shaped superpixels in untextured regions and highly irregular superpixels in textured regions. SLICO generates regular-shaped superpixels across the scene, regardless of texture. SLICO superpixels are spatially more compact but spectrally more heterogeneous.
Figure 4. Errors of omission obtained for (a) Amtsvenn and (b) Toulouse. The number of superpixels k varies according to the extent covered per study area (Table 2).
Figure 5. Errors of commission obtained for (a) Amtsvenn and (b) Toulouse. The number of superpixels k varies according to the extent covered per study area (Table 2).