H-RANSAC: A HYBRID POINT CLOUD SEGMENTATION COMBINING 2D AND 3D DATA

ABSTRACT: In this paper, we present a novel 3D segmentation approach operating on point clouds generated from overlapping images. The aim of the proposed hybrid approach is to effectively segment co-planar objects by leveraging the structural information originating from the 3D point cloud and the visual information from the 2D images, without resorting to learning-based procedures. More specifically, the proposed hybrid approach, H-RANSAC, is an extension of the well-known RANSAC plane-fitting algorithm, incorporating an additional consistency criterion based on the results of 2D segmentation. Our expectation that the integration of 2D data into 3D segmentation will achieve more accurate results is validated experimentally in the domain of 3D city models. Results show that H-RANSAC can successfully delineate building components like main facades and windows, and provide more accurate segmentation results compared to the typical RANSAC plane-fitting algorithm.


INTRODUCTION
3D segmentation is the task of partitioning a point cloud into consistent regions. Typically, it relies solely on the structural information of the objects (i.e. their geometry and shape), without taking into account visual information (i.e. the color and texture of an object). However, geometrical features are not able to separate different objects belonging to the same surface (in our case the same plane). On the other hand, 2D segmentation may be able to separate such coplanar objects by exploiting their differences in visual appearance. 3D and 2D data thus contain complementary information that, if effectively combined, can provide more accurate segmentation results. However, until recently, most existing works have addressed the 3D segmentation problem as the detection of structural objects, adopting one of three main methodological approaches: a) model-fitting (i.e. detecting objects of a specific shape such as planes), b) region-growing (i.e. starting from a seed point, growing the region with neighboring points and stopping when a certain criterion is met) and c) clustering (i.e. clustering the points into objects based on a set of extracted features).
However, the aforementioned methodologies typically rely only on the structural information of the 3D point cloud, discarding completely the visual information. This visual information is becoming more and more available with the popularity of obtaining 3D point clouds through photogrammetry, combining multiple overlapping images. In fact, the use of highly accurate and dense 3D point clouds resulting from multiple overlapping images using Structure-from-Motion (SfM) and Multi-View-Stereo (MVS) algorithms has become an extremely attractive solution for 3D documentation. The advantage of these datasets is that they contain both 3D and 2D information for the scene, with an explicit structural connection between the two spaces.
Motivated by the above, in this paper we propose a hybrid point cloud segmentation algorithm that jointly considers the structural information of 3D representations and the visual information of 2D images. The proposed hybrid segmentation method, H-RANSAC, aims to successfully segment co-planar objects that cannot be segmented based solely on geometric features. More specifically, H-RANSAC extends the well-known RANSAC algorithm (Fischler and Bolles, 1981), in this case used for plane-based segmentation, by further optimizing its results using the output of k-means-based segmentation applied on 2D images. RANSAC is an iterative model-fitting algorithm, aiming to robustly estimate the parameters of a mathematical model. In our case, we initially use RANSAC to compute the parameters of a mathematical model describing a plane. Subsequently, we use this model on the 3D point cloud to identify the computed plane. H-RANSAC, on the other hand, integrates the 2D segmentation output in each iteration, by first identifying the most favorable 2D image (i.e. the image that depicts the whole plane and minimizes occlusions and radial distortions) from the overlapping images. Then, it applies k-means segmentation to the selected 2D image and finally refines the 3D segmentation so that all points belonging to the same 3D plane correspond to pixels that belong to the same 2D object in the segmented image.
We validate the benefit of the proposed hybrid segmentation approach in the domain of 3D building segmentation, which is becoming more and more popular due to advancements in the field of UAVs and mobile mapping systems. More specifically, we show experimentally that H-RANSAC can successfully achieve building delineation, i.e. the segmentation of a building into its individual parts such as main facade, windows, etc., while the typical 3D segmentation based on RANSAC cannot distinguish between these objects. In addition, it is important to note that one great advantage of H-RANSAC compared to learning-based segmentation methodologies is that the segmentation of co-planar objects is achieved without any prior knowledge or annotated data for training, which is particularly useful for analyzing city models that typically come without any ground-truth information.
The rest of the paper is organized as follows. Section 2 reviews the related work and focuses on the methodologies combining 3D and 2D segmentation. Section 3 describes in detail the proposed hybrid procedure. The evaluation protocol of H-RANSAC, consisting of two publicly available datasets and three evaluation metrics, is described in Section 4. The experimental results are outlined and discussed in Section 5. Finally, Section 6 concludes the paper, discusses the impact of this work and describes our intentions for future research.

RELATED WORK
This paper presents a 3D point cloud segmentation approach for co-planar objects that jointly considers structural and visual characteristics of the objects, without using any prior knowledge or annotated data for training. Related methodologies working on point cloud segmentation mainly rely on the geometrical features of the scene, while a limited number of works combine visual and geometrical features. In the following, Section 2.1 presents the traditional segmentation methodologies relying on geometrical features, and Section 2.2 presents the hybrid approaches combining visual and geometrical characteristics.

Point cloud segmentation based on geometry
The typical approaches for point cloud segmentation that may be applied irrespective of the capturing device and the point cloud generation technique (e.g. LIDAR, terrestrial laser scanner devices, point clouds sourcing from SfM algorithms) rely solely on the point cloud geometry. Such methods may be categorized into three different groups, namely model-fitting, region-growing and clustering approaches, and are explained in the following.
Most of the algorithms belonging to model-fitting approaches are based on either the Hough Transform (HT) (Ballard, 1981) or the Random Sample Consensus (RANSAC) (Fischler and Bolles, 1981) approach. HT is capable of detecting both well-defined shapes and arbitrary shapes, while RANSAC is capable of robustly computing the parameters of a given mathematical model (e.g. a plane) and thus detecting surfaces that can be modeled in mathematical terms. Comparisons between HT and RANSAC have shown that both algorithms are very sensitive to parameter tuning, with RANSAC being able to achieve better segmentation accuracies. Based on HT, the authors of (Vosselman et al., 2004) review plane, sphere and cylinder detection from city models, while in (Rabbani and Van Den Heuvel, 2005) the automatic detection of cylinders in point clouds is examined. Moreover, comparative results for plane-based detection, in the context of extracting roofs from buildings, have been presented. Concerning RANSAC-based algorithms, the authors of (Boulaassal et al., 2007) extract facades from point clouds by manually tuning the parameters of RANSAC. Other works extend RANSAC by incorporating some extra steps in order to develop a more efficient methodology. For example, (Schnabel et al., 2007) extracts planes, cylinders and spheres, after computing normal vectors for each point, in order to perform the assignment to a candidate shape. Furthermore, (Chen et al., 2014) also used an improved RANSAC methodology to detect rooftops in point clouds sourcing from airborne laser scanners. After segmentation, a Voronoi-based primitive boundary extraction algorithm was introduced in order to extract the primitives of each rooftop. Finally, (Li et al., 2017) proposed the clustering of points into planar and non-planar, according to the extracted normals. The aforementioned studies have shown that RANSAC is robust to high noise and outliers. However, it is based on the strict assumption that all the points belonging to the same pre-defined surface (e.g. plane, sphere, etc.) belong to the same object, which is not true for all types of objects (e.g. it cannot distinguish windows from facades).
Region-growing techniques were first introduced in (Besl and Jain, 1988). In this work, the mean Gaussian curvature was computed and resulted in a coarse segmentation, which was afterwards refined through an iterative region-growing procedure based on a variable order bivariate surface-fitting. Also deploying region-growing for 3D segmentation, the authors of (Gorte, 2002) proposed to use Triangulated Irregular Networks (TINs) as seeds and merged triangles with similar angles within a predefined distance. Furthermore, in (Vieira and Shimada, 2005), points across edges were eliminated with the help of a curvature threshold. Afterwards, noise reduction was performed and the remaining points served as seeds. In a similar context, the authors of (Nurunnabi et al., 2012) selected the points with the least curvature as seeds, and the criterion for merging points was the computation of the normal. Overall, the efficiency of region-growing methodologies for point cloud segmentation is strongly correlated with successful seed selection and merging criteria, and these methodologies have not been shown to be robust enough.
The final category refers to methods that approach the segmentation problem through clustering, by associating each point in the 3D point cloud with a set of features and afterwards applying clustering to the points' feature vectors. A typical example is (Biosca and Lerma, 2008), where the point cloud was segmented with the help of fuzzy clustering. For each point, features were extracted (including the height difference of a point relative to its neighbors, the unit normal vector of the tangent plane and the distance of the tangent plane to the origin) and subsequently the results were refined by assigning unlabeled points to the nearest cluster. Combining clustering and region-growing, the authors of (Dorninger and Nothegger, 2007) first use hierarchical clustering in a four-dimensional feature space so as to extract the centers of the clusters and use them afterwards as seeds. Similar regions were merged if the normal distance between the examined point and the seed cluster was within a predefined threshold. Compared to region-growing, clustering methodologies are more robust and do not require the selection of seed points. However, their efficiency depends on the quality of the extracted features.

Hybrid segmentation approaches
So far, there is only a limited number of studies that have combined segmentation in the 2D and 3D space, either by projecting the 3D space onto an image and applying segmentation in the 2D space, or by deploying segmentation in both domains. With respect to the first category, the authors of (Kabaria et al., 2014), proposed to project the point cloud on a gray scale image and segment the 2D image with the help of an active contour segmentation algorithm, in order to identify and classify different coral reefs. In a similar vein, the authors of (Aijazi et al., 2016) transformed the 3D point cloud (object space) into an elevation image, with the help of mathematical morphology, in order to segment urban objects, such as sidewalks and facades of buildings.
More closely related to H-RANSAC are the works that refine the segmentation by incorporating information from the 2D and the 3D space. In this direction, the authors of (Hickson et al., 2014) first identified the object boundaries in the 3D space where depth discontinuities occurred and used this information as an initial segmentation. Then, image segmentation in the CIE LAB color space followed, while preventing any merging from occurring across depth discontinuities. Segmentation both in 3D object and 2D image space was based on graph theory. In (Vetrivel et al., 2015), the authors proposed a workflow for building detection from point clouds sourcing from UAV images, which first detected planar segments in the point clouds and defined 3D Regions of Interest (ROIs). The 3D ROIs were afterwards identified in the images and hybrid segmentation was applied to them in order to refine the 2D segmentation results. The hybrid segmentation was conducted by applying a region-growing algorithm based on both structural and visual features. Finally, the authors of (Omidalizarandi and Saadatseresht, 2013) have also worked on identifying roof segments using this combined segmentation. First, they detect planar segments deploying a robust-fitting approach and afterwards, they adopt a region-growing methodology both in the 3D space, based on the direction of surface normals, and in the 2D image space, based on intensity values.
In a similar vein, there have also been methods combining information from the 2D and 3D spaces aiming at semantic segmentation. These methods rely on learning-based approaches and require labelled training data. In this direction, (Landrieu et al., 2017) perform ensemble learning, based on 3D features from the point cloud and 2D features sourcing from the projection of points into a horizontally oriented plane. In (Serna and Marcotegui, 2014), elevation images are used and final results are obtained by the re-projection of images into the 3D point cloud. Objects are classified with the help of Support Vector Machines (SVMs). Finally, concerning solely facade parsing, different algorithms are presented in (Martinovic et al., 2015), (Gadde et al., 2017) and (Jampani et al., 2015). These approaches aim at semantically segmenting the different parts of a facade into doors, windows, balconies, etc., with the help of annotated data, combining 2D and 3D features with learning-based approaches.
Contrary to H-RANSAC, these approaches incorporate learning-based approaches and rely on annotated data, while H-RANSAC attempts 3D segmentation without any prior knowledge or annotated data. The most similar approaches to H-RANSAC are the ones that do not require manual annotations. As seen from above, these methods typically rely on region-growing segmentation with features originating from both spaces. Compared to what has been proposed so far, this paper presents an extended version of RANSAC, which eliminates the main drawback of a baseline segmentation method based solely on RANSAC's robust plane parameter computation (i.e. mis-segmenting co-planar segments) by incorporating visual information from the 2D space. The greatest advantage of H-RANSAC is that objects differing both in texture and in geometry can be separated. This is particularly useful in building delineation, where structural components of buildings are typically co-planar.

METHODOLOGY
H-RANSAC consists of the following steps (Figure 1). First, different planes are computed with the help of RANSAC (Fischler and Bolles, 1981). The planes are extracted from the point cloud that has been generated from the overlapping images. RANSAC is an iterative algorithm that, in each iteration, aims to compute the parameters of a given mathematical model, ensuring that the solution has the maximum number of inliers (in our case, inliers are the points assigned to the plane based on the distance between the points and the plane). In finding the plane with maximum inliers, RANSAC identifies a set of candidate planes by executing the same algorithm multiple times with different initial starting conditions. In order to incorporate the information from the 2D space, H-RANSAC modifies the candidate planes based on the 2D segmentation results. More specifically, for each candidate plane, H-RANSAC selects the most favorable view of the plane from the overlapping images, ensuring that it depicts the whole plane, while radial distortions and occlusions are minimized. 2D segmentation based on k-means is applied on the RGB and spatial features of the optimal image and the plane is updated based on a consistency check. This check ensures that all points belonging to the same plane also belong to the same segmented object in the 2D space. Then, the plane with the maximum number of inliers among the modified candidate planes is selected and removed from the point cloud. This process is repeated iteratively, starting with RANSAC-based segmentation on the remaining point cloud, until there are no points left. Finally, after the 3D point cloud has been segmented, H-RANSAC refines the segmentation output by dismissing small planes containing fewer points than a predefined threshold and assigning their points to neighboring objects.

Point cloud segmentation
For the detection of planes in the 3D point cloud, RANSAC forms multiple random subsets of three points which define the candidate planes. For each subset of three points, the distance of the remaining points from the plane is computed and the points with a distance lower than a predefined threshold are considered as inliers.
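The plane detection step described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names, the fixed iteration count and the seeded random generator are assumptions made for the example.

```python
import random

def plane_from_points(p1, p2, p3):
    """Return (a, b, c, d) with unit normal for the plane through three points,
    or None if the points are (nearly) collinear."""
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))
    # The normal is the cross product of the two edge vectors.
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = (nx * nx + ny * ny + nz * nz) ** 0.5
    if norm < 1e-12:
        return None
    a, b, c = nx / norm, ny / norm, nz / norm
    d = -(a * p1[0] + b * p1[1] + c * p1[2])
    return a, b, c, d

def ransac_plane(points, threshold, iterations=200, seed=0):
    """Return the candidate plane with the most inliers and the inlier indices.
    `iterations` and `seed` are illustrative parameters (assumptions)."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        a, b, c, d = plane
        # Inliers: points whose distance to the plane is below the threshold.
        inliers = [i for i, (x, y, z) in enumerate(points)
                   if abs(a * x + b * y + c * z + d) < threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers
```

Because the normal is normalized to unit length, the absolute value of a*x + b*y + c*z + d is directly the point-to-plane distance that is compared with the threshold.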

Image selection
Since the point cloud has resulted from overlapping images, each 3D region of interest is depicted in more than one image. Our objective is to select the most favorable image for 2D segmentation, i.e. the one that has the best chances of detecting the desired object. For this, the most favorable image is selected so as to depict the whole identified plane, while radial distortions and occlusions are minimized. Towards this goal, first we have to identify all the images that depict the whole examined plane.
In order to ensure that the whole plane is depicted, the 3D bounding box of the plane is computed and the four corners are projected onto the images through Equation 1, using the exterior orientation of each frame. Equation 1 defines the well-established relationship between the 2D and 3D space, known as the collinearity condition. If the projected points fall within the dimensions of the image, the 3D region of interest is depicted in the image and the specific image is selected as a candidate for the image segmentation procedure.
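A common vector form of the collinearity condition, given here as an assumed form in the notation defined below (sign and rotation conventions differ between texts), is:

```latex
\begin{equation}
\begin{bmatrix} x - x_o \\ y - y_o \\ -c \end{bmatrix}
= \lambda \, R
\begin{bmatrix} X - X_o \\ Y - Y_o \\ Z - Z_o \end{bmatrix}
\tag{1}
\end{equation}
```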
where X, Y, Z refer to the 3D coordinates and x, y to the 2D coordinates. Xo, Yo, Zo represent the reference center of the 3D space and are equal to the linear parameters of the exterior orientation of each image, while xo, yo, c stand for the principal point coordinates and the focal length of the camera respectively. Finally, λ refers to the image scale and R to the rotation matrix.
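The candidate test can be illustrated with a short Python sketch. It assumes one common photogrammetric convention, which may differ from the one used by the authors: R rotates world axes into camera axes, the camera looks along its negative z axis, and image coordinates are measured from the image center. The function and parameter names are hypothetical.

```python
def project_point(point, camera_center, rotation, principal_point, focal_length):
    """Project a 3D point into the image with the collinearity equations.
    Returns (x, y), or None if the point is not in front of the camera.
    Assumed convention: rotation maps world axes to camera axes and the
    camera looks along its negative z axis."""
    diff = [point[i] - camera_center[i] for i in range(3)]
    u, v, w = (sum(rotation[r][i] * diff[i] for i in range(3)) for r in range(3))
    if w >= 0:
        return None
    # Eliminating the scale factor from the three collinearity rows gives:
    x = principal_point[0] - focal_length * u / w
    y = principal_point[1] - focal_length * v / w
    return x, y

def plane_fully_visible(corners, camera_center, rotation, principal_point,
                        focal_length, half_width, half_height):
    """True if every bounding-box corner projects inside the image frame.
    half_width and half_height are half the sensor size, in the same units
    as the focal length (image coordinates centered on the image)."""
    for corner in corners:
        xy = project_point(corner, camera_center, rotation,
                           principal_point, focal_length)
        if xy is None:
            return False
        if abs(xy[0]) > half_width or abs(xy[1]) > half_height:
            return False
    return True
```

An image is kept as a candidate only if `plane_fully_visible` holds for the corners of the plane's bounding box.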
After having identified the set of candidate images depicting the 3D object, it is important to select the most favorable view with respect to 2D segmentation, by overcoming the problem of unfavorable perspectives (i.e. extremely oblique views). The selection of the most favorable image is based on two rules. The first one depends on the distance between the projected center of gravity of the 3D object and the center of the image. More specifically, the preferred images are those for which the aforementioned distance is minimized, in order to reduce the effects of radial distortion. Furthermore, the most suitable view has to be as vertical as possible, to diminish occlusions. Thus, the angle between the camera axis and the normal of the examined planar object has to be minimized.
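The two selection rules can be combined in several ways; the text does not state how they are weighted. The following Python sketch assumes a simple weighted sum of the normalized centroid offset and the normalized viewing angle. The weights, the dictionary keys and the function names are illustrative assumptions.

```python
import math

def view_angle(camera_axis, plane_normal):
    """Acute angle (radians) between the camera axis and the plane normal."""
    dot = sum(a * n for a, n in zip(camera_axis, plane_normal))
    norm = (math.sqrt(sum(a * a for a in camera_axis))
            * math.sqrt(sum(n * n for n in plane_normal)))
    # abs() makes the result independent of the normal's orientation.
    return math.acos(min(1.0, abs(dot) / norm))

def select_image(candidates, plane_normal, half_diagonal,
                 w_dist=0.5, w_angle=0.5):
    """Return the name of the candidate image with the lowest combined score.
    Each candidate is a dict with hypothetical keys:
      'name'        - image identifier
      'centroid_xy' - projected center of gravity, relative to the image center
      'camera_axis' - viewing direction of the camera in world coordinates
    w_dist and w_angle are assumed weights, not values from the paper."""
    def score(candidate):
        dx, dy = candidate["centroid_xy"]
        offset = math.hypot(dx, dy) / half_diagonal          # rule 1
        angle = view_angle(candidate["camera_axis"], plane_normal) / (math.pi / 2)  # rule 2
        return w_dist * offset + w_angle * angle
    return min(candidates, key=score)["name"]
```

Both terms are scaled to the range 0 to 1 so that neither rule dominates merely because of its units.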

Image segmentation
Subsequently, 2D segmentation is performed using k-means clustering. k-means is an unsupervised clustering method, initially proposed by (MacQueen, 1967), which partitions the input data into k classes according to the distances between them. While clustering is commonly performed on RGB values alone, in our case both RGB and spatial features are employed to ensure spatial proximity within the clusters. Thus, each pixel is associated with five features in total, consisting of its RGB values and its x, y location, forming the following feature vector:
$$ f_j = [R_j, G_j, B_j, x_j, y_j]^T \tag{2} $$
The pixels are clustered around centroids µi, i = 1, ..., k, which are obtained by minimizing the objective function:
$$ J = \sum_{i=1}^{k} \sum_{f_j \in S_i} \lVert f_j - \mu_i \rVert^2 \tag{3} $$
where there are k clusters Si, i = 1, 2, ..., k, and µi is the centroid or mean point of all the points fj ∈ Si.
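As an illustration of clustering on the five-dimensional pixel features, the sketch below implements plain Lloyd iterations in Python. It is not the authors' implementation: the `spatial_weight` parameter, the random initialization and the iteration limit are assumptions added for the example.

```python
import random

def pixel_features(image, spatial_weight=1.0):
    """image: list of rows of (R, G, B) tuples.
    Returns one [R, G, B, x, y] vector per pixel in row-major order.
    spatial_weight is a hypothetical scaling of the pixel coordinates."""
    return [[r, g, b, spatial_weight * x, spatial_weight * y]
            for y, row in enumerate(image)
            for x, (r, g, b) in enumerate(row)]

def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(features, k, iterations=50, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update until the
    labels stop changing. Returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(features, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each pixel joins its nearest centroid.
        new_labels = [min(range(k), key=lambda i: sq_dist(f, centroids[i]))
                      for f in features]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for i in range(k):
            members = [f for f, lab in zip(features, labels) if lab == i]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Each iteration of this loop cannot increase the sum of squared distances between the feature vectors and their assigned centroids, which is the quantity that k-means minimizes.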

Merging and refinement
A final refinement is applied to the 3D segmentation results in order to merge small objects that consist of few points. To this end, the well-known density-based clustering algorithm DBSCAN (Ester et al., 1996) is applied. With the application of DBSCAN, multiple clusters are computed for each object. Subsequently, distances are measured between the different cluster centers, and small segments (comprising fewer points than a predefined threshold) are assigned to the object that the closest cluster belongs to. The application of DBSCAN is a crucial step towards the refinement of the final outcome, as an object with a uniform label might consist of multiple segments that are not spatially connected.
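The reassignment of small segments can be sketched as follows. The example assumes that the spatial clusters have already been computed (for instance with DBSCAN) and only illustrates the nearest-center rule; the function names and the input layout are hypothetical.

```python
def centroid(points):
    """Mean position of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def merge_small_clusters(clusters, min_points):
    """clusters: list of (object_label, points) pairs, one per spatial cluster.
    Returns the list of object labels after reassigning every cluster with
    fewer than min_points points to the object of the nearest large cluster."""
    centers = [centroid(pts) for _, pts in clusters]
    large = [i for i, (_, pts) in enumerate(clusters) if len(pts) >= min_points]
    labels = [label for label, _ in clusters]
    if not large:
        return labels  # nothing to merge into
    for i, (_, pts) in enumerate(clusters):
        if len(pts) < min_points:
            nearest = min(large, key=lambda j: sum(
                (a - b) ** 2 for a, b in zip(centers[i], centers[j])))
            labels[i] = clusters[nearest][0]
    return labels
```

Working on cluster centers rather than on whole objects matters when an object with a single label is split into spatially disconnected parts, since each part then gets its own center.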

H-RANSAC
Finally, H-RANSAC is presented in Algorithm 1.
Algorithm 1 H-RANSAC for i = 1 : k do
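Based on the textual description of the methodology, the outer loop of H-RANSAC can be sketched in Python as below. This is a hedged outline rather than a reproduction of Algorithm 1: `propose_candidates` stands in for the RANSAC step, `consistency_filter` for the image selection, k-means and consistency check, and the majority-label rule in `majority_label_filter` is only one assumed way of enforcing 2D consistency.

```python
from collections import Counter

def majority_label_filter(labels_2d):
    """Build a consistency filter that keeps only the inliers whose pixel
    carries the most frequent 2D segment label (an assumed rule)."""
    def keep(inliers):
        if not inliers:
            return set()
        top_label = Counter(labels_2d[i] for i in inliers).most_common(1)[0][0]
        return {i for i in inliers if labels_2d[i] == top_label}
    return keep

def h_ransac(num_points, propose_candidates, consistency_filter, max_segments=1000):
    """Iteratively extract 2D-consistent planar segments.
    propose_candidates(remaining) -> iterable of inlier index sets, one per
        candidate plane found on the remaining points (hypothetical callable).
    consistency_filter(inliers) -> subset of inliers that is consistent in 2D.
    Returns (segments, unassigned point indices)."""
    remaining = set(range(num_points))
    segments = []
    while remaining and len(segments) < max_segments:
        # Refine every candidate plane with the 2D consistency check.
        refined = [set(consistency_filter(set(c) & remaining))
                   for c in propose_candidates(remaining)]
        # Keep the refined candidate with the most inliers.
        best = max(refined, key=len, default=set())
        if not best:
            break
        segments.append(best)
        remaining -= best
    return segments, remaining
```

Passing the two steps in as callables keeps the sketch independent of any particular plane-fitting or image-segmentation code.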

Dataset
For the assessment of the proposed approach, the ISPRS / EuroSDR Benchmark for Multi-Platform Photogrammetry and the Gerrard-hall theater dataset have been used. Subsets of 20 and 99 images, respectively, were selected from the two datasets. 3D models have been formed using Pix4D mapper and are depicted in Figures 2(a) and 3(a) respectively. Even though the aim of creating the ISPRS dataset was to assess the accuracy and reliability of methodologies for calibration and image orientation computation, as well as the success of different dense matching algorithms, this benchmark was deemed one of the most appropriate for evaluating the results of our methodology, as it is widely established and was captured with state-of-the-art sensors. Similarly, the aim of creating the Gerrard-hall theater dataset was to assess the performance of indoor and outdoor alignment methods. However, the dataset was selected for evaluation purposes, as it consists of multiple overlapping images depicting a building, which makes it appropriate for measuring the success of our proposed methodology.

Evaluation metrics
The success of the proposed segmentation methodology is quantified by comparing the resulting segments with reference data, i.e. ground truth resulting from manual segmentation, with the help of various evaluation metrics. Several metrics have been used for 3D segmentation, including Global Consistency Error (GCE) and Local Consistency Error (LCE), as proposed in (Martin et al., 2001). Consistency error is also introduced in (Chen et al., 2009) along with other metrics such as Cut Discrepancy, Rand Index and Hamming Distance, while (Vo et al., 2015) and (Yan et al., 2014) use Precision, Recall and F1-measure, as originally introduced in the field of information retrieval (Manning, 1995).
In line with (Vo et al., 2015) and (Yan et al., 2014), we have decided to use Precision, Recall and F1-measure in our experiments. Precision refers to the percentage of correctly retrieved elements, computed as the number of True Positives (TP) divided by the sum of True Positives (TP) and False Positives (FP):
$$ \text{Precision} = \frac{TP}{TP + FP} \tag{4} $$
In the context of measuring 3D segmentation results, precision may be defined as follows. Let mj be a segmented object in the ground truth dataset and ai its corresponding object extracted by the proposed methodology. Any point pk ∈ (ai ∩ mj) is counted as a True Positive and any point pk ∈ (ai \ mj) is counted as a False Positive. In other words, the False Positives are all points in segment ai that do not have a corresponding point in the reference segment mj.
Recall is the percentage of the ground truth elements that were correctly retrieved and is sensitive to the existence of ground truth segments that were not successfully detected. In mathematical terms, recall is expressed as:
$$ \text{Recall} = \frac{TP}{TP + FN} \tag{5} $$
where TP refers to the True Positives and FN to the False Negatives. With mj being the ground-truth segmented object and ai its corresponding object extracted by the proposed methodology, FN is calculated as the number of points pk ∈ (mj \ ai). Thus, False Negatives are the points in the reference segment mj without a correspondence in ai.
Finally, the F1-score combines the two aforementioned measures and may be used as a single index showing the capacity of the proposed methodology. The F1-measure is computed from Equation (6):
$$ F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{6} $$
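For a single pair of corresponding segments, the three metrics follow directly from set operations on point indices. The Python sketch below illustrates this; the function name and the convention of returning 0.0 for empty denominators are assumptions.

```python
def segment_scores(extracted, reference):
    """Precision, recall and F1 for one extracted segment a_i against its
    reference segment m_j, both given as collections of point indices."""
    a, m = set(extracted), set(reference)
    tp = len(a & m)   # points in both segments
    fp = len(a - m)   # extracted points missing from the reference
    fn = len(m - a)   # reference points missing from the extraction
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For example, an extracted segment of four points of which two lie in a six-point reference segment has TP = 2, FP = 2 and FN = 4, giving a precision of 0.5, a recall of 1/3 and an F1-score of 0.4.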

Ground truth data
The ground truth data was segmented manually, in order to assess the success of the proposed hybrid approach. In generating the ground truth segmentations, we have used the convention of clearly identifying the different structural components of a building, such as windows, doors, main facades, roofs, pipes, etc. In Figures 2(b) and 3(b), the manually segmented structural components are depicted for Datasets 1 and 2 respectively. These ground truth segments were used to compare the performance of RANSAC and H-RANSAC, with the help of Precision, Recall and F1-score, as defined in Section 4.2.

Identification of corresponding object
From the above, it is clear that in order to assess the efficiency of the proposed methodology, it is necessary to identify the corresponding objects between the algorithm-produced and the manually generated segments. Thus, a pair of segments (ai, mj) is identified as corresponding if mj mostly overlaps ai and ai mostly overlaps mj.
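One possible reading of this rule is a mutual best-overlap match: ai and mj correspond if mj is the reference segment sharing the most points with ai, and ai is the extracted segment sharing the most points with mj. The Python sketch below implements this interpretation, which is an assumption; the function name and input layout are hypothetical.

```python
def match_segments(extracted, reference):
    """extracted, reference: dicts mapping a segment name to a set of point
    indices. Returns the list of (extracted_name, reference_name) pairs that
    are each other's best overlap."""
    def best_overlap(segment, others):
        # Name of the segment in `others` sharing the most points, or None
        # if there is no overlap at all.
        name = max(others, key=lambda n: len(segment & others[n]), default=None)
        if name is None or not (segment & others[name]):
            return None
        return name

    pairs = []
    for a_name, a_points in extracted.items():
        m_name = best_overlap(a_points, reference)
        if m_name is not None and best_overlap(reference[m_name], extracted) == a_name:
            pairs.append((a_name, m_name))
    return pairs
```

Under this rule a reference segment may be left without a partner, which corresponds to the entries for which no metric values are available.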

EXPERIMENTAL RESULTS
This section presents quantitative and qualitative results, comparing H-RANSAC with a baseline consisting exclusively of the RANSAC plane-fitting segmentation algorithm. It is important to note that performance is measured on the basis of successfully segmenting the different parts of the building facade, without assigning any semantic labels on the identified segments. However, in order to make the presentation of our results easier for the reader, we refer to the different segments of the scene with their semantic labels.

Qualitative results
Figure 2 presents the results for Dataset 1, consisting of the original dataset 2(a), the ground truth segmentation 2(b), the results of RANSAC plane-fitting segmentation 2(c) and the results of H-RANSAC 2(d). The RANSAC threshold was set through trial and error over the available data, making sure to avoid over-segmentation. It is evident from Figure 2(c) that the RANSAC-based approach failed to separate the important structural components of the buildings, i.e. the windows, as they are almost coplanar with the main facades of the buildings. Moving on to the results of H-RANSAC, our hybrid procedure has successfully segmented the structural components of the buildings, since the windows have been successfully separated from the main building.
With respect to Dataset 2, results are depicted in Figure 3, consisting of the original dataset 3(a), the generated ground truth 3(b), as well as the results of RANSAC plane-fitting 3(c) and H-RANSAC 3(d) segmentation. The RANSAC based segmentation has segmented Dataset 2 into the two main facades, failing to delineate the doors and the windows of the building. On the other hand, after visual inspection of Figure 3(d), it is evident that H-RANSAC successfully delineated parts of the structural components of the building (i.e. the doors, the windows and the two main facades).

Quantitative Results
Quantitative results are presented in Table 1 (Dataset 1) and Table 2 (Dataset 2) for both the H-RANSAC and RANSAC methods. Since there is no one-to-one correspondence between the ground truth segments and the segments generated by RANSAC or H-RANSAC, some ground truth segments may not have a corresponding part in the segments produced by the algorithms. In these cases, values for the evaluation metrics are not available, and are marked with (-).
Concerning the results in Table 1 and with respect to facade 1 of the building, the RANSAC-based segmentation presents lower precision, as expected from the visual inspection, since many points belonging to the windows are segmented as points of facade 1 (False Positives). In contrast, H-RANSAC produces fewer False Positives, as it has successfully segmented larger parts of the windows, and thus achieves higher precision for facade 1. On the other hand, the recall of the RANSAC plane-based detection is higher than that of the hybrid procedure, since H-RANSAC has separated larger segments from facade 1, some of them being False Negatives.
The performance of both algorithms is rather similar for facade 2, while most differences are observed in the case of windows. H-RANSAC achieves better segmentation of windows and thus presents a much higher recall with only a marginal reduction in precision. This is reflected in the value of the F1-measure, which is considerably higher than that of RANSAC. Furthermore, with respect to the other structural components, H-RANSAC has not segmented the water pipe, the roof and facade 3, as they have been assigned to larger objects of the scene during the merging step. However, as we can see in Table 1, the scores achieved by RANSAC for these structural components are particularly low, which makes them inappropriate for making any comparisons. Summarizing the results obtained for Dataset 1, we can see that both algorithms perform similarly in detecting facades 1 and 2 and very poorly (not suitable for comparisons) for the roof, water pipe and facade 3. Their main difference derives from the ability to segment windows, which constitute one of the most important structural components of city buildings.
In a similar vein, the results for Dataset 2 are presented in Table 2.
Concerning the two main facades, H-RANSAC achieves higher precision than RANSAC, as the latter returns a lot of False Positives (i.e. points belonging to the doors and windows that are assigned to the two main facades). On the other hand, the RANSAC based segmentation achieves a higher recall rate for the two main facades, similarly to Dataset 1, since H-RANSAC is characterized by more False Negatives. As also observed in Dataset 1, the main advantage of H-RANSAC is the successful segmentation of doors and windows. More specifically and with respect to F1-scores, H-RANSAC and RANSAC perform similarly concerning the segmentation of the main facades. On the other hand, H-RANSAC outperforms the conventional plane-fitting method when referring to the main structural components of the building, such as windows and doors. Indeed, H-RANSAC succeeds in delineating the doors and the windows with an F1-score of 0.7656 and 0.5991 respectively, compared to a total failure of the baseline approach.

CONCLUSIONS
This paper proposes a novel approach for segmenting point clouds that result from multiple overlapping images. H-RANSAC performs segmentation incorporating information from both the 3D and 2D space and combines the results in the final outcome. By leveraging geometrical and visual features, objects belonging to the same mathematical surface (in our case planes) can be successfully delineated into different segments. The overall effectiveness of H-RANSAC has been evaluated in the domain of city buildings and facade segmentation. More specifically, doors and windows constitute a typical challenge in building delineation, as they are coplanar with the main facades of buildings. Experimental results, outlined in Section 5, have demonstrated the effectiveness of H-RANSAC in successfully segmenting facades into different parts without resorting to learning-based approaches.
Concerning the limitations of our approach and our intentions for future improvements, in this experimental study we have been mainly concerned with small-scale objects that typically fit in the view of a single image. However, this may not be the case for large-scale objects that extend across the view of two or more images. To overcome this shortcoming, we aim to create and use panoramic images covering large-scale objects, generated by automatic stitching of multiple images. Moreover, in order to avoid trial and error in setting the RANSAC threshold on the distance of the points considered as inliers, we plan to investigate how to set this value automatically by applying different thresholds and examining the residuals of each threshold. Finally, H-RANSAC could be easily extended to segment objects belonging to spheres, curves and other mathematical surfaces.