AUTOMATIC POINT CLOUD SEGMENTATION FOR THE DETECTION OF ALTERATIONS ON HISTORICAL BUILDINGS THROUGH AN UNSUPERVISED AND CLUSTERING-BASED MACHINE LEARNING APPROACH

The article describes an innovative procedure for the three-dimensional analysis of decay morphologies on ancient buildings, based on machine learning methods for the automatic segmentation of point clouds. In the field of Cultural Heritage conservation, photogrammetric data can be exploited, in support of diagnosis and monitoring, to recognize different types of alteration visible on the masonry surface, starting from colour information. In fact, certain stone and plaster surface pathologies (biological patina, biological colonization, chromatic alterations, spots, ...) are typically characterized by chromatic variations. To this end, colour-based segmentation with hierarchical clustering has been applied to the colour data of point clouds, expressed in the HSV colour space. In addition, a geometry-based segmentation of the 3D reconstructions has been performed to identify the main architectural elements (walls, vaults) and to associate them with the detected defects. The proposed workflow has been applied to spaces in several ancient buildings, chosen for their irregular geometric and colorimetric characteristics.


INTRODUCTION
The present research investigates methods for the semantic segmentation of point clouds, to detect and analyse decay phenomena affecting the surfaces of masonry fabrics. The work is part of a wider research context focused on innovative procedures for the digital documentation and preservation of Cultural Heritage. In this field, knowledge of the state of conservation of a building and of the pathologies of its architectural elements is fundamental for identifying coherent interventions and maintenance. Currently, decay mapping is a manual practice that is time-consuming and of limited accuracy, owing to the complexity of architectural heritage environments. The wide diffusion of digital survey techniques, such as close-range photogrammetry and laser scanning, has significantly improved the survey of ancient buildings (Remondino, 2011; Aicardi et al., 2018; El-Din Fawzy, 2019). In Computer Aided Design, reverse engineering consists of measuring, analysing and testing a real object in order to reconstruct it virtually as a 3D model (Wang, 2011). In particular, it allows recorded data (from photogrammetry or laser scanning) to be converted into point clouds, which store geometric coordinates and RGB colour values. A further, less explored development concerns the use of 3D data in restoration and maintenance, to automatically extract information about the state of conservation of the fabric through the semantic segmentation of point clouds.
In the literature, segmentation methods are employed to detect architectural elements (Nguyen and Le, 2013; Grilli et al., 2017), overcoming the limitations of manual practice and reducing editing. By contrast, point cloud segmentation is rarely adopted for decay detection (Valero et al., 2019; Xu et al., 2020). In these studies, segmentation algorithms act on two kinds of properties of the raw data: geometric features (3D coordinates, normal vectors, derivatives) and colorimetric attributes (RGB colour values).
Specifically, geometric segmentation is mostly used to detect building components through geometric primitives within 3D point clouds (edge-based, region-growing, model-fitting, hybrid and machine learning segmentation) (Hackel et al., 2016; Valero et al., 2018; Grilli et al., 2019; Hamid-lakzaeian, 2020; Teruggi et al., 2020; Croce et al., 2021). In Zhan et al. (2009), colour information was used to improve the geometric pipeline, to identify objects with the same direction but different colours.
On the other hand, colour-based segmentation has been performed mainly on 2D images (Malinverni et al., 2017; Vorobel et al., 2021), and in some cases on 3D data (Valero et al., 2019; Galantucci et al., 2020), to detect decay patterns recognizable by their predominant colours or by chromatic differences. Colour information in images and 3D data is usually expressed in RGB, which is not a suitable space for colour-based segmentation, because spatial proximity, i.e. the geometric distance between colour values, is not consistent with the perceptual similarity between colours (Sonka et al., 2014; Gonzalez and Woods, 2018). For this reason, other colour spaces (such as HSV, YCbCr, YIQ, YUV, ...) must be considered, in which human perception is taken into account and the two properties (geometric distance and perceptual similarity) are related (García-Lamont et al., 2016).
The proposed approach combines and tests colour-based and geometry-based segmentation methods, in order to jointly identify and analyse evidence of decay and the main architectural elements (walls, vaults, etc.) in a point cloud. The expected outcomes are chromatic morphologies corresponding to decay patterns (biological patina, biological colonization, chromatic alterations, spots, ...) (ICOMOS, 2008), associated with the surfaces of architectural components.

METHODOLOGY
Point cloud segmentation can be achieved through a variety of methods, which differ in the criteria used to group the data on the basis of properties or features (such as geometry, colour, size, shape, scale patterns, ...). The main approaches can be classified into several categories: standard methods, based on the principle of discontinuity (edge-based methods) or similarity (region growing); model-fitting methods, based on mathematical models (RANSAC, Hough Transform, ...); and machine learning methods, i.e. Artificial Intelligence algorithms that learn from empirical training data (k-means clustering, hierarchical clustering, ...) (Nguyen and Le, 2013; Sonka et al., 2014). In the present research, some of these approaches have been combined for the automatic segmentation of historical building point clouds, as illustrated in the methodological workflow (Figure 1):
• Colour-based segmentation, through machine learning (clustering algorithms) applied to the colour attributes of point clouds, expressed in different colour spaces, to distinguish various types of chromatic alteration;
• Geometry-based segmentation, through model-fitting algorithms, to identify the kind of architectural element on which evidence of decay is detected.
The colour-based segmentation procedure, explained in Section 2.1, has been implemented in MathWorks® MATLAB.

Colour-based segmentation
The colour-based segmentation is structured as a machine learning application aimed at analysing various types of surface decay (biological patina, biological colonization, chromatic alterations, spots, ...). Machine learning is suited to exploiting the colour properties of dense point clouds to recognize chromatic alterations, through unsupervised learning on the training data (the point clouds themselves). Given the complexity and heterogeneity of masonry surface pathologies, the great advantage is that no predefined labels are required, because the groups are defined by the algorithm itself. The outputs are clusters of points (groups of similar examples), each isolating a different colour range that corresponds to a decay pattern. Several clustering algorithms have been examined, namely minimum Euclidean distance, k-means and hierarchical clustering, in order to find the most suitable one for this specific goal. The first segments the point cloud into clusters on the basis of the Euclidean distance between points, using a minimum distance between clusters established in advance. K-means is an iterative data-partitioning algorithm that classifies the data set into a number of clusters (k) fixed a priori, each defined by its centroid. It is an exclusive method, because each of the n observations is assigned to exactly one of the k clusters. The algorithm starts by randomly choosing k initial cluster centres (centroids), computes the distance between each point and each centroid, and assigns each point to the nearest centroid. At every iteration, the centroids are moved to the mean of their assigned points, so as to minimize the total within-cluster variance (the sum of squared distances between the observations in each cluster and the cluster mean) (Hastie et al., 2008).
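The k-means steps described above (random initialisation, nearest-centroid assignment, centroid update until the assignment stops changing) can be sketched in plain Python on colour triplets. This is a minimal illustration under stated assumptions, not the MATLAB implementation used in the study. The function name `kmeans` and its parameters are hypothetical.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means on a list of 3-tuples (e.g. colour triplets of a point cloud).

    Hypothetical helper for illustration; returns (labels, centroids).
    """
    rng = random.Random(seed)
    # Random initialisation: k distinct observations become the first centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each observation goes to its nearest centroid
        # (squared Euclidean distance), so every point belongs to exactly one cluster.
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members,
        # which lowers the within-cluster sum of squares.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
    return labels, centroids
```

Note that k must be fixed before running, which is exactly the a priori choice that hierarchical clustering avoids.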
By contrast, hierarchical clustering does not depend on a prior choice of the minimum distance between clusters or of the number of clusters, unlike the previously described algorithms. It organizes the data on its own into hierarchical representations such as cluster trees or dendrograms. Each level of the dendrogram collects groups of data with similar characteristics and results from the combination of clusters at the level below. The whole structure is therefore an ordered sequence, which allows the user to choose where to prune the dendrogram according to the specific application. There are two main strategies for hierarchical clustering:
• Agglomerative approaches, which start with every observation in its own cluster and, at each level, merge a pair of clusters into one, moving up the hierarchy recursively. The pair is chosen according to the smallest dissimilarity between clusters.
• Divisive approaches, which begin at the top of the hierarchy, grouping all the observations into a single cluster, and proceed by splitting the groups of data with the largest dissimilarity.
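To make the dendrogram and its pruning concrete, the agglomerative strategy can be sketched as a loop that repeatedly merges the least dissimilar pair and records every merge. Cutting the tree at k clusters then amounts to replaying the first n - k merges. This is a naive O(n³) sketch for small inputs only. Single linkage is used purely for illustration (the study itself adopts Ward's method), and all function names are hypothetical.

```python
import math

def agglomerate(points, linkage):
    """Naive agglomerative clustering.

    Returns the merge history (the dendrogram) as a list of
    (cluster_a, cluster_b, distance), with clusters as frozensets of point indices.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest dissimilarity.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(points, clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] | clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return merges

def single_linkage(points, ca, cb):
    """Illustrative dissimilarity: distance between the closest members."""
    return min(math.dist(points[i], points[j]) for i in ca for j in cb)

def cut(merges, n, k):
    """Prune the dendrogram so that k clusters remain: replay the first n - k merges."""
    clusters = [frozenset([i]) for i in range(n)]
    for ca, cb, _ in merges[: n - k]:
        clusters = [c for c in clusters if c not in (ca, cb)] + [ca | cb]
    labels = [0] * n
    # Number clusters by their smallest point index, for stable labels.
    for label, c in enumerate(sorted(clusters, key=min)):
        for i in c:
            labels[i] = label
    return labels
```

Because the whole merge history is kept, different pruning levels (for example 3 or 5 clusters) can be compared without re-running the clustering.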
In this study, an agglomerative approach is adopted, with Ward's minimum variance method for computing the distance between clusters. It is a recursive algorithm that starts from the squared Euclidean distance between singleton clusters (clusters composed of a single point) and, at each step, merges the pair of clusters that leads to the minimum increase in total within-cluster variance (Ward, 1963). Ward's minimum variance method is based on the minimization of the total within-cluster variance, defined as the sum of the squared distances between each element of a cluster and its centroid. The distance between two clusters r and s is computed according to equation (1):

$$ d^2(r,s) = \frac{2\,n_r n_s}{n_r + n_s}\left[(x_r - x_s)^2 + (y_r - y_s)^2 + (z_r - z_s)^2\right] \tag{1} $$

where $x_r, y_r, z_r$ are the coordinates of the centroid of cluster r; $x_s, y_s, z_s$ are the coordinates of the centroid of cluster s; and $n_r$ and $n_s$ are the numbers of elements in clusters r and s.

To achieve a colour-based segmentation, the hierarchical clustering algorithm is applied to the chromatic data of point clouds (the RGB triplet associated with each point). However, in RGB (red, green and blue), an additive colour model, the metric distance does not correspond to the colorimetric distance between colours. Lighting and shading influence colour perception, producing a mismatch between proximity in RGB space and perceptual colour closeness (Zhan et al., 2009).
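Ward's merge criterion can be written as 2·n_r·n_s/(n_r + n_s) times the squared distance between the two centroids. This quantity is exactly twice the increase in the error sum of squares (ESS) caused by merging r and s, which is why minimizing it minimizes the growth of within-cluster variance. For two singletons it reduces to the squared Euclidean distance. The sketch below checks this identity numerically. The helper names are hypothetical, and the factor 2 follows the convention of MATLAB's `linkage` documentation, which is assumed here rather than confirmed by the text.

```python
import math

def centroid(cluster):
    """Mean of a list of 3-tuples (e.g. colour triplets)."""
    n = len(cluster)
    return tuple(sum(c) / n for c in zip(*cluster))

def ess(cluster):
    """Error sum of squares: squared distances of the elements from their centroid."""
    c = centroid(cluster)
    return sum(sum((a - b) ** 2 for a, b in zip(p, c)) for p in cluster)

def ward_distance_sq(r, s):
    """Ward merge cost: 2 n_r n_s / (n_r + n_s) * ||centroid_r - centroid_s||^2."""
    nr, ns = len(r), len(s)
    return 2 * nr * ns / (nr + ns) * math.dist(centroid(r), centroid(s)) ** 2

def ess_increase(r, s):
    """Increase in total within-cluster sum of squares if r and s are merged."""
    return ess(r + s) - ess(r) - ess(s)
```

For example, with r = [(0, 0, 0), (1, 0, 0)] and s = [(4, 0, 0)], the ESS grows by 49/6 and `ward_distance_sq` returns twice that value.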
Hence, different colour spaces have been examined in order to identify the most accurate one:
a. HSV (hue, saturation, value);
b. YCbCr (luma component, blue-difference and red-difference chroma components);
c. YIQ (luma component, in-phase, quadrature);
d. YUV (luma component, blue projection, red projection).
In these colour spaces, unlike in RGB, perceptual similarity is closer to the Euclidean distance between colour triplets, because they take human perception into account. They are defined by mathematical coordinate transformations from an associated RGB colour space (Gonzalez and Woods, 2018). In HSV, hue corresponds to the colour's position on a colour wheel and describes the transitions from red to orange, yellow, green, cyan, blue, magenta and finally back to red. Saturation ranges from unsaturated (shades of grey) to fully saturated (no white component), while value corresponds to brightness. In HSV, unlike in RGB, chromaticity is therefore decoupled from intensity (García-Lamont et al., 2016). YCbCr, YIQ and YUV belong to the family of luminance/chrominance colour spaces, which allow a reduced bandwidth to be used for the chrominance components. YUV is defined by a luminance component (Y) and two chrominance components (U, V), representing the deviations of blue and red from the luminance. YIQ is analogous to YUV, but it was designed for the NTSC analogue television signal. In YCbCr, by contrast, the chrominance components correspond to the deviations of blue and red from the luminance expressed in greyscale (Gonzalez and Woods, 2018).
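The four colour spaces can be obtained from the RGB triplets of the point cloud with simple per-point transforms before clustering. The sketch uses Python's standard `colorsys` module for HSV and YIQ. For YUV and YCbCr it assumes BT.601 luma weights and a full-range, JPEG-style YCbCr scaled to [0, 1]. The exact matrices used in the study are not stated, and MATLAB's `rgb2ycbcr`, for instance, uses a studio-range variant. `convert_cloud` and `CONVERTERS` are hypothetical names.

```python
import colorsys

# Assumption: BT.601 luma weights (the study's exact coefficients are not given).
KR, KG, KB = 0.299, 0.587, 0.114

def rgb_to_yuv(r, g, b):
    """Analogue YUV (BT.601): U and V are scaled blue and red differences from luma."""
    y = KR * r + KG * g + KB * b
    return y, 0.492 * (b - y), 0.877 * (r - y)

def rgb_to_ycbcr(r, g, b):
    """Full-range YCbCr in [0, 1] (JPEG-style); chroma is centred on 0.5."""
    y = KR * r + KG * g + KB * b
    return y, 0.5 + (b - y) / (2 * (1 - KB)), 0.5 + (r - y) / (2 * (1 - KR))

CONVERTERS = {
    "HSV": colorsys.rgb_to_hsv,   # hue, saturation, value, each in [0, 1]
    "YIQ": colorsys.rgb_to_yiq,   # luma, in-phase, quadrature
    "YUV": rgb_to_yuv,
    "YCbCr": rgb_to_ycbcr,
}

def convert_cloud(colours_8bit, space):
    """Map the 8-bit RGB triplets of a point cloud to another colour space."""
    convert = CONVERTERS[space]
    return [convert(r / 255, g / 255, b / 255) for r, g, b in colours_8bit]
```

One design caveat: hue is an angle that wraps around at 1.0. A plain Euclidean distance on raw HSV triplets therefore treats hues of 0.99 and 0.01 (both reds) as far apart, which is worth keeping in mind when clustering reddish alterations.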

Geometry-based segmentation
In order to associate decay patterns with specific architectural elements, an additional step can be included in the workflow: a geometry-based segmentation of the original point cloud through shape detection or best-fitting algorithms. In this case, an efficient RANSAC (RANdom SAmple Consensus) algorithm is adopted (Schnabel et al., 2007). It works by constructing candidate shape primitives (planes, cylinders, spheres, ...) from randomly selected minimal sets of the source data. These sets consist of the smallest number of points required to uniquely define a geometric primitive. Each candidate is then verified against all the points in the dataset, to determine how many of them it can approximate. The procedure is iterative and extracts the shape that approximates the largest number of points. A candidate is accepted when the probability of having overlooked a better candidate falls below a predetermined threshold. The remaining data are then tested against new primitives following the same scheme. In summary, a series of parameters is defined to adapt the procedure to the analysed data, thus varying the tolerances of inclusion in the segmentation:
• kind of primitive (plane, sphere, cylinder, cone, torus);
• minimum number of points supporting a primitive;
• maximum distance between the points and the primitive;
• sampling resolution (distance between neighbouring points in the data);
• maximum angular deviation between the point normal and the primitive normal;
• probability of overlooking a better candidate.
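The core RANSAC loop described above (draw a minimal set, build a candidate, count the points within a distance tolerance, keep the best-supported candidate, remove its points, repeat) can be sketched for planes only. This is plain sequential RANSAC with a fixed number of trials. It is not the efficient RANSAC of Schnabel et al. (2007), which adds localized sampling, an octree, normal-deviation checks, connectivity scoring and a probabilistic stopping rule. `detect_planes` and `plane_from_points` are hypothetical names.

```python
import math
import random

def plane_from_points(p, q, r):
    """Plane (unit normal n, offset d) with n.x + d = 0 through three points, or None if collinear."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]]
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-12:
        return None  # degenerate minimal set: the three points are collinear
    n = [c / norm for c in n]
    return n, -sum(n[i] * p[i] for i in range(3))

def detect_planes(points, epsilon, min_support, iterations=500, seed=0):
    """Sequential RANSAC: extract the best-supported plane, remove its inliers, repeat.

    points: list of (x, y, z) tuples; epsilon: max point-to-plane distance;
    min_support: minimum number of inliers for a plane to be accepted.
    """
    rng = random.Random(seed)
    remaining = list(points)
    planes = []
    while len(remaining) >= max(3, min_support):
        best_plane, best_inliers = None, []
        for _ in range(iterations):
            # Minimal set: three points uniquely define a plane.
            plane = plane_from_points(*rng.sample(remaining, 3))
            if plane is None:
                continue
            n, d = plane
            inliers = [p for p in remaining
                       if abs(n[0] * p[0] + n[1] * p[1] + n[2] * p[2] + d) <= epsilon]
            if len(inliers) > len(best_inliers):
                best_plane, best_inliers = plane, inliers
        if len(best_inliers) < min_support:
            break  # no remaining primitive has enough support
        planes.append((best_plane, best_inliers))
        inlier_set = set(best_inliers)
        remaining = [p for p in remaining if p not in inlier_set]
    return planes
```

The `epsilon` and `min_support` arguments play the roles of the maximum distance and minimum support listed above.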

CASE STUDY
The methodological pipeline has been applied to three case studies, consisting of three different architectural volumes belonging to ancient masonry buildings with typical material and morphological characteristics (Figure 2). The spaces are characterized by an irregular 3D development and by extensive forms of surface decay. In particular, the cases are:

Colour-based segmentation
The hierarchical algorithm illustrated in Section 2.1 has been applied in the four colour spaces (HSV, YCbCr, YIQ, YUV). The results have been compared to determine in which space the colour ranges corresponding to the various kinds of pathology are best separated, as established by on-site inspection and by the ground truth (produced by manual segmentation of the original point cloud). Starting from the cluster tree, the appropriate level at which to cut the dendrogram has then been found (Figure 3). The experiments showed that the proper number of clusters is 5: fewer clusters are not sufficient to distinguish the variations of the chromatic components, while more clusters produce small clusters of negligible extent. Figure 4 shows an example of hierarchical clustering applied to Case 1 and Case 3 with a pruning level of 3 clusters. In both cases, different colours have been grouped together, producing an unacceptable result. For each colour space, 5 clusters have been generated, for a total of 20 clusters. After a qualitative pairwise comparison of the 20 clusters (250 comparisons), based on their chromatic correspondence, HSV appeared to perform best in isolating colours (Figure 5).
To validate this hypothesis, for each of the five clusters segmented in HSV, an analogous cluster has been identified in each of the other three colour spaces by evaluating the cluster histograms (RGB distribution, evaluated both with the three channels split and combined). Only the clusters with the highest overlap percentage have been considered analogous (Figure 6, left), while clusters with a small percentage have been excluded (Figure 6, right). A further step compares the histograms of the four analogous clusters with the ground truth (manually segmented clusters), as illustrated in Figure 7. The HSV histograms show the trend most similar to the corresponding ground truth. In the other colour spaces, more points are included in the same cluster, including parts that are not consistent with the analysed alteration (Figure 7, left). In HSV, by contrast, the detected colour range is wider (Figure 7, right) because, unlike in the other spaces (YCbCr, YIQ, YUV), the luminance component does not outweigh the chrominance components. As a result, segmentation in these colour spaces splits the same colour range, and the corresponding decay pattern, into more than one cluster. The comparison of the number of points and the related percentage of each analogous cluster in the four colour spaces with its ground-truth equivalent also confirmed that HSV better isolates the specific colour ranges associated with the decay patterns. HSV is also more reliable both in terms of colour interval and in terms of extension (number of points; area, based on the average point density; percentage of extension) (Table 2).
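The text does not state how the histogram overlap percentage was computed. One common choice is histogram intersection, sketched below on per-channel RGB histograms as an assumption rather than the study's actual measure. The candidate cluster with the highest overlap is taken as the analogue of the reference (HSV) cluster. All names and the bin count are hypothetical.

```python
def channel_histogram(values, bins=32):
    """Normalised histogram of 8-bit channel values (0-255)."""
    counts = [0] * bins
    for v in values:
        counts[min(v * bins // 256, bins - 1)] += 1
    total = len(values)
    return [c / total for c in counts]

def histogram_intersection(h1, h2):
    """Overlap of two normalised histograms: 1.0 if identical, 0.0 if disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def rgb_overlap(cluster_a, cluster_b, bins=32):
    """Mean intersection over the R, G and B channels (channels evaluated split)."""
    scores = []
    for ch in range(3):
        ha = channel_histogram([c[ch] for c in cluster_a], bins)
        hb = channel_histogram([c[ch] for c in cluster_b], bins)
        scores.append(histogram_intersection(ha, hb))
    return sum(scores) / 3

def best_match(reference, candidates, bins=32):
    """Index of the candidate cluster whose colours overlap most with the reference."""
    scores = [rgb_overlap(reference, c, bins) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__), scores
```

Histogram intersection ignores how far apart non-overlapping bins are, so two clusters with similar but slightly shifted colours can score low. A coarser binning reduces that sensitivity.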
Table 2 shows that, in the five classes (moist area, biological colonization/patina, spots/deposit, unaltered surface, staining), the percentage extension differs by at most 1% between the ground truth and HSV, while in the other colour spaces the difference is larger. Consequently, it was possible to quantify the extension of each decay pattern in the three case studies:
• Case 1: biological colonization (15%); biological patina (14%); unaltered surface (26%); moist area (27%); deposit (17%);
• Case 2: moist area (10%); biological colonization/patina (18%); spots/deposit (10%); unaltered surface (57%); staining (6%);
• Case 3: biological patina (18%); moist area (26%); biological colonization (20%); deposit (19%); staining (13%).
Figure 8 shows an example of the overlay of areas and edges extracted from one HSV cluster (biological patina) onto the original point cloud, verifying the consistency of the distribution of the obtained segmentation.
Table 2. Case 2, point cloud segmentation: ground truth (manually labelled portions); HSV clusters; corresponding clusters in the other colour spaces.
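The extension measures used in the comparison (number of points, area derived from the average point density, and percentage of the whole cloud) and the maximum percentage gap to the ground truth reduce to simple arithmetic. The sketch below assumes a density expressed in points per square metre. The function names and the example counts are hypothetical and are not the values of Table 2.

```python
def extension_stats(cluster_sizes, points_per_m2):
    """For each class: (number of points, area from the mean point density,
    percentage of the whole cloud)."""
    total = sum(cluster_sizes.values())
    return {
        name: (n, n / points_per_m2, 100.0 * n / total)
        for name, n in cluster_sizes.items()
    }

def max_percentage_gap(stats_a, stats_b):
    """Largest absolute difference in extension percentage over the shared classes."""
    return max(abs(stats_a[k][2] - stats_b[k][2]) for k in stats_a if k in stats_b)

# Hypothetical counts, for illustration only.
gt = extension_stats({"moist area": 1000, "staining": 3000}, points_per_m2=2500)
hsv = extension_stats({"moist area": 1040, "staining": 2960}, points_per_m2=2500)
print(gt["moist area"])             # (1000, 0.4, 25.0)
print(max_percentage_gap(gt, hsv))  # 1.0
```

Area from mean density is only an approximation. Where density varies across the surface, summing local per-point areas would be more faithful.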

Geometry-based segmentation
The geometric segmentation has also been applied to the three case studies. The outputs of RANSAC are point clouds of the detected primitives, corresponding to the main architectural elements. Figure 9 shows a graphical representation of the application to one of the architectural volumes, where only planes and cylinders have been searched for. The parameters illustrated in Section 2.2 have been set as follows:
• minimum number of points per primitive = 2000;
• maximum distance to the primitives = 0.02;
• sampling resolution = 0.034;
• maximum normal deviation = 25.00°;
• overlooking probability = 0.01.
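The reported parameter set can be recorded as a small configuration object. The sketch also shows how an overlooking probability translates into a number of trials under the classic RANSAC bound: N = ln(p) / ln(1 - w^m), for inlier ratio w and minimal-set size m (e.g. three points for a plane). Schnabel et al. (2007) use a refined criterion that accounts for localized sampling, so this formula is an illustrative assumption. `RansacParams` and `required_trials` are hypothetical names, and units are left as in the text, which does not state them.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class RansacParams:
    """Parameter values reported for the case study (units not stated in the text)."""
    primitives: tuple = ("plane", "cylinder")
    min_support_points: int = 2000
    max_distance: float = 0.02
    sampling_resolution: float = 0.034
    max_normal_deviation_deg: float = 25.0
    overlooking_probability: float = 0.01

def required_trials(p_overlook, inlier_ratio, sample_size):
    """Classic RANSAC bound: trials needed so that the chance of never drawing
    an all-inlier minimal set stays below p_overlook."""
    p_good = inlier_ratio ** sample_size
    if p_good >= 1.0:
        return 1
    return math.ceil(math.log(p_overlook) / math.log(1.0 - p_good))
```

With the reported overlooking probability of 0.01, a plane supported by half of the points needs about 35 three-point draws, and smaller primitives need many more.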
The geometric segmentation extracted six cylinders (the panels of the cross vault) and two planes (the perimeter vertical walls). For the two detected planes (Figure 10), the chromatic segmentation led to the identification of several forms of alteration. For the left-side wall, the clusters correspond to biological colonization/patina and unaltered surface. For the right-side wall, it was possible to separate the biological colonization/patina from the moist area (recognizable by its altered colour). The detected areas are consistent with those obtained from the colour-based segmentation of the entire point cloud.
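Associating decay patterns with architectural elements amounts to cross-tabulating two per-point labels: the primitive assigned by the geometric step and the colour cluster assigned by the chromatic step. The sketch assumes both labels are available for every point, in the same order. Points not assigned to any primitive could carry a label such as `None`. The function and label names are hypothetical.

```python
from collections import Counter, defaultdict

def decay_by_element(element_labels, decay_labels):
    """Percentage of each decay class within each architectural element.

    element_labels[i]: primitive to which point i was assigned (geometric step)
    decay_labels[i]: colour cluster of point i (chromatic step)
    """
    table = defaultdict(Counter)
    for element, decay in zip(element_labels, decay_labels):
        table[element][decay] += 1
    return {
        element: {decay: 100.0 * n / sum(counts.values()) for decay, n in counts.items()}
        for element, counts in table.items()
    }

# Hypothetical labels for six points.
elements = ["left wall"] * 4 + ["right wall"] * 2
decays = ["patina", "patina", "unaltered", "unaltered", "moist", "patina"]
print(decay_by_element(elements, decays))
```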

Figure 9. Application of RANSAC to the dense point cloud.

CONCLUSIONS
The present research proposes a colorimetric and geometric analysis and segmentation of 3D point clouds, for diagnostic purposes in the Cultural Heritage domain, through the application of point cloud processing and machine learning.
For the colour-based segmentation, different clustering methods have been investigated, and hierarchical clustering has been preferred among them. The HSV colour space is the most consistent with the purposes of colour segmentation, because it proved efficient in accurately identifying a plurality of chromatic decay morphologies, both in terms of extension and colour interval. The application to three selected case studies allowed the proposed methodology to be validated, detecting a series of chromatic alterations on the masonry surface that had previously been recognized through visual inspection of the spaces. The advantage of this approach is the possibility of achieving both a qualitative and a quantitative analysis of different morphologies of surface alteration, starting from 3D data, with semi-automatic procedures, in support of diagnostic activities. In addition, the geometry-based segmentation allows the detected decay patterns to be associated with the architectural elements on which they are located.
A possible limitation of this procedure is the difficulty of distinguishing and isolating chromatic alterations on decorated surfaces, such as frescoes, temperas or wallpapers. In future work, the methodology could be tested on case studies with different characteristics in terms of finishing materials and decorative apparatus.