TESSERAE3D: A BENCHMARK FOR TESSERAE SEMANTIC SEGMENTATION IN 3D POINT CLOUDS

3D point clouds of mosaic tesserae are used by heritage researchers, restorers and archaeologists for digital investigations. Information extraction, pattern analysis and semantic assignment are necessary to complement the geometric information. Automated processes that can speed up these tasks are highly sought after, especially new supervised approaches. However, the limited availability of the labelled data needed to train supervised learning models is a significant constraint. This paper introduces Tesserae3D, a 3D point cloud benchmark dataset for training and evaluating machine learning models applied to mosaic tesserae segmentation. It is a publicly available, very high density, coloured dataset, accompanied by a standard multi-class semantic segmentation baseline. It consists of about 502 million points and contains 11 semantic classes covering a wide range of tesserae types. We propose a semantic segmentation baseline building on radiometric and covariance features fed to ensemble learning methods. The results show an achievable F1-score of 89%. The dataset and results are available at https://github.com/akharroubi/Tesserae3D, which provides a simple interface to improve the score based on feedback from the research community.


INTRODUCTION
Mosaics are decorative artworks formed by individual elements called tesserae. Tesserae are usually cubic or irregular in shape and made of stone, glass, ceramic, metal, or organic material. They are assembled with mortar to form patterns or figures. As most mosaics of interest are very old, often more than a thousand years (like the mosaic of Germigny-des-Prés, which dates from the late 8th or early 9th century), their digitization is the basis for documentation, preservation, and valorisation (Adami et al., 2018; Ajioka and Hori, 2014; Doria and Picchio, 2020; Placa et al., 2020). In architecture, mosaics are used to decorate floors, walls, or ceilings, including non-flat surfaces, which makes specific 3D capture techniques central to their description. Segmentation, on which specialists ground their studies through manual operation, visual interpretation and manual drawing (Benyoussef, 2008), then individualizes each tessera so that key information can be attached to it: mainly its shape and colour, its composition and material, its place of origin, its dating and its state of conservation. There is therefore a strong need to speed up the extraction and interpretation of individual tesserae, a task that is both challenging and fascinating given their uniqueness.
We propose to extend the few existing mosaic segmentation approaches, specifically over 3D point clouds. Existing research on the topic is carried out either on 2D images coupled with machine learning algorithms (Felicetti et al., 2018), or on 3D point clouds but using rule-based deterministic approaches (Florent
To briefly summarize, the main contributions of this work are:
• A point-wise labelled mosaic tesserae point cloud dataset for semantic segmentation.
• A standard baseline for segmentation using random forest and gradient boosting multi-class classifiers.
The rest of the paper is organized as follows: Section 2 reviews existing work on mosaic segmentation. Section 3 describes the objectives and the segmentation baseline. Section 4 gives details about the Tesserae3D dataset and its specificities. Section 5 presents the results, and Section 6 discusses them and outlines future work for this field of research. The conclusion is presented in Section 7.

RELATED WORKS
In cultural and architectural heritage, digitization is an important tool and an effective means of documentation, preservation, interpretation (Adami et al., 2018; Doria and Picchio, 2020), and visualization (Kharroubi et al., 2019, 2020). It becomes necessary when dealing with rare and unique masterpieces such as mosaics. Mosaic digitization aims to produce digital copies of these pieces, called 'digital replicas', which allow specialists to carry out studies, comparisons, and digital investigations in a non-destructive manner. Furthermore, it serves as a record of their state of conservation. Important advances in laser scanning and photogrammetry for reality capture have been made available to specialists for this purpose. However, apart from the acquisition of 3D models, tasks such as segmentation, information extraction, and detection of tesserae remain either manual or automated on 2D images only. Early work on automatic detection of tesserae (Benyoussef, 2008) proposes a tessella-oriented strategy whose first step consists of isolating tesserae from their cement network by computing the watershed transformation of a criterion image generated to exhibit the cement network as watershed crests. A simple k-means algorithm is then used to classify tesserae and segment mosaic images with more accuracy than a pixel-oriented strategy. Additionally, the author proposes a method to automatically obtain the main directional guidelines of mosaics by estimating tesserae orientation. Bartoli et al. (2017) presented a completely unsupervised approach for the same purpose of tesserae detection. It consists of a deformable model that overlaps the mosaic and adapts to the actual shape of each tessera, using genetic algorithms which evolve a fixed-size set of candidate segmentations according to a multi-objective optimization algorithm.
Felicetti et al. (2018) used a U-Net deep learning approach for tesserae segmentation in 2D images. Their method shows high accuracy and better generalization when segmenting the mosaic floor tesserae of the church of S. Stephen in Umm ar-Rasas, an archaeological site in Jordan. In more recent work, the same authors proposed, for the same study site, the Mo.Se (Mosaic Segmentation) algorithm. It uses a deep cascading learning network for automatic segmentation, coupled with image processing techniques (Hierarchical Watershed Algorithm) for refinement, to obtain a vector representation of the mosaic. Their goal was to manage the results in a geodatabase for understanding the evolution of the iconographic repertoire (Felicetti et al., 2021). Fenu et al. (2020) also proposed a deep learning technique based on U-Net, a convolutional neural network that has proved effective in segmentation tasks. It processes the mosaic image at the pixel level so that each segmented region precisely matches a tessera of the mosaic.
Recent research trends in learning-based methods have seen a shift from 2D (image-based) to 3D (point cloud) semantic segmentation, with architectures such as PointNet (Qi et al., 2017), KPConv (Thomas et al., 2019), and RandLA-Net (Hu et al., 2020). Other works on mosaic segmentation using 3D point clouds have tried to get around the lack of training data by adapting an unsupervised clustering process for segmentation, and a reasoning ontology for the classification of tesserae using user-specified rules (F. Poux et al., 2017a, 2017b). However, for mosaic segmentation using supervised approaches, there is, to our knowledge, no existing open access point cloud dataset. Such a dataset is one of the key elements needed to build performant, generalizable, and at-scale machine learning models.

OBJECTIVES
Given a set of 3D points describing the geometry and colour of tesserae, we want to infer one individual class label per point. We provide a baseline method that is meant to represent a typical, robust approach to the task (Figure 2). We follow a paradigm of covariance feature extraction at multiple scales (3.1), then a feature selection step (3.2), followed by multi-class classification with Random Forest (RF) and Gradient Boosting (GB) classifiers (3.3).

Feature extraction
Classification is based on the abstraction of data into coherent indicators and feature descriptors that can describe the essential information, both geometric and radiometric, to enable accurate processing and analysis. This challenge remains highly contextual because it aims to detect relevant objects in a specific context (small tesserae separated by very similar mortar joints). So, it is necessary to understand which descriptors should be used to recognize an object composed of several points with attributes in a scene.
We followed a feature extraction process based on the work of Weinmann et al. (2015), which proposes a general workflow to classify 3D scenes based on geometric features. After recovering different local neighbourhoods (e.g. a spherical neighbourhood NS, 1mm, where the sphere is centred at point Xi and has a radius of 1 mm), we extracted geometric features by considering the spatial arrangement of neighbouring points. Since the joints between the tesserae vary by 1 mm or more, we chose three small radii (1 mm, 2 mm and 3 mm) in order not to lose the structure of the neighbourhoods and to obtain geometric features representative of the context.
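The multi-scale extraction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `covariance_features` is hypothetical, the neighbourhood search is brute force (a spatial index would be needed for real data), the eigenvalue-based definitions of linearity, planarity and omnivariance are the commonly used ones, and surface density is assumed here to be the neighbour count divided by the disc area πr².

```python
import numpy as np

def covariance_features(points, radius):
    """Per point: linearity, planarity, omnivariance, surface density.

    Hypothetical helper. Neighbourhoods are spheres of the given radius.
    """
    points = np.asarray(points, dtype=float)
    features = np.zeros((len(points), 4))
    disc_area = np.pi * radius ** 2  # assumed definition of surface density
    for i, centre in enumerate(points):
        distances = np.linalg.norm(points - centre, axis=1)
        neighbours = points[distances <= radius]
        features[i, 3] = len(neighbours) / disc_area
        if len(neighbours) < 3:
            continue  # too few points for a meaningful covariance matrix
        # eigvalsh returns eigenvalues in ascending order: unpack as l3 <= l2 <= l1
        eigenvalues = np.linalg.eigvalsh(np.cov(neighbours.T))
        l3, l2, l1 = np.clip(eigenvalues, 0.0, None)
        if l1 == 0.0:
            continue
        features[i, 0] = (l1 - l2) / l1          # linearity
        features[i, 1] = (l2 - l3) / l1          # planarity
        features[i, 2] = np.cbrt(l1 * l2 * l3)   # omnivariance
    return features

# Toy planar patch with 1 mm spacing (coordinates in mm), three radii as in the text.
grid = np.array([[x, y, 0.0] for x in range(5) for y in range(5)], dtype=float)
multi_scale = np.hstack([covariance_features(grid, r) for r in (1.0, 2.0, 3.0)])
```

On this flat toy patch the central point is perfectly planar at every radius, which is the behaviour one would expect for points lying on a tessera face rather than on a joint.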

Feature selection using random forest
The objective of feature selection is to reduce potentially redundant features and noise, speed up the calculations, and identify a small set of discriminative features that can still achieve good predictive performance. We use the embedded selection method (tree-based strategy) provided by the RF classifier (Pedregosa et al., 2011). It calculates feature importance from the node impurity decreases in each decision tree, and the final importance of a feature is its average over all decision trees. We then select the most important features, at a specific radius, using the training set. The selected features are presented in Figure 3 and categorized as:
• Radiometric features: red, green, blue, and the average of the three.
• Covariance features: to describe the geometric distribution of the points and highlight the discontinuities between the tesserae, we chose surface density, linearity, planarity, and omnivariance.
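As a hedged illustration of embedded, tree-based selection, the sketch below ranks features by the impurity-based importances exposed by scikit-learn's `RandomForestClassifier`. The data are synthetic, the feature names are hypothetical placeholders, and keeping the top two features is an arbitrary choice made for the example only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 600
labels = rng.integers(0, 3, size=n)

# Synthetic data: two informative features and two pure-noise features.
X = np.column_stack([
    labels + 0.1 * rng.normal(size=n),
    (labels == 2) + 0.1 * rng.normal(size=n),
    rng.normal(size=n),
    rng.normal(size=n),
])
feature_names = ["red", "surface_density", "noise_a", "noise_b"]  # hypothetical

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)

# Impurity-based importances, averaged over all trees and normalised to sum to 1.
importances = forest.feature_importances_
ranking = sorted(zip(feature_names, importances), key=lambda p: p[1], reverse=True)
selected = [name for name, _ in ranking[:2]]
```

In practice the cut-off would be chosen by inspecting the ranked importances, for instance from a plot such as the one referred to as Figure 3.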

Explicit feature classifiers
With standard supervised machine learning approaches (Random Forest, Gradient Boosting, Support Vector Machine, or Bayesian Discriminant Analysis classifiers), the choice of features depends primarily on the operator, whereas deep learning methods can learn features on their own as part of training on a very large set of data. This ability to learn is considered one of the main causes of the rapid progress in benchmark classification results, with architectures such as PointNet (Qi et al., 2017), its PointNet++ improvements, and KPConv (Thomas et al., 2019). However, deep learning relies on neural networks with many hidden layers and requires powerful computing resources and a large amount of annotated data. The availability of data may therefore enable or limit the application of deep learning approaches. In our case study, we know the relevant characteristics and we seek explainability. We therefore chose random forest and gradient boosting models (Pedregosa et al., 2011), which in general provide good predictive performance, low overfitting, and easy interpretability with a reasonably sized annotated dataset (Becker et al., 2018) (in our case, 30% of the dataset is annotated). The random forest classifier uses bagging as an ensemble method and a decision tree as an individual model. The set of trees is referred to as a forest. Bagging means drawing samples from the training subset and training one tree, acting as an individual classifier, on each draw. The Gini index serves as the impurity criterion when growing each tree, the samples left out of each draw allow an internal estimate of the performance of RF, and the final prediction is obtained by voting among the trees. A comparison of several classifiers relying on different learning principles reveals that a Random Forest classifier provides a good trade-off between classification accuracy and computational efficiency (Weinmann et al., 2015).
Gradient boosting is another ensemble method, based on boosting rather than bagging. It relies on the intuition that the best possible next model, when combined with the previous models, minimizes the overall prediction error. While the training stage is parallel for bagging (i.e. each model is built independently), boosting builds each new learner sequentially.
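The sequential nature of boosting can be illustrated with a deliberately simplified sketch: a one-dimensional regression with squared loss, where each round fits a single-threshold stump to the residuals of the current ensemble. This is not the multi-class classifier used for the baseline; the helper names `fit_stump` and `boost` and the toy signal are assumptions made for the example.

```python
import numpy as np

def fit_stump(x, residual):
    """Least-squares single-threshold regressor on a 1D feature."""
    best = None
    for threshold in np.unique(x)[:-1]:
        left, right = residual[x <= threshold], residual[x > threshold]
        error = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or error < best[0]:
            best = (error, threshold, left.mean(), right.mean())
    _, t, left_value, right_value = best
    return lambda q: np.where(q <= t, left_value, right_value)

def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Each new stump is fitted to what the previous ones still get wrong."""
    prediction = np.full(len(y), y.mean())
    errors = []
    for _ in range(n_rounds):
        stump = fit_stump(x, y - prediction)          # depends on earlier rounds
        prediction = prediction + learning_rate * stump(x)
        errors.append(float(((y - prediction) ** 2).mean()))
    return prediction, errors

x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x)
prediction, errors = boost(x, y)
```

Because every round needs the residuals left by the previous rounds, the loop cannot be parallelised across learners, unlike the independent trees of a bagged forest.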

Case study
The village of Germigny-des-Prés (Loiret, France) holds one of the oldest churches in France, dating from the late 8th or early 9th century (Caillet, 2016; Sapin, 2019). Theodulf, close counsellor to Emperor Charlemagne, bishop of Orléans and abbot of Fleury, owned a villa there that he transformed and where he built an oratory that became the current church of the village (Freeman and Meyvaert, 2001) (Figure 4).
In the eastern apse, mosaics were discovered under plaster in the 19th century. One of the vaults, representing the Ark of the Covenant crossing the river Jordan, was restored. Two cherubs stand on the ark. They are surrounded by two angels with crossed wings. Between these, a hand comes down from heaven. It is interpreted as the hand of God, as illustrated in Figure 5. The mosaics of the arches under the vault are smaller; they depict floral motifs and were not restored (Croutelle, 2019).
Besides their rich iconographic and cultural content, these mosaics are a unique opportunity to study this form of art as well as the material of the small cubes that form them. Indeed, most of the tesserae are made of glass, which is not often preserved in an archaeological context and remains rare for this period. Yet this material is of prime interest for historical sciences because, during Late Antiquity and the early Middle Ages, it was made in the eastern Mediterranean, and it can give clues about the exchanges and long-distance contacts that existed. The mosaic of Germigny-des-Prés is composed of tesserae of different materials, sizes and periods. Some of the tesserae were immediately recognized as a production of the 19th century, such as the golden squares or other very regular, mechanically cut cubes. Some tesserae were also made of glazed ceramic, which did not exist in the Carolingian period (Van Wersch et al., 2019).

Dataset description
Tesserae3D is a mosaic tesserae 3D point cloud dataset acquired by a calibrated phase-based terrestrial laser scanner and merged with dense image matching. We used scan data from a Leica P30 scanner, acquired from a single, sufficiently high position using an extended mounted tribrach. The scan point cloud alone was composed of 87.5 million points. Data fusion, which "combines data from multiple sources to improve the potential values and interpretation performances of the source data, and to produce a high-quality visible representation of the data" (Zhang, 2010), proved essential in this case to increase the density of the point cloud. Pictures were taken with a Canon EOS 5D Mark III camera with a full-frame sensor and a 35 mm lens. To implement the Multi-View Stereo method, 2,058 shots were captured at 5760 × 3840 pixels in RAW format. They were used to obtain a denser 3D point cloud of all tesserae. This dataset covers approximately 9.3 m² (F. Poux et al., 2017a) and consists of approximately 502 million points (the result of fusing the scan with the dense 3D reconstruction from images), which represents an average density of 54 million pts/m², as illustrated in Figure 6. The dataset is divided into 10 sections, and each section is saved separately in .txt and .laz files. These are available at https://github.com/akharroubi/Tesserae3D for visualization and download. Figure 7 gives an overview of the approximate boundary of each section.
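The density figure can be checked with a few lines of arithmetic. The point counts and area come from the text; the mean point spacing assumes an idealised regular square grid, and the neighbour count assumes a uniform density, so both are only rough orders of magnitude.

```python
import math

total_points = 502e6   # fused scan + image-matching points (from the text)
scan_points = 87.5e6   # laser scan only (from the text)
area_m2 = 9.3          # approximate mosaic area (from the text)

density_per_m2 = total_points / area_m2
density_per_mm2 = density_per_m2 / 1e6

# Idealised assumptions: regular square grid, uniform density.
mean_spacing_mm = math.sqrt(1.0 / density_per_mm2)
points_in_1mm_disc = math.pi * 1.0 ** 2 * density_per_mm2
scan_share = scan_points / total_points
```

Under these assumptions the average spacing is a little over 0.1 mm, so a neighbourhood of 1 mm radius would hold well over a hundred points, and the laser scan alone accounts for less than a fifth of the fused cloud.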

Data annotation
Ground truth annotation is performed at the point level: each point is manually assigned the corresponding class of tesserae, so that the annotation is not biased by the use of any algorithm. An unclassified label is used so that the evaluation focuses only on the portion of the point cloud that has been classified, which so far represents 30% of the total mosaic. The manual annotation is performed using the segment tool of the open-source software CloudCompare (CloudCompare, v2.11, 2020). After manual segmentation, a class index is associated with every category of the point cloud.
Although this annotation is done manually, we have taken care that it is as faithful to reality as possible. Quality control of the annotation was based on verification by an expert in archaeology and heritage. It nevertheless remains subject to human error, which is considered minimal after the quality control. This time-consuming operation takes on average 25 seconds per tessera, because of difficulties related to:
• Missing tesserae parts (metal cover).
• The shape of deteriorated and altered tesserae.
• The colour of tesserae: due to the materials used and their age, the ancient tesserae are generally characterized by pastel colours with low contrast.

Classes description
The main types of tesserae present on the mosaic were inventoried based on the expertise of the archaeologist and on a visual sweep of the whole mosaic. The types were identified from the following properties: the colour, the type of material (glass, ceramic or stone), the shape and the size.
First, 18 classes of tesserae (Figure 8) were identified based on these criteria. On site, with a light source, almost all tesserae can be distinguished visually, mainly because each type and its location on the mosaic are known. On the point cloud, however, it is more difficult to distinguish between tesserae that have a similar hue (e.g. light-blue, white, beige and silver tesserae). We therefore adopted 10 main classes, each grouping tesserae with similar properties, as follows (for each class, we present the tesserae which belong to it). The unclassified class includes all parts that are not yet segmented but will be included as the manual segmentation progresses.
To give a comprehensive view of the distribution of each type of tesserae in the present dataset, Figure 9 summarizes the number of points in each class (this concerns the data segmented so far, so the unclassified class contains by default the rest of the points).
Figure 9. Complete dataset: label distribution (in thousands of points; same colour code as Figure 1).
The numbers in Figure 9 will change and be updated. Because manual segmentation is time-consuming and its quality control is meticulous, the segmentation of the remaining sections is not yet complete; it will be published soon and updated at the same data link.
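The label bookkeeping described in this section can be sketched in a few lines. The label list is a toy example, the function name `label_distribution` is hypothetical, and the use of index 0 for the unclassified class is an assumption, not something the dataset documentation is quoted as stating.

```python
from collections import Counter

UNCLASSIFIED = 0  # assumed index of the unclassified label

def label_distribution(labels):
    """Count points per label and the share of points that are annotated."""
    counts = Counter(labels)
    annotated = sum(n for label, n in counts.items() if label != UNCLASSIFIED)
    share = annotated / len(labels) if labels else 0.0
    return counts, share

labels = [0, 0, 0, 0, 0, 0, 0, 1, 1, 5]  # toy per-point labels
counts, annotated_share = label_distribution(labels)

# Only annotated points take part in training and evaluation.
evaluated = [label for label in labels if label != UNCLASSIFIED]
```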

BASELINE RESULTS
To create a simple baseline, we used 3D covariance-based features. The feature vectors of the training set (80% of the dataset) were fed into the classifiers (RF and GB) to obtain a trained model for each. For both RF and GB, we tested [50, 100, 150, 200] trees and finally chose 200 trees, with a learning rate of 0.1 for GB, which gave the highest F1-score. The features of the test set (20% of the dataset) were then passed to the trained models, which output a predicted label for each point. Finally, the classification results are evaluated using the performance metrics in Table 2, and qualitative prediction results on unseen data are presented in Figure 11.
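The protocol above maps onto scikit-learn roughly as follows. This is a sketch on synthetic features, not the published code: the stratified split, the random seeds and the toy data are assumptions, while the 80/20 split, the tree counts and the learning rate of 0.1 come from the text.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
y = rng.integers(1, 4, size=900)  # three synthetic classes
X = np.column_stack([y + 0.3 * rng.normal(size=900) for _ in range(4)])

# 80% training, 20% test (stratification is an assumption of this sketch).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

scores = {}
for n_trees in (50, 100, 150, 200):
    models = {
        "RF": RandomForestClassifier(n_estimators=n_trees, random_state=0),
        "GB": GradientBoostingClassifier(
            n_estimators=n_trees, learning_rate=0.1, random_state=0),
    }
    for name, model in models.items():
        model.fit(X_train, y_train)
        predicted = model.predict(X_test)
        scores[(name, n_trees)] = f1_score(y_test, predicted, average="weighted")

best = max(scores, key=scores.get)  # (classifier name, number of trees)
```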

Evaluation metrics and quantitative results
Semantic segmentation performance was assessed with fundamental performance metrics: precision, recall (1), F1-score and the Jaccard index (2). The performance metrics and the training time are summarized in Table 1. We used weighted averaging: metrics are computed for each class and then averaged, weighted by support (the number of true instances of each class). Compared with an unweighted ("macro") average, this accounts for class imbalance.
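For reference, the support-weighted metrics can be computed directly from a confusion matrix, using the standard per-class definitions precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = 2·precision·recall/(precision+recall) and Jaccard = TP/(TP+FP+FN). The sketch below is a plain-Python illustration with a hypothetical function name and a toy two-class matrix.

```python
def weighted_metrics(confusion):
    """confusion[i][j] = number of points of true class i predicted as class j."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    names = ("precision", "recall", "f1", "jaccard")
    sums = dict.fromkeys(names, 0.0)
    for c in range(n_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(n_classes)) - tp
        support = tp + fn  # number of true instances of class c
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / support if support else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        jaccard = tp / (tp + fp + fn) if tp + fp + fn else 0.0
        for name, value in zip(names, (precision, recall, f1, jaccard)):
            sums[name] += support * value
    return {name: value / total for name, value in sums.items()}

metrics = weighted_metrics([[8, 2], [1, 9]])
```

One consequence worth keeping in mind is that support-weighted recall always equals overall accuracy, so it cannot reveal a poorly recognised small class on its own.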
We emphasize that the results in this study mainly serve as a baseline illustration, and better results could potentially be achieved with further tuning. The experiments were conducted on a workstation running Microsoft Windows 10 (×64) with an Intel® Core™ i7-8700 (6 cores) and 32 GB of Random Access Memory (RAM), using Scikit-learn (Python library) (Pedregosa et al., 2011).
With unbalanced classes (which is our case), it is easy to obtain high performance metrics without making useful predictions (especially for small classes). Precision or F1-score alone is therefore poorly suited as an evaluation metric when the class labels are not uniformly distributed. Thus, in the case of unbalanced classes, we recommend the confusion matrix as good practice for summarizing the performance of a classification algorithm. In Table 2, each column represents the number of points predicted to belong to a class, while each row represents the true number of points that belong to the class (Pedregosa et al., 2011). For each intersection, we provide two normalized values (in percent), the first for RF and the second for GB.
Variable importance assessment is another interesting aspect to study when applying data mining techniques. Even if the predictors are chosen to characterize the data without redundancy, a caveat remains: some of them can carry similar, likely collinear, information. To see which variables have the greatest importance, we compute the reduction in predictive accuracy when the studied variable is not used. In RF, the Gini importance index is defined as the Gini decrease in node impurity averaged over all trees in the forest. The most important predictors were the RGB channels and surface density, as illustrated in Figure 10. This shows that a 3D geometric feature (surface density) contributes alongside the 2D radiometric features (RGB).
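The class-imbalance caveat and the row-wise reading of the confusion matrix can be illustrated with a toy case in which a classifier assigns every point to the majority class. The counts and the helper name `row_normalise` are invented for the example.

```python
def row_normalise(confusion):
    """Convert counts to percentages of each true class (each row sums to 100)."""
    normalised = []
    for row in confusion:
        total = sum(row)
        normalised.append([100.0 * v / total if total else 0.0 for v in row])
    return normalised

# Rows: true class, columns: predicted class.
# 990 majority-class points, 10 minority-class points, all predicted as majority.
confusion = [[990, 0], [10, 0]]

accuracy = sum(confusion[i][i] for i in range(2)) / sum(map(sum, confusion))
percent = row_normalise(confusion)
minority_recall = percent[1][1] / 100.0
```

The overall accuracy is 99% although not a single minority point is recognised, which only becomes visible in the second row of the normalised matrix.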

DISCUSSIONS
Benchmarks provided in open access have always been a major contribution to the scientific community, making it possible to compare models, advance machine learning methods, and discern their limitations in order to address them. With the dataset presented in this paper, we aim to help bridge the large gap in point cloud training data for complex and specific cases. The goal is to reveal the full potential of machine learning models in the semantic segmentation of decorative art, and especially mosaics.
Our method has shown good performance in terms of effectiveness and efficiency. However, the classification quality was evaluated with point-wise metrics, which reveal limitations of our approach, such as inhomogeneous classification results.
In these experiments, gradient boosting shows better performance than random forest but takes more time to train. Regarding feature importance, geometric features clearly carry less weight than radiometric features. This can be explained by the fact that the mosaic is a flat and continuous surface that does not present large variability. The exception is the density feature: some types of tesserae, because of the reflectance of their component material, are sampled more densely than others.
To overcome these limitations, several directions will be studied in future work:
• Create a tessera-based segmentation: segment-based metrics are more descriptive, as they can provide an overall picture of the segmentation (giving a qualitative ratio of each class), considering the number of objects and not points.
• Make available an instance segmentation dataset where each tessera is individually separated, with tessera-based evaluation metrics.
• Integrate the results of mosaic segmentation in Augmented and Virtual Reality applications (Kharroubi et al., 2019, 2020), to allow immersive visualization and interaction with the different types of tesserae by specialists.

CONCLUSION
In this paper, we presented Tesserae3D, a new 3D point cloud dataset for semantic segmentation of mosaic tesserae. Furthermore, we investigated two standard machine learning algorithms. Geometric features of the point clouds computed at multiple scales and radiometric features were used for classification. Results show that gradient boosting (F1-score = 0.893, Jaccard index = 0.785) outperforms random forest (F1-score = 0.861, Jaccard index = 0.760). The dataset and the code of the classification algorithms are available at https://github.com/akharroubi/Tesserae3D.