EXPLORING LABEL INITIALIZATION FOR WEAKLY SUPERVISED ALS POINT CLOUD SEMANTIC SEGMENTATION

Abstract: Although a number of emerging point-cloud semantic segmentation methods achieve state-of-the-art results, acquiring fully interpreted training data is a time-consuming and labor-intensive task. To reduce the burden of data annotation, semi- and weakly supervised methods have been proposed to address situations with limited supervisory sources, achieving competitive results compared to full supervision schemes. However, given a fixed budget, the effective annotation of a few points is typically ignored; we refer to this problem as weak-label initialization in this study. In practice, random selection is typically adopted by default. Because weakly supervised methods largely rely on the semantic information supplied by the initial weak labels, this study explores the influence of different weak-label initialization strategies. In addition to random initialization, we propose a feature-constrained framework to guide the selection of initial weak labels. A feature space of the point clouds is first constructed by feature extraction and embedding. Then, we develop a density-biased strategy that annotates points in highly dense clustered regions, as significant information distinguishing semantic classes is often concentrated in such areas. Our method outperforms random initialization on the ISPRS Vaihingen 3D data when using only sparse weak labels, achieving an overall accuracy of 78.06% with 1‰ of the labels. However, only a minor increase is observed on the LASDU dataset. Additionally, the results show that initialization with category-wise uniformly distributed weak labels is more effective when incorporated into a weakly supervised method.


INTRODUCTION
Airborne laser scanning (ALS) data depict the 3D structures of large-scale outdoor scenes and are used in a variety of remote sensing applications. To comprehensively interpret ALS data, an indispensable step is to acquire category information, which is referred to as semantic segmentation or classification. Recently, an increasing number of deep learning methods have been developed for point-cloud semantic segmentation tasks, achieving state-of-the-art results (Thomas et al., 2019; Hu et al., 2020). However, most of these rely on a large number of precise annotations, which are typically associated with heavy workloads. In point-cloud labeling, the occlusions caused by the scan pattern of ALS systems and the discrete structure of point clouds in 3D space further increase the difficulty of visual interpretation. Additionally, with advances in light detection and ranging (LiDAR) technology, massive point clouds can be easily acquired from diverse platforms. Thus, finely annotating newly acquired point-cloud data is impractical.
To reduce labeling effort and alleviate data-hungry issues, an intuitive solution is to annotate a small part of the entire dataset. Semi- and weakly supervised methods have been proposed to address situations in which the number of labels is low. By exploiting information beyond the original labels, these methods achieve competitive results compared with full supervision schemes. Several such studies have recently emerged in point-cloud semantic segmentation (Xu and Lee, 2020; Hu et al., 2021). An important finding from these studies is that a huge redundancy exists in the annotation information of fully labeled data. For example, only a minor accuracy degradation is observed when comparing classification results using 10% of sparsely distributed labels with those using full labels. This means annotating every point during labeling work is unnecessary, which also indicates the importance of weakly supervised learning. Although these methods achieve competitive results using limited labels, the issue of weak-label initialization is often ignored. In some studies, active learning was applied to guide the selection of weak labels (Polewski et al., 2016; Lin et al., 2020). However, the interaction process adds to the workload. To choose weak labels directly before classification, a commonly used strategy is to randomly select a fixed number or ratio of points to label. However, the randomness of the weak labels leads to unstable classification results. Additionally, because of the extremely imbalanced category distribution of ALS data, some categories may receive no samples under a fairly sparse weak-label configuration. Another common strategy is to sample points evenly for each class, which ensures sufficient samples across categories. Both strategies initialize weak labels using a random generator, which lacks guidance in selecting representative labels.
This study explores an effective weak-label initialization strategy. The feature space of the point cloud is constructed to guide weak-label selection. Because supervisory sources are absent for unlabeled points, handcrafted features are extracted to form a point-cloud description. Subsequently, a manifold learning approach is incorporated to eliminate feature correlation and construct a more effective feature space. A density-biased selection strategy is proposed to sample more points in the highly dense regions of the feature space, where data are often considered to contain useful information. We apply a clustering algorithm in advance and sample points from each cluster to avoid overly concentrated samples. After acquiring the weak labels, classifiers are adopted to evaluate the effectiveness of the different weak-label initialization strategies. Moreover, we conducted experiments using a weakly supervised method for analysis.

Point cloud feature description
The feature description depicts discriminative information for each point. Local descriptors, such as FPFH (Rusu et al., 2009) and SHOT (Tombari et al., 2010), have been successfully used in applications such as registration and object classification. However, the results are unsatisfactory when adopting local descriptors in semantic segmentation tasks, as contextual information is essential for achieving accurate results. By contrast, geometric features, which represent the 3D surface characteristics of each point, are popular in point-cloud classification. A comprehensive study (Weinmann et al., 2015) summarized a set of point-cloud geometric features. Self-supervised learning (SSL) has recently become a popular research topic in computer vision. By creating self-supervisory sources, such as data augmentation (Huang et al., 2021) and data completion (Wang et al., 2021), discriminative information is extracted for each sample using the trained model. Nonetheless, incorporating SSL into complex ALS data remains a challenge. Thus, this study uses handcrafted features.
Feature embedding is often proposed for effectively combining extracted features. It corresponds to dimension reduction, which projects the original features onto a lower-dimensional space. A classical method is principal component analysis (PCA), which produces new uncorrelated variables that successively maximize the variance. However, fine structures cannot be well preserved by this linear transformation. In contrast, manifold learning can preserve the local structure after feature projection. t-SNE (van der Maaten and Hinton, 2008) and uniform manifold approximation and projection (UMAP) (McInnes et al., 2018) are two representative manifold learning methods. In point-cloud processing, multi-scale local feature extraction with manifold embedding has been used to classify ALS data.

Weakly supervised learning
Weakly supervised point-cloud semantic segmentation has gradually attracted the attention of researchers. Under limited labels, studies have exploited the potential information in unlabeled data. Xu and Lee (2020) proposed several strategies, including a Siamese branch, inexact supervision, and a smooth branch, and obtained results approximating fully supervised learning using 10% of the labels. However, the weak labels used were spatial aggregations of downsampled full-scene labels, signifying a high labeling workload. Liu et al. (2021) constructed supervoxels of point clouds to create pseudo-labels and applied an iterative training mode to improve the classification results under weak supervision. In Hu et al. (2021), a semantic query network was proposed to share sparse weak-label information in the spatial domain by interpolating features from neighboring points. Wang and Yao (2021b) proposed a pseudo-label-assisted approach for point-cloud semantic segmentation using limited annotations. This was enhanced in Wang and Yao (2021a), in which a plug-and-play weakly supervised framework was introduced, comprising entropy regularization, an ensemble prediction constraint, and online pseudo-labeling. Because of its flexibility and the competitive results achieved using only 1‰ of the labels, this framework was used in this study to explore the effectiveness of weak-label initialization strategies.

METHODOLOGY
This study explores effective weak labels that carry the most discriminative information. Given input points P ∈ R^(N×D) comprising N points with D-dimensional features, M (M ≪ N) points are assigned labels. Several weak-label initialization strategies were compared in this study. Random selection is the most practical method: points are selected either directly from the entire dataset or evenly from each category. A density-biased selection based on the feature space is proposed to provide effective information for weak-label initialization. Handcrafted features of each point are extracted and projected onto a feature space using manifold learning. The points are then divided into clusters, and weak labels are sampled from each cluster. Subsequently, a weakly supervised method is integrated to evaluate the classification results using the different weak-label initialization strategies.

Random initialization of weak labels
An intuitive strategy for initializing weak labels is to randomly select a certain number of points for annotation. In this study, we analyze two random initialization strategies: directly selecting points from the entire dataset, and assigning labels to the same number of points for each category. From a practical perspective, the former strategy is more convenient because the points to be labeled can be determined in advance using a random function, whereas maintaining an even category distribution requires the operators to select points during the labeling process. However, the category-based strategy ensures an adequate number of points for each class under a fixed label budget, which avoids producing imbalanced weak labels. Given an equal total number of labeled points, the workload of the two strategies is almost the same.
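The two random strategies can be sketched as follows (a minimal NumPy illustration; the function names `frs`/`crs` and the toy label array are ours, not from the paper):

```python
import numpy as np

def frs(n_points, m, rng):
    """Full random selection: sample m indices uniformly from the whole set."""
    return rng.choice(n_points, size=m, replace=False)

def crs(labels, m, rng):
    """Class random selection: sample roughly m/K points from each category."""
    classes = np.unique(labels)
    per_class = max(1, m // len(classes))
    picked = []
    for c in classes:
        idx = np.flatnonzero(labels == c)
        picked.append(rng.choice(idx, size=min(per_class, len(idx)), replace=False))
    return np.concatenate(picked)

rng = np.random.default_rng(0)
labels = rng.integers(0, 5, size=10_000)   # toy category assignment, 5 classes
weak_frs = frs(len(labels), 10, rng)       # 1 permil budget: 10 points overall
weak_crs = crs(labels, 10, rng)            # here: two points per class
```

As the text notes, `frs` can miss rare classes entirely under a sparse budget, while `crs` guarantees coverage of every category at the cost of an unrepresentative class distribution.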

Feature constrained weak-label initialization
Uncertainty in the random initialization strategy leads to diverse weak-label sets, which produce unstable classification results. Thus, a targeted approach is essential for selecting more representative weak labels and achieving higher accuracy with the same number of labels. Fig. 1 illustrates the framework of the proposed strategy. This study proposes unsupervised feature embedding to explore the relationships between points and guide label initialization. Several handcrafted features are used to extract discriminative point-cloud information. We then incorporate manifold learning to analyze the correlations between the features and construct a point-cloud feature space. Because significant information often corresponds to highly dense regions in this space, a density-biased strategy is proposed for the selection of initial weak labels.
Unsupervised feature extraction
Feature description plays a crucial role in point-cloud processing by supplying discriminative information for a variety of applications. Coordinate information is commonly used to explore point-cloud characteristics, and a set of geometric features is developed to construct point-cloud descriptors. Additionally, inherent physical attributes are directly combined with the geometric features.
This study focused on extracting single-point features, and a local neighborhood region was constructed for each point.

Figure 1. Framework of the feature-constrained weak-label initialization. Handcrafted features are developed to extract point-wise discriminative information. Subsequently, a manifold learning method combines the extracted features by projecting them onto a more informative feature space. To ensure that the entire feature space is cross-sampled, the data are divided into clusters, from which weak labels are acquired using a density-biased selection strategy.
Eigenvalue-based features were used to represent the surface structure of a local region. Given a local neighbor set P_l ∈ R^(k×3) of point p_i, the corresponding normalized eigenvalues e_λ with λ ∈ {1, 2, 3} are calculated from the 3D covariance matrix of the coordinates {X, Y, Z}. Subsequently, several 3D properties are derived from the e_λ. We used the following properties: omnivariance O_i, eigenentropy E_i, anisotropy A_i, linearity L_i, planarity P_i, scattering S_i, sum of eigenvalues Σ_i, and local curvature LC_i.
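As an illustration, these properties can be computed from a neighborhood's covariance matrix as follows (a minimal NumPy sketch; the function name is ours and the formulas follow the common definitions of Weinmann et al., 2015, not code from the paper):

```python
import numpy as np

def eigen_features(neighbors):
    """Eigenvalue-based 3D shape features for one local neighborhood (k x 3)."""
    cov = np.cov(neighbors.T)                     # 3x3 covariance of {X, Y, Z}
    ev = np.sort(np.linalg.eigvalsh(cov))[::-1]   # lambda1 >= lambda2 >= lambda3
    ev = np.clip(ev, 1e-12, None)                 # guard against zero eigenvalues
    e = ev / ev.sum()                             # normalized eigenvalues e1..e3
    return {
        "omnivariance": np.cbrt(e.prod()),
        "eigenentropy": -(e * np.log(e)).sum(),
        "anisotropy":  (e[0] - e[2]) / e[0],
        "linearity":   (e[0] - e[1]) / e[0],
        "planarity":   (e[1] - e[2]) / e[0],
        "scattering":  e[2] / e[0],
        "sum":         ev.sum(),
        "curvature":   e[2],   # lambda3 / (lambda1+lambda2+lambda3), e is normalized
    }
```

For a planar neighborhood (e.g. a roof patch), planarity approaches 1 while linearity and scattering stay near 0; a linear structure such as a power line gives the opposite pattern.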
Height-based features are essential in ALS point-cloud feature description. For each point p_i, a local cylindrical area is constructed to explore the height features. In this area, the height differences between p_i and the highest point, between p_i and the lowest point, and between the highest and lowest points are extracted and denoted as Z_max−i, Z_i−min, and Z_max−min, respectively. Multi-scale neighborhoods are used to supply sufficient height information in complex outdoor scenes.
Apart from the two types of geometric features, the density D_i and verticality V_i are included. Density refers to the number of points within a local area, whereas verticality relates to the z-component N_z of the normal vector. Additionally, inherent physical attributes of the point cloud, such as reflectance R_i and color C_i, are integrated with the above features.
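The height-based and verticality features can be sketched as follows (a NumPy illustration with our own naming; the verticality form 1 − |N_z| is one common convention and an assumption here, since the paper only states that verticality relates to N_z):

```python
import numpy as np

def height_features(points, i, radius=2.0):
    """Height differences inside a vertical cylinder centred on point i:
    Z_max-i, Z_i-min, and Z_max-min, as defined in the text."""
    d_xy = np.linalg.norm(points[:, :2] - points[i, :2], axis=1)
    z = points[d_xy <= radius, 2]      # heights of points inside the cylinder
    return z.max() - points[i, 2], points[i, 2] - z.min(), z.max() - z.min()

def verticality(normal):
    """1 - |n_z|: 0 for a horizontal surface, 1 for a vertical one."""
    return 1.0 - abs(normal[2] / np.linalg.norm(normal))
```

Running `height_features` at radii of 2, 4, and 6 m yields the multi-scale cylinder features used for the ALS scenes.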
Feature embedding
Fuzzy topological approximations and local fuzzy simplicial set representations are used to construct topological representations of high-dimensional data in UMAP. Given a set of data x ∈ R^(N×d) with d-dimensional features, UMAP projects x onto a new feature space y ∈ R^(N×d′) (d′ < d), which can maintain the local structure and, arguably, preserve more of the global structure. UMAP first builds a local fuzzy simplicial set fs-set[x] from the original feature set x based on k-nearest neighbors. It then constructs the relevant weighted graph A and corresponding degree matrix D, and a spectral embedding y is initialized from the sorted eigenvectors of the Laplacian matrix L = D^(−1/2)(D − A)D^(−1/2). UMAP optimizes y with respect to the fs-set[x] cross entropy using stochastic gradient descent, and the optimized embedding y is the projected result. The target dimension was set to 10.
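The spectral initialization step can be illustrated with a plain k-nearest-neighbor graph standing in for UMAP's fuzzy simplicial set (a simplified NumPy sketch, not the actual UMAP implementation; in practice one would call the umap-learn library with n_components set to 10):

```python
import numpy as np

def spectral_init(x, n_neighbors=10, dim=10):
    """Spectral initialization of the embedding y: eigenvectors of the
    symmetric normalized Laplacian L = D^{-1/2} (D - A) D^{-1/2} of a
    k-nearest-neighbor graph (a stand-in for UMAP's weighted fuzzy graph)."""
    n = len(x)
    dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    A = np.zeros((n, n))
    for i in range(n):
        nn = np.argsort(dists[i])[1:n_neighbors + 1]   # skip the point itself
        A[i, nn] = 1.0
    A = np.maximum(A, A.T)                             # symmetrize the graph
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)                     # ascending eigenvalues
    return vecs[:, 1:dim + 1]                          # drop the trivial eigenvector
```

UMAP then refines this initialization by stochastic gradient descent on the cross entropy between the high- and low-dimensional fuzzy sets; the sketch above covers only the initialization.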

Density-biased selection
We used information from the constructed feature space to initialize weak labels. In the feature space, points in highly dense regions lie near cluster centers, which represent the most discriminative information. Thus, we sampled more points in these areas while maintaining the original density distribution. Specifically, the density is calculated as the number of neighbors found by a radius search, and a weighted random selection is proposed. Note that directly choosing the M highest-density points is ineffective, as those points lie close to each other and thus represent similar information. Owing to the imbalanced label distribution in ALS data, a considerable discrepancy exists between the density values of different high-density areas. Thus, a clustering step is introduced, and points are sampled proportionally from each cluster. The Gaussian mixture model (GMM) (Reynolds, 2009) is used to divide the points into clusters. GMM models the data as a mixture of Gaussians:

p(x|λ) = Σ_{i=1}^{K} α_i g_i(x),

where λ is the set of parameters {α_i, μ_i, Σ_i}, Σ_{i=1}^{K} α_i = 1, and each g_i is a Gaussian density function with mean μ_i and covariance Σ_i. Given a set of data x, the objective is to find λ such that p(x|λ) is maximized. The number of clusters was empirically set to the number of dataset categories. Subsequently, for each cluster C, weighted random sampling is applied, with the weight of each point given by its density in the feature space. We sample M · (|C|/N) points from each C to assign weak labels. Thus, the selected points contain useful information from each cluster, and the subset maintains a density distribution similar to that of the full set.
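A sketch of the density-biased selection, assuming cluster assignments are already available from a fitted GMM (e.g. scikit-learn's GaussianMixture); function and variable names are ours:

```python
import numpy as np

def density_biased_select(feats, clusters, m, radius, rng):
    """Sample M weak labels: per cluster, proportional to cluster size,
    with per-point weights given by the radius-search density in feature space."""
    n = len(feats)
    dists = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    density = (dists <= radius).sum(axis=1).astype(float)  # neighbor counts
    picked = []
    for c in np.unique(clusters):
        idx = np.flatnonzero(clusters == c)
        m_c = max(1, int(round(m * len(idx) / n)))         # M * |C| / N points
        w = density[idx] / density[idx].sum()              # density as weight
        picked.append(rng.choice(idx, size=min(m_c, len(idx)),
                                 replace=False, p=w))
    return np.concatenate(picked)
```

Sampling without replacement inside each cluster is what prevents the degenerate case noted above, where the M globally densest points would all come from the same region.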

Weakly supervised learning
After acquiring weak labels, supervised classifiers were used to evaluate performance. We first analyzed the classification results using only the weak labels. A classical machine learning method, random forest (RF) (Breiman, 2001), and a deep-learning network, KPConv (Thomas et al., 2019), were used as classifiers. For RF, only the labeled data were used for training. For KPConv, the entire dataset was fed to the model, but the loss was calculated only from the labeled points. Owing to the imbalanced weak-label distribution, a weighted cross-entropy was used, defined as

L = −(1/m) Σ_{i=1}^{m} w_i Σ_c y_i^c log(p_i^c),

where m is the number of labeled points in a training step, p_i^c is the predicted probability of point p_i for class c, and y_i^c is the c-th value of the one-hot vector of label l_i. The weight w_i depends on the category of p_i: given l_i = c, w_i = 1/(N_c/M + 0.02), where N_c is the number of weak labels of category c.
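This weighted cross-entropy can be illustrated as follows (a NumPy sketch for a single training step; the function name is ours, and N_c and M are assumed to be precomputed over the full weak-label set):

```python
import numpy as np

def weighted_ce(probs, labels, class_counts, m_total):
    """Weighted cross-entropy over the labeled points of a batch.
    probs: (m, C) predicted class probabilities for the labeled points;
    labels: (m,) their weak labels; class_counts[c] = N_c; m_total = M."""
    w = 1.0 / (class_counts[labels] / m_total + 0.02)   # w_i = 1/(N_c/M + 0.02)
    nll = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    return float(np.mean(w * nll))
```

The +0.02 term caps the weight of extremely rare classes at 50, so a category with only one or two weak labels does not dominate the loss.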
We further conducted experiments using a weakly supervised method, adopting the framework from our previous study (Wang and Yao, 2021a), which integrates with deep-learning networks. Fig. 2 illustrates the workflow. Under weak supervision, potential information in the unlabeled data is exploited to enhance the classification result. Entropy regularization (ER) is adopted to minimize class overlap and generate predictions with high confidence. Moreover, an ensemble prediction constraint (EPC) enhances the robustness of the trained model by comparing the prediction at the current training step with its ensemble value. In addition, an online soft pseudo-labeling (OSPL) strategy further improves the performance of the model. KPConv was the backbone network in our weakly supervised method.

ISPRS dataset
The dataset contains ALS data and aerial images obtained from the Stuttgart region of Germany. It is divided into training and testing parts and includes nine categories. As the dataset contains multiple overlapping scan strips, we set the subsampling grid size to d = 0.4 m to remove redundant points in overlapping ALS strips and maintain an even point density. During testing, removed points are classified according to their nearest neighbor. The format of the used features is {X, Y, Z, Intensity, IR, R, G}.

LASDU dataset
The study area is a valley along the Heihe River in northwest China, which covers an urban area of approximately 1 km². The area was divided into four connected sections, two of which were used as the training set, and the remaining two as the test set. Five categories were predefined in the dataset. Considering the even distribution and relatively low density of point clouds in the LASDU dataset, raw data were directly used for training and testing. The format of the used features is {X, Y, Z, Intensity}.

Implementation
In feature extraction, a spherical neighborhood with a radius of 2 m was used to calculate the eigenvalue-based features. For the height-based features, we used three cylindrical neighborhoods with radii of 2, 4, and 6 m to explore height variation.
For the experiments using the RF classifier, we used 500 decision trees to acquire robust results from limited weak labels.
For the experiments using KPConv, we sliced blocks with a radius of 20 m. The model was implemented in the PyTorch framework and trained on a GeForce GTX 1080Ti 11 GB GPU.

Evaluation metrics
The overall accuracy (OA) and F1 score were used to evaluate the performance of our method. OA is the percentage of correctly classified predictions, and the F1 score is the harmonic mean of precision and recall:

precision = tp/(tp + fp), recall = tp/(tp + fn), F1 = 2 · precision · recall/(precision + recall),

where tp, fp, and fn are the numbers of true positives, false positives, and false negatives, respectively.
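These metrics can be computed as follows (a minimal NumPy sketch; the function name is ours):

```python
import numpy as np

def oa_and_f1(pred, gt, n_classes):
    """Overall accuracy and per-class F1 from predicted and reference labels."""
    oa = float((pred == gt).mean())
    f1 = []
    for c in range(n_classes):
        tp = np.sum((pred == c) & (gt == c))
        fp = np.sum((pred == c) & (gt != c))
        fn = np.sum((pred != c) & (gt == c))
        p = tp / (tp + fp) if tp + fp else 0.0   # precision
        r = tp / (tp + fn) if tp + fn else 0.0   # recall
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    return oa, f1
```

Averaging the per-class F1 values gives the Avg. F1 reported in the result tables, which, unlike OA, is not dominated by the majority classes.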

Results of ISPRS dataset
We analyzed the classification results using 1‰ of the labels, corresponding to 330 points. Fig. 3 illustrates the category distribution of the weak-label initialization strategies. Compared to full random selection (FRS), our density-biased selection (DBS) produces imbalanced weak labels. In contrast, class random selection (CRS) produces samples with a uniform distribution of categories. Classification tasks were then conducted using the produced weak labels. We first evaluated the effectiveness using only weak labels for training; a weakly supervised method was then incorporated for further comparison. Table 2 presents the comparison of the classification results. DBS achieved the highest OA of 72.21% and outperformed FRS in both OA and Avg. F1. Although CRS achieved the highest Avg. F1 score, its OA was considerably lower than that of the other two strategies. This is because CRS assigns far fewer weak labels to the dominant categories, affecting the accuracy of these classes; the accuracy of the dominant categories typically has a significant impact on the OA. Using KPConv, DBS achieved a 3% increase, whereas CRS was far less effective. Comparing the two classifiers shows that DBS performs better when using only weak labels. Moreover, given the same label abundance, samples with a uniform category distribution achieve unsatisfactory results.

Weakly supervised classification result
The objective of a weak-label initialization strategy is to provide more effective information and boost the performance of weakly supervised methods under the same label abundance. Thus, evaluating the classification results when incorporating these methods is important. Table 4 presents the comparison. Using weakly supervised learning, the OA increased to a similar level for all strategies. However, a significant difference was present in the Avg. F1 score. While CRS achieved a competitive result of 68%, degradation was observed for the other two strategies: power lines, cars, and fences were severely misclassified. As Fig. 3 shows, the number of weak labels in these categories is extremely small. Because weakly supervised methods largely rely on the initial weak labels, limited supervisory sources can cause confirmation bias, which hinders performance. Fig. 4 presents the classification maps. For dominant categories, such as impervious surfaces and roofs, all strategies performed well. However, some categories were completely misclassified by DBS and FRS. In (a), no point is inferred as a power line, while in (b) almost all car points were misclassified as low vegetation. In contrast, a satisfactory result was achieved for all categories in (c).

Results of LASDU dataset
The classification results using 1‰ of the labels, corresponding to 1694 points, were evaluated. As in the experiments on the ISPRS dataset, we compared the performance of FRS, DBS, and CRS. The results show that, in imbalanced ALS data, evenly distributed weak labels lead to considerable OA degradation when using only weak labels. Table 6 lists the classification results for KPConv. Compared with FRS, no evident improvement was achieved by DBS. As the number of weak labels in the LASDU dataset is larger, the uncertainty caused by random selection decreases, reducing the benefit of our density-biased strategy. CRS underperforms in both evaluation metrics. Table 7 presents the comparison under weak supervision. The results are similar to those on the ISPRS dataset: although the three strategies achieved nearly identical OA, CRS significantly surpasses the other two in Avg. F1, and performance degradation was observed in the Avg. F1 of both DBS and FRS. Fig. 5 presents the classification maps. The main classification gap between CRS and the other two strategies lies in the artifacts category, where a large number of points belonging to this class are misclassified into other classes.

CONCLUSION
This study explored weak-label initialization strategies for the semantic segmentation of ALS data. Apart from applying random selection to the whole set or selecting evenly from each category, a density-biased strategy was proposed that increases the selection probability of points in highly dense regions of the feature space. Although our method achieves better results when using only limited weak labels, evenly distributed weak labels demonstrated greater applicability when incorporated with a weakly supervised method. Thus, future work will further explore point-cloud feature information to improve weak-label initialization and achieve robust and accurate classification results.