FRACTAL DIMENSION BASED SUPERVISED LEARNING FOR WOOD AND LEAF CLASSIFICATION FROM TERRESTRIAL LIDAR POINT CLOUDS

Terrestrial laser scanning has been widely used in forestry, and wood-leaf separation is a fundamental step in most of its forestry applications. This paper presents a robust supervised learning method for wood and leaf classification based on four new feature vectors. Fractal dimension is first calculated to indicate the difference in regularity or roughness between wood and leaf. Zenith angle and zenith angle variation are presented to distinguish trunks or branches from leaves. A cylinder with an adaptive axis direction is adopted to calculate the local point density precisely. Experimental results show that the supervised learning method using the four feature vectors presented in this paper achieves good classification performance. Both the accuracy and the F1 score are higher than those of the method using eigenvalue-based feature vectors.


INTRODUCTION
Three-dimensional laser scanning (LS) is an active remote sensing technology that has developed rapidly in recent years (Hui et al., 2019). An LS system actively transmits laser pulses to obtain the three-dimensional coordinates of the target object, which makes it an important data source for spatial topological analysis of vegetation (Zhang et al., 2019). With improvements in LS measurement accuracy and sampling rate, this technology has been widely used in forestry, ecology, botany and other related fields. Compared with airborne or satellite-borne LS, terrestrial LS (TLS) provides smaller laser footprints, higher single-point measurement accuracy, and denser point clouds. Thus, TLS is widely used in forest structural parameter calculation, above-ground biomass (AGB) estimation, leaf area index (LAI) quantification, etc. (Calders et al., 2015; Calders et al., 2018).
The separation of wood and leaf from TLS data is a crucial step for these applications. Traditional methods usually separate wood and leaf manually with the help of visualization software, which is time consuming and labor intensive. Moreover, the classification results depend mainly on the experience of the operator and the quality of the point clouds. To solve these problems, many automatic algorithms for separating wood and leaf have been developed in recent years. These algorithms can be divided into two categories: those based on geometric information and those based on echo attributes. The geometric approaches rely mainly on the different shape characteristics of wood and leaf (Xu et al., 2007; Ma et al., 2015; Zhu et al., 2018). For instance, wood is generally linearly distributed and locally cylindrical, while leaf points are more scattered and lack linear characteristics.
The echo attribute based methods rely mainly on differences in reflection intensity, reflectance, or waveform to achieve the separation (Beland et al., 2011; Beland et al., 2014; Cote et al., 2011). In recent years, with the fast development of LS technology, several laser scanners have become able to acquire full-waveform data over vegetation areas. The differences in waveform height and width between wood and leaf returns can be used to improve the accuracy of separation (Hancock et al., 2017; Danson et al., 2018).
Several existing experiments have shown that separation methods using echo attributes such as reflection intensity are less accurate than methods based on geometric information. This is because the reflection intensity of a laser pulse depends on various factors such as distance, incidence angle, and the roughness of the object surface, which makes radiometric calibration of the intensity difficult. Although full-waveform data can improve the accuracy of wood and leaf separation, not all laser scanners can acquire it. Compared with these sensor-specific methods, approaches based on geometric information, which need only the three-dimensional coordinates of the point cloud, are more widely applicable. Almost all existing geometric methods use eigenvalues and eigenvectors to classify wood and leaf, since eigenvalues indicate the magnitude of the variance present in the coordinates, while eigenvectors reflect the direction of that variance. To further develop the geometric features used for wood-leaf classification, this paper develops new feature vectors calculated from three-dimensional coordinates. Two publicly available datasets are used to test the newly developed features. Experimental results show that the proposed method can classify wood and leaf effectively.

Fractal dimension feature vector
In Euclidean geometry, objects are generally regarded as regular shapes whose geometric features can be described by integer dimensions, such as one, two or three. However, in the real world, there are many complicated and irregular objects (e.g., coastlines or snowflakes) whose geometric morphology cannot be described by integer dimensions. Fractal theory, a branch of modern mathematics, was developed to better describe the complexity and roughness of such objects. Nowadays, this theory is widely used in many areas, such as signal analysis and image processing (Yang et al., 2015).
In fractal theory, the fractal dimension is an important index for describing fractal morphology. Its value indicates the irregularity and roughness of a complex object. The fractal dimension can be calculated in different ways. This paper adopts the box-counting method since its principle is simple and it is easy to implement. The box-counting dimension is defined as Equation (1).
$$Dim = \lim_{\varepsilon \to 0} \frac{\log N(\varepsilon)}{\log (1/\varepsilon)}$$ (1)
where $\varepsilon$ = the side length of the cube
$N(\varepsilon)$ = the number of cubes occupied by the point cloud
The fractal dimension $Dim$ is obtained as $\varepsilon$ approaches 0. However, for point clouds, the side length of the cube cannot get infinitely close to 0. Moreover, $\varepsilon$ is generally discrete. To better describe the box-counting dimension, Equation (1) can be changed to the form of Equation (2).
$$\log N(\varepsilon) = -Dim \cdot \log \varepsilon + b$$ (2)
where $b$ = the intercept of the fitted line
A series of different side lengths leads to a series of different numbers of occupied cubes. By applying a least-squares fit to $\log \varepsilon$ and $\log N(\varepsilon)$, the box-counting dimension $Dim$ can be obtained as the negative of the slope of the fitted line.
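The least-squares estimate of the box-counting dimension can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this paper: the function name `box_counting_dimension`, the chosen side lengths, and the synthetic line and plane point sets are all hypothetical and serve only to show the calculation.

```python
import math


def box_counting_dimension(points, side_lengths):
    """Estimate the box-counting dimension of 3D points by a least-squares line fit."""
    xs, ys = [], []
    for eps in side_lengths:
        # Each point falls into the cube indexed by floor(coordinate / eps).
        occupied = {tuple(math.floor(c / eps) for c in p) for p in points}
        xs.append(math.log(eps))
        ys.append(math.log(len(occupied)))
    # Ordinary least-squares slope of log N(eps) against log eps.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # N(eps) grows as eps shrinks, so the slope is negative and Dim = -slope.
    return -slope


# Synthetic test shapes (assumed for illustration): a straight line and a flat patch.
line = [(i / 1000.0, 0.0, 0.0) for i in range(1000)]
plane = [(i / 100.0, j / 100.0, 0.0) for i in range(100) for j in range(100)]
sides = [0.5, 0.25, 0.125, 0.0625]

dim_line = box_counting_dimension(line, sides)
dim_plane = box_counting_dimension(plane, sides)
print(round(dim_line, 3), round(dim_plane, 3))
```

On these idealized shapes the estimate is close to 1 for the line and close to 2 for the plane. For real point clouds the result depends on the range of side lengths, which should stay above the point spacing.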
For forest point clouds, wood and leaf exhibit different fractal morphology. To separate wood and leaf effectively, the fractal dimension of every point can be calculated according to the following steps:
i. Organize the raw point cloud using a k-dimensional tree.

Zenith angle and variation feature vectors
In general, wood and leaf have different growth characteristics. For instance, trunks tend to grow straight up, while leaves tend to diverge. In other words, the zenith angles of trunk points are generally close to 90°, while the zenith angles of other points do not show this characteristic, as shown in Figure 2. Moreover, for both trunks and branches, the zenith angle variation is generally smaller than that of leaves. Thus, wood and leaf can be separated according to the zenith angle and zenith angle variation feature vectors. As mentioned above, the zenith angle variation of wood is generally smaller than that of leaf. The zenith angle variation $\theta_{std}$ is defined as Equation (4).
From Figures 3 (a) and (b), it can be found that the zenith angle and its variation for wood are clearly different from those for leaf. Thus, these two features contribute to the wood-leaf classification results.
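The two features can be sketched in Python as follows. This is an illustration under assumptions rather than the exact definitions of this paper: it assumes that the zenith angle is measured between a point's normal vector and the vertical (z) axis, which is consistent with trunk points having angles near 90°, and that the variation is the population standard deviation of the zenith angles of the neighboring points. The function names and the example normal vectors are hypothetical, and the normals are taken as given.

```python
import math
import statistics


def zenith_angle_deg(normal):
    """Angle in degrees between a normal vector and the z axis, folded into [0, 90]."""
    nx, ny, nz = normal
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    # abs() removes the sign ambiguity of estimated normals (assumption).
    cos_theta = min(1.0, abs(nz) / norm)
    return math.degrees(math.acos(cos_theta))


def zenith_variation(angles, neighbor_indices):
    """Population standard deviation of the zenith angles of one point's neighbors."""
    return statistics.pstdev([angles[i] for i in neighbor_indices])


# Hypothetical normals: horizontal on a vertical trunk, scattered on leaves.
trunk_normals = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0)]
leaf_normals = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (1.0, 0.0, 0.0)]

trunk_angles = [zenith_angle_deg(n) for n in trunk_normals]
leaf_angles = [zenith_angle_deg(n) for n in leaf_normals]

trunk_std = zenith_variation(trunk_angles, [0, 1, 2])
leaf_std = zenith_variation(leaf_angles, [0, 1, 2])
print(trunk_angles, trunk_std, leaf_std)
```

In this toy example the trunk normals all give angles of about 90° with almost no variation, while the leaf normals give widely spread angles and a much larger variation.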

Points distribution feature vector
Compared with leaf points, wood points are usually distributed cylindrically. Thus, wood can be separated from leaves by counting the number of points within a cylinder. Note that the axis direction of the cylinder should adapt to the local point cloud, as shown in Figures 4 and 5. The number of points within the cylinder can be regarded as the local point density, which is calculated according to Equations (5) and (6). In this paper, the axis of the cylinder is set perpendicular to the normal vector calculated by applying principal component analysis (PCA) to the neighboring points. In this way, the calculated local point density represents the local geometric morphology more accurately.
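Counting points inside an oriented cylinder can be sketched in Python as follows. This is a simplified illustration, not the formulation of Equations (5) and (6): the function name `cylinder_point_count`, the radius and half-height parameters, and the synthetic branch points are assumptions, and the axis direction is passed in directly instead of being derived from a PCA normal.

```python
import math


def cylinder_point_count(points, center, axis, radius, half_height):
    """Count points inside a finite cylinder centred at `center` with the given axis."""
    ax, ay, az = axis
    length = math.sqrt(ax * ax + ay * ay + az * az)
    ux, uy, uz = ax / length, ay / length, az / length
    count = 0
    for px, py, pz in points:
        dx, dy, dz = px - center[0], py - center[1], pz - center[2]
        along = dx * ux + dy * uy + dz * uz  # signed offset along the axis
        radial_sq = dx * dx + dy * dy + dz * dz - along * along  # squared distance to the axis
        if abs(along) <= half_height and radial_sq <= radius * radius:
            count += 1
    return count


# Hypothetical thin vertical branch sampled every 0.1 units along z.
branch = [(0.0, 0.0, 0.1 * k) for k in range(-10, 11)]
center = (0.0, 0.0, 0.0)

aligned = cylinder_point_count(branch, center, (0.0, 0.0, 1.0), 0.05, 0.55)
misaligned = cylinder_point_count(branch, center, (1.0, 0.0, 0.0), 0.05, 0.55)
print(aligned, misaligned)
```

With the axis aligned to the branch the cylinder captures many points, while a misaligned axis captures only the center point. This difference is the reason the axis direction needs to adapt to the local point cloud. The count includes the center point itself when it belongs to the input.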

EXPERIMENTAL RESULTS AND ANALYSIS
To evaluate the performance of the four newly developed feature vectors, this paper adopts two isolated tree point clouds for training and testing, respectively. The two datasets are provided by Moorthy et al. (2019), and both were manually labelled in CloudCompare by an experienced operator. Each dataset contains x, y, z coordinates and a label, in which 0 represents a leaf point and 1 represents a wood point, as shown in Figure 6. Tree 1 (Figure 6 (a)) is used for training since its structure is more complicated, while Tree 2 (Figure 6 (b)) is used for testing. Both datasets were downsampled to improve computational and memory efficiency. Random forest is adopted for the supervised learning since it has been reported to perform better than other machine learning methods, such as neural networks and naive Bayes (Moorthy et al., 2019).
Accuracy and F1 score are used to assess the performance of the proposed method. Accuracy is the proportion of all predictions that are correct. F1 score is the harmonic mean of precision and recall. Both are calculated from the confusion matrix, in which A and D are the numbers of correctly classified points, while B and C are the numbers of wrongly classified points, so that accuracy equals (A + D) / (A + B + C + D). When leaf is defined as the positive class, the F1 score for the leaf class is obtained. When wood is defined as the positive class, the F1 score for the wood class is obtained.
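The two measures can be sketched in Python as follows, using the label convention of the datasets (0 for leaf, 1 for wood). The function names and the small label lists are hypothetical and chosen only to show how the per-class F1 score changes with the choice of positive class. The expression 2TP / (2TP + FP + FN) used here is algebraically equal to the harmonic mean of precision and recall.

```python
def accuracy(y_true, y_pred):
    """Proportion of predictions that match the reference labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive):
    """F1 score for the class chosen as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    # Equivalent to 2 * precision * recall / (precision + recall).
    return 2 * tp / denom if denom else 0.0


# Hypothetical labels: 1 = wood, 0 = leaf.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]

acc = accuracy(y_true, y_pred)
f1_wood = f1_score(y_true, y_pred, positive=1)
f1_leaf = f1_score(y_true, y_pred, positive=0)
print(acc, f1_wood, f1_leaf)
```

In this toy example two of the three wood points are predicted as leaf, so the F1 score for wood (0.4) is far below the F1 score for leaf (about 0.73) even though the overall accuracy is 0.625.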
To objectively evaluate the performance of the proposed method, this paper also obtained wood-leaf classification results using feature vectors based on eigenvalues. These eigenvalue-based feature vectors are planarity, linearity, scatter, surface variation and entropy. The classification results of the two methods are shown in Figures 7 (a) and (b). The two figures show that the proposed method yields a better classification result than the method using eigenvalue-based feature vectors. Moreover, the proposed method detects the trunk more accurately. Table 2 shows the accuracy and F1 scores calculated for the two methods. All three indexes (accuracy, F1 score for leaf and F1 score for wood) of the proposed method are higher than those of the method using eigenvalue-based feature vectors. Thus, it can be concluded that the feature vectors developed in this paper can separate wood and leaf effectively. Table 2 also shows that, for both methods, the F1 score for wood is much lower than the F1 score for leaf. This is because many small branches are wrongly classified as leaves, as shown in Figure 7. The classification results may improve when more training datasets are involved.
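For reference, the five eigenvalue-based features of the baseline can be sketched in Python as follows. This paper does not state their formulas, so the sketch assumes definitions that are commonly used in the point cloud literature, computed from the eigenvalues l1 >= l2 >= l3 > 0 of the local covariance matrix. The function name `eigen_features` and the example eigenvalues are hypothetical.

```python
import math


def eigen_features(l1, l2, l3):
    """Commonly used eigenvalue features for sorted eigenvalues l1 >= l2 >= l3 > 0."""
    total = l1 + l2 + l3
    normalized = [l / total for l in (l1, l2, l3)]
    return {
        "linearity": (l1 - l2) / l1,
        "planarity": (l2 - l3) / l1,
        "scatter": l3 / l1,
        "surface_variation": l3 / total,
        # Shannon entropy of the normalized eigenvalues.
        "entropy": -sum(e * math.log(e) for e in normalized if e > 0),
    }


# Hypothetical neighborhoods: elongated (branch-like) and diffuse (leaf-like).
branch_like = eigen_features(1.0, 0.01, 0.01)
leaf_like = eigen_features(1.0, 0.8, 0.6)
print(branch_like["linearity"], leaf_like["linearity"])
```

Under these assumed definitions, linearity, planarity and scatter sum to one, and an elongated neighborhood gives a linearity close to one while a diffuse neighborhood gives a higher scatter and entropy.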

CONCLUSION
Wood-leaf separation is a crucial step for the applications of TLS in forestry. In this paper, four new geometric feature vectors calculated from three-dimensional coordinates are presented. Considering the difference in regularity or roughness between wood and leaf, the fractal dimension is introduced for the classification. Zenith angle and zenith angle variation are presented to distinguish trunks or branches from leaves. A cylinder with an adaptive axis direction is adopted to calculate the local point density precisely. Experimental results show that these four newly developed feature vectors outperform the five feature vectors based on eigenvalues. The proposed method achieves higher performance in terms of both accuracy and F1 score. However, the F1 score for wood is much lower than that for leaf, because many branch points are wrongly classified as leaf points. Improving the wood classification accuracy will be the focus of future research.