MULTI-SCALE BASED EXTRACION OF VEGETATION FROM TERRESTRIAL LiDAR DATA FOR ASSESSING LOCAL LANDSCAPE

In this study, we propose a method to accurately extract vegetation from terrestrial three-dimensional (3D) point clouds for estimating landscape index in urban areas. Extraction of vegetation in urban areas is challenging because the light returned by vegetation does not show as clear patterns as man-made objects and because urban areas may have various objects to discriminate vegetation from. The proposed method takes a multi-scale voxel approach to effectively extract different types of vegetation in complex urban areas. With two different voxel sizes, a process is repeated that calculates the eigenvalues of the planar surface using a set of points, classifies voxels using the approximate curvature of the voxel of interest derived from the eigenvalues, and examines the connectivity of the valid voxels. We applied the proposed method to two data sets measured in a residential area in Kyoto, Japan. The validation results were acceptable, with F-measures of approximately 95% and 92%. It was also demonstrated that several types of vegetation were successfully extracted by the proposed method whereas the occluded vegetation were omitted. We conclude that the proposed method is suitable for extracting vegetation in urban areas from terrestrial light detection and ranging (LiDAR) data. In future, the proposed method will be applied to mobile LiDAR data and the performance of the method against lower density of point clouds will be examined.


INTRODUCTION
Light detection and ranging (LiDAR) measures laser light reflected from the surface of objects, and the discrete LiDAR data are used to model three-dimensional (3D) surfaces of the objects and derive the attributes.One of the most popular applications of airborne LiDAR data has been building modelling.For examples of detailed modelling, Sampath and Shan (2010) proposed a method to reconstruct polyhedral building roofs.The method selects neighbourhood via Voronoi meshing, and estimates surface normal vectors.The surface normals are clustered with the fuzzy k-means method.The method proposed by Kim and Shan (2011) segments points by minimizing an energy function formulated as multiphase level set.To improve modelling accuracy, the fusion with other data has been examined, such as aerial imagery (Susaki, 2013), satellite imagery (Awrangjeb et al., 2013) and terrestrial LiDAR (Caceres and Slatton, 2007).
Another popular application of airborne LiDAR data is to model vegetation and estimate the height and canopy volume of the vegetation.Vegetation returns the light in various ways, on the surface, in the middle and from the bottom whereas buildings return the light mainly on the surface.This unique feature is a challenge in applying the LiDAR data to the vegetation.It is well known that the first and last pulses of the light reflected from vegetation correspond to the top (canopies) and bottom (ground) of the vegetation.Heights of vegetation surface and ground are estimated by using the first and last pulse data, respectively.Thus, we can derive the heights of vegetation by subtracting ground height from vegetation surface height.Over the last decade, full-waveform airborne LiDAR has been examined (Rutzinger et al., 2008;Elseberg et al., 2011).It can provide more detailed pattern of reflected light, and has potential to estimate the structure of the forests.
Extraction of vegetation in urban areas has another important aspect of applications of LiDAR data.It can contribute to rapid and low-cost assessment of local landscape (Carlberg et al., 2009).Susaki and Komiya (2014) proposed a method to estimate green space ratio (GSR) in urban areas from airborne LiDAR and aerial images for quantitatively assessing local landscape.Because of occlusion, more accurate extraction of vegetation can be achieved by using terrestrial LiDAR data, as is the case of building modelling.In addition to terrestrial LiDAR, mobile (or vehicle-based) LiDAR has been examined for the purpose because mobile LiDAR is capable of measuring the data in a large area rapidly (Lin et al., 2014).
In a complex scene of urban areas, an automatic extraction of vegetation requires classifying man-made and natural objects.This is another challenge in applying the LiDAR data to urban vegetation.One of the most promising approaches is to process the LiDAR data on multi-scales (Unnikrishnan and Hebert, 2008;Lim and Suter, 2009;Xu et al., 2014).For example, Brodu and Lague (2012) presented a method to monitor the local cloud geometry behaviour across several scales by changing the diameter of a sphere for representing local features.While such approaches may be effective to extract vegetation, it is not guaranteed that they are effective to the data measured in urban areas that include different types of vegetation and buildings.Our final goal is to develop a method to effectively estimate a local landscape index reflecting vegetation volume from point clouds.In this research, we examine and propose a multi-scale based method to extract vegetation in complex urban areas from terrestrial LiDAR data.
The rest of this paper is structured as follows.Features of the employed data and the study area are described in Section 2. Section 3 describes the proposed method, experimental and validation results.The implications of these results and the validity of the algorithm are then discussed in Section 4. Conclusions are given in Section 5.

Study Area and Data Collection
We selected as the test site a residential area in Nishikyo-ward, Kyoto, Japan with various kinds of vegetation.We measured the data by using a RIEGL VZ-400 laser scanner in May, 2014.Four scan positions, shown in Figure 1, were arranged along the road.Each scan ranged from 30° to 130° in the vertical axis and from 0° to 360° in the horizontal axis with an interval of 0.04°.Under this condition, a scanner measured a point every 7 mm on the surface 10 m away from the scanner.

Co-registration
Co-registration was conducted by using commercial software RiSCAN PRO developed by RIEGL.First, point clouds belonging to pedestrians and cars were manually removed.Then, a reference scanner was selected and the reference coordinate system was defined.Finally, coordinate systems of the other scanners were converted into the reference coordinate system with at least six corresponding points.The standard deviation of the errors was 2.8 mm.

Resampling
Our final goal is to estimate a vegetation-based landscape index over a large area, and for that purpose, the proposed method will be applied to mobile LiDAR data in future.The measured data were reduced to save computation time and to examine the performance of the proposed method against as low density of data as mobile LiDAR data.The point clouds were mapped into voxels, and then points in a voxel were represented by the centroid of the points.We set the voxel size to 2 cm referring average of primary nearest point distance with mobile LiDAR (Lin et al., 2014).As shown in Figure 1, areas of approximately 20 m × 15 m were selected as Data 1 and Data 2, and the numbers of points were 1,453,130 and 1,563,901, respectively.In addition, to examine the availability of the proposed method, we acquired sparser point data sets, Data 3 and Data 4, by setting another voxel size as 5cm.The numbers of points were 411,269 and 527,969, respectively.

Outline
Figure 2 shows the flowchart for the proposed method.It takes voxel-based approach.With two different voxel sizes, the classification is conducted.The voxel size depends on the length of leaves and types of vegetation.In this research, we set them to 10 cm and 20cm.A process is repeated that calculates the eigenvalues of the planar surface using a set of points, classifies voxels using the approximate curvature of the voxel of interest derived from the eigenvalues, and examines the connectivity of the valid voxels.The vegetation missed at the 1st screening will be examined for the 2nd screening with a larger voxel size.The approximate curvature of points is estimated by fitting planar surface and calculating principal component analysis (PCA).The proposed method uses both local and contextual features.The former one is calculated by the approximate curvature and the latter one is obtained by examining the connectivity of the valid voxels.By repeating the process twice, the accuracy of extracting vegetation can be improved.

Principal Component Analysis
PCA is a statistical approach to represent observed data with linearly uncorrelated variables called principal components.
Each variance value according to the principal component corresponds to the eigenvalue.By applying PCA to 3D point clouds, the distribution characteristic can be analysed.The distribution characteristic of point clouds is captured by eigenvalues derived from the covariance matrix computed from neighbouring points (Mark et al., 2002, Vandapel et al., 2004).A distribution characteristic is classified into three categories, 1D, 2D and 3D, using the proportions of each eigenvalue to the sum of them (Brodu and Lague, 2012).
In this study, we use the result of PCA as a feature of the whole point clouds used to compute it.Three eigenvalues,   (  = 1 … 3, λ 1 ≥ λ 2 ≥ λ 3 ), are derived from 3D points set of N, p i (x i , y i , z i ) (i = 1… N), using PCA.
The ratios of each eigenvalue to the sum of all are defined as the following Equation (1).
Figure 3 shows the relation between c i and the distribution characteristic.The arrows correspond to the principal components and the length of them describes the magnitude of eigenvalues.The graph shows where 1D, 2D and 3D point clouds can appear on the  2 - 3 plane.In the case that only the largest eigenvalue, λ 1 accounts for the total variance, the points are distributed only along one principle component, which means they have a 1D distribution characteristic.In this situation,  1 approaches to 1, and the others approach to 0. In the case that the points are distributed on a plane surface, two of eigenvalues account for the total variance.As the result, only  3 approaches to 0. In the same manner,  1 ,  2 and  3 have similar magnitude when the points are homogeneously distributed around 3D space.We assume that the combination of c i represents approximate curvature of the surface applied to the points of interest, and hereafter we use it to capture both local feature and contextual features.

Voxel-based Analysis: Classification Using Approximate Curvature
After voxels store points, the distribution characteristic is computed, using points in each voxel.Then, according to the characteristic, each voxel is labelled.In this study, the test site is located in urban area, and thus vegetation must be distinguished from artificial structures such as buildings or walls, which points of them have a 2D distribution characteristic.
On the other hand, vegetation points have more scattered distribution than them.Therefore, vegetation with a 3D distribution characteristic and artificial structures with a 2D distribution characteristic are initially classified with slope a, the ratio of  3 to  2 , expressed by Equation ( 2). (2) In the procedure (i) shown in Figure 2, all voxels are classified into three groups, G 1 , G 2 and G 3 , by the value of a as described in Figure 4.The thresholds are represented by a 1 and a 2 .
Because G 1 is closer to 3D than 2D, the voxels belonging to G 1 has a high probability to be vegetation.Voxels classified as G 3 have a 2D distribution characteristic, that is, most walls and roofs belong to this group.Voxels in G 2 are difficult to be classified into vegetation or non-vegetation only with the proportion value a.In urban areas, much vegetation is trimmed and the surface is smoother than natural vegetation.As a result, the vegetation is classified into G 2 .Many windows and boundaries of two planes, such as ridges and edges of roofs, are classified as G 2 .In the following process, the voxels in this group are re-classified into G 1 or G 3 by connectivity of the voxels and the detail is explained in Subsection 3.4.
In the procedure (ii), the extracting operation in the procedure (i) is repeated with different thresholds.In this step, 20 cm voxels are used to extract vegetation with sparse points.
In this experiment, the thresholds related to a were set via the examination of samples from Data 1 as follows: a 1 = 0.02, a 2 = 0.1, a 3 = 0.06, a 4 = 0.6.

Cluster-based analysis: Classification by connectivity
In the voxel-based analysis, only local features are used to classify voxels.As the result, many misclassifications occur on window frames and edges.We implement the cluster-based analysis to examine the contextual information, i.e. connectivity of valid voxels.In the case that neighbouring voxels have the same label given in the voxel-based analysis, they are regarded as one cluster.A single voxel can compose one cluster, having no surrounding voxels with the same label.
The trimmed vegetation is difficult to be classified with a local feature.The trimmed vegetation occupies a great part of urban vegetation, and so it cannot be neglected to estimate a vegetation-based landscape index.Because of its flatter surface, the trimmed vegetation cannot be extracted with natural vegetation only by using a local feature.Therefore, the trimmed vegetation is extracted with a combination of a distribution characteristic and connectivity with surrounding voxels.Figure 5 shows the process illustrated in Figure 2(a).After clustering, a cluster is re-classified into G 1 or G 3 by using the labels of the surrounding voxels.Classification accuracy is improved by using lager scale than a voxel.In Figure 5, three red voxels    =    1 +  2 +  3 ,  = 1 … 3 .
(1) represent a target cluster.The surrounding voxels are the thirteen voxels connected with the cluster.Blue and green voxels correspond to G 1 and G 3 , respectively, and a transparent one means no data.The cluster is re-classified with Equation (3).

𝑟 =
1   1 +   3 . (3) Here, N G1 corresponds to a number of G 1 voxels and N G3 corresponds to one of G 3 voxels.Vegetation_ratio, r, is defined as the proportion of N G1 to sum of N G1 and N G3 .In the case that r equals to threshold r 1 or more, the cluster is classified as G 1. In the case that r is less than r 1 , the cluster is classified as G 3 .Therefore, the cluster has been classified as G 1 in the right-hand side.In this experiment, r 1 was set to 0.5 via the examination of samples from Data 1.
When 3D point clouds are classified by using a local feature computed with PCA, many misclassifications occur on boundaries of several objects (Vandapel et al., 2004).In urban areas, the local feature is not effective to distinguish vegetation from window frames and edges of roofs.Therefore, clusterbased analysis, shown in figure 2(b), is used to classify them.First, a number of voxels belonging to a cluster is counted.In the case that the number is greater than a threshold, the cluster is classified as vegetation.In the case that the number is less than another threshold, the cluster is considered as noise.
Vegetation cluster must have many voxels because vegetation voxels are extracted in the former process.For the same reason, most noise clusters are eliminated by numbers of voxels.Then, a large noise cluster is distinguished by using PCA with points in a cluster, as shown in Figure 6.Most noise clusters are located in windows and edges and thus have 1D and 2D characteristics.When c 1 is larger than a threshold b 1 , the cluster can be considered as an edge.When c 3 is smaller than another threshold b 3 , the cluster can be considered as a window or an edge.The thresholds were set via the examination of samples from Data 1 to 0.6 and 0.05, respectively.
The result of the process (a) in Figure 2 is shown in Figure 7.A combination of a local feature and a contextual feature improved the performance of extracting vegetation.

Multi-scale Classification
Multi-scale concept is important to improve classification accuracy of 3D point clouds.This concept is used to decide neighbouring points or to capture feature changes in some studies (Unnikrishnan andHebert, 2008, Brodu andLague, 2012).In the proposed method, we take multi-scale classification with two sizes of voxels, 10 cm and 20 cm, to extract vegetation.When only one size is used, a small voxel cannot store enough points to analyse a distribution feature at a low density area, or large voxels cannot capture an object shape.
The 10 cm voxel is suitable not only to capture an object shape but also to avoid a mixture of two objects.However, vegetation   with low point density is difficult to be analysed with this size.
The vegetation can be extracted to some extent with lager voxels.20 cm voxels are used to re-extract such remaining vegetation after the first process.
Figures 8(a) and (b) show the result with 10 cm voxels and the result with 20 cm voxels, respectively.As shown in the white circle, the shrub has not been fully extracted with 10 cm voxels, but it can be extracted with 20 cm voxels.Because of low point density, the number of the extracted voxels is not enough to describe the shape of the shrub with 10 cm voxels.The shrub voxels are divided into small clusters and thus loses the 3D feature.By contrast, the whole shrub can be described as one cluster by using 20 cm voxels and thus has been fully extracted.

Accuracy Assessment
The extraction performance was assessed with reference data that were manually obtained.The point-based assessment was conducted.The results of vegetation extracted from Data 1 and 2 are shown in Figures 9(a Here TP, FP, and FN denote true positive, false positive, and false negative, respectively.The F-measure results of Data 1 and 2 are shown in Table 1.

DISCUSSION
Although some misclassifications occurred, the proposed method performed well.Compared to the result of Data 1 (Fmeasure of 94.6%), the result of Data 2 is less accurate (Fmeasure of 91.8%).The thresholds used in the proposed method were determined by referring to the samples taken from Data 1.While the precision of the result of Data 2 was not as good as that of Data 1, it is still acceptable.In addition, the proposed method successfully extracted various kinds of vegetation.
Figure 10 shows the vegetation extracted by the proposed method: the canopy of the road tree (Figure 10 In addition, same as the case of Data 1, the false negatives mainly correspond to occluded vegetation.In case of vegetation occluded by the fences or other non-vegetation, it does not have high point density enough to be extracted even by 20 cm voxels.Some misclassified non-vegetation can be found on the boundary between vegetation and non-vegetation.This is because the noise voxels are connected with vegetation voxels and thus is not been eliminated as noise.In addition, some misclassified vegetation, can be seen on the tips of the branches.This is because the numbers of vegetation voxels on the tips are not large enough to make large clusters.Next, we focus on the effect of multi-scale classification.In the experiment, we used two different sizes, 10 cm and 20 cm, for voxel.The vegetation missed at the 1st screening with a voxel size of 10 cm is examined for the 2nd screening with a larger voxel size of 20 cm.As shown in Figure 8, we found that such multi-scale classification was quite effective to improve the accuracy of extracting the vegetation from point clouds.Moreover, we examined the applicability of the proposed method to sparser point clouds, Data 3 and Data 4. The numbers of points are approximately one third of original one.The accuracy is shown in Table 2.The F-measures are as high as those of Data 1 and Data 2 (93.8% of Data 3 and 91.2 % of Data 4).Although some false negatives appear in Figures 9(e), 9(f), 9(g) and 9(h), almost all vegetation is still classified correctly.
From this experiment, it was noted that our method performs well against the relatively sparse data.
Finally, we investigated the efficiency of a feature a, which is described by Equation (2).Weinmann et al. (2014) suggests a classification method that selects the best feature set out of 21 features.Because the feature sets contain approximately 10 features, the procedure is time-consuming.Instead, we selected the best feature among 21 features and a by measuring their relevance to classes.The experiment was conducted using sample data.The best feature was anisotropy and a was ranked as the fourth best one.We used anisotropy instead of a to classify voxels and conducted the same procedure as original method on Data 1 and Data 2.However, the accuracy did not change or decreased (F-measure of 94.6 % and F-measure of 91.5 %).As a result, it was noted that a is effective to classify vegetation.

CONCLUSIONS
In this study, we proposed a method to extract vegetation from terrestrial LiDAR data for estimating landscape index in urban areas.The proposed method uses two different voxel sizes, and a process is repeated that calculates the eigenvalues of the planar surface using a set of points, classifies voxels using the approximate curvature of the voxel of interest derived from the eigenvalues, and examines the connectivity of the valid voxels.
We applied the proposed method to two data sets measured in a residential area in Kyoto, Japan.The validation results were acceptable, with F-measures of approximately 95% and 92%.It was also demonstrated that several types of vegetation were successfully extracted by the proposed method.In addition, the method was applied to sparser data sets and the accuracy was acceptable.
Considering the estimation of a vegetation-based landscape index, future work will concentrate on application of the proposed method to mobile LiDAR data.Many obstacles included in mobile LiDAR data such as cars and pedestrians make the application more challenging.However, these problems must be solved to estimate a vegetation-based landscape index with mobile LiDAR data.After estimating the landscape index with mobile LiDAR data, we are going to examine the accuracy, comparing with the result estimated by using aerial LiDAR data in (Susaki and Komiya, 2014).

Figure 1 .
Figure 1.Test site.The red dots represent four scan positions and the rectangles show the areas of data.

Figure 2 .
Figure 2. Flowchart of the proposed method.

Figure 3 .
Figure 3. Scatterplot of c 2 and c 3 .  stands for eigenvalues of each component and c i are their proportion described by Equation (1).According to point distribution, 1D denotes the data that have a linear arrangement, 2D denotes the data that are on a plane and 3D denotes the data that are randomly scattered.

Figure 4 .
Figure 4. Classification of voxels.Using the slope a, voxels are classified into three groups, G 1 , G 2 and G 3 .
Figure 7(a) shows the result classified only with voxel-based analysis, and Figure 7(b) shows the result classified with the combination of voxel-based analysis and cluster-based analysis.As shown in the black circles, the noise on the walls and the roofs becomes less in Figure 7(b) than in Figure 7(a).Considering the results, it is obvious that a combination of local and contextual features improves classification accuracy.The decrease in the noise is effective to improve classification accuracy in the following process because it can prevent the noise clusters from becoming larger.From another point of view, the trimmed vegetation pointed by the white arrow can be extracted with the less noise.

Figure 7 .
Figure 7. Effect of combination of local and contextual features.(a) Result only using voxel-based analysis and (b) result using voxel-based and cluster-based analysis.
) and (b), and Figures 9(c) and (d), respectively.The F-measure, expressed by Equation (4), was used for quantitative assessment of the performance.F-measure = 2 Precision Recall / (Precision + Recall) (4) Precision = TP / (TP + FP), Recall = TP / (TP + FN) (b)), the shrub with the needle leaves (Figure10(e)) and the ivy covering the fence (Figure10(f)).In addition, our method has extracted the trimmed vegetation, which we can find much in urban areas, such as Figures10(a), 10(c) and 10(d).Now, we will discuss the factors for misclassifications.The false positive (non-vegetation) in Figures9(c) and (d) correspond to the fence close to the vegetation.The false positives are noticeably more in the Data 2 than in the Data1 because the fence surface is not flat and generates more noise.

Figure 10 .
Figure 10.Vegetation extracted by the proposed method.(a), (c), (d) Trimmed vegetation, (b) canopy of the tree.(e) shrub with the needle leaves and (f) ivy.