retrieving from terrestrial point cloud data: coupling computational geometry application and Gaussian mixture model clustering

: Leaf area index (LAI) is one of the most important structural parameters of forestry studies which manifests the ability of the green vegetation interacted with the solar illumination. Classic understanding about LAI is to consider the green canopy as integration of horizontal leaf layers. Since multi-angle remote sensing technique developed, LAI obliged to be deliberated according to the observation geometry. Effective LAI could formulate the leaf-light interaction virtually and precisely. To retrieve the LAI/effective LAI from remotely sensed data therefore becomes a challenge during the past decades. Laser scanning technique can provide accurate surface echoed coordinates with densely scanned intervals. To utilize the density based statistical algorithm for analyzing the voluminous amount of the 3-D points data is one of the subjects of the laser scanning applications. Computational geometry also provides some mature applications for point cloud data (PCD) processing and analysing. In this paper, authors investigated the feasibility of a new application for retrieving the effective LAI of an isolated broad leaf tree. Simpliﬁed curvature was calculated for each point in order to remove those non-photosynthetic tissues. Then PCD were discretized into voxel, and clustered by using Gaussian mixture model. Subsequently the area of each cluster was calculated by employing the computational geometry applications. In order to validate our application, we chose an indoor plant to estimate the leaf area, the correlation coefﬁcient between calculation and measurement was 98.28%. We ﬁnally calculated the effective LAI of the tree with 6 × 6 assumed observation directions.


INTRODUCTION
Leaf area index (LAI) is a key variable used to model many physical processes, such as canopy photosynthesis and transpiration.It determines the size of the plant-atmosphere interface and thus plays a critical role in the exchange of energy and mass between the canopy and the atmosphere (Weiss et al., 2004).Therefore, obtaining LAI effectively and accurately by using remote sensing data has been discussed for decades.
LAI is first defined as one-sided of photosynthetic tissue per unit ground horizontal area (Watson, 1947).At the beginning of the LAI studies the canopy structural parameters are measured directly.It is accurate, however, time consuming and destructive to plants (Lang and Xiang, 1986).According to the botanical structure of the vegetation, we only consider the green plants on the ground; leaves from the canopy are with low visible light reflectance except of green light.Theoretically it is possible to retrieve the leaf area from the canopy using optical sensed data.Since 1960s people attempt to indirectly retrieve canopy structural parameters from optical images (Jonckheere et al., 2004).Several ground-based commercial instruments for LAI collection have been testified, the advantages and drawbacks of each instrument have been presented as well; moreover the potential of the laser instruments for LAI measurement has been pointed out acutely (Welles and Cohen, 1996).Landsat TM data and ground-based measurements of LAI have been correlated for estimating effective LAI, which is the product of LAI and clumping index.Some issues have been discussed for the LAI estimation, which were correlated to the LAI retrieving from Landsat images in spite the limitations of TM image (Chen and Cihlar, 1996).
The development of the optical sensor resolution improves LAI measurement accuracy and mathematical modeling.An applica- * Email:shengye.jin@yahoo.com;Tel/Fax: +81 075-383-3303.tion with spectral and textural information was used for mapping the LAI of different vegetation types, has been presented by using IKONOS images (Colombo et al., 2003).Moreover when the land cover is stratified, spectral and textural relations are considerably better for mapping LAI spatial variability for different crop types.Encouraging result in LAI estimation by using optical data has been obtained via data fusion process for high spatial resolution images of the LAI.For achieving reliable result over a large area, LAI measurement with auxiliary variables such as NDVI computed from Landsat images have been combined (Hernndez et al., 2014).However, due to the assumption of randomly located foliage elements within the canopy, LAI obtained from gap fraction (Pearcy et al., 1989) theory is not the actual LAI (Zheng and Moskal, 2009), thus effective LAI has been proposed to describe the result more accurately, which is obtained by considering the clumping index affection.
Recent studies propose to utilize laser scanner for LAI estimation since it holds good performance at range detection and the consistency of the precision.Morsdorf and his colleagues evaluated the potential of deriving effective LAI proxy by estimating a fraction of first and last echo types inside the canopy.Based on the regression results, LAI has been computed for a region, and compared qualitatively to the result based on imaging spectrometry, revealing similar spatial patterns and ranges of values (Morsdorf et al., 2006).Effective LAI has been calculated by referring empirical model from airborne LiDAR data.Due to the narrow scan angle of the LiDAR, the parameters of the model have been estimated from the field investigation (Korhonen et al., 2011).An accurate application has been developed for calculating leaf area density (LAD) and cumulative LAI profiles of small trees (Hosoi and Omasa, 2006).This voxel based 3-D modeling application is implemented in the study.This model allows the LAD and LAI of trees, which have complex spatial distributed foliage, to be computed by direct counting of the beam-contact frequency in each layer using a point-quadrat method.The non-photosynthetic tissue has to be removed as redundancy in the application so that the damage to the plant cannot be avoided.A computational geometry based application has been published for retrieving the effective leaf area using terrestrial laser scanner (TLS) data (Zheng and Moskal, 2012).Due to the high sampling density of TLS, this method can be taken as a calibration tool for airborne laser scanning and optical sensing.PCD have been discretized into slides and the leaf area within each one has been calculated, but the non-photosynthetic structure still cannot be removed.Nonphotosynthetic tissue removing is a challenging issue of the laser scanner utilized LAI estimation study, especially for the canopies which are not dense.
As a consequence of the 3-D scanners development, the problem of high quality geometrical characteristic extraction from dense PCD is receiving increased attention (Fabio, 2003).It is possible to distinguish leaves from non-photosynthetic structure (branch and trunk), if one considers their geometry pattern features from PCD.One leaf is a patch of an infinite surface, yet the non-photosynthetic structure could be taken as the integration of cylinder segments.Surface curvature is considered as the bending and distortion of surface patches, which holds the potential to distinguish between leaves (distorted like a hinge) and branches (composed from cylindrical segments).Intuitively, the branch curvature should be smaller and more regular than that of leaves, while the curvature of leaf surfaces should be sensitively changed according to the leaf bend.Removing the nonphotosynthetic structures of the canopy is thus constrained by only geometric properties that differ from leaves, though other issues can be overcome.
PCD can be considered as the aggregation of multivariate distributed clusters according to the object location in Cartesian space.If the point clusters in a sub-space, a voxel, can be distinguished, the PCD can be classified into several clusters, which hold significance for leaf area retrieval.Leaf area calculation can be achieved by using computer scientific libraries like Computational Geom-etry Algorithms Library (CGAL).For clustering the points we choose the clustering analysis packages from R platform based on Gaussian mixture model, which is an iterative expectationmaximization (EM) processing method for determining maximum likelihood via a statistical model (Reynolds, 2009).We here focus on the feasibility of applying this to PCD clustering, rather than discussing a detailed statistical algorithm for the Gaussian mixture model.The Gaussian mixture model is an option for handling PCD clustering that can provide an unsupervised solution.After resolving the clusters, the complexity of PCD is highly reduced.Leaf is considered as a fitting plane which is computed from computational geometry algorithms.Leaf surface is reconstructed according to the points by using triangulation algorithm, which is a commonly employed technique of computational science.
This study aims to investigate the feasibility for retrieving the effective LAI from PCD, which using the combination of computational geometry application and distribution-based clustering algorithm.The data processing is as follows: first, we estimate the PCD curvature and set the curvature threshold for filtering the non-photosynthetic structure.Second, we discrete the filtered PCD into voxel matrices that can hold even the largest leaf, then cluster the PCD in each voxel according to the Gaussian mixture model.Third, we reconstruct the surface of each cluster in each voxel by using the computational geometry algorithm.The accumulation of the entire canopy clusters area is taken as the effective leaf area of the tree, which is calculated according to the different observation geometry.The effective LAI is finally calculated which is the ratio of the effective leaf area of the canopy and the volume from the convex hull of the entire canopy.
Five parts are presented in this paper.In the second part we introduce the data collection, theoretical foundations and data processing.Thirdly, we show some results, including the validation experiment and retrieved effective LAI of an isolated broad leaf tree.We discuss several issues related to the study in the fourth part; finally a conclusion will be made.

Data Collection
A well-isolated broad leaf tree (Cinnamomum Camphora, Figure 1a ) was used in this study, which is located in Kyoto University Katsura Campus (N34 • 58 55.61 , E135 • 40 43.54 ) in Kyoto Prefecture, Japan.We set six directions around the tree for collecting data.Raw data were assigned to an assumed coordinate system, in which the original point was set to the first station (ScanPos001).Cartesian coordinates (x, y, z) units were in meters (m) (Figure 1b).Both vertical and horizontal scanning intervals were set to 0.02 • .

Methodology
2.2.1 Simplified Curvature Estimation: PCD give the average Cartesian or polar coordinates of the object surface patch.The curvature of a point can therefore be estimated by considering the normals of its neighboring points.Assume there is a point set P N * which contains N points, whose element pi * is the 3-D coordinates (x, y, z) of the Cartesian space.The amount of the point is determined by both vertical and horizontal scanning intervals.The set of neighboring points Pn is determined by the searching radius r centering on the query point pq, which is Pn = {pi : pq − pi 2 ≤ r} , (i = 1, 2, ..., n).A simplified curvature cq of point pq is given by (Rusu, 2009): where λj(j = 1, 2, 3) is the eigenvalue given by Cov (2) 2.2.2 Gaussian Mixture Model: Voluminous points amount and complexity are the principal property of PCD.PCD carry the Cartesian coordinates from the patch of an object interacting with the laser impulse by receiving objects surface echo.Points therefore correspond to heterogeneous distributions, like multivariate clusters, of discrete objects within a sample space according to their positions.Exploring constructional information is one of the majority purposes for PCD users.We used the mclust package of the R platform to cluster the point set observations for reducing the complexity of the PCD.It provides functions for model-based clustering (covariance parameterization and number of clusters selected via Bayesian information criterion (BIC)), which gives the maximized loglikelihood for the number of components in the data.In general, the larger value of BIC, the stronger the evidence for the number of clusters (Fraley et al., 2012).It attempted to iterate group sets of objects according to a rule stating that objects within the same cluster were more optimized to each other than those in other groups.The detailed algorithm explanation was presented by the articles (Fraley andRaftery, 1998, Fraley andRaftery, 2002).

Leaf Orientation Retrieving:
This study assumed that one leaf is a fitting plane retrieved from point cluster which was obtained by mclust (see 2.2.2).We chose the principal component analysis routine from CGAL to calculate the plane (Alliez et al., 2014).Suppose the plane equation we computed is: Ax + By + Cz + D = 0.The orientation of the plane can be calculated by: where np 2 = √ A 2 + B 2 + C 2 , θL and ϕL are the zenith and azimuth angle of the plane normal, respectively.Leaf normal calculation is a key step to retrieve the leaf area we discuss in following section.

Effective Leaf Area Calculation:
In order to find the geometrical boundary of the cluster, the 3-D convex hull function of CGAL for generating the convex hull, which contained all the points of the cluster, was selected (Hert and Schirra, 2014).Assume the convex hull contains m points as vertices, the leaf area can be approximately calculated as the topological closure of the points that vertices projected on the fitting plane which we computed from section 2.2.3.Suppose PV = {p k (x k , y k , z k ) : k = 1, 2, ..., m} are the vertices projection on the fitting plane, the center pv(xv, yv, zv) of PV can be calculated by (2).If the points are ordered sequently, according to the triangulation mesh function of CGAL, the leaf area could be computed by accumulating the triangles: where s l is the triangle area constructed by pv, p k and p k+1 .When k = m, p k+1 = p1.Those points inside the closure are skipped.The effective leaf area S * can be computed by considering the projection coefficient (Ross, 1981): where rv(θv, ϕv) and rL(θL, ϕL) are view direction and leaf normal, respectively.

Data Processing
Effective leaf area retrieving algorithm was designed as follows (Figure 2).We calculated the simplified curvature of each point, see (1), and then defined the searching radius r (r = 4cm) according to the average long axis of the leaves.Points whose cq > T1 (T1 = 0.21) were taken as D1, which were mostly from leaves, and points whose cq ≤ T1 were taken as D2 (Figure 3c).In D2, there were points from leaves as well, which should be restored to D1. Hence we detected the D2 points within a sphere (the radius is T2 = 0.01m), which centered at each point of D1.The point(s) in D2 located in each sphere were merged to D1 as leaf data (Figure 3d).The leaf data were voxelized with each lattice size 8×8×8 cm, which could contain the maximum leaf size.The data within each voxel were clustered according to section 2.2.2.Those clusters containing less than thirty (T3 = 30) points were rejected due to the possibility of low reliability.The leaf area were calculated for each cluster according to section 2.2.4,and then the accumulated area for entire canopy was calculated.
Finally the effective LAI was computed according to proposed 6 × 6 observation directions.

RESULTS AND VALIDATION
Raw data points amount was over 9.7×10 6 (Figure 3a).After curvature estimation (Figure 3b), 76.13% data were rejected according to the threshold, many of them were from leaves.Therefore, we employed the method in 2.3 to restore the excessively removed points.The result of rejected data and leaf data are shown in Figure 3c and 3d, individually.The leaf data accordingly increased to 52.33% and the opposite data were 47.67%.The leaf data were effectively restored.
The algorithm of data processing within a voxel is indicated in Figure 4.The initial condition of the data in the voxel is shown in Figure 4a, which is out-of-order arranged.After mclust function (see 2.2.2) processing, points were distinguished into four clusters: c1, c2, c3 and c4 (Figure 4b).We picked one (c3) of them as an example for retrieving the cluster area, red plane (Figure 4c) was obtained by using PCA function of CGAL (see 2.2.3).First, we generated the convex hull of this cluster by using 3-D convex hull generation function from CGAL, then obtained the vertices of this convex hull.Vertices were projected on the fitting plane of the cluster and those points within the convex hull were removed.
Finally the area of this cluster was calculated by accumulating the triangles which are constructed from each two vertices and the center (red point in Figure 4d).The volume of this canopy was calculated as V = 33.918m 3 from 3-D convex hull as well.
For the light travel length we assumed the canopy was contained by a sphere, according the maximum and minimum coordinates of the PCD, we calculated the effective LAI by: where D is the average light traveling distance, η is the effective leaf area of the entire canopy, χ is the index of the voxel (from 1 to I), ω is the index of the cluster within the χth voxel (from 1 to J), S * χω is the effective area of each cluster within a voxel, ∆x = 4.64 m, ∆y = 4.61 m, ∆z = 4.79 m, respectively.
The effective LAI of the tree was calculated from 6×6 assumed observation directions (Table 1).It was indicated that our calculation can precisely provide effective LAI estimation according to the variation of the observation directions.Due to the lack of the resource we cannot accurately measure the total leaf area of the tree we selected.We assessed the application using an indoor plant (Figure 5a) and marked down nine well-isolated leaves before this plant was scanned (Figure 5b).We manually measured the area of the leaves to compare with the calculation results.A high correlation coefficient up to 98.28% was achieved (Figure 5c).

DISCUSSION
Curvature is a common parameter during PCD processing in computational geometry.Different from the Gaussian curvature and mean curvature, the simplified curvature we utilized in this study is the approximation of the change of curvature in a neighborhood Pn centered at pq, and it is invariant under rescaling.It can be computed conveniently as well.Small values indicate that the neighborhood points Pn are on the plane tangent to the surface (Rusu, 2009).This simplified curvature was an effective parameter for identifying whether the point was on trunk or leaf by using a properly defined search radius r.In our study we defined the searching radius as the average long axis of the leaves.However, we did not discuss how the searching radius affects the curvature.Different searching radius settings would definitely change the point normal result, which was determined by the surface complexity of the object.Smaller searching radius corresponded higher curvature fluctuation, and bigger searching radius causes that the distinction of leaf points and non-photosynthetic tissue points was not obvious.A future study for investigating the optimized searching radius setting method should be attempted so that to develop this application into an unsupervised function.
Our clustering strategy was to first discrete the PCD into voxel matrices, for reducing the computational complexity of the PCD.The voxel size was set according to the maximum single-leaf size observed in the canopy.Other applications for determining the voxel size should be considered in line with different leaf patterns.We then clustered the points within each voxel using Gaussian mixture model.Estivill-Castro has pointed out that one cannot precisely define clusters, so the clustering results almost certainly contain errors (Estivill-Castro, 2002).They showed weak correlations and the amount of these points were not big.Therefore we remove those clusters which held less than thirty points.It caused the loss of leaf area we retrieved from PCD, but the reducing area of thirty points is very small according to the scan- Effective LAI (m 2 /m 2 ) Zenith( ning resolution in our case.However, when the scanning interval and range are big the reducing of the leaf points should be cautiously considered.Our validation result showed a correlation coefficient of 98.28%, indicating that the indoor experiment presented high accuracy.Yet with the condition of wild plant and forest, the application should be demonstrated in several conditions.According to the pattern of the canopy in this study, which was approximately taken as a sphere, therefore the light travel distance was accordingly a constant in this case, for different con-ditions the light travel distance should be calculated individually.
This study adapted dense PCD with millimeter resolution, permitting reliable scanned points Cartesian coordinates.There should be at least three directional observations under similar conditions so that the whole canopy can be covered, with minimal disturbance from wind and beam absorption by the canopy.The PCD should be calibrated with geo-references for calculating the actual relative angle between leaf normal and observation angles for computing the effective leaf area.

CONCLUSION
This study retrieved effective LAI, a key canopy structure parameter, using terrestrial scanned PCD.It is an important factor in vegetation structural investigations and canopy radiative transfer simulations.In contrast to previous studies, we innovatively utilized simplified curvature as the threshold for filtering the non-photosynthetic components.PCD were clustered by mclust, which is a Gaussian mixture model based package of the R platform.An important aspect of this method was the manipulation of computational geometry algorithm for curvature estimation and effective leaf area retrieving.Good agreements between measurement and our calculation were obtained by conducting an indoor experiment of a plant.The correlation coefficient was in the range of 98.28%.Due to the sufficient accuracy of TLS, the results of this study could be taken as a validation of optical sensor based LAI estimation.In future research, we will attempt to apply our curvature-cluster application on investigating forest effective LAI.We believe that laser scanning based phytometric techniques have more advantages than many optical sensor based applications.With the development of laser scanning techniques, the efficiency and accuracy of TLS will be improved; and forest parameterization and investigation techniques will be substantially improved as well.
Figure 1: (a) The tree (Cinnamomum Camphora) used in this study.(b) Six stations for laser scanning, ScanPos001 is taken as assumed original point of the coordinate system we used.
• xj = λj • xj, Cov is the covariance matrix of Pn determined by Cov = p)(pi − p) T , xj is the eigenvector of Cov.p is the center of Pn, which is given by:

Figure 3 :
Figure 3: (a) The raw point cloud data of the tree with scale bar 1 m.(b) The curvature estimation of the raw data, scale bar shows in 0.9 m, color various according to the curvature, which ranged from 0 to 0.33.(c) The data (D2) contain branches after curvature filtering.The scale bar shows in 0.5 m.(d) Leaf data after merging leaf points from D2 and data D1; the scale bar shows in 0.5 m (see section 2.3).

Figure 4 :
Figure 4: (a) Point data with in a voxel.(b) After mclust clustering, the data were indicated into four subsets (c1, c2, c3, c4) each symbol indicates one cluster.(c) One cluster (c3) from (b), red plane is the fitting plane retrieved from PCA algorithm.(d) After CGAL convex hull generation, we obtain the vertices of the volume of the cluster, then project these vertices on PCA fitting plane of this cluster and remove those points within the closure.Finally the area of this cluster is calculated by accumulating the triangles which construct from each two vertices and the center (red point).

Figure 5 :
Figure 5: (a) The picture of the indoor plant for validation.(b) PCD of the plant, different color shows components of the plant.(c) Comparison between measurements and calculated each leaf area of the plant, which have a correlation coefficient of 98.28%.

Table 1 :
The effective LAI (m 2 /m 2 ) result which is calculated from 6 × 6 assumed observation directions.