WHEAT EAR DETECTION IN PLOTS BY SEGMENTING MOBILE LASER SCANNER DATA

The use of Light Detection and Ranging (LiDAR) to study agricultural crop traits is becoming popular. Wheat plant traits such as crop height, biomass fractions and plant population are of interest to agronomists and biologists for the assessment of a genotype's performance in the environment. Among these performance indicators, plant population in the field is still widely estimated through manual counting which is a tedious and labour intensive task. The goal of this study is to explore the suitability of LiDAR observations to automate the counting process by the individual detection of wheat ears in the agricultural field. However, this is a challenging task owing to the random cropping pattern and noisy returns present in the point cloud. The goal is achieved by first segmenting the 3D point cloud followed by the classification of segments into ears and non-ears. In this study, two segmentation techniques: a) voxelbased segmentation and b) mean shift segmentation were adapted to suit the segmentation of plant point clouds. An ear classification strategy was developed to distinguish the ear segments from leaves and stems. Finally, the ears extracted by the automatic methods were compared with reference ear segments prepared by manual segmentation. Both the methods had an average detection rate of 85%, aggregated over different flowering stages. The voxel-based approach performed well for late flowering stages (wheat crops aged 210 days or more) with a mean percentage accuracy of 94% and takes less than 20 seconds to process 50,000 points with an average point density of 16 points/cm. Meanwhile, the mean shift approach showed comparatively better counting accuracy of 95% for early flowering stage (crops aged below 225 days) and takes approximately 4 minutes to process 50,000 points.


INTRODUCTION i. Background
LiDAR (Light Detection And Ranging) or laser scanning technology has been identified to hold high potential for meeting the demands of next generation phenotyping (Lin, 2015).This is attributed to the availability of LiDAR systems with small footprint and high pulse emission frequency and their capability to provide high-throughput plant traits.These systems provide robust data in varied illumination conditions and effective reconstruction of the in-field 3D crop architecture.Understandably, the use of laser scanners, both stationary and robot-mounted scanners, for field crop monitoring is becoming common for crop-height measurement and biomass estimation (Garrido et al., 2015;Hofle, 2014;Koenig et al., 2015).However, these applications have not yet exploited the full potential of LiDAR data i.e. 3D point clouds.It could be extended to plant population estimation.
Figure 1.A sample 3D point cloud acquired over a wheat plot.
The random spacing between plants, irregular orientation of the ears and noisy air returns make individual ear detection a challenge.
The number of wheat ears per unit area, which is an indication of the plant population, can be obtained by manual, semi-automated and automated techniques.Even though it is favorable to incorporate automatic counting of wheat ears, it is a challenging task due to the complex crop architecture with close plant spacing and high extent of overlap (LemnaTec, 2015) as can be seen in Figure 1.Moreover, the automatic technique should be able to handle the changing size and orientation of the wheat ears with the developmental stage of the plant as shown in Figure 2.

ii. Related Work
Few automated image processing techniques have been proposed to count wheat ears using 2D images from Charge-Coupled Device (CCD) cameras by applying texture-based classification in hybrid space (Cointault et al., 2008) or by high pass Fourier filtering (Journaux et al., 2010).The counting accuracy from 2D cameras is constrained by the illumination conditions at the time of image acquisition and undetected wheat ears due to overlap and obstruction by other plant organs.Deery et al. (2014) showcase the use of rasterized elevation images to count the number of wheat ears by applying a simple particle count algorithm on the segmented image.This approach does not perform well for fields with high crop density and overlapping ears.
The use of terrestrial laser scanners to estimate wheat crop density was demonstrated by Lumme et al. (2008).A detailed methodology and validation with reference dataset was not presented and the authors conclude that a laser scanner mounted on a mobile platform could be used as a tool in precision farming.Saeys et al. (2009) developed a method to estimate wheat crop density by 3D reconstruction of an artificial canopy set-up.They extracted the ear layer by fitting a "thin-plate smoothing spline" and generated a point density image from which the approximate location of the ears was identified.This method was found to be computationally intensive due to the fitting of the spline.Also, the method was demonstrated on an artificial canopy, where overlap among the adjacent plants is minimal.
Thus, manual counting is still widely used in practice.However, it is a tedious and labour intensive method and obviously cannot meet the needs of high-throughput phenotyping.Hence, 3D point clouds from LiDAR sensors could be used as an alternative for the automatic counting of wheat ears as they help to overcome the limitations of the existing methods owing to the availability of the depth information and reliability in all illumination conditions.This study presents two methods (segmentation followed by classification) developed to extract wheat ears from the LiDAR 3D point clouds.
Point cloud segmentation may be defined as the clustering of points based on their properties and spatial distribution to form homogenous regions.In most scenarios, segmentation is an integral step to recognize objects in the point cloud which determines the amount of useful information retrieved (Wang & Shan, 2009).Hence, the application and the objects present in the point cloud should determine the choice of segmentation technique.
There are two basic design mechanism to segment point clouds.
The first approach is based on methods that have mathematical model assumptions or geometric reasoning such as model fitting, probability density estimators or region growing.Though these methods achieve quick results for simple scenarios, they are sensitive to noise and exhibit poor performance in complex scenarios.The other approach is based on machine learning algorithms trained to classify the different object types in the scene (Nguyen & Le, 2013).A review of various segmentation techniques has been presented by Vosselman et al. (2004) and they are categorized based on the surface being extracted.For the extraction of smooth surfaces, they have suggested region growing, scan line segmentation and connected components in voxel space.
For the segmentation of wheat plant organs, literature reporting methods for the detection of individual trees from airborne laser scanner and crop detection were reviewed.Two methods that do not require labeled training samples and can identify irregular structures from dense point clouds were chosen as follows: • Voxel-based connected component: Voxel-based tree detection (Hosoi & Omasa, 2006) has been demonstrated for the estimation of forest parameters.Another popular application of voxelization of 3D point cloud is to extract the skeleton of trees (Bucksch et al., 2009) and plants (Ramamurthy et al., 2015).These works demonstrate that the voxel approach is adaptable to suit the object of interest.

•
Mean shift segmentation: The mean shift approach (Fukunaga & Hostetler, 1975) uses a non-parametric density estimator function to look for modes in the data.This was first used in computer vision by Comaniciu & Meer (2002) where they demonstrated the applicability of mean shift in image segmentation, to look for clusters in the feature space of images.Since then, this method has gained popularity and has also been demonstrated efficient on 3D point clouds.
Thus, these two segmentation methods were incorporated in the ear detection methods and their performances were compared in this paper.The following section describes the steps involved in the extraction of the wheat ears.The third section presents the datasets and design of the experiments used for the comparison of the two methods.Section 4 contains a short discussion on the results and the final section presents the conclusion and future works.

METHODOLOGY
The procedure to extract the wheat ears from the point cloud involves three main steps as follows: i. Filtering of noisy points ii.Segmentation of the point cloud iii.Classification of the segments

i. Filtering of noisy points
An initial filtering step was designed to remove the outliers placed either high above the plant canopy or too low below the ground level.The next step was designed to deal with the points floating around the objects in the scene that are "air returns" or "ghost returns" owing to the mixed edge effect.The mixed-edge effect occurs when a laser beam is intercepted by the edge of an object (Van Genchten et al., 2008).The beam splits and thus two signals reflected by two different objects are sent back.The receiver records an averaged distance between the two received signals and stores it as a point, which does not actually exist.In our case, the presence of "ghost returns" is expected due to the mixed-edge effect owing to the beam divergence varying with range and the densely occurring leaves, ears and soil particles.
These "ghost return" points surrounding the objects were removed in the first step of the segmentation process.This removal was carried out based on the neighboring point density, and was handled in different ways for the two short-listed segmentation methods.

ii. Segmentation
The following two segmentation techniques were adapted to identify and remove the air returns due to multi-edge effect.The ear-classification procedure that was developed as a part of this study was applied on the segmentation output.

A. Voxel-based connected component
In this approach, the point cloud is split into equal sized voxelcubes and the number of points within each voxel-cube is calculated.Based on the point density and the objects being Table 1 User-defined parameters used in voxel-based segmentation studied, only the voxels with number of points above a threshold value are considered for further processing.This approach is suitable to delineate individual plants in the canopy layer by removing overlapping points from adjacent plants and segment the ears by using a connected component analysis.The following steps were carried out in Matlab to segment the input point cloud as shown in Figure 3 and the user-defined parameters are explained in Table 1.
a. Voxel definition: A voxel coordinate system is defined for the point cloud with equal sided voxels and origin at (Xvox, Yvox, Zvox) = (0,0,0).The position of each point in the voxel space was calculated.b.Voxel-based thinning: For each voxel, the number of points inside the voxel is counted.For further processing, only the voxels which contain more than one point and which have at least one direct neighbor that contain more than a user-defined minimum number of points were selected.This thinning process ensures only voxels that belong to plant organs with high point density are selected for further processing.c.Connected components analysis: On the voxels selected from the thinning process, a connected components analysis was performed.This clusters points belonging to a single object and assign a unique segment number to it.
d. Second iteration of thinning and segmentation: In areas of high overlap, the objects appear connected and result in two or more ears being clustered together as a single segment.In order to address these cases of undersegmentation, a second iteration of point cloud thinning followed by connected components segmentation is carried out with half of the original voxel size thinning threshold.
Figure 3 Flow-chart depicting the steps involved in the voxelbased connected components segmentation.

B. Mean shift segmentation
Mean shift segmentation method, which is point based uses a probability density estimator function to search for modes in the data and cluster the points that fall within function for the kernel bandwidth.The only user-defined parameter is the kernel bandwidth that is defined based on the characteristics of the object being segmented (Melzer, 2007).The following steps as shown in Figure 4 were carried out: a. Filtering: The points hovering over the canopy and belonging to the stem region were removed based on point density.For each point, the number of neighbors within an imaginary 3D cylindrical neighborhood of (diameter = 3 cm, height = 6 cm) is calculated.Only points with more than 30 neighbors in this cylindrical neighborhood are used in further processing.
The dimension of the cylindrical neighborhood was decided based on a trial and error method which helped to identify the ideal dimensions and the range of neighboring density within which wheat ears can be approximately identified.An analysis of manually extracted ear segments showed that a wheat ear contains at least 30 points and thus, the threshold of 30 neighbors was used.b.Connected components: Next, a connected components analysis was performed on the remaining point cloud to identify under-segmented blobs of connected points.This rough connected component step based on proximity and neighborhood definitions reduces the processing time for mean shift segmentation.

Userdefined Parameter
Definition voxel-side • Length of the side of voxel cubes (1 cm was used in this study); • Should be fixed according to the average spacing between the plants in the field.

thinningthreshold
• Threshold for points removal for voxelbased thinning (threshold of 2 points for voxel-side = 1cm) • Should be fixed depending on the size of the voxels; a larger threshold should be used for bigger voxel sizes. smallsegmentthreshold • Threshold for removing small segments (the minimum number of points per segment to be considered as ear segment) The average number of points per ear was always found to be higher than 30.Hence, segments with less than 30 points could be considered as non-ear segments.However, a threshold of 15 was used so as to consider the thinning of the point cloud.
c. Segmentation: For each component, mean shift segmentation was carried out using the (X, Y, Z) coordinates with the help of existing software from ITC, Enschede.The (X, Y, Z) coordinates of the points were used as input parameters in the algorithm.The kernel bandwidths were selected as X = Y < Z following a prolate spheroid, to approximate the average size and shape of a wheat ear.
Table 2 User-defined parameters used in mean shift segmentation

iii. Classification of Segments
After segmenting the point cloud using either the mean shift method or the voxel-based method, we are left with unlabeled segments that could be ear, leaf or stem.Hence, the following strategy was developed to distinguish the ear segments and the parameters used are listed in Table 3.
a. Top-most segment selection: Since the wheat ear is present at the top of the canopy, the first step was to extract the topmost segments in the segment-space.These top segments were labeled as ears.For the segments below, the extent of overlap with top ear segment is calculated.If the overlap percentage is above a certain value, say 50%, then the segment is assumed to be directly below an ear segment and permanently labeled as a non-ear segment.
In some scenarios where the ears are bent and overlapping, as highlighted in the green box in Figure 5     b.Height Thresholding: After the above steps, some of the non-ear segments might still be wrongly labeled as ear (an example as shown in the orange box in Figure 5 (b)).On analyzing the height histogram (Figure 6) of the segments labeled as ears, two distinct peaks could be identified; one peak corresponding to the ear segments and the other one to the non-ear segments.This is because there are always more points describing the ear segments since they are present at the top of the canopy.Hence, in most cases, they are not overshadowed by other plant parts.
Hence, identifying the trough between the two peaks of the height histogram will aid in correct labelling of the ear segments.This height threshold is selected using Otsu's threshold selection method (Otsu, 1979) for histograms.The segments that lie below this height threshold but were classified as ear in the previous steps are reclassified as non-ear segments.No changes are made to the labels of the segments above this height threshold.The orange boxes in Figure 5 (b, c) highlight a mislabeled ear segment being corrected in the thresholding step.

i. Datasets
The laser scanning survey was conducted on three micro-plots, roughly of size 10 m x 2 m, in Gréoux, France, by Arvalis (A technical institute specialized in cereals) and INRA (National Agricultural Research Institute).Three plots sown with three different varieties of wheat on 29 October 2015 and subject to different irrigation treatments were selected for the survey.In order to have a time series, the survey was conducted on three different dates with the crops aged 194 days, 209 days and 225 days (May 10; May 25; June 10, 2016 respectively), each 15 days apart, on the same micro-plots.
The datasets were acquired with a SICK LMS 400 LiDAR (SICK Germany, 2007) sensor mounted on an unmanned ground vehicle (UGV), with nadir looking.This time of flight laser scanner has a scanning frequency of up to 500 Hz with an operating range of 0.7 m to 3 m and a systematic error of ± 4 mm.The scanner uses visible red light at 650 nm as its light source.Along with the LiDAR observations, wheat ear density estimates from manual counting in the field were also provided.To check the segmentation accuracy, reference wheat ear segments were extracted and labeled manually using CloudCompare.The datasets provided were acquired from a height of 2.1 m above the ground and had a point density of 16 pts/cm 2 .

ii. Design of the accuracy assessment
From the point clouds acquired on the three crop developmental stages, i.e. over crops of age 194, 209 and 225 days, a subset corresponding to the same area was taken from the three plots.From these 9 subsets, ear segments were manually segmented and labeled by 6 operators using CloudCompare.These manual segments were used to evaluate the correctness and completeness measures of the two methods (shown in Table 6).In order to check the variability between the operators, one common dataset (crops of age 225 days from Plot A) was provided to them.Table 4 shows the number of ears identified by different operators for the same subset.In addition to these manual labels, reference ear density for each plot was made available from manual counting in the field.This was used to calculate the Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE) shown in Table 5.
iii.Ear detection results

a. Voxel-based ear detection
The thinning of point clouds in the voxel-based segmentation results in segments with much lower number of points as compared to the mean shift segmentation.From Figure 7 (b), it is seen that most of the noisy points surrounding the ears in the original point cloud have been removed.This results however in the removal of few ear points as well.This ensures that the ear segments are distinctly identified even in regions of overlap.In Figure 7 (c), we see the finally extracted ear segments in red, while the non-ear segments are displayed in blue.It is noticed that the voxel-based method does not retain the size of the original ear and hence if used to estimate the size of the wheat ear, it will lead to underestimation.Thus, the thinning threshold should be reconsidered if the size of the wheat ear is needed for the study.However, for plant population estimation the size of the ear segments is not a concern.

b. Mean shift ear detection
The segmentation results from the mean shift segmentation are representative of the raw point cloud i.e. the segments are not thinned and the size of the segments is retained.From Figure 8 (b), we notice that the segments retain the majority of the points from the raw point cloud and hence are mostly noisy.Thus, the mean shift based ear detection method can be directly extended to applications such as ear size estimation, without severe underestimation.

iv. Accuracy
The fitness of the two ear-detection methods was evaluated by comparing the number of ears observed by the automated methods with the reference dataset prepared by manual extraction.The following error metrics were used to evaluate the error associated with the detection of number of wheat ears per plot.

RMSE:
The ear detection method being evaluated might produce good estimates for a particular plot while returning poor estimates for another one.Therefore, RMSE was chosen to represent the collective error associated with the ear count over different plots and plant developmental stages as it is sensitive to outliers (Chai & Draxler, 2014).

MAPE:
This gives a measure of the overall performance of the ear detection algorithm by providing an averaged absolute precision in terms of percentage.The MAPE provides an estimate that is independent of the spatial extent of the plot.Hence, using the MAPE in combination with RMSE gives an overall understanding of the performance of the algorithm across different plots.
Table 5 Error in ear counts from the automatic detection methods calculated using the manual counting from the field as reference.In addition to the RMSE and MAPE, we used two more metrics, Correctness and Completeness, that have been used in literature (Y.Wang et al., 2016;Yao et al., 2014) to evaluate the performance of object detection algorithms.These were calculated using the manually extracted segments as reference.The voxel-based detection method had a comparatively higher Correctness value of 80.15% while compared to the 60.57% of the mean shift method as seen in Table 6.A similar trend is noticed for the Completeness value for which the voxel-based method has a 90.27% while the mean shift has only 83.33%.This indicates that the voxel-based approach is more successful than mean shift in identifying ear segments present in the scene with a relatively lower number of false ear segments.

v. Analysis of Detection Results
The following three criteria were used to understand the robustness of the two ear detection methods developed in this study.The accuracies stated were calculated using the manual counts from the field.
• Crop developmental stage: The size and orientation of the wheat ear changes throughout its flowering stage (refer to Figure 2).Thus, using the datasets available for three different developmental stages of the same plot, the effect of crop growth on the ear detection rate was evaluated.The voxel-based method performs better for the crops aged 209 days and more with an average percentage error of 5.2% and 8.03%.Whereas, the mean shift-based approach shows relatively better results for the crops of age 209 days and more, with an average percentage error of 3.8% and 6.68% calculated by using the manual counts from the field as reference.Thus, it is concluded that the crop developmental stage influences the performance of the ear detection methods.If the age of the plants is known, then the input parameters should be set accordingly for optimal performance.

•
Wheat variety and irrigation treatment: The variety of wheat and water stress are found to influence the size of the wheat grains (Sionit, Hellmers, & Strain, 1980).Hence, the ear detection methods developed were tested on three different varieties of wheat subject to different irrigation treatments.This helped to understand how the methods perform for plots with varying ear density.From this analysis, it was concluded that adaptive selection of segmentation parameters is advisable.

•
Effect of point density: Point density is one of the important factors that influence the choice of input parameter values and ear detection rate.Hence, to understand how the ear detection methods perform for different point densities three degraded versions of the datasets was created by retaining 75%, 50% and 25% of the original points in a random manner.It was found that the ear detection rates for both the methods were consistent for up to 75% of the original point density.However, modifying the parameters according to the point density improves the performance of the ear detection methods.

DISCUSSION
After aggregation of the MAPE values in Table 5 over different plots and developmental stages, it was observed that the voxelbased ear detection method had a lower value of 14.90% whereas the mean shift-based ear detection had a relatively higher value of 41.00%.It should be noted that before aggregation over different developmental stages, the voxel-based method had an average MAPE of 6% for crops aged 209 days and above.Whereas, the mean shift based method had an average MAPE of a very low 5% for crops of age 209 days and below.
This detection rate is an improvement when compared to the 0.9 correlation with 40 training points reported for image-based ear detection by Cointault et al. (2008).In a different study, Saeys et al. (2009) reported a 94% ear detection rate while using LiDAR observations with optimal acquisition settings.However, this was demonstrated on artificially constructed canopy and the proposed method was computationally very expensive.

CONCLUSIONS AND FUTURE WORK
In this study, we designed, analyzed and evaluated two approaches for the detection of wheat ears from laser scanned 3D point clouds.The task of ear detection was split into two stepssegmentation and classification of segments.For segmentation of the point cloud, voxel-based segmentation and mean shift segmentation methods were chosen and implemented.For classification of the segments, an ear classification methodology was developed taking into consideration the position of wheat ears in the canopy, overlap between plants and presence of stunted plants within the canopy.
The performance of these two ear detection methods was then assessed based on developmental stage of the crop, wheat variety and point density.

•
The voxel-based approach performs well for late developmental stages with a comparatively very short computational time.

•
Meanwhile, the mean shift method performs well for different point densities and gives reliable results for early developmental stages and has a higher processing time.
Since both the ear-detection approaches are in some way influenced by the point density of the dataset, the input detection parameters need to be adapted to achieve good count results.
Another interesting observation was the variation in the number of ears extracted by different operators by manual segmentation which focuses on the fact that ear detection is a difficult task even for a human operator.
In the following works, the use of adaptive kernel bandwidth and adaptive voxel sizes for segmentation of the point cloud should be investigated.Experiments should be done to utilize RGB images to improve the ear classification performance.Depending on the available prior knowledge regarding the crops studied and accuracy required for the application, one of the two ear detection approaches may be selected.Thus, through this study, two ear detection techniques that depend solely on the 3D coordinates of the points describing wheat ears have been developed and compared.

Figure 4
Figure 4 Work-flow involved in mean shift segmentation illustrated with a sample point cloud.
(a), the ear segments from the shorter plant initially tend to get mislabeled as non-ears.However, reconsidering these segments based on the extent of their overlap with the topmost segments ensures that they are correctly relabeled as ear segments as shown in the green box in Figure5(b).

Figure 5
Figure 5 Example to illustrate the steps involved in ear classification.Ear segments are displayed in red while the non-ear segments are displayed in blue (a) Ear segments identified by searching for the top-most segment in segment space.The green box highlights a case of mislabeling due to overlap.(b) Relabeling nonear segments depending on extent of overlap with the segments above.The orange box focuses on a case of mislabeled non-ear segments.(c) Final segment labels assigned based on height thresholding selected by Otsu's method which identifies the trough between two distinct peaks in a histogram.

Figure 6
Figure 6 Height histogram showing the two distinct peaks between the ear and non-ear segments.The line at 0.727 m denotes the threshold chosen from Otsu's thresholding.

Figure 7
Figure 7 Voxel-based ear detection results.The size of the ears is reduced due to point cloud thinning (a) Raw point cloud, color coded based on height value.(b) Voxelbased segmentation results with each segment displayed using a random color (c) Ear classification where the red segments are the extracted ears (crops of age 225 days).

Figure 8
Figure 8 Mean shift-based ear detection results.The size of the ears is retained (a) Raw point cloud, color coded based on height value (b) Mean shift-based segmentation results with each segment displayed with a random color (c) Ear classification where the red segments are the extracted ears (crops of age 225 days).
= Number of detected by the automatic method   = Number of ears identified by manual counting N = Number of datasets used in the evaluation

Table 3
Parameters used for the ear classification

Table 4
Variability among different operators in manually extracting the ear segments.The subjectiveness of manual segmentation and difficulty to distinctly identify ear segments for humans is observed.

Table 6
Error in ear counts from the automatic detection methods calculated using the manually extracted segments as reference (aggregated over different plots and developmental stages).