PALM TREE DETECTION USING CIRCULAR AUTOCORRELATION OF POLAR SHAPE MATRIX

Palm trees play an important role as they are widely used in a variety of products including oil and bio-fuel. Increasing demand and growing cultivation have created a necessity in planned farming and the monitoring different aspects like inventory keeping, health, size etc. The large cultivation regions of palm trees motivate the use of remote sensing to produce such data. This study proposes an object detection methodology on the aerial images, using shape feature for detecting and counting palm trees, which can support an inventory. The study uses circular autocorrelation of the polar shape matrix representation of an image, as the shape feature, and the linear support vector machine to standardize and reduce dimensions of the feature. Finally, the study uses local maximum detection algorithm on the spatial distribution of standardized feature to detect palm trees. The method was applied to 8 images chosen from different tough scenarios and it performed on average with an accuracy of 84% and 76.1%, despite being subjected to different challenging conditions in the chosen test images.


INTRODUCTION
Remote sensing is widely used in different sectors of vegetation monitoring like forestry, agriculture, biophysical studies etc.It includes identifying, classifying, localizing, measuring sizes, heights, carbon yields etc., which describe the characteristics and the socio-economic and environmental importance of different vegetation and ease the associated institutions in planning and monitoring.
In agriculture, palm tree cultivation is one of the big sectors with a huge market value.Palm trees are used to produce a variety of products like vegetable oil, bio-fuel, papers, furniture, decorations, fodder for cattle etc.It also has to be mentioned that palm oil is the most widely used vegetable oil in the world.In 2011, palm tree plantation produced over 53 million metric tons of palm oil in 16 million hectares land.The global production of palm oil is increasing every year along with its demand.According to forecast, in Indonesia and Malaysia alone which represent over 90% of palm oil produced globally, the production will increase by 30% by the year 2020 (Economy et al., 2014).
In order to meet this demand, it is necessary to plan well during cultivation and to monitor different aspects of trees like proper inventory, health, size, heights etc., at different stages of their life.As palm tree cultivation zones are huge and difficult to be monitored visually on ground, remote sensing can play a vital role in delivering those data and analysis, cost and labor efficiently from the satellite or aerial images.Therefore, this study investigates on one aspect, more specifically localizing and counting palm trees for creating their inventory, as an initial step towards the multifaceted monitoring.
A few studies in this direction have already been published.Jusoff and Pathan (2009) used the hyperspectral image to map palm trees.The study implemented the linear spectral mixture analysis to discover contributions of different materials on hyperspectral pixel vectors, then used the mixed-to-pure converter that assigns those vectors to signatures and the minimum distance based formula to detect palm trees.Unfortunately, the results were not discussed in the publication.Hence, the performance of this approach is difficult to compare.In general, the hyperspectral imaging shows a coarser geometrical resolution than three-channel images and requires more expensive equipment.Malek et al. (2014) have formulated a very well described framework for 2D image processing and analysis, in order to automatically detect the palm trees in images captured with unmanned air vehicle (UAV).The method uses shift invariant feature transform (SIFT) as a feature and extreme learning machine (ELM) and level setting for classification and grouping respectively.It is followed by the texture analysis of palm trees with local binary patterns (LBP).The performance of the framework is in general admirably above 90%.In the method proposed by Shrestasathiern and Rakwatin (2014), they compute semi-variograms from the rank transformed vegetation index and use non-maximal suppression on them to detect the peaks that correspond to the locations of the palm trees.The normalized difference index (NDI), which is the normalized ratio between green and red spectral bands of the image, is used as vegetation index, in order to make the process usable even for RGB images.And rank transformation is used for removal of blurred object borders to increase the discontinuity between palm tree and background intensities.The performance of the method is stated as 90%.
In both cases, the study areas include plantation regions with palm trees strongly different compared to the background or similar objects.The regions consist of coarsely cultivated trees and a negligible number of artificial objects.In the case of a dense cultivation region, level set grouping tends to fail, as, in principle, it would be difficult to find the arbitrary boundary between the connected trees.Similarly, NDI might have a smooth transition between trees offering no discontinuity between objects and background.Palm trees, in general, have a very distinctive shape.The canopies of palm tree resemble the star-shaped objects on the aerial image.Therefore, we assume that using the shape information of palm trees canopies in detection algorithm could solve This paper proposes a generic shape feature, the circular autocorrelation of the polar shape matrix representation of the image and uses this feature to detect palm trees.The study uses linear support vector machine (SVM) to reduce the dimensions and standardize the CAPS and local maximum detection algorithm to detect the centers of palm trees canopy.

METHODOLOGY
The proposed method uses a sliding window approach for feature extraction.A window of fixed size translates over the entire image, where algorithm derives CAPS for each window, reduces its dimension to single scalar value (standardized feature) and assigns the standardized feature to the image pixel at the center of the window.It results in a spatial distribution of the standardized feature called feature map.The feature map consists of peaks on the palm trees with local maximum value at the center.Therefore, a local maximum detection algorithm is used on the feature map to detect palm trees.The local maximum detection algorithm results in object map, which is the spatial distribution of the centers of the palm trees over the image.The detailed description of the method is presented in following subsections.

Palm Tree Model
In orthophotos, the palm tree canopies resemble star-shaped objects with varying numbers of petals.Each palm tree grows 20-25 leaves every year (Food & Agriculture Organization of United Nation, 1977), but we found through the observation of multiple palm trees that the number of distinguishable leaves is limited to 8-12 petals due to their overlapping nature.In this study, a model of the star-shaped object, as in Figure 1, is used as the basis to understand the characteristics of the star-shaped object and to develop a hypothesis about generic shape feature.The model consists of 10 leaves and the brightness of the object increases radially inward, with the highest value at the center.The background is completely ignored for the analysis, which is not the case in the real-world scenario.
Star-shaped objects are radially symmetric and each pair of consecutive leaves have a constant angular difference.Sampling the object along the circumference of a circle that is sufficiently large enough to include leaves results in a periodic curve with a period equivalent to the number of petals.Sampling along a number of such concentric circles centered at the center of object results to a two-dimensional matrix, with one axis being the angle and other being the radius, called polar shape matrix (Taza and Suen, 1989).The polar model shape matrix of the star-shaped object produces a distinct triangular waveform like structure, as shown in Figure 1b.
The circular shift along the angular axis of the polar shape matrix results in the repetition of the same image when the shift is equivalent to the angular difference between leaves.The similarity decreases until it reaches the least when the shift is half the angular difference between leaves, and gradually increases to maximum on the other half.This scenario repeats periodically with the period equivalent to the number of leaves.In order to measure the similarity between original polar shape matrix and the circularly shifted version of it, correlation coefficients can opt.

Circular Autocorrelation of Polar Model Shape Matrix
Circular autocorrelation of polar model shape matrix (CAPS) is the generic shape feature proposed in the study, which is the correlation of a polar shape matrix representation of the object with a circularly shifted version of itself along the angular axis (circular autocorrelation along angle).CAPS is a periodic curve with the number of peaks equivalent to the numbers of leaves, in the case of the ideal star-shaped object.A vector representing CAPS curve acts as the generic shape feature for detection method.
The polar shape matrix representation of the image is calculated by sampling image along the circumference of different concentric circles in a constant angular interval.Considering angular and radial sampling intervals α and β, the polar shape matrix representation of the image is given by; where Zxy is the image normalized to zero mean and unit standard deviation, θ = (0, α, 2 * α, ..., 360 − α) T , r = (0, β, 2 * β, ...., dmin/2) T and dmin is the side with minimum diameter required to include object and xm and ym are the coordinates of the center of the object.
Circular shifting is achieved by shifting the final entry to the first position while shifting all other entries to the next position.Circularly shifting along the angle and calculating correlation for each shift is highly inefficient.Therefore, an efficient algorithm based on Wiener-Khinchin theorem is proposed as in Equation 2, which computes the autocorrelation through inverse Fourier transform of the product of Fourier transform of polar shape matrix and its conjugate.
where F is the Fourier Transform.
The product of Wiener-Khinchin method (Figure 2) of polar shape matrix is a two-dimensional matrix, which consists of autocor- where N is the total number of samples or elements in polar shape matrix.µZ and σZ are mean and standard deviation of polar shape matrix.
The proposed feature considers the autocorrelation value along the angular shift only.Therefore, only the values at zero radial shift are extracted from the normalized Wiener-Khinchin product of polar shape matrix, as shown in Figure 3. Additionally, the autocorrelation function is symmetric in nature, hence only half of this vector i.e 360 2 * α number of elements is enough to describe the object and is used as the feature vector.CAPS has maximum values at zero angular shift, but for the rotationally periodic object it has multiple maximum values.The CAPS of the rotationally periodic object is itself periodic with the half the period.Circular autocorrelation, by nature, is not affected by the shift.In the case of CAPS, the shifting operation is carried out along the angular axis, therefore, CAPS is not affected by rotation.CAPS is invariant to overall illumination change, however slightly variant to the change in the direction of the source of light and variation of light within the object.

Feature Dimensionality Reduction and Standardization
CAPS is a vector with multiple elements, a total of 360 2 * α in number.The collective influence of all these elements plays a decisive role in discriminating the objects.But, the higher dimensionality of the CAPS offer complexities in processing the features to detect objects.We propose the use of any standard machine learning methods, in our case linear SVM, as a transformation measure to reduce the dimension of this vector.SVM is a standard tool for machine learning, widely used for solving problems in classification, regression and novelty detection (Chang andLin, 2011, Bishop, 2006), because of its good generalization property for even small samples (Liu et al., 2012, Vapnik and Vapnik, 1998, Cortes and Vapnik, 1995).In this method, the data is mapped into a higher dimensional space and an optimal separating hyperplane is constructed in this space (Suykens andVandewalle, 1999, Cortes andVapnik, 1995).SVM uses the convex optimization method to determined the decision boundary/hyperplane in multidimensional feature space.If a training dataset {(x i , yi)} n i=1 where x i , also called example, is the i th vector and yi is the label associated with x i is used to train SVM.The decision boundary/hyperplane is identified by solving the following primal optimization problem.
where W T represents weight vector, b an explicit bias parameter (Liu et al., 2012, Ben-Hur and Weston, 2010, Bishop, 2006).The hyperplane or decision boundary is defined by the vector x such that the value of the discriminant function is zero.The margin error 0 ≤ ξi ≤ 1 allows the samples to be in the margin or misclassified.The constant C > 0 is a regularization parameter.Minimization of Equation 4gives the unique solution of hyperplane which serves as the linear decision boundary for two classes.
The SVM is a discriminative model, hence, it provides the decision rather than posterior probabilities.Decision function is given by sgn(f where f (x) is the function that gives the distance of the sample from decision boundary in feature space and can be taken as the Generally, the trained classifier is used with a sliding window detector which visits every pixel, calculates feature for the window around that pixel and, based on this feature, decides whether the pixel is object or non-object i.e. it labels the pixel directly using the decision function in Equation 6. Usually, this approach generates many detections in the proximity of and within objects, which need to be post-processed.
In this study, instead of the decision on labels of windows, actual values of the f (x) are computed, as shown in Figure 5. Hence, linear SVM is rather used as the feature space transformation function, which reduces the dimensionality of the feature space from n-dimensional space to one-dimensional space, with the decision boundary as the reference R n => R.

Local Maximum Detection
Feature map is a spatial distribution of standardized features over the image, with positive peaks on the palm trees with a local maximum at the center of the palm trees, given that the features used for training are extracted from the blocks with the central pixel at the center of palm trees.The spatial extent of peaks depends on the size of objects.
To detect the locations of the objects, a local maximum detection algorithm is proposed which, in the context of this study, is the sliding window algorithm where a window of definite size (footprint) translates over the entire image and calculates the local maximum by comparing the central pixel value to all other pixels values within the footprint.In the presence of non-palm objects, there could be local maximum detections on non-objects as well, hence, a threshold in the value of f (x) is introduced such that only local maximum value exceeding this threshold is considered as palm-trees.Figure 6 shows the visual demonstration of the algorithm, where the highest peak within a footprint that is above the feature-value threshold is detected.The result of detection might vary a lot for different footprint sizes and thresholds.Therefore, an appropriate size of the footprint, Parameter A and the threshold, Parameter B have to be chosen by evaluating accuracies for different footprints and thresholds.The final result of this process is a distribution of the object's center locations over the entire image: object map.

Accuracy Assessment
The detection results are evaluated by comparing them with the ground truths.Ground truths The precision-recall and total accuracy is calculated from ratios of positives and negatives.For determining true and false positives and negatives, the user should provide the maximum permissible localization error (δ) i.e. permissible deviation of location in ground truth and detection.
Precision or user's accuracy is defined as the proportion of detected objects that are correct in reality, given by Equation 7. Recall or producer's accuracy is defined as the proportion of objects detected (Equation 8) (Powers, 2011).Total accuracy is calculated as the average of both user and producer accuracy.

AccU = T P T P + F P (7)
AccP = T P T P + F N (8) 2.5.2Parameter Optimization As discussed before, the local maximum detection algorithm uses two parameters: footprint size and the feature-value threshold.The detection results are affected by the choice of values for those parameters and it is difficult to choose those parameters, as they can vary between scenes and depend on size and shape of the palm trees.Therefore, an automatic approach to determine those parameters is proposed here.But, it requires an image of small region per scene with ground truths.
In this approach, the local maximum detection algorithm is applied to these images, choosing a set of varying footprint sizes and the feature-value thresholds.The precision, recall, and total accuracies are calculated for each of the detection results and the parameters corresponding to the highest accuracy are chosen as the final optimized parameters for the scene, where that image is taken from, as mathematically expressed in Equation 10.

Localization Error
For each true positive, the displacement from its location in ground truth is calculated as localization error (Equation 11).The overall localization accuracy, in each scene, is calculated through root mean square of the localization errors of all true positive detections (Equation 12).Lj = (xT P (j) − xtrue(j)) 2 + (yT P (j) − ytrue(j)) 2 (11) where j = 1, 2, 3, ..., NT P .NT P being the total number of the true positives.
3. STUDY AREA AND DATA

Study Area
Our study areas comprise of the oil-palm tree plantation regions in Malaysia, Thailand and Indonesia.Indonesia and Malaysia are the two biggest exporters of oil-palm tree products, which contribute to more than 90% of worldwide export (Colchester, 2011).The study data are aerial images captured at four plantation region in the stated countries, with Trimble UX5 unmanned aircraft system (UAS).Ground sampling distance of data range from 3.2 to 10 cm and consist of three spectral channels which include near-infrared, red and green in two of them and red, green and blue in others.The spectral combination of the image does not play a significant role in our study, hence, it is ignored and the only green channel is used for all the experiments.All of the data are orthorectified and reprojected to the Universal Transverse Mercator (UTM) projection system with World Geodetic System 1984 (WGS84) as an ellipsoid of the reference.Test images are selected from all the data, considering different challenges.In total, eight test images covering 4000x4000 pixels are selected.General characteristics of the test images selected are shown in Table 1.Within and between test images, the existence of different cultivation densities, different levels of contrast between object and background, different sizes of trees, artificial objects etc. pose ample challenges for the algorithm to perform. Figure 8 demonstrates two of the test images used in the study.Figure 8a have a high cultivation density of palm trees that have a high resemblance to the star shape.The object-background contrast is high and there are artificial objects like houses present.

Training Data, Test Data and Ground Truth
On the other hand, Figure 8b consists of palm trees, which have more circular shape with a low cultivation density.There exist palm trees of different sizes and the object-background contrast is very low.It is difficult, even for human eyes, to distinguish between the palm trees and the background.The ground truths required for accessing the accuracy of the study is prepared by manually selecting the palm tree centers in the test images.

IMPLEMENTATION AND RESULTS
The values of α = 1px and β = 1 degree are used as radial and angular sampling intervals for the extraction of CAPS.CAPS of palm training samples show lower values at leaves compared to the model, as repeating scenarios are not exactly same in the real world as in the model.However, the CAPS curves still are fluctuating curves with peaks at the petals, as shown in the CAPS of a training example (Figure 9a).The CAPS elements of palm trees on average have higher values of autocorrelation coefficients compared to the non-palm objects.The window of size 201x201px is translated over the test images to generate feature maps.The window is shifted along rows and columns at the step size of 5 pixels.Even though it affects the final localization accuracy, the efficiency of the standard feature extraction process is increased by 25 times.Feature maps have prominent peaks at the center of palm trees as expected by the hypothesis.But, there are also a few peaks on the objects that are not palm trees.The false peaks lie mainly on other vegetations and the artificial objects.These characteristics can be noticed in Figure 10, which shows the feature map generated from Image 01.For δ = 40px, the precision and recall exhibit similar behavior as for δ = 100px.But the precision and recall are smaller in value and hence, the total accuracies are lower.For Image 01, the maximum total accuracy of 75.75% is achieved at footprint size, 215px and threshold, 2.45.The feature-value threshold for δ = 40px is higher compared to the one for δ = 100px, implying that it only considers objects with very high likelihood.Figure 13 shows the optimal detection on Image 01 for δ = 40px.Similar object maps are generated for all 8 images.
The results of the detection on all images for δ = 100px and 40px

DISCUSSION
The close analysis of object maps and the results suggests the good performance of the algorithm in detecting palm trees.In most of the cases, the accuracies are higher than 80%.Test images 02, 03, 06 and 07 have the palm trees that are very distinct and are resembling more to the star-shaped objects (eg. Figure 14a) than the palm trees in other images and the detection algorithm performed best for those images with accuracy above 89% and 82% for δ = 100px and 40px respectively.The detection performed very well against the presence of different sizes of palm trees and high (eg.Figure 14a) and medium cultivation densities.The performance of the algorithm against medium and low object-background contrast is low for δ = 40px, while it is higher for δ = 100px, as seen in results of test images 04, 05 and 08.This suggests that the objects with low objectbackground contrast form small peaks.In the case of test Image 08, the object-background contrast is very low making it difficult even for human eyes to distinguish between the trees and the background.The algorithm performed poorly for the image.
In the presence of artificial objects, mainly houses, the algorithm detected the roof corners and centers as shown in Figure 14b.CAPS of the regular shapes are also a periodic curve with high autocorrelation values, which is the reason behind the detections on the houses.CAPS of a circular shape have the highest value.Therefore, the objects, like tree canopies having circular and nearly circular shape are also detected.In order to get rid of such false detections, a learning algorithm, which decides on a non-linear decision boundary, can be employed.From the observation of such characteristic of CAPS, it can be concluded that the use of CAPS can also be extended to object recognition.It can be used as a general shape feature in detecting objects of different shapes.
There are few false detections in the backgrounds between the cluster of the trees, where the backgrounds, in a combination with the trees around, give the false impression of star-shaped objects (Figure 14c).Even though they can be eliminated by taking the spectral value in consideration, it is not attempted in this study.
Overall, the training data required for the algorithm were produced from only one image, while the trained SVM were applied to scenes taken at different scenarios.Even then, the accuracies of detections are comparable with each other and with those methods proposed by Malek et al. (2014) andShrestasathiern &Rakwatin (2014).Unlike, the test areas chosen by the authors, our test areas were chosen to have multiple challenges, including few images with high cultivation densities.The accuracy of the detection might increase if the training data from all scenarios are included.
Although the localization accuracies are highly controlled by the maximum allowed localization error δ, evaluation on how accu- The locational accuracy can be increased by decreasing this shifting step, but it will also make the process inefficient.

CONCLUSION
The algorithm performed well with comparable accuracy with the results from different authors, given that the test sites were chosen focusing on different challenges.Overall accuracy of the method for all test images is 84% for δ = 100px and 76.1% for δ = 40px.Although, the performance of the algorithm was poor for few images, in most of the cases it produced results with high accuracies, despite the fact that training samples for algorithm were chosen only from the first scene.
CAPS as the feature was able to distinguish between palm trees and non-palm objects.But yet, few interesting false objects were detected, which had high CAPS values.It signifies that CAPS can also be used for recognition of different shapes, especially regular shapes, which also have high values of CAPS.
One drawback of CAPS is that its extraction is a non-linear process, and hence, the efficiency of feature extraction is low.Further study on increasing the efficiency would strengthen the feature.Finally, proved to be a shape that distinguishes palm that can be used singularly in detection.
However, for increased accuracy, it also be used in combination with other features.
Figure 1: (a) Model of a star-shaped object.(b) Polar shape matrix representation of the model those drawbacks.As an extra feature for above algorithms, shape features could strengthen the algorithm's potential.

Figure 2 :
Figure 2: The product of the Wiener-Khinchin method.The image shows the values for a shift in both radial and angular direction.Red as the highest and blue as the lowest correlation.
SVM is trained with labeled generic shape features extracted from the training examples.The input feature vector formulates a complex multidimensional space where SVM determines the best decision boundary between positive and negative training examples (Figure4) and the function to calculate the distance of examples from decision boundary in the same space.This reduces the n-D feature space to the 1-D scalar quantity with higher values for higher object likelihood.

Figure 4 :
Figure 4: Support Vector Machine: Red boundary is the hyperplane generated by SVM by maximizing the margin with two blue hyperplanes, also called support vectors.

Figure 5 :
Figure 5: f (x) of test samples from decision boundary confidence measure.The decision function outputs the class label which input vector x belongs to (Chang and Lin, 2011).

Figure 7 :
Figure 7: Few examples of positive samples (Palm) of training data in the red channel.The data chosen for training and test are resampled to approximately 5 cm to avoid different sizes of training and test images.Through the observation in a significant amount of fully grown tree canopies, we found out that the canopies of the fully grown palm trees are enclosed by the window size of 201x201 pixels (10mx10m).Therefore, the sizes of training data and the window in feature extraction are chosen to be 201x201 pixels.Training data are selected randomly from only one test site in Thailand.Training data consist of 644 samples of palm trees and 597 samples of non-palm objects, which include non-palm trees, cars, buildings, backgrounds etc.Few samples of the training data are shown in Figure7.The leaves and their sizes varies between the images.In some cases, the leaves overlap and the canopies tend to be a circular object.On most of the training data, there are interferences of nearby trees or other objects.
Figure 8: (a) Test Image 07 (b) Test Image 08 Figure 9: (a) CAPS of a palm sample (b) Average CAPS of all palm and non-palm training samples

Figure 10 :
Figure 10: Feature map of Image 01 A local maximum detection algorithm with different footprint sizes and threshold values is applied to produce multiple object detection results for each of the feature maps.For each result, precision and recall are calculated with the maximum permissible localization errors of δ = 100px and δ = 40px, in order to compare the results for different values of δ.Precision-recall

Figure 12 :
Figure 12: Object map of Image 01 with δ = 100px Figure 14: (a) Detection in high cultivation density.False detections on (b) artificial objects and, (c) false star-shape.[Blue-star: Detection, Red-star: Missed, Red-dot: False detection]rately the system localizes center is made by calculating the root mean square of the deviations of true detections from object centers as discussed in Section 2.5.3.The localization error of CAPS in case of δ=40px is less than 20px and 70px for allowed δ = 100px.The maximum localization error occurs for Image 08, for which accuracy of detection is also worst.Except image 08 for δ=100px, all other images have localization accuracy below 50px, which is half of the allowed error.Furthermore, during the feature map derivation, the shift along rows and columns are carried out in an interval of 5 pixels, which also contribute significantly to the degradation of the locational accuracy of detections.The locational accuracy can be increased by decreasing this shifting step, but it will also make the process inefficient.

Table 1 :
Test image sets & their properties

Table 2 :
Detection results on all images with CAPS, δ = 100px

Table 3 :
Detection results on all images with CAPS, δ = 40px