MAPPING OF HIGH VALUE CROPS THROUGH AN OBJECT-BASED SVM MODEL USING LIDAR DATA AND ORTHOPHOTO IN AGUSAN DEL NORTE PHILIPPINES

This research describes the methods involved in the mapping of different high value crops in Agusan del Norte, Philippines using LiDAR. This project is part of the Phil-LiDAR 2 Program, which aims to conduct a nationwide resource assessment using LiDAR. Because of the high resolution data involved, the methodology described here utilizes object-based image analysis and optimal features from LiDAR data and Orthophoto. Object-based classification was primarily done by developing rule-sets in eCognition. Several features from the LiDAR data and Orthophotos were used in the development of rule-sets for classification. Generally, classes of objects cannot be separated by simple thresholds on individual features, which makes it difficult to develop a rule-set. To resolve this problem, the image-objects were subjected to Support Vector Machine (SVM) learning. SVMs have gained popularity because of their ability to generalize well given a limited number of training samples. However, SVMs also suffer from parameter assignment issues that can significantly affect the classification results. More specifically, the regularization parameter C in linear SVM has to be optimized through cross validation to increase the overall accuracy. After performing the segmentation in eCognition, the optimization procedure as well as the extraction of the equations of the hyper-planes was done in Matlab. The learned hyper-planes separating one class from another in the multi-dimensional feature-space can be thought of as super-features, which were then used in developing the classifier rule set in eCognition. In this study, we report an overall classification accuracy of greater than 90% in different areas.


LiDAR Mapping
Land cover has a significant impact on the earth's climate and environment. Because of this, land cover mapping is very important for authorities and scientists to gain a better understanding of environmental changes and to monitor them. According to the Climate Research Committee of the United States National Research Council, land cover distribution has an evident influence on the earth's radiation balance, because changes in land cover have a large effect on evaporation and other heat fluxes at the earth's surface (US National Research Council, 2005). One example of an environmental effect of a land cover type is the reduction of land surface temperature through the absorption of solar radiation by a large area of trees. Another is the effect of impervious land cover on the natural infiltration of groundwater, which can be a potential cause of flooding.
Accurate understanding and precise monitoring of land cover is essential for decision makers managing the earth's resources. Passive aerial/satellite remote sensing techniques reach certain limits in producing land cover analyses and classifications at finer scales; therefore, one of the advancements to consider in the future is to shift research from algorithmic development to multi-sensor data fusion (Benediktsson, Chanussot, & Fauvel, 2007). That need for multi-sensor data fusion has motivated researchers to probe the use of topographic airborne LiDAR data for land cover classification. Airborne LiDAR is a laser profiling and scanning system that emerged commercially in the mid-1990s and is mostly used for bathymetric and topographic applications. Using direct geo-referencing, the laser scanning apparatus installed in the aircraft collects a 3D point cloud of the surveyed area. Unlike conventional 2D satellite data, the LiDAR point cloud describes the 3D topographic profile of the scanned surface. The advantages of airborne LiDAR include penetration of tree canopy, insensitivity to lighting conditions, and the absence of relief displacement effects. Due to these advantages, airborne LiDAR has been effectively used for generating digital terrain models (DTM), topographic mapping, construction of digital 3D city models, and natural hazard applications (Wai Yeun Yan, 2014).

Use of LiDAR in vegetation analysis
Because airborne LiDAR scanning directly measures height, the utility of airborne LiDAR for land cover classification and object recognition has increased. The height value (z value) from LiDAR gives very significant information for feature extraction. Users of LiDAR data usually interpolate the 3D LiDAR data to produce the digital surface model (DSM). From this DSM layer, several other features can be derived to increase the separability between different classes in the feature space. Several feature extraction studies using LiDAR can be found in the literature, such as Priestnall, Jaafar, & Duncan (2000). Several accuracy improvements have been reported by fusion of LiDAR and multi-spectral data (Hartfield, Landau, & Van Leeuwen, 2011). The DSM can be transformed into a normalized digital surface model (nDSM) by subtracting the digital terrain model (DTM) from the DSM. The nDSM represents the above-ground features only. Numerous studies, such as (Charaniya et al., 2004) and (Huang et al., 2013), have shown that height features derived from LiDAR data can significantly differentiate high from low vegetation. The combination of high resolution aerial orthophotos and LiDAR data in the aforementioned studies addresses the problem of classification confusion caused by spectral mixing in areas with heterogeneous classes. However, in high spatial resolution images, the problem of within-class spectral difference and between-class spectral similarity reduces the separability among different land cover/land class types. Because of this problem, object-based classification techniques have been proposed to replace pixel-based classification, as described in (Blaschke, 2010). It has been pointed out in the literature that pixel-based analysis and classification is acceptable only if the spatial resolution of the imagery is coarse (Hay et al., 2001).
The pixel-based approach is reasonable if the objects of interest to be classified are smaller than the spatial resolution of the imagery. In imagery where the spatial resolution is finer than the objects to be classified, important spatial patterns like texture emerge and are not addressed by the conventional pixel-based approach. Until recently, the structural parameters of the image, like texture, shape, and context, could only be understood by human interpreters. In object-based image analysis (OBIA), image-objects represent meaningful entities that are distinguishable in a high resolution image. This new paradigm in image analysis incorporates segmentation, which is the vital step before classification.
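The nDSM computation mentioned above is a cell-by-cell difference, and it can be sketched as follows. This is only an illustrative sketch: the function name, the tiny elevation grids, and the clamping of small negative differences to zero are assumptions made here, not part of the workflow described in this paper.

```python
def normalized_dsm(dsm, dtm):
    """Return nDSM = DSM - DTM, cell by cell, for two equally sized grids."""
    if len(dsm) != len(dtm) or any(len(a) != len(b) for a, b in zip(dsm, dtm)):
        raise ValueError("DSM and DTM must have the same shape")
    ndsm = []
    for dsm_row, dtm_row in zip(dsm, dtm):
        # Assumption: small negative differences (interpolation noise)
        # are clamped to zero so that heights stay non-negative.
        ndsm.append([max(surface - terrain, 0.0)
                     for surface, terrain in zip(dsm_row, dtm_row)])
    return ndsm


# Hypothetical 2 x 3 elevation grids, in meters.
dsm = [[12.0, 18.5, 10.2], [10.0, 25.0, 10.1]]
dtm = [[10.0, 10.5, 10.2], [10.0, 10.0, 10.3]]
ndsm = normalized_dsm(dsm, dtm)
print(ndsm)
```

In practice the two rasters would be read from the LiDAR derivative files and must share the same grid and resolution before the subtraction is meaningful.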

Segmentation
In object-based image analysis, segmentation is the partitioning of an array of pixels on the basis of homogeneity. Segmentation splits the raster image into spatially continuous, disjoint and homogeneous regions called 'segments' or objects. This process may include a processing chain of segmentation steps to ultimately delineate the target or desired objects in the image (Blaschke et al. 2004). The resulting heterogeneity of an object or segment within itself is less than the heterogeneity with respect to its neighbors. Early development in image segmentation was made during the 1970s and 1980s (Haralick and Shapiro, 1985). Segmentation methods are commonly divided into three main approaches: (i) pixel-based, (ii) edge-based and (iii) region-based segmentation methods. These approaches can be combined, as presented in (Baatz and Schäpe, 2000). Different types of segmentation algorithms can be used interchangeably in preliminary and later stages in order to capture the target objects. Different segmentation algorithms are defined by different parameters, and good control of these parameters is essential for a good segmentation result. The end results of a segmentation process are image-objects that are left to be classified. OBIA software like eCognition allows the user to emulate the human mind's cognitive powers. The developers of this software devised a way to render knowledge in a semantic network. This relatively new software examines pixels not in isolation, but in context. eCognition allows the user to develop rule-based classification grounded on expert knowledge.

Classification
Object-based classification can be done through user-defined rule-sets. However, different classes of objects are often not separable by directly thresholding one feature at a time. Hence, samples from different classes of objects need to be classified using machine learning algorithms. Among the machine learning algorithms, the Support Vector Machine (SVM) has recently received a lot of attention, and the number of works utilizing this technique has increased rapidly. The most important characteristic of SVMs is their ability to generalize well from a limited amount and/or quality of training data. Compared to other methods like artificial neural networks, SVMs can yield comparable accuracy using a much smaller training sample size. This is due to the "support vector" concept, which relies only on a few data points to define the hyper-plane that best separates the classes (Mountrakis et al., 2010). An added advantage is that there is no need to repeat classifier training using different random initializations or architectures. Furthermore, being non-parametric, SVMs do not assume a known statistical distribution of the data to be classified. This is very useful because the data acquired from remotely sensed imagery usually have unknown distributions. This allows SVMs to outperform techniques based on maximum likelihood classification, because the assumption of normality does not always hold for the actual pixel distribution in each class (Su et al., 2009). The method is presented with a set of labelled data instances (the sample objects), and the SVM training algorithm finds a hyper-plane that separates the dataset into a discrete predefined number of classes in a way that is consistent with the training samples (Vapnik, 1979). The term "hyper-plane" refers to the decision boundary, obtained in the training step, that minimizes misclassifications.
Learning is the iterative process of finding a classifier with an optimal decision boundary to separate the training patterns (Zhu and Blumberg, 2002). The implementation of a linear SVM assumes that the multispectral feature data are linearly separable in the feature space. In real data measurements, distributions of vectors of different classes overlap one another. This property of real data makes linear separability difficult: no hard linear decision boundary can be found that classifies the patterns with sufficiently high accuracy. Techniques such as the soft margin method (Cortes and Vapnik, 1995) and the kernel trick are used to solve the inseparability problem by introducing additional variables (called slack variables) in the SVM optimization and by mapping the nonlinear correlations into a higher dimensional space. The one-against-one formulation of the SVM constructs $k(k-1)/2$ classifiers ($k$ is the total number of classes), where each one is trained on data from two classes. For training data $(x_t, y_t)$ from the ith and the jth classes, we solve the following binary classification problem:
$$\min_{w^{ij},\, b^{ij},\, \xi^{ij}} \; \frac{1}{2}(w^{ij})^T w^{ij} + C \sum_{t} \xi^{ij}_t$$
$$\text{subject to}\quad (w^{ij})^T x_t + b^{ij} \ge 1 - \xi^{ij}_t \ \text{ if } y_t = i, \qquad (w^{ij})^T x_t + b^{ij} \le -1 + \xi^{ij}_t \ \text{ if } y_t = j, \qquad \xi^{ij}_t \ge 0.$$
Minimizing $\frac{1}{2}(w^{ij})^T w^{ij}$ means that we would like to maximize $2/\|w^{ij}\|$, the margin between the two groups of data. When the data are not linearly separable, the penalty term $C \sum_{t} \xi^{ij}_t$ reduces the number of training errors. The basic concept behind SVM is to search for a balance between the regularization term and the training errors (Chih-Jen Lin, 2001). After the classifiers have been constructed, an instance is classified based on its sign with respect to each hyper-plane. For example, if $\operatorname{sign}((w^{ij})^T x + b^{ij}) > 0$, then $x$ is in the ith class. The choice of the parameter value (usually denoted by C), which controls the trade-off between maximizing the margin and minimizing the training error, is also an important consideration in SVM application.
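To make the trade-off controlled by C concrete, the short sketch below evaluates the soft-margin objective (regularization term plus C times the sum of slack values) and the margin width for a given linear hyper-plane. The function names, the weight vector, and the three sample points are hypothetical and chosen only for illustration; labels are encoded as +1 for the ith class and -1 for the jth class.

```python
import math


def decision_value(w, b, x):
    """Signed value w^T x + b of a sample with respect to the hyper-plane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def soft_margin_objective(w, b, samples, C):
    """Return 1/2 w^T w + C * sum of slacks, with slack = max(0, 1 - y (w^T x + b))."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    slack_sum = sum(max(0.0, 1.0 - y * decision_value(w, b, x))
                    for x, y in samples)
    return regularizer + C * slack_sum


def margin_width(w):
    """Return the margin 2 / ||w|| between the two groups of data."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical 2-feature samples: (feature vector, label).
samples = [([1.0, 0.0], +1), ([0.0, -1.0], -1), ([0.1, 0.0], +1)]
w, b = [3.0, 4.0], 0.0
objective = soft_margin_objective(w, b, samples, C=1.0)
width = margin_width(w)
print(objective, width)
```

Here only the third sample lies inside the margin, so it alone contributes a slack value; raising C would penalize that violation more heavily relative to the margin term.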
There exist no established heuristics for selecting these SVM parameters, which frequently leads to a trial-and-error approach. However, an optimization procedure can be used to search for the regularization parameter C that gives the highest classification accuracy. The equations of the learned hyper-planes can serve as "Super-Features", which can then be used to build a rule-based classifier in eCognition.

2. METHODOLOGY

Study Area
Our study area is the province of Agusan del Norte, whose LiDAR datasets contain the different classes that we aim to classify. The study area is shown in Figure 2.

Overall Workflow
The overall workflow for the object-based image analysis is shown in Figure 3. LiDAR derivatives and the Orthophotos are first segmented in eCognition. Because of memory constraints imposed by the very large 0.5 m LiDAR/Orthophoto dataset (1,015 sq. km LiDAR footprint), segmentation is done in separate blocks/scenes in eCognition. Samples of each class are then taken from different eCognition scenes and subjected to SVM optimization in Matlab. The advantage of this approach is that the resulting optimized SVM model works with good accuracy for all the blocks/scenes. The samples are plotted in different 3D configurations of the feature space, and the features that best separate the different classes are then used for the supervised SVM optimization. Specifically, the optimization procedure is a search for the best regularization parameter C in the linear SVM. The hyper-plane equations learned in the optimization are then used as rule-sets back in eCognition that work for all the blocks/scenes with high accuracy.
The LiDAR derivative and Orthophoto layers are loaded into eCognition for pre-segmentation and pre-classification. The first segmentation performed is a quadtree segmentation with a scale parameter of 2.0, weighted only on the LiDAR nDSM layer (no weights are placed on the other layers). After the quadtree segmentation, a spectral-difference segmentation with a maximum spectral difference of 2.0 is run on the current image-object level. A pre-classification is then made by assigning all objects with an nDSM value greater than 2.0 meters to the class HE (High Elevation Objects/Tall Group) using the assign class algorithm. Unclassified objects in the image-object level with an nDSM value less than or equal to 2.0 meters and greater than or equal to 0.25 meters are classified as class ME (Medium Elevation Objects/Medium Group). All remaining unclassified objects (< 0.25 meters) are then assigned to the LE (Low Elevation Objects/Ground-level Group) class.
The purpose of pre-classifying the segmented objects into the three major classes is to develop separate SVMs for each of the HE, ME, and LE super-classes.
Figure 5.
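The height-based pre-classification described above amounts to two thresholds on the nDSM value of each object. The sketch below restates those thresholds outside eCognition; the function and constant names are hypothetical, and the list of object heights is made up for illustration.

```python
HE_MIN_HEIGHT_M = 2.0   # objects strictly taller than this go to HE
ME_MIN_HEIGHT_M = 0.25  # objects from this height up to 2.0 m go to ME


def preclassify(ndsm_height_m):
    """Assign an image-object to the HE, ME, or LE super-class by its nDSM height."""
    if ndsm_height_m > HE_MIN_HEIGHT_M:
        return "HE"
    if ndsm_height_m >= ME_MIN_HEIGHT_M:
        return "ME"
    return "LE"


# Hypothetical object heights in meters, including both boundary values.
heights = [12.4, 2.0, 1.1, 0.25, 0.1]
labels = [preclassify(h) for h in heights]
print(labels)
```

Note that an object at exactly 2.0 m falls in ME and one at exactly 0.25 m also falls in ME, matching the closed interval [0.25, 2] used in the text.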

High Elevation (Tall) Group:
After the initial pre-classifications, the next step is re-segmentation to capture the target subclasses and to select subclass samples from each of the super-classes. Samples were collected for building an optimized support vector machine. In the HE super-class, the current subclasses are the following: Buildings, Coconut, Mango, and Other Tall Trees. The "Other Tall Trees" class contains other tall tree species and other crops taller than 2.0 meters, such as banana, rubber and other tall species found in forest lands. Methods to classify banana and rubber are still being developed by the team. For now, these classes are kept in the "Other Tall Trees" class. The HE class objects are re-segmented using the multi-resolution segmentation algorithm in eCognition with a scale parameter of 17, shape of 0.3, and compactness of 0.5. The image layer weight is placed only on the nDSM layer. A sample end result of this segmentation setting is shown in Figures 6 and 7.

Medium Elevation Group:
After pre-classifying objects with a mean height greater than 2.0 m to the super-class HE (high elevation/tall objects), unclassified objects are re-segmented with a scale parameter of 50, shape of 0.2 and compactness of 0.5. This is the segmentation setting for both the ME and LE super-classes. For this segmentation, image layer weights are placed only on the RGB layers. Unclassified objects with a mean height in the range [0.25, 2] meters are then pre-classified as ME (medium elevation super-class). In the ME super-class, the current level-3 subclasses are Corn and Shrub. The "Shrub" class contains other vegetation species and crops that fall in the height range of [0.25, 2] meters. Methods to classify other crops that fall in this height range are still being developed by the team. A sample end result of the segmentation of the ME class is shown in Figures 8 and 9.

Low Elevation Group:
The segmentation settings for the subclasses under the LE group are the same as the settings for the previously described ME class. In the LE superclass, the subclasses are the classes Grassland, Rice field, Fallow (uncultivated), Road, and Shadow. However, the shadow subclass is not an actual land-class and is reclassified contextually using the relative neighbor feature. In the development of the SVM for the LE superclass we include the class shadow because objects that fall in this class have a distinct property in the feature space. A sample end result in the segmentation of the LE class is shown in Figure 10.

High Elevation (Tall) Group:
Samples from the four different subclasses of the HE group were collected, and the distributions of the samples in different configurations of the 3D feature-space were inspected to find the best features. These features are Roundness, Compactness, Area, Height, Height standard deviation, and Asymmetry. The features used for the subclasses of the HE group are structural features primarily based on the LiDAR derivatives. Detailed derivations and mathematical formulations of these features are described in the eCognition reference book. Shown in the following figures are the 3D plots of the samples in the best feature-space configurations that separate each class.
Figure 11. Roundness, Compactness, and Height Std. Dev.
Figure 12. Height Std. Dev., Roundness, and Asymmetry

Medium Elevation Group:
Many of the features used for SVM classification of the ME and LE super-classes are based on color science and color image processing concepts. LiDAR intensity was used as well. This study identified nine (9) features, including LiDAR intensity, for classifying the ME and LE subclasses. Eight of the nine features come from color science concepts. To understand color measurement and color management, it is necessary to consider human color vision. Three things affect the way a color is perceived by humans: the characteristics of the illumination, the characteristics of the object, and the interpretation of this information in the eye/brain system of the observer. CIE metrics incorporate these three quantities, correlating them well with human perception. The eight color features used for the ME and LE super-classes are: Red Ratio (CIE xy Chromaticity), Green Ratio (CIE xy Chromaticity), Blue Ratio (CIE xy Chromaticity), First Coordinate (1-Dimensional Scalar Constancy), Second Coordinate (1-Dimensional Scalar Constancy), RGB Intensity (HSI), a* (CIE Lab), and b* (CIE Lab). Shown in Figures 13 and 14 are some of the 3D plots of the samples in the best feature-space configurations that separate the classes Corn and Shrub.
Figure 13. 1st coordinate, 2nd coordinate, and a*
Figure 14. a*, b*, and LiDAR Intensity
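Some of the color features listed above can be sketched from the mean RGB values of an image-object. The sketch below is an assumption-laden illustration, not the exact feature definitions used in this study: it takes the band ratios as R/(R+G+B), G/(R+G+B), B/(R+G+B), the HSI intensity as the mean of the three bands, and computes a* and b* by treating the orthophoto as 8-bit sRGB with a D65 white point. The two scalar-constancy coordinates are not reproduced here because their formulation is not given in the text.

```python
def srgb_to_linear(channel_8bit):
    """Undo the sRGB transfer curve for one 8-bit channel (assumed encoding)."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def lab_f(t):
    """Piecewise cube-root function used in the CIE L*a*b* definition."""
    delta = 6.0 / 29.0
    return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3 * delta ** 2) + 4.0 / 29.0


def color_features(r, g, b):
    """Return band ratios, HSI intensity, and CIE a*, b* for mean RGB values."""
    total = r + g + b
    ratios = (r / total, g / total, b / total) if total else (0.0, 0.0, 0.0)
    rl, gl, bl = (srgb_to_linear(c) for c in (r, g, b))
    # Linear sRGB to CIE XYZ (D65), then normalize by the D65 white point.
    x = (0.4124 * rl + 0.3576 * gl + 0.1805 * bl) / 0.95047
    y = (0.2126 * rl + 0.7152 * gl + 0.0722 * bl) / 1.0
    z = (0.0193 * rl + 0.1192 * gl + 0.9505 * bl) / 1.08883
    fx, fy, fz = lab_f(x), lab_f(y), lab_f(z)
    return {
        "red_ratio": ratios[0], "green_ratio": ratios[1], "blue_ratio": ratios[2],
        "intensity": total / 3.0,
        "a_star": 500.0 * (fx - fy),
        "b_star": 200.0 * (fy - fz),
    }


gray = color_features(128, 128, 128)   # neutral object: a* and b* near zero
red = color_features(255, 0, 0)        # strongly red object: a* and b* positive
print(gray, red)
```

A neutral gray object yields a* and b* close to zero, while a green canopy would show a negative a*, which is why these two coordinates can help separate vegetation types that differ in hue.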

Low Elevation Group:
The features used for the LE superclass are the same features used in developing the model for the ME superclass. Shown in Figures 15 and 16 are the 3D plots of the samples in the best feature-space configurations that separate the different subclasses in the LE superclass. In the plots, the class "uncultivated" corresponds to the class fallow.
Figure 15. a*, b*, and 2nd Coordinate

SVM Optimization
Using the scaled features for the different classes, an optimization procedure for the SVM learning is then carried out in Matlab. The optimization is a search for the regularization parameter C that gives the highest cross validation accuracy. Three-fold cross validation is performed for each one-against-one (1v1) SVM implementation. The LIBSVM package was interfaced with Matlab to extract the equations of the hyper-planes of the best SVM model found. A linear-kernel SVM was used because the decision function of an RBF-kernel SVM could not be reproduced as a rule-set in eCognition. A sample color plot of the search for the optimal value of the regularization parameter C is shown in Figure 16.
Figure 16. Sample color plot in the optimal C search
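The search for C described above can be sketched as a grid search scored by three-fold cross validation. The sketch below is not the LIBSVM/Matlab procedure used in this study: as a stand-in it trains the linear soft-margin SVM with plain batch subgradient descent, and the sample data, the grid of C values, the learning rate, the number of epochs, and all function names are assumptions chosen only to keep the example self-contained.

```python
def train_linear_svm(samples, C, learning_rate=0.01, epochs=200):
    """Batch subgradient descent on 1/2 ||w||^2 + C * sum of hinge losses."""
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0
        for x, y in samples:
            # Only samples inside the margin contribute to the hinge subgradient.
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) < 1.0:
                for d in range(dim):
                    grad_w[d] -= C * y * x[d]
                grad_b -= C * y
        w = [wi - learning_rate * gi for wi, gi in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b


def cross_val_accuracy(samples, C, n_folds=3):
    """Mean accuracy over n_folds, holding out every n-th sample in turn."""
    correct = 0
    for fold in range(n_folds):
        train = [s for i, s in enumerate(samples) if i % n_folds != fold]
        held_out = [s for i, s in enumerate(samples) if i % n_folds == fold]
        w, b = train_linear_svm(train, C)
        for x, y in held_out:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            correct += (1 if score >= 0 else -1) == y
    return correct / len(samples)


def search_best_c(samples, grid):
    """Return the first C in the grid with the highest cross validation accuracy."""
    best_c, best_acc = None, -1.0
    for C in grid:
        acc = cross_val_accuracy(samples, C)
        if acc > best_acc:
            best_c, best_acc = C, acc
    return best_c, best_acc


# Hypothetical, well separated two-class samples in a 2D scaled feature space.
positive = [(2.0, 2.0), (2.0, 3.0), (3.0, 2.0), (3.0, 3.0), (2.5, 2.5), (3.0, 2.5)]
samples = [([p, q], +1) for p, q in positive] + [([-p, -q], -1) for p, q in positive]
grid = [2.0 ** k for k in range(-3, 4)]  # exponentially spaced candidate C values
best_c, best_acc = search_best_c(samples, grid)
print(best_c, best_acc)
```

On this toy data every candidate C separates the classes, so the search simply returns the first value in the grid; on overlapping real samples the cross validation accuracy would vary with C, which is what a plot of the search visualizes. The learned `w` and `b` are the hyper-plane coefficients that would be carried over into a rule-set.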

Classification in eCognition
Using the equations of the hyper-planes that separate the different classes in the six-dimensional feature space, a rule-set is then built in eCognition. The equation of each hyper-plane is of the form $w^T x + b = 0$, where $w$ is the normal vector to the hyper-plane and $x$ is the feature vector. For a 4-class problem (HE class) with a one-against-one SVM implementation, we have 6 hyper-planes, namely $(w^{12})^T x + b^{12} = 0$, $(w^{13})^T x + b^{13} = 0$, $(w^{14})^T x + b^{14} = 0$, $(w^{23})^T x + b^{23} = 0$, $(w^{24})^T x + b^{24} = 0$, and $(w^{34})^T x + b^{34} = 0$. We have Class 1 = Buildings; Class 2 = Coconut; Class 3 = Mango; Class 4 = Other Tall Vegetation. An object (a feature vector $x$) is classified according to its sign with respect to the hyper-planes it is involved with. For example, for an object to be classified as a Building (Class 1) it should satisfy $(w^{12})^T x + b^{12} > 0$, $(w^{13})^T x + b^{13} > 0$, and $(w^{14})^T x + b^{14} > 0$. The learned hyper-planes serve as "Super-Features", which are then used to build a thresholding rule set in eCognition for the classification of the scene.
Figure 18. Snapshot of the Class Description for the classifier algorithm in eCognition using the hyperplane equations as thresholding superfeatures.
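The sign rule described above can be restated as a small function: an object is assigned to a class only if it falls on that class's side of every pairwise hyper-plane involving it. The sketch below uses a hypothetical 3-class problem with a single feature and made-up hyper-plane coefficients; returning no class when an object wins no complete set of pairwise tests is also an assumption of this sketch.

```python
def num_pairwise_classifiers(k):
    """Number of one-against-one hyper-planes for k classes: k(k-1)/2."""
    return k * (k - 1) // 2


def classify_by_hyperplanes(x, planes, classes):
    """Return the class whose pairwise sign conditions are all satisfied, else None.

    planes maps a pair (i, j) with i < j to (w, b); a positive value of
    w^T x + b votes for class i and a negative value votes for class j.
    """
    for c in classes:
        wins_all = True
        for (i, j), (w, b) in planes.items():
            if c not in (i, j):
                continue
            value = sum(wi * xi for wi, xi in zip(w, x)) + b
            if (c == i and value <= 0) or (c == j and value >= 0):
                wins_all = False
                break
        if wins_all:
            return c
    return None


# Hypothetical hyper-planes for 3 classes in a 1D feature space.
planes = {
    (1, 2): ([-1.0], 1.0),  # positive when x < 1
    (1, 3): ([-1.0], 2.0),  # positive when x < 2
    (2, 3): ([-1.0], 3.0),  # positive when x < 3
}
classes = [1, 2, 3]
results = [classify_by_hyperplanes([v], planes, classes) for v in (0.5, 2.5, 4.0)]
print(num_pairwise_classifiers(4), results)
```

In a rule-set, each `value` plays the role of one thresholded super-feature, and each class description is the conjunction of the sign conditions on the hyper-planes that involve that class.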

Accuracy of the Developed SVM Models
The developed SVM rule-set showed a high overall accuracy when applied to different scenes/areas. The following tables show the error matrix and the accuracy of one block/scene in Agusan del Norte.
Table 2. Accuracy of the HE subclasses in a sample block/scene
Table 3. Accuracy of the ME subclasses in a sample block/scene
Table 4. Accuracy of the LE subclasses in a sample block/scene
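For reference, the accuracy measures usually derived from an error matrix can be computed as sketched below. The matrix values are hypothetical and are not the results of this study; the layout (rows as reference classes, columns as assigned classes) and the function name are assumptions of this sketch, which also assumes every class has at least one reference and one assigned object.

```python
def accuracy_report(matrix):
    """Return (overall, producer's, user's) accuracy from a square error matrix.

    matrix[i][j] is the number of objects of reference class i assigned to class j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    overall = sum(matrix[i][i] for i in range(n)) / total
    # Producer's accuracy: correct objects divided by the reference (row) total.
    producers = [matrix[i][i] / sum(matrix[i]) for i in range(n)]
    # User's accuracy: correct objects divided by the assigned (column) total.
    users = [matrix[j][j] / sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return overall, producers, users


# Hypothetical 2-class error matrix (for example, Corn vs. Shrub).
matrix = [[45, 5],
          [3, 47]]
overall, producers, users = accuracy_report(matrix)
print(overall, producers, users)
```

Reporting producer's and user's accuracy alongside the overall figure shows whether a high overall accuracy hides a poorly classified minority class.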

Sample Classification Results
The resulting classification maps can be accessed through the web as a Web Map (lidar2.carsu.edu.ph:9080)

CONCLUSION
We have demonstrated an object-based classification of high-value crops with an optimized SVM model using LiDAR data and Orthophoto. There are three major parts of the presented workflow. The first part involves the segmentation algorithms for feature extraction. The second part is the SVM optimization through cross validation in order to extract the equations of the hyper-planes that best separate each class. Finally, the learned hyper-planes are used to create the rule-sets for classification in eCognition. Segmentation and rule-based classification were done using eCognition. The optimization of the SVM as well as the hyper-plane extraction was done using LIBSVM through Matlab. Overall accuracies greater than 90% were achieved with the optimized model. An optimized model can yield better accuracy; however, as with all optimization procedures, the optimization may simply cause the model to over-fit. To check whether the model is over-fitted to the scene from which it learned its parameters, the model should be tested against different scenes. We have shown that even when the optimized SVM model was tested against different scenes within Agusan del Norte, the obtained overall accuracies were greater than 90%.