AIRBORNE MULTISPECTRAL LIDAR DATA FOR LANDCOVER CLASSIFICATION AND LAND / WATER MAPPING USING DIFFERENT SPECTRAL INDEXES

Airborne Light Detection And Ranging (LiDAR) data is widely used in remote sensing applications, such as topographic and landwater mapping. Recently, airborne multispectral LiDAR sensors, which acquire data at different wavelengths, are available, thus allows recording a diversity of intensity values from different land features. In this study, three normalized difference feature indexes (NDFI), for vegetation, water, and built-up area mapping, were evaluated. The NDFIs namely, NDFIG-NIR, NDFIG-MIR, and NDFINIRMIR were calculated using data collected at three wavelengths; green: 532 nm, near-infrared (NIR): 1064 nm, and mid-infrared (MIR): 1550 nm by the world’s first airborne multispectral LiDAR sensor “Optech Titan”. The Jenks natural breaks optimization method was used to determine the threshold values for each NDFI, in order to cluster the 3D point data into two classes (water and land or vegetation and built-up area). Two sites at Scarborough, Ontario, Canada were tested to evaluate the performance of the NDFIs for land-water, vegetation, and built-up area mapping. The use of the three NDFIs succeeded to discriminate vegetation from built-up areas with an overall accuracy of 92.51%. Based on the classification results, it is suggested to use NDFIG-MIR and NDFINIRMIR for vegetation and built-up areas extraction, respectively. The clustering results show that the direct use of NDFIs for land-water mapping has low performance. Therefore, the clustered classes, based on the NDFIs, are constrained by the recorded number of returns from different wavelengths, thus the overall accuracy is improved to 96.98%. * Corresponding author


INTRODUCTION
The spectral vegetation, water, or built-up index is a single number derived for each pixel from an arithmetic operation on two spectral bands.An appropriate threshold of the index is then established to contrast the reflectance of one feature from other land cover features (e.g., vegetation from built-up area) based on the spectral characteristics.The selection of bands for any spectral index is based on the interaction of features with different wavelengths.Following are examples to existing indexes that have been used to extract different features using multi-spectral bands.
The Normalized Difference Vegetation Index (NDVI) was first proposed by (Rouse et al., 1974) as following: where ρ NIR and ρ red are the reflectance from the NIR and red bands, respectively.
The basic idea of the NDVI is that the chlorophyll in green plants absorbs red light portion of the spectrum, while NIR light portion is reflected or scattered.As a result, vegetation has low reflectance from red light and high reflectance from NIR light, and hence high NDVI values.The NDVI is used for vegetation extraction (Chen et al., 2009), tree canopy mapping (MacFaden et al., 2012), and objects classification into vegetation and builtup areas (Thanapura et al., 2007;Zhou et al., 2009;Hartfield et al., 2011).
The Normalized Difference Water Index (NDWI) was first defined by McFeeters (1996) as following: where ρ green is the reflectance from the green band.McFeeters (1996) has selected the green wavelength to maximize the reflectance from water bodies, and the NIR wavelength to minimize the low reflectance from water bodies as well as take the advantage of high reflectance from vegetation and soil features.Based on that, the NDWI has been calculated to discriminate water bodies from vegetation and soil features.Xu (2006) has used the shortwave-infrared (SWIR) instead of NIR to enhance McFeeters's NDWI in distinguishing water bodies from built-up features as well as vegetation and soil features.Xu's MNDWI is defined as following: where ρ SWIR is the reflectance from the SWIR band. (1) (2) (3) Gao (1996) has suggested a new NDWI that should be considered as an independent vegetation index for estimating water content of vegetation canopy.Although vegetation canopies have high reflectance from NIR and SWIR wavelengths between 900-2500 nm, liquid water absorption in SWIR region (1500-2500 nm) is significantly stronger than that in NIR region (900-1300 nm).Gao's NDWI is defined as: Also, the aforementioned equation (Eq.4) has been defined as normalized difference built-up index (NDBI) (Zha et al., 2003;Xu, 2007) and normalized difference soil index (NDSI) (Rogers and Kearney, 2004) to extract built-up areas and soil features, respectively.
Over the past years, remotely sensed satellite data have been employed to extract boundaries of water bodies, vegetation, and built-up areas.Few studies attempted to use NIR band from LiDAR data with red band from aerial images to derive NDVI for vegetation classification (trees and/or grass) in urban areas (Huang et al., 2008;Guan et al., 2013).Recently, the NDVI is calculated for vegetation applications using multispectral LiDAR data obtained from either laboratory-based multispectral LiDAR systems or terrestrial laser scanning (TLS).Woodhouse et al. (2011) used laboratory-based measurements from a tunable laser, operates at four wavelengths (531, 550, 660, and 780 nm).The NDVI and photochemical reflectance index (PRI) were then used to measure plant physiology.Shuo et al. (2015) used a laboratory-based prototype of multispectral LiDAR, designed by (Wei et al., 2012), to record intensity values at four wavelengths (556, 670, 700, and 780 nm).The physiology of the canopy was then studied using three vegetation indexes, and achieved classification accuracy above 90%.Puttonen et al. (2015) used a TLS from the Finnish Geodetic Institute (FGI), called Hyperspectral LiDAR system (HSL).The system was tested in an outdoor experiment using seven wavelength bands ranging from 500 to 980 nm to detect man-made targets from vegetation based on their spectral response.Four vegetation indexes were used in classification and the results achieved an overall accuracy up to 91.0%.
In 2014, Teledyne Optech has launched the world's first airborne multispectral LiDAR sensor "Optech Titan".The sensor operates simultaneously at three wavelengths and collects data in three channels: NIR (1064nm) in C2 at 0° nadir looking, MIR (1550nm) in C1 at 3.5° forward looking, and green (532nm) in C3 at 7° forward looking.Specifications of Optech Titan sensor are provided in Table 1.Wichmann et al., (2015) analysed a dataset, acquired by Optech Titan, covering the city of Oshawa.Conventional geometrical classification and mapping procedures were used to classify the terrain into four classes: ground, buildings, mid vegetation, and high vegetation.Spectral patterns for different classes were analysed and showed that the intensity values could be potentially used in classification.

Parameter Specification
Wavelength This paper aims to evaluate the use of different spectral indexes, as an automatic unsupervised classification method, derived from airborne multispectral LiDAR data, for land-water, vegetation, and built-up area mapping.In this study, three different indexes: 1) the NDFI G-NIR , 2) NDFI G-MIR , and 3) NDFI NIR-MIR are tested based on the three different wavelengths (green (532 nm), NIR (1064 nm), and MIR (1550 nm)) produced by Optech Titan.An automatic 3D point clustering based on spectral indexes is presented, and the recorded returns from different wavelengths are used with the NDFIs for landwater mapping.

STUDY AREAS AND DATASETS
An urban study area, located near Lake Ontario in Scarborough, Ontario, Canada, was used in this paper.The study area covers variety of land features, such as built-up areas (buildings, roads, and parking lots), high vegetation (trees) and low vegetation (shrubs and grass), as well as part of Lake Ontario.A flight mission, to acquire LiDAR data for three strips (L1, L2, and L3), was conducted on September 3 rd , 2014.Two subsets of LiDAR data were clipped from L1 (Site I), Figure 1, and L3 (Site II), Figure 2, with the dimension of 600m x 200m each, for the experimental testing.Site I includes land features and the water body so that the NDFIs could be evaluated for land/water mapping, while Site II includes different land features to evaluate the NDFIs for vegetation and built-up area discrimination.Since the number of points is not the same from each channel, the maximum number of points of the two input channels per each index was used in processing and displaying, in order to obtain highest point density.The points from green channel were used for NDFI G-NIR and NDFI G-MIR , while the points from NIR channel were used for NDFI NIR-MIR .The number of reference points for results validation, for each feature and total number, per each index are provided in

METHODOLOGY
The overall workflow of this study is shown in Figure 4.The intensity values from the green, NIR, and MIR wavelengths of multispectral LiDAR sensor were employed to form NDFI G-NIR , NDFI G-MIR , and NDFI NIR-MIR as following: where Green, NIR and MIR are the intensity values from the green, NIR and MIR wavelengths, respectively.Although, Optech Titan operates simultaneously at the three wavelengths, it acquires LiDAR points in the three channels at different angles.Consequently, points from different channels do not coincide at the same GPS time.Another method could be used to find corresponding points, called "3D spatial join", where a search of the nearest neighbours for each point within a sphere of defined radius, using geometric information (x, y, z), is conducted.This method may result in wrong match between points, shown in Figure 5.For example, case (1) indicates the perfect point matching from two channels, C2 and C3.In case (2), a point from C2 could be matched twice with two different points from C3, as this point is the nearest neighbour to both points.Case (3) shows two possible neighbour points from C2 which have the same distance to a point from C3, thus two points could be wrongly matched.Case (4) indicates that no (5) neighbour points from C2 to a point from C3 within a defined sphere.Therefore, for each channel, a grid was created with 1m cell size and the mean intensity value of each cell was calculated.The grid cell size is selected as 1m in order to guarantee two conditions for each cell; the first is to have sufficient number of points and the second is not to contain any points from different features.Then, the mathematical operation of NDFIs was conducted using grid cells.Based on the point's location, a NDFI value was assigned to each point using bilinear interpolation, where the adjacent cell centres were used in calculation.The whole process is shown in Figure 6.2009) adjusted manually a threshold value for NDVI to extract vegetation.The manual adjustment of the threshold could achieve more accurate results among different datasets or sites (Xu, 2006;Lacaux et al., 2007;Li et al., 2013).Therefore, a dynamic threshold adjustment method is required (Ji et al., 2009).
In this study, the Jenks natural breaks optimization method was used to determine the threshold values for discriminating water from land in Site I and vegetation from built-up areas in Site II (Jenks, 1967).This optimization is a data clustering method designed to minimize within-class deviations and thus leads to maximize the between-classes variance (σ 2 ), which can be obtained from: Now, assume that the NDFI points range from [a, ⋅⋅⋅, b], where −1 ≤ a < b ≤ 1.Using the Jenks natural breaks optimization, the (M) is first calculated.Then, the points are divided into two classes with ranges [a, ⋅⋅⋅, t] and [t, ⋅⋅⋅, b], where (t) is the threshold value.For each range, M 1 and M 2 are calculated.The optimal (t) is obtained by minimizing variation within classes, and intuitively maximizing the between-class variance of the two classes, using the following equation: For land-water mapping of Site I, it is noticed that the water body has high variation in intensity values, especially the intermediate part.A radiometric correction model, developed by (Yan et al., 2012), was applied.The model is based on radar equation and the correction is mainly for the range and the scan angle.Since the high variation of intensity values is at nearly the same range and very small scan angles, the results are insufficient.Therefore, a low pass filter was applied to the intensity data as a pre-processing step before NDFIs calculation, thus making water intensity values more consistent.However, the radiometric correction should be applied in case of processing of overlapped strips (Yan and Shaker, 2015).In addition, the intensity values of water points from the green channel are lower than intensity values from NIR and MIR.Moreover, for each channel, vegetation and water points have a close range of intensity values.As a result, parts of vegetation area and water body could have the same range of NDFIs, thus affecting the classification results of NDFIs for land-water mapping.Therefore, the clustered classes from NDFIs were constrained by the recorded number of returns from different features at the green, NIR, and MIR channels, shown in Table 3.

Feature
Green NIR/MIR Built-up areas single single Water single/double single Vegetation double or more double or more Table 3. Recorded returns from different feature in green, NIR, and MIR channels

RESULTS AND DISCUSSIONS
The histogram of each NDFI was first created and studied for both sites.NDFI histograms give an indication of how many classes could be distinguished from the dataset based on number of modes.After that, the Jenks natural breaks optimization method was used to determine the threshold values for each NDFI.The accuracy assessment was conducted using the labeled reference points from the Google image.The confusion matrix was created and the accuracy measures (overall, producer's, and user's accuracies), as well as Kappa statistic were calculated.

Site I Results
Site I is located at Lake Ontario and covers different land features, such as building, roads, parking lots, trees, grass, and shrubs, as well as water body.Therefore, Site I was tested to extract water body from land features.Figure 7    The intensity variation of the water surface was first reduced using a low pass filter.After that, the NDFI calculation was conducted, and a threshold value was applied.The cover type is vegetation and water when the NDFI G-NIR ≤ -0.51 or NDFI G-MIR ≤ -0.53; otherwise, it is built-up area.The output "vegetation and water" class was then constrained by recorded number of returns as shown in Table 3, to extract water points.The cover type is water if the number of returns of the examined point equals 1 or 2 from green channel and 1 from NIR/MIR channel when using NDFI G-NIR or NDFI G-MIR , respectively.The NDFI NIR-MIR is not applicable in this case because it is based on the NIR and MIR channels, while the green channel should be used to detect water points.Figure 9 shows the labelled points, as water, land and unclassified, based on number of returns added in with the NDFI G-NIR /NDFI G-MIR .The confusion matrix, overall accuracy, and kappa statistics for the results of the two NDFIs are provided in Table 4 and 5   Despite the high overall accuracy, some vegetation areas are misclassified as water points and vice versa.Some low vegetation points (e.g., grass) could not be filtered out.This is because that those points and water points have a close range of intensity values, especially after applying the low pass filter.Also, they have a single return from green, NIR, or MIR channels.Hence, these vegetation points were misclassified as water points.

Site II Results
Site II covers an urban area mixed with vegetation (includes: trees, grass, and shrubs) and built-up areas (includes: building, roads, and parking lots).Site II was used to discriminate vegetation from built-up areas.Figure 10

CONCLUSIONS AND FUTURE WORK
This paper discusses the use of airborne multispectral LiDAR data for land-water, vegetation, and built-up area mapping.Three NDFIs namely, NDFI G-NIR , NDFI G-MIR , and NDFI NIR-MIR were evaluated using intensity values from green, NIR, and MIR wavelengths of the new airborne sensor "Optech Titan".The Jenks natural breaks optimization was used to determine the threshold values of each NDFI, in order to avoid the manual adjustment of threshold values and cluster the 3D point data.
The results show low performance of using NDFIs only in landwater mapping.The low performance may be due to the variation of the transmitted signal power at different wavelength.Hence, the recorded intensity for an object (e.g., water) is high at NIR and MIR wavelengths compared to green wavelength.Consequently, water points are misclassified as vegetation points.The land-water mapping results were significantly improved by using two steps before and after NDFIs calculation.A low pass filter was applied to intensity data as a pre-processing step before NDFIs calculation to reduce the intensity variation in the water surface.The output "vegetation and water" class from clustering was constrained using recorded number of returns from different channels.On the other hand, the direct use of the three NDFIs succeeded to discriminate vegetation from built-up areas.Based on the results achieved, we suggest using the NDFI G-MIR and NDFI NIR-MIR for vegetation and built-up areas extraction, respectively, from airborne multispectral LiDAR data.The discrimination of roads from buildings and low from high vegetation will be further investigated by incorporating the LiDAR height data.Also, other factors will be introduced to distinguish water from land features.

Figure 1 .Figure 2 .
Figure 1.LiDAR point clouds from C2 for Site I (upper: height and lower: intensity)

Figure 3 .
Figure 3. Google image showing Site I and Site II

Figure 4 .
Figure 4.The overall workflowTo be able to calculate different NDFIs, each point from any channel should have a corresponding point from other channels.Although, Optech Titan operates simultaneously at the three wavelengths, it acquires LiDAR points in the three channels at different angles.Consequently, points from different channels do not coincide at the same GPS time.Another method could be used to find corresponding points, called "3D spatial join", where a search of the nearest neighbours for each point within a sphere of defined radius, using geometric information (x, y, z), is conducted.This method may result in wrong match between points, shown in Figure5.For example, case (1) indicates the perfect point matching from two channels, C2 and C3.In case (2), a point from C2 could be matched twice with two different points from C3, as this point is the nearest neighbour to both points.Case (3) shows two possible neighbour points from C2 which have the same distance to a point from C3, thus two points could be wrongly matched.Case (4) indicates that no LiDAR Data (X, Y, Z, I)

Figure 5 .
Figure 5. 3D spatial join between C2 and C3 pointsA threshold value is required to cluster LiDAR data based on NDFI values, which range from -1 to 1, into two different features.McFeeters (1996) andXu (2007) set zero as the threshold value for NDWI and NDBI to extract water bodies and built-up areas, respectively, while Chen et al. (2009) adjusted manually a threshold value for NDVI to extract vegetation.The manual adjustment of the threshold could achieve more accurate results among different datasets or sites(Xu, 2006;Lacaux et al., 2007;Li et al., 2013).Therefore, a dynamic threshold adjustment method is required(Ji et al.,  2009).

Figure
Figure 6.NDFIs calculation for 3D points shows the three histograms, created from the three NDFIs, for Site I. NDFI G-NIR histogram shows two modes in the histogram indicating two distinguishable classes, while NDFI G-MIR and NDFI NIR-MIR histograms show skewed distributions to left and right, respectively, indicating that two classes might be distinguished with difficulty.However, a threshold value was applied to cluster the dataset based on each NDFI into two classes.So that if the NDFI G-NIR ≤ -0.45, NDFI G-MIR ≤ -0.41, or NDFI NIR-MIR > 0.1, the cover type is water body; otherwise, it is land.

Figure 7 .
Figure 7. NDFI histograms of Site I

Figure 9 .
Figure 9. Labelled LiDAR points of Site I from number of returns with a) NDFI G-NIR / b) NDFI G-MIR shows the three histograms, created from the three NDFIs, for Site II.NDFI G-NIR and NDFI NIR-MIR histograms show two normal distributions, indicating two distinguishable classes, while NDFI G-MIR histogram shows a skewed distribution to left, indicating that two classes might be distinguished with difficulty.A threshold value was applied to cluster the dataset based on each NDFI into two classes.It is considered that the cover type is vegetation when the NDFI G-NIR ≤ -0.48, NDFI G-MIR ≤ -0.42, or NDFI NIR-MIR > 0.13; otherwise, it is built-up area.

Figure 10 .
Figure 10.NDFI histograms of Site II Figure 11 shows the labeled points, as vegetation (V), built-up areas (B) and unclassified (U), based on NDFI G-NIR and NDFI G- MIR , as well as NDFI NIR-MIR .The confusion matrix, overall accuracy, and kappa statistics for the results of the three NDFIs are provided in Table6, 7 and 8.The NDFI G-MIR achieved the highest overall accuracy of 92.51%.In particular, the same index, NDFI G-MIR , achieved the highest vegetation detection rate of 93.17%, while the NDFI NIR-MIR achieved the highest built-up areas detection rate of 93.79%.

Table 2 .
Table 2 for the two sites.Reference points for Site I and Site II

Table 4 .
Confusion matrix of labelled point based on NDFI G-NIR and number of returns in Site I Table 6, 7 and 8.The NDFI G-MIR achieved the highest overall accuracy of 92.51%.In particular, the same index, NDFI G-MIR , achieved the highest vegetation detection rate of 93.17%, while the NDFI NIR-MIR achieved the highest built-up areas detection rate of 93.79%.

Table 7 .
Confusion matrix of labelled point based on NDFI G-MIR in Site II