ESTIMATION OF OPTIMAL PARAMETER FOR RANGE NORMALIZATION OF MULTISPECTRAL AIRBORNE LIDAR INTENSITY DATA

Range normalization is a common data pre-process that aims to improve the radiometric quality of airborne LiDAR data. This radiometric treatment considers the rate of energy attenuation sustained by the laser pulse as it travels through a medium back and forth from the LiDAR system to the surveyed object. As a result, the range normalized intensity is proportional to the range to the power of a factor a. Existing literature recommended different a values on different land cover types, which are commonly adopted in forestry studies. Nevertheless, there is a lack of study evaluating the range normalization on multispectral airborne LiDAR intensity data. In this paper, we propose an overlap-driven approach that is able to estimate the optimal a value by pairing up the closest data points of two overlapping LiDAR data strips, and subsequently estimating the range normalization parameter a based on a least-squares adjustment. We implemented the proposed method on a set of multispectral airborne LiDAR data collected by a Optech Titan, and assessed the coefficient of variation of four land cover types before and after applying the proposed range normalization. The results showed that the proposed method was able to estimate the optimal a value, yielding the lowest cv, as verified by a cross validation approach. Nevertheless, the estimated a value is never identical for the four land cover classes and the three laser wavelengths. Therefore, it is not recommended to label a specific a value for the range normalization of airborne LiDAR intensity data within a specific land cover type. Instead, the range normalization parameter is deemed to be data-driven and should be estimated for each LiDAR dataset and study area.


INTRODUCTION
Fine-scale land cover monitoring has taken a great leap forward with the advent of airborne light detection and ranging (LiDAR) technology. Airborne LiDAR is capable of probing the Earth's surface with a number of partially overlapping flight lines, resulting in a set of high-density 3D point clouds. The laser scanning method is deemed to be a favorable alternative than the traditional optical imaging system with respect to the data accuracy and 3D capability (Yan et al., 2015). Nevertheless, conventional monochromatic LiDAR system still has its limitation, since only a single record of reflectance measurement (i.e. intensity, I) can be obtained. Although data fusion with on-board camera or ancillary imaging sensor can be adopted to compensate the lack of spectral measurements, a multispectral LiDAR system, which is able to collect the laser intensity with multiple wavelengths, is still desired. Recently, a number of experimental multispectral and hyperspectral LiDAR systems have been invented for laboratory testings and short-range measurements (Hakala et al., 2012;Wei et al., 2012). The first commercial topographic multispectral airborne LiDAR system has been rolled out in late 2014. Teledyne Optech announced the world's first tri-laser wavelength system, Optech Titan, which equips with three channels: green, near-infrared (NIR) and infrared (IR). RIEGL also launched a dual-laser wavelength system, RIEGL VQ-1560i-DW, which offers measurements of green and IR lasers. All these developments further improve the capability of LiDAR in different applications, such as land cover mapping (Morsy et al., 2017), tree species classification (Yu et al., 2017), forest stand characteriza- * Corresponding author tion (Dalponte et al., 2018), and computation of spectral vegetation index (Okhrimenko and Hopkinson, 2019).
Similar to microwave remote sensing, airborne LiDAR data should undergo certain radiometric pre-processing in order to maximize its full potential for the aforementioned analyses. The backscattered laser energy, though somehow correlates to the spectral reflectance, suffers a certain degree of attenuation due to the environmental and system-induced distortions. Therefore, different radiometric calibration, correction and normalization techniques have been developed to improve the LiDAR intensity homogeneity (Kashani et al., 2015). These techniques mainly consider the effects of range, incidence angle and atmospheric attenuation, all of which can be corrected with reference to a physical or empirical model (Höfle and Pfeifer, 2007). As a simplified version of radiometric correction, range normalization has been commonly used, particularly in forestry studies (Hopkinson, 2007). Range normalization simply takes into account of the effect of range without considering other parameters. This thus leads to the normalized intensity (Ic) being proportional to the range (R) to the power of a factor a, i.e.
where Rr can be referred to a reference range, such as the minimum range value found in the LiDAR dataset. The correction parameter a represents the rate of energy attenuation sustained by the laser pulse as it travels through a medium back and forth from the LiDAR system to the surveyed object. Although the parameter a is recommended to set as two according to the radar (range) equation (Jelalian, 1992), such a setting may not always yield to the best intensity homogeneity. Therefore, research efforts can be found in searching for an optimal a value for the range normalization on monochromatic airborne LiDAR data.
By examining a range of a value, Korpela et al. (2010) concluded that the range-normalized intensity variation, as measured by coefficient of variation, was minimum when a was set as two for target objects with well-defined large surfaces, three for linear features such as electric wires, and four for point-like target objects. Regarding natural features, the optimal a value ranging from 2.3 to 2.5 deems to be ideal for grass cover or barely vegetation (Korpela, 2008). The author explained the intensity variation found on different vegetation objects, such as trees and shrubs, which can be attributed to the differences in leaf size, shape, orientation, and density of foliage and branches. After the implementation of range and automatic gain control (AGC) normalization, the correction parameter a with a value of two was appropriate for vegetation in Scandinavia, and the accuracy of tree species classification results could be improved up to near 90% for pine, spruce, and birch. These studies suggested that range normalization is an effective process in vegetation analysis if the intensity value is used. Similar findings can be found in Gatziolis (2011), where setting the a value close to two can lead to a reduction of coefficient of variation by 55.3% in forest canopies.
Despite these attempts, most of the existing studies focus on the monochromatic airborne LiDAR system, and there is a lack of studies examining the range normalization method on multispectral airborne LiDAR data. In addition, it is impractical to evaluate a range of a values all the time for different LiDAR datasets. Therefore, this paper aims to propose an automatic approach to estimate the optimal a value based on the use of overlapping data strips. This method is implemented and compared to the results derived by range normalization on a multispectral airborne Li-DAR dataset collected by Optech Titan. Thus, the coefficient of variation found on different land cover classes with respect to different parameter a and laser wavelengths can be used as an indicator to measure the improvement of intensity homogeneity.

ESTIMATION OF RANGE-NORMALIZATION PARAMETER
Assuming a LiDAR dataset (L) is composed of two partially overlapped data strips, L1 and L2 (see Fig. 1). The process begins by pairing up n number of closest points of the two LiDAR data strips based on a kd-tree search. Since a pair of closest Li-DAR points are assumed to be located on the same surface/object, therefore, the normalized intensity value (Ic) should be identical. As a result, the Eq. 1 can be reformulated as: The above Eq. 2 leads to the following linearized form after taking a logarithm on both sides of the equation: Thus, the parameter a can be estimated through using a leastsquares adjustment: Once the parameter a is estimated, it can be substituted back to Eq. 1 in order to compute the normalized intensity Ic for L1 and L2 . Figure 1. Overlapping LiDAR data strips.

Multispectral LiDAR Data
The range normalization method was tested on a multispectral airborne LiDAR dataset collected by Optech Titan, which has three laser channels: channel 1 (1550 nm), channel 2 (1064 nm) and channel 3 (532 nm). The collected LiDAR dataset covers the Petawawa Research Forest, Ontario, Canada. The scanning configuration of LiDAR survey can be summarized as follows: pulse repetition frequency = 375kHz, scan frequency = 40 Hz, scan angle = ±20 • , and flying height ≈ 1, 100 m. As a result, the mean point density of channel 1 to 3 is ,respectively, 11.9 points/m 2 , 12.4 points/m 2 , and 4.8 points/m 2 , yielding to an approximate 0.5 m of mean point spacing. Although a total of 33 LiDAR data strips were intentionally collected to study forest attribute modelling (van Ewijk et al., 2019) and tree species classification (Rana et al., 2018), we selected two pairs of LiDAR data strips, with an approximate 55% of overlapping in each, to study the intensity variation before and after implementing the proposed range normalization.

Measurement of Intensity Homogeneity
To calculate the optimal parameter a value, four land cover types found within the overlapping region of LiDAR data strips were selected to evaluate the range normalization. Samples of data points of tree top, rooftop, grass cover and road were selected and extracted. These four types of land cover have different values of surface roughness, shape, size, and spectral reflectance. The intensity homogeneity can be measured by computing the coefficient of variation (cv) on the multispectral LiDAR data before and after implementing the range normalization. The cv can be computed as: where σ refers to the standard deviation of the intensity value of the selected LiDAR sample data points of a land cover class (ci), and µ is the respective mean intensity value. Computation of cv was implemented on the four land cover types with respect to the three laser channels. A reduction of cv after implementing the range normalization implies an improvement of intensity homogeneity.

Cross Validation of Optimal Parameter
Although cv was used to measure the effect of range normalization on the intensity homogeneity, the cross validation approach adopted by (Korpela et al., 2010) was implemented in order to compute and obtain an optimal value of parameter a for comparison. A pre-defined set of a value from 0.1 to 6.0 with a 0.1interval was examined in Eq. 1 for the four land cover classes. The optimal parameter a was achieved by looking for the lowest value of cv of the corrected assembly of data strips. The multispectral airborne LiDAR data with three laser wavelengths probing the four land cover classes -tree top, roof top, grass cover and road also underwent the range normalization using the cross validation approach.

Impact of Range Normalization on Single Data Strip
The results of range normalization on single LiDAR data strip are shown in Fig. 3. The 12 sub-figures present the cv against the parameter a on the range-normalized intensity data. In each of the subfigures, the horizontal line represents the cv of the original LiDAR intensity data. For the data returns of treetop, there is no improvement of cv when applying the range normalization in channel 1, where the cv increases together with the parameter a. In channel 2, the cv slightly rises above the cv of original intensity when the value of a is less than 0.5. The cv gradually drops when the a value increases. The cv still decreases even a reaches to 6.0. As a result, the cv is reduced by 4.22% at the largest a inputted. Ironically, the cv against a curve fluctuates in channel 3. The cv remains at original when a is below 0.8. It touches the lowest cv when a is set to 4.3, resulting in a reduction of 3.38%.
Considering the data returns of rooftop, the cv against a curve follows a trend similar to a U-shape in channel 1. The cv reaches to an optimal a value of 4.1, leading to a reduction by 5.58%. In channel 2, the cv behaves similar to that of tree top. The cv continuously drops when a increases from 0.0 to 6.0. The cv is reduced by 9.56% when a is set to 5.8. In channel 3, the cv remains at the original value when a ranges from 0.1 to 0.8. The value of cv subsequently rises above the cv value of the original intensity when a ranges from 0.9 to 1.3. Afterwards, the cv drops at a = 1.4 and onwards. Finally, the cv fluctuates when a is set larger than 3.0 and reaches the bottom value when a is equal to 4.6, leading to a 9.22% of cv reduction.
For data returns of grass cover, the cv slightly increases when a ranges from 0.6 to 0.9 in channel 1. The cv reduces when a is greater than 1.2 and reaches the first local minimum at a = 1.8. It bounds up and drops to the second local minimum when a is equal to 5.3, which results in a reduction of 2.37% in cv. In channel 2, it slightly rises above the cv of original intensity when a is set between 1.0 and 2.3. It forms a U-shape afterwards and reaches to the lowest value of cv at a = 4.0, resulting in a reduction of 2.19%. Finally, the cv stays at the original value when a ranges from 0.1 to 4.6 in channel 3. The cv increases at a negligible value when the a value is between 4.7 and 4.8. It falls to a minimum when a is 5.9 with a total of 2.13% of reduction. For the data samples extracted for road, the cv maintains at its original value when a ranges from 0.1 to 0.4 in channel 1. It meets the first local minimum of U-shape at a = 1.6 and the second local minimum at a = 4.6 with a 9.57% reduction in cv value. In channel 2, the cv remains unchanged till a = 0.9. It rises over the original cv value and bounds back in between 0.9 and 1.8. The cv value drops and follows a U-shape afterwards. Finally, the cv reaches to the lowest point when a = 3.7, having a 4.5% of reduction. In channel 3, the cv remains steady at the original value when a ranges from 0.1 to 2.1. It rises at a minor value at a = 2.2 and 2.3, and the cv drops against the value of a to a bottom at a = 4.6, resulting in a 1% reduction.
Among the 12 sub-figures covering channel 1 to channel 3 for the four different types of land cover, the results of range normalization are diverse and inconsistent. Regarding the percentage of reduction in cv, only three segments reach near 9% (i.e. channels 2 and 3 of on rooftop, and channel 1 of road) and the reduction of cv of all the other varies from 1% to 5.58%. Concerning the trend between cv and a, only the results of grass and road in channel 2 form a curve, which follows a U-shape pattern with an obvious local minimum. Also, the cv remains steady at certain value of a in channel 3 across the four different types of land cover. It is interesting that the resulting cv of tree top in channel 1 goes upwards above the original cv when the a value increases. Therefore, the range normalization model has no effect on this set of data return from tree top in individual LiDAR data strip.

Impact of Range Normalization on Overlapping Data Strips
Referring to Fig. 4, the cv against the parameter a of data returns of different types of land cover within the overlapping region of data strips are presented in the respective 12 sub-figures. In general, the majority of the the results follow a trend of quadratic equation. The lowest cv corresponding to the value of a can be snapped from the U-curve. For the data returns of tree top in channel 1, the cv is below the cv of the original intensity data when a ranges from 0.1 to 2.8. It reaches to the lowest value at a = 1.3 with a 0.49% of reduction in cv. The curve of channel 2 lays below the original cv all the time. At a = 3.0, the cv comes to the lowest with 1.9% of reduction. In channel 3, the cv goes down from a = 0.5 and reaches to the bottom at a = 0.6, resulting in a reduction by 1.88%. The cv rises above the cv of original intensity data, when a is 0.9 and onwards.
Regarding the data returns of roof top, the cv reaches the bottom at a = 2.3 with a total reduction of 2.71% in channel 1. In channel 2, the bottom is at a = 5.9 and cv is reduced by 4.45%. In channel 3, the lowest point of curve is at a = 4.4 with a reduction of 27.54%. For the data returns of grass cover, the cv has a more significant reduction of cv. It is reduced mostly by 23.84% at a = 3.1 in channel 1. In channel 2, the cv reaches to its lowest at a = 2.3 with a total decrease of 11.28%, and the cv increases above the original value of cv when a is greater than 4.7. In channel 3, the cv is significantly reduced by 42.90% at a = 3.2. Finally, the data return of road feature also has a notable improvement of intensity homogeneity in channel 1, where the lowest cv is recorded when a is set to be 2.9. In channel 2, the cv reaches to its lowest at a = 2.3, resulting to a 9.12% reduction, and subsequently, the cv rises above the original value at a = 4.7 and onwards. In channel 3, the cv is at the bottom at a = 3.4 with a total of 31.79% of reduction.
After applying the range normalization on the overlapping Li-DAR data strips, the reduction of cv is significant (up to 42.9%), particularly on the data returns of rooftop in channel 3, grass cover in both channels 1 and 3, and road in both channels 1 and 3. The range normalization does not seem to be effective on the returns of tree top, where the reduction of cv is found only ranging from 0.49% to 1.90%. Among the rooftop, grass and road, the effect of range normalization in channel 2 are less comparable (4.45% to 11.28% reduction in cv) to that of channels 1 and 3. In addition, the reduction of cv found on the roof top of channel 1 is low (i.e., 2.71%).

Estimation of Optimal Parameter based on Overlapping LiDAR Data Strips
As reported in section 2, the optimal parameter of range normalization can be estimated based on the overlapping LiDAR data strips. Table 1 summarizes the estimated a value together with the corresponding cv determined by the proposed method and the cross validation method (Fig. 4). Among the four land cover classes, the estimated a value of road feature found in the three channels has a difference of 0.1 between the proposed method and the cross validation. The computed cv is identical between the two methods in the three laser channels.
In the data samples of grass cover, the estimated a value of channels 2 and 3 are close. The proposed method estimates the optimal a as 2.255 and 3.116, while the cross validation reveals the  lowest cv being found when a is 2.3 and 3.2 in the two respective channels. Although the a value of channel 1 does not seem close (i.e. proposed method = 3.482 and cross validation = 3.1), the determined cv by both methods are almost identical. Similar achievement can be found in the rooftop. The estimated a value by the proposed method is close to the cross validation method with a difference of 0.1 in channels 1 and 3. In channel 2, the a value estimated by the proposed method is 4.866, and the cross validation reveals the optimal a would be 5.9. Despite a significant difference in the a value, the cv determined by both methods is almost the same. Finally, the range-normalized LiDAR intensity data of channels 1 and 2 found on tree top behaves similar as what has been reported in channel 2 of rooftop. An identical cv is found despite the difference of the determined a value. However, the proposed method being implemented on the tree top of channel 3 yields the worse result. The estimated cv has a difference of 0.012 between the two methods, where the proposed method estimates the optimal a as 1.822 but this value is found to be 0.6 as shown in Fig. 4(i).

Discussion
Based on the experiment, one can note that there is no consistent a value for the range normalization of multispectral airborne Li-DAR intensity data with respect to the land cover classes. Even within the same land cover class, the optimal a value is different for the three laser wavelengths. Therefore, it is mostly impractical to recommend a specific a value for the range normalization of LiDAR intensity data for a specific land cover type. In addition, it is surprised that the a value even goes beyond the value of four (e.g. rooftop data samples of channels 2 and 3), which has been set as the maximum value in the previous studies (Korpela, 2008;Korpela et al., 2010;Gatziolis, 2011). Regardless of the land cover type, the proposed method can consistently yield the best estimation of a value, which is able to generate the low-est cv. In future, other parameters, such as the incidence angle, atmospheric attenuation, etc. can also be considered in the normalization / correction model. The use of robust statistics can be incorporated in the parameter estimation so as to cater the appearance of outliers. Finally, a radiometric pre-process, i.e. LiDAR scan line correction (Yan and Shaker, 2018), can be implemented prior to the range normalization, since the multispectral airborne LiDAR data is observed with a notable banding effect (Fig. 2).

CONCLUSIONS
Range normalization is a common pre-process to improve the radiometric quality of airborne LiDAR intensity data, and this method was only examined on monochromatic LiDAR intensity data in the existing literature. In this study, an in-depth examination of the range normalization model was achieved with the multispectral airborne LiDAR data collected by Optech Titan. Based on our experimental testing, it is noted that the impact of range normalization is not significant, where no obvious function (i.e. quadratic function) can be observed when applying the normalization on a single LiDAR data strip. On the other hand, the range normalization can provide a significant reduction of cv (up to 42.9%) on the data samples collected in the overlapping Li-DAR data strips. Most of the results form a quadratic equation between the cv and the parameter a, and an optimal value of a is obtainable for the lowest cv resulted. Since it is mostly impractical to implement a cross validation approach to test different a values and look for the lowest cv, therefore, we propose to use an automatic overlap-driven approach to estimate the optimal a for the range normalization. Most of the findings of the four land cover classes yield close or identical to the results derived by the cross validation approach.
Based on the experiment, one can conclude that there is no consistent a value that should be specified for a certain land cover type. Also, the optimal a value is not the same for any specific land cover with respect to the three laser wavelengths. Therefore, it is recommended to estimate the optimal a value by pairing up the closest points in the overlapping LiDAR data strips and subsequently estimating the a value for the range normalization based on the least-squares adjustment. In future, the abovementioned process can be further enhanced by incorporating the use of robust regression, such as RANSAC or M-estimator. Also, the multispectral airborne LiDAR data should undergo the process of LiDAR scan line correction prior to the range normalization. With the intensity banding effect removed, the combined effect can yield the best improvement of intensity homogeneity, particularly on the multispectral airborne LiDAR intensity data.