VEGETATION REMOVAL FROM UAV DERIVED DSMS , USING COMBINATION OF RGB AND NIR IMAGERY

Current advancements on photogrammetric software along with affordability and wide spreading of Unmanned Aerial Vehicles (UAV), allow for rapid, timely and accurate 3D modelling and mapping of small to medium sized areas. Although the importance and applications of large format aerial overlaps cameras and photographs in Digital Surface Model (DSM) production and LIDAR data is well documented in literature, this is not the case for UAV photography. Additionally, the main disadvantage of photogrammetry is the inability to map the dead ground (terrain), when we deal with areas that include vegetation. This paper assesses the use of nearinfrared imagery captured by small UAV platforms to automatically remove vegetation from Digital Surface Models (DSMs) and obtain a Digital Terrain Model (DTM). Two areas were tested, based on the availability of ground reference points, both under trees and among vegetation, as well as on terrain. In addition, RGB and near-infrared UAV photography was captured and processed using Structure from Motion (SfM) and Multi View Stereo (MVS) algorithms to generate DSMs and corresponding colour and NIR orthoimages with 0.2m and 0.25m as pixel size respectively for the two test sites. Moreover, orthophotos were used to eliminate the vegetation from the DSMs using NDVI index, thresholding and masking. Following that, different interpolation algorithms, according to the test sites, were applied to fill in the gaps and created DTMs. Finally, a statistic analysis was made using reference terrain points captured on field, both on dead ground and under vegetation to evaluate the accuracy of the whole process and assess the overall accuracy of the derived DTMs in contrast with the DSMs.


INTRODUCTION
The advent of Unmanned Aerial Vehicle (UAV) mapping is based on UAVs and their autonomous flight ability, combined with the computer vision algorithms.Plethora of applications and potential users have re-discovering photogrammetry.A new photogrammetry has risen, one full of automation, no need for calibrated cameras and no need for 3D stereoscopic vision.Numerous implementations of UAV are available in the market, although without standards, while the sector is still under development.Traditional photogrammetric packages are being replaced by Structure from Motion (SfM) and Multi View Stereo (MVS) implementations, Agisoft's Photoscan being the most popular and user friendly, while maintaining most of the photogrammetric advantages.
As UAVs are "are to be understood as uninhabited and reusable aerial vehicles" (van Blyenburgh, 1999), kites (Currier, 2015), balloons (Snow et al., 1989) and radio-controlled aircrafts (Theodoridou et al, 2000), have been used for a long time in mapping using photogrammetric practice.Therefore, the use of such platforms in photogrammetric mapping applications has been around for a long time.The main problems that limited the expansion of UAVs, were: • Limited range, hence small areas could be covered.Larger areas needed many days to be covered • Experienced operator needed • Heavy calibrated cameras, necessary for photogrammetric processing, further limited usability and time of flight • They do not benefit from the human sensibility and intelligence, thus, cannot react properly in unpredictable situations Since 2011, the photogrammetric industry has changed drastically based on two major developments; autonomous drones and implementation of computer vision algorithms into commercial photogrammetric software.The rapid emerge of mapping UAVs of any kind, fixed wing or multirotor, is the current trend.Many manufactures are presenting their commercial solutions and the cost is similar or less than a pair of RTK GPS, with a reducing cost trend.At the same time, the automation of photogrammetric processes using state of the art computer vision algorithms allows the creation of Digital Surface Models with better density than LIDAR data.Given the improvements in functionality, higher resolution and data density, software vendors are trying to provide alternatives for land surveying using drone data, by bringing a 3D copy of the countryside in the office for 'on screen' selection of surveying points.This highlights the emerging trend for drone mapping to replacing land surveying sooner or later.The next step is the use of drones for the design of such large-scale construction projects, such as dams, roads, bridges, airports, open mines, etc.
Given that the accuracy of drone mapping is already, or will soon be sufficient for such detailed tasks, the main disadvantage of photogrammetry is the tree canopy and consequent inability to map terrain.In the era of stereo-photogrammetry and vector plots, the traditional plots created by stereo restitution, were corrected and complemented by land surveys.Detailed and accurate terrain mapping and Digital Terrain Model creation is necessary in every construction project.Therefore, in case of large construction planning, land surveying or LIDAR flights are commissioned to map the terrain (ground).Depending on the area size, the former is usually cheaper but the resulting DSM cruder, while the latter isn't efficient.Drone mapping and computer vision photogrammetry are a valid alternative, to both techniques as it combines better DSM density from LIDAR in a costeffective solution, if only the vegetation problem could be overcome.
This study introduces and tests a methodology to create detailed DTM from drone aerial imagery by automatically filtering the vegetation on the DSM, without the need for land measurements other than standard control point acquisition.

RELATED WORK
The main problem of photogrammetrically derived DSM is that they include canopy and buildings.It is impossible to extract DTM from photos as the ground is not always visible from air.This is not the case of LIDAR data, where the LIDAR penetration ensures that there is going to be some return from the ground.Recent publications comparing LIDAR and multi-image matching techniques highlight that the main advantage of LIDAR is the ground information (Skarlatos and Vlachos, 2015;Szabó et al. 2016).
Nevertheless, additional manual work is necessary to produce DTM from LIDAR data.LIDAR points are coarser than image pixels.Moreover, manual work is needed to manually remove man made structures or adding break lines to enhance the result.Therefore, if the vegetation canopy was not a problem in photogrammetric point clouds, UAV photogrammetry would have been a better option to Lidar, for the DSM creation.
More than a decade ago high resolution colour infrared (CIR) orthoimages and DSM were used, to extract buildings and trees from urban environments.In this case the trees were identified and extracted based on the NDVI values across the scenery (Straub et al, 2001).
Engaging with the topic of the extraction of man-made structures, four different methods were used (a) DSM/DTM comparison in combination with NDVI analysis, (b) supervised multispectral classification refined with height information from LIDAR data, (c) the use of voids in LIDAR DTM and NDVI classification, (d) use of raw LIDAR DSM data (Demir and Baltsavias, 2010).Similar to the current paper, vegetation index, normally calculated from the red and near infrared band of an RGB and IR orthophotos respectively was used in order to eliminate the remaining vegetation from a high object mask derived from an nDSM (Grigillo and Fras, 2011).
Another application referred to the use of NGATE software module which enables the automatic creation of DSM employing image matching and various morphological operations for removing objects that do not belong to the relief in order to finally produce the DTM (Grigillo and Kanjir, 2012).
A more recent work shows the extraction and characterization of low and high urban vegetation executed using Plѐiades multiangular images by computing a nDSM, extracting spectral and contextual features and classifying vegetation using a random forest classifier (Lefebvre et al., 2016).
Given the spreading of UAVs, their implementation in dense DSM extraction, and the need for DTM, a simplified approach for vegetation removal from DSMs is introduced, explained and verified in two test sites.The proposed methodology uses NDVI analysis to remove vegetation from DSM, using the combination of two cameras (colour and near infrared) on board a UAV or two flights with a UAV with interchangeable cameras.This study doesn't aspire to describe a universal solution of vegetation removal in UAV DSMs, as the process parameters are depended on the cameras' multispectral response, the morphology of terrain and vegetation canopy, and the vegetation type.On the contrary it is a valid proof of concept study, with several ad hoc solutions for the selection of the processing parameters (NDVI threshold, morphological filtering, DTM interpolation), which ought to be extended with proper NIR cameras and standardization of the methodology.

METHODOLOGY
The proposed methodology is based on the straight forward assumption that vegetation points should not be used in the DTM creation.The proposed methodology (Figure 1) uses NDVI to classify DSM points into vegetation and non-vegetation points, exclude vegetation points and generate DTM using interpolation to fill in the gaps created by the exclusion of the vegetation areas.It depends on the existence of an initial dense point cloud and a respective high resolution NDVI orthoimage which could be used to filter out the vegetation points, so that enough points remain, even within dense vegetation (Figure 2), to ensure that there is enough remaining information to create a proper DTM.This cannot be ensured always, as shown below.Therefore, the existence of high resolution multispectral imagery with 4 channels (red, green blue and near infrared) and a dense DSM are the inputs.Multispectral imagery can be easily obtained, either using the aforementioned aerial cameras, or UAV.In the latter scenario, the UAV may be equipped with multispectral camera, or dual camera configuration, with one true colour (RGB) and a modified colour near infrared (NIR) one.
Even if the drone may only use a single camera at each flight, two flights may be performed, one with each camera, one after the other with the same flight planning.The creation of dense point clouds is addressed by the next generation photogrammetric software using computer vision algorithms.Most of this software are especially designed to support drone imagery with some form of multi-image matching either least square or Multi-View Stereo (MVS) or Image based model (IBM) techniques to produce extremely dense point clouds and DSMs.Therefore, measuring millions of 3D points over a block's surface and creating a dense DSM and an orthophotomosaic, using either RGB or NIR photos, is a trivial and fully automatic task.
Although the process seems trivial there are three sources of potential problems, which need to be discussed, most prominent being the NDVI index (Eq.1) itself.As Soil Adjusted Vegetation Indices (SAVIs) are still an active research field, and many other indices such as Transformed Soil Adjusted Vegetation Indices (TSAVI), Optimized Soil Adjusted Vegetation Indices (OSAVI), Global Environment Monitoring Indices (GEMI) have been proposed, used and compared (Bannari et al, 1995, Rondeaux et al, 1996, Steven, 1998), the use of NDVI may look controversial.Nevertheless, NDVI has been selected as the most widely adopted index for vegetation identification.

𝑁𝐷𝑉𝐼 = (𝑁𝐼𝑅 − 𝑅𝐸𝐷) (𝑁𝐼𝑅 + 𝑅𝐸𝐷) (1)
Regardless which Vegetation Index (VI) is used, camera radiometric calibration should be taken into consideration, especially for sensitive indexes.The most common practice for NDVI calculation is the creation of two orthophotomosaics, an RGB and a NIR, and calculation of NDVI using the appropriate channel from each orthophoto, red from RGB and NIR from the NIR photos.During the mosaicking process, software alters original pixel values along seamlines to adjust colour balancing.This process, severely affects original pixel values and final NDVI orthophoto.
Following the NDVI calculation, a proper threshold for vegetation must be selected for mask creation (Figure 2).Selection of a proper threshold depends on many different aspects, such as illumination differences, vegetation type, topographic variations, sensor sensitivity, epoch, phonological cycle, leaf coverage etc (Huete, et. al, 2002).A low NDVI threshold may mask out valid ground points, while a high one will include vegetation points deteriorating the DTM accuracy.
Although the proposed method is sensitive on the selected threshold, relaxation and morphological filtering helps minimizing large interpolation areas in the final DTM After masking out the vegetation areas, the remaining holes must be filled using an appropriate interpolation method (Figure 2).Selected method greatly affects the quality of the final DTM (Aguilar et. al, 2005).In case of vegetation covering large area with underlying surface variations, the proposed method cannot properly detect ground.That is due to the problems that occur when an interpolation algorithm is applied to fill in the gaps.The method assumes that there are enough gaps between the trees, or the canopy is coarse enough to allow for some ground points to be visible under the trees or between the bushes.Large grass areas, which will be filtered out completely, would severely deteriorate the DTM results.
To answer such practical aspects, two test areas were selected for proposed method implementation, and evaluation using reference points.On these cases, the small ground pixel size of UAV photography proves to be an advantage over platforms with larger ground pixel size, such as satellites (Lefebvre et al. 2016) or aerial surveys.The small pixel size allows some penetration through canopy, for individual ground pixels among the DSM.
Figure 2: Proposed methodology -interpolation principle (among high trees and across grass field)

APPLICATION OF PROPOSED METHODOLOGY
Two areas were selected for testing, based on the availability of a sufficient amount of ground reference points, both under trees and among vegetation, as well as in dead ground.In addition, both RGB and near-infrared UAV photography, should be available, to apply the proposed methodology.For the application of the tests, a single UAV has been used and data were collected in two separate flights, one with RGB and another one with NIR camera.
In the following implementation, a modified non-calibrated near infrared camera has been used, mainly because this doesn't affect the main scope of this work and the method is designed to be as versatile as possible.In this particular study, Agisoft's Photoscan was used to process the UAV photography and create two orthophotos and a dense DSM.Since the overlapping and coregistration of the orthophotos is a crucial factor, a common bundle adjustment was performed, with all photos (RGB and NIR).It was decided to use the rest of field collected points as reference points to maximize the sample, instead of check points to check overall UAV flight accuracy, which is irrelevant to this study.The DSM was extracted from the RGB photos, and the RGB and NIR orthophotos were created by selecting the corresponding photos only, during the orthorectification process.
The rest of the processing for NDVI calculation, NDVI filtering, removal of corresponding areas from DSM, morphological filtering and interpolation of the DTM were done in MathWorks' Matlab using in house code.

Test Site A -, Kimisala, Rhodes Island, Greece
The first site is part of Kimisala area in Southwestern part of the island of Rhodes.Kimisala's area is covering about 1000 acres, with many scattered archaeological sites within.Data were acquired as part of HERICT IP project (Acevedo Pardo et al., 2013) and processed within the scope of this work.This site comprises of stone archaeological ruins, surrounded by high pine trees and high shrubs (Figure 3).In between the pine trees and shrubs small clearings allowed for the aerial photos to penetrate to the ground (Figure 5).This test site, intends to address the problem of very dense tree canopy, with small openings of ground.
For the area, 476 reference points were collected using Trimble GPS measurements in RTK mode, with estimated accuracy of 2cm horizontally and 3cm vertically.The vertical accuracy might be more, considering that the pole's pointed tip can penetrate the dead ground by at least 2-3cm.These points were collected to produce a vector plot of the archaeological area and create a rough DTM of the area of interest.Therefore, 316 of them were measured on clear ground (without vegetation) and 160 under trees or in shrubs (under vegetation).A separate set of 23 presignalised ground control points was collected, using RTK GPS.   4) a colour RGB and a NIR orthophoto (Figure 5&6) has been created, with 0.20 m ground pixel size, covering 268x192 m.An equivalent DSM with similar ground pixel size was also exported.Special care was taken so that all outputs cover the exact same area, hence each pixel covers the exact same ground area.In this way, all following processing was made in raster context.This site covers overall a 2.5 km2 (Figure 7), while the survey was contacted as part of HawkEye research project, funded by Electric Authority of Cyprus.Area coverage consists mainly of low and medium height shrubs as well as shrubbery, and some tree concentrations (Figure 8).The middle area was mapped using both colour and colour near infrared photos, with the same UAV and cameras as test site A. The flight planning was executed twice, once with the normal colour camera and another one with the colour near infrared camera.Both flights were executed the same day, one right after the other.Thirteen presignalized ground control points were collected using Leica GPS in RTK configuration.Additionally, 137 points were collected as reference points (Figure 10) 77 ground points and 60 within vegetation.The RTK points for Vasiliko area were measured also with horizontal and vertical accuracy of 2cm and 3cm respectively.
The two flights were processed independently of each other.Main results are showed in Table 1.Although, DSM was created from both blocks, the one created from RGB photos was used for further processing, filtering and creation of orthophotos.The orthophotos and DSM exported were with 0.25 m ground pixel size, covering 729x690 m area.

RESULTS
As mentioned above, after selecting an optimal threshold of 0.09, the masking for the two sites was done and the valid vegetation was identified and then eliminated from the DSMs (Figure 11).The threshold was selected as such because other values caused the classification of valid terrain points as vegetation or the not classification of vegetation in some areas.Following that an interpolation was applied to fill in the gaps of the masked DSMs.
Regarding the Kimisala test site a cubic interpolation was applied curtesy of the larger amount of eliminated information in the DSM where for the Vasiliko test site a linear interpolation was applied.Due to the large excluded area in the initial mask, a morphological filtering was performed, where each output pixel contained the median value in a 5x5 neighbourhood around the corresponding pixel in the initial mask.In this process, a pixel in the original image is considered 1 only if all the pixels in the neighbourhood are 1, otherwise it becomes zero.The outcome of this process is that, all the pixels near boundary will be discarded depending upon the size of neighbourhood to decrease the thickness or size of the foreground object (Mordvintsev, 2017).The idea is to erode the boundaries of foreground objects which in this case are the boundaries of the excluded areas, classified as vegetation, to maximize the available terrain.

RGB-A NIR-A Com
As is showed in  Afterwards to evaluate the accuracy of the whole process, a large number of reference points gathered on points under vegetation and on the terrain.Those points then were used in order to apply a statistic analysis (Table 2) and indicate the overall accuracy of the derived DTMs in contrast with the DSMs.A rough analysis for the expected elevation precision, should be made beforehand.
The height precision of a point from an impeccable stereopair is given by, where σΖ , the accuracy of elevation H , the flying height above the object c , the principal distance B , the base of the stereopair σp , the precision of measuring x-parallax With H=87,35m, c=4.3mm, base across strips B=28,34m, for the Kimisala test site.Image matching techniques may have subpixel accuracy on well signalized points, nevertheless on random points one pixel is a more realistic value,   = √2 * 1,55 = 2.2.Using these values, the estimated precision of a random point in the point cloud is 0.14m, while for the Vasiliko flight it is 0.28m.By employing MVS algorithms, it is expected that either the precision of a single point or the overall noise of the point cloud will be improved.To evaluate the precision, one must consider the bundle adjustment errors as well.The average of bundle adjustment RMS in Z, for the control points in the blocks is about 0.04m and 0,07m respectively for each test site (Table 1), while the random error in any 'random' point is expected at least twice as big (Kraus, 1993).Theoretically the expected precision for a block can be calculated by applying error propagation (Eq.3), using an estimation of bundle adjustment height error.These figures represent a rough estimation of elevation precision for random points in the stereopair.
As derived from Table 2, for reference points on ground, there is no significant change for both test sites.The mean differences, RMS errors and standard deviations remained unchanged.The RMS error as well as the standard deviation were kept in a range of 1-1.2 *GSD of orthophotos.That is still relatively large considering the original GSD of the captured photos.Overall DSM accuracy is a function of several factors, including triangulation, interpolation and raster generation (Skarlatos and Vlachos, 2015).
Regarding the reference points on vegetation, there is significant improvement providing a good indication on how useful the applied methodology might be.The Std. Dev and RMS values are reduced from about 1.855m and 2.051m in site A and 1.638 and 2.137m in site B to about 0.405m and 0.524m respectively which is approximately 1.5 times the RMS and standard deviation values for the points measured on the ground.Still those numbers cannot be considered as optimal.They only provide an indication on how well the applied algorithms managed to eliminate vegetation and describe terrain beneath it.Additionally, regarding the Kimisala scenario, the maximum differences between the reference points and the DTM is increased from 0.456m to 1.304m instead of being reduced.That is possibly due to misclassification of some points in the exclusion mask as ground or due to gross errors that the points carry from the measurement phase.This is an indication of how crucial the selection of the proper threshold for the NDVI calculations as well as the interpolation method on the remaining terrain are, for the DTM generation according to the area morphology.

CONCLUSIONS
Small pixel size and large-scale photographs are an asset for small bundle adjustment residuals, but doesn't improve overall DSM accuracy.This may be attributed to the fact that higher flights avoid motion blur, while have small angle of intersection in between canopy.DSM accuracy is depended in point precision well as point density and noise of the original point cloud.
Although the evaluation of point accuracy is easy, interpolation and prediction of the overall DSM accuracy is not a trivial task.
Qualitative and quantitative metrics, agree that by incorporating NIR imagery to eliminate vegetation from a DSM, great gain in the derived DTM's accuracy can be expected.Hence, this method may produce DTMs that describe the terrain well enough to act as an alternative option, when the use of LIDAR is not affordable, especially for small area projects.The quality of results when applying this processing pipeline is a product of various factors like the morphology of the test site, the density and the area covered by vegetation, the threshold used in NDVI calculation and the interpolation method that is being used to fill in the gaps in the masked DSM.Therefore, this cannot be a universal application proposal, since several parameters should be set manually, using an ad hoc approach.More tests in different sites and radiometrically calibrated cameras should be used, for the methodology to become standardized.
UAVs, the combination of RGB and NIR or mini MS cameras as well as the SfM-MVS pipeline, offer a competitive alternative to LIDAR data and radar satellites for DTM generation, in many applications, especially for small-medium areas, where flying a full-scale plane is not efficient.Main limitations are limited range, further hindered by flight regulations and limited payload.Further improvements may be expected with combination of low and high flights and simultaneous image capture by integrating RGB and NIR cameras in the same UAV platforms.
The proposed methodology can be easily adopted for large digital aerial cameras, taking into advantage the multispectral information recorded by default.All modern large and medium format digital aerial cameras, such as DMC, ADS and Ultracam families from Intergraph, Leica and Microsoft respectively, are recording four channels of the spectrum, hence this information could be easily adopted to extract DTM from DSM.

Figure 1 :
Figure 1: A simplified diagram of the proposed methodology.In this implementation the aerial triangulation (bundle adjustment) was performed in a joined block of NIR and colour photos.

Figure 3 :
Figure 3: Kimisala site from ground photos Many UAV flights took place over the area for educational reasons.Two flight data were selected for the specific work.Both flights were performed with SwingletCam UAV, but on different days.The first one with normal Canon IXUS 220HS camera and an average flight height of 78m, while the second one with a modified near infrared Canon PowerShot ELPH 300HS was contacted at a height of 100m.Both cameras were provided by SensFly, the UAV manufacturer.

Figure 4 :
Figure 4: Kimisala site DSM and ground control pointsFor the processing of the data set, PhotoScan from Agisoft has been used.A combined aerial triangulation took place because the pre-signalized targets were only visible in the RGB low height flight.Unfortunately, the targets were washed out because of rain at the previous night before the near infrared flight.The combined automated aerial triangulation using 442 (266 RGB and 176 NIR) photos and 416312 tie points ended with residuals were 0.015, 0.019, 0.036m at X,Y,Z respectively, with an overall 0.928 pixel error.From these block, only the RGB photos were used to extract a dense point cloud with an average of 113 points per square m.Using the DSM (Figure4) a colour RGB and a NIR orthophoto (Figure5&6) has been created, with 0.20 m ground pixel size, covering 268x192 m.An equivalent DSM with similar ground pixel size was also exported.Special care was taken so that all outputs cover the exact same area, hence each pixel covers the exact same ground area.In this way, all following processing was made in raster context.

Figure 8 :
Figure 8: Vasiliko Test site from ground Photos

Figure 11 :
Figure 11: Initial mask with NDVI threshold 0.09 (left) and final mask after morphological filtering (right) for Kimisala (top) and Vasiliko (bottom) sites Figures 12 & 13 the algorithm derived the DTMs managing to eliminate the vegetation as well as interpolating the missing gaps.Those facts are also indicated by the contour lines which differ on the initial DSMs and the final DTMs.

Figure 12 :
Figure 12: Overlaid Contour lines on the initial DSM (top left), mask (bottom left), 1m contour lines overlaid on the derived DTM (top right) and nDSM (bottom right); Kimisala test site

Table 1 :
RGB and NIR block processing results of both sites

Table 2 :
Statistics of errors on reference points, collected during the UAV expedition in the 2 test sites, Reference points are separated in ground points and vegetation points accordingly to their classification based on the NDVI masks created.