MAPPING URBAN TREES WITHIN CADASTRAL PARCELS USING AN OBJECT-BASED CONVOLUTIONAL NEURAL NETWORK

Urban trees offer significant benefits for improving the sustainability and liveability of cities, but monitoring them is a major challenge for urban planners. Remote-sensing based technologies can effectively detect, monitor and quantify urban tree coverage as an alternative to field-based measurements. Automatic extraction of urban land cover features with high accuracy is a challenging task, and it demands artificial intelligence workflows for efficiency and thematic quality. In this context, the objective of this research is to map urban tree coverage per cadastral parcel of Sandy Bay, Hobart from very high-resolution aerial orthophoto and LiDAR data using an object-based Convolutional Neural Network (CNN) approach. Instead of manually preparing the large number of training samples required, automatically classified object-based image analysis (OBIA) output is used as input samples to train the CNN. The CNN output is then further refined and segmented using OBIA to assess the accuracy. The results show 93.2% overall accuracy for the refined CNN classification. Similarly, the overlay of the improved CNN output with the cadastral parcel layer shows that 21.5% of the study area is covered by trees. This research demonstrates that the accuracy of image classification can be improved by using a combination of OBIA and CNN methods. Such a combined method can be used where manual preparation of training samples for CNN is not preferred. Our results also indicate that the technique can be implemented to calculate parcel-level statistics for urban tree coverage that provide meaningful metrics to guide urban planning and land management practices.

The use of Geographic Object Based Image Analysis (GEOBIA) for image classification and feature extraction has been increasing due to the introduction of user-friendly GEOBIA software packages such as eCognition, Orfeo Toolbox, Imagic, Spring etc. These GEOBIA software packages allow users to make their own rule sets based on the study area, available dataset and research objectives. The GEOBIA method considers the texture, shape, colour, size and relationship between contiguous pixels along with the spectral properties of an individual pixel (Benz et al., 2004; Blaschke, 2010). The basic steps behind image classification using GEOBIA follow an iterative process of segmentation and classification (Blaschke, 2010). This method overcomes a significant limitation of the pixel-wise method in high-resolution image classification (Addink et al., 2013).
The GEOBIA method can give better accuracies than the pixel-wise method during image classification, especially for very high-resolution images. However, gaps remain in this approach before it meets the required level of accuracy.
Selection of the scale parameter for image segmentation is a major challenge wherever over-segmentation and under-segmentation are likely to appear within the same image (Ming et al., 2015). Scale parameter selection and optimisation have also recently attracted the attention of researchers (Belgiu, Drăguţ, 2016; Drǎguţ et al., 2010). Selection of scale requires optimising a high number of free parameters and domain-specific knowledge (Jin et al., 2019).
Land cover classification with GEOBIA in urban areas can be challenging due to the high diversity of land cover objects. For example, the roofs of various buildings might be made of different materials on the one hand, whereas different features such as roads and buildings might have similar characteristics on the other. In addition, GEOBIA has to deal with occlusion and shadows (Ehlers et al., 2003), which break image objects into finer objects and hence reduce the accuracy of the classification result. Extracting urban land cover features with high accuracy in an automated way is a challenging task, and it demands artificial intelligence workflows for efficiency and thematic quality.
The Convolutional Neural Network (CNN) is one of the most rapidly adopted deep learning neural network algorithms and is mainly designed for image classification (Fu et al., 2018; Zhang et al., 2016; Zhu et al., 2017). Kunihiko Fukushima first proposed the CNN in 1988 (Fukushima, 1988); it became popular after the release of AlexNet in 2012 (Alom et al., 2018) and with the implementation platform of Google TensorFlow. The CNN is a supervised deep learning neural network which uses labelled data. It works with a combination of an input layer, hidden layers with hidden units, and an output layer. The hidden units are like neurons that are fully connected with each individual neuron from the previous layer. An image is given to the input layer as a multidimensional input, which is passed through a series of hidden layers to produce the output. The overall architecture of a CNN can be divided into two main parts: feature extraction and classification (Alom et al., 2018). The feature extraction or feature learning part consists of convolutional and pooling layers, whereas the classification part consists of a fully connected layer (Figure 1). The larger the training dataset, the better the performance of the CNN (Fu et al., 2018).
Because training a CNN requires a large number of samples, it is not easy to prepare training data manually. On the other hand, it is difficult to generate highly accurate training samples using automatic feature detection methods, and the classification obtained from automatic feature classification methods may not be as accurate as manually prepared samples. In this research, we classify the image using OBIA, train the CNN with the automatically classified OBIA output, and assess and compare the accuracy of the output to understand whether this can improve the accuracy of the CNN. Further, to filter the noise that might have been introduced by erroneous training samples, we apply a refinement algorithm to the output of the CNN trained with automated OBIA output and test whether it can further improve the accuracy.
This method is tested in an urban environment to assess its performance in detecting urban trees. The classified output of urban trees is further overlaid with the cadastral parcel layer of the study area to generate parcel-level statistics. These metrics can be meaningful to guide urban planning and land management practices. The urban tree density map of cadastral parcels will have research as well as policy impacts. Further research on ecological abundance, foraging of birds and habitat mapping will benefit from the density map produced in this research. In terms of policy, the output from this research will inform urban planners and cadastral surveyors in their planning of urban suburbia.
The paper is organised as follows: Section 2 presents the location of the study area, the datasets used and the adopted methodology. Section 3 presents the results with maps, charts and tables. Section 4 discusses the results, and Section 5 presents conclusions and future work.

Study Area
The study area is a part of the Sandy Bay suburb of Hobart (42.904549ºS, 147.328536ºE), which is in the south-east region of Tasmania, Australia. The study area is an urban residential zone and has an area of 4.168 square kilometres. It consists of residential properties with different types of vegetation species including Acacia, Allocasuarina, and Eucalyptus.
Figure 2. Location of the study area, which is located within Sandy Bay, Hobart (42.904549ºS, 147.328536ºE) and has an area of 4.168 km².

Datasets
This research uses different types of datasets including very high-resolution (0.15 metres) orthophotos, LiDAR point clouds and a cadastral layer. The orthophotos of the study area were captured by airborne sensors in 2015. One of the orthophotos had red, green and blue (RGB) bands whereas the other had an additional near-infrared (NIR) band. The LiDAR point clouds were captured in 2011 with 1-metre average point separation and have a spatial accuracy of 0.15 metres (vertical) and 0.30 metres (horizontal). The cadastral parcel data were obtained from the Land Information System Tasmania (TheLIST, 2015) open data portal.

Generation of Canopy Height Model (CHM):
The CHM is generated by subtracting the Digital Elevation Model (DEM) from the Digital Surface Model (DSM) of vegetation classes, both of which are prepared using classified LiDAR point cloud data. The CHM thus generated is used as a height threshold while preparing ground truthing samples and during classification refinement of the heatmap obtained from the CNN algorithm.
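The subtraction described above can be sketched as follows. This is a minimal illustration on small nested lists, assuming the DSM and DEM have already been rasterised onto the same grid; the function name and the sample elevations are hypothetical and are not taken from the study, where the rasters were derived from the classified LiDAR point cloud.

```python
def canopy_height_model(dsm, dem):
    """Return CHM = DSM - DEM, cell by cell, for two equally sized grids."""
    if len(dsm) != len(dem) or any(len(a) != len(b) for a, b in zip(dsm, dem)):
        raise ValueError("DSM and DEM must have the same shape")
    return [
        [surface - ground for surface, ground in zip(dsm_row, dem_row)]
        for dsm_row, dem_row in zip(dsm, dem)
    ]


# Hypothetical 2 x 3 elevation grids in metres (not data from the study).
dsm = [[12.0, 15.5, 10.0], [11.0, 18.0, 10.5]]
dem = [[10.0, 10.0, 10.0], [10.5, 10.5, 10.5]]
chm = canopy_height_model(dsm, dem)
```

Each CHM cell is then the height of the vegetation surface above the ground, which is the quantity compared against the height threshold later in the workflow.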

Generation of Normalised Difference Vegetation Index (NDVI):
The NDVI image is created from the mean values of the red and near-infrared bands using the following band combination ratio expression:

$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}}$$
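As a minimal sketch, the standard NDVI ratio, (NIR - Red) / (NIR + Red), can be computed per cell as below. The function names and the sample band values are hypothetical, and returning 0 for cells where both bands are zero is an assumption made here to avoid division by zero, not a rule stated in the study.

```python
def ndvi(nir, red):
    """Standard NDVI for one cell: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: cells with no signal are given NDVI = 0
    return (nir - red) / total


def ndvi_grid(nir_band, red_band):
    """Apply the NDVI ratio cell by cell to two equally sized band grids."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical 2 x 2 band values (not data from the study).
nir_band = [[200, 60], [0, 90]]
red_band = [[50, 60], [0, 110]]
ndvi_image = ndvi_grid(nir_band, red_band)
```

Values close to 1 indicate dense green vegetation, while values near or below 0 indicate non-vegetated surfaces.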

Preparation of Ground Truthing Dataset
The ground truthing datasets for the trees and grass classes are prepared using the CHM and NDVI images, and are then used to generate samples in the CNN workflow.
The image is segmented based on the context, geometry and texture properties of trees and grass using the multiresolution segmentation algorithm with the pixel-level domain in eCognition software.
The segmented objects are classified into the trees and grass classes by thresholding the CHM and NDVI values, assuming that the NDVI value of trees and grass is more than 0.1 and that the height of trees is greater than or equal to 1.5 metres. The representative validation data (ground truth) for the trees and grass classes are generated from the whole study area.
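The threshold rule described above can be sketched as a small decision function. This is an illustration only: it assumes that each segmented object is summarised by its mean NDVI and mean CHM height, and the "other" label for non-vegetated objects is a hypothetical name introduced here. The actual rule set was implemented in eCognition.

```python
NDVI_MIN = 0.1          # vegetation if NDVI is more than 0.1
TREE_HEIGHT_MIN = 1.5   # trees if height is at least 1.5 metres


def classify_object(mean_ndvi, mean_height):
    """Label an image object as 'trees', 'grass' or 'other' (assumed label)."""
    if mean_ndvi <= NDVI_MIN:
        return "other"
    if mean_height >= TREE_HEIGHT_MIN:
        return "trees"
    return "grass"


# Hypothetical objects given as (mean NDVI, mean height in metres).
labels = [classify_object(n, h) for n, h in [(0.45, 6.2), (0.30, 0.2), (0.05, 8.0)]]
```

The third object illustrates why both thresholds are needed: a tall object with low NDVI, such as a building, is excluded even though it exceeds the height threshold.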

CNN Training and Classification
The overall analysis was done on a computer system with a 64-bit operating system, 16 GB RAM and an Intel(R) Core(TM) i7-7700 CPU @ 3.60 GHz processor. The CNN workflow of Trimble's eCognition Developer 9.4 software (Trimble eCognition software, 2019) was applied for tree extraction (Figure 3). This CNN workflow is based on the Google TensorFlow API.

Generate Labelled Sample Patches for CNN Model:
Labelled sample patches are created by considering different parameters including sample count, sample patch size and image layers. In this research, 8000 sample patches are generated for each class. The optimum sample patch size is determined to be 22 × 22 pixels by trial and error. Patch sizes smaller than the optimum introduced multiple canopy detection errors, whereas larger patch sizes could not detect the smaller trees.

Create CNN Model:
The CNN model is created with one hidden layer. The input image size is set to the same size as in sample generation. The hidden layer is defined by the kernel size, the number of feature maps and max pooling. Because even-sized kernels generate hidden units located between pixels, which are then shifted to match pixel borders, an odd-sized kernel (13 × 13) is assigned with 40 feature maps. Max pooling using a 2 × 2 filter with a stride of two in both the horizontal and vertical directions is applied to reduce the resolution of the feature maps. Thus, the hidden layer kernel has 4 × 13 × 13 × 40 weights. The first factor (4) represents the number of image layers, and the second and third factors (13 × 13) describe the number of units in the local neighbourhood from which connections are forwarded into the hidden layer. The final factor (40) represents the number of feature maps generated. Therefore, 40 different kernels of size 4 × 13 × 13 are trained in this network.
The only hidden layer of this network thus contains 27,040 different trainable weights.
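The weight count above can be checked with a short calculation. The sketch below also derives the feature-map size before and after pooling for a 22 × 22 input patch; that part assumes a convolution without padding and with a stride of one, which the text does not state explicitly. Bias terms are not counted, in line with the figure quoted above. The function name is hypothetical.

```python
def conv_layer_summary(patch_size, kernel_size, n_layers, n_maps, pool_size, pool_stride):
    """Return (kernel weights, conv output size, pooled output size)."""
    # One kernel per feature map, each spanning all image layers.
    weights = n_layers * kernel_size * kernel_size * n_maps
    # Assumption: no padding and stride 1, so the map shrinks by kernel_size - 1.
    conv_out = patch_size - kernel_size + 1
    # Max pooling with the given window and stride.
    pooled_out = (conv_out - pool_size) // pool_stride + 1
    return weights, conv_out, pooled_out


weights, conv_out, pooled_out = conv_layer_summary(
    patch_size=22, kernel_size=13, n_layers=4, n_maps=40, pool_size=2, pool_stride=2
)
```

Under these assumptions, each 22 × 22 patch yields 40 feature maps of 10 × 10 units, which the 2 × 2 max pooling reduces to 5 × 5.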

Train CNN Model:
The model is trained on the labelled sample patches, and the model weights are adjusted using backpropagation. A learning rate of 0.0015 is assigned by trial and error. This parameter defines the amount by which weights are adjusted in each iteration of the statistical gradient descent optimization (Trimble eCognition software, 2019). A higher learning rate speeds up training, but the bottom of the optimal minimum may not be reached. A smaller learning rate slows down training, and the optimisation may become stuck in a local minimum and end up with weights that are not even close to the optimal settings (Trimble eCognition software, 2019). A total of 5000 training steps are set in such a way that each training step uses 50 training samples.
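The effect of the learning rate can be illustrated with a toy example. The sketch below is not the CNN training itself: it runs plain gradient descent on a hypothetical one-parameter quadratic loss, which has a single minimum and therefore shows only slow progress and overshooting, not local minima. The pass count assumes two classes (trees and grass) with 8000 patches each.

```python
def gradient_descent(learning_rate, steps, start=0.0, target=3.0):
    """Minimise the toy loss (w - target)**2 by plain gradient descent."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * (w - target)
        w -= learning_rate * gradient
    return w


# Training budget from the text: 5000 steps of 50 samples each.
samples_drawn = 5000 * 50
# Assumption: two classes with 8000 patches each, so 16,000 patches in total.
passes_over_samples = samples_drawn / (2 * 8000)

moderate = gradient_descent(0.0015, 5000)   # rate used in the study
tiny = gradient_descent(0.000001, 5000)     # hypothetical, far too small
too_large = gradient_descent(1.1, 50)       # hypothetical, far too large
```

On this toy loss, the moderate rate ends very close to the minimum at 3.0, the tiny rate has barely moved from its starting value after the same number of steps, and the overly large rate moves further from the minimum with every step.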

Apply CNN Model:
A heatmap of the tree class is produced by applying the trained CNN model to the input 4-band image. This map shows the likelihood of trees as a probability value. The map is smoothed using a 7 × 7 Gaussian filter, and the local maxima of the smoothed tree heatmap are generated using a morphology (dilate) filter of 3 × 3 pixels. A threshold value of 0.3 is set for the local maxima to delineate trees.
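The local-maximum step can be sketched as follows, assuming the heatmap has already been smoothed. A cell is treated as a peak when it equals the result of a 3 × 3 dilation at its position and exceeds the 0.3 threshold. The function names and the sample heatmap are hypothetical, and the eCognition filters may differ in detail, for example in how image borders are handled.

```python
def dilate_3x3(grid):
    """Grey-scale dilation: each cell becomes the maximum of its 3 x 3 window."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            out[r][c] = max(
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            )
    return out


def tree_peaks(heatmap, threshold=0.3):
    """Return (row, col) of cells that are local maxima above the threshold."""
    dilated = dilate_3x3(heatmap)
    return [
        (r, c)
        for r, row in enumerate(heatmap)
        for c, value in enumerate(row)
        if value == dilated[r][c] and value > threshold
    ]


# Hypothetical smoothed heatmap with two likely tree positions.
heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.2, 0.1],
    [0.1, 0.2, 0.1, 0.2],
    [0.0, 0.1, 0.2, 0.6],
]
peaks = tree_peaks(heatmap)
```

Cells on a flat plateau would all be reported as peaks by this simple comparison, which is one reason smoothing is applied beforehand.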

Classification Refinement
The heatmap obtained from the CNN is segmented using the multiresolution segmentation algorithm to classify trees and grass. A height threshold of 1.5 metres from the CHM and an NDVI threshold of 0.1 are applied to refine the classification. The segmented tree objects are further refined using the assign merge function, pixel-wise object resizing, and the remove object algorithm in eCognition software. Tree objects sharing a border with neighbouring trees are merged. Growing and shrinking modes with surface tension threshold and box size are applied in succession in the pixel-wise object resizing algorithm to refine the shape of tree segments. A pixel-count threshold is used to eliminate smaller non-tree segments.

Mapping Per-Parcel Tree Coverage
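The classified tree layer is overlaid with the cadastral parcel layer, and the area and percentage of tree coverage per parcel are calculated. The percentage of tree coverage for each cadastral parcel is calculated as:

$$\text{Tree coverage (\%)} = \frac{\text{Area of tree cover within the parcel}}{\text{Total area of the parcel}} \times 100$$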
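A minimal sketch of the per-parcel calculation is given below. It assumes that both the parcel layer and the tree layer have been rasterised onto the same grid of equal-area cells, so that the ratio of cell counts equals the ratio of areas. The study itself used a GIS overlay of the two layers; the function name, parcel identifiers and masks here are hypothetical.

```python
from collections import Counter


def tree_cover_per_parcel(parcel_ids, tree_mask):
    """Return {parcel_id: percentage of the parcel's cells flagged as tree}."""
    total = Counter()
    trees = Counter()
    for id_row, mask_row in zip(parcel_ids, tree_mask):
        for parcel, is_tree in zip(id_row, mask_row):
            total[parcel] += 1
            if is_tree:
                trees[parcel] += 1
    return {parcel: 100.0 * trees[parcel] / total[parcel] for parcel in total}


# Hypothetical 3 x 4 grids: parcel identifier and tree flag for each cell.
parcel_ids = [
    ["A", "A", "B", "B"],
    ["A", "A", "B", "B"],
    ["C", "C", "C", "C"],
]
tree_mask = [
    [1, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
cover = tree_cover_per_parcel(parcel_ids, tree_mask)
```

In this example parcel A is one quarter covered, parcel B is fully covered and parcel C has no trees.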

Accuracy Assessment
The accuracy assessment of the classification outcome is based on the confusion matrix generated from manually digitised test data. The accuracy is assessed for three different methods of image classification: 1) object-based image analysis (OBIA), 2) convolutional neural network (CNN) and 3) segmentation of the refined CNN outcome. The refinement of the CNN outcome is performed using the pixel-wise object resizing (growing and shrinking) algorithm after applying a minimum tree size threshold (pixel area > 4.5 square metres).
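Overall accuracy and the kappa coefficient can both be derived from a confusion matrix, as sketched below using the standard definitions. The 2 × 2 matrix is a hypothetical example and does not reproduce the counts from this study.

```python
def overall_accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix."""
    n = sum(sum(row) for row in matrix)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    # Chance agreement from the row and column totals.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical counts: rows are reference classes, columns are predicted classes.
matrix = [[45, 5], [5, 45]]
accuracy, kappa = overall_accuracy_and_kappa(matrix)
```

Kappa is lower than the overall accuracy because it discounts the agreement that would be expected by chance given the class totals.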

CNN Workflow Output
The output of the CNN workflow is a heatmap representing the probability of tree presence in the test region (Figures 4 and 5). The probability value in the heatmap ranges from 0 to 1, where 0 is the lowest and 1 the highest likelihood. In the heatmap, red indicates the highest likelihood of a tree being present, whereas blue indicates the lowest (Figure 6).

Classification Refinement
The output of segmenting the CNN heatmap using the multiresolution segmentation algorithm is presented in Figure 7. The shape of the merged tree class segments is further refined using the pixel-wise object resizing algorithm (Figure 8).

Classification Accuracy Assessment
The result shows that the refined CNN method gives the best overall accuracy of 93.2% with a kappa coefficient of 0.85 (Figure 9). The CNN method is second, with an overall accuracy of 92.3% and a kappa coefficient of 0.83. The OBIA method gives an overall accuracy of 90.6% with a kappa coefficient of 0.80.

Classification results and visual assessment
A final per-parcel urban tree coverage map of the study area was produced by overlaying the cadastral parcel layer with the classification results of the refined CNN outcome. The result shows that 21.5% of the study area was covered with trees.
For better visualisation, Figure 10 provides an overlay of the cadastral layer with the urban tree layer and Figure 11 shows a classified per-parcel cadastral map of the percentage of tree coverage. The cadastral parcels are classified into five classes depending on the per-parcel percentage of tree area coverage: very high (>=90%), high (60%-90%), medium (30%-60%), low (0%-30%) and none (0%). The tree coverage percentage result shows that two thirds of the parcel areas are covered by low density of trees.
Table 1 shows that the 2168 parcels with a low density of trees (0%-30%) account for the highest share of the total area (75.8%), whereas very high tree coverage (>90%) occurs in 35 parcels that sum to 1.0% of the total area. There are 514 parcels with a medium density of trees (30-60%), which together represent 17.6% of the total area. The 88 parcels with a high density of trees (60-90%) account for only 4.6% of the total area. The remaining 887 parcels, representing 1.0% of the total area, have no trees.
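The five coverage classes can be expressed as a simple binning function. The treatment of values that fall exactly on a class boundary (30%, 60% and 90%) is an assumption made for this sketch, since the class ranges above share their end points; the function name is hypothetical.

```python
def coverage_class(percent):
    """Map a per-parcel tree coverage percentage to one of five classes."""
    if percent <= 0:
        return "none"
    if percent < 30:     # assumption: 30% itself falls in "medium"
        return "low"
    if percent < 60:     # assumption: 60% itself falls in "high"
        return "medium"
    if percent < 90:     # assumption: 90% itself falls in "very high"
        return "high"
    return "very high"


# Hypothetical coverage percentages for five parcels.
classes = [coverage_class(p) for p in [0.0, 12.5, 30.0, 75.0, 95.0]]
```

Applying such a function to the coverage of every parcel and counting the results per class gives parcel-level summaries of the kind reported in Table 1.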

The land tenure type of the parcels with tree coverage (Table 2) shows that authority land has only 89 parcels but covers 29.9% of the total study area. This cadastral type has 30.5% tree coverage, which accounts for 39% of the total tree coverage in the study area. Similarly, there are 3707 private parcels covering 56.4% of the total study area with 18.8% tree coverage, which is half of the overall tree coverage.

The results indicate that the overall accuracy of the refined CNN is better than that of the CNN method alone, even though it is computed using automatically generated training samples (Table 3). Hence, this method can provide an alternative way to achieve improved accuracy in feature classification using automated OBIA output samples for training the CNN.

Tree Cover and Cadastral Types
The overlay of the improved CNN output with the cadastral parcel layer shows that 21.5% of the study area is covered by trees, which is more than the urban tree coverage of many Australian cities including Melbourne (11% in 2012) and Sydney (15.5% in 2013) (City of Melbourne, 2012; City of Sydney, 2013). The private parcels, which cover 56.4% of the study area, have 18.8% tree coverage, which represents half of the overall tree coverage in the study area. The authority land, however, covers 26.9% of the total study area but accounts for nearly 38.7% of the total tree coverage (Figure 12). This means that the land owned, vested or managed by Commonwealth, State or Local Government authorities has the highest proportion of tree coverage. Having more urban tree coverage means that the study area gains wider social, aesthetic, climatic, ecological and economic benefits from urban forest and trees. The tree coverage also contributes to a better quality of living environment, for example by improving air quality and consequently the health of urban residents.

Limitation of this Study
The main limitation of this research is the time difference between the orthophoto (2015) and the LiDAR dataset (2011). This could have introduced error because the analysis uses the CHM generated from the LiDAR dataset to identify trees. Trees that were cleared between the acquisition of the LiDAR data (2011) and the orthophoto (2015) may not have been classified as trees. Conversely, trees planted after the acquisition of the LiDAR data that were taller than two metres at the time of the orthophoto acquisition might not have been classified as trees. Hence, the result may be affected by trees that were planted, removed, or changed in shape and texture between the two acquisitions.

CONCLUSIONS AND FUTURE WORK
The outcome of this research has two key contributions: first, the use of automatically generated training samples to train the CNN model; second, the application of a combined CNN and OBIA method to map urban trees per cadastral parcel. In this context, this research demonstrates that the accuracy of image classification can be improved by using a combination of OBIA and CNN methods. This spatial analysis can be used for multiple purposes including land management, urban planning and cadastral survey.
This research uses a simple CNN model with a single hidden layer. In future research, multiple hidden layers with a change in parameters can be applied and tested. Similarly, deeper CNN methods including Region-based CNN (R-CNN) and Fully-connected CNN (F-CNN) can be further tested for urban tree coverage mapping and tree species identification.

Figure 1. Overall architecture of Convolution Neural Network (CNN) (Trimble eCognition software, 2019).
Deep learning approaches, if integrated with OBIA, may improve the overall accuracy of image classification. Hence, this research assesses the accuracy of image classification for OBIA, CNN and refined CNN (OBIA segmentation of CNN output) methods.

Figure 4. Before and after selecting the test region a) subset of the study area with ground truth of trees and grass b) selecting the test region within the subset.

Figure 5. Removing ground truthing of trees and grass from the test region.

Figure 6. a) Original RGB image of test region b) Probability heatmap of tree presence produced by the CNN, with values between 0 and 1 (blue to red, respectively) c) Smoothed heatmap

Figure 7. Refinement of the classification result from CNN a) Segmentation of the local maxima heatmap result b) Segments with local maxima value >0.3, NDVI value >0.1 and CHM value >2 assigned to the trees class, which is represented by pink colour c) Merged trees class.

Figure 8. Resizing the classified objects, a) Image showing segments before merging b) Image showing segments after merging c) Image showing result of resizing of merged segments using pixel-wise object resizing (growing and shrinking) algorithm.

Figure 9. Overall accuracy and kappa coefficient of classification by OBIA, CNN and refined CNN.

Figure 10. An overlay of the urban trees layer generated from the object-based CNN method over the cadastral parcel layer within the study area.

Figure 11. Classified per-parcel urban tree coverage map of the study area.
This research demonstrates that the accuracy of image classification can be improved by using a combination of OBIA and CNN methods. Training the CNN with automatically classified OBIA output of 90.6% overall accuracy (kappa coefficient 0.8) improved the classification accuracy to 92.3% (kappa coefficient 0.83). Applying the refinement algorithm to the CNN output further improves the overall accuracy to 93.2% (kappa coefficient 0.85).

Table 1. Parcel-level statistics in different tree coverage classes.

Table 2. Parcel-level statistics in different cadastral types.

Table 3. Summary of results from studies related to OBIA and CNN for vegetation analysis.