THE EFFICIENCY OF RANDOM FOREST METHOD FOR SHORELINE EXTRACTION FROM LANDSAT-8 AND GOKTURK-2 IMAGERIES

Coastal monitoring plays a vital role in environmental planning and hazard management. Since shorelines are fundamental data for environmental management, disaster management, coastal erosion studies, and the modelling of sediment transport and coastal morphodynamics, various techniques have been developed to extract them. Random Forest, the technique used in this study for shoreline extraction, is a machine learning method based on decision trees. Decision trees analyse the classes of the training data and create rules for classification. The Terkos region has been chosen as the test site for the proposed method within the scope of the TUBITAK project (Project No: 115Y718) titled "Integration of Unmanned Aerial Vehicles for Sustainable Coastal Zone Monitoring Model – Three-Dimensional Automatic Coastline Extraction and Analysis: Istanbul-Terkos Example". The Random Forest algorithm has been implemented to extract the Black Sea shoreline near the lake from LANDSAT-8 and GOKTURK-2 satellite images taken in 2015. The MATLAB environment was used for classification. To obtain the land and water body classes, the Random Forest method has been applied to the NIR bands of the LANDSAT-8 (band 5) and GOKTURK-2 (band 4) images. For accuracy assessment, the shoreline in each image was also digitized manually. According to the accuracy assessment results, the Random Forest method is efficient for shoreline extraction from both medium and high resolution images.


INTRODUCTION
Coastal areas have been among the most important settlement areas throughout human history. Due to population growth and urbanization, shorelines and coastal ecosystems are under threat from human activity (Bendell and Wan, 2011). According to the International Geographical Data Committee, coastal areas are one of the 27 important natural heritage features on Earth (Li et al., 2001). Thus, rapid, up-to-date, and correct information is essential for coastal management. Determining shoreline dynamics is of primary importance for coastal managers, so shoreline extraction is the first step in coastal management. Remote sensing and image processing techniques provide rapid shoreline extraction solutions compared to traditional methods (Bayram et al., 2017). The monitoring of shoreline changes is one of the main concerns of researchers (Dornbusch et al., 2006; Marques, 2006). Therefore, temporal monitoring of shorelines is of primary importance (Gens, 2010). LANDSAT-8 and GOKTURK-2 imageries are open data resources (Machado et al., 2014; Kalkan et al., 2015). Some commonly used shoreline extraction methods are unsupervised classification techniques such as ISODATA (Iterative Self Organized Data Analysis) (Guariglia et al., 2006), the normalized difference water index (NDWI) (Zheng et al., 2011), thresholding and morphological filtering techniques (Pardo Pascual et al., 2012), wavelet transformation (Yu et al., 2013), the active contour method (Shmitt et al., 2015), genetic algorithm based methods (Yousef and Iftekharuddin, 2014), the particle swarm optimization (PSO) method, mean-shift segmentation (Bayram et al., 2016b), object oriented fuzzy classification methods (Bayram et al., 2015; Bayram et al., 2008), and the normalized cut approach (Ding and Li, 2014). In this study, the shoreline of Terkos/Istanbul has been extracted from LANDSAT-8 and GOKTURK-2 imageries using the Random Forest method (Breiman, 2001). The extraction results have been evaluated with DSAS, and the efficiency of the Random Forest method is discussed.

STUDY AREA
The study area consists of a 19 km stretch of the Black Sea shoreline located north of Lake Terkos, Istanbul. The Terkos shoreline is under threat of erosion due to increasing urbanisation. The study area is shown in Figure 1.

MATERIAL AND METHOD
In the presented study, LANDSAT-8 (06.09.2015) and GOKTURK-2 (30.06.2015) images have been used. The 5th band of LANDSAT-8 and the 4th band of GOKTURK-2 have been processed and the shoreline of Terkos/Istanbul has been extracted. The specifications of LANDSAT-8 and GOKTURK-2 are given in Table 1 and Table 2.
The Random Forest classification algorithm is a pixel based machine learning method built on decision trees. By analysing the training data sets, rules are created and object classes are determined. The rules consist of several if-then conditions (Breiman, 2001). The Random Forest algorithm requires two parameters: the number of trees, and the number of random variables considered at each node when the decision trees are built (Belgiu and Drăguţ, 2016).
Once the parameters are set, and if no additional data set is available, 2/3 of the training data set is used as learning data and the remaining 1/3 as test data. In the training step, Random Forest creates multiple CART-like trees (Breiman et al., 1994). Each tree is grown on a bootstrapped sample, and the split at each node is determined by searching a randomly selected subset of the input variables (He et al., 2015; Gislason et al., 2006). The CART algorithm uses the GINI index to determine the best split (Gislason et al., 2006). The GINI index measures the homogeneity of the samples at each node. The algorithm calculates the GINI index for the randomly selected variables at each node. The variable with the minimum GINI index is selected, and the calculations are repeated for the next node. A GINI index of zero means that the node is totally homogeneous, and this node is defined as the end of branching (Gislason et al., 2006). The out-of-bag samples of each tree (the training samples that were not drawn into the bootstrap sample of that tree) are used for cross-validation. For each pixel, a classification vote is calculated according to the weight of each decision tree, and the pixel is assigned to the class with the majority of votes (Gislason et al., 2006).
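As a worked illustration of the split criterion described above, the GINI index can be written in its usual textbook form. The symbols used here (node t, class proportions p_k, child nodes t_L and t_R with n_L and n_R samples out of n) are notation introduced for this sketch and do not appear in the source.

```latex
% Gini index of a node t with class proportions p_k
G(t) = 1 - \sum_{k} p_k^{2}

% Weighted Gini index of a candidate split of n samples into t_L and t_R
G_{\mathrm{split}} = \frac{n_L}{n}\, G(t_L) + \frac{n_R}{n}\, G(t_R)

% Two-class example (water, land):
% a node holding 8 water and 2 land samples gives
G = 1 - (0.8^{2} + 0.2^{2}) = 0.32
% a node holding only water samples gives
G = 1 - 1^{2} = 0
```

The second example corresponds to the totally homogeneous node at which branching stops.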
In this study, the Random Forest algorithm has been implemented on the MATLAB platform. Both the LANDSAT-8 and GOKTURK-2 images have been classified into two classes, land and water body. The TreeBagger function of MATLAB has been used. The number of trees and the number of random variables have been set to 50 and 1, respectively, for both images. The predict function of MATLAB has been used to assign the corresponding class to each image pixel. The segmentation results for LANDSAT-8 and GOKTURK-2 are given in Figure 3 and Figure 4, respectively.
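The workflow above can be sketched in a few lines of Python. This is not the MATLAB TreeBagger implementation used in the study: it is a simplified stand-in that assumes one input variable (a NIR value per pixel), replaces each CART-like tree with a single-split tree (a stump), and uses unweighted majority voting. The function names and the NIR training values are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini index of a list of class labels (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(values, labels):
    """Pick the threshold on one variable that minimises the weighted Gini index."""
    pairs = sorted(zip(values, labels))
    best = None  # (weighted_gini, threshold, left_class, right_class)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or score < best[0]:
            best = (score, threshold,
                    Counter(left).most_common(1)[0][0],
                    Counter(right).most_common(1)[0][0])
    if best is None:  # all sampled values identical: always predict the majority
        major = Counter(labels).most_common(1)[0][0]
        return (float("inf"), major, major)
    return best[1:]


def fit_forest(values, labels, n_trees=50, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample of the training data."""
    rng = random.Random(seed)
    n = len(values)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        forest.append(fit_stump([values[i] for i in idx],
                                [labels[i] for i in idx]))
    return forest


def predict(forest, value):
    """Assign the class that receives the majority of the tree votes."""
    votes = Counter(left if value <= thr else right for thr, left, right in forest)
    return votes.most_common(1)[0][0]


# Hypothetical NIR training pixels: water is dark in NIR, land is bright.
nir = [5, 8, 10, 12, 15, 9, 11, 7, 60, 75, 80, 90, 66, 71, 85, 95]
cls = ["water"] * 8 + ["land"] * 8

forest = fit_forest(nir, cls, n_trees=50)
print(predict(forest, 10), predict(forest, 80))
```

With a single NIR band as input, setting the number of random variables to 1 means that every split is evaluated on that band.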

RESULTS
Vector (shape) data for both images have been obtained by raster to vector conversion. The shoreline extracted from each image has been compared separately with the manually digitized shoreline of that image. For this purpose, the Digital Shoreline Analysis System (DSAS) (Thieler et al., 2009) has been used. DSAS is a plugin for the commercial ArcGIS software that evaluates an extracted shoreline against reference data (Jayson-Quashigah et al., 2013). The Net Shoreline Movement function of DSAS, which calculates perpendicular distances between the input and reference data at a defined spacing, has been used for accuracy assessment (Oyedotun, 2014). The spacing has been set to 5 m for both LANDSAT-8 and GOKTURK-2. The transect lengths have been set to 300 m for LANDSAT-8 and 100 m for GOKTURK-2, as shown in Figures 5 and 6. Red and blue lines represent [...]. The results for LANDSAT-8 and GOKTURK-2 are given in Table 3 and Table 4, respectively.
Table 4. Accuracy assessment results for GOKTURK-2.
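The transect-based comparison can be illustrated with a small Python sketch. It does not call DSAS and is not its implementation. It assumes a straight baseline along the x axis, so that every transect is a vertical line, and it assumes both shorelines are polylines with increasing x coordinates; the transect length limit is ignored. The function names and coordinates are hypothetical.

```python
def interpolate_y(polyline, x):
    """Return y where the vertical transect at x crosses the polyline, else None."""
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        if x1 > x0 and x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    return None


def transect_distances(extracted, reference, spacing=5.0):
    """Distance between two shorelines along transects cast every `spacing` metres."""
    start = max(extracted[0][0], reference[0][0])
    end = min(extracted[-1][0], reference[-1][0])
    distances = []
    k = 0
    while start + k * spacing <= end:
        x = start + k * spacing
        y_ext = interpolate_y(extracted, x)
        y_ref = interpolate_y(reference, x)
        if y_ext is not None and y_ref is not None:
            distances.append(abs(y_ext - y_ref))
        k += 1
    return distances


# Hypothetical shorelines in metres: a straight reference and a bent extraction.
reference = [(0.0, 0.0), (20.0, 0.0)]
extracted = [(0.0, 2.0), (10.0, 4.0), (20.0, 2.0)]

distances = transect_distances(extracted, reference, spacing=5.0)
mean_distance = sum(distances) / len(distances)
print(distances, mean_distance)
```

Averaging the per-transect distances gives a single mean offset per image, which is the kind of figure reported in the accuracy assessment.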

CONCLUSION
The average distances between the manually digitized and the extracted shorelines are 11.327 m for the LANDSAT-8 image and 3.248 m for the GOKTURK-2 image. The calculated distance ratios in terms of pixel size are given in Table 5. For the LANDSAT-8 image, 36.48%, 80.56% and 99.11% of the distances fall within 1/5, 1/2 and 1 pixel size, respectively. In other words, according to Table 5, 80.56% of the distances for the LANDSAT-8 image are within 1/2 pixel size.
Table 5. Calculated distance ratios in pixel size.
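The pixel-size ratios reported above can be computed from the per-transect distances as in the following Python sketch. The distance values are made up for illustration and are not the study's data, and the 30 m pixel size is the nominal LANDSAT-8 multispectral resolution, assumed here because the source text does not state it. The function name is hypothetical.

```python
def ratio_within(distances, pixel_size, fractions=(0.2, 0.5, 1.0)):
    """Percentage of distances that fall within each given fraction of a pixel."""
    n = len(distances)
    return {
        f: 100.0 * sum(d <= f * pixel_size for d in distances) / n
        for f in fractions
    }


# Hypothetical transect distances in metres (not the values from the study).
sample_distances = [2.0, 5.0, 9.0, 14.0, 16.0, 22.0, 28.0, 31.0]

ratios = ratio_within(sample_distances, pixel_size=30.0)
for fraction, percent in ratios.items():
    print(f"within {fraction} pixel: {percent:.2f}%")
```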
The success of the Random Forest method is higher for GOKTURK-2 than for LANDSAT-8. Since Random Forest is a pixel based method, the obtained results are not surprising. As many researchers have mentioned, object-oriented methods can provide more successful results compared to pixel based methods, and this is consistent with the results achieved. In future studies, the Support Vector Machine method and the Random Forest method will be applied to GOKTURK-2 imageries with an increased number of training pixels, and the obtained results will be compared.