FOREST COVER MAPPING AND Pinus SPECIES CLASSIFICATION USING VERY HIGH-RESOLUTION SATELLITE IMAGES AND RANDOM FOREST

Advances in remote sensing technologies are generating new perspectives concerning the role of and methods used for National Forestry Inventories (NFIs). The increase in computation capabilities over the last several decades and the development of new statistical techniques have allowed for the automation of forest resource map generation through image analysis techniques and machine learning algorithms. The availability of large-scale data and the high temporal resolution that satellite platforms provide mean that it is possible to obtain updated information about forest resources at the stand level, thus increasing the quality of the spatial information. However, photointerpretation of satellite and aerial images is still the most common way that remote sensing information is used for NFIs or forest management purposes. This study describes a methodology that automatically maps the main forest covers in Galicia (Eucalyptus spp., conifers and broadleaves) using Worldview-2 and the random forest classifier. Furthermore, the method also evaluates the separate mapping of the three most abundant Pinus tree species in Galicia (Pinus pinaster, Pinus radiata and Pinus sylvestris). According to the results, Worldview-2 multispectral images allow for the efficient differentiation between the main forest classes that are present in Galicia with a very high degree of accuracy (91%) and ample spatial detail. Pinus species can also be efficiently differentiated (83%).


INTRODUCTION
National Forestry Inventories (NFI) have evolved throughout history, from simply gathering data related to forest stock resources to including a register of the location of the forest resources in the form of forest maps (McRoberts and Tomppo, 2007). The development of remote sensing technologies has been crucial in this process and many countries consider remote sensing data, especially aerial photography and satellite images, essential for their NFIs (Barrett et al., 2016). Aerial photography is the most commonly used remote sensing tool and it involves photointerpretation whereby the distribution of forest resources is manually delineated (Barrett et al., 2016). Satellite images have been used to perform photointerpretation as well (Barrett et al., 2016). However, the increase in computation capabilities in recent decades and the development of new statistical techniques have allowed for the automation of forest resource maps generation through image analysis techniques and machinelearning algorithms (McRoberts and Tomppo, 2007;Kangas et al., 2018;Wulder et al. 2018). In fact, countries that use satellite images for their NFIs commonly use image analysis techniques (Barrett et al., 2016). Furthermore, using satellite images have other important advantages: the availability of large-scale data and high temporal resolution (Kangas et al., 2018). These aspects are crucial for obtaining updated information about available resources and for acquiring stand-oriented information, thus increasing the quality of the NFIs' spatial information (Kangas et al., 2018). The analysis of the use of remote sensing in NFIs performed by Barrett et al. (2016) reveals a worldwide shift from the use of aerial images to using satellite imagery in NFI programs.
In the case of Spain, the geospatial information on forest resources is provided by the Spanish Forest Map (MFE, Mapa * Corresponding author Forestal Español) (MAPA, 2011). It is a map of the entire Spanish surface area at a 1:25000 scale, and it has been produced mainly by photointerpretation of aerial images. It provides information related to forest cover in plots of land greater than or equal to 1 ha. The map is updated every 10 years. The last version was published in 2011 and was produced by photointerpretation of high resolution ortophotography dating from 2009. These specifications may be satisfactory enough for the forestry sector in most regions of Spain. However, the peculiarities of this sector in Galicia, a region in the Northwest of Spain, mean that the minimum mapping unit and the frequency of update may not be adequate for forest management purposes.
Galicia produces more than the 50% of the total annual volume of timber fellings in Spain (MITERD, 2020). It is the region of Spain where the forestry sector is most active. The species that are most harvested in Galicia are Eucalyptus spp. (mainly Eucalyptus globulus and Eucalyptus nitens) (Xunta de Galicia, 2016). They are fast-growing species with rotations between 12-15 years (Tolosana et al., 2017;Arenas et al., 2019). These are followed by conifers, mainly Pinus pinaster, Pinus sylvestris and Pinus radiata (Xunta de Galicia, 2016). Another peculiarity is that Galician forest land is highly fragmented. According to official cadastral information, it is estimated that 162.188 ha are in cadastral parcels that are smaller than 0.5 ha (Gobierno de España, 2011); this accounts for approximately 40% of the land covered by the main productive tree species in Galicia. Therefore, in order to generate reliable information about forest stands for decision-making purposes in this region, an increased frequency of update and a more-detailed map scale would be desirable. As mentioned, these two issues could be improved using high resolution satellite imagery. In fact, Gómez et al. (2019) have already pointed out that the inclusion of satellite imagery in Spanish forest monitoring processes represents a great advancement and an opportunity to fill existing gaps in forestryrelated information.
The satellite images that are most commonly used for NFIs or forest covers maps are open-access since they are free (Barrett et al., 2016;Kangas et al., 2018). Sentinel-2 and Landsat-8 are commonly used given their high spectral resolution. However, their spatial resolutions (20 m and 30 m respectively) may be insufficient for identifying tree species on a parcel level. Commercial satellites such as Worldview 1 and 2 and Quickbird have also been used for NFIs (Barret et al., 2016). The main advantage of these satellites is that they provide higher spatial resolutions which may allow for a better definition of land cover boundaries. However, according to scientific literature a high spectral resolution is also essential since the separability between tree species increases when spectral channels are added (Fassnacht et al. 2016;Pinheiro-Ferreira et al. 2016;Immitzer et al. 2012). The satellites that provide the highest spectral resolutions at the greatest pixel size are Worldview-2 and Worldview-3 (Satellite Imaging corporation, 2021). Some studies have evaluated the use of Worldview in forest species mapping with successful results. Some of them included tree species that are present in Galicia. For example, Inmitzer et al. (2012) discriminate Pinus sylvestris from 10 other different tree species using Worldview-2 images obtaining an overall accuracy of over 80 %. Another successful example is the work performed by Peerbhay et al. (2014) who distinguished Eucalyptus nitens from among 5 other species using Worldview 2 data obtaining accuracy indexes of above 80%. However, lower accuracies were obtained by Ozkan et al. (2020) when testing the possibility of discriminating between Pinus pinaster and 9 other tree species including 3 other pine species. In fact, Fang et al. (2020) have reported a large decline in accuracy when attempting to classify species from within the same genus.
This study describes a methodology used to map the principal forest covers in Galicia (Eucalyptus spp., coniferous and broadleaves) using Worldview-2. Furthermore, it also evaluates the separate mapping of the three coniferous tree species that are most abundant in Galicia (Pinus pinaster, Pinus radiata and Pinus sylvestris).

Study area
Our study area is located in Galicia (A region in Northwestern Spain, see Figure 1), where 69% of the land is covered by forestry surface (Xunta de Galicia, 2016). This is one of the Spanish regions where the forestry sector is most active (MITERD, 2020). The main productive species are Eucalyptus globulus, Eucalyptus nitens, Pinus pinaster, Pinus radiata and Pinus sylvestris (Xunta de Galicia, 2016).
This study was carried out in a pilot area that was selected to ensure the presence of the five main tree species harvested in Galicia. The official cartography (the MFE) was used to check their distribution in Galicia. It is presented in Figure 2. The selected area is shown in Figure 1. It covers a total area of 237.7 km 2 . The selected area has flat areas as well as areas with an incline according to the slope information obtained through the Spanish Cartographical institute (IGNa, 2021). Altitudes range from between 100 m and 700 m above sea level (IGNa, 2021).

Image data acquisition and processing
A Worldview-2 image was used. Worldview-2 is a satellite that was launched in 2009. It incorporates a panchromatic band and 8 multiespectral bands. In this study only the multispectral bands were used. Their spectral characteristics are presented in Table 1 The image used was provided by Digital Globe with atmospheric, geometric and radiometric corrections. The image reference system is WGS89. The image used in this study dates from 12-07-2019. A general view of the Worldview-2 image that was obtained is shown in Figure 3. Details of the different tree covers in the Worldview-2 images are shown in Figures 4 and 5.

Methodology
The methodology consisted of performing two supervised classifications using the image obtained from Worldview-2. The first one was intended to classify the image into general land cover classes (e.g., coniferous forest, Eucalyptus spp., other broadleaves, shrubs, crops), while the second one was designed to identify the three species of conifers present in the study area (Pinus pinaster, Pinus radiata, Pinus sylvestris). This second step was performed on the area previously classified as coniferous forest in the first step. A diagram of the methodology is shown on Figure 6. In order to perform the general classification, the first step was to select the land cover target classes. The selected classes are described in Table 2.
Class Description Eucalyptus spp.
Area covered by eucalyptus trees with adult leaves. Young Eucalyptus spp.
Area covered by eucalyptus trees, with young leaves, either at an early stage of development or due to new outgrowth.

Conifers
Area covered by coniferous trees.

Broadleaves
Area covered by deciduous trees..

Crops and pastures
Area covered by non-woody vegetation.

Shrubs
Area covered by non-tree, woody vegetation. Cuts or shrub clearings Area subjected to recent timber-cut or shrub clearing. Anthropogenic areas Artificial surfaces (buildings, roads, etc). Water Bodies of water. Table 2. Target classes used in the general classification.
Once the classes were identified, it was necessary to obtain reference data to use as training areas for the supervised classification. The reference data used were obtained through photointerpretation. Photointerpretation was performed on an aerial orthorectified image (PNOA, National Aerial Orthophotography Plan, for its initials in Spanish) obtained from the Spanish cartographical institute (IGN) (IGNb, 2021). The PNOA image was obtained through a photogrammetric flight performed in 2017. It had a pixel size of 0.25 m and a georeferencing square quadratic error in x, y of a maximum of 0.5 m (RMSEx,y <= 0.5 m). First a set of polygons was manually delineated to use the inner pixels as training areas. Once the training areas were obtained the supervised classification was performed using the random forest algorithm (Breiman, 2001). It has been shown to be a stable classifier that is easy to use and provides accurate results (Maxwell et al., 2018;Pelletier et al. 2016). It was applied using the R library Randomforest (Liaw and Wiener, 2002). The classification was performed using all available bands.
Once the classification was executed, a cross-verification was performed. The Producer's Accuracy (PA), User's Accuracy (UA) and Overall Accuracy (OA) where estimated.
• PA: the result of dividing the number of correctly classified reference points in each category by the total number of reference points in that category. This value corresponds to the map accuracy from the point of view of the map maker. It represents how often real features on the ground are correctly shown on the classified map or the probability that a given land cover of an area on the ground is correctly classified as such on the map. • UA: computed by dividing the number of correctly classified pixels in each category by the total number of pixels classified in that category. This value represents the reliability of the map or the probability that a pixel classified into a given category actually represents that category on the ground. The conifer-oriented classification started with the masking of the Worldview-2 image. The area classified as coniferous forest in the previous classification was used to generate a new raster layer containing the spectral information of the pixels covered by coniferous forest. This time, the target classes for the classification were Pinus pinaster, Pinus radiata and Pinus sylvestris. In this case the training was accomplished by combining photointerpretation and field work. The field work was planned according to information taken from PNOA images (IGNb, 2021) and the MFE. The sample plots were defined as georeferenced vector points in a Geographical Information System. They were placed in areas that, according to PNOA images (IGNb, 2021) and MFE, contained the target species in single-species, mature stands. The sample plots for each species were selected to ensure the inclusion of different topographical orientations and different slopes in order to collect all of the possible spectral variability for a given species due to varying topographical conditions. In each sample plot, the field work consisted of observing stand parameters in a circular area with a 10 m radius. The parameters were species composition, tree health indicators, canopy cover, age and height of the stand and parametrs regarding the under-canopy cover (shrubs, needles, grass, rocks, etc.).
Once the field work was performed, the data was put into digital format and reviewed. Sample plots were discarded in cases where they corresponded to non-homogeneous stands in terms of tree species, or when the stand characteristics differed significantly from those expected (e.g. leaf bleaching); the remaining plots were used to define training polygons. The training polygons were defined in the neighbouring areas of the sample plots. Then, the supervised classification was performed using the R software Randomforest algorithm (Liaw and Wiener, 2002).
The obtained classification was cross verified using points recorded in field work. The position of these points in the field was recorded with a GPS with centimetric precision. In addition to identifying the tree species present in each sampling plot or at each point, different variables (such as tree height, health status or canopy cover) were recorded as background information to aid in interpreting the results.
Due to the high speckle effect observed in the classification result, once the pine classification was obtained an additional filtering step was included. The filtering consisted of removing the raster polygons that were smaller than a given threshold size (in pixels) and replacing them with the pixel value of the largest neighbour polygon. The chosen threshold was an 8neighbourhood window. The cross-verification process was then performed again. The OA, PA and UA were estimated.

RESULTS
The general classification of the Worldview-2 for the study area was obtained. The cross-verification was performed with a set of 250 points. The distribution was random but stratified, 30 points per class, except for Water where only 10 points were allocated due to the small area covered by water in the study area. Table 4 shows the results obtained from the cross-verification. It should be noted that a very high OA was obtained, 91 %. High UA and PA (above 90%) were also obtained for the three main forestry classes: Eucalyptus spp., conifers, and broadleaves. It was even observed that young Eucalyptus spp. areas can be identified with a high accuracy rate (PA 100% and UA 88%). Classes that presented lower precision levels were the classes that corresponded with shrubs and clear-cuts or clearings. A detailed view of the results obtained, along with the information given by the MFE for the area, can be seen in Figure 7.   To accomplish the classification of Pinus species in the study area, 86 sample plots were observed through field work. Table 3 contains the number of sample plots collected for each species, the number of points that were used for training and the number of training pixels.  Table 3. Field work description. Sample plots: Number of points of each species sampled in field work. Selected sampling points: Number of points for which it was possible to obtain homogeneous training data. Training pixels: Number of pixels used as training data.
The resulting classification was visually reviewed. A high scattering effect was observed, so the classification was filtered by applying the 8-neighbourhood window. Figure 8 shows an example of the classification obtained for Pinus species without this filter and after applying the filter. A total of 77 points were randomly distributed within the classified area, 16 for pixels classified as Pinus pinaster, 35 for pixels classified as Pinus sylvestris and 26 for pixels classified as Pinus radiata. Pinus sylvestris is the least-sampled as it is the least common species in the study area.
The cross verification of the classification with and without filtering reveals that higher accuracy levels are obtained when the filter is applied. The OA is 26% higher in the filtered classification and all of the accuracy metrics have values above 70% while in the non-filtered classification only the PA of Pinus radiata exceed this threshold. Table 9 shows the results of the cross-verification performed for the non-filtered pine-species classification, while Table 10 shows the results obtained for the filtered pine-species classification. When the filter is applied to the classification, the species with the highest accuracy metrics is Pinus radiata (PA 0.93 and UA 0.89). It was observed that all the Pinus pinaster sites that were wrongly classified as Pinus radiata had heights below 10 m. According to the filtered classification there is a total of 122 ha of Pinus sylvestris, 185 ha of Pinus pinaster and 754 ha of Pinus radiata.  Table 10. Confusion matrix obtained for the filtered classification. OA (Overall Accuracy), PA (Producer's Accuracy) and UA (User's Accuracy). Figure 9 displays a comparison of the spatial information retrieved from the MFE (MAPA, 2011) for several pine stands, and the spatial information and detail given for the pine classification performed in this study. Figure 9. Comparison of the information given in a polygon from the Spanish Forest Map, MFE, (MAPA, 2011) and the information given for the same area by the classification performed in this study. Reference image Worldview-2 image in False color (7,5,2).

DISCUSSION
According to the presented results, Worldview-2 multispectral images allow for the efficient differentiation of the main forest classes that are present in Galicia with a very high level of accuracy and high spatial detail. In the study case, the results show that the 44% of the area is covered by forest tree species. These classes are important given that they may be the subject of different management policies. The tree species are mapped into two classes of productive species, eucalyptus and conifers, and another class composed of broadleaves. The productive species have in common that they are considered pyrophyte vegetation according to current Galician legislation (Ley 3/2007, de 9 de abril, de prevención y defensa contra los incendios forestales de Galicia), so they are subject to control for wildfire prevention purposes. Besides, both productive classes are commonly harvested in small parcels. However, they differ in their rotation cycles. Eucalyptus are harvested with short rotation periods, involving very high ratios of land cover changes. The high resolution of the final map obtained with the proposed methodology is essential for forest land cover mapping and harvesting monitoring purposes. Further, the obtained map also allow for the differentiation among shrubs and crops, that involve the 12.6% and 24.6% respectively of the total area under study. If the map is updated periodically, the discrimination between shrubs and crops might make it possible to detect land abandonment trends, which an important topic for forest management in Galicia (Ley 6/2011, de 13 de octubre, de movilidad de tierras).
On a species level, it is clear that a generalization process is needed since the speckle effect leads to low accuracy results. However, the solution presented here, applying a post-processing filter is not the only option. According to the literature there are other solutions that can reduce this effect and lead to higher accuracy levels in the classification. The main two solutions detailed in the literature are including texture variables in the classification (Deur et al., 2020) and changing from a pixel-based approach to an object-based one (Chuang et al., 2016;Kaszta et al., 2016). In future studies it would be interesting to test these solutions.
The filtered classification allows for the visualization of potential Worldview-2 images that could be used to discern between tree species within the Pinus genus. Accuracy levels are lower than when discerning between general classes, but Fang et al. have already reported that attempting to classify species within the same genus with Worldview images presents a challenge (Fang et al., 2020). Taking this into account, it seems appropriate to continue with the approach of first performing a general classification to ensure the correct delineation of the main classes and then going on to species classification, in fact there are other studies where taking this approach has proven to be successful (Immitzer et al., 2019).
Although accuracy metrics are presented and commented upon, it should be noted that only a small area covered by coniferous forest (1,608 ha according to the general classification) is being analyzed. It would be interesting to confirm the obtained results in a larger area. Regarding the Pinus pinaster results, it was observed that all the points wrongly classified as Pinus radiata are trees with a height of less than 10 m, this could indicate that intra-species variations due to the stage of development are affecting the classification and that in future studies it might make sense to address them as separate classes. In fact, Kukunda et al. have already reported problems classifying pine species using Worldview data due to intra species variations (Kukunda et al., 2018). The fact that some Pinus pinaster stands at an early stage of development are classified as Pinus radiata could be due to the imbalance that exists between classes and that is also reflected in the reference data (13.294 pixels were used for Pinus radiata and only 6.543 for Pinus pinaster). This imbalance is due to the small area covered by pine trees in the study area.

CONCLUSION
The described methodology shows the potential of very high resolution imagery for forest land cover classification, being especially useful in regions characterized by a high level of land property fragmentation. The training areas were obtained by photointerpretation, and the supervised classification was performed through the random forest algorithm. The validation process proved to be overall very accurate, with a PA of 91%, and a UA of over 90% for tree forest classes. Considering these results, Worldview-2 data might be considered an important tool for forest management purposes, since it can provide useful information about land abandonment and for wildfire prevention, forest harvesting validation, and land cover changes, among others.
Furthermore, this study shows the capabilities of Worldview-2 images for Pinus species classification. The random forest algorithm combined with the post-processing filter allowed Pinus sylvestris, Pinus pinaster and Pinus radiata to be mapped separately, with an overall accuracy of 83%. These results should serve as a basis for further studies aimed at evaluating Worldview-2 as a source of data for NFIs.