SPATIAL DATA QUALITY EVALUATION FOR LAND COVER CLASSIFICATION APPROACHES

Abstract: Data gaps and poor data quality can lead to flawed conclusions and misguided data-driven policies and decisions, such as in the measurement of progress towards the Sustainable Development Goals. This is particularly important for land cover data, an essential source of information for a wide range of applications and real-world challenges, including climate change mitigation, food security planning, and resource allocation and mobilization. While global land cover datasets are available, their usability is limited by their coarse spatial and temporal resolutions. Furthermore, a good understanding of their fitness for purpose is imperative. This paper compares two datasets from a spatial data quality perspective: (1) a global land cover map, and (2) a fit-for-purpose training dataset generated through visual inspection of very high-resolution satellite data. The latter dataset is created using Google Earth Engine (GEE), a cloud-based computing platform and data repository. We systematically evaluate the two datasets from a spatial data quality (SDQ) perspective, using the Analytic Hierarchy Process (AHP) to prioritise the SDQ criteria. To validate the results, land cover classifications are conducted using both datasets, also within GEE. Based on the results of the SDQ evaluation and land cover classification, we find that the second training dataset significantly outperforms the global land cover map. Our study also shows that cloud-based computing platforms and publicly available data repositories can provide an effective approach to filling land cover data gaps in data-scarce regions.


INTRODUCTION
While progress has been made towards the 2030 Sustainable Development Goals (SDG), according to the latest report (UN, 2019), renewed commitment is required in order to meet the targets within a decade. The report emphasizes that the lack of timely, global data across critical targets and indicators has been a hindrance to accurately monitoring progress. Such limitations tend to be concentrated in low- and middle-income countries, where development needs are highest (Phiri et al., 2019). For example, a study on SDG advancement in Africa found that over 60% of the SDG indicators lack sufficient information to properly track progress (Jerving, 2019). Furthermore, only about one-third of the data used for SDG measurement are provided by individual countries (Jerving, 2019). According to the "Government Statistical Capacity" score, an indicator provided by the World Bank, the average score of the Middle East & North Africa region is 59.9/100 (World Bank, 2018). As a result, estimates and outdated information are heavily relied upon for policy development and resource allocation (Saah et al., 2019).
"Without data, we drive blind - policies are misdirected and progress on the road to development is stunted. We must all act urgently to close the 'data gap,' if indeed we aim to leave no one behind." - Mo Ibrahim, chair of the Mo Ibrahim Foundation (Jerving, 2019)

Given the pervasiveness of the data shortage and the immense effort that would be required to address the full scope of indicators, prioritisation of the most important ones is crucial. Land cover, for example, is considered an Essential Climate Variable (ECV) because land cover data is used for a wide range of purposes, including land use planning, natural resource management, climate change mitigation, and food security planning (Sessa and Dolman, 2008; Saah et al., 2019).
Many developing countries suffer from a lack of technical skills, institutional capacity, data, and computational infrastructure to produce timely, high-quality, fit-for-purpose land cover maps. Global land cover datasets tend to be used instead, although their usability is often limited by their spatial and temporal resolutions, classification schemes, and global and regional inconsistencies (Fritz and See, 2008; Verburg et al., 2011; Saah et al., 2019). For example, a comparison of nine different land cover datasets found only 2.5% full agreement for cropland in Africa (Pérez-Hoyos et al., 2017). New and continuous advances in cloud computing and data storage have enabled the development of tools that can potentially address many of the aforementioned issues. The ease of analysis and access to data provided by cloud-based computing and data repository services, such as Google Earth Engine (GEE), enables the on-demand generation of classified land cover maps and facilitates the measurement of development indicators. For example, GEE is being used to monitor forest degradation in response to SDG indicator 15.1, regarding the conservation and sustainable use of forests and other related land covers (Mondal et al., 2020).
Considering the urgent need to fill the data gap in developing countries, we explore two approaches to generating land cover maps using land cover classification, public datasets, and free cloud computing services. Specifically, we evaluate two different training datasets: (1) global land cover maps, and (2) a fit-for-purpose training dataset that is generated using visual inspection of very high-resolution satellite data.
training data and validation information. A second advantage is that the dataset was built specifically for the study area. However, a disadvantage of this dataset is that fewer classes can be produced from visual interpretation.

Definition of Spatial Data Quality Criteria
Spatial Data Quality can have a direct impact on many decision-making processes, e.g. choosing among datasets for specific applications or services. Although different organisations may identify SDQ with different characteristics and criteria (Devillers et al., 2007), there is a general agreement on the 'famous five' internal criteria, namely positional accuracy, thematic accuracy, temporal accuracy, logical consistency, and completeness (Guptill and Morrison, 2013). Usability, an external factor that considers the perspective of the end-user or application, is also commonly used (Bruin et al., 2001).
This paper considers the six aforementioned criteria, which were defined in the context of our research objective of on-demand, fit-for-purpose land cover classification. For example, spatial resolution can be considered the closest equivalent of positional accuracy, while thematic accuracy is defined as how accurately the correct land cover class is assigned. See Table 2 for further detail.

Criteria              Definition
Positional Accuracy   Spatial resolution
Thematic Accuracy     Classification accuracy
Temporal Accuracy     Temporal resolution
Logical Consistency   Consistency across the dataset (e.g. composite imagery might have two adjacent pixels that represent different months)
Usability             Relevant classes

Table 2. Definitions of spatial data quality criteria

Analytic Hierarchy Process
Analytic Hierarchy Process (AHP) is a Multi-Criteria Decision Making (MCDM) method that derives ratio scales from paired comparisons between (qualitative and quantitative) criteria and factors (Saaty, 1988). AHP is a powerful, flexible, and simple tool that allows the decision-making process to include almost any kind of criterion, and it can be applied to many real-world decision-making problems; one of its most widely used applications is selecting among competing alternatives in a multi-objective environment. It is based on the well-defined mathematical structure of consistent matrices and the ability of their associated right eigenvectors to generate true or approximate weights. To do so, our AHP methodology compares both the spatial data quality criteria and the alternative datasets in a pairwise manner (Saaty, 1988). AHP converts preferences into ratio-scale weights that are combined into linear additive weights for the associated alternatives. The result of AHP is the importance of each SDQ criterion for the purpose of land cover classification. AHP also produces an inconsistency ratio reflecting the inconsistency of the pairwise comparisons.
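The AHP mechanics described above can be sketched in a few lines: the priority weights are the principal right eigenvector of the pairwise comparison matrix, and the consistency ratio follows from the principal eigenvalue. The 3-criterion matrix below is a toy example, not the paper's actual comparison values.

```python
# Minimal AHP sketch: derive priority weights from a pairwise comparison
# matrix via power iteration, then compute Saaty's consistency ratio.
# The example matrix is illustrative only.

def ahp_weights(matrix, iterations=100):
    """Approximate the principal right eigenvector (normalised to sum to 1)."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        # Multiply the matrix by the current weight vector, then normalise.
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    # Estimate the principal eigenvalue lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam = sum(aw[i] / w[i] for i in range(n)) / n
    return w, lam

def consistency_ratio(matrix):
    """CR = CI / RI, using Saaty's random-index values for n = 1..6."""
    RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24}
    n = len(matrix)
    w, lam = ahp_weights(matrix)
    ci = (lam - n) / (n - 1) if n > 1 else 0.0
    return w, (ci / RI[n] if RI[n] else 0.0)

# Toy 3-criterion comparison (values on Saaty's 1-9 scale, reciprocal below
# the diagonal). A CR under 0.10 is conventionally considered acceptable.
A = [[1, 2, 3],
     [1/2, 1, 2],
     [1/3, 1/2, 1]]
weights, cr = consistency_ratio(A)
```

For this matrix the first criterion receives the largest weight and the consistency ratio is well under the conventional 0.10 threshold, mirroring the 7% ratio reported for our prioritisation in Table 3.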

SDQ Criteria Prioritisation
Considering that our research goal is to generate on-demand, fit-for-purpose land cover maps, we conduct a criteria prioritisation using the AHP method, i.e. a pairwise comparison of the importance of the criteria for the purpose of land cover map generation. For example, the ratio between the spatial (or temporal) resolutions of the two datasets is the normalised ratio of the two values. For some of the more qualitative criteria, such as logical consistency, experts were asked to make the pairwise comparison.

Consistency Ratio: 7%
Table 3. SDQ criteria prioritisation using AHP

Evaluation Pathway
In the following section, we evaluate each of the training datasets against the SDQ criteria, as defined in Table 2, using the AHP-generated criteria weightings, as provided in Table 3. The training dataset with the highest score is considered the best fit for our needs.
We then conduct land cover classifications using both training datasets to evaluate whether the results of our SDQ evaluation are valid. This pairwise comparison need not be based solely on expert opinion, as most of the criteria have measurable, quantitative values. For example, it is possible to compare spatial resolution based on the input data pixel size.
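The evaluation pathway above reduces to a linear additive score: each dataset's overall score is the sum of its per-criterion score multiplied by the AHP criterion weight. The weights and scores below are hypothetical placeholders, not the actual values from Table 3.

```python
# Weighted-sum SDQ evaluation sketch. All weights and per-criterion scores
# here are illustrative assumptions; the paper's real values differ.

criteria_weights = {          # hypothetical AHP weights (must sum to 1)
    "completeness": 0.30,
    "thematic_accuracy": 0.25,
    "positional_accuracy": 0.20,
    "temporal_accuracy": 0.15,
    "logical_consistency": 0.05,
    "usability": 0.05,
}

scores = {                    # hypothetical per-criterion scores in [0, 1]
    "VISD":  {"completeness": 0.5, "thematic_accuracy": 0.5,
              "positional_accuracy": 0.9, "temporal_accuracy": 0.9,
              "logical_consistency": 0.8, "usability": 0.6},
    "MODIS": {"completeness": 0.5, "thematic_accuracy": 0.5,
              "positional_accuracy": 0.1, "temporal_accuracy": 0.1,
              "logical_consistency": 0.2, "usability": 0.4},
}

def overall(dataset):
    """Linear additive score: sum of criterion weight times dataset score."""
    return sum(criteria_weights[c] * scores[dataset][c]
               for c in criteria_weights)

ratio = overall("VISD") / overall("MODIS")  # a ratio > 1 favours VISD
```

With these illustrative numbers the two datasets tie on the highest-weighted criteria (completeness and thematic accuracy), so the final ratio is driven by the positional and temporal accuracy scores, which is exactly the pattern reported in the evaluation below.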

SDQ Evaluation of Training Datasets
The results of the SDQ evaluation indicate that the VISD dataset is 6.12 times more suitable for our research goals. Although completeness and thematic accuracy were the highest-weighted criteria, we scored the two datasets almost equally on both. The key differences were in positional accuracy and temporal accuracy. The spatial resolution of the VISD dataset, which is vector data, is significantly higher than the 500m spatial resolution of the MODIS dataset. Similarly, the MODIS dataset has a temporal resolution of one composite per year, while the VISD dataset can be generated for any week of the year (if not more frequently), considering the weekly revisit times of public optical imagery (e.g. Sentinel-2). Finally, in terms of logical consistency, VISD also scores higher because the MODIS land cover data product is based on yearly composite imagery, which means a pixel may represent a different month than an adjacent pixel. This can be controlled in the VISD data if single-day high-resolution imagery is used.
In summary, the VISD dataset seems to be the best fit for our land cover classification. However, in the following subsections, we validate our result by conducting land cover classification using both datasets.

Land Cover Classification
We use GEE to conduct the land cover classifications for several reasons. It is free for the research, education, and non-profit sectors, which may ensure that users in developing countries can access the service. GEE uses parallel processing, which allows users to run complex algorithms on large satellite datasets, a critical challenge in traditional computational settings. An online integrated development environment (IDE) enables users to run algorithms and visualise results on a map display in the web browser, removing the need to download software and set up a local working environment.
GEE's public data archive, which includes data from key programs such as Landsat, MODIS, and Sentinel, extends back more than 40 years, with new images ingested on a daily basis. Any dataset can be quickly viewed and imported into the working environment without needing to download files, addressing broadband concerns. Crucially, this also allows users to analyse large datasets - defined in a spatial sense, as entire countries or continents, or in a temporal sense, as in multiple-decade datasets with weekly imagery.
While present-day data gaps are likely most urgent, land cover change over time is also important to study. For this reason, we use Landsat data for the classification due to its extensive temporal range, starting in 1972, and its present-day continuity.
We initially compare three classification methods, namely random forests (RF), support vector machines (SVM), and Classification and Regression Tree (CART) classifiers. They tend to be the most commonly used land cover classification methods, with many studies using only RF (Azzari and Lobell, 2017; Huang et al., 2017; Goldblatt et al., 2018; Hassan et al., 2018; Hermosilla et al., 2018; Naegeli de Torres et al., 2019; Zhou et al., 2020), while others use RF in addition to other methods, such as SVM and CART classifiers (Johansen et al., 2015; Goldblatt et al., 2016; Shelestov et al., 2017; Xiong et al., 2017). We find that RF produced the best results in terms of overall accuracy. Figures 2 and 3 show the results of the land cover classifications, using RF and both training datasets. Note that while an effort was made to match the classification colour schemes of the two maps, MODIS has more land cover classes, hence more colour shades.

Discussion
We find that both training datasets successfully produced land cover maps at 30m spatial resolution (the resolution of the Landsat images) for the study area.
The overall accuracies of the classifications were 0.88 for the VISD training data and 0.747 for the MODIS dataset. However, this measures the accuracy of each classification against a random sample held out from its own training dataset.
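For readers unfamiliar with the metric, overall accuracy is simply the trace of the confusion matrix (correctly classified validation points) divided by its total. The 3-class matrix below is a toy example; the actual confusion matrices for our classifications are not reproduced here.

```python
# Overall accuracy from a confusion matrix: correct classifications (the
# diagonal) divided by the total number of validation samples.
# The matrix below is illustrative, not from the paper's experiments.

def overall_accuracy(confusion):
    """confusion[i][j]: samples of reference class i predicted as class j."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

# Rows = reference (training/validation) class, columns = predicted class.
cm = [[50, 3, 2],
      [4, 45, 1],
      [2, 5, 48]]
acc = overall_accuracy(cm)  # fraction of samples on the diagonal
```

Because the held-out sample is drawn from the training data itself, a high overall accuracy indicates internal consistency of the classification rather than agreement with independent ground truth, which is why the comparison against near-ground-truth data below is still needed.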
Arguably, given the absence of in-situ observations, the VISD dataset can be considered near-ground-truth data, given that it was built specifically for the purpose of land cover classification of the study area. To ease juxtaposition, we plot the proportions of the land cover classes in Figure 4.
The MODIS land cover classes are more numerous than the VISD land cover classes. To facilitate assessment, we combine several of the MODIS land cover classes. Considering that the VISD results are the closest to ground truth, we find that the MODIS map overestimates non-forest vegetation and crop, and underestimates water/wetland, built-up areas, and forest cover. The inconsistency of the MODIS dataset is well documented (Fritz and See, 2008; Fritz et al., 2011; Verburg et al., 2011). For example, a study found that MODIS overestimated forest cover in Colombia by approximately 50% (Fritz and See, 2008).
In summary, the land cover classification results confirm the SDQ evaluation results. Both indicate that creating a specific-to-study dataset using high-resolution imagery is better than using global land cover datasets, considering the goal of generating high-quality, timely, fit-for-purpose land cover datasets in data-scarce environments.

CONCLUSIONS
This paper aims to (1) provide a systematic evaluation and a better understanding of the pros and cons of using the two datasets, from a spatial data quality perspective, for the purpose of land cover classification, and (2) conduct land cover classifications for regions that lack appropriate and sufficient data and evaluate the results of this mathematical, pairwise-based process.
We address the first research goal by defining the SDQ criteria in the context of our study, prioritising the criteria using the AHP methodology, and finally evaluating the two datasets according to the weighted criteria. The results of the evaluation suggest that the VISD approach is more appropriate for our objectives from a spatial data quality perspective. We then validate these findings by addressing the second research goal: conducting land cover classifications using both datasets and comparing the results.
Our research shows that cloud-based computing platforms, such as GEE, and public remote sensing datasets, including Landsat, Sentinel, and MODIS, can be used to fill critical data gaps in developing countries, which would facilitate SDG measurement and, ideally, fulfilment.
We are planning to progress this research further to build a user-friendly tool that can conduct land cover classification for any area of interest. Our overall goal is that users in developing or developed countries would be able to use the tool to generate data according to their needs.
In terms of training data, we are considering existing and to-be-developed crowdsourced datasets. This is a growing body of literature, especially in spatial data, to which we intend to contribute.
Finally, while measuring land cover is integral to many applications, we aim to use the results of the tool to study land cover change, both from a historical perspective and for more recent change detection.