DEVELOPMENT OF A DATABASE FOR BENCHMARK DATASETS IN PHOTOGRAMMETRY AND REMOTE SENSING

ABSTRACT: Data are a key component for many applications and methods in the domain of photogrammetry and remote sensing. Especially data-driven approaches such as deep learning rely heavily on available annotated data. The amount of data is increasing significantly every day. However, reference data are not increasing at the same rate, and finding relevant data for a specific domain is still difficult. Thus, it is necessary to make existing reference data as accessible as possible to the scientific community in order to make optimal use of them. In this paper we provide an overview of the development of our photogrammetry and remote sensing specific Benchmark Metadata Database (BeMeDa). BeMeDa is based on MongoDB, a NoSQL database system. In addition, a user-oriented metadata schema was developed to structure the data. BeMeDa enables easy searching of benchmark datasets in the field of photogrammetry and remote sensing.


INTRODUCTION
The evaluation and comparison of newly developed methods and algorithms constitutes an important part of the scientific process. Benchmark datasets enable the development and testing of algorithms and thus make a valuable contribution to improving transparency and traceability (Long et al., 2020). Benchmark datasets, and thereby benchmarking, are used in numerous fields of application, such as computer evaluation (Walters, 1976) or management applications (Išoraitė, 2004). Especially with the advent of deep learning, the importance of both the quality and the quantity of data has increased (Munappy et al., 2019). Therefore, the number of new extensive benchmark datasets is constantly increasing; this also applies to the remote sensing domain (Long et al., 2020).

Photogrammetry and Remote Sensing domain
The popularity of benchmarking in photogrammetry and remote sensing is shown by the analysis of a literature database search by Bakuła et al. (2019). Newly developed methods are usually evaluated on benchmark datasets. In the benchmarking process, methods can be evaluated with these datasets on the basis of their accuracy, sensitivity, effort and transferability, for example by measuring execution time or memory requirements (Walters, 1976).
In order to perform such comparisons, benchmark datasets have to meet certain criteria compared to ordinary datasets (Hall, 2019). One of the key elements of benchmark datasets is the availability of reference data. For remote sensing, for example, this refers to the importance of the quality of annotated data (Long et al., 2020). However, compared to the available data, high-quality reference data are rare. Hence the availability of benchmark datasets is highly dependent on the application domain. For example, with the trend toward autonomous driving, many of the benchmarks in photogrammetry and remote sensing address this environment, such as the well-known KITTI benchmark dataset (Geiger et al., 2012).
The challenge, however, is to find benchmark datasets relevant for one's own research, especially for young researchers (Brickley et al., 2019). In particular, very specific benchmark datasets exist in photogrammetry and remote sensing (e.g. multiplatform photogrammetry (Nex et al., 2015)) which are known only to a small group of experts. Furthermore, there is a trend for such benchmark datasets to be of interest to researchers from other domains (Brickley et al., 2019). The current search options include, but are not limited to, querying scientific literature databases using corresponding keywords such as 'benchmark' and 'remote sensing' (Bakuła et al., 2019; Long et al., 2020). However, this requires that the benchmark dataset is described in a published scientific paper with corresponding keywords that are subsequently added to the literature database (e.g. Scopus). In contrast, the Google Dataset Search makes it possible to find open data with a web crawler, but relies on high-quality metadata (Brickley et al., 2019).
With these types of tools, a targeted and effective search for benchmark datasets in the photogrammetry and remote sensing domain is difficult. Furthermore, the search results have to be filtered manually. The use of general keywords such as 'remote sensing', 'photogrammetry' and 'benchmark' returns results from very different applications (Bakuła et al., 2019). Moreover, not all papers found include open data or fit the desired application. Additionally, metadata are not yet provided by all providers of specific photogrammetry and remote sensing benchmark datasets. Without such metadata, web crawlers are unable to find the desired benchmark datasets.
More domain-specific search tools focus on search options that are thematically relevant. For example, in the field of remote sensing, the EOD platform (Earth Observation Database, 2022) developed by IADF TC 1 allows filtering by sensors, tasks and locations. However, the datasets it includes depend on entries contributed through crowd working.

Related Work
Suitable datasets are necessary for research and benchmarking. One possible way to find them is through search queries. However, such a search is based on keywords and metadata (Chapman et al., 2020). This also applies to the Dataset Search from Google, developed since 2016 (Brickley et al., 2019; Benjelloun et al., 2020). Despite its large growth, the datasets it includes depend on the quality and availability of metadata. With the increasing amount of data, the importance of using unique identifiers for datasets, such as the digital object identifier (DOI), is also increasing (Abe et al., 2014).
Databases of datasets can perform multiple tasks. One of them is to improve the usage of the datasets through a wider distribution or more extensive evaluations (Tohyama et al., 2008). Some of these databases store the acquired data themselves (Quiring et al., 2016). Others, in contrast, store only metadata in order to improve the use and sharing of the data while avoiding conflicts with ownership and permission (Abe et al., 2014; Lofstead et al., 2019; Tohyama et al., 2008). An example of a database related to the geospatial data domain is presented by Abe et al. (2014). To optimize databases for metadata, Lofstead et al. (2019) investigated NoSQL database systems.
For the collection and provision of datasets via metadata, the attributes they contain usually follow a predefined schema. For example, Brickley et al. (2019) use, among others, the dataset type from schema.org (Data and Datasets - schema.org, 2021). Specialized databases use attributes developed for their purpose (Tohyama et al., 2008; Abe et al., 2014).
From the user's point of view, it is not enough just to create the databases and collect the information the datasets contain. Instead, the query functionalities have to be taken into account (Abe et al., 2014; Tohyama et al., 2008). This leads to different query possibilities and an appealing user interface (Tohyama et al., 2008).

Contribution
Our aim is to design and implement a database of benchmark datasets (Benchmark Metadata Database, BeMeDa) for remote sensing and photogrammetry applications. On the one hand, this should considerably simplify the search for suitable benchmarks, and on the other hand, increase the visibility of existing benchmarks in these research fields. In this way, the use of benchmarks can be increased, thus improving comparability. The properties are standardized by defining certain categories. This can be used to achieve a better comparability of different benchmarks and to facilitate the selection of suitable benchmarks. In particular, our database also considers datasets that are not explicitly designated as benchmarks but fulfill our criteria to be considered as such. The use of a NoSQL database provides the flexibility to easily implement adaptations if necessary.
This paper is organized as follows. First, we present an overview of database systems in section 2. In section 3 we introduce our procedure for the benchmark database. The implementation is presented in section 4. Finally, section 5 contains the discussion and section 6 the conclusion.

SQL versus NoSQL
Database systems are divided into relational and non-relational systems. While relational database systems make use of the SQL query language, non-relational databases do not provide a SQL interface and are called 'Not only SQL' (NoSQL) (Meier and Kaufmann, 2019). Both database types store data, and entity-relationship (ER) models help to structure these data. For relational databases, such predefined schemas are necessary and fixed. Therefore, the model design must be done carefully. In contrast, NoSQL databases are schema-free (Meier and Kaufmann, 2019). Nevertheless, ER models represent the structure of data in an understandable and graphical way and thus also help to understand storage for NoSQL databases (Kaur and Rani, 2013). Furthermore, it is also possible to store unstructured data in NoSQL databases. NoSQL databases process data more efficiently, especially for large amounts of data (Meier and Kaufmann, 2019). NoSQL database types are vertically and horizontally scalable. This makes it possible not only to increase the number of elements stored on one server, but also to distribute them across several parallel servers (Kaur and Rani, 2013). These advantages (e.g. high performance) come at the cost of a loss of consistency, which must be accepted with NoSQL (Meier and Kaufmann, 2019). Furthermore, NoSQL has no standard query language equivalent to SQL. Therefore, many users use APIs (Kaur and Rani, 2013).

Document stores
There are different specializations of NoSQL databases, such as key-value or column stores (Martins et al., 2021). Another type is the document-oriented store, in which documents hold (semi-)structured data (Kaur and Rani, 2013; Meier and Kaufmann, 2019). Each document has its own identification value. In each document, the associated data are structured in a specific way, e.g. using the JSON format (Meier and Kaufmann, 2019). Thus, a very high flexibility is possible. The document ID is stored as a key-value pair and the data itself in an attribute-value manner (Meier and Kaufmann, 2019). Document databases are able to process large amounts of heterogeneous data (Meier and Kaufmann, 2019). Furthermore, only available attributes are included in the individual documents. If information is missing, empty fields can be omitted (Kaur and Rani, 2013).
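The omission of empty fields can be illustrated with a minimal Python sketch. The function name `to_document` and the example field names are hypothetical and are not taken from BeMeDa; the sketch only shows how two records with different available attributes could result in documents of different shape.

```python
def to_document(record):
    """Return a copy of the record without missing or empty values."""
    empty_values = (None, "", [], {})
    return {key: value for key, value in record.items()
            if value not in empty_values}


# Hypothetical record: no unique identifier and no platform are known.
raw = {
    "name": "Example Benchmark",
    "year": 2020,
    "uid": None,
    "platforms": [],
    "sensors": ["Camera"],
}
doc = to_document(raw)
print(doc)
```

In a document store, such a document can be saved next to others that do contain the `uid` or `platforms` attribute, without any schema change.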
There are numerous document-oriented database software systems that can be used for free. These include, for example, Couchbase, CouchDB and MongoDB (Martins et al., 2021). The comparison presented by Martins et al. (2021) shows the best overall result for the MongoDB software. Additionally, MongoDB is widely used. The graphical user interface 'Compass' supports the use of MongoDB (MongoDB, 2021).

Benchmark criteria
As mentioned in section 1, it is necessary to distinguish between datasets and benchmark datasets. Accordingly, in this section the benchmark criteria are defined. If a dataset meets these criteria, it can be included in our database. The starting point for the definitions is the contribution of Hall (2019). In our database a benchmark dataset:
1. includes reference data e.g. label, check points, etc.
2. has published results for a specified task e.g. classification accuracy
3. has a documentation of the data acquisition e.g. via published paper
4. provides free data access e.g. via dataset website
With these specifications it is possible to perform systematic evaluations and comparisons, for example with the use of public platforms (Long et al., 2020).
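The four criteria above can be expressed as a simple check. The following Python sketch is only an illustration: the class `DatasetCandidate`, its field names and the helper functions are assumptions made for this example, and in the paper the criteria are evaluated manually.

```python
from dataclasses import dataclass


@dataclass
class DatasetCandidate:
    """Hypothetical record of a manually assessed dataset."""
    name: str
    has_reference_data: bool      # criterion 1: labels, check points, etc.
    has_published_results: bool   # criterion 2: e.g. classification accuracy
    has_acquisition_docs: bool    # criterion 3: e.g. a published paper
    is_freely_accessible: bool    # criterion 4: e.g. via a dataset website


def unmet_criteria(candidate):
    """Return the descriptions of all criteria the candidate does not meet."""
    checks = {
        "reference data": candidate.has_reference_data,
        "published results": candidate.has_published_results,
        "documented acquisition": candidate.has_acquisition_docs,
        "free data access": candidate.is_freely_accessible,
    }
    return [label for label, fulfilled in checks.items() if not fulfilled]


def is_benchmark(candidate):
    """A dataset counts as a benchmark only if all four criteria are met."""
    return not unmet_criteria(candidate)


candidate = DatasetCandidate("Example Dataset", True, False, True, True)
print(is_benchmark(candidate), unmet_criteria(candidate))
```

Returning the list of unmet criteria, rather than only a Boolean, makes it easier to document why a dataset was not included.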

ISPRS keywords
Each ISPRS Congress paper identifies a few specified keywords. As an example, these keywords and the corresponding papers were parsed for Commissions I to IV for the 2021 edition of the XXIV ISPRS Congress (ISPRS, 2021) (Figure 1) to determine important topics in which the remote sensing and photogrammetry community is interested. For this purpose, the keywords and their corresponding scientific papers are extracted for each commission. Due to the variety of different notations of the keywords, the arrangement in groups is based on the count of the substrings of each keyword. As a result of the arrangement process, keywords that exhibit the same count of substrings and a similar structure, i.e. that differ only in upper vs. lower case or in singular vs. plural form of the substrings, belong to the same group. Based on the generated groups, the corresponding count of scientific papers is summed up. Figure 1 visualizes the four most frequent keywords for each commission. The depicted keywords represent single keywords or groups of keywords that contain different notations. However, this approach does not allow the creation of keyword groups if keywords exhibit a similar meaning but a different count of substrings. This causes the creation of unnecessary groups for keywords that exhibit a similar meaning (e.g. 'UAV' vs. 'Unmanned Aerial Vehicle (UAV)').
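The grouping described above can be sketched in a few lines of Python. This is not the authors' implementation: the naive singular rule, the function names and the paper counts in the example are assumptions made purely for illustration. Keywords fall into the same group if they have the same number of substrings and differ only in case or in a trailing plural 's'.

```python
from collections import defaultdict


def singular(token):
    """Naive singular form: drop one trailing 's' from longer words."""
    if len(token) > 3 and token.endswith("s") and not token.endswith(("ss", "is", "us")):
        return token[:-1]
    return token


def group_key(keyword):
    """Lower-cased, singularized substrings; the tuple length is the substring count."""
    return tuple(singular(token.lower()) for token in keyword.split())


def group_keywords(paper_counts):
    """Sum the paper counts of all keyword notations that share a group key."""
    groups = defaultdict(int)
    for keyword, count in paper_counts.items():
        groups[group_key(keyword)] += count
    return dict(groups)


# Invented counts, for illustration only.
paper_counts = {
    "Point Cloud": 5,
    "point clouds": 3,
    "Deep Learning": 7,
    "deep learning": 6,
    "UAV": 4,
    "Unmanned Aerial Vehicle (UAV)": 2,
}
groups = group_keywords(paper_counts)
print(groups)
```

The last two entries end up in separate groups because their substring counts differ, which reproduces the limitation noted above for 'UAV' vs. 'Unmanned Aerial Vehicle (UAV)'.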
Based on this keyword grouping, an analysis of current and past trends may be performed. The frequency of the keywords 'machine learning' and especially 'deep learning' illustrates the commission-independent importance of reference and benchmark data.

Metadata
For metadata-dependent databases, a predefined metadata schema simplifies the database construction. Hence, a suitable metadata schema for photogrammetry and remote sensing has to be found. Possible evaluation options for sensors, geospatial data recorded by a specific sensor and algorithms are provided by Bakuła et al. (2019). The keyword analysis from section 3.2 supports the statement by Bakuła et al. (2019) and provides additional information on which metadata are relevant in the photogrammetry and remote sensing domain. In particular, the keyword analysis revealed that the use of different platforms (UAV, Sentinel-2 (Figure 1)) and machine learning tasks such as semantic segmentation are important. Various metadata are derived from this analysis, e.g. 'sensors' or 'platforms'. Relevant basic metadata attributes are derived from the 'Dataset' type documented by schema.org (Data and Datasets - schema.org, 2021).
Based on this, our metadata considers the different requirements of the photogrammetry and remote sensing community. An ER model (Figure 2) was developed to assist with data structuring. This model illustrates that many of our attributes are multivalued. In contrast to relational databases, document stores like MongoDB allow inserting such multiple values into a single attribute, and no normalization is necessary.
In addition to Figure 2, Table 1 and Table 2 contain our metadata attributes. The metadata distinguishes between different sensors and platforms. Furthermore, the dimensionality and special acquisition configurations specify the data properties. The areas of application for which a dataset is designed are defined by the tasks. For further information, the paper presenting the dataset is included. More general attributes identify the benchmark dataset by its name, URL, publication year and, if available, a unique identifier (UID). The URL can also be used to provide a possibility to download the dataset. Finally, the environment and a short description provide an insight into the applications the benchmark dataset can be used for.
For a comparison to the existing schema.org attributes for datasets (Data and Datasets - schema.org, 2021), a brief presentation is given below. On the one hand, schema.org metadata includes some general attributes, which are also included in our approach. On the other hand, it includes attributes for the measurement technique and the measured variables, which are too nonspecific and incomplete to represent all the different acquisition methods and measurements in photogrammetry and remote sensing. While the measurement technique can still be assigned to the sensor attribute, the measured variables are indirectly derived from the dimension, sensor and acquisition configuration attributes. The measured variables are often not explicitly stated and a certain level of expert knowledge is assumed. For our domain, a stronger differentiation of the properties facilitates the subsequent filtering in the search query (section 4.4). Table 3 contains an overview of the described correspondences between specific schema.org dataset attributes and our metadata schema. Moreover, schema.org dataset attributes are extended by further attributes inherited from parent object types, e.g. 'abstract' as a type of short description. However, these attributes are also as basic as possible and do not reflect the desired search criteria.
The desired metadata information is often not directly available. Instead, the attribute values are extracted from the respective website or scientific paper. For our approach we introduce a generalization for the different multi-valued attributes. This simplifies the extraction process. To get an idea which attribute values the specified attributes accept,

IMPLEMENTATION
The implementation of the database consists of two main components: the Back-End and the Front-End. The Back-End contains the database itself and the REST API. The Front-End, on the other hand, contains a user-friendly interface for the communication with the Back-End. Below are the details about the respective implementations. A demo version of the implemented database will be linked on our website.

Initializing the database
To initialize the database, a manual search for benchmark datasets was performed. We used a variety of keywords, like the ones presented in section 3.2, for a manual search in established search engines to find datasets in our field. A short evaluation of the datasets was performed to check whether they qualify as benchmarks (section 3.1). Indications of this were, for example, the availability of training and reference data. The benchmark datasets were further analyzed to manually create metadata for each dataset according to section 3.3.

BeMeDa Back-End
The Back-End is primarily responsible for data management and processing. Our Back-End consists of two applications running simultaneously. The first one is the MongoDB database and the second one is a Python-based middleware application, which works as a link between the Front-End and the database. Furthermore, we designed our Back-End according to the REST architecture to ensure that our application is scalable for future extensions. In particular, we used the FastAPI framework to achieve this goal. After installation, new collections are defined. A collection summarizes multiple documents and is analogous to a table in relational databases (MongoDB, 2021). The first collection includes the benchmarks themselves. The second collection contains the authors of the papers. To transfer the model from section 3.3 into MongoDB documents, the attribute pattern is particularly suitable. It primarily exploits the similarities between the multi-valued attributes. Exceptions in this case are the 'tasks' and 'sensors' categories. These two attributes are considered mandatory. All other multi-valued attributes are subordinated in a common 'features' field. This means that fewer indexes are needed, which makes queries more efficient (MongoDB, 2021). An example of a benchmark document is presented in Figure 3. The collected benchmark datasets and related papers used to initialize the database are saved as CSV files. The data are processed with Python, using pymongo as an interface between MongoDB and Python. Thus, database queries are also made via this interface. Furthermore, JSON objects are used as the data exchange format between Back- and Front-End.
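To make the attribute pattern more concrete, the following Python sketch converts one CSV-like row into a document with the mandatory 'tasks' and 'sensors' arrays and a common 'features' field. The exact layout of the BeMeDa documents (Figure 3) is not reproduced here, so the column names, the `k`/`v` key-value layout, the `;` separator and the example row are all assumptions.

```python
# Hypothetical names of the remaining multi-valued attributes.
MULTI_VALUED = ("platforms", "dimensions", "acquisition", "environment")


def split_values(cell, sep=";"):
    """Split a CSV cell into a list of stripped, non-empty values."""
    return [value.strip() for value in cell.split(sep) if value.strip()]


def build_document(row):
    """Build a benchmark document following the attribute pattern."""
    doc = {
        "name": row["name"],
        "url": row["url"],
        "year": int(row["year"]),
        "tasks": split_values(row.get("tasks", "")),
        "sensors": split_values(row.get("sensors", "")),
        # All other multi-valued attributes share one array of key-value pairs.
        "features": [
            {"k": key, "v": value}
            for key in MULTI_VALUED
            for value in split_values(row.get(key, ""))
        ],
    }
    if not doc["tasks"] or not doc["sensors"]:
        raise ValueError("'tasks' and 'sensors' are mandatory")
    return doc


# Invented example row, as it might be read with csv.DictReader.
row = {
    "name": "Example Benchmark",
    "url": "https://example.org/benchmark",
    "year": "2020",
    "tasks": "Semantic Segmentation",
    "sensors": "Camera; Lidar",
    "platforms": "UAV",
    "environment": "Urban; Outdoor",
}
document = build_document(row)
print(document)
```

With pymongo, such a document could then be stored with `collection.insert_one(document)`, and a single compound index, e.g. `collection.create_index([("features.k", 1), ("features.v", 1)])`, would cover all attributes inside 'features' instead of one index per attribute.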

BeMeDa Front-End
For easy access to the database we built a Front-End with a graphical user interface. For this purpose, we used HTML, CSS and JavaScript and built our Front-End with the Vue.js and Vuetify frameworks. With our graphical interface (Figure 4, Figure 5) it is possible to filter the database according to the attributes and attribute values defined in section 3.3 via lists of checkboxes. The current state of our database is presented in Figure 4, which visualizes statistics about current parameters and their distribution across the benchmark datasets already included. Figure 5 demonstrates the results of a search query. In addition to the benchmark name, URL and publication year, the accompanying information for tasks and sensors as well as a short description of the dataset and additional attributes are shown. A text-based input for the search for benchmarks is implemented as well. Currently, the text-based search is only possible with single words. The filter checkboxes are read out with a JavaScript function. This function saves the filter inputs as a JSON file and hands it over to the Back-End. The JSON file contains the

Queries
As described in the previous section 4.3, there are two ways to search for benchmark datasets. The first possibility is the use of the text search field. However, this search is currently limited to single inputs such as the name of the benchmark. The current restrictions on free text search are based on the assumption that the user does not yet know the benchmark datasets. Thus, the main focus of the search is the selection of the desired filters, which is the second search option. Furthermore, it is possible to combine both search options. The different query cases are described below.

Select none:
In this case, no filter is selected and no entry is made in the search field. This keeps the search unrestricted. Thus the entire database is returned as the result of this query.

Text search:
When using the text field search, the entire input is passed to the Back-End as one string. As already mentioned, to simplify further processing, the input is limited to only one search criterion. This allows searching for the entire string in the database. To find matches, correct spelling is necessary.

Select filter:
In contrast to the previous cases, in this case the user selects some of the predefined filters. For example, it is desired that the benchmark data was captured with a camera from a UAV. With the filters selected accordingly, all datasets that meet these two criteria are returned. However, the filters do not have an exclusionary effect. With reference to the example, this means that datasets are also displayed that contain, for example, Lidar data in addition to the desired camera images.
Figure 5. Example of a query with a search filter for all benchmarks with at least a Camera and a Lidar sensor.
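The non-exclusionary behaviour of the filters corresponds to a subset test: a dataset matches if it contains at least the selected values. The following in-memory Python sketch illustrates this semantics only; the function `matches` and the three example datasets are hypothetical and independent of the actual database query.

```python
def matches(dataset, selected):
    """True if the dataset has every selected value; extra values are allowed."""
    return all(
        set(values) <= set(dataset.get(attribute, []))
        for attribute, values in selected.items()
    )


# Invented datasets for illustration.
datasets = [
    {"name": "A", "sensors": ["Camera", "Lidar"], "platforms": ["UAV"]},
    {"name": "B", "sensors": ["Lidar"], "platforms": ["UAV"]},
    {"name": "C", "sensors": ["Camera"], "platforms": ["Car"]},
]
selected = {"sensors": ["Camera"], "platforms": ["UAV"]}
hits = [d["name"] for d in datasets if matches(d, selected)]
print(hits)
```

Dataset A is returned although it additionally contains Lidar data, whereas B and C each miss one of the two selected values.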

Select filter and text search:
Finally, by combining text search and filter selection it is possible to apply additional search criteria beyond the predefined filters. For example, a user selects multi-temporal data for an indoor environment. By additionally entering e.g. a year of publication in the text field, the search results of the filter inputs can be limited to this year.
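The four query cases can be summarized in one sketch that builds a MongoDB filter document from the filter selection and the text input. This is a hedged illustration, not the BeMeDa implementation: the function `build_query`, the field names `name`, `description`, `year` and `features`, the `k`/`v` layout, the case-insensitive matching and the interpretation of a purely numeric input as publication year are all assumptions. Only the MongoDB operators `$all`, `$elemMatch`, `$regex`, `$or` and `$and` are standard.

```python
import re


def build_query(filters=None, text=None, text_fields=("name", "description")):
    """Build a MongoDB filter document for the four query cases."""
    clauses = []
    for attribute, values in (filters or {}).items():
        if not values:
            continue
        if attribute in ("tasks", "sensors"):
            # The array must contain all selected values; extra values are allowed.
            clauses.append({attribute: {"$all": list(values)}})
        else:
            # Other attributes are assumed to be key-value pairs in 'features'.
            for value in values:
                clauses.append(
                    {"features": {"$elemMatch": {"k": attribute, "v": value}}}
                )
    text = (text or "").strip()
    if text.isdecimal():
        # Assumption: a purely numeric input is a publication year.
        clauses.append({"year": int(text)})
    elif text:
        pattern = re.escape(text)  # match the input literally
        clauses.append(
            {"$or": [{field: {"$regex": pattern, "$options": "i"}}
                     for field in text_fields]}
        )
    if not clauses:
        return {}  # select none: no restriction
    if len(clauses) == 1:
        return clauses[0]
    return {"$and": clauses}


print(build_query())
print(build_query(filters={"sensors": ["Camera", "Lidar"]}))
print(build_query(filters={"environment": ["Indoor"]}, text="2019"))
```

With pymongo, the resulting dictionary could be passed to `collection.find(query)`; the empty dictionary of the first case returns every document of the collection.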

DISCUSSION
BeMeDa offers numerous advantages compared to the current, more general search capabilities, because BeMeDa is optimized for the photogrammetry and remote sensing domain. The attributes defined in Table 1 are specialized for this area. This allows a much more targeted search than, for example, with the Google Dataset Search and is also more extensive than with the EOD platform (Earth Observation Database, 2022). Our schema also includes unique identifiers, in contrast to the EOD platform. This makes our database more robust in case the website link of a dataset changes. However, due to the attributes we defined, some expertise is helpful. Nevertheless, the attribute value-based filters in the user-oriented Front-End make the search more accessible for the user, even for non-specialists.
BeMeDa includes not only the reference to the dataset itself, but also relevant metadata that help with selecting a benchmark dataset for a specific application. These metadata may need frequent adjustment, which is related to the developments on the sensor market and the progress of photogrammetric and remote sensing approaches. This requires high flexibility in updating the metadata structure. This requirement was considered while developing the structure of the database. By using NoSQL, the database can be adapted to future technologies and methods. The current development stage of BeMeDa, however, has some limitations and open tasks. Since the manual search for benchmark datasets and the compilation of the CSV file are very time-consuming, BeMeDa so far only contains a small number of datasets. Thus, a high completeness cannot be achieved with this approach. This could be improved by employing advanced text processing tools, including machine learning techniques. This issue, however, was not considered in the scope of this paper.
Furthermore, the development of BeMeDa has a large potential for the analysis of existing benchmarks and can support the photogrammetric and remote sensing community in identifying gaps. On the one hand, these gaps may concern the publication process of benchmark datasets, e.g. the use of metadata. On the other hand, as the size of the database increases, certain research fields can be identified where the selection of benchmark datasets remains difficult. These gaps can be used to create a road map for establishing new benchmark datasets.

CONCLUSION & FUTURE WORK
In this paper, we present an approach for collecting and searching benchmark datasets, especially for photogrammetric and remote sensing applications. Previous search approaches for benchmark datasets require a high level of manual effort and are therefore usually very time-consuming. At the same time, the demand for data continues to grow strongly. In order to make progress in the development of new methods, it is essential to create comparable conditions. Therefore, the use of benchmark datasets is particularly important. With BeMeDa, presented here, we have achieved a significant simplification of the search for benchmarks.
Nevertheless, there are numerous options to improve and extend BeMeDa. First of all, it is necessary to make the search functionalities more flexible, especially through an advanced processing of the text field input that enables the search for multiple attribute values as free text. The completeness of the database is of equal importance. In the future, we aim to add more benchmark datasets automatically or at least semi-automatically. The latter can be achieved, for example, by using a web form. This also makes it possible to check the quality of the entries. Furthermore, it is possible to enrich the database with more information about the individual benchmark datasets. This includes, for example, details of the methodology used and an indication of the distribution in other papers. However, both are associated with an increased text processing effort.