The SEN1-2 Dataset for Deep Learning in SAR-Optical Data Fusion

While deep learning techniques have an increasing impact on many technical fields, gathering sufficient amounts of training data is a challenging problem in remote sensing. In particular, this holds for applications involving data from multiple sensors with heterogeneous characteristics. One example for that is the fusion of synthetic aperture radar (SAR) data and optical imagery. With this paper, we publish the SEN1-2 dataset to foster deep learning research in SAR-optical data fusion. SEN1-2 comprises 282,384 pairs of corresponding image patches, collected from across the globe and throughout all meteorological seasons. Besides a detailed description of the dataset, we show exemplary results for several possible applications, such as SAR image colorization, SAR-optical image matching, and creation of artificial optical images from SAR input data. Since SEN1-2 is the first large open dataset of this kind, we believe it will support further developments in the field of deep learning for remote sensing as well as multi-sensor data fusion.


INTRODUCTION
Deep learning has had an enormous impact on the field of remote sensing in the past few years (Zhang et al., 2016. This is mainly due to the fact that deep neural networks can model highly non-linear relationships between remote sensing observations and the eventually desired geographical parameters, which could not be represented by physically-interpretable models before. One of the most promising directions of deep learning in remote sensing certainly is its pairing with data fusion (Schmitt and Zhu, 2016), which holds especially for a combined exploitation of synthetic aperture radar (SAR) and optical data as these data modalities are completely different from each other both in terms of geometric and radiometric appearance. While SAR images are based on range measurements and observe physical properties of the target scene, optical images are based on angular measurements and collect information about the chemical characteristics of the observed environment.
In order to foster the development of deep learning approaches for SAR-optical data fusion, it is of utmost importance to have access to big datasets of perfectly aligned images or image patches. However, gathering such a big amount of aligned multi-sensor image data is a non-trivial task that requires quite some engineering efforts. Furthermore, remote sensing imagery is generally rather expensive in contrast to conventional photographs used in typical computer vision applications. These high costs are mainly caused by the financial efforts associated to putting remote sensing satellite missions into space. This changed dramatically in 2014, when the SAR satellite Sentinel-1A, the first of the Sentinel missions, was launched into orbit by the European Space Administration (ESA) in the frame of the Copernicus program, which is aimed at providing an on-going supply of diverse Earth observation satellite data to the end user free-of-charge (European Space Agency, 2015).
Exploiting this novel availability of big remote sensing data, we publish the so-called SEN1-2 dataset with this paper. It is comprised of 282,384 SAR-optical patch-pairs acquired by Sentinel-1 and Sentinel-2. The patches are collected from locations spread across the land masses of the Earth and over all four seasons. The generation of the dataset, its characteristics and features, as well as some pilot applications are described in this paper.

SENTINEL-1/2 REMOTE SENSING DATA
The Sentinel satellites are part of the Copernicus space program of ESA, which aims to replace past remote sensing missions in order to ensure data continuity for applications in the areas of atmosphere, ocean and land monitoring. For this purpose, six different satellite missions focusing on different Earth observation aspects are put into operation. Among those missions, we focus on Sentinel-1 and Sentinel-2, as they provide the most conventional remote sensing imagery acquired by SAR and optical sensors, respectively.

Sentinel-1
The Sentinel-1 mission (Torres et al., 2012) consists of two polarorbiting satellites, equipped with C-band SAR sensors, which enables them to acquire imagery regardless of the weather.
Sentinel-1 works in a pre-programmed operation mode to avoid conflicts and to produce a consistent long-term data archive built for applications based on long time series. Depending on which of its four exclusive SAR imaging modes is used, resolutions down to 5 m with a wide coverage of up to 400 km can be achieved. Furthermore, Sentinel-1 provides dual polarization capabilities and very short revisit times of about 1 week at the equator. Since highly precise spacecraft positions and attitudes are combined with the high accuracy of the range-based SAR imaging principle, Sentinel-1 images come with high out-of-the-box geolocation accuracy (Schubert et al., 2015).
For the Sentinel-1 images in our dataset, so-called ground-rangedetected (GRD) products acquired in the most frequently available interferometric wide swath (IW) mode were used. These images contain the σ 0 backscatter coefficient in dB scale for every pixel at a pixel spacing of 5 m in azimuth and 20 m in range. For sake of simplicity, we restricted ourselves to vertically polarized (VV) data, ignoring potentially available other polarizations. Finally, for precise ortho-rectification, restituted orbit information was combined with the 30 m-SRTM-DEM or the ASTER DEM for high latitude regions where SRTM is not available.
Since we want to leave any further pre-processing to the end user so that it can be adapted to fit the desired task, we have not carried out any speckle filtering.

Sentinel-2
The Sentinel-2 mission (Drusch et al., 2012) comprises twin polarorbiting satellites in the same orbit, phased at 180 • to each other. The mission is meant to provide continuity for multi-spectral image data of the SPOT and LANDSAT kind, which have provided information about the land surfaces of our Earth for many decades. With its wide swath width of up to 290 km and its high revisit time of 10 days at the equator (with one satellite), and 5 days (with 2 satellites), respectively, under cloud-free conditions it is specifically well-suited to vegetation monitoring within the growing season.
For the Sentinel-2 part of our dataset, we have only used the red, green, and blue channels (i.e. bands 4, 3, and 2) in order to generate realistically looking RGB images. Since Sentinel-2 data are not provided in the form of satellite images, but as precisely georeferenced granules, no further processing was required. Instead, the data had to be selected based on the amount of cloud coverage. For the initial selection, a database query for granules with less than or equal to 1% of cloud coverage was used.

THE DATASET
In order to generate a multi-sensor SAR-optical patch-pair dataset, a relatively large amount of remote sensing data with very good spatial alignment needs to be acquired. In order to do this in a mostly automatic manner, we have utilized the cloud-based remote sensing platform Google Earth Engine (Gorelick et al., 2017). The individual steps of the dataset generation procedure are described in the following.

Data Preparation in Google Earth Engine
The major strengths of Google Earth Engine are two-fold from the point of view of our dataset generation endeavour: On the one hand, it provides an extensive data catalogue containing several petabytes of remote sensing imagery -including all available Sentinel data -and other freely available geodata. On the other hand, it provides a powerful programming interface that allows to carry out data preparation and analysis tasks on Google's computing centers. Thus, we have used it to select, prepare and download the Sentinel-1 and Sentinel-2 imagery from which we have later extracted our patch-pairs. The workflow of the GEE-based image download and patch preparation is sketched in Fig. 1. In detail, it comprises the following steps: 3.1.1 Random ROI Sampling In order to generate a dataset that represents the versatility of our Earth as good as possible, we wanted to sample the scenes used as basis for dataset production over the whole globe. For this task, we use Google Earth Engine's ee.FeatureCollection.randomPoints() function to randomly sample points from a uniform spatial distribution. Since many remote sensing investigations focus on urban areas and since urban areas contain more complex visual patterns than rural areas, we introduce a certain artificial bias to urban areas by sampling 100 points over all land masses of the Earth and another 50 points only over urban areas. The shape files for both land masses and urban areas were provided by the public domain geodata service www.naturalearthdata.com at a scale of 1:50m. If two points are located in close proximity to each other, we removed one of them to ensure non-overlapping scenes.
This sampling process is carried out for four different seed values (1158,1868,1970,2017). The result of the random ROI sampling is illustrated in Fig. 2a.

Data Selection
In the second step, we use GEE's tools to filter image collections to select the Sentinel-1/Sentinel-2 image data for our scenes. Since we want to use only recent data acquired in 2017, this first means that we structure the year into the four meteorological seasons: winter (1 December  . Each season is then associated to one of the four sets of random ROIs, thus providing us with the top-level dataset structure (cf. Fig. 3): We structure the final dataset into four distinct sub-groups ROIs1158 spring, ROIs1868 summer, ROIs1970 fall, and ROIs2017 winter.
Then, for each ROI, we filter for Sentinel-2 images with a maximum cloud coverage of 1% and for Sentinel-1 images acquired in IW mode with VV polarization. If no cloud-free Sentinel-2 image or no VV-IW Sentinel-1 image is available within the corresponding season, the ROI is discarded. Thus, the number of ROIs is significantly reduced from about 600 to about 429. For example, all ROIs that were located in Antarctica are rendered obsolete, since the geographical coverage of Sentinel-2 is restricted to 56 • South to 83 • North.

Image Mosaicking
Continuing with the selected image data, we use the Google Earth Engine in-built functions ee.Image-Collection.mosaic() and ee.Image.clip() to create one single image for each ROI, clipped to the respective ROI extent. The ee.ImageCollection.mosaic() function simply composites overlapping images according to their order in the collection in a last-on-top sense. As mentioned in Section 2.2, we select only bands 4, 3, and 2 for Sentinel-2 in order to create RGB images.
3.1.4 Image Export Finally, we export the images created in the previous steps as GeoTiffs using the GEE function Export.image.toDrive and a scale of 10m. The downloaded GeoTiffs are then pre-processed for further use by cutting the gray values to the ±2.5σ range, scaling them to the interval [0; 1] and performing a contrast-stretch. These corrections are applied to all bands individually.

First Manual Inspection
We then visually inspect all downloaded scenes for severe problems. These can mostly belong to one of the following categories: • Large no-data areas. Unfortunately, the ee.ImageCollection.mosaic() function does not return any error message if it does not find a suitable image to fill the whole ROI with data. This mostly happens to Sentinel-2, when no sufficiently cloud-free granule is available for a given time period.
The cloud-coverage metadata information that comes with every Sentinel-2 granule is only a global parameter. Thus, it can happen that the whole granule only contains a few clouds, but the part covering our ROI is where all the clouds reside.
• Severely distorted colors. Sometimes, we observed very unnatural colors for Sentinel-2 images. Since we want to create a dataset that contains naturally looking RGB images for Sentinel-2, we also removed some Sentinel-2 images with all too strange colors.
3.1.6 Tiling Since our goal is a dataset of patch-pairs that can be used to train machine learning models aiming at various data fusion tasks, we eventually seek to generate patches of 256 × 256 pixels. Using a stride of 128, we reduce the overlap between neighboring patches to only 50% while maximising the number of independent patches we can get out of the available scenes. We end up with 298,790 Sentinel-1/Sentinel-2 patch-pairs after this step.

Second Manual Inspection
In order to remove suboptimal patches that, e.g., still contain small clouds or visible mosaicking seamlines, we have again inspected all patches visually.
In this step, 16,406 patch-pairs are manually removed, leaving the final amount of 282,384 quality-controlled patch-pairs. Some examples are shown in Fig. 4.

Dataset Availability
The SEN1-2 dataset is shared under the open access license CC-BY and available for download at a persistent link provided by the library of the Technical University of Munich: https:// mediatum.ub.tum.de/1436631. This paper must be cited when the dataset is used for research purposes.

EXAMPLE APPLICATIONS
In this section, we present some example applications, for which the dataset has been used already. These should serve as inspiration for future use cases and ignite further research on SARoptical deep learning-based data fusion.

Colorizing Sentinel-1 Images
The interpretation of SAR images is still a highly non-trivial task, even for well-trained experts. One reason for this is the missing color information, which supports any human image understanding endeavour. One promising field of application for the SEN1-2 dataset thus is to learn to colorize gray-scale SAR images with color information derived from corresponding optical images, as we have proposed earlier . In this approach, we make use of SAR-optical image fusion to create artificial color SAR images as training examples, and of the combination of variational autoencoder and mixture density network proposed by (Deshpande et al., 2017) to learn a conditional color distribution, from which different colorization samples can   . Some results for colorized SAR image patches. In each row, from left to right: original Sentinel-1 SAR image patch, corresponding Sentinel-2 RGB image patch, artificial color SAR patch based on color-space-based SAR-optical image fusion, artificial color SAR image predicted by a deep generative model.

SAR-optical Image Matching
Tasks such as image co-registration, 3D stereo reconstruction, or change detection rely on being able to accurately determine similarity (i.e. matching) between corresponding parts in different images. While well-established methods and similarity measures exist to achieve this for mono-modal imagery, the matching of multi-modal data remains challenging to this day. The SEN1-2 dataset can assist in creating solutions in the field of multi-modal image matching by providing the large quantities of data required to exploit modern deep matching approaches, such as proposed by (Merkle et al., 2017) or : Using a pseudosiamese convolutional neural network architecture, corresponding SAR-optical image patches of a SEN1-2 test subset can be identified with an accuracy of 93%. The confusion matrix for the model of  trained on 300,000 corresponding and non-corresponding patch pairs created from a SEN1-2 training subset can be seen in Tab. 1. Furthermore, some exemplary matches achieved on the test subset are shown in Fig. 6.

Generating Artificial Optical Images from SAR Inputs
Another possible field of application of the SEN1-2 dataset is to train generative models that allow to predict artificial SAR images from optical input data (Marmanis et al., 2017, Merkle et al., 2018 or artificial optical imagery from SAR inputs (Wang and Patel, 2018, Ley et al., 2018, Grohnfeldt et al., 2018. Some preliminary examples based on the well-known generative adversarial network (GAN) pix2pix (Isola et al., 2017) trained on Figure 6. Some true positives achieved in SAR-optical image matching. The first row depicts the Sentinel-1 SAR image patch, while the second row depicts the corresponding Sentinel-2 optical patch as predicted by a pseudo-siamese convolutional neural network.
108,221 SEN1-2 patch pairs are shown in Fig. 7. Figure 7. Some preliminary examples for the prediction of artificial optical images from SAR input data using the pix2pix GAN. In each row, from left to right: original Sentinel-1 SAR image patch, corresponding Sentinel-2 RGB image patch, artificial GAN-predicted optical image patch.

STRENGTHS AND LIMITATIONS OF THE DATASET
To our knowledge, SEN1-2 is the first dataset providing a really large amount (> 100,000) of co-registered SAR and optical image patches. The only other existing dataset in this domain is the so-called SARptical dataset published by . In contrast to the SEN1-2 dataset, it provides very-high-resolution image patches from TerraSAR-X and aerial photogrammetry, but is restricted to a mere 10,000 patches extracted from a single scene, which is possibly not sufficient for many deep learning applications -especially since many patches show an overlap of more than 50%. With its 282,384 patch-pairs spread over the whole globe and all meteorological seasons, SEN1-2 will thus be a valuable data source for many researchers in the field of SARoptical data fusion and remote sensing-oriented machine learning. A particular advantage is that the dataset can easily be split into various deterministic subsets (e.g. according to scene or according to season), so that truly independent training and testing datasets can be created, supporting unbiased evaluations with regard to unseen data.
However, also SEN1-2 does not come without limitations: For example, we restricted ourselves to RGB images for the Sentinel-2 data, which is possibly insufficient for researchers working on the exploitation of the full radiometric bandwidth of multi-spectral satellite imagery. Furthermore, at the time we carried out the dataset preparation, GEE stocked only Level-1C data for Sentinel-2, which basically means that the pixel values represent top-ofatmosphere (TOA) reflectances instead of atmospherically corrected bottom-of-atmosphere (BOA) information. We are planning to extend the dataset for a future version 2 release accordingly.

SUMMARY AND CONCLUSION
With this paper, we have described and released the SEN1-2 dataset, which contains 282,384 pairs of SAR and optical image patches extracted from versatile Sentinel-1 and Sentinel-2 scenes. We assume this dataset will foster the development of machine learning, and in particular, deep learning approaches in the field of satellite remote sensing and SAR-optical data fusion. For the future, we plan on releasing a refined, second version of the dataset, which contains not only RGB Sentinel-2 images, but full multispectral Sentinel-2 images including atmospheric correction. In addition, we might add coarse land use/land cover (LULC) class information to each patch-pair in order to foster also developments in the field of LULC classification.