ESTIMATION OF WATER QUALITY IN A RESERVOIR FROM SENTINEL-2 MSI AND LANDSAT-8 OLI SENSORS

The low operational cost of using freely available remote sensing data is a strong incentive for water agencies to complement their field campaigns and produce spatially distributed maps of some water quality parameters. The objective of this study is to compare the performance of Sentinel-2 MSI and Landsat-8 OLI sensors to produce multiple regression models of water quality parameters in a hydroelectric reservoir in Brazil. Physical-chemistry water quality parameters were measured in loco using sensors and also analysed in laboratory from water samples collected simultaneously. The date of sampling corresponded to the almost simultaneous overflight of Sentinel-2B and Landsat-8 satellites which provided a means to perform a fair comparison of the two sensors. Four optically active parameters were considered: chlorophyll-a, Secchi disk depth, turbidity and temperature (the latter using Landsat-8 TIR sensor). Other six optically non-active parameters were also considered. The multiple regression models used the spectral reflectance bands from both sensors (separately) as predictors. The reflectance values were based on averaging kernels of 30 m and 90 m. Stepwise variable selection combined with a priori knowledge based on other studies were used to optimize the choice of predictors. With the exception of temperature, the other optically active parameters yielded strong regression models from both the Sentinel and Landsat sensors, all with r > 0.75. The models for the optically non-active parameters produced less striking results with r as low as 0.03 (temperature) and as high or better than > 0.8 (pH and Dissolved oxygen).


INTRODUCTION
The use of remote sensing techniques to assess water resources started in the 60's with the availability of the first digital satellite images (Mertes et al., 2004). Initially, it was the presence of chlorophyll-a (chl-a) in the water and the water surface temperature that motivated the use of such techniques and also motivated the study of optical properties of water components to determine if these water quality related parameters could be estimated remotely (Mishra et al., 2017). It is the relation between the spectral behaviour of water and the presence of optically active components (OAC) that make remote sensing and digital image processing valuable tools to monitor the quality of water. The fact that water has a very distinctive spectral signature, characterised mostly by its strong absorption in the near and shortwave infrared makes it easily detectable using these wavelengths (Teodoro, 2016). The interaction between solar radiation and OAC alters the spectral properties of water and can therefore be used as indicators of water quality (Barbosa et al., 2019).
Traditional methods for monitoring water quality parameters involving the extraction and analysis of water samples are known to consume a lot of time and resources (Karaoui et al., 2019, Yepez et al., 2018 and demand very specialised labour (Giardino et al., 2010), even more so for large reservoirs. In comparison, remotely monitoring some water quality parameters using satellite images can largely reduce these costs while providing spatially distributed estimates with a higher * Corresponding author frequency in time. A large number of satellite missions systematically acquire image and point data of the Earth with a variety of spectral and spatial resolution. Some of these satellites can be used to estimate water level through radar altimetry while others equiped with optical sensors produce image data that can be used to estimate water quality parameters. Within the realm of missions providing systematic optical image acquisition, the Landsat (National Air and Space Administration -NASA) and Sentinel-2 (European Space Agency -ESA) programs offer high resolution optical image products in a free and open data policy. The Landsat-8 satellite (launched in 2013) is equipped with the Optical Land Imager (OLI) and offers a revisit frequency of 16 days while Sentinel-2A (launched in 2015) and 2B (launched in 2017) is a twin satellite constellation equipped with the MultiSpectral Imager (MSI) providing together image data every 5 days almost anywhere.
Recent research in many parts of the world involving water quality have made use of data from Landsat-8 (Liu, Wang, 2019, González-Márquez et al., 2018, Mushtaq, Nee Lala, 2017, Sentinel-2 (Karaoui et al., 2019, Potes et al., 2018 or both (Dutra et al., 2019, Yadav et al., 2019, Watanabe et al., 2018. Table 1 gives a list of some of these authors with the sensor they used and the parameters they analysed. In parallel to the development of remote sensing applications for water quality, spatial modelling together with statistical inference have become common use for creating distributed models of bio-optical parameters of water quality. Correlation as well as simple and multiple regression analysis have been used widely for this purpose (Mainali et al., 2019). In Brazil, the legislation requires managing agencies of large reservoirs to periodically monitor a number of water quality parameters that they must make publicly available to society, regardless of their use (water supply, energy, leisure, etc.). This is the case of the Três Marias Reservoir in Minas Gerais constructed in the 50's mostly for regulating the discharge of the São Francisco River and to generate hydroelectric power (Cachapuz, 2006). Up to now, this has been performed by traditional in situ methods of water sampling and laboratory analysis. The time and efforts allocated to in situ monitoring hinders actions of environmental preservation and control in a satisfactory time frame, especially given the peculiarities and size of such reservoirs (Karaoui et al., 2019).

Author
The high costs associated with this practice has brought the CEMIG (Companhia Energética de Minas Gerais) to investigate alternate methods of monitoring using remote sensors both on satellite and aerial platforms. The objective is to optimise the in situ campaigns and to reduce the number of parameters requiring laboratory analysis and to provide a distributed version of some of these parameters, mostly for the OACs. Within this general workframe, our objective is to evaluate which parameters and to what extent the sensors on-board the Landsat-8 and Sentinel-2 satellite can provide satisfactory estimates. In this study we used a fully empirical approach through multiple regression to create models of both OAC and NOAC (Non Optically Active Components) for the particular case of the Três Marias Reservoir. With these dimensions, Três Marias occupies the ninth and tenth place in ranking of Brazilian reservoirs in terms of area and volume respectively (von Sperling, 1999). The reservoir is characterised by a highly dentritic shape ( Figure 1) and is inserted in a two seasons semi-humid climate, one dry winter season and one wet summer receiving most (>85%) of the 1440 mm of rain yearly.

Aquisition of In Situ Limnological Data
The water sampling was done between 8H00 and 14H00 local time on 4 October 2019. The sampling locations were selected considering a number of factors including depth, affluents, land use of surrounding areas (presence of agriculture and urban areas) and pisciculture activities. It was also important to do the sampling with shortest time possible to reduce possible changes in the water properties. The date was selected for corresponding to the special condition of having both Landsat-8 and Sentinel-2B overflying the reservoir within about 10 minutes, a situation that only occurs every 80 days. On that particular day, the sky was completely free of clouds and atmospheric attenuation was considered almost null. During the sampling of the 13 points, some parameters were measured in loco using specialised sensors: dissolved oxygen (DO), pH, temperature (T • ), Secchi disk depth, turbidity, colour, conductivity and oxidation-reduction potential (ORP). The remaining parameters were analysed in laboratory the following days and included: chlorophyll-a, -b, -c, total carbon (TC), organic carbon (OC), nitrogen (N), phosphorous (P), chlorate, sulphate, total iron and total suspended solid. For these analyses, about nine litres of water were collected at each sampling point.
The parameters directly liable to be monitored through optical remote sensing were Secchi disk depth, turbidity, chlorophyll-a (chl-a) and colour. The thermal bands of Landsat-8 (TIRS) were also considered to estimate the surface water temperature. Some of the remaining parameters could also be estimated through statistical inference as indicators of the dynamic processes within the water body. These depend on indirect relation between the OAC and their NOAC counterparts and would only be applicable to the particular situation of the reservoir on the day of sampling. Only some of these parameters will be analysed here.

Remote Sensing Data Acquisition and Pre-processing
Our study used freely available Level 2 data from the Landsat-8 (https://earthexplorer.usgs.gov) and Sentinel-2B (https://sentinel.esa.int) satellites. Level 2 image data are preprocessed to include top-of-atmosphere (TOA) radiometric and geometric corrections and are supplied as spectral reflectance data. The Sentinel-2B satellite overflew the reservoir at 10:12:49 and Landsat-8 at 10:03:31 local time. Both images had less than 1% of cloud and no atmospheric interference of any kind could be observed.
The acquired images were converted from TOA reflectance to bottom-of-atmosphere (BOA) reflectance meaning that any atmospheric interference in the spectral response has been reduced to a minimum. The Sentinel-2B image was converted using the MAJA (MACCS ATCOR Joint Algorithm (Lonjou et al., 2016)) pre-processing supplied by the Centre National d'Études Spatiales (CNES). The Landsat-8 image were processed using the Landsat-8 Surface Reflectance Code (LASRC (Landsat-USGS, 2018)) provided by the United States Geological Service (USGS). The Level 2 Landsat image does not come with the thermal infrared sensor (TIRS) bands, so, in this case the Level 1 image was used. For the temperature, because of reported radiometric errors of ±1 • K and ±2 • K for Landsat-8 TIRS bands 10 and 11 respectively (USGS, 2019), only band 10 was used to produce water surface temperature data using Equation 1.
where ρλ = Top Of Atmosphere (TOA) planetary reflectance Mρ = Band-specific multiplicative rescaling factor from the metadata Q cal = Quantized and calibrated standard product pixel values (DN) Aρ = Band-specific additive rescaling factor from the metadata θSE = Local sun elevation angle To simplify the extraction of spectral data for building the statistical models, all Sentinel-2B 20 m bands and 60 m band 1 (coastal) were resampled to 10 m, making them all compatible within the same dataset. The same was done with Landsat-8 band 10 (TIRS) but at 30 m in this case. The spectral information (reflectance and temperature) was extracted considering a neighbouring kernel of 3 pixels × 3 pixels (30 m× 30 m for the MSI sensor and 90 m× 90 m for the OLI sensor) to account for any imprecision of the navigation GNSS or the image geometry but also for the drifting of the boat during the few minutes of collecting. In addition an extraction kernel of 90 m× 90 m was also created for the Sentinel-2B data so that a fair comparison can be made with the Landsat-8 data. The extracted spectral values were combined with the water sampling data in the same spreadsheets for analysis.

Regression models
Multiple linear regressions were produced with RStudio (v. 1.5.5109) to analyse the relationship between some of the water quality parameters and the mean reflectance values of the three analysis kernels (two for Sentinel-2B, one for Landsat-8). At this stage, a completely empirical approach was adopted which can yield robust models in many cases (Matthews, 2011). The selection of predictors was performed using two different approaches: • The first approach used a hybrid bidirectional stepwise method by which the predictors are iteratively selected first by addition (forward) and then by removing the ones having a poor contribution (backward) (James et al., 2013); • The second approach used predictors most quoted in previous work by other authors, many of which were quoted in the Introduction section. This was performed in an attempt to generate comparable results. Only the OACs were considered for this approach.
The adoption of this double strategy aimed at creating models with a reduced number of predictors as well as being more objective while avoiding over-fitting problems a too numerous number of predictors would create.

Model Validation and Accuracy Estimation
Four evaluation metrics were used to characterise the performance of the models. The Coefficient of Determination (r 2 ) gives an estimate of the proportion of variance explained by the model. The r 2 is only a valid estimator if its significance is high. A significance level of 95% (p < 0.05) was used here. The Mean Absolute Error (MAE) gives an idea of the average vertical distance between the predicted and the observed value. The Root Mean Square Error (RMSE) and Normalised Root Mean Square Error (NRMSE) are typically used to evaluate models but are rather sensitive to outliers.

Water Quality in the Três Marias Reservoir
The sampling point results obtained both in loco and in the postcampaign laboratory are presented in Table 2 and 3. Of the ten variable presented in these tables, only four are considered optically active (Table 2).
A rapid analysis of the 13 sampling points collected in the reservoir reveals a decrease of the water quality from the North end towards the South end which is upstream. In particular the two affluents (São Francisco River on the left and Paraopeba River on the right) appear much more turbid then the rest of the points. Conversely, the two northernmost points (1 and 2) show a Secchi disk transparency of almost 6 m (turbidity of 0.2 NTU).
Sampling points 3 to 6 are all within the central-north portion of the reservoir and are characterised by the presence of fish farms agricultural activities (centre pivot) in the areas near the reservoir shore. This is particularly observable at Point 6 which has the greatest values of chl-a (8.10 µg/L). Although this concentration of chl-a is much higher than for the remaining samples, this does not constitute a state of eutrophication according to the State Water Agency (IGAM). However it should be mentioned that these measurements were made during the dry season when the water quality is often better.
Two sampling points are located near the northern extremity of an island called Mangabal and also have relatively high concentrations of chl-a compared with the remaining points (except Point 6). Algaes and or cyanobacterias were in fact visually observable as can be seen on Figure 2. Although not yet considered a bloom, these observations are good indicators of the necessity to continuously monitor the reservoir's water (Barbosa et al., 2019).
The sampling points 9 through 13 are all located near the confluence between the São Francisco River (9, 10 and 11) and the Paraopeba River (12 and 13). These sample appear to be the worst in terms of water quality. Not only the transparency is poor (Secchi disk depth of 1.9 m) but this is where the highest concentrations of carbon (total carbon or TC) and organic carbon (OC) were recorded and are often indicative of water pol- lution (Moore, 1998). It is also important to emphasize that this stretch of the reservoir presents a turbulent flow, thus differing from the other locations analysed in this article.

Multiple Regression Results
The first set of multiple regression models were created using the hybrid bi-directional stepwise approach. The second set combined the stepwise approach with a priori information from previous authors. After executing the stepwise regression, the poor predictors were removed except if indicated by the reference authors and until the level of significance is above 95%.  Table 4 shows the results for nine parameters for the two Sentinel kernels and 10 parameters for the Landsat one. Table 5 describes the equations resulting from these regression models.  Table 5. Regression equations for the water quality models for the Sentinel-2 using 30x30 kernels (top), 90x90 kernels (middle) and Landsat-8 using 90x90 kernels (bottom).
coefficients of determination (r 2 ) ranging from 0.75 to 0.85 for both Sentinel and Landsat and for both kernel sizes (30×30 and 90×90). This proves that the spectral bands resulting from the two selection methods are good predictors, especially the MSI instrument with 90×90 kernels.
However, the temperature model produced a very low r 2 value of 0.03. We attribute this poor result mainly to the time difference between the in situ measurements and the time Landsat sensed the scene (≈10H00). It is also reported that the Landsat-8 TIR sensor suffers from calibration problems which hinders the absolute conversion from radiance to temperature (Montanaro et al., 2014) but that does not explain the very low r 2 obtained. Applying a radiance-to-temperature transformation would probably be preferable to the statistical inference we used in this article but since we had no usable in situ temperature to validate the data we simply left out the temperature models for future field campaigns when we will acquire temperatures samples within a short time lag close to the satellite sensing time.
The Sentinel-2B MSI sensor produced better results than the Landsat-8 OLI for all parameters except for the carbon (total and organic). The greater number of spectral bands (red edge and near infrared) and shortwave infrared and the finer spectral resolution of bands 3-7 and 8a can probably explain this result. Sentinel's band 1 (coastal) would also be useful because of its potential penetration in the water but has calibration problems that causes a large vertical banding in the images. The two kernel sizes for the MSI results appear to have relatively little effect on the results. Table 6 shows all the accuracy metrics used to evaluate the models. Although the p < 0.05 critical value was used, we observe that many models are under 0.01 (99%) including the chl-a, Secchi disk depth and turbidity for all three kernel size -sensor combinations. The TC model with the MSI sensor (30 m kernel) yielded p = 0.06, which could still be considered acceptable with r 2 = 0.63. Because the stepwise method was used to remove predictors, it has been suggested that the critical p value could be relaxed (Babyak, 2004). The MAE, RMSE and NRMSE all show the same trend in the values. The MAE is particularly interesting as it gives a more realistic idea of the kind of precision one can expect from these models.

Relation Between Optically Active and Non-active Parameters
Three of the four optically active components parameters have yielded very good and significant results. These results can also explain some of the good results obtained for the optically nonactive parameters such as pH, DO, conductivity and carbon (TC and OC) through indirect relationships.
With r 2 of 0.85, 0.83 and 0.69 for the three kernels (MSI 30, MSI 90 and OLI 90), dissolved oxygen achieved the highest score among the optically non-active parameters. This can be related to the higher concentrations of chlorophyll associated with photosynthesis, hence production of oxygen by chlorophyllian organisms. The pH models achiever r 2 of 0.89, 0.79 and 0.43 for the three testing satellite data, the latter with p = 0.14 can be discarded. The pH is also related with the amount of chlorophyll and tend to increase with the presence of algaes and cyanobacteria if other factors do not have a more important influence. Conductivity models are also associated with relatively high r 2 (0.69, 0.76 and 0.69). The conductivity  Table 6. Evaluation metrics for all models and for the three image situations considered  of water is related with the concentration of ions, specifically dissolved salts from inorganic material. These chemical analysis were not available for this article but TC concentrations could include carbonates. Higher pH values are also related to greater conductivity. These relations are observable in Table 2 and Table 3 and bring an explanation to the strength of these models. Figure 3 and Figure 4 illustrate the maps that were generated for the chl-a and turbidity models and for both the MSI and OLI sensors.

CONCLUSION
This study aimed at comparing the performance of Sentinel-2's MSI and Landsat-8's OLI sensors for estimating water quality parameters through empirical statistical inference using regression. Both sensors performed very well for creating multiple regression models of three optically active components: chl-a, Secchi disk depth and turbidity. Furthermore, the MSI sensor performed slightly better than the OLI one. For the optically non-active parameters, the MSI sensor had a more evident advantage over the OLI sensor. While the former produced five  Figure 4. Maps produced using the chlorophyll-a (a) and turbidity (b) models for the Landsat-8 OLI sensor. models with r 2 > 0.6, the latter only three. These parameters have no direct effect on the at-sensor radiance but are the consequence of the optically active parameters recorded. As such these models are not exportable to other locations or even for the same location at different times.
For the MSI sensor, the size of the MSI kernel (30×30 vs 90×90) had little effect on the results; some parameter models were better with the 90 m kernel (chl-a, turbidity, conductivity), some were worse (Secchi depth, pH, DO). These averaging kernels were used instead of single pixel vectors because 1) only an hand-held navigation GNSS with an estimated precision of ≈5 m was used and, 2) the boat was not anchored and wind (although very light) and current affected the boat's location during the sampling.
No valid temperature model could be produced using Landsat-8's TIR sensor. Sampling delays were probably the major cause for this poor results but might also be attributed to the calibration problems of the sensor.
We explain the difference between the two sensors based on their differences. Firstly, the MSI has more bands, especially in the visible-near infrared region (three red-edge bands: 700-800 µm). On top of these additional bands, some of the MSI bands have a finer spectral resolution like the green (3), red (4) and near infrared (8a) bands. Finally, the MSI spatial resolutions are of 10 m (4 bands), 20 m (6 bands) and 60 m (3 bands) whereas all the OLI bands have a single spatial resolution of 30 m. Other authors that have compared the performance of statistical models with these two sensors have reached similar conclusions (Yadav et al., 2019, Govedarica, Jakovljević, 2019.
The field campaign lasted about six hours which probably caused problems for the surface temperature that responds very quickly to the sun elevation. Although the remaining parameters are considered more stable, they are still affected by the changing conditions of temperature and illumination. This means that conducting faster sampling near the image acquisition time could potentially improve these models as well. Future campaigns will try to reduce these effects by using two teams instead of one.