ACCURACY ASSESSMENT OF CLOUD MASK DETECTION ALGORITHMS FOR CBERS-4 WFI IMAGERY

: Clouds limit the potential use of optical images. Proper clouds and cloud shadows detection are necessary steps for optical image applications. Few algorithms are flexible in detecting clouds in images with a limited number of bands, such as the Wide-Field Imager (WFI) sensor on board the China-Brazil Earth Resources Satellite (CBERS-4), which has four spectral bands (blue, green, red, and near-infrared). Therefore, this work aims to assess the accuracy of two cloud detection algorithms: CMASK and ATSA, and evaluate the influence of the ATSA configuration parameters. We selected four regions in Brazil for our analysis. In all cases, ATSA presented overall accuracy (OA) superior to CMASK. While the ATSA obtained OA greater than 0.91 for all analyzes, the OA from CMASK did not exceed 0.84. CMASK presented commission errors for the No Clear class (combination of Cloud and Cloud shadow ) and inclusion errors for the Clear class close to zero. However, many errors of omission of clouds misclassified as the Clear class was observed. The ATSA algorithm presented a better balance between inclusion errors and omission errors. Our results can be used as guidance for choosing a cloud mask algorithm for the CBERS-4 WFI images and for analysis considering the images from WFI on board CBERS-4A and Amazonia-1, as they have similar characteristics.


INTRODUCTION
Optical satellite imageries are widely used to map land use and land cover (LULC), monitor crops and ecosystems, and estimate land surface parameters, enabling a better understanding of the Earth system's functioning and how it has changed over time (Cohen and Goward, 2004;Ennouri et al., 2019;Gómez et al., 2016;Hansen and Loveland, 2012;Wulder et al., 2018Wulder et al., , 2015Zhu and Helmer, 2018). Nonetheless, in optical remote sensing images, clouds and their corresponding shadows are inevitable and limit the potential of the imagery for ground information extraction (Li et al., 2019). Estimates show that the global mean cloud cover over land surfaces is greater than 55% (King et al., 2013;Rossow and Schiffer, 1999). In tropical regions, this value can be even higher, as in the Amazon region, where the frequency of cloud cover is higher than 80% in the wet season (Prudente et al., 2020). Many applications need remote sensing data periodically, as is the case of LULC change and agricultural monitoring, and cloud contamination of optical imagery presents a major limitation (Hansen and Loveland, 2012;Whitcraft et al., 2015).
Thus, accurately extracting clouds and cloud shadows from cloud-contaminated images can help reduce the negative influences that cloud coverage brings to the application of the imagery (Li et al., 2017). This becomes essential in applications that require dense time series, such as agricultural monitoring (Bendini et al., 2019). Furthermore, due to the large amount of data required for multi-temporal and large-scale studies, it is important to acquire cloudless images automatically (Sun et al., 2017). Therefore, masking clouds and cloud shadows is often the first and most necessary step of image pre-processing in optical remote sensing applications (Baetens et al., 2019;Braaten et al., 2015;Zhu and Helmer, 2018).
Automatic and accurate detection of clouds and cloud shadows is challenging (Li et al., 2017;Zhu and Helmer, 2018;Woodcock, 2014, 2012). Different clouds with different spectral signatures (Bian et al., 2016) can be easily confused with some cloud-free bright objects on the land surface (Zhu and Helmer, 2018). Furthermore, the spectral signature of thin clouds can be similar to the signature of the land surfaces underneath, as the observed reflectance contain a mixture of cloud and land signals, making them more difficult to identify (Baetens et al., 2019;Zhu and Woodcock, 2014). Cloud shadows are another challenge as they are easily confused with dark land surfaces due to the spectral similarity between them (Zhu and Helmer, 2018;Zhu and Woodcock, 2012).
Despite the challenges mentioned above, various methods have been developed to detect clouds and cloud shadows. The methods for masking cloud and cloud shadows can be divided into two categories according to the single or multi-temporal images that the algorithms use (Li et al., 2017). Most singleimage methods screen clouds in individual images using predefined or adaptive thresholds (Zhu and Helmer, 2018). Single images methods require fewer input data than multitemporal methods, being more popular (Li et al., 2017). In multi-temporal methods, the temporal information in the images acquired at different times is used to detect clouds and shadows (Zhu and Helmer, 2018). The idea of these algorithms is that clouds and cloud shadows will cause sudden changes to the reflectance, and by comparing the image analysed with a reference without clouds, the clouds and cloud shadows will be easily detected (Zhu and Woodcock, 2014). Multi-temporal methods usually achieve a higher cloud detection accuracy by requiring more scenes over a short period (Li et al., 2017). But this may cause problems for applications like change detection because LULC change will also result in sudden changes in satellite observations (Zhu and Woodcock, 2014).
Most of the methods for detecting clouds and cloud shadows were designed for images of a specific sensor. Fmask, for example, was originally designed for cloud screen and cloud shadows in Thematic Mapper (TM) and Enhanced Thematic Mapper (ETM+) sensors on board Landsat satellites (Zhu and Woodcock, 2012). Later, this algorithm was improved to be used on Landsat-8 OLI (Operational Land Imager) and Sentinel-2 MSI (Multispectral Instrument) images (Qiu et al., 2019;Zhu et al., 2015). Other algorithms are image processors, which generate the cloud mask as part of the converting radiance process at the top of the atmosphere to surface reflectance . This is the case of the Sen2Cor algorithm (Louis et al., 2016), developed for Sentinel-2 MSI images. However, these algorithms use specific bands that many other sensors do not have (e.g., cirrus, SWIR, thermal) (Zhu and Helmer, 2018).
The Wide-Field Imager (WFI) on board the CBERS-4 satellite (China-Brazil Earth Resources Satellite) is a sensor that has four spectral bands (i.e., Blue, Green, Red, and Near-infrared-NIR) with a spatial resolution of 64 m and a revisit period of 5 days (Pinto et al., 2016). A WFI sensor with similar characteristics to the instrument on board the CBERS-4 is also on board the CBERS-4A and the Amazonia-1 satellites. The CBERS-4 WFI imageries have been used for agricultural and environmental monitoring (Chaves et al., 2021;Picoli et al., 2020), and they are the primary data source for the Real-Time Deforestation Detection System (DETER), which aims to generate data to support Brazil's actions in protecting Amazon rainforest against deforestation (INPE, 2019). Due to the limited number of spectral bands, detecting clouds and cloud shadows in WFI images is even more challenging, and few algorithms have been developed for such characteristics. For example, the Fmask needs SWIR and thermal spectral bands in older versions and at least SWIR and cirrus bands in newer versions (Zhu et al., 2015;Zhu and Woodcock, 2012). These bands are not present in the WFI sensor, which makes the Fmask unfeasible to be applied to the images of this sensor. The Automatic Time-Series Analysis (ATSA) (Zhu and Helmer, 2018) is suitable for sensors such as the WFI since it needs a minimum number of bands and fewer predefined parameters. This algorithm can be applied for areas with persistent clouds.
The reliability of the cloud mask is also a key element that determines the noise present in the reflectance time series (Baetens et al., 2019). In practice, performance assessment is done by selecting representative images and assessing how well each algorithm performs in each image . Several studies have compared the accuracy of different cloud and cloud shadow detection algorithms. For example, Foga et al. (2017) assessed the accuracy of multiple cloud masking algorithms to determine the best globally applicable algorithm to be used in future Landsat quality assurance data products. Sanchez et al. (2020) compared four cloud detection methods (Fmask 4, MAJA, Sen2Cor 2.8, and s2cloudless) for Sentinel-2 MSI images in the Amazon region.
Although the remote sensing community is making extensive use of CBERS-4 WFI data and the importance of cloud and cloud shadow masks for optical analysis of satellite imagery is well known, no cloud and cloud shadow masks assessment has been documented yet in the literature for this sensor. Thus, the objective of this work is to compare two cloud detection algorithms for CBERS-4 WFI images: the CMASK and the ATSA. The CMASK was previously used to generate WFI data cubes , and the ATSA was initially tested with Landsat-8 OLI, Landsat-4 MSS, and Sentinel-2 images (Zhu and Helmer, 2018).

Study sites
We selected four Military Grid Reference System (MGRS) tiles for our analysis (Figure 1). These four tiles have different characteristics of LULC and cloud cover incidence. As the ATSA algorithm needs a time series, we decided to use subsets delimited by the MGRS tiles for our analysis, and for that, we clipped every WFI image that intersected with these tiles.
The 20NPH tile is in the Amazon biome. The predominant LULC in this region is forest formation and pasture (Souza et al., 2020). This tile has a high incidence of clouds all year round (Prudente et al., 2020), making difficult to acquire cloud-free images. Tiles 21LYD and 23LLG are in the Cerrado biome. The predominant LULC in these tiles are intensive agriculture, pasture, grassland, and savanna formation (Souza et al., 2020). In these two tiles, there are well defined dry and rainy seasons. Thus, during December and February, the rainy season, there is a high incidence of clouds. However, during the dry season (from June to August), clouds have low incidence (Prudente et al., 2020). Tile 22JBT is located in the Atlantic Forest biome (Souza et al., 2020), a region with a predominance of annual agriculture. In this tile, there is a high incidence of clouds between December and January and a medium incidence in the rest of the year (Eberhardt et al., 2016;Prudente et al., 2020).

WFI data
The WFI sensor is one of the four instruments on board the CBERS-4 satellite. The WFI sensor is a pushbroom imaging spectrometer acquiring data at four spectral bands that include blue (450-520 nm), green (520-590 nm), red (630-590 nm), and NIR (770-890 nm) (Pinto et al., 2016). WFI has a large field of view of ±28.63º, which generates a swath width of 866 km, allowing the revisit capacity of 5 days at the equator with a ground resolution of 64 m at nadir.

CMASK
We used the MS3 software to generated the CMASK in this work. CMASK classifies the image as clear or cloudy. CMASK is also used by the Brazil Data Cube project to generate ARD (analysis-ready data) data cubes for Brazil .

Automatic Time-Series Analysis (ATSA)
The ATSA was designed to identify clouds and cloud shadows in multitemporal optical images, being more suitable for areas with persistent clouds, and can be used for sensors with a limited number of spectral bands (Zhu and Helmer, 2018). The algorithm has five main steps: (i) to calculate cloud and shadow indices to highlight cloud and cloud shadow information; (ii) to obtain an initial cloud mask by unsupervised classifier; (iii) to refine the initial cloud mask by analysing the time series of a cloud index; (iv) to predict the potential shadow mask using geometric relationships; and (v) to refine the potential shadow mask by analysing time series of a shadow index (Zhu and Helmer, 2018).
The ATSA algorithm needs a water mask. We selected images with less than 5% clouds in the metadata to create the water mask. Afterward, we collected samples by visual interpretation of these images. We applied the supervised classifier Spectral Angle Mapper (SAM) over the stack of these selected images (Souza et al., 2013). We extracted the mask of water/no water, along with the elevation and azimuth solar angles information from the metadata, and we used it as input to the ATSA algorithm.
Some parameters need to be configured in the ATSA algorithm. First, the longest and shortest distance between the shadow and its corresponding cloud must be selected. These values were empirically set to 1 and 40 pixels (64 m and 2560 m, respectively) after inspection of the images, as recommended by Zhu and Helmer (2018). ATSA uses these values to estimate shadow zones. Two other parameters, A and B, are the thresholds used by ATSA to identify cloud and shadow, respectively. We evaluated different combinations of A and B. We considered values of A equal to 0.5, 1.0, and 1.5, and B equal to 1.0 and 3.0. As the original algorithm was tested with Landsat data, and the WFI data has a lower spatial resolution (64 m), we changed the filter to remove isolated pixels from 4 to 2 inside the 3-by-3 neighbourhood, for both cloud and cloud shadow.

Accuracy assessment
We assessed the cloud mask accuracy for CMASK and ATSA, and the accuracy of the cloud shadow mask for ATSA on a tile basis. For this, we randomly chose four images in each tile, and for each image, we randomly selected 100 sample points. These points were tagged by a remote sensing expert through image visual interpretation, following previous work . Thus, each tile had a total of 400 points for accuracy assessment. The photo interpreter labeled each sample as "Cloud", "Cloud shadow", or "Clear", based on images in a true-colour composite (red, green, and blue) and a false-colour composite (NIR, red, and green). Furthermore, the interpreter was unaware of the classes of the validation sample points in the cloud mask.
We generated an error matrices from the random sampling points. So the overall accuracy (OA), user's accuracy (UA), and producer's accuracy (PA) (Foody, 2002) were derived from the error matrices. OA indicates the proportion of correctly classified pixels, and it's calculated by dividing the total number of correctly classified pixels by the sample size. The PA indicates the probability of a reference pixel is correctly classified, and it's calculated by dividing the total number of correct pixels in a class by the total number of pixels of that class. The UA is calculated by dividing the total number of correct pixels in a class by the total number of pixels classified in that class, it indicates the probability that a pixel classified on the map actually represents that category (Congalton, 1991).
While the ATSA classifies images into three classes (Clear, Cloud, and Cloud shadow), the CMASK classifies them into two classes (Clear and Cloud). Therefore, we initially evaluated the accuracy of the ATSA considering the three classes and different combinations of parameters A and B (as described in section 2.3.2). Afterward, to compare ATSA with CMASK, we consider only two classes for the two algorithms: Clear and Not clear. So, the Cloud and Cloud shadows classes have been grouped into the Not clear class for ATSA.

RESULTS AND DISCUSSION
In our experiments, when we considered the three classes (Clear, Cloud, and Cloud shadows), for all analysed tiles except 23LLG, the parameter A equal to 0.5 had the higher OA ( Figure  2). For tile 23LLG, the highest value of OA was obtained with A equal to 1.0, while the lowest value was obtained with A equal to 0.5. Considering parameter B, except for tile 22JBT, the highest OA was reached with a parameter value equal to 3.0.

Figure 2 Overall accuracy for different A and B parameters combinations in ATSA algorithms considering three classes: Cloud, Cloud shadow, and Clear.
When we consider only two classes (Figure 3), Clear and Not Clear, the OA is generally greater than in the case of three classes (Figure 2). However, the OA patterns for parameters A and B are similar. Low values of A result in high OA, except for tile 23LLG. Meanwhile, low values of B result in smaller OA, except for tile 22JBT. Comparing OA between ATSA and CMASK, any combination of the A and B parameters in ATSA results in higher OA than CMASK. The lowest value of OA was 0.91 in the 23LLG tile, considering the combination of A equal to 1.0 and B equal to 3.0 in ATSA. In comparison, the highest value of OA for CMASK was on the 20NPH tile, with the OA equal to 0.84.
For the case of the three classes, increasing the parameter A values in ATSA, there is an increase of UA for the Cloud class and a decrease for the Clear class ( Figure 4). Conversely, increasing the parameter A value reduces the PA for the Cloud class and increases it for the Clear class. By reducing the parameter A value, more pixels are detected as clouds, reducing the omission error of the Cloud class. However, this also increases the number of clear pixels wrong classified as clouds, increasing the commission error of the Cloud class. In most cases, the UA and PA for ATSA were higher for the Cloud class than for the Cloud shadow class. A similar result was found by Zhu and Helmer (2018). For the Cloud shadow class, in tiles 21LYD and 22JBT, the PA was higher with parameter B equal to 1.0. There was practically no difference in PA for the other tiles when parameter B was equal to 1.0 or 3.0. For most tiles and the parameter A values, the UA was higher when parameter B was equal to 3.0. The confusion of the Cloud shadows class occurred when it was classified as Cloud, mainly on the edges of clouds, and the Clear class was misclassified as Cloud shadow. One of the possible causes of confusion in the Cloud shadow class can be the replacement of the SWIR band by the NIR band. In the case of the tests performed by Zhu and Helmer (2018), they used SWIR in the shadow index. However, as WFI does not have a SWIR band, we needed to replace it with the NIR band, as Zhu and Helmer (2018) suggested.
In the ATSA algorithm, when the Cloud and Cloud shadow classes are combined in the No clear class, and the parameter A value is increased, there is an increase in the UA and a reduction in the PA for the No clear class ( Figure 5). On the other hand, there is a reduction in UA and an increase in PA for the Clear class when the parameter A value is increased. Increasing the B parameter value results in an increase in UA, for the No clear class, and in PA, for the Clear class, in most cases.

Figure 5 User's accuracy (UA) and Producer's accuracy (PA) for CMASK and different combinations of A and B parameters values in ATSA algorithm considering two classes: Clear and No clear.
CMASK presented UA close to 1.0 for the No clear class and PA close to 1.0 for the Clear class. However, it presented an omission error between 25% and 36% for the No clear class, and commission errors between 29% and 39% for this class. Almost all pixels classified in the No clear class are clouds. However, the CMASK fails to classify many cloud pixels in the No clear class and misclassify them in the Clear class. CMASK also doesn't classify cloud shadows, which increases the omissions in the No clear class.
As shown in Figure 6a, CMASK fails to classify many clouds' edges as Cloud and does not detect smaller clouds. CMAKS also does not identify semi-transparent clouds (Figure 6b), in addition to not identifying cloud shadows. This explains the large number of omission errors in the No clear class ( Figure 5). The ATSA algorithm can better detect the edges of clouds and small clouds, and are able to detect shadows ( Figure 6). However, when parameter A was equal to 0.5, it presented cloud and cloud shadows commission errors (see the southeast part of Figure 6). As ATSA calculates the potential shadow zones using sun-cloud geometry, commission errors in cloud identification can lead to commission errors in cloud shadow identification, as in this case. ATSA can detect semi-transparent clouds better than CMASK ( Figure 6). However, when parameter A is equal to 1.5, some edges of semi-transparent clouds are not detected either.
In our analysis, the ATSA parameters A and B strongly influenced omission and commission errors (Figures 4, 5, and  6). Therefore, the proper choice of these parameters is important. The performance of cloud detection algorithms may depend on the region's characteristics where it is being used. However, studies targeting specific regions can guide these algorithms . In our study, parameters A equal to 1.0 and B equal to 3.0 presented a better balance between omission and commission errors. However, for the regions where we conducted our analyses, and for the case of applications sensitive to noise induced by clouds, it may be better to choose parameters A equal to 0.5 and B equal to 3.0.

CONCLUSIONS
In this study, we assessed the accuracy of two cloud mask algorithms for the CBERS-4 WFI data. The CMASK and the ATSA were selected because they are suitable for WFI's number of spectral bands (total of four). For ATSA, we also evaluated the accuracy with different A and B parameters settings. The ATSA showed overall accuracy (OA) superior to CMASK. Considering the parameters A equal to 1.0 and B equal to 3.0, in all tiles, the ATSA OA was higher than 0.91, while for the CMASK, the OA did not exceed 0.84. The CMASK had omission errors for the Clear class and commission errors for the No clear class close to zero. However, there were several omission errors (25% to 36%) for the No clear class, failing to classify cloud in this class and misclassifying them in the Clear class. ATSA algorithm was successful in balancing omission and commission errors using the parameters A equal to 0.5 and 1.0 and B equal to 3.0. Despite needing an image time series, the ATSA proved suitable for screening cloud and cloud shadows in CBERS-4 WFI imagery. Applying the ATSA algorithm in these images can enhance the robustness of the methods used for several applications such as agricultural and environmental monitoring and deforestation detection. We recommend, in future works, the ATSA evaluation also for WFI images from CBERS-4A and Amazonia-1.