NEW SOLAR-INDUCED CHLOROPHYLL FLUORESCENCE RETRIEVAL ALGORITHM BASED ON TANSAT SATELLITE DATA

Solar-induced chlorophyll fluorescence (SIF) is an indicator of plant photosynthesis that can be detected by satellite. However, some existing retrieval algorithms are easily affected by inaccuracies in the satellite data, which cause deviations in the retrieved SIF. To prevent inaccurate "outliers" from affecting the retrieval results, this paper introduces the random sample consensus (RANSAC) algorithm to retrieve SIF. The results show that the SIF values obtained by this method are consistent with the OCO-2 SIF product (R = 0.81) and with the MODIS vegetation indices (R = 0.87 with NDVI, R = 0.85 with EVI). Compared with the existing OCO-2 SIF product, the SIF retrieved in this paper shows better spatial detail and a better outlier distribution.


INTRODUCTION
Sun-induced chlorophyll fluorescence (SIF) is light emitted by vegetation during photosynthesis at wavelengths of 650-800 nm (Porcar-Castell et al., 2014). Studies have shown that SIF is closely related to Gross Primary Productivity (GPP) (Damm et al., 2010) and to the state of vegetation stress (Liu, Cheng, 2010). By detecting SIF, the light use efficiency of vegetation can also be estimated (Liu, Cheng, 2010). Detecting SIF is therefore highly significant for monitoring vegetation growth and investigating ecological conditions. SIF is a weak canopy emission signal, and its energy accounts for about 5% of the total reflected energy (Van der Tol et al., 2014; Zhang et al., 2018). The main problem in satellite SIF retrieval is therefore to separate the SIF signal from the detected radiance. Many SIF retrieval algorithms have been proposed, such as FLD (Plascyk, Gabriel, 1975), 3FLD (Maier et al., 2003), iFLD (Alonso et al., 2008) and SFM (Mazzoni et al., 2008). However, current SIF retrieval technology based on remote sensing products is not yet mature. In 2013, Xinjie Liu and Liangyun Liu (Liu X, Liu L, 2013) applied the weighted least squares method to GOSAT satellite data and obtained SIF products over China. Their paper provides a new idea for SIF retrieval based on a physical model. However, a retrieval based on the weighted least squares method is still affected by outliers that contain random inaccuracies. Random sample consensus (RANSAC) was proposed by Fischler and Bolles in 1981 to eliminate the effects of "outliers" that are unrelated to the real data (Fischler, Bolles, 1981). When fitting linear equations, RANSAC can therefore exclude "outliers" and avoid their influence on the result.
Based on the above theory, this paper uses the RANSAC algorithm to retrieve global SIF values for March and August 2018 from data of the Atmospheric Carbon dioxide Grating Spectroradiometer (ACGS) onboard the Chinese TanSat satellite. The results are then compared with other satellite products, both in spatial distribution and quantitatively, to verify the reliability of the algorithm.

The TanSat Satellite Data
Launched in December 2016, the TanSat satellite is China's first satellite to monitor and detect carbon dioxide (CO2). TanSat flies in a sun-synchronous orbit at an altitude of 700 km. It has a spatial resolution of 2 km × 2 km and a revisit period of 16 days (Yang et al., 2018). The two main instruments onboard TanSat are the Atmospheric Carbon dioxide Grating Spectroradiometer (ACGS) and the Cloud and Aerosol Polarimetry Imager (CAPI). CAPI measurements are mainly used to provide cloud and aerosol optical properties, which can determine whether a pixel is affected by the atmosphere. ACGS has three grating spectrometers with similar structures, which provide hyperspectral measurements of radiation in three bands: 0.76 μm (O2-A band, 758-778 nm), 1.61 μm (weak CO2 band, 1,594-1,624 nm) and 2.06 μm (strong CO2 band, 2,042-2,082 nm). The spectral resolutions of ACGS are 0.044 nm in the O2-A band, 0.12 nm in the weak CO2 band, and 0.16 nm in the strong CO2 band. In this paper, the O2-A band was used to retrieve the SIF value.
TanSat ACGS L1b data provide radiance data in the Science product and irradiance data in the Calibration product, both of which are radiometrically accurate and spectrally calibrated. The wavelength of each pixel is computed as follows:

$$\lambda = \sum_{i} C_i \, P^{\,i} \qquad (1)$$

where P = the pixel number
C_i = the dispersion coefficients
λ = wavelength

In this paper, a spectral offset correction was applied to the radiance and irradiance spectra before retrieval. Figure 1 shows an example of the radiance and irradiance spectra in the O2-A band, and Figure 2 shows the normalized radiance and irradiance spectra between 769 and 771 nm. Within this range, the most prominent feature is the KI Fraunhofer line (770.1 nm). Because the KI Fraunhofer line suffers the least interference from oxygen absorption, several radiance samples around 770.1 nm were selected for SIF retrieval in this study.
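As a minimal sketch of the wavelength calculation described above, the snippet below evaluates a polynomial dispersion relation in the pixel number and selects the pixels that fall in the 769-771 nm window. The zero-based power convention, the function name `pixel_to_wavelength`, the pixel count and the coefficient values are illustrative assumptions, not TanSat calibration values; the L1b product documentation defines the actual indexing and coefficients.

```python
def pixel_to_wavelength(pixel, coeffs):
    """Evaluate lambda = sum_i coeffs[i] * pixel**i using Horner's rule.

    Assumes zero-based powers (coeffs[0] is the constant term).
    """
    wavelength = 0.0
    for c in reversed(coeffs):
        wavelength = wavelength * pixel + c
    return wavelength


# Hypothetical dispersion coefficients (NOT real TanSat values):
# 758 nm at pixel 0 and roughly 0.016 nm per pixel.
coeffs = [758.0, 0.016, 1.0e-7]
n_pixels = 1200  # hypothetical detector length

wavelengths = [pixel_to_wavelength(p, coeffs) for p in range(n_pixels)]

# Pixels whose wavelength falls in the window used for the retrieval.
window = [p for p, w in enumerate(wavelengths) if 769.0 <= w <= 771.0]
print(len(window), wavelengths[window[0]], wavelengths[window[-1]])
```

In a real workflow the coefficients would be read from the L1b file for each footprint, and the spectral offset correction would be applied before the window is selected.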

Comparative Data
In order to evaluate the reliability of the results in this paper, comparisons were made by using other satellite-based datasets.
All comparative data were converted to GeoTIFF format using MATLAB R2019a.

Basic equations
Assuming that the surface behaves as a Lambertian-equivalent reflecting surface and that the chlorophyll fluorescence intensity F and surface reflectance R do not change with wavelength within the studied spectral range (769-771 nm), the radiance can be expressed as (Joiner et al., 2011):

$$I(\lambda) = I_0(\lambda) + \frac{\left[ R\,T^{\downarrow}(\lambda)\,E(\lambda)\cos\theta/\pi + F \right] T^{\uparrow}(\lambda)}{1 - R\,S(\lambda)} \qquad (2)$$

where I(λ) = observed radiance
I0(λ) = atmospheric path radiance
T↑(λ) = up-welling transmittance
T↓(λ) = down-welling transmittance
E(λ) = solar irradiance
S(λ) = atmospheric spherical albedo
θ = solar zenith angle

In the 769-771 nm spectral range the atmosphere is highly transparent, so the effects of atmospheric scattering and absorption can be ignored. Thus, we set the atmospheric path radiance and the atmospheric spherical albedo to 0 and the up-welling and down-welling transmittances to 1, and equation 2 simplifies to:

$$I(\lambda) = \frac{R\cos\theta}{\pi}\,E(\lambda) + F \qquad (3)$$

Letting the constant $R\cos\theta/\pi = k$, equation 3 can be written as:

$$I(\lambda) = k\,E(\lambda) + F \qquad (4)$$

Equation 4 is linear in E(λ), so F can be retrieved as the intercept of a linear fit between the radiance and irradiance data provided by TanSat.
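Under the simplifications above, the radiance is a linear function of the irradiance, I(λ) = k·E(λ) + F, so F is the intercept and k the slope of a straight-line fit. The following minimal sketch shows this with ordinary least squares on synthetic data. The function name `fit_line`, the sample values and the 30° solar zenith angle are illustrative assumptions, and plain least squares is used here only to show the linear model, not as the retrieval method of this paper.

```python
import math


def fit_line(E, I):
    """Ordinary least squares for I = k * E + F; returns (k, F)."""
    n = len(E)
    mean_e = sum(E) / n
    mean_i = sum(I) / n
    sxx = sum((e - mean_e) ** 2 for e in E)
    sxy = sum((e - mean_e) * (i - mean_i) for e, i in zip(E, I))
    k = sxy / sxx
    F = mean_i - k * mean_e
    return k, F


# Synthetic example: hypothetical irradiance samples and a known k and F.
true_k, true_F = 0.05, 1.2
E = [100.0, 80.0, 40.0, 60.0, 95.0]
I = [true_k * e + true_F for e in E]

k, F = fit_line(E, I)

# With k = R * cos(theta) / pi, the reflectance follows from k and the
# solar zenith angle (30 degrees here, an arbitrary illustrative value).
theta = math.radians(30.0)
R = k * math.pi / math.cos(theta)
print(k, F, R)
```

Because every sample contributes to the fit, a single corrupted radiance value shifts both k and F, which is the motivation for a robust estimator.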

Random Sample Consensus Algorithm
Random sample consensus (RANSAC) is an algorithm that can estimate the parameters of a mathematical model from a set of observations containing outliers (Fischler, Bolles, 1981). Its basic assumption is that the data consist of "inliers" but are also contaminated by "outliers" that contain inaccuracies. RANSAC therefore assumes that there is a procedure that can estimate the parameters of a model that explains or fits the inliers, eliminating the influence of "outliers" (Figure 3).
The RANSAC algorithm takes as input a set of observations, a parametric model that can explain or be fitted to the observations, and some confidence parameters. The algorithm then works by repeatedly selecting a random subset of the data. The selected subset is assumed to consist of "inliers", and this hypothesis is tested as follows:
Step 1: A model is fitted to the hypothetical "inliers"; that is, all unknown parameters are calculated from them.
Step 2: The model obtained in step 1 is used to test all other data. If a point fits the estimated model, it is also considered an "inlier".
Step 3: If enough points are classified as "inliers", the estimated model is considered reasonable.
Step 4: The model is re-estimated from all hypothetical "inliers".
Step 5: Finally, the model is evaluated by estimating the error of the "inliers" with respect to the model.
This process is repeated a specified number of times.
Each time a new model is generated, it is either discarded because it has too few "inliers" or selected because it is better than the existing model. Based on this, we designed the process of retrieving SIF via RANSAC.
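The five steps above can be sketched for a straight-line model y = k·x + f, with x standing for irradiance and y for radiance. This is a minimal illustration under stated assumptions, not the implementation used in this paper: the function names, the residual threshold, the minimum inlier count, the iteration count and the synthetic data are all hypothetical choices.

```python
import random


def fit_line(points):
    """Least-squares fit of y = k * x + f; returns (k, f) or None."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0.0:
        return None  # all x equal: slope undefined
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    k = sxy / sxx
    return k, mean_y - k * mean_x


def ransac_line(points, n_iter=200, threshold=0.05, min_inliers=None, seed=0):
    """Return ((k, f), inliers) for the best consensus line found."""
    rng = random.Random(seed)
    if min_inliers is None:
        min_inliers = len(points) // 2
    best_model, best_inliers, best_error = None, [], float("inf")
    for _ in range(n_iter):
        # Step 1: fit a model to a minimal random subset (two points).
        model = fit_line(rng.sample(points, 2))
        if model is None:
            continue
        k, f = model
        # Step 2: test every point against that model.
        inliers = [(x, y) for x, y in points if abs(y - (k * x + f)) <= threshold]
        # Step 3: discard the model if it has too few inliers.
        if len(inliers) < min_inliers:
            continue
        # Step 4: re-estimate the model from all inliers.
        refit = fit_line(inliers)
        if refit is None:
            continue
        k, f = refit
        # Step 5: evaluate the model by the mean squared inlier residual.
        error = sum((y - (k * x + f)) ** 2 for x, y in inliers) / len(inliers)
        # Keep the model with more inliers, or equal inliers and lower error.
        if (len(inliers), -error) > (len(best_inliers), -best_error):
            best_model, best_inliers, best_error = (k, f), inliers, error
    return best_model, best_inliers


# Synthetic, noise-free inliers on y = 0.05 * x + 1.2 plus three gross outliers.
points = [(40.0 + 3.0 * i, 0.05 * (40.0 + 3.0 * i) + 1.2) for i in range(20)]
points += [(45.0, 9.0), (50.0, 8.0), (56.0, 9.5)]

(k, F), inliers = ransac_line(points)
plain_k, plain_F = fit_line(points)  # ordinary least squares on all points
print("RANSAC:", k, F, len(inliers))
print("Plain least squares:", plain_k, plain_F)
```

On this synthetic set the plain least-squares intercept is pulled well away from 1.2 by the three outliers, while the consensus fit recovers the line that generated the inliers. In practice the threshold would need to reflect the radiometric noise of the instrument.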

Retrieved Result of SIF
We compare the retrieved SIF values with OCO-2 SIF and MODIS vegetation indices to evaluate the reliability of the algorithm. Figure 4 shows the global distribution of the retrieved TanSat SIF, OCO-2 SIF, NDVI, and EVI. All products were resampled to a uniform resolution using ArcGIS 10.5. The global pattern of SIF clearly varies with the season. In March, the subsolar point was in the southern hemisphere, so higher SIF values appeared in South America, southern Africa, and near the equator. The global vegetation indices (NDVI and EVI) show the same pattern. In August, owing to enhanced photosynthesis of vegetation in the northern hemisphere, higher SIF values appeared in the eastern United States, eastern Asia, and near the equator; Europe and Siberia also showed higher vegetation activity.
It can be seen that the global distributions of the products are highly consistent. However, the retrieved TanSat SIF provides richer detail than OCO-2 SIF, for example in the SIF distribution over North America in March 2018 (Figure 5). It is worth noting, however, that TanSat has more missing strips than OCO-2 (red circles in Figure 5). This preliminary verification shows that the retrieved TanSat SIF is consistent with actual conditions.

Quantitative comparison
To quantitatively assess the reliability of the retrieved TanSat SIF against other remote sensing datasets, scatter plots and box plots were drawn, as shown in Figure 6 and Figure 7. The scatter plots are based on randomly selected land locations that are free of default (fill) values and cloud occlusion. Because direct SIF observations are lacking over most of the globe, OCO-2 SIF products and other satellite-based vegetation datasets (MODIS vegetation index products) were used for comparison, which is common practice in related research. As shown in Figure 6, the retrieved TanSat SIF is consistent with the other remote sensing products. It has a linear relationship with OCO-2 SIF (R² = 0.81), and most of the points lie near the x = y line, which indicates a high level of consistency between the retrieved TanSat SIF and OCO-2 SIF. The retrieved TanSat SIF also has a linear relationship with NDVI (R² = 0.87) and EVI (R² = 0.85). Figure 7 shows the difference between the data distributions of the retrieved TanSat SIF and OCO-2 SIF. The two products have consistent medians, but the retrieved TanSat SIF is more scattered, especially above the median. The difference between the two groups of values is relatively large, which may be due to random errors generated during satellite observations and to changes between the times at which the two satellites covered the same area. In contrast, OCO-2 SIF has more outliers; the smaller number of outliers in the retrieved TanSat SIF may be due to the filtering of abnormal results by the RANSAC algorithm.
In summary, the quantitative comparisons show that the retrieved TanSat SIF is consistent with related products and offers additional advantages, which indicates that SIF retrieval based on the RANSAC algorithm is feasible.
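The two statistics used in this comparison can be computed as in the following minimal sketch: the coefficient of determination of a linear relationship between two co-located samples, and the points a box plot would flag as outliers. The 1.5 × IQR whisker rule, the function names and the sample values are illustrative assumptions; the values are not data from this study.

```python
import statistics


def r_squared(x, y):
    """Squared Pearson correlation between two equally long samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def iqr_outliers(values):
    """Values outside the 1.5 * IQR whiskers of a conventional box plot."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical co-located SIF samples (illustrative numbers only).
tansat_sif = [0.2, 0.5, 0.9, 1.4, 1.8]
oco2_sif = [0.25, 0.45, 1.0, 1.3, 1.9]

print(r_squared(tansat_sif, oco2_sif))
print(iqr_outliers([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]))
```

The quartile convention differs between tools, so the exact set of flagged points can vary slightly from one plotting package to another.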

CONCLUSION
Chlorophyll fluorescence is closely related to plant photosynthesis, and detecting it by remote sensing has important application value. At present, the accuracy of chlorophyll fluorescence retrieval is limited by remote sensing detectors and retrieval methods. Based on the random sample consensus algorithm, this paper used data from the ACGS sensor onboard the TanSat satellite to retrieve global SIF values for March and August 2018. In both global spatial distribution and quantitative consistency, the retrieved TanSat SIF correlated with other satellite products (OCO-2 SIF, NDVI, and EVI). Compared with OCO-2 SIF, the SIF product retrieved in this paper has advantages in spatial detail and outlier distribution. These results indicate that SIF products retrieved using the random sample consensus algorithm are reasonably reliable. However, the method still needs to be improved. The basic retrieval equations used in this paper ignore the effects of the atmosphere (such as temperature, pressure, and aerosols), which introduces uncertainty into the results. In addition, the spatial resolution of the satellite has an effect on the retrieved ground SIF that is difficult to quantify, and the difference in scale also makes SIF verification considerably more difficult. As a next step, it would be worthwhile to apply the RANSAC algorithm to OCO-2 data and compare the results with the OCO-2 SIF product. Fortunately, the FLuorescence EXplorer (FLEX) mission, which the European Space Agency (ESA) plans to launch in 2022, is expected to help solve these problems.