Real-time and Seamless Monitoring of Ground-level PM2.5 Using Satellite Remote Sensing

Satellite remote sensing has been reported to be a promising approach for monitoring atmospheric PM2.5. However, satellite-based monitoring of ground-level PM2.5 remains challenging. First, the polar-orbiting satellite observations used in previous studies are usually acquired only once per day, so they cannot monitor PM2.5 in real time. Second, satellite-derived PM2.5 contains many data gaps due to cloud contamination. In this paper, hourly observations from a geostationary satellite (Himawari-8) were adopted for real-time monitoring of PM2.5 within a deep learning architecture. On this basis, the satellite-derived PM2.5, in conjunction with ground PM2.5 measurements, was incorporated into a spatio-temporal fusion model to fill the data gaps. Taking the Wuhan Urban Agglomeration as an example, we derived real-time and seamless PM2.5 distributions. The results demonstrate that the Himawari-8 satellite-based deep learning model achieves satisfactory performance for PM2.5 estimation (out-of-sample cross-validation R² = 0.80, RMSE = 17.49 µg/m³). The missing data in satellite-derived PM2.5 are accurately recovered, with an R² of 0.75 between the recovered values and ground measurements. Overall, this study provides an effective strategy for the real-time and seamless monitoring of ground-level PM2.5.


INTRODUCTION
Previous studies have indicated that long-term exposure to PM2.5 (airborne particles with an aerodynamic diameter of less than 2.5 µm) is associated with many health concerns, such as cardiovascular and respiratory morbidity and mortality (Madrigano et al., 2013). However, the assessment of PM2.5 exposure is limited by the sparse and uneven distribution of ground monitoring stations.
Satellite remote sensing has the potential to extend PM2.5 estimation beyond the locations covered by ground stations. The most widely used satellite parameter is aerosol optical depth (AOD) (Hoff and Christopher, 2009). Many satellite instruments provide AOD products that have been applied to PM2.5 monitoring, for instance, the Moderate Resolution Imaging Spectroradiometer (Li et al., 2017b) and the Multiangle Imaging SpectroRadiometer (You et al., 2015) on board the Earth Observing System (EOS) satellites (i.e., Terra and Aqua).
However, previous satellite-based PM2.5 estimation has usually relied on polar-orbiting satellites (e.g., Terra and Aqua), which generally provide only one observation per day for a given location. A polar-orbiting satellite therefore cannot observe PM2.5 at that location again until the next day, so PM2.5 pollution (especially sudden pollution events) may not be monitored in real time.
Furthermore, previous studies have shown that PM2.5 exhibits diurnal variation (Guo et al., 2016), which cannot be effectively characterized by polar-orbiting satellites. With their high temporal resolution (e.g., 1 hour), geostationary satellites have been used in attempts to estimate ground-level PM2.5/PM10, and the results indicated that hourly geostationary observations show great potential for real-time monitoring of PM2.5/PM10 (Emili et al., 2010; Paciorek et al., 2008). However, these studies still mainly focused on PM2.5/PM10 estimation at a daily scale, and the accuracy of hourly estimation leaves considerable room for improvement.
On the other hand, cloud contamination leaves many gaps in satellite remote sensing data (Shen et al., 2014), so satellite-based PM2.5 estimates are usually spatially discontinuous. Two main strategies have been used to address this issue. First, satellite AOD products were spatially interpolated to improve their coverage (Ma et al., 2014). Second, spatial smoothing techniques were adopted to fill the missing satellite-derived PM2.5 (Just et al., 2015; Kloog et al., 2011). Both strategies mainly exploit the spatial correlation of aerosol/PM2.5 to reconstruct the missing data, which may introduce great uncertainty, especially for large gaps, because of the lack of reference data. Valid observations at nearby times may exist, however, and they are a good supplement for reconstructing PM2.5. Is it possible to fuse the spatial and temporal correlation information of satellite and ground observations to reconstruct the missing PM2.5?
This leads to the objectives of this study. First, taking advantage of the high temporal resolution of geostationary satellite observations, we aim to improve the time efficiency of satellite-based PM2.5 monitoring from the daily scale to the hourly scale. Second, a spatio-temporal fusion technique is developed to recover the missing satellite-derived PM2.5 using time-series observations from the satellite and ground stations. Overall, the purpose of this paper is to derive hourly and seamless PM2.5 distributions by fusing satellite remote sensing and ground station measurements.

Study region and period
The study region is the Wuhan Urban Agglomeration (WUA) in central China, shown in Figure 1, and the study period is the whole year of 2016. WUA is an urban group centered on Wuhan that covers the eight surrounding cities (Huangshi, Ezhou, Huanggang, Xiaogan, Xianning, Xiantao, Qianjiang, and Tianmen). To make full use of the PM2.5 station measurements, all monitors within 28.4°N to 32.3°N and 112.0°E to 116.7°E are included in our analysis.

Himawari AOD
Himawari-8 is a third-generation geostationary weather satellite, launched on 7 October 2014 and carrying the new Advanced Himawari Imager (AHI). Its observation range spans 80°E to 160°W and 60°N to 60°S, centered at 140.7°E over the equator. The Himawari-8 AOD product is derived from visible and near-infrared data and provides AOD at 500 nm over both ocean and land during the daytime. The retrieval algorithm references a lookup table calculated on the basis of an assumed spheroid-particle aerosol model (Fukuda et al., 2013).
The Himawari-8 Level 3 hourly AOD data corresponding to the ground-level PM2.5 measurements were downloaded from the Japan Aerospace Exploration Agency (JAXA) P-Tree System (http://www.eorc.jaxa.jp/ptree/). These AOD products have a spatial resolution of 5 km and are available every hour. In this study, only aerosol retrievals with the highest confidence level ("very good") were adopted for PM2.5 estimation.

Meteorological parameters and land cover data
The Goddard Earth Observing System Data Assimilation System GEOS-5 Forward Processing (GEOS-5 FP) meteorological data were used in this study. These data have a spatial resolution of 0.25° latitude × 0.3125° longitude. Hourly specific humidity (SH, kg/kg), air temperature at 2 m height (TMP, K), wind speed at 10 m above ground (WS, m/s), and surface pressure (PS, kPa) were extracted from this dataset. Each variable was regridded to 0.05° to be consistent with the satellite observations. More details about the GEOS-5 FP data can be found at https://gmao.gsfc.nasa.gov/forecasts/. MODIS normalized difference vegetation index (NDVI) products (MOD13) were downloaded from the NASA website (https://ladsweb.modaps.eosdis.nasa.gov/). This product is available at a resolution of 1 km every 16 days and was incorporated to reflect land-use type.
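The regridding step can be illustrated with a short Python sketch. The text does not state which interpolation was used, so this assumes bilinear interpolation from the 0.25° × 0.3125° GEOS-5 FP grid to 0.05° cell centres. The function name `bilinear_regrid` and the example grids are hypothetical, and destination points outside the source grid take the nearest edge value (the clamping behaviour of `np.interp`).

```python
import numpy as np

def bilinear_regrid(src_lat, src_lon, field, dst_lat, dst_lon):
    """Bilinearly interpolate `field` (len(src_lat) x len(src_lon)) onto a new grid.

    Coordinates must be 1-D and increasing. Bilinear interpolation is separable,
    so we interpolate along longitude first and then along latitude.
    Points outside the source grid are clamped to the edge value (np.interp behaviour).
    """
    src_lat = np.asarray(src_lat, float)
    src_lon = np.asarray(src_lon, float)
    field = np.asarray(field, float)
    # Step 1: interpolate along longitude for every source latitude row.
    along_lon = np.array([np.interp(dst_lon, src_lon, row) for row in field])
    # Step 2: interpolate along latitude for every destination longitude column.
    out = np.array([np.interp(dst_lat, src_lat, along_lon[:, j])
                    for j in range(along_lon.shape[1])])
    return out.T  # shape: len(dst_lat) x len(dst_lon)

# Hypothetical example: a GEOS-5 FP-like 0.25 x 0.3125 degree grid around the study area
src_lat = 28.25 + 0.25 * np.arange(18)
src_lon = 111.875 + 0.3125 * np.arange(17)
tmp_k = 290.0 + 0.1 * src_lat[:, None] - 0.05 * src_lon[None, :]  # synthetic 2 m temperature (K)

# 0.05-degree cell centres covering 28.4-32.3 N, 112.0-116.7 E
dst_lat = 28.425 + 0.05 * np.arange(78)
dst_lon = 112.025 + 0.05 * np.arange(94)
tmp_005 = bilinear_regrid(src_lat, src_lon, tmp_k, dst_lat, dst_lon)
print(tmp_005.shape)  # (78, 94)
```

Because bilinear interpolation reproduces linear fields exactly, the synthetic temperature surface above is a convenient sanity check; real GEOS-5 FP fields would simply be passed in as `field`.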

Data pre-processing and matching
First, we created a 0.05° grid for data integration, model establishment, and spatial mapping. Within each 0.05° grid cell, ground-level PM2.5 measurements from multiple stations were averaged. Meanwhile, the meteorological data were resampled to match the satellite observations, and all data were reprojected to the same coordinate system. Finally, satellite observations and meteorological parameters were extracted at the locations where PM2.5 measurements are available.
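The gridding and matching steps can be sketched in Python as below, assuming a regular 0.05° grid spanning the station domain given for the study region (28.4°N to 32.3°N, 112.0°E to 116.7°E), with row 0 at the southern edge and all stations lying inside the domain. All function names are hypothetical; in practice the predictor grids would hold the regridded AOD, meteorological, and NDVI fields for one hour.

```python
import numpy as np

# Station domain from the study-region description; 0.05-degree cells, row 0 = southern edge
LAT_MIN, LAT_MAX = 28.4, 32.3
LON_MIN, LON_MAX = 112.0, 116.7
RES = 0.05
N_ROWS = int(round((LAT_MAX - LAT_MIN) / RES))  # 78
N_COLS = int(round((LON_MAX - LON_MIN) / RES))  # 94

def grid_index(lat, lon):
    """Row/column of the 0.05-degree cell that contains each point."""
    row = np.floor((np.asarray(lat, float) - LAT_MIN) / RES).astype(int)
    col = np.floor((np.asarray(lon, float) - LON_MIN) / RES).astype(int)
    return row, col

def average_stations_per_cell(lat, lon, pm25):
    """Average the PM2.5 of all stations falling in the same cell (one time step)."""
    row, col = grid_index(lat, lon)
    cells = {}
    for r, c, v in zip(row, col, pm25):
        if np.isnan(v):
            continue  # skip missing station records
        cells.setdefault((int(r), int(c)), []).append(float(v))
    return {cell: float(np.mean(vals)) for cell, vals in cells.items()}

def match_predictors(cell_pm, grids):
    """Build (X, y) matchups from cells that have both PM2.5 and all predictors.

    `grids` maps predictor names (e.g. "AOD", "TMP") to (N_ROWS, N_COLS) arrays.
    """
    names = list(grids)
    X, y, kept = [], [], []
    for (r, c), pm in sorted(cell_pm.items()):
        feats = [grids[name][r, c] for name in names]
        if np.any(np.isnan(feats)):
            continue  # e.g. AOD missing because of cloud
        X.append(feats)
        y.append(pm)
        kept.append((r, c))
    return np.array(X, dtype=float).reshape(len(y), len(names)), np.array(y), kept
```

A dictionary keyed by (row, column) keeps only occupied cells, which is convenient when just a few dozen of the 78 × 94 cells contain monitors.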

METHODOLOGY
The main procedure of our method consists of two parts, as illustrated in Figure 2. First, a deep learning architecture is developed to estimate ground-level PM2.5 using Himawari AOD and auxiliary predictors. Then, the satellite-derived PM2.5, in conjunction with ground-level PM2.5 measurements, is incorporated into a spatio-temporal fusion model to reconstruct the missing PM2.5. The details of each part are given in Sections 3.1 and 3.2.
Figure 2. Flowchart describing the procedure for deriving hourly and seamless PM2.5.

Deep learning for the satellite-based PM2.5 estimation
The deep belief network (DBN) model, one of the most typical deep learning models (Hinton et al., 2006), was introduced to represent the relationship between PM2.5, AOD, and auxiliary factors. In addition, the geographical correlation of PM2.5 was incorporated into the DBN model (Geoi-DBN) (Li et al., 2017a), because PM2.5 at neighbouring stations and PM2.5 observed on prior days at the same station are informative for estimating PM2.5. The general structure of the Geoi-DBN model used to estimate PM2.5 is:
$$PM_{2.5} = f(AOD, SH, TMP, WS, PS, NDVI, S_{PM}, T_{PM}, DIS)$$
where S_PM, T_PM, and DIS denote the geographical correlation terms of PM2.5; their calculation is described in Li et al. (2017a). A Geoi-DBN model comprising two restricted Boltzmann machine (RBM) layers for estimating ground-level PM2.5 is presented in Figure 3. The input variables are satellite AOD, meteorological parameters, NDVI, and the geographical correlation of PM2.5; the output is ground-level PM2.5. The Geoi-DBN model is first trained using the collected AOD-PM2.5 matchups and then used to predict PM2.5 at locations without monitoring stations.
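A DBN of this kind stacks RBMs that are pre-trained greedily, layer by layer, before a regression output is attached. The Python sketch below illustrates that idea under several simplifying assumptions and is not the authors' implementation: the input columns (AOD, SH, TMP, WS, PS, NDVI, and precomputed S_PM, T_PM, DIS) are min-max scaled to [0, 1] and treated as Bernoulli visible units; each RBM is trained with one-step contrastive divergence (CD-1); and the output layer is a linear regression fitted by least squares, standing in for the supervised back-propagation fine-tuning a full DBN would use. Layer sizes, epochs, learning rate, and all function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli-Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)
        self.rng = rng

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def fit(self, V, epochs=50, lr=0.05):
        n = V.shape[0]
        for _ in range(epochs):
            h0 = self.hidden_probs(V)                                  # positive phase
            h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
            v1 = sigmoid(h0_sample @ self.W.T + self.b_v)             # mean-field reconstruction
            h1 = self.hidden_probs(v1)                                 # negative phase
            self.W += lr * (V.T @ h0 - v1.T @ h1) / n
            self.b_v += lr * (V - v1).mean(axis=0)
            self.b_h += lr * (h0 - h1).mean(axis=0)
        return self

def fit_dbn_regressor(X, y, hidden=(16, 8), seed=0):
    """Greedy RBM pre-training, then a least-squares linear output layer."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    scale = np.where(hi > lo, hi - lo, 1.0)
    V = (X - lo) / scale                       # min-max scaling to [0, 1]
    rbms = []
    for n_hidden in hidden:                    # two RBM layers, as in Figure 3
        rbm = RBM(V.shape[1], n_hidden, rng).fit(V)
        rbms.append(rbm)
        V = rbm.hidden_probs(V)                # features passed to the next layer
    A = np.column_stack([V, np.ones(len(V))])  # top-level features plus intercept
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, float), rcond=None)

    def predict(X_new):
        H = np.clip((np.asarray(X_new, float) - lo) / scale, 0.0, 1.0)
        for rbm in rbms:
            H = rbm.hidden_probs(H)
        return np.column_stack([H, np.ones(len(H))]) @ coef

    return predict
```

Calling `predict = fit_dbn_regressor(X_train, y_train)` and then `predict(X_grid)` mirrors the train-on-matchups, predict-everywhere workflow described above.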

Spatio-temporal fusion for the reconstruction of PM2.5
Optical satellite observations are often affected by clouds, so satellite-derived PM2.5 is spatially discontinuous. To address this issue, we propose a satellite-station spatio-temporal fusion method that fills the data gaps using time series of satellite-retrieved and station-measured PM2.5. The initial PM2.5 field is obtained by interpolating station PM2.5 with the inverse distance weighting (IDW) method. The basic assumption is that, for a given location, the interpolated PM2.5 and the satellite-derived PM2.5 follow a similar trend over the same period.
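A minimal IDW sketch in Python follows. It assumes a power parameter of 2 and planar distances computed directly on longitude/latitude coordinates; neither choice is specified in the text, and the function name is hypothetical. A query point that coincides with a station takes that station's value.

```python
import numpy as np

def idw(station_xy, station_values, query_xy, power=2.0):
    """Inverse distance weighting of station values onto query points.

    station_xy: (S, 2) station coordinates, station_values: (S,), query_xy: (Q, 2).
    """
    station_xy = np.asarray(station_xy, float)
    station_values = np.asarray(station_values, float)
    query_xy = np.asarray(query_xy, float)
    # Pairwise distances between every query point and every station: shape (Q, S)
    d = np.linalg.norm(query_xy[:, None, :] - station_xy[None, :, :], axis=2)
    out = np.empty(len(query_xy))
    exact = d < 1e-12
    hit = exact.any(axis=1)
    # Query points on top of a station take that station's value directly
    out[hit] = station_values[exact[hit].argmax(axis=1)]
    # All other points: weights proportional to 1 / distance**power
    w = 1.0 / d[~hit] ** power
    out[~hit] = (w * station_values).sum(axis=1) / w.sum(axis=1)
    return out
```

For hourly mapping, `query_xy` would be the 0.05° cell centres and `station_values` the station PM2.5 for that hour.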
For convenience, we refer to the interpolated PM2.5 as coarse-resolution data and the satellite-derived PM2.5 (Section 3.1) as fine-resolution data. The satellite-derived PM2.5 containing gaps at the prediction time T_p is referred to as the target data. To reconstruct the target data, the auxiliary data are N pairs of coarse- and fine-resolution data acquired before T_p, together with the coarse-resolution data at T_p. For a given missing pixel, PM(x, y, T_p) denotes the prediction of the missing pixel (x, y) at the prediction time T_p, and RMSE_k denotes the root-mean-square error between the coarse data at the prediction time (T_p) and at the auxiliary time (T_k). Here, "spatio-temporal fusion" means combining spatial information (similar pixels) and temporal information (data at auxiliary times) to reconstruct PM2.5.
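As a hedged illustration, not the authors' exact formulation, the Python sketch below shows only the temporal part of such a fusion. Following the assumption that fine and coarse PM2.5 follow similar trends, each auxiliary time T_k gives a candidate F(x, y, T_k) + C(x, y, T_p) − C(x, y, T_k), and candidates are combined with normalised weights proportional to 1/RMSE_k. The similar-pixel spatial weighting is omitted, RMSE_k is assumed to be computed over the whole coarse image, and the function name is hypothetical. Pixels with no valid auxiliary data stay missing, matching the residual regions that are later filled by IDW.

```python
import numpy as np

def temporal_fusion_fill(fine_target, coarse_target, fine_aux, coarse_aux, eps=1e-6):
    """Fill NaN gaps in the satellite PM2.5 at T_p using N auxiliary times.

    fine_target   : (H, W) satellite-derived PM2.5 at T_p, NaN where missing
    coarse_target : (H, W) IDW-interpolated station PM2.5 at T_p (gap-free)
    fine_aux      : (N, H, W) satellite-derived PM2.5 at T_1..T_N (may contain NaN)
    coarse_aux    : (N, H, W) IDW-interpolated station PM2.5 at T_1..T_N
    """
    fine_aux = np.asarray(fine_aux, float)
    coarse_aux = np.asarray(coarse_aux, float)
    coarse_target = np.asarray(coarse_target, float)
    out = np.array(fine_target, dtype=float)  # copy, so the input is not modified
    # RMSE_k between the coarse field at T_p and at each auxiliary time T_k
    rmse = np.sqrt(np.mean((coarse_aux - coarse_target) ** 2, axis=(1, 2)))
    weight = 1.0 / np.maximum(rmse, eps)  # more similar coarse field -> larger weight
    # Candidate at each T_k: fine value at T_k plus the coarse change from T_k to T_p
    candidate = fine_aux + (coarse_target - coarse_aux)
    valid = ~np.isnan(candidate)
    w = np.where(valid, weight[:, None, None], 0.0)
    num = (np.where(valid, candidate, 0.0) * w).sum(axis=0)
    den = w.sum(axis=0)
    fill = np.isnan(out) & (den > 0)
    out[fill] = num[fill] / den[fill]  # normalised inverse-RMSE weighted average
    return out  # pixels without any valid auxiliary data remain NaN
```

Normalising the inverse-RMSE weights per pixel means an auxiliary time whose satellite value is itself missing simply drops out of the average for that pixel.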
After this reconstruction, some small regions remain missing because no auxiliary data are available for them. These regions are filled by IDW interpolation to achieve full-coverage PM2.5 data.

Model evaluation
First, to evaluate the accuracy of PM2.5 retrieval, 10-fold cross-validation (Rodriguez et al., 2010) was adopted to test Geoi-DBN for potential overfitting. All samples in the model dataset are randomly divided into ten equal subsets. In each round of validation, one subset is used for validation and the remaining nine subsets are used to fit the model. We adopted the coefficient of determination (R²), the root-mean-square error (RMSE, µg/m³), the mean prediction error (MPE, µg/m³), and the relative prediction error (RPE, defined as RMSE divided by the mean ground-level PM2.5) to evaluate model performance.
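The cross-validation procedure and metrics can be sketched in Python as follows. Here `fit` stands for any model-fitting routine (for example, a Geoi-DBN trainer) that returns a prediction function, and all function names are hypothetical. R² is computed as the squared Pearson correlation between observed and estimated PM2.5, and the slope as the regression of estimated on observed PM2.5; both conventions are assumptions. MPE is omitted because its formula is not given in the text.

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    """Randomly split sample indices 0..n-1 into k (near-)equal folds."""
    perm = np.random.default_rng(seed).permutation(n)
    return np.array_split(perm, k)

def cross_validate(X, y, fit, k=10, seed=0):
    """Out-of-sample predictions: each fold is predicted by a model fitted on the other folds.

    `fit(X_train, y_train)` must return a callable `predict(X_test)`.
    """
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    pred = np.empty_like(y)
    for test_idx in kfold_indices(len(y), k, seed):
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        predict = fit(X[train], y[train])
        pred[test_idx] = predict(X[test_idx])
    return pred

def evaluate(obs, est):
    """R2 (squared Pearson r), RMSE, RPE = RMSE / mean(obs), and regression slope."""
    obs = np.asarray(obs, float)
    est = np.asarray(est, float)
    r = np.corrcoef(obs, est)[0, 1]
    rmse = np.sqrt(np.mean((est - obs) ** 2))
    slope = np.polyfit(obs, est, 1)[0]  # estimated PM2.5 regressed on observed PM2.5
    return {"R2": r ** 2, "RMSE": rmse, "RPE": rmse / obs.mean(), "slope": slope}
```

Comparing `evaluate(y, fitted)` on the training data with `evaluate(y, cross_validate(X, y, fit))` gives the model-fitting versus cross-validation contrast used to judge overfitting.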
Second, to verify the accuracy of PM2.5 reconstruction, we compared the reconstructed PM2.5 for the whole of 2016 with the corresponding station measurements, using R² and RMSE for quantitative assessment.

Model evaluation
4.1.1 Evaluation of PM2.5 retrieval performance: Figure 4 shows scatter plots of the model-fitting and cross-validation results of the Geoi-DBN model. For model fitting, the R² value is 0.80 and the RMSE is 17.37 µg/m³, indicating that the Geoi-DBN model can effectively describe the AOD-PM2.5 relationship. From model fitting to cross-validation, the R² value remains unchanged and the RMSE increases by only 0.12 µg/m³, which shows that the Geoi-DBN model is not overfitted. On the other hand, the cross-validation slope of observed versus estimated PM2.5 is 0.79, providing some evidence of bias: the Geoi-DBN model tends to underestimate PM2.5 concentrations when the ground measurements exceed ~55 µg/m³. A possible reason is that we used point-based monitoring data within a spatially averaged modelling framework; the monitors sampled in a grid cell may not accurately represent the spatially averaged concentration of that cell. Overall, the Geoi-DBN model achieves satisfactory performance for Himawari-based PM2.5 estimation.

Evaluation of PM2.5 reconstruction accuracy:
To evaluate the performance of PM2.5 reconstruction, we compare the reconstruction results with ground station measurements. As shown in Table 1, the R² value between observed and reconstructed PM2.5 is 0.75 and the RMSE is 19.44 µg/m³, indicating that the reconstructed PM2.5 is highly consistent with the station measurements. By comparison, the satellite retrievals yield R² and RMSE values of 0.81 and 16.96 µg/m³ against station measurements. The reconstruction results thus achieve nearly the same level of performance as the satellite retrievals when compared with ground station observations, which suggests that the proposed approach is effective for reconstructing seamless PM2.5 distributions.
The data gaps filled by the proposed approach are presented in Figure 6. The missing parts of the satellite-derived PM2.5 are effectively recovered, so that PM2.5 patterns can be investigated comprehensively. Compared with the satellite-derived PM2.5 at 17:00 (Figure 5), the reconstructed distribution clearly shows a very high level of PM2.5 in Huanggang. Furthermore, PM2.5 distributions at night, which cannot be directly monitored by the Himawari-8 satellite, are also reconstructed.

CONCLUSIONS
To sum up, hourly Himawari-8 observations are adopted to greatly improve the time efficiency of PM2.5 monitoring. Furthermore, a spatio-temporal fusion model is applied to fill the data gaps using satellite-derived PM2.5 in conjunction with ground PM2.5 measurements. The results show that the Himawari-8 satellite-based deep learning model achieves satisfactory performance (cross-validation R² = 0.80, RMSE = 17.49 µg/m³). The missing data in satellite-derived PM2.5 are accurately recovered, with an R² of 0.75 between the recovered results and ground measurements. This study provides an effective strategy for the real-time and seamless monitoring of ground-level PM2.5.