IDENTIFYING THE DRIVING FACTORS OF POPULATION EXPOSURE TO FINE PARTICULATE MATTER (PM2.5) IN WUHAN, CHINA

Characterizing the spatiotemporal dynamics of population exposure to fine particulate matter (PM2.5) and its underlying external forcing can inform proactive public health precautions. In this study, satellite-derived surface-level PM2.5 concentrations, landscape factors and socioeconomic data were collected to identify the inter-annual variations and potential driving forces of population exposure to PM2.5 in Wuhan, China from 2000 to 2015. Fine-scale PM2.5 exposure in 2000, 2005, 2010 and 2015 was first estimated. The contributions of landscape factors and socioeconomic forcing were then quantified with a machine learning method (Random Forest). The results revealed that the population in Wuhan faced increasing and more clustered PM2.5 threats from 2000 to 2010, followed by a weakened and more dispersed health threat in 2015. In general, Gross Domestic Product (GDP) contributed the most to high-level PM2.5 exposure over 2000-2015, i.e. its variable importance (VIM) equalled xxx. Among the biophysical and landscape characteristics, the percentage of urban landscape (PLAND_UA) and the urban area fraction contributed the most to population PM2.5 exposure. In parallel, precipitation played a crucial part in mitigating PM2.5 exposure. Identifying the inter-annual dynamics of population PM2.5 exposure and the underlying forcing can support decision making and epidemiological precautions for evaluating and alleviating population exposure risks.


Introduction
With unprecedented socioeconomic development, increasingly serious air pollution has been witnessed in mainland China. In 2015, only 21.6% of municipalities (73 of 338) in China met the second-level ambient air quality standard, and most municipalities (78.4%) were exposed to worse ambient air quality (Cheng et al., 2017). The primary air pollutants in China have been characterized as fine particulate matter (PM2.5, 66.8% of all the pollutants), particulate matter (PM10, 15.0%) and ozone (O3, 16.9%). PM2.5 (particles with an aerodynamic diameter of less than 2.5 μm) has been associated with numerous adverse health outcomes, e.g. lung cancer, hypertension, neonatal jaundice (Zhang et al., 2019a), pregnancy abortion (Zhang et al., 2019b) and even premature death (Cohen et al., 2017). In 2014, nearly the entire population of China was exposed to high-level PM2.5 concentrations, which exceeded the World Health Organization (WHO) Air Quality PM2.5 Interim Target-1 level (IT-1, annual mean of 35 μg/m3) (Fang et al., 2016;Ma et al., 2014;Song et al., 2019). Furthermore, high-level PM2.5 concentrations (15-35 μg/m3) and potentially associated premature deaths (up to 1.2 million) have been witnessed in China (Peng et al., 2016). With this in mind, it has become urgent to estimate population exposure to PM2.5, with the aim of ensuring environmental sustainability in China by establishing responses to serious air pollution.
Numerous epidemiological studies have been carried out to explore and establish associations between PM2.5 concentration and health outcomes. However, such studies are limited in China by the lack of spatially and temporally dense in situ PM2.5 observations. There were no nationwide official in situ PM2.5 observations prior to 2013, only scattered sites established and operated by research groups. In this context, population exposure to PM2.5 was conventionally estimated by assigning spatially sparse observations to neighbouring communities (kilometres to tens of kilometres away) (Song et al., 2019). Such coarse estimation may cause underestimation or overestimation of the health risks posed by PM2.5 (Geng et al., 2015;He and Huang, 2018;Ma et al., 2014). A nationwide PM2.5 monitoring network has been operating in China since 2013 to provide more spatially and temporally continuous PM2.5 measurements covering nearly all municipalities. Nevertheless, even the spatial and temporal coverage of this point-observation network cannot provide regional PM2.5 concentrations (Kloog et al., 2011;Li et al., 2017b). Therefore, additional measurements are needed to support further investigation of PM2.5 concentrations and the underlying health risks (van Donkelaar et al., 2016;van Donkelaar et al., 2006).
Satellite remote sensing opens the door to spatially and temporally continuous estimates of surface-level PM2.5 concentration. Satellite-derived aerosol optical depth (AOD), which measures the light extinction caused by aerosol scattering and absorption in the atmospheric column, has been shown to be empirically and physically related to surface-level PM2.5 concentration. Surface-level PM2.5 concentration is estimated from column AOD and point PM2.5 measurements mainly through two approaches: (1) statistical models (Li et al., 2017a;Li et al., 2017b;Yao et al., 2019) and (2) chemical transport model (CTM) based methods (Geng et al., 2015;van Donkelaar et al., 2016). Statistical methods can achieve simple but relatively accurate PM2.5 estimation by introducing spatially explicit models and advanced regression models. However, the statistical AOD-PM2.5 relationship depends on emissions and meteorological factors, which limits the transferability of statistical methods to other sites (Geng et al., 2015;van Donkelaar et al., 2006). CTM-based methods can provide spatial and temporal PM2.5 patterns by considering the AOD/PM2.5 ratio and the transformation and transport of chemical components. Nevertheless, the spatial scales of CTM-estimated PM2.5 patterns are usually coarse (He and Huang, 2018;van Donkelaar et al., 2006). To combine the merits of statistical and CTM-based methods, van Donkelaar et al. (2016) published a global surface-level 0.01° × 0.01° PM2.5 dataset estimated by combining multi-source satellite AOD products (including MODIS, MISR and SeaWiFS) with the GEOS-Chem model.
Numerous studies have explored the associations between PM2.5 concentration and natural and socioeconomic factors using spatially explicit models and machine learning methods. Yang et al. (2018b) quantified the impacts of climatic and socioeconomic factors on PM2.5 pollution in China using in situ PM2.5 measurements. Yang et al. (2018a) established the association between fine-resolution PM2.5 patterns and impact factors using random forest (RF). However, only a few attempts have been made to examine the underlying forcing of population PM2.5 exposure. As a result, urban planning countermeasures and epidemiological precautions cannot be properly established to reduce health risks. With this in mind, this study aimed to characterize the temporal dynamics of surface-level PM2.5 concentration and identify the driving mechanisms and principal determinants in Wuhan, China, which may contribute to PM2.5 pollution reduction and mitigation from a local perspective.

Study case
In this study, Wuhan, the economic and industrial center of central China, was selected as the study case. Wuhan is known for its convenient transportation and industrial strength. In 2018, the Gross Domestic Product (GDP) of Wuhan reached 1484.73 billion Chinese Yuan (CNY), ranking first in central China and 9th in mainland China. In parallel, Wuhan has experienced severe air pollution in the last decade. In 2018, the air quality in Wuhan was "mediate" or "poor" on nearly a third of the days (116 days

PM2.5 dataset
The annual mean PM2.5 patterns in Wuhan were collected from the public dataset Global Annual PM2.5 Grids from MODIS, MISR and SeaWiFS Aerosol Optical Depth (AOD) with Geographically Weighted Regression (GWR), v1 (1998-2016), provided by the Socioeconomic Data and Applications Center, National Aeronautics and Space Administration (NASA). In this dataset, the AOD products from the Moderate Resolution Imaging Spectroradiometer (MODIS), the Multi-angle Imaging Spectroradiometer (MISR) and the Sea-Viewing Wide Field-of-View Sensor (SeaWiFS) are combined with the GEOS-Chem model to establish the AOD-PM2.5 linkage, and GWR is introduced to adjust the bias of satellite-derived PM2.5 concentration using Aerosol Robotic Network (AERONET) measurements. This dataset is reported to have good accuracy and has been used in many PM2.5-related studies (Peng et al., 2016;van Donkelaar et al., 2016).

Population distribution
In this study, 1 km resolution LandScan population data were used to delineate the population distribution within Wuhan from 2000 to 2015. The LandScan population data are produced from census data and other multi-source spatial data (https://landscan.ornl.gov/documentation).

Gross Domestic Product (GDP)
The 1 km gridded GDP data were collected from the Resource and Environment Data Cloud Platform (http://www.resdc.cn/Default.aspx). The gridded GDP was estimated from prefecture-level GDP census data, night-time light (NTL) data and land cover maps (Liu et al., 2005).

Natural factors
In this study, elevation (Digital Elevation Model, DEM), annual mean precipitation, annual mean land surface temperature (LST), and urban landscape composition and configuration metrics were adopted to investigate the impacts of natural forcing on PM2.5 exposure.

Digital Elevation Model:
The 90-meter Shuttle Radar Topography Mission (SRTM) DEM product released by NASA was collected as the elevation variable for explaining PM2.5 concentration in 2000, 2005, 2010 and 2015. To be consistent with the PM2.5 data, the 90-meter DEM was aggregated to 1 km resolution in ArcGIS 10.6.

Precipitation:
To obtain the annual mean precipitation for 2000-2015, the precipitation records of more than 2400 meteorological stations were first averaged annually. The spline interpolation in the ANUSPLIN software was then used to interpolate the point measurements into spatially continuous annual mean precipitation patterns at 1 km resolution.

Land Surface Temperature:
Daily Terra MODIS Collection 6 LST products (MOD11A1) were collected from the Level-1 and Atmosphere Archive & Distribution System Distributed Active Archive Centre (LP DAAC), NASA. Poor-quality LST observations and null pixels were first eliminated from the raw data according to the Quality Control (QC) flag, and the annual mean LST was then derived from the screened LST products.
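The QC screening and annual averaging described above can be illustrated for a single pixel with the short sketch below. It is not the authors' processing chain. The screening rule (keep a day only when the two lowest QC bits are 00), the fill value of 0, the 0.02 scale factor to Kelvin, and the sample values are all assumptions made for illustration.

```python
# Sketch: screen daily LST values with a QC flag and average the survivors.
# Assumptions (not from the paper): a pixel-day is kept only when the two
# lowest QC bits are 00, a raw value of 0 is treated as fill, and raw values
# are converted to Kelvin with a 0.02 scale factor.

LST_SCALE = 0.02  # assumed scale factor (raw integer -> Kelvin)
LST_FILL = 0      # assumed fill value for missing LST


def is_good(qc):
    """True when the two lowest QC bits equal 00 (assumed 'good quality')."""
    return (qc & 0b11) == 0


def annual_mean_lst(raw_series, qc_series):
    """Mean LST in Kelvin for one pixel, or None if no valid day remains."""
    valid = [
        raw * LST_SCALE
        for raw, qc in zip(raw_series, qc_series)
        if raw != LST_FILL and is_good(qc)
    ]
    if not valid:
        return None
    return sum(valid) / len(valid)


# Hypothetical daily samples for a single pixel.
raw = [15000, 0, 14900, 15100]
qc = [0, 2, 0, 1]  # day 2 is fill, day 4 fails the assumed quality test
mean_k = annual_mean_lst(raw, qc)
print(mean_k)
```

Only the first and third days survive the screening in this example, so the mean is taken over two values. A stricter or looser QC rule would change which days contribute.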

Urban Landscape Composition and Configuration:
To derive the urban landscape composition and configuration metrics, the 300-meter resolution land cover dataset provided by the European Space Agency (ESA) Climate Change Initiative (CCI) was collected. The ESA CCI land cover (LC) maps were derived from Medium Resolution Imaging Spectrometer (MERIS) Surface Reflectance (SR) time series with the aid of multi-source satellite imagery (including AVHRR, SPOT-VGT and PROBA-V). This dataset provides global maps that describe the land surface in 22 classes, defined using the United Nations Food and Agriculture Organization's (UN FAO) Land Cover Classification System (LCCS). In this study, the dataset was reclassified into three categories: urban areas, vegetation and water bodies.
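The reclassification step and the percent-cover metric (PLAND) used later can be sketched as follows. The mapping from land cover codes to the three categories is an assumption for illustration (190 as urban and 210 as water in the CCI legend, with codes 10 to 180 treated as vegetation). The paper does not state the mapping it used, and the sample window is hypothetical.

```python
# Sketch: collapse land cover codes into three classes and compute PLAND.
# Assumption (not from the paper): the code-to-class mapping below.

def reclassify(code):
    """Map a land cover code to 'urban', 'vegetation', 'water' or None."""
    if code == 190:
        return "urban"
    if code == 210:
        return "water"
    if 10 <= code <= 180:
        return "vegetation"
    return None  # e.g. bare land, snow/ice, no data


def pland(codes):
    """Percentage of the window covered by each class (PLAND)."""
    total = len(codes)
    counts = {"urban": 0, "vegetation": 0, "water": 0}
    for code in codes:
        label = reclassify(code)
        if label is not None:
            counts[label] += 1
    return {name: 100.0 * n / total for name, n in counts.items()}


# Hypothetical window of ten 300 m pixels.
window = [190, 190, 10, 50, 130, 210, 190, 200, 10, 10]
shares = pland(window)
print(shares)
```

Pixels that fall outside the three categories still count toward the window area, so the three percentages need not sum to 100.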

Population weighted PM2.5 exposure
Since both surface-level PM2.5 concentration (He and Huang, 2018;Kloog et al., 2011) and the potentially exposed population (Song et al., 2019) vary in space and time, population-weighted PM2.5 exposure is considered a more persuasive representation of PM2.5-related health risks (Lin et al., 2016;Song et al., 2019). The population-weighted PM2.5 concentration is calculated as (Aunan et al., 2018;Chen et al., 2018):

$$PWC = \frac{\sum_{i=1}^{N} P_i \, C_i}{\sum_{i=1}^{N} P_i}$$

where $P_i$ is the population of pixel $i$, $C_i$ is the surface-level PM2.5 concentration of pixel $i$, and $N$ is the total number of pixels within Wuhan.
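As a minimal sketch of the population-weighted concentration, the function below takes paired per-pixel population counts and PM2.5 concentrations and returns the weighted mean. The function name, argument names and sample values are hypothetical.

```python
# Sketch: population-weighted PM2.5 concentration over a set of pixels,
# i.e. sum(population * concentration) / sum(population).
# The names and sample values are illustrative, not taken from the paper.

def population_weighted_pm25(population, pm25):
    """Weighted mean of pm25 using population as the weight."""
    if len(population) != len(pm25):
        raise ValueError("population and pm25 must have the same length")
    total_pop = sum(population)
    if total_pop == 0:
        raise ValueError("total population is zero")
    return sum(p * c for p, c in zip(population, pm25)) / total_pop


# Hypothetical pixels: population counts and annual mean PM2.5 (ug/m3).
pop = [1000, 3000, 0, 6000]
conc = [40.0, 60.0, 90.0, 80.0]
pwc = population_weighted_pm25(pop, conc)
simple_mean = sum(conc) / len(conc)
print(pwc, simple_mean)
```

In this example the unpopulated pixel with the highest concentration does not affect the weighted value, which is why the weighted and simple means differ.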

Random Forest (RF) regression
In this study, RF regression was adopted to quantify the impacts of external drivers on population PM2.5 exposure in Wuhan. As a robust and commonly used machine learning method, RF has been widely adopted in environment-related studies (Yang et al., 2019;Yang et al., 2018a;Zhang et al., 2018). RF extends decision tree regression: the samples for each decision tree are drawn from the training set by sampling with replacement (bootstrap sampling). Using the samples left out of each tree and the associated out-of-bag error (OOB error), RF can evaluate the importance of each variable in a regression, with the increase in mean squared error (%IncMSE) as a quantitative measure (Breiman, 2001).
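The two ideas behind this importance measure, sampling with replacement and the increase in error after permuting one predictor, can be sketched without a full forest. The code below is not an RF implementation. It uses a hypothetical fixed linear predictor as a stand-in for a trained model and reports a plain percentage increase in MSE, whereas RF software may normalise %IncMSE differently. In a real forest the evaluation rows would be each tree's out-of-bag samples.

```python
import random

# Sketch: bootstrap/out-of-bag split and permutation-based increase in MSE.
# Assumptions: `predict` is a hypothetical stand-in for a trained model, and
# the importance is a simple percentage increase over the baseline MSE.


def bootstrap_split(n, rng):
    """Indices drawn with replacement, plus the out-of-bag indices."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    chosen = set(in_bag)
    oob = [i for i in range(n) if i not in chosen]
    return in_bag, oob


def mse(predict, rows, targets):
    """Mean squared error of predict over the given rows."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_increase(predict, rows, targets, column, rng):
    """Percent increase in MSE after shuffling one predictor column."""
    base = mse(predict, rows, targets)
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, shuffled)]
    return 100.0 * (mse(predict, permuted, targets) - base) / base


def predict(row):
    """Stand-in model: depends on column 0 only and ignores column 1."""
    return 2.0 * row[0]


rng = random.Random(0)
n = 20
rows = [[float(i), float((7 * i) % n)] for i in range(n)]
# Targets follow column 0 plus a small alternating disturbance.
targets = [2.0 * r[0] + (0.1 if i % 2 == 0 else -0.1) for i, r in enumerate(rows)]

in_bag, oob = bootstrap_split(n, rng)
inc_col0 = permutation_increase(predict, rows, targets, 0, rng)
inc_col1 = permutation_increase(predict, rows, targets, 1, rng)
print(len(in_bag), len(oob), inc_col0, inc_col1)
```

Permuting the column that the stand-in model relies on inflates the error, while permuting the ignored column leaves it unchanged. This is the behaviour that a high or low importance value is meant to capture.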

Validation of satellite PM2.5 data
In this study, the accuracy of the satellite-based PM2.5 estimates was evaluated against 10 in situ PM2.5 observations. The root mean square error (RMSE) was adopted as the accuracy measure. The overall RMSE of the PM2.5 estimates in 2015 is 5.32 μg/m3. Furthermore, the absolute error in 2015 was calculated at every monitoring site. As revealed in Figure 2, the satellite-derived PM2.5 concentration data demonstrated higher accuracy in sub-urban areas (2.08 μg/m3) than in urban areas (6.22 μg/m3). In addition, the absolute errors of the PM2.5 estimates gradually increased along the south-north direction.
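The two accuracy measures mentioned above can be computed as in the following sketch, which compares satellite estimates with site observations. The sample values are hypothetical and are not the validation data used in the study.

```python
import math

# Sketch: RMSE and per-site absolute error between satellite-derived and
# in situ PM2.5 values. All numbers below are illustrative placeholders.


def rmse(estimated, observed):
    """Root mean square error between paired estimates and observations."""
    n = len(observed)
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(estimated, observed)) / n)


def absolute_errors(estimated, observed):
    """Absolute error at each site."""
    return [abs(e - o) for e, o in zip(estimated, observed)]


satellite = [62.0, 70.0, 55.0, 66.0]  # hypothetical satellite estimates (ug/m3)
in_situ = [60.0, 74.0, 55.0, 70.0]    # hypothetical site observations (ug/m3)

overall = rmse(satellite, in_situ)
per_site = absolute_errors(satellite, in_situ)
print(overall, per_site)
```

Grouping the per-site errors by location (for example urban versus sub-urban sites) gives the kind of comparison reported for Figure 2.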

The inter-annual dynamics of PM2.5 exposure in Wuhan
Figure 3 shows the inter-annual dynamics of PM2.5 exposure in Wuhan. The spatial patterns of population exposure to PM2.5 in Wuhan are shown in Figure 4, and the corresponding statistics are listed in Table 3. On the one hand, as one of the most developed areas in Wuhan, Hankou has the largest population and the highest population density. On the other hand, Hankou also acts as the economic and commercial center of Wuhan. More socioeconomic activity takes place in Hankou than in other districts, which may partly explain the severe PM2.5 exposure there in 2010. In parallel, more severe PM2.5 exposure was witnessed within Hankou in 2010 and 2015 than in 2000 and 2005. The spatial distributions of PM2.5 exposure risk in 2010 and 2015 closely resemble each other. Notably, in 2005, although the downtown areas of Wuchang and Hankou did not suffer from serious PM2.5 pollution, Hanyang and the sub-urban areas were exposed most to PM2.5 in the study period. Overall, PM2.5 exposure in Wuhan significantly increased from 2000 to 2010 and then decreased in 2015.

Impacts of natural and socioeconomic factors on PM2.5 exposure
To comprehensively and quantitatively explore the impacts of natural and socioeconomic factors on population exposure to PM2.5 in Wuhan, three categories of factors are included in this study: (1) natural variables, including annual mean precipitation (PRE), annual mean LST and digital elevation model (DEM); (2) landscape metrics (composition, aggregation index, edge density and shape index) of urban built-up areas, vegetation and water bodies; (3) socioeconomic factors (GDP). The statistics of the impact factors (including natural, socioeconomic and landscape factors) in 2000-2015 are reported in Table 4. To quantify the impact of natural and socioeconomic factors on population PM2.5 exposure, RF regression was used to establish the association between population-weighted PM2.5 concentration and the external factors. The variable importance (VIM) was quantified in the RF regression. A high VIM value indicates that the corresponding variable exerts a significant impact on the dependent variable (Yang et al., 2018a;Yao et al., 2017;Zhang et al., 2018). The calculation of VIM is described in the references (Yang et al., 2018a;Zhang et al., 2018). The VIMs (%IncMSE) of natural and socioeconomic factors are shown in Table 8. As listed in Table 8, precipitation exerted the most significant impact on PM2.5 pollution, and annual mean LST also influenced the PM2.5 concentration. However, inconsistent with our previous work (Yang et al., 2018a), the topographical features were not associated with PM2.5 pollution at the municipality scale. This result might be explained by the fact that downtown areas are flatter and located at lower elevations than the rural surroundings.
The %IncMSE values of the landscape factors listed in Table 4-7 indicated that PLAND_UA had a more significant impact on PM2.5 exposure in Wuhan. However, PLAND_VEG and PLAND_WB had a stronger association with PM2.5 concentration than PLAND_UA in 2015. In addition, in 2005, not all the landscape metrics were associated with PM2.5. In general, it can be concluded from the results recorded in Table 4 that the percent cover of landscapes has more significant effects on PM2.5 pollution. However, GDP is a much stronger driver of PM2.5 exposure in Wuhan (Table 8) than the natural and landscape factors.

CONCLUSION
This study simultaneously examined the impacts of natural factors (precipitation, land surface temperature, elevation), landscape metrics (shape index, edge density, aggregation index and landscape percent cover) and a socioeconomic driver (GDP) on satellite-derived population PM2.5 exposure. The impacts of external factors on PM2.5 health risks were quantified using a machine learning method, i.e. Random Forest (RF) regression. The main findings can be summarized as:
1. The health risk posed by PM2.5 exposure in Wuhan increased from 2000 to 2010, especially in the downtown areas of Hankou and Wuchang. Moreover, the population PM2.5 exposure decreased in 2015.
2. The socioeconomic development (characterized by GDP increase) in Wuhan from 2000 to 2015 was accompanied by severe PM2.5 pollution.
3. Among the natural factors, precipitation had the strongest association with population PM2.5 exposure. The topographical features did not exert a significant influence on PM2.5 exposure in Wuhan.
4. Although the relationships between PM2.5 and landscape metrics were quite weak, the composition of the urban landscape did have an impact on PM2.5 exposure.
In future work, spatial and temporal big data (such as social media data (Song et al., 2019)) can be introduced to improve the accuracy and reliability of PM2.5 exposure assessment. Furthermore, additional climatic factors, including boundary layer height, air pressure, wind direction and wind speed, may help identify the external forcing of PM2.5 dynamics.