DESIGN AND RESULTS OF AN AI-BASED FORECASTING OF AIR POLLUTANTS FOR SMART CITIES

This paper presents the design and the results of a novel approach to predict air pollutants in urban environments. The objective is to create an artificial intelligence (AI)-based system to support planning actors in taking effective and adequate short-term measures against unfavourable air quality situations. In general, air quality in European cities has improved over the past decades. Nevertheless, reductions of the air pollutants particulate matter (PM), nitrogen dioxide (NO2) and ground-level ozone (O3), in particular, are essential to ensure quality of life and healthy living in cities. To forecast these air pollutants for the next 48 hours, a sequence-to-sequence encoder-decoder model with a recurrent neural network (RNN) was implemented. The model was trained with historic in situ air pollutant measurements, traffic and meteorological data. An evaluation of the prediction results against historical data shows close agreement with in situ measurements and indicates the system's applicability and its potential for high-quality forecasts of air pollutants in urban environments when real-time weather forecast data are included.


INTRODUCTION
Air pollution is still one of the main public health issues in cities worldwide. It affects not only human health, but also the environment. Biodiversity as well as functioning terrestrial and aquatic ecosystems are threatened by air pollutants. The main sources of anthropogenic air pollution are traffic, industry, agriculture and combustion processes (EEA, 2019: 8, 15, 71 ff.; Arndt, 2012: 87; WHO, 2015; UBA, 2018). Due to legal regulations, new environmentally friendly technologies and the use of low-emission fuels, all air pollutants, except for ground-level ozone, have been reduced by at least 40 % since the base year 1990 (UBA, 2020a). Nevertheless, the European Environment Agency focuses on three air pollutants that currently have the highest impact on the health of citizens in Germany. Particulate matter (PM), nitrogen dioxide (NO2) and ground-level ozone (O3) are responsible for serious neurological, cardiovascular and respiratory diseases (Dora et al., 2011: 2; EEA, 2019; Brunekreef and Holgate, 2002; Schneider et al., 2018). Therefore, one of the main objectives of municipal authorities and environmental agencies is to reduce air pollution with adequate measures. Because of the complex interactions of air pollutants among themselves (e.g., NO2-NO-O3) as well as with external influences (such as weather conditions, traffic volume, and land use), predicting and simulating air pollution concentrations is not a trivial task. However, in order to take adequate measures, it is crucial to support local decision makers in traffic, urban, and environmental planning with reliable information on the variability of air pollution in time and space. The WHO air quality guidelines (WHO, 2006) assess the risks of air pollutants and suggest the following maximum annual mean values: 10 µg/m³ for PM2.5; 20 µg/m³ for PM10; and 40 µg/m³ for NO2. The European Union enacted several agreements and directives.
Directive 2008/50/EC on ambient air quality and cleaner air for Europe, which was enacted in 2008, defines limit and target values for each air pollutant. For PM10, for example, the daily mean of 50 µg/m³ may be exceeded on no more than 35 days per year, and the annual mean should not exceed 40 µg/m³. NO2 concentrations should not exceed the hourly limit value of 200 µg/m³ for more than 18 hours per year; in addition, the annual mean is limited to 40 µg/m³. The directive does not define limit values for ground-level ozone. It only states a long-term objective of a maximum daily 8-hour mean of 120 µg/m³ (39. BImSchV; EEA, 2019: 7, 12). Spot-based measuring stations monitor the air quality in Europe. These measuring stations are mainly located in hotspot areas, such as roads and intersections with a high traffic volume (cf. Petry et al., 2020).
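The exceedance rules quoted above can be checked against a measured time series with a few lines of code. The following is a minimal sketch, not part of the described system; the function names and the input layout (a list of hourly values, or pairs of a day label and a value) are assumptions made for illustration.

```python
from collections import defaultdict

# Limit values as quoted in the text (µg/m³) and permitted exceedances per year.
NO2_HOURLY_LIMIT = 200.0
NO2_ALLOWED_HOURS = 18
PM10_DAILY_LIMIT = 50.0
PM10_ALLOWED_DAYS = 35


def count_no2_exceedance_hours(hourly_no2):
    """Count the hourly NO2 values that lie above the hourly limit value."""
    return sum(1 for value in hourly_no2 if value > NO2_HOURLY_LIMIT)


def count_pm10_exceedance_days(hourly_pm10):
    """Count the days whose mean PM10 value lies above the daily limit.

    hourly_pm10 is a list of (day_label, value) pairs; day_label can be any
    hashable identifier of the calendar day (hypothetical input layout).
    """
    per_day = defaultdict(list)
    for day, value in hourly_pm10:
        per_day[day].append(value)
    return sum(
        1
        for values in per_day.values()
        if sum(values) / len(values) > PM10_DAILY_LIMIT
    )


def complies(exceedances, allowed):
    """The rule is met as long as the count stays within the allowance."""
    return exceedances <= allowed
```

For one year of data, `complies(count_no2_exceedance_hours(no2), NO2_ALLOWED_HOURS)` would then indicate whether the hourly NO2 rule was met at a station.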

PROBLEM STATEMENT
Cities worldwide are facing similar problems concerning finding a balance between quality of life (including air quality) and providing the necessary traffic services, industrial and economic activities, which are, in turn, the main contributors to air pollution in urban areas. To employ adequate short-term measures, it is necessary to better understand the interactions between air pollutants and their influencing parameters.
There is a strong linkage between weather conditions and air pollutants. There are seasonal fluctuations (winter vs. summer) as well as short-term weather fluctuations. In general, higher emissions of PM and NOx are observed in winter, due to low temperatures and increased heating, and an increased photochemical degradation of NOx is observed in summer (Plaß-Dülmer, 2020). During the observation period from 2015 to 2018, the Leibniz Institute for Tropospheric Research (TROPOS) observed a strong influence of wind speed, height of the mixing layer and humidity on the NO2 concentration as well as a strong influence of high temperatures and relative humidity on the O3 concentration in their study areas in Saxony, Germany (Pixteren et al., 2020). Other studies additionally observe a strong linkage between global radiation and O3. High O3 concentrations occur especially on hot and sunny days during summer (Stroh and Marb, 2020).
The retention period of PM in the air is strongly influenced by many weather parameters, for example precipitation, wind velocity, wind direction, nightly ground inversion and high-pressure weather conditions. Therefore, dry winter weather situations with limited air exchange in particular can result in high PM concentrations in Germany (UBA, 2020b). In addition, low temperatures lead to higher PM concentrations due to increased emissions from domestic heating.
The complex chemical and photochemical interaction between NO, NO2 and O3 is not only influenced by weather parameters, but also by traffic volume. A high traffic volume, for example, results in high NOx emissions that react with O3. As a consequence, O3 concentrations are significantly higher in rural areas than in urban areas with high traffic volumes (Pixteren et al., 2020; Stroh and Marb, 2020; Behera and Balasubramanian, 2016).
The aim of this study was to design a system that forecasts pollutant values for 48 hours at the location of the pollutant measurement stations. This is important to inform decision makers for taking and announcing short-term measures, such as pop-up low emission zones, road cleaning, decreasing speed limits, increasing parking fees or free public transportation. To achieve this, a deep neural network was trained on historic air pollutant and weather data.

Research Questions
The main objective of this study is to develop and evaluate new AI-based approaches to support decision makers in finding the right measures to improve air quality in cities and regions. Therefore, the following research questions are posed:
• How can an AI-based system be designed to integrate real-time measurements with meteorological and traffic data to forecast and simulate the variability of air pollutants in urban environments?
• What input parameters are relevant? What is their influence on the quality and validity of the forecast?
• What variant of DNNs is best suited to model time series of air pollutants and their interdependencies with meteorological data?

State of the Art
Limited area models have been developed to assess and forecast air pollution concentrations considering the most relevant chemical and physical processes (see e.g. Kukkonen et al., 2012, for an overview). The combined use of weather models, chemical transport models or particle dispersion models enables the spatiotemporally consistent (synoptic) simulation and forecast of chemical and physical quantities from global to regional and local scales. On the European scale, the Copernicus Atmosphere Monitoring Service (CAMS) delivers daily air quality forecasts at a spatial resolution of 0.1° (~10 km) (COP, 2021). The CAMS European Forecasting Ensemble is an ensemble of seven chemical transport models described in more detail by Marécal et al. (2015). For air quality forecasting at a higher spatial resolution for Germany and regions therein, the air quality modelling system Polyphemus/DLR operationally provides predictions of up to 72 hours at spatial resolutions of 2 km down to 500 m (Khorsandi et al., 2018; Mallet et al., 2007).
Using data assimilation methods, models and observations from in-situ measurements or satellites can be merged to generate best estimates of the atmospheric state and to improve model forecasts (e.g. Talagrand, 1997). However, a large amount of the variability of air pollution occurs on very small scales (Lefebvre et al., 2013). Consequently, the limiting factors of chemical and physical models are: they require an accurate and highly resolved initialization of static (topography) and dynamic (meteorology, emissions, land use) fields, a good representation and parametrization of processes, a numerically stable solver and last but not least massive computing capacities to resolve effects in complex terrain or urban environments. The results are prone to model-based uncertainties that are difficult to quantify.
Using artificial intelligence, this study proposes a different approach to simulate and forecast air quality at spot-based measurement stations as well as at city level. The quality of the prediction of an AI system depends on the quality of the input data. Therefore, only air pollutant data from official measuring stations with continuous data have been considered so far. For time series analyses, sequence-to-sequence models with recurrent neural networks (RNNs) have been widely used. RNNs are a special class of deep neural networks that use an internal loop to propagate information between tokens of an input sequence. Since plain RNNs have problems representing long-term relationships in the input sequences due to vanishing or exploding error signals, various architectural improvements have been suggested to deal with this problem. Long Short-Term Memory (LSTM) (Hochreiter, 1997) and the Gated Recurrent Unit (GRU) (Cho, 2014) both use gating parameters learned by the network to control which information is kept in the state propagated between input tokens, which enables them to retain information over longer time intervals; the LSTM additionally propagates a separate cell state. Sequence-to-sequence models, originally developed for language translation tasks (Sutskever, 2014; Cho, 2014), have also shown promising results in time series forecasting problems (Tang, 2016; Lai, 2018; Shih, 2019). The model consists of two parts: an encoder network processes past time steps and produces an encoding, the so-called hidden state. A decoder network receives this hidden state as input and produces the forecast step by step in an autoregressive loop.

Parameter Selection and Study Areas
The objective is to test, train and evaluate the system in two areas in Germany and to make it transferable to other areas in Germany. The study areas are the City of Stuttgart and the federal state of North Rhine-Westphalia (NRW), which have been described in Petry et al. (2020). As the data of the measuring stations in NRW is more consistent, the following examples refer to stations in NRW. The official measuring stations are classified on the basis of their location (such as urban, suburban, rural) as well as on the predominant emission source (such as traffic, industrial, or background, which are representative of the general exposure of the public and the vegetation) (EEA, 2019: 11).
The following figures display results from different measuring stations in NRW. The first station VWEL represents a typical urban station with a high volume of traffic, situated in Wuppertal-Elberfeld directly at a four-lane road. Multi-storey residential and commercial buildings are located on both sides of the street. Wuppertal is located close to Köln, Düsseldorf, Duisburg, Essen and Dortmund, which has an indirect impact on the traffic volume and emissions. The second station EIFE represents a rural background station. It is located in the municipality of Simmerath-Lammersdorf on a field path, the nearest federal road is 200 meters away. It is surrounded by spruce forests and meadows used for cattle-grazing (LANUV, 2021).
Urban traffic stations often do not collect data for O3. Therefore, a third station, WALS, had to be selected to be able to compare the O3 concentration of a rural station with that of an urban station. WALS is located in front of a school building in the middle of a residential area in Duisburg-Walsum. A coal-fired power station, a coking plant, a steel mill and a paper mill are nearby. This is why it is classified as an urban industrial station (LANUV, 2021).
To design a system that predicts the pollutant load for the next 48 hours, the relevant input parameters for determining the pollutant loads of NO, NO2, O3, PM10 have to be identified. For this purpose, the results of previous studies were used.
The main polluters are traffic, combustion, industry and agriculture. Since pollution is mostly of importance in urban regions, traffic and pollution from heating play a role here. Traffic is causally responsible for the pollution of NO, NO2 and PM10. Heating systems mainly produce PM10. These polluters show a temporal characteristic that reflects the temporal behaviour of certain pollutants.
The weekly course of the pollutants NO and NO2 shows a behaviour that corresponds to the traffic volume (see figure 1; station VWEL). On weekdays there are usually two maxima representing rush hour traffic, on weekends this curve flattens out. Public holidays "behave" like Sundays.
The PM10 load fluctuates little during the course of the day, but is higher in winter than in summer; this is due to the additional load caused by heating (see figure 2). The comparison of the values of the stations EIFE and VWEL shows that densely populated areas have a higher PM10 load. Ozone pollution depends on many factors; it has a specific daily pattern that is independent of the day of the week. Instead, it is related to solar radiation and temperature. This also means that ozone pollution is significantly higher in summer than in winter (see figure 3). Other factors are complex chemical reactions, such as 1 O3 + 3 NO = 3 NO2 (Pixteren et al., 2020). This equilibrium reaction in turn depends on weather conditions, mainly temperature.
This effect means that ozone pollution in rural areas can be higher than in urban areas, since the NO, which is caused by traffic, binds ozone (see figure 4). The daily course correlates with a small delay with the solar radiation. However, the minima at the urban station are much lower than at the rural station. This is due to the interaction with NO, which binds O3. Considering the long-term development is important, too, since the levels of NO, NO2 and PM10 have continuously decreased over the last ten years. This is probably due to more environmentally friendly technologies. The COVID-19 pandemic plays a special role here, as shown in Figure 5. The pandemic has led to a significant reduction in overall pollution (Erbertseder and Loyola, 2020). The temporal behaviour is therefore an important input parameter for predicting the pollution load. In detail, these are hour of day, day of week, holiday, month and year. Weather conditions play a significant role, directly and indirectly (Plaß-Dülmer, 2020).
The following conditions have a direct influence: temperature and radiation affect the ozone-nitrogen oxides equilibrium; precipitation washes out pollutants (UBA, 2020b); and higher wind speeds disperse pollutants faster and reduce peak loads.
The following conditions have an indirect influence on the pollution load: combustion engines produce more nitrogen oxides at low temperatures than at higher temperatures; and at lower temperatures, more heating takes place, which leads to a higher PM10 concentration. Thus, temperature, precipitation, humidity, wind speed, wind direction, and sunshine duration are important input parameters for the analysis and forecast. The surroundings of the measuring stations, such as the presence of roads, their traffic volume (number of lanes, traffic density), the density of buildings, the population density, and the presence of vegetation in the form of forests, parks, green spaces and gardens, are further influencing factors (cf. the values of the urban and rural station in figure 1).
In order to represent these factors, which are characteristic for the respective location, the course of the pollutant concentration together with the weather and time parameters of the previous week were used as input for the analysis. This also takes into account longer-term influences, such as the impact of the COVID-19 pandemic. Of course, this is only possible at the locations where measuring stations are available.
The influencing parameters are very diverse. There are interdependent and very complex effects that can hardly be represented with simple statistical methods. Newer methods that can depict such complex relationships are deep neural networks (DNNs). Therefore, a special DNN was designed, which is described in the next chapter.

PROPOSED MODEL
For this task, a sequence-to-sequence encoder-decoder model (see figure 6) has been designed and implemented, roughly following the approach described by Tang et al. (2016). Encoder and decoder are both n-layer GRUs (Gated Recurrent Units), and the decoder additionally has two linear output layers with a rectified linear unit (ReLU) in between. The network produces a so-called hidden state after each time step, which is used as input for the computation of the next time step. Thus, information can flow from earlier to later time steps, enabling the model to learn dependencies and recurring patterns in the dataset. After having read all the inputs sequentially, the encoder produces n hidden states, and the ith layer of the decoder takes the ith hidden state as its initial hidden state. In the first decoding time step, the decoder therefore receives as input the hidden states from the encoder, the last measured pollutant values, and additional time and weather input data (see next section). Then, in an autoregressive loop, the decoder takes its own output from the previous time step as input for the prediction of the next time step, in addition to the last produced hidden state and the additional input. In this way, the whole 48-hour output sequence is produced step by step. This approach has the advantage of being able to handle variable lengths of the input and output data, since the GRU in both encoder and decoder can be unrolled to the desired length. The split into an encoder and a decoder also provides the model with a natural modularity. Encoder and decoder can have completely different inputs and can be modified independently of one another. This is particularly relevant for our project, since the model has to deal with many heterogeneous data sources that have to be integrated into the model in multiple development cycles. Table 1 gives an overview of the input data used to train the model.
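The architecture described above can be illustrated with a short PyTorch sketch. It is not the authors' implementation: the class name, the layer sizes and the feature counts are assumptions chosen for illustration, and training details are omitted. It only shows the data flow: a multi-layer GRU encoder, a decoder GRU initialised layer by layer with the encoder's hidden states, two linear output layers with a ReLU in between, and the autoregressive loop.

```python
import torch
import torch.nn as nn


class Seq2SeqForecaster(nn.Module):
    """Illustrative GRU encoder-decoder (hypothetical sizes and names)."""

    def __init__(self, enc_features, dec_extra_features, n_targets,
                 hidden_size=64, num_layers=2):
        super().__init__()
        self.encoder = nn.GRU(enc_features, hidden_size, num_layers,
                              batch_first=True)
        # Decoder input per step: previous pollutant values plus the
        # additional time and weather features for that step.
        self.decoder = nn.GRU(n_targets + dec_extra_features, hidden_size,
                              num_layers, batch_first=True)
        # Two linear output layers with a ReLU in between.
        self.head = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, n_targets),
        )

    def forward(self, enc_in, last_values, dec_extra):
        # enc_in:      (batch, past_steps, enc_features)
        # last_values: (batch, n_targets), last measured pollutant values
        # dec_extra:   (batch, horizon, dec_extra_features)
        # hidden has shape (num_layers, batch, hidden_size); layer i of the
        # decoder starts from the final hidden state of encoder layer i.
        _, hidden = self.encoder(enc_in)
        prev = last_values
        outputs = []
        for t in range(dec_extra.size(1)):
            step_in = torch.cat([prev, dec_extra[:, t, :]], dim=1).unsqueeze(1)
            out, hidden = self.decoder(step_in, hidden)
            prev = self.head(out.squeeze(1))  # fed back in at the next step
            outputs.append(prev)
        return torch.stack(outputs, dim=1)  # (batch, horizon, n_targets)
```

Because the loop length is taken from `dec_extra`, the same module can be unrolled to any forecast horizon, which corresponds to the variable-length property mentioned in the text.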
Historical pollutant load data from 2010 until 2020 have been acquired from 52 measuring stations across North Rhine-Westphalia (NRW) and four measuring stations in Stuttgart. All the stations in NRW collect data on nitrogen monoxide (NO), nitrogen dioxide (NO2) and particulate matter (PM10); 20 stations additionally collect ozone (O3) data. The Stuttgart stations only have consistent measurements of NO2. Weather data have been collected from stations of the German Weather Service (DWD) across NRW and around Stuttgart. The values used for our dataset are temperature, precipitation, sunshine duration, wind speed, wind direction and amount of rain. Since the weather stations are in different locations than the pollutant measurement stations, the weather data at the pollutant stations had to be interpolated. For this, inverse distance weighting has been applied to the values from weather stations within a 25 km radius around the pollutant load stations. All data are collected in hourly resolution; missing data have been linearly interpolated for gaps of up to 3 hours.
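The two preprocessing steps named above, inverse distance weighting within a 25 km radius and linear interpolation of short gaps, can be sketched in plain Python as follows. This is an illustration under assumptions: the text does not state the weighting exponent or the distance formula, so the exponent of 2 and the haversine distance used here are placeholders, and all function names are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (assumed distance measure)."""
    radius = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def idw(target, stations, radius_km=25.0, power=2.0):
    """Inverse distance weighting of weather values at a pollutant station.

    target is (lat, lon); stations is a list of (lat, lon, value).
    Returns None if no weather station lies within radius_km.
    The exponent `power` is an assumption, not taken from the text.
    """
    num = den = 0.0
    for lat, lon, value in stations:
        d = haversine_km(target[0], target[1], lat, lon)
        if d > radius_km:
            continue
        if d == 0.0:
            return value  # co-located station: use its value directly
        weight = 1.0 / d ** power
        num += weight * value
        den += weight
    return num / den if den > 0 else None


def fill_short_gaps(series, max_gap=3):
    """Linearly interpolate runs of None that are at most max_gap long.

    Longer gaps and gaps at the start or end of the series stay None.
    """
    filled = list(series)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        gap = i - start
        if start > 0 and i < n and gap <= max_gap:
            left, right = filled[start - 1], filled[i]
            for k in range(gap):
                filled[start + k] = left + (right - left) * (k + 1) / (gap + 1)
    return filled
```

Gaps longer than three hours are deliberately left untouched, so that sequences containing them can be excluded later instead of being trained on invented values.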

Dataset Description
As described in the previous section, the statistical analysis of the dataset suggests that recurring patterns over a year, month, week or day are characteristic. To provide this information to the network, cyclical time features were created by encoding the month, day, weekday, hour and wind direction as sine and cosine values, so that the network can recognize, for example, that hour 23 is close to hour 0. The weekend effect described above also applies to work-free days in general, so a flag was created that states whether the considered day is a holiday. For the training dataset, a mean normalization was applied to each feature. Overlapping sequences were created, each consisting of 168 hours (one week) of past pollutant load, time and weather data as input for the encoder, 48 hours of subsequent weather and time data as input for the decoder, and 48 hours of corresponding pollutant load labels. For splitting these data into training, validation and test data sets, there was a trade-off between the amount of randomization, and hence the least possible bias in the split, and the amount of training data lost due to the splitting, because the 216-step sequences have to be taken from continuous time regions and are cut off at the borders between regions assigned to different data sets. We decided to split the data by month: for each station and each year, one randomly chosen month goes to the validation set, one to the test set, and the rest to the training set. The ratio of training, validation and test data is thus about 10/12 to 1/12 to 1/12. For all stations in NRW, this added up to 2,299,349 samples in the training set, 169,523 in the validation set and 168,183 in the test set. These sequences have been shuffled and batched before each training epoch.
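The cyclical encoding, the overlapping 168 + 48 hour windows and the month-wise split can be sketched as follows. This is a minimal illustration; the function names, the seed handling and the index-based window representation are assumptions and not the authors' code.

```python
import math
import random


def cyclical(value, period):
    """Encode a periodic value (e.g. hour with period 24) as (sin, cos)."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def make_windows(n_steps, past=168, horizon=48):
    """Index triples (encoder_start, decoder_start, end) of all overlapping
    windows in one continuous region of n_steps hourly values."""
    total = past + horizon
    return [(s, s + past, s + total) for s in range(n_steps - total + 1)]


def split_months(seed):
    """Pick one validation and one test month at random for one station and
    year; the remaining ten months go to training (hypothetical helper)."""
    rng = random.Random(seed)
    val_month, test_month = rng.sample(range(1, 13), 2)
    train = [m for m in range(1, 13) if m not in (val_month, test_month)]
    return train, val_month, test_month
```

A continuous region of n hours yields n - 215 sequences, so every border between regions assigned to different data sets costs up to 215 potential samples. This is the trade-off between randomization and lost training data mentioned in the text.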

Training
Three separate models have been trained: One on data from all the NRW-stations producing a forecast of NO, NO2 and PM10, the second on data from only the NRW stations where O3 measurements were available, producing a forecast of O3 in addition to the other values and the third model for the four stations in Stuttgart, producing a forecast of NO2.
Due to the unknown complexity of dependencies between the input parameters, many different combinations of hyperparameters were explored, as shown in Table 2.
Table 2. Explored parameters of the network and the training
Learning rate: 0.0001-0.01

Technical Details
The server for the analysis runs Ubuntu Linux 18.04.1. All of the programming for the project was done in Python 3.8. The data are stored in a PostgreSQL database and accessed via the Python library SQLAlchemy. Data preparation and preprocessing were done mainly using NumPy and pandas. The neural network model, training and prediction were implemented with PyTorch 1.4.

Results
The most important evaluation metric to compare the models is the mean absolute prediction error (L1) on the test data set. Additionally, the produced curves were evaluated visually to see how well they capture the variability of the data. The following combination of hyperparameters yielded optimal results according to these criteria.
[Table: selected hyperparameter combination; contents not recoverable]
Figure 9 shows an example of a PM10 prediction where the measured values were below the minimum detection threshold of 10 µg/m³ PM10. Here, the network predicts a smooth curve but stays close to the constant minimum. The initial results of the trained model are consistent with the studies discussed in chapter 2. By adding the weather data to the model, the influence of the individual datasets becomes apparent and the prediction becomes more accurate.
To assess the influence of the weather conditions on the prediction quality, the model was run without weather data.
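The mean absolute prediction error (L1) used as the main metric above can be computed as in the following sketch. The per-horizon variant is an addition for illustration only and is not described in the text; it would show how the error develops over the 48 forecast hours. Function names are hypothetical.

```python
def mean_absolute_error(predicted, measured):
    """Mean absolute error (L1) between two equally long value sequences."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(predicted)


def mae_per_horizon(predicted_seqs, measured_seqs):
    """Mean absolute error per forecast step, averaged over all sequences.

    Both arguments are lists of equally long sequences (one per sample).
    """
    horizon = len(predicted_seqs[0])
    return [
        mean_absolute_error(
            [seq[t] for seq in predicted_seqs],
            [seq[t] for seq in measured_seqs],
        )
        for t in range(horizon)
    ]
```

Comparing the value returned for a model trained with weather data against one trained without it is one simple way to quantify the influence of the weather inputs.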

CONCLUSION AND FUTURE RESEARCH
In this study, we have successfully developed and trained a basic sequence-to-sequence model to forecast air pollutant concentrations at the location of ambient air monitoring sites. An initial evaluation against measurements (target) shows a fairly good agreement, also with regard to the mean absolute prediction error.
In a next step, the model shall be further improved, first by making use of additional data sources, namely topographic information, satellite-based observations and traffic data. Second, further architectural refinements shall be explored, such as adding residual connections and applying an attention mechanism. Third, the model shall be adapted to perform a forecast at arbitrary locations across the considered regions. For this, the network shall be retrained on the station locations, but this time the pollutant concentrations will be removed from the encoder input, and additional data from other sources, namely topography, land use, weather, traffic and satellite observations of trace gases, will be added. The measurement stations will provide real pollutant load values as labels for the training. Since the topographical, satellite, traffic and weather data are available throughout the regions concerned, the network can then perform a forecast anywhere in those regions. The co-design with local experts and administrative authorities ensures a system tailored to their needs and fosters the applicability and acceptance of the system. Further research comprises the integration of additional data sources, the integration of the forecast capabilities into local applications, and the use of the trained model to simulate urban air quality under different scenario conditions.