SEASONALITY DEDUCTION PLATFORM: FOR PM10, PM2.5, NO, NO2 AND O3 IN RELATIONSHIP WITH WIND SPEED AND HUMIDITY

Air pollution poses a risk to human and ecosystem health. A comprehensive understanding of the parameters that generate pollution and govern its behaviour over time is essential for devising functional policies that minimise pollutant concentrations. The interplay between pollution parameters and meteorological data, and the relationships between them, have been a focus of researchers planning better future cities. A thorough study of resource utilisation is required to frame effective sustainable development, manage government policies, and make public services more convenient. To protect environmental quality, renewable resources such as solar and wind are increasingly incorporated into techniques that support better city planning. This paper considers hourly time series of Particulate Matter (PM) PM2.5 and PM10, Nitrogen Oxide (NO), Nitrogen Dioxide (NO2), and Ozone (O3), along with measured wind flow and humidity. The objective of this study is to assess the temporal seasonality patterns of these parameters in Stuttgart, Germany. The temporal variations over the city center of Stuttgart are analysed using an unsupervised approach that performs seasonal hierarchical clustering on the parameters NO, NO2, O3, PM10, PM2.5, wind speed, and humidity. Furthermore, the correlations between meteorological and pollution parameters are analysed using the Spearman rank correlation method. Moreover, a dashboard is developed to visualise these parameters over a user-desired time frame. The proposed work provides empirical meaning and a seasonality comparison among the above-mentioned parameters, combined with interactive dashboard support. The analysis of the presented results demonstrates the relationship between air pollutants, wind, and humidity within a combined temporal frame. Thus, it would give city planners and policy makers advance knowledge of the seasonality of meteorological and pollution conditions.


INTRODUCTION
Human activities have contributed not only to lifestyle advancement and development but also, as byproducts, to pollution and climate change. Very small airborne pollutants are discharged from chimneys, industrial waste, vehicle exhaust, and construction sites; they can be inhaled with the air and lead to heart disease and lung and respiratory problems all over the world. Traffic-related pollutants such as Particulate Matter (PM) PM2.5 and PM10, Nitrogen Oxide (NO), Nitrogen Dioxide (NO2), and Ozone (O3) remain at high levels. Air quality is affected by NO, NO2, O3, PM10, and PM2.5 and their atmospheric concentrations. Exposure to PM10 and PM2.5, i.e., particles with aerodynamic diameters less than 10 and 2.5 µm, respectively, can cause lung tissue damage and cardiovascular and chronic respiratory diseases (Chen and Zhao, 2011). Over urban areas, elevated levels of pollution parameters are associated with both local emission sources and regional transportation (Chen and Zhao, 2011, Jasen et al., 2013). Regional transportation with diesel vehicles is the main source of particulate matter and contributes a significant portion of its levels (Wallace and Hobbs, 1977, Hardin and Kahn, 1999). Many studies have investigated the seasonality of pollution parameters together with meteorological datasets, e.g., wind speed, wind direction, temperature, humidity, precipitation, and pressure. A study of PM in Ohio, USA, concluded that high concentrations of PM10 and PM2.5 were often detected when wind speeds were lower than 3.5 m/s and the temperature was higher than 21.1°C (Fraser et al., 2003, Arthur and Owen, 2003). Others showed that pollution parameters are correlated with humidity and wind flow during winter (Elminir, 2005). Hien et al. show that wind speed and temperature strongly control the concentration of particulate matter (Hien et al., 2002).
A few studies link pollutant characteristics to meteorological parameters, again through wind effects and humidity (Garrett and Casimiro, 2011). Several of the studies discussed above used smoothing and filtering techniques, which ignore data noise and alter the original temporal dataset. The contribution of meteorological parameters to PM10-2.5, NO, NO2, and O3 remains poorly understood. The research above suggests that a number of questions remain to be addressed, such as how the temporal behaviour of wind correlates with pollution parameters and how humidity governs the relationships among PM10-2.5, NO, NO2, and O3 for a user-desired time frame, without modifying the authenticity of the original temporal dataset. Better insight into the system can be gained by improving human interaction with meteorological data (Harbola and Coors, 2018) in relation to pollution parameters. This motivates the proposed research. The problem of air pollution has caused considerable public concern in Stuttgart (Germany). Therefore, investigations into the spatio-temporal variation of concentrations of PM and gaseous pollutants across Stuttgart are necessary. To keep track of them, the mass concentrations of PM10-2.5, NO, NO2, and O3 have been monitored in all important cities of Germany. Data from the provincial and more effective central weather monitoring in Stuttgart were selected. Temporal variations of meteorological and pollution parameters were assessed, and their trends of variation with respect to each other over time were investigated for Stuttgart. Thus, unsupervised hierarchical clustering and a correlation method that work on the original temporal datasets, taking the gaps listed above into consideration, are still required.
Therefore, the current study proposes hierarchical clustering and the Spearman rank correlation method, with the following contributions: (i) in-depth temporal analysis of pollution and meteorological parameters using hierarchical clustering, without applying any smoothing or noise removal technique to the collected temporal dataset, (ii) a user-defined time frame of analysis, (iii) dendrogram and heat map visualisation of the temporal dataset to highlight the behavior of these parameters and to enhance accuracy, and (iv) a comparative study of the pollution parameters and their effects with an interactive dashboard view. The proposed work would provide foreknowledge of how meteorological parameters behave in relation to the pollution parameters of an area, thereby supporting the optimal selection of green sites while highlighting and tuning air quality. This would encourage greater utilisation of renewable energy for safe and better city planning, which in turn would help the efficient management and development of the city's green resources. With this analysis, increasing air pollution in big industrial cities could be flagged and reduced in the future. The remainder of the paper is organised as follows: the proposed methods and the datasets employed are discussed in Section 2 and Section 3, respectively; Section 4 presents the results and discussion, followed by the conclusion in Section 5.

METHODOLOGY
The proposed method analyses seasonality in seven parameters using hierarchical clustering and Spearman rank correlation. Initially, the values of each parameter are preprocessed before clustering is applied. The preprocessing involves normalising the data, followed by temporal filtering. The mean and standard deviation of a parameter are calculated; the mean is then subtracted from each value of the parameter and the result is divided by the standard deviation to obtain the normalised value. Temporal filtering is then applied to these normalised values. In the current study, the temporal filtering is based on four quarters of a year: the first quarter Q1 is spring (March to May), the second quarter Q2 is summer (June to August), the third quarter Q3 is autumn (September to November), and the fourth quarter Q4 is winter (December to February). This division into four quarters supports in-depth seasonality analysis of the seven parameters considered. Unsupervised agglomerative hierarchical clustering is applied to the temporal dataset (values) of a quarter (i.e., the output of temporal filtering). The proximity matrix in hierarchical clustering helps to identify the similarity of clusters, and the most similar clusters are combined hierarchically until the desired number of clusters is obtained. Ward's method minimises the within-cluster variance by using the error sum of squares as its objective function (Ward, 1963). At each step, the pair of clusters whose merging leads to the minimum increase in total within-cluster variance is sought. This increase is a weighted squared distance (D) between cluster centers (Ai, Aj), as shown in Equation 1 (Cormack, 1971). To provide a more detailed comparison and analysis of seasonality trends, each quarter is considered for all the parameters, and each month in a quarter is divided into two sets: the first 15 days and the last 15 days.
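The preprocessing and clustering steps described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions and not the implementation used in this study: the function names (`zscore`, `quarter_of`, `ward_cost`, `ward_clusters`) are hypothetical, the clustering works on one-dimensional values only, and the merging cost is assumed to take the usual Ward form D(Ai, Aj) = |Ai||Aj| / (|Ai| + |Aj|) · ||ci − cj||², where ci and cj are the cluster centers.

```python
from datetime import date
from statistics import mean, pstdev


def zscore(values):
    """Normalise: subtract the mean, then divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    return [(v - mu) / sigma for v in values]


def quarter_of(day):
    """Map a date to the seasonal quarter defined in the text."""
    if day.month in (3, 4, 5):
        return "Q1"  # spring
    if day.month in (6, 7, 8):
        return "Q2"  # summer
    if day.month in (9, 10, 11):
        return "Q3"  # autumn
    return "Q4"  # winter: December, January, February


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if 1-D clusters a and b merge."""
    return len(a) * len(b) / (len(a) + len(b)) * (mean(a) - mean(b)) ** 2


def ward_clusters(values, k):
    """Naive agglomerative clustering: merge the cheapest pair until k remain."""
    clusters = [[v] for v in values]
    while len(clusters) > k:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda p: ward_cost(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

For example, `ward_clusters([0.0, 0.1, 5.0, 5.1, 10.0], 3)` groups the two values near 0, the two values near 5, and the single value 10. The pairwise search is quadratic per merge and is meant only to make the merging rule visible; in practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"` would normally be preferred.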
Ward's method keeps the within-cluster sum of squares as small as possible at every merge, and the merging cost gives a hint about the number of clusters: the number of clusters is reduced until the merging cost increases, and the cluster number right before that increase is used (Paul and Murphy, 2009). Moreover, a dendrogram is used to obtain the final number of clusters, k. The dendrogram is a tree-like diagram produced by agglomerative hierarchical clustering that records the sequence of merges or splits. In addition, Spearman rank correlation analysis between the meteorological and pollution parameters helps to derive the relationship among these parameters. Spearman rank correlation is defined in Equation 2, where d² is the squared difference between the ranks of paired measurements, ρ is the correlation coefficient, n is the number of measurements, and k is the number of clusters.
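As an illustration of the rank correlation step, the sketch below computes the Spearman coefficient in pure Python. It assumes the standard form ρ = 1 − 6 Σ d² / (n(n² − 1)), which is exact only when there are no tied values; the helper names `ranks` and `spearman_rho` are hypothetical and are not taken from the study's code.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            result[idx] = average_rank
        pos = end + 1
    return result


def spearman_rho(x, y):
    """Spearman coefficient via 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))
```

Two series that rise together give a coefficient of 1, and a series paired with its reverse gives −1. When the data contain many ties, computing the Pearson correlation of the ranks (as library routines such as `scipy.stats.spearmanr` do) is the safer choice.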
Moreover, an interactive dashboard is developed to provide in-depth analytics and a clear view of the seasonality patterns between the meteorological and pollution parameters, for user-desired inputs in the four time quarters. This dashboard is called the seasonality analysis kit. The user can select the parameters over the desired time frame and compare the patterns interactively. The interactive dashboard is still in its first phase and will be refined in future work. The developed work provides a comprehensive understanding of the relationship among the pollution parameters NO, NO2, O3, PM10, and PM2.5 and the meteorological parameters wind flow and humidity.

DATASET
Pollution and meteorological temporal datasets for Stuttgart are used in this study. The historical data from 2015 to 2019 are taken from central Stuttgart station sensor 1, at the corner of Hauptstaetter Strasse, 70173 Stuttgart. This dataset contains wind speed, wind direction, and humidity along with NO, NO2, O3, PM10, and PM2.5, with temporal information attached at a 30-minute interval. Where a parameter has multiple values in a single day, the mean value is used in this study. Using the time information, the area's dataset is organised into individual months, with past data first followed by current data, and is then subdivided into the four quarters Q1, Q2, Q3, and Q4. This supports the seasonality test and in-depth analysis of the pollution and meteorological temporal datasets.
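The aggregation described above (daily means from 30-minute readings, then organisation by month in chronological order) could be sketched as follows. The input layout, a list of `(timestamp, value)` pairs for one parameter, and the function names `daily_means` and `group_by_month` are assumptions made for illustration; the actual file format of the station data is not specified here.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_means(readings):
    """Average all readings of one parameter that fall on the same calendar day."""
    by_day = defaultdict(list)
    for timestamp, value in readings:
        by_day[timestamp.date()].append(value)
    # Sorting the days keeps past data first, followed by more recent data.
    return {day: mean(values) for day, values in sorted(by_day.items())}


def group_by_month(daily):
    """Organise daily means by (year, month), oldest month first."""
    months = defaultdict(dict)
    for day, value in daily.items():
        months[(day.year, day.month)][day] = value
    return dict(sorted(months.items()))


# Hypothetical 30-minute readings for a single parameter.
sample = [
    (datetime(2015, 1, 1, 0, 0), 10.0),
    (datetime(2015, 1, 1, 0, 30), 20.0),
    (datetime(2015, 2, 1, 0, 0), 30.0),
]
monthly = group_by_month(daily_means(sample))
```

Each monthly group can then be assigned to its quarter (Q1 to Q4) using the month number.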

RESULTS
The proposed seasonality analysis was implemented in Python and executed with four cores on an Intel® Core™ i7-4770 CPU @ 3.40 GHz. Stuttgart's historical data for 2015 to 2019, with a temporal resolution of 30 min, were separated by month to create monthly data over the years for both meteorological and pollution parameters. Figure 1 and Figure 2 show the values recorded per day over 2015 to 2019 as heat maps for humidity and NO2, respectively. In these heat maps, the intensity of the color is governed by the magnitude of the parameter values. Similar heat maps exist for the other parameters as well. For any selected parameter (wind speed, humidity, NO, NO2, O3, PM10, or PM2.5), higher values over time are assigned a darker color in the respective heat map. An unsupervised approach was used to perform comprehensive seasonal hierarchical clustering on the series of meteorological and pollution parameters. Seasonality was analysed on the basis of four quarters (Q1, Q2, Q3, Q4) over the years. In performing the hierarchical clustering, k was taken as 6. This value of k was found empirically through sensitivity tests: (i) if the value of k was higher (i.e., the number of clusters was equal to the total number of values in a quarter), then the clustering outcome was similar to Figure 1 and Figure 2 and could not represent the seasonality pattern, (ii) if the value of k was lower (i.e., k = 1, 2, 3, 4), there was also information loss, and (iii) the dendrograms generated as output of the unsupervised hierarchical clustering were used primarily to allocate objects to clusters in the best possible way. Figure 3 shows the dendrogram obtained for selecting the possible number of clusters in the temporal dataset, with humidity taken as the example. Similar analyses were conducted for the rest of the parameters. Therefore, k was taken as 6 in the present study. The unsupervised hierarchical clustering aimed at inferring the inner structure and trends present within the meteorological and pollution data by clustering them into six classes according to the similarities among them.
To provide a more detailed comparison and analysis of seasonality trends, quarterly time frames were considered for all the parameters. Further, each quarter was divided into two parts, comprising the first fifteen days and the last fifteen days of a month. This helped in discovering all the possible changes in the quarter for each of the considered parameters. The outputs of the in-depth unsupervised clustering analysis are shown for NO2, where Figure [...] Q2 respectively. Similarly, Figure 8 and Figure 9 depict the clustering outputs for NO2 for the first and last 15 days in Q3, and Figure 10 and Figure 11 in Q4, respectively. As with the hierarchical clustering developed for NO2, similar outputs were generated for the other parameters.
ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume VI-4/W2-2020, 2020 5th International Conference on Smart Data and Smart Cities, 30 September -2 October 2020, Nice, France
Further, correlation analysis between the meteorological and pollution parameters was carried out to help derive the relationships among these parameters. Figure 12 helps in studying the complex relationships among the parameters. In addition, the user can select the parameters over the desired time frame and compare the patterns interactively with the help of the developed dashboard. Screenshots of the proposed dashboard are shown in Figure 13, where wind speed was selected as an example parameter with respect to Q1, Q2, Q3, and Q4 over the years to visualise seasonality. Further parameters can be selected from the seasonality analysis kit in the same way.

Discussion
The hierarchical cluster analyses for the meteorological and pollution parameters highlight the trends with which any given pair of quarters (over the years) join together in the clustering diagram, with each class assigned a specific color code. A sequential ColorBrewer Blues color map is used to show the classes (0 to 5), with the color intensity differentiating low-value classes from high-value classes. The blended progression, typically of a single hue, from the least to the most opaque shade represents low to high values. For each year's dataset of a given parameter over the four quarters, those that join together sooner in the clustering are more similar to each other than those that join later. The total within-cluster variance is minimised during clustering. At each step, the pair of clusters with the minimum between-cluster distance is merged. As a result, it is observed that NO and NO2 concentrations are high in Q3 (autumn) and Q4 (winter) over 2015 to 2019 (Figure 8, Figure 9, Figure 10, and Figure 11). Both are strongly correlated with each other, with similar trends over the years; the same can be seen in the correlation graph in Figure 12. Comparing Figure [...] 2015 to 2019. These statements also confirm that O3 and NO2 are negatively correlated with each other, which supports the correlation obtained in Figure 12. Further, the O3 concentration analysis for Q1 is shown in Figure 14 and Figure [...] Figure 26, and Figure 27.
This shows that humidity is negatively correlated with O3 but positively correlated with NO2, which is also supported by the correlation graph in Figure 12. The humidity clustering output shows that in Q1 (Figure 22 and Figure 23), humidity measurements were highest from the 10th to the 12th and from the 26th to the 27th over 2015 to 2019. As shown in Figure 24 and Figure 25 for Q2, the highest humidity was measured from the 3rd to the 4th, from the 9th to the 11th, and from the 28th to the 29th over 2015 to 2019. Moreover, for Q3, humidity measured lowest from the 10th to the 15th in 2015, with sudden increasing spikes in 2016 to 2019 (in Figure 26, and Figure [...] In Q4, PM10 concentration measured highest from the 2nd to the 4th and from the 13th to the 15th in 2015 to 2019. The interpretations drawn from the analyses above provide a quick cross-check of the facts, supporting the view that the present air quality situation in the city of Stuttgart is alarming and that more control measures are probably required. In addition, the correlation analyses performed on the pollution and meteorological datasets helped to uncover the important interrelationships and also supported the outcomes of the clustering analyses. Figure 12 contributes the following important points: (i) NO and NO2 are 77% positively correlated with each other, 27% positively correlated with PM10-2.5, and 53% negatively correlated with wind speed, (ii) O3 is 50% positively correlated with wind speed, 77% negatively correlated with NO and NO2, and 27% negatively correlated with PM10-2.5, (iii) humidity is 27% positively correlated with NO and NO2 and 50% negatively correlated with O3, (iv) wind speed is 27% negatively correlated with PM10-2.5, and (v) PM10 and PM2.5 are more than 87% positively correlated with each other. Moreover, the developed seasonality analysis kit provides interactive selection of the considered meteorological and pollution parameters, in order to analyse the patterns that occur in the dataset within a time-based frame over the years.
Currently, the designed dashboard is in its first phase, with a color-based clustering display for each quarter over the years. This has helped to make the seasonality analyses easy, interactive, and comparable in the time domain.

CONCLUSION
Meteorological data have attracted the attention of researchers in smart city planning for the thorough utilisation and management of resources, which helps in effective government management, convenient public services, and sustainable industrial development. A renewable energy supply would provide a healthy and amiable city and increased welfare in more general terms.
To ensure incorporation into the planning process, the renovation of existing planning is indeed the most promising field for climate-related intervention. From a designer's perspective, the authors have stressed the need to include energy-conscious strategies to improve environmental quality. The integration of new knowledge and innovative technologies into sustainable transformation is the motive of this paper. The interpretations drawn from the analyses above provide a quick cross-check of the facts, supporting the view that the present air quality situation in the city is alarming and that more control measures are probably required. The interactive dashboard, the seasonality analysis kit for meteorological and pollution parameters, would help in planning the future with more green policies. The dashboard designed in this work could be further improved by including more parameters and a longer historical dataset. The authors' future focus will be to improve the analysis and to use its outputs in interactive visual analysis dashboard web applications. Moreover, other methods such as decision trees, neural networks, and association rules will be explored in subsequent research for an in-depth understanding of the temporal relationships among the parameters. Meanwhile, the devised seasonality analysis of meteorological and pollution parameters over the years has the potential to support the selection of better government-backed green policies, to create environmental awareness, and to provide foreknowledge for better city planning.