THE INFLUENCE OF LEVELS OF DETAIL (LOD 0-2) AND BUFFER SIZES ON PARAMETER EFFECTIVENESS FOR FINE DUST DISTRIBUTION MODELLING

Building models represented in CityGML Levels of Detail 0 to 2 were used to calculate urban morphological parameters and to test how effectively these parameters correlate with the measured total number concentration of fine dust in Berlin. Land use regression modelling, an alternative to physically based models, explains the distribution of urban fine and ultrafine particles by applying a multiple linear regression model. Descriptive parameters are identified by high correlations with measured fine dust values. Here, different height information and geometry representations from LoD0-LoD2 were used to calculate six parameters associated with the ventilation and advection capacity of an urban environment (‘averaged heights of buildings’, ‘height-width ratio’, ‘porosity’, ‘frontal area index’, and ‘building surfaces’ for wall surfaces and for wall and roof surfaces). Parameters were correlated with measurements of the total number concentration of fine dust in the city of Berlin. Initial results show ambivalent correlations with respect to both the buffer size and the level of detail of the building representation used to implement the parameters.


INTRODUCTION
Exposure to traffic-generated fine and ultrafine particles (UFP) has been shown to significantly increase health risks in urban areas (WHO, 2013). Daily UFP concentrations are associated with morbidity, cardiovascular diseases, hospital admissions, and respiratory symptoms (Stolzel et al., 2007). This subject was recently brought to public attention by the so-called Volkswagen emissions scandal (German: Dieselaffäre). UFP mitigation measures range from vehicle software updates to comprehensive traffic bans, as demanded by the Deutsche Umwelthilfe (DUH, German Environmental Relief). The latter, however, would have tremendous negative economic impacts on diesel vehicle owners and the automobile industry. Therefore, locally differentiated optimization strategies for the reduction of emissions are necessary, which in turn call for fast modelling of fine dust distributions. Land use regression (LUR) models offer a practicable alternative to physically based aerosol models for modelling the distribution of fine and ultrafine particles. While the latter need complex input data and considerable computing time, LUR models are multiple linear regression models that use spatial explanatory parameters to calculate pollutant concentrations at specific locations (Hoek et al., 2008; Mercer et al., 2011). They may explain the small-scale variation of air pollutants as well as dispersion models do (Marshall et al., 2008; Beelen et al., 2010). Since LUR requires only a limited number of explanatory parameters and short computing times, it is also able to explain the spatial distribution of urban UFP concentrations for entire cities (Wolf et al., 2017; Abernethy et al., 2013). LUR may also be used to analyse the intra-urban variation of UFP concentrations (Henderson et al., 2007) or the variation in particle number size distributions (Ghassoun et al., 2015b). The quality of LUR model output strongly depends on the relationship of each explanatory variable with the response variable, i.e. the UFP concentration at a certain point. Different 2D, 3D, semantic, and wind parameters have been used to develop LUR models (e.g. Ghassoun et al., 2015a; Ghassoun and Löwner, 2017a; Shi et al., 2017). The spatial parameters are extracted using different buffer sizes around each site (Hoek et al., 2008; Vienneau et al., 2010).
Ghassoun et al. (2015b) presented a further improved LUR approach. They decomposed the analysis of particle concentrations into a 'process chain' of particle emission, dilution, and deposition, and classified the most relevant explanatory parameters into the corresponding process domain. The emission process relates to the emission of UFP, e.g. by traffic, while the deposition process represents the removal of UFP on respective areas such as green areas or wall surfaces within a buffer. The dilution process relates to the ventilation capacity of a city, which is determined by the interaction of wind direction, wind speed, and urban morphology (Hang et al., 2009; Edussuriya et al., 2014). These interactions lead to specific spatial distribution patterns of urban UFP concentrations. Next to the explanatory variables of emission, dilution parameters seem to be the most important descriptive parameters for the development of LUR models in an urban area (Ghassoun et al., submitted). Therefore, urban morphological parameters, i.e. geometry, should be explored in greater depth to determine the effectiveness of descriptive parameters, including their spatial reach and their parameterization. However, a specific descriptive concept may be parameterized in different ways. These differ, first, in the dimension and, second, in the level of detail (LoD) of the analysed geometries, i.e. ground surface representations, blocks models, or building models with modelled roof structures. Such an investigation therefore requires a data structure that represents an urban area in different LoDs.
The City Geography Markup Language (CityGML) is an open application model for the representation, storage, and exchange of virtual 3D city models. It was published as an Open Geospatial Consortium (OGC) encoding standard, version 2.0, in March 2012 (Gröger et al., 2012). CityGML is implemented as an application schema of the extensible Geography Markup Language (GML 3.1.1) (Cox et al., 2004). CityGML has been used in a wide range of applications (Biljecki et al., 2015; Löwner et al., 2013b), e.g. energy estimation (Kaden and Kolbe, 2013; Bruse et al., 2016) or fine dust distribution modelling (Ghassoun and Löwner, 2017). A main quality of CityGML is its Level of Detail (LoD) concept, which allows CityGML features to be generalized from a very detailed description including indoor features (LoD4) to a less detailed representation of a building by a planar ground surface (LoD0). The concept is described in detail in Löwner et al. (2013a). The most important changes are the increase in dimension from two to three from LoD0 (ground surface representation) to LoD1 (blocks model) and the change of shape and, therefore, of surface area of a building from LoD1 to LoD2 and LoD3, respectively. Like the parameterization of descriptive parameters, these specific changes in the representation of buildings could directly influence the correlation between measurements and descriptive parameters. However, the extent to which LoD representations influence the results of fine dust distribution modelling has never been investigated. The same applies to the change of influence of a certain parameter across different buffer sizes.
Here, we present first results on the influence of the level of detail of CityGML building representations on selected parameters, calculated in different buffer sizes, for the modelling of fine dust distribution in urban environments.

Data basis
Two datasets, a measurement dataset of fine and ultrafine particle concentrations and a CityGML 2.0 3D city model, were used to analyse the influence of the level of detail and the buffer size on fine dust distribution modelling.
Figure 1. Research area in the middle of Berlin. Black dots indicate measurement sites and buffer centres, respectively (adapted from Ghassoun and Löwner, 2017b).
Mobile measurements of particulate air pollutants were conducted at 25 sites in an area of 1 × 2 km in the city of Berlin that is mainly characterized by office buildings and residential areas (rf. Fig. 1). At each site, the average total number concentration was calculated over 1 minute. Measurements were carried out during six campaigns in winter (January 2015) under stable weather conditions without rain and with wind speeds below 4 m s⁻¹ (Ghassoun and Löwner, 2017b). Total number concentrations (TNC) were measured with a hand-held particle counter (TSI 3007) that detects particles from 10 nm to about 1000 nm in size, with a measurement range of 0 to 100,000 particles cm⁻³.
For the research area, LoD1 and LoD2 CityGML data were acquired from the open data portal of the city of Berlin (https://fbinter.stadt-berlin.de/fb/index.jsp) to analyse the descriptive variables. The LoD0 representation was extracted from the LoD1 data.

General data analysis
Data extraction was performed directly on the CityGML files using pure Python scripts based on the xml.etree package, without a database or spatial libraries. Parameters were calculated either in ArcGIS for LoD0 and LoD1 representations or with self-written Python scripts for LoD1 and LoD2 representations. Different height information was gathered to reconstruct LoD1 building representations from ground surfaces. First, the 'measured height' (LoD1_meas) was taken from the CityGML LoD1 dataset; it represents the highest point of the real-world building. Second, the 'envelope height' (LoD1_env) was calculated from the building's LoD1 envelope by subtracting the z-value of the lower corner from the z-value of the upper corner.
According to the metadata, the latter is expected to represent the median of photogrammetrically derived height points of the building's roof structure. Third, a 'LoD2 calculated' height (LoD2_calc) was derived from the LoD2 dataset by subtracting the lowest from the highest z-value of all surfaces belonging to a building. Although these different height values are expected to be highly correlated, all of them were analysed where reasonable.
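The three height measures can be read with xml.etree roughly as follows. This is a minimal sketch, not the authors' script: it assumes CityGML 2.0 namespaces and 3D coordinates (x y z) in gml:pos/gml:posList, and the function names are hypothetical. LoD1_meas and LoD1_env would be taken from the LoD1 file and LoD2_calc from the LoD2 file, joined by gml:id (assuming the IDs match across both datasets).

```python
import xml.etree.ElementTree as ET

# CityGML 2.0 / GML 3.1.1 namespaces (assumed to match the input files).
NS = {
    "core": "http://www.opengis.net/citygml/2.0",
    "bldg": "http://www.opengis.net/citygml/building/2.0",
    "gml": "http://www.opengis.net/gml",
}
GML_ID = "{http://www.opengis.net/gml}id"


def _z_values(text):
    """z-coordinates of a 3D gml:pos or gml:posList (x y z x y z ...)."""
    coords = [float(v) for v in text.split()]
    return coords[2::3]


def building_heights(building):
    """Return (LoD1_meas, LoD1_env, LoD2_calc) of a bldg:Building; None if absent."""
    # LoD1_meas: value stored in bldg:measuredHeight
    el = building.find("bldg:measuredHeight", NS)
    meas = float(el.text) if el is not None else None

    # LoD1_env: z(upperCorner) - z(lowerCorner) of the building's envelope
    lo = building.find("gml:boundedBy/gml:Envelope/gml:lowerCorner", NS)
    hi = building.find("gml:boundedBy/gml:Envelope/gml:upperCorner", NS)
    env = None
    if lo is not None and hi is not None:
        env = _z_values(hi.text)[0] - _z_values(lo.text)[0]

    # LoD2_calc: highest minus lowest z over all thematic boundary surfaces
    zs = []
    for tag in ("gml:posList", "gml:pos"):
        for el in building.iterfind(".//bldg:boundedBy//" + tag, NS):
            zs.extend(_z_values(el.text))
    calc = max(zs) - min(zs) if zs else None
    return meas, env, calc


def heights_from_root(root):
    """Map gml:id -> (meas, env, calc) for all buildings below `root`."""
    return {b.get(GML_ID): building_heights(b)
            for b in root.iterfind(".//bldg:Building", NS)}


def heights_from_file(path):
    """Parse a CityGML file and return its building heights."""
    return heights_from_root(ET.parse(path).getroot())
```

Restricting the LoD2 search to bldg:boundedBy keeps the envelope corners (which sit under gml:boundedBy) out of the z-range.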
Except for the height information analysis in sec. 3.1 and the 'frontal area index' in sec. 2.3.4, all parameters were analysed in buffer sizes of 50 m and 100 m-1000 m (in steps of 100 m) and correlated with the measurements of the total number concentration from Ghassoun and Löwner (2017b). The measurement sites served as buffer centres.
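The buffer-wise correlation can be sketched as below. The text does not name the correlation coefficient, so Pearson's r is assumed here; the `parameter` callback and all function names are hypothetical placeholders for any of the parameter calculations described in this section.

```python
import math

# Buffer sizes used in the study: 50 m and 100 m-1000 m in steps of 100 m.
BUFFER_SIZES = [50] + list(range(100, 1001, 100))


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_profile(parameter, sites, tnc):
    """Correlate a buffer parameter with the measured TNC for every buffer size.

    parameter(site, size) is a hypothetical callback returning the parameter
    value in the buffer of the given size centred on the measurement site;
    tnc holds the measured total number concentrations in the order of `sites`.
    """
    return {size: pearson([parameter(s, size) for s in sites], tnc)
            for size in BUFFER_SIZES}
```

Plotting the returned dictionary over the buffer sizes gives curves of the kind shown in Figures 9 to 14.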

'Averaged heights of buildings':
The 'averaged height of buildings' near a measurement site is used as a simple descriptor of the dilution capacity of an area. Averaged heights derived from the different height information (LoD1_meas, LoD1_env, and LoD2_calc) of all buildings in a respective buffer were correlated with the measured TNC values in the research area.
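A minimal sketch of this parameter follows. How buildings were assigned to a buffer is not stated, so the sketch assumes that a building belongs to a circular buffer if its footprint centroid lies inside it, and it treats the buffer size as a diameter (as suggested by the results section); the `Building` class is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Building:
    x: float       # footprint centroid easting (m)
    y: float       # footprint centroid northing (m)
    height: float  # one of LoD1_meas, LoD1_env or LoD2_calc (m)


def in_buffer(b, cx, cy, size):
    """True if the centroid lies in a circular buffer of diameter `size`."""
    return math.hypot(b.x - cx, b.y - cy) <= size / 2.0


def averaged_height(buildings, cx, cy, size):
    """Mean height of all buildings in the buffer (NaN if it is empty)."""
    heights = [b.height for b in buildings if in_buffer(b, cx, cy, size)]
    return sum(heights) / len(heights) if heights else float("nan")
```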

'Height-width ratio':
The 'height-width ratio' is a further descriptive value for the ventilation potential of a street canyon (Oke, 1987). It is the average, over all buildings within a buffer, of the quotient of the height of the buildings enclosing a street and the width of this street. Following Ghassoun et al. (submitted), this parameter strongly depends on the wind direction (here, east), which determines the direction in which the street width is measured.
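The ratio along lines parallel to the wind direction can be sketched as follows. This is an illustration under simplifying assumptions, not the ArcGIS workflow: each line is given as hypothetical (start, end, height) intervals where it crosses building footprints, every open gap between two consecutive buildings is treated as a street of that width, and the canyon height is taken as the mean of the two enclosing buildings.

```python
def canyon_ratios(intervals):
    """H/W ratios of the open gaps along one line.

    intervals: (start, end, height) tuples of building footprints cut by a
    line parallel to the wind direction, in metres along the line.
    """
    segs = sorted(intervals)
    ratios = []
    for (s0, e0, h0), (s1, e1, h1) in zip(segs, segs[1:]):
        width = s1 - e0
        if width > 0:  # touching or overlapping footprints form no canyon
            ratios.append(0.5 * (h0 + h1) / width)
    return ratios


def height_width_ratio(lines):
    """Average H/W over all lines laid across the buffer."""
    ratios = [r for line in lines for r in canyon_ratios(line)]
    return sum(ratios) / len(ratios) if ratios else float("nan")
```

Because a canyon is crossed by many parallel lines, long canyons are weighted more heavily in this average.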
The height-width ratio was calculated for all buildings within a respective buffer using a fishnet (lines parallel to the wind direction within the buffer) with a resolution of 2 m. The fishnet was laid parallel to the wind direction and intersected with the corresponding buildings. The calculation was performed in ArcGIS for the LoD1 representation of buildings with the different height information (LoD1_meas, LoD1_env, and LoD2_calc).

'Porosity':
'Porosity' can be viewed as a measure of how penetrable an area within a certain buffer is for the airflow (Gál and Unger, 2009). It represents the relation between the penetrable and impenetrable parts of an urban environment.
For 2D data, it is calculated as the total area of ground surfaces within the corresponding buffer divided by the area of this buffer.
In 3D, a three-dimensional buffer was constructed whose height equals that of the highest building within the 2D buffer. 'Porosity' was then calculated as the quotient of the cumulated building volume and the volume of the 3D buffer (rf. Fig. 2). The 3D 'porosity' was calculated for different buffer sizes for LoD1 data only. Different height measures (i.e. LoD1_meas, LoD1_env, and LoD2_calc) were used to calculate the building volumes and the height of the 3D buffer.
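Both quotients can be written down directly. The sketch assumes a circular buffer whose size is its diameter, 3D buffers shaped as cylinders, and footprint areas already clipped to the buffer; the function names are hypothetical. As defined in the text, both values grow with the built-up share; their complement (1 minus the value) is the open, penetrable fraction.

```python
import math


def porosity_2d(clipped_footprint_areas, size):
    """Ground-surface area inside the buffer divided by the buffer area."""
    buffer_area = math.pi * (size / 2.0) ** 2
    return sum(clipped_footprint_areas) / buffer_area


def porosity_3d(clipped_footprint_areas, heights, size):
    """Cumulated LoD1 block volume divided by the volume of a cylindrical
    3D buffer whose height is that of the highest building in the buffer."""
    if not heights:
        return 0.0
    buffer_volume = math.pi * (size / 2.0) ** 2 * max(heights)
    volume = sum(a * h for a, h in zip(clipped_footprint_areas, heights))
    return volume / buffer_volume
```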
Figure 2. 3D buffer and corresponding buildings for the calculation of the 'porosity'.

'Frontal area index':
The 'frontal area index' (also called frontal area density) describes the effect of buildings shielding the wind. It can therefore be used as a descriptive parameter for the dilution capacity of an urban area. The higher it is, the less wind is expected to pass through the buffer area and, as a result, the less fine dust is expected to be removed by the wind. It was calculated in 2D (LoD0) as well as in 3D using LoD1 with different height measures (i.e. LoD1_meas, LoD1_env, and LoD2_calc) and LoD2 data.
In 2D (i.e. for LoD0 data), the 'frontal area surface' is the sum of all non-overlapping projection lines of all buildings within a buffer. The projection is performed onto a line perpendicular to the wind direction whose length equals the buffer diameter (rf. Fig. 3). The 'frontal area index' was then calculated as the quotient of the length of this projected line and the buffer diameter.
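The 2D projection amounts to merging intervals. A minimal sketch, assuming a wind blowing along the x axis (so that the perpendicular projection line is the y axis through the buffer centre) and footprints given as hypothetical (x, y) vertex lists of the buildings already selected for the buffer:

```python
def frontal_area_index_2d(footprints, cy, size):
    """LoD0 'frontal area index' for a wind blowing along the x axis.

    Each footprint is projected onto the y axis, clipped to the buffer
    diameter, and overlapping projections are merged so that no part of
    the line is counted twice. Returns covered length / buffer diameter.
    """
    lo_edge, hi_edge = cy - size / 2.0, cy + size / 2.0
    spans = []
    for poly in footprints:
        ys = [y for _, y in poly]
        lo, hi = max(min(ys), lo_edge), min(max(ys), hi_edge)
        if hi > lo:
            spans.append((lo, hi))
    spans.sort()
    covered, cur_lo, cur_hi = 0.0, None, None
    for lo, hi in spans:
        if cur_hi is None or lo > cur_hi:  # start a new merged span
            if cur_hi is not None:
                covered += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:                              # overlap: extend the current span
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        covered += cur_hi - cur_lo
    return covered / size
```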
In 3D, the 'frontal area surface' is the non-overlapping projected area of the building surfaces facing the wind, projected onto a vertical plane perpendicular to the wind direction whose width equals the buffer diameter and whose height equals that of the highest building within the buffer (rf. Fig. 4). The 'frontal area index' is then this projected area divided by the buffer area (Grimmond et al., 1999; Burian and Ching, 2009).
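Written as formulas, the two variants read as follows, where $A_F$ is the non-overlapping frontal area projected onto the vertical plane normal to the wind, $A_T$ the plan area of the buffer, $L_F$ the length of the merged 2D projection lines, and $D$ the buffer diameter. The expression for $A_T$ assumes a circular buffer.

```latex
\lambda_F = \frac{A_F}{A_T}, \qquad A_T = \pi \left(\frac{D}{2}\right)^{2},
\qquad
\mathrm{FAI}_{2D} = \frac{L_F}{D}
```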
Figure 5. 'Frontal area index' calculation within a buffer using LoD1 data.
For LoD1 data, the Urban Roughness Toolbox, developed by René Burghardt in 2018 for ArcGIS 10.5, was used to estimate the total frontal area in the projected plane normal to the wind direction within the buffer. The tool generates lines parallel to the wind direction at horizontal increments of 2 m. Only the frontal areas first intersected by a line were counted. Hence, wind-facing areas of buildings sheltered by other buildings did not enter the calculation (rf. Fig. 5). Finally, the 'frontal area index' was calculated by dividing the frontal area by the buffer area. Again, different height measures (LoD1_meas, LoD1_env, and LoD2_calc) were used to generate LoD1 blocks models.
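The first-hit rule can be illustrated with a rasterised sketch. It is not the Urban Roughness Toolbox: it assumes an easterly wind (upwind side at large x), simplifies LoD1 footprints to hypothetical axis-aligned boxes (xmin, xmax, ymin, ymax, height) that have already been selected for the buffer, and lets each 2 m strip contribute the height of the first building it meets.

```python
import math


def frontal_area_index_lod1(boxes, cy, size, spacing=2.0):
    """Rasterised LoD1 'frontal area index' for an easterly wind.

    Lines parallel to the wind (constant y) are laid across the buffer
    every `spacing` metres. On each line only the first building met
    coming from the east contributes height * spacing, so facades
    sheltered by upwind buildings are ignored.
    """
    r = size / 2.0
    frontal = 0.0
    for i in range(int(size // spacing)):
        y = cy - r + (i + 0.5) * spacing  # centre of the i-th strip
        hits = [b for b in boxes if b[2] <= y <= b[3]]
        if hits:
            first = max(hits, key=lambda b: b[1])  # largest x = most upwind
            frontal += first[4] * spacing
    return frontal / (math.pi * r * r)  # frontal area / buffer area
```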
For the LoD2 dataset, a three-dimensional point array with a resolution of 1 m was generated to analyse the 'frontal area' and the 'frontal area index'. Its width was taken from the buffer size and its height from the range between the lowest and the highest z-coordinate of all buildings within the buffer. Finally, the point array was moved parallel to the wind direction to the edge of the buffer (rf. Fig. 4). Shadowed points were identified with a brute-force algorithm: all buildings within the specific buffer were identified, all surfaces of those buildings were collected, and the surfaces were split into triangles using a fan triangulation. Each point of the point array was then assigned a direction vector towards the wind direction.
The resulting half-line (ray) was tested against every triangle using the Möller-Trumbore intersection algorithm (Möller and Trumbore, 1997).
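The brute-force shadow test can be sketched with a fan triangulation and the standard Möller-Trumbore ray-triangle test. The function names are hypothetical; note that a fan from the first vertex is only guaranteed to be correct for convex surfaces. With a 1 m grid, the frontal area is approximately the number of shadowed points times 1 m².

```python
def fan_triangulate(ring):
    """Split a polygon ring into triangles (p0, p_i, p_i+1).

    Exact for convex polygons only; a closing vertex equal to p0 is dropped.
    """
    pts = ring[:-1] if len(ring) > 1 and ring[0] == ring[-1] else ring
    return [(pts[0], pts[i], pts[i + 1]) for i in range(1, len(pts) - 1)]


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_hits_triangle(origin, direction, tri, eps=1e-9):
    """Möller-Trumbore test for the half-line origin + t * direction, t > eps.

    Returns the ray parameter t of the hit, or None if there is no hit.
    """
    v0, v1, v2 = tri
    e1, e2 = _sub(v1, v0), _sub(v2, v0)
    p = _cross(direction, e2)
    det = _dot(e1, p)
    if abs(det) < eps:  # ray (nearly) parallel to the triangle plane
        return None
    inv_det = 1.0 / det
    s = _sub(origin, v0)
    u = _dot(s, p) * inv_det  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = _cross(s, e1)
    v = _dot(direction, q) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = _dot(e2, q) * inv_det  # position along the ray
    return t if t > eps else None


def is_shadowed(point, wind_dir, triangles):
    """Brute force: True if the ray from a grid point hits any triangle."""
    return any(ray_hits_triangle(point, wind_dir, tri) is not None
               for tri in triangles)
```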

'Building surfaces':
Since fine dust particles are expected to adhere to surfaces, 'building surfaces', like other surfaces, are regarded as sinks for fine dust and therefore serve as a parameter for the deposition process. Surfaces were calculated for 3D data only. A distinction was made between the area of all wall surfaces and the total surface area, i.e. the sum of all wall and roof surfaces within a buffer. For LoD1 data, 'building surfaces' were calculated for all height measures (LoD1_meas, LoD1_env, and LoD2_calc). Wall surfaces were calculated simply by multiplying the perimeter of each building by its respective height. The footprint area of each building was added as roof surface to obtain the total surface area. For LoD2 data, the IDs of all buildings within the corresponding buffers were identified. For the selected buildings, wall and roof surfaces, respectively, were collected by their CityGML tags. Polygon areas were calculated by applying Stokes' theorem, using an implementation by Bull (2012). However, some polygons did not enter the cumulated area calculation because of errors when calculating their determinants (i.e. NaN results caused by a division by zero). These apparently corrupted polygons were not inspected any further.
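The surface calculations can be sketched as below. The LoD1 part follows the text (perimeter × height for walls, footprint area as roof). For LoD2 polygons the sketch swaps in the vector-area formula, half the magnitude of the summed cross products of consecutive vertices, which also follows from Stokes' theorem; it is not Bull's (2012) implementation. Rings are assumed closed (last vertex equals first), and coordinates are shifted to the first vertex to limit round-off with large projected coordinates.

```python
import math


def perimeter_2d(ring):
    """Perimeter of a closed footprint ring [(x, y), ..., (x0, y0)]."""
    return sum(math.dist(a, b) for a, b in zip(ring, ring[1:]))


def shoelace_area(ring):
    """Plan area of a closed footprint ring (shoelace formula)."""
    return 0.5 * abs(sum(a[0] * b[1] - b[0] * a[1]
                         for a, b in zip(ring, ring[1:])))


def lod1_surfaces(ring, height):
    """(wall area, wall + roof area) of an extruded LoD1 block."""
    wall = perimeter_2d(ring) * height
    return wall, wall + shoelace_area(ring)


def polygon_area_3d(ring):
    """Area of a planar 3D polygon: 0.5 * |sum of p_i x p_(i+1)|."""
    ox, oy, oz = ring[0]
    pts = [(x - ox, y - oy, z - oz) for x, y, z in ring]
    sx = sy = sz = 0.0
    for (x0, y0, z0), (x1, y1, z1) in zip(pts, pts[1:]):
        sx += y0 * z1 - z0 * y1
        sy += z0 * x1 - x0 * z1
        sz += x0 * y1 - y0 * x1
    return 0.5 * math.sqrt(sx * sx + sy * sy + sz * sz)
```

Since this formula contains no division, a degenerate polygon yields an area of 0 rather than NaN.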

Comparison of different height information
A total of 4132 buildings in the research area were analysed with respect to their measured height (LoD1_meas) and envelope height (LoD1_env), both taken from the LoD1 dataset, and their calculated height (LoD2_calc), taken from the LoD2 dataset. The correlation between the height measures was relatively high (rf. Table 1). The correlation of LoD1_env and LoD2_calc revealed the highest value of 0.83 (rf. Fig. 7). Again, higher values of the LoD2_calc height information were interpreted as an effect of protruding building parts not represented in the LoD1 dataset. The correlation of LoD1_meas and LoD2_calc revealed a high value of 0.785 (rf. Fig. 8). However, according to the metadata description, the measured height in LoD1 represents the highest point of a building's roof construction. Deviations between the measured height (LoD1) and the analysed height (LoD2) therefore suggest errors in geometry reconstruction rather than a limitation of the LoD2 representation.

'Averaged height of buildings' calculated from LoD1
The 'averaged height of buildings' derived from the different height information (LoD1_meas, LoD1_env, and LoD2_calc) of all buildings in a respective buffer was correlated with the measured TNC values in the research area (rf. Fig. 9). Results show that the average height values from the buildings' LoD1 envelopes have the highest correlation with the measured TNC values. This indicates that taller buildings are associated with higher TNC values in a specific buffer. The highest correlations, of about 0.46, were found for buffers of 400-600 m. However, averaged building heights derived from the LoD1 measured height and from the LoD2 calculated height show negative to weak correlations. Since the latter two height values are expected to be dominated by protruding building parts, the height of a building's main body appears to be more relevant.

'Height-width ratio' calculated from LoD1
A more effective parameter for the empirical modelling of fine dust distribution is the 'height-width ratio' (Ghassoun and Löwner, 2017b), which reflects the proportion between the width of a street and the height of its surrounding buildings. The 'height-width ratio' is also referred to as the aspect ratio of the street canyon.

'Porosity' calculated from LoD0 and LoD1
'Porosity', as a measure of how penetrable an urban area is for the wind, was expected to correlate negatively with high fine dust values. Figure 11 depicts the correlation coefficients of 'porosity' values calculated, first, with one LoD0 representation and, second, with LoD1 representations using different height measures. The LoD0 representation seemed to perform best at a buffer width of 700 m, with a correlation of 0.322. However, this representation showed positive correlations for buffers of 300 m-600 m and of 900 m-1000 m in diameter. 'Porosity' calculated for LoD1 blocks models using the measured height (LoD1_meas) and the height calculated from the LoD2 data (LoD2_calc) showed comparable results, with high negative correlations in the 200 m and 400 m buffers and a maximal negative correlation of 0.435. However, the 'porosity' of LoD1 buildings created from the envelope height (LoD1_env) behaved differently, with a maximum negative correlation of -0.485 in the 1000 m buffer. Again, missing building parts may be responsible for this different behaviour, but their effect seems to be compensated in wider buffers.

'Frontal area index' calculated from LoD0, LoD1 and LoD2
The 'frontal area index' was evaluated as a very effective parameter in Ghassoun et al. (submitted). In this context, it is important to note that negative correlations are the anticipated results. Because high values of the 'frontal area index' represent wind protection, less dilution and, therefore, higher fine dust values are expected. For the smallest buffer of 50 m, an LoD0 parameterization of the 'frontal area index' seems to be reasonable.
Correlations of the FAILoD1_meas and FAILoD2_calc blocks models with the measured TNC values in different buffers showed similar behaviour. The best (negative) correlations for these height measures were -0.56 for FAILoD1_meas and -0.49 for FAILoD2_calc, both in the 300 m buffer. Remarkably weak negative correlations were observed for the LoD1 blocks models with envelope heights.
Missing protruding building parts seem to have a major influence on the results. Correlations of the 'frontal area index' calculated with the LoD2 geometry information (FAILoD2_real) were lower than expected, reaching their highest negative correlation of -0.32 in the 300 m buffer. Because of the simple brute-force algorithm applied, computing time was very high; as a result, only buffers with a diameter of up to 500 m were analysed here. The poor performance of the FAI derived from the most detailed geometry calls for further investigation of this parameter.

'Building surfaces' calculated from LoD1 and LoD2
'Building surfaces' were calculated as described in sec. 2.3.5 for buffer sizes of 50 m and 100 m-1000 m (in steps of 100 m).
Correlations between the measurements and the respective building surfaces were calculated and are plotted synoptically in Figure 13. Surface area and fine dust concentration are expected to be negatively correlated. Figure 13 depicts the correlation coefficients of the different wall surface calculations (i.e. extruded ground surfaces with different height information and the calculated sum of all wall surfaces from the LoD2 dataset) with the measured total number concentration of fine dust. Correlations of the surface calculations using LoD1 data with the measured height (LoD1_meas) and the envelope height (LoD1_env) show high positive values for the 100 m and the 700 m buffers. Overall, these two curves show similar behaviour. However, wall surface calculations from extruded ground surfaces with the height calculated from LoD2 (LoD2_calc) show better (negative) correlations with the TNC measurements. This is not the case for the 700 m buffer, where the correlation coefficient reaches nearly 0.5, like those of the other two wall surface calculations. Correlation coefficients of the WallSurfaceLoD2_real values with the TNC values followed the curve of the WallSurfaceLoD2_calc correlations but did not show such a pronounced peak at buffer sizes of 300 m and 400 m. This may be explained by the apparently corrupt surfaces in the LoD2 dataset (rf. sec. 2.3.5).
All visible enveloping surfaces, i.e. wall and roof surfaces, were calculated and correlated with the measured TNC data for different buffer sizes (rf. Fig. 14). Although the values differ slightly, the curves run similarly to those of the wall surface calculations. A remarkable negative peak, and therefore a good performance of this parameter, can be observed for the correlation of the AllSurfaceLoD2_real calculation. However, absolute correlations are slightly lower when roof surfaces are included. This suggests that adhesion of fine dust takes place mainly at street level; adding roof surfaces seems to blur the correlation. Overall, although surfaces are expected to lower TNC values by adhesion, they also stand for buildings acting as obstacles to the wind. Comparing the two effects, wind blocking, expressed by the parameters 'porosity' and 'frontal area index', seems to be the more effective descriptor.

DISCUSSION
Different height information and geometry representations from LoD0-LoD2 were used to calculate six parameters that are used in empirical fine dust distribution modelling with the land use regression method: the 'averaged heights of buildings', the 'height-width ratio', the 'porosity', the 'frontal area index', and the 'building surfaces' for wall surfaces and for wall and roof surfaces. All parameters were calculated by applying different height measures to LoD1 ground surfaces. The 'frontal area index' and the 'porosity' were additionally calculated for the LoD0 data representation. Additional calculations of the 'frontal area index' and of both building surface parameters were performed on an LoD2 dataset and analysed.
No clear conclusion can be drawn as to which dimension or level of detail of 3D data yields consistently high correlations, either negative or positive, with a dataset of measured total number concentrations of fine and ultrafine particles in Berlin. While, on the one hand, the LoD1 representation with the height calculated from the LoD2 dataset performed very well for the wall surface parameter in the 200 m and 300 m buffers, on the other hand, the parameter of all wall and roof surfaces from the LoD2 dataset showed very good results for the 700 m buffer. The same applies to different buffer sizes: the LoD0 representation of the 'frontal area index' showed a high negative correlation in the 50 m buffer, while LoD1 representations with different height information exhibited better results in larger buffers.
However, these preliminary results show that in some cases parameters derived from lower-dimensional representations may serve as well as parameters derived from data of higher dimension (2D vs. 3D) or higher level of detail (LoD0 vs. LoD1 or LoD2). This is promising when searching for the most effective way to calculate parameters for nowcasting in real time.
In general, it has to be noted that the strength and direction of the correlations between the discussed parameters and the measured TNC values do not depend only on the way of parameterization, the dimension, or the geometrical representation of the input data. We are still dealing with a highly complex system of fine dust emission, dilution, and deposition that also includes physical and chemical conversion processes, which geostatistical analysis seeks to explain. Therefore, the analyses presented here serve as operational guidance for practitioners in the field of fine dust distribution modelling rather than as an evaluation of the quality of geometric references of 3D city models (rf. Biljecki et al., 2016). However, more investigation is needed to confirm our preliminary results. The work has to be extended, first, to more parameters, since only a few of those relevant for the fine dust modelling approach could be tested here. Second, further urban environmental measurements such as temperature or noise should be investigated. Third, more variability of the urban morphology should be taken into account, since the limited research area does not exhibit the full variation of a city.

Figure 3. 'Frontal area surface' calculations, with the green lines representing building surfaces blocking the wind. The red line represents the projected frontal area surface of the buildings considered in the calculation of the 'frontal area index'. The dashed green line represents the unblocked part of the buffer diameter.

Figure 9. Correlation coefficients of the 'averaged height of buildings' from LoD1 with different height measures and measured TNC values in different buffer sizes.

Figure 10 depicts the correlation coefficients of the 'height-width ratio' derived from different height information with the measured TNC values for different buffer sizes. The correlation coefficients are clearly higher than those of the averaged height of buildings (sec. 3.2), reaching a peak of 0.658 at a buffer size of 700 m for the LoD2_calc height information. This is consistent with narrow street canyons reducing the ventilation capacity. Apart from the variation of the correlation coefficients across buffer sizes, the choice of height information does not seem to influence this positive correlation.

Figure 10. Correlation coefficients of the 'height-width ratio' from LoD1 with different height measures and measured TNC values in different buffer sizes.

Figure 11. Correlation coefficients of the 'porosity' from LoD0 and LoD1 with different height measures and measured TNC values in different buffer sizes.

Figure 12. Correlation coefficients of the 'frontal area index' from LoD0 (FAILoD0), LoD1 with different height measures (FAILoD1_meas, FAILoD1_env, FAILoD2_calc), and LoD2 (FAILoD2_real) with measured TNC values in different buffer sizes. The FAILoD0 revealed a high negative correlation of -0.45 for the 50 m buffer, but the correlation increased in the 100 m buffer and remained poor in the larger buffers.

Figure 13. Correlation coefficients of different wall surface calculations with the measured TNC values in different buffer sizes.

Figure 14. Correlation coefficients of different wall and roof surface calculations with the measured TNC values in different buffer sizes.

Table 1. Correlation of three different height measures from LoD1 and LoD2 datasets for 4132 buildings within the research area.