EXPLORATION OF OPEN DATA IN SOUTHEAST ASIA TO GENERATE 3D BUILDING MODELS

This article investigates the current status of generating 3D building models across 11 countries in Southeast Asia from publicly available data, primarily volunteered geoinformation (OpenStreetMap). The following countries are analysed: Brunei, Cambodia, Indonesia, Laos, Malaysia, Myanmar, Philippines, Singapore, Thailand, Timor-Leste, and Vietnam. This cross-country study includes multiple spatial levels of analysis: country, town, and micro-level (smaller neighbourhood). The main finding is that authoritative data to generate 3D building models is almost non-existent while building completeness in OpenStreetMap is highly heterogeneous, yielding location-dependent conclusions. While in general just a fraction of mapped buildings has height information and none of the administrative areas provides sufficient information to generate 3D building models, on a micro-level some areas are fully complete, providing a high potential to generate 3D building models on a precinct scale, which may be useful for certain spatial analyses. Furthermore, some areas have high building completeness, requiring only half of the work necessary for the extrusion: the collection of building height attributes. As a part of this work, a semantic 3D building model of a selected set of buildings in Singapore has been generated and released as open data (CityJSON), and the developed code was open-sourced.


INTRODUCTION
3D city models can be derived using a variety of acquisition techniques involving different stakeholders, under open or commercial arrangements. Much of the work done in the 3D GIS research community relies on the availability of open data. Besides the open data 3D city models released by local authorities (Schrotter and Hürzeler, 2020; Lehner and Dorffner, 2020), an increasing number of 3D city models are derived by conflating open datasets (Stoter et al., 2020; Kolbe et al., 2015). Such datasets originate from authorities and volunteered geoinformation, and have proven to be valuable in various spatial analyses (Saretta et al., 2020; Biljecki et al., 2016).
The developments involving open 3D data are mostly focused on Europe. The goal of this paper is to investigate the state of open building data in Southeast Asia to understand the current potential of generating 3D building models from free and public sources. While there are 3D city modelling developments across several countries in the region (Dissegna et al., 2019; Lagahit and Blanco, 2019; Aditya and Laksono, 2017; Stouffs et al., 2018; Ujang et al., 2018; Anh et al., 2018; Laksono and Aditya, 2019), hardly any of this work is based entirely on open data.
As there is a lack of 3D city models openly released by local authorities in the region, the focus of the paper is understanding currently open building and urban data that may aid the generation of 3D building models. Therefore, the research question that I seek to answer in this paper is: what is the current potential of generating 3D building models from open data in Southeast Asia? Section 2 briefly overviews different approaches to generating 3D building models with a focus on open data. In Section 3, the method of this research is described. Most of the work examines the regional potential of OpenStreetMap as the main source of building data to generate 3D building models. As the potential of generating 3D data in the study area turns out to be highly heterogeneous, the paper turns to understanding the completeness of data and patterns on different levels that could signify the potential (Section 4). This research includes some new aspects of examining and understanding the completeness of features in volunteered geoinformation. The analysis covers a vast scale (11 countries), resulting in large-scale insights and a cross-country comparison. The results and limitations are discussed in Section 5. Contributing to the availability of 3D building models in the region, the work also generates a 3D building dataset in Singapore (Section 6).
* Corresponding author at filip@nus.edu.sg

BACKGROUND
One of the most popular methods to generate 3D building models is extrusion (Ledoux and Meijers, 2011), which requires building footprints and information on the building height. Building footprints are often sourced from open government datasets or OpenStreetMap, while the building height may be derived from analysing airborne lidar data, among other options. Many 3D city models, including country-wide ones, have been generated entirely from open data combining different sources of building information and geospatial data.
While building footprints are becoming increasingly available around the world, and many such datasets are open, airborne lidar datasets are not so widespread. Thus, a portion of 3D models has been generated by combining different attributes that indicate the height of the building, e.g. the number of floors has been used as a proxy for the building's height (Cheng et al., 2020).
Figure 1. Example of editing the geometry and attributes (e.g. levels) in OpenStreetMap for a building in Telisai, Brunei. The surrounding of the building shows that the building completeness in this area is partial, which will be a topic of the later sections of the paper. Imagery: (c) Bing 2020 Microsoft Corporation, (c) 2020 DigitalGlobe, (c) CNES (2020) Distribution Airbus DS.
A popular source of building data is OpenStreetMap (OSM), and it has been deemed promising for generating 3D city models (Goetz and Zipf, 2012;Youssef et al., 2020). Besides providing building footprints that can be coupled with remotely sensed data (Bagheri et al., 2019), OSM allows capturing information on the vertical extent of the building (Figure 1), also enabling extrusion and serving as a complete 3D source on its own (Hadimlioglu and King, 2019;Over et al., 2010). There are two key pieces of information that can be associated with a building footprint using the OSM's tag-value pair system: number of floors (building:levels) and height (height). The first one is intended to describe the number of storeys above the ground, while the latter is designated to include the actual building height in metres above ground.
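As an illustration of how the two tags could feed an extrusion, the following minimal Python sketch derives a height in metres from a building's tags. It prefers height and falls back to building:levels multiplied by a storey height. The 3.0 m storey height, the function names, and the simple parsing of values such as "12 m" are assumptions made for this example and are not taken from this study.

```python
import math
import re
from typing import Mapping, Optional

# Hypothetical average storey height in metres (an assumption, not from the paper).
ASSUMED_STOREY_HEIGHT_M = 3.0

# Accepts plain numbers such as "12.5" and values with a metre suffix such as "12 m".
_METRES = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*m?\s*$", re.IGNORECASE)


def parse_height(value: Optional[str]) -> Optional[float]:
    """Return the height in metres, or None when the value cannot be parsed."""
    if value is None:
        return None
    match = _METRES.match(value)
    return float(match.group(1)) if match else None


def parse_levels(value: Optional[str]) -> Optional[float]:
    """Return the number of storeys above ground, or None when unusable."""
    try:
        levels = float(value)
    except (TypeError, ValueError):
        return None
    return levels if math.isfinite(levels) and levels > 0 else None


def estimate_height(tags: Mapping[str, str],
                    storey_height_m: float = ASSUMED_STOREY_HEIGHT_M) -> Optional[float]:
    """Prefer the explicit height tag; otherwise use levels as a proxy."""
    height = parse_height(tags.get("height"))
    if height is not None:
        return height
    levels = parse_levels(tags.get("building:levels"))
    if levels is not None:
        return levels * storey_height_m
    return None  # neither attribute is usable, so the footprint cannot be extruded
```

A building for which estimate_height returns None corresponds to a footprint that counts towards building completeness but not towards height attribute completeness.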
While building completeness in OSM is very high in some parts of the world, and it can be considered to be on a par with government datasets, the main obstacle is the availability of building heights. For example, the research of Fan et al. (2014) shows that while building completeness reached nearly 100% in Germany, attributes on the height or number of storeys are filled for less than 1% of features. Recent research such as theirs is uncommon, so when replicating research on completeness for a particular country or region at this point, it is difficult to gauge whether it fares better than in other parts of the world. Completeness of building attributes must have somewhat improved since then, but a quick inspection in random places around the world reveals that these attributes are still missing for the majority of buildings.

METHOD
This research first investigates the availability of building and topographic data from the authorities in the region, which may be useful to derive 3D building models. Afterwards, it focuses on OpenStreetMap and it provides a statistical analysis to understand the potential for generating 3D city models, and patterns in availability of data.

Study area: 11 countries in Southeast Asia
The focus of this research is Southeast Asia, which accounts for 8.5% of the global population, and about 3% of the earth's land area. The eleven countries investigated include those that are members of the Association of Southeast Asian Nations (ASEAN): Brunei, Cambodia, Indonesia, Laos, Malaysia, Myanmar, Philippines, Singapore, Thailand, and Vietnam; and Timor-Leste, which is not a member of ASEAN but is geographically in Southeast Asia. The region is quite diverse, as the covered states have substantial economic, geographical, political, and cultural differences, presenting an interesting selection for research such as this.

Authoritative data
An investigation into open building data in the 11 countries did not yield comprehensive results. Not a single country was found to release open data on buildings covering their entire territory. Some building data sources were found, e.g. on the Open Data Portal of the Singapore Government 1 , but including only a subset of buildings, such as public housing buildings.
Searching for open lidar datasets was similarly fruitless. The OpenTopography 2 portal reveals that there is no open lidar data in Southeast Asia. An extended search unveils that there is a dataset covering the Philippines 3 , but it requires submitting a data request rather than allowing the direct unrestricted download that would qualify it as open data.
Arguably, with almost each country having a different official language, the language barrier provides a challenge in searching for the data. However, a literature review reveals that research that uses building data in the region either relied on OSM, closed cadastral sources, or own collection of data.

Analysing OpenStreetMap data
Because of the lack of authoritative data, but also owing to its cross-country reach, this research mostly focuses on OSM. The research looks into the completeness of building levels and height attributes, their relation, and their spatial variation.
It should be mentioned that most of the related research assessing the quality of building data is based on validating the data against authoritative data and is focused on a single country (Brovelli and Zamboni, 2018; Senaratne et al., 2017), which in this case is neither possible nor feasible. Also, it should be noted that besides the completeness of the buildings, the correctness of the attributes is out of scope as well, as it would require an immense cross-country effort comparing the attributes against other datasets (e.g. Google Street View) that in many cases would not be available.
A copy of the OSM database was downloaded from Planet OSM 4 , and it was loaded in a local PostgreSQL/PostGIS database using osm2pgsql 5 . The world was partitioned into cells (tiles) of about 1 square km each (as inspired by the Worldpop project (Lloyd et al., 2019)). For each tile, the number of buildings was counted and the availability of the height attributes was noted, enabling an analysis ranging from regional and country level down to the micro-level of a neighbourhood. The eleven countries in Southeast Asia are covered by 358k such squares with at least one building, containing almost 38M buildings. To enable a cross-city and cross-country comparison, for each tile the name of the corresponding administrative unit was derived using the Global Administrative Areas dataset 6 . The analysis is carried out on three levels: country, town, and tile (Section 4), and the results are discussed together in Section 5.
An example of this division is shown in Figure 2. The square in this particular example in Myanmar contains about 900 buildings, and it is highly complete in terms of building features (the completeness was checked against a recent satellite image).
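The tiling and counting step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the processing used in this study: the grid is assumed to be a regular 30 arc-second grid (1/120 of a degree, roughly 1 km near the equator), each building is assumed to be represented by a single point such as its centroid, and the names tile_of, TileStats, and aggregate are hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Mapping, Tuple

# Assumed grid resolution: 120 cells per degree, i.e. 30 arc-seconds per cell.
CELLS_PER_DEGREE = 120


@dataclass
class TileStats:
    buildings: int = 0
    with_levels: int = 0
    with_height: int = 0


def tile_of(lon: float, lat: float) -> Tuple[int, int]:
    """Return the (column, row) index of the grid cell containing a point."""
    return (math.floor(lon * CELLS_PER_DEGREE), math.floor(lat * CELLS_PER_DEGREE))


def aggregate(buildings: Iterable[Tuple[float, float, Mapping[str, str]]]
              ) -> Dict[Tuple[int, int], TileStats]:
    """Count buildings and available height attributes for every tile."""
    stats: Dict[Tuple[int, int], TileStats] = defaultdict(TileStats)
    for lon, lat, tags in buildings:
        tile = stats[tile_of(lon, lat)]
        tile.buildings += 1
        if "building:levels" in tags:
            tile.with_levels += 1
        if "height" in tags:
            tile.with_height += 1
    return dict(stats)
```

Keeping raw counts per tile, rather than percentages, allows the same table to be re-aggregated by administrative unit for the town and country levels.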

UNDERSTANDING OPENSTREETMAP BUILDING CONTENT IN SOUTHEAST ASIA

Country-level
The first layer of the results is at the country level (Table 1). These results reveal that, while tens of millions of buildings are mapped, the completeness of their attributes that can be used for extrusion is tiny (albeit not necessarily lower than in other parts of the world), but also that there is a notable difference between the countries, suggesting highly heterogeneous levels of OSM quality across countries in Southeast Asia. The analysis also exposes that building heights are recorded much less often than building levels. This is not surprising, as mappers may find it much easier to record the number of storeys of buildings than to measure their heights.
Before proceeding further with the analysis, an attempt is made to understand the relation between the two attributes. For each cell, the percentages of completeness of building levels and heights were plotted against each other, for each country separately (Figure 3). To avoid bias from small non-representative instances, only tiles with more than 20 buildings have been taken into account.
While completeness of the data on building heights and levels remains low on the country level, and some countries have few or no tiles with high completeness of heights/levels, the scatter plots indicate many interesting insights: (i) in some countries there are many tiles with near 100% completeness of either building levels or building height; (ii) in some cases the completeness between the two attributes is quite different (e.g. high building level completeness with an almost-zero availability of building heights); (iii) a number of tiles have an equal level of completeness between the two; but (iv) the comparable levels are often an exception rather than a rule, and this depends on the country: the correlation coefficients vary significantly between them (from 0.06 in Myanmar to 0.35 in Singapore).
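The relation between the two attributes can be quantified along the following lines. The sketch below assumes that the coefficient in question is Pearson's r computed over per-tile completeness percentages (the text does not name the coefficient), and that each tile is given as a tuple of counts; the function names are hypothetical.

```python
import math
from typing import Iterable, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def completeness_correlation(tiles: Iterable[Tuple[int, int, int]],
                             min_buildings: int = 20) -> float:
    """Correlate level and height completeness over sufficiently large tiles.

    Each tile is (buildings, buildings_with_levels, buildings_with_height).
    Only tiles with more than min_buildings buildings are kept.
    """
    kept = [t for t in tiles if t[0] > min_buildings]
    levels_pct = [100.0 * with_levels / total for total, with_levels, _ in kept]
    height_pct = [100.0 * with_height / total for total, _, with_height in kept]
    return pearson(levels_pct, height_pct)
```

Running this per country on the tile table would give one coefficient per scatter plot.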

City-level
In the next level, towns are analysed. First, for the largest city by population in each country (Figure 4), the building level completeness was analysed and mapped. The largest city was considered instead of the capital (e.g. in Myanmar, Yangon is not the capital, but it is the most important commercial centre and the largest city by population). While none of the largest cities is fully covered with building levels data, this insight also reveals that there is a high variation of completeness within each town, and that there are substantial differences between the cities (this is in line with the general completeness by country). The high heterogeneity is particularly apparent: some parts of a city have zero availability of height attributes, while others have pockets with near 100% completeness. No particular pattern was noticed that would explain why the completeness of some areas is high, so it may be highly dependent on the mappers in the area.
ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume VI-4/W1-2020, 2020 3rd BIM/GIS Integration Workshop and 15th 3D GeoInfo Conference, 7-11 September 2020, London, UK
Second, to understand the availability of height attributes in cities, a scatter plot similar to the one in Figure 3 was generated: Figure 5 shows 1382 settlements (only those that contain more than 1000 buildings are included), suggesting overall completeness on the local administrative level. The plot reveals that virtually all cities have many more buildings without height attributes than with them, and that even outliers have 30-40% completeness at best. None of these is a major city. While this plot also echoes the fact that building levels are available at a much higher frequency than heights, there are some settlements with completeness of building height higher than the building levels.
Figure 5. Understanding height/level attributes' completeness on town level, and identifying the settlements with the highest rate.
It is worthwhile to have a closer look at the two towns that stand out.
Kediri is an Indonesian city with a population of 250k, located on the island of Java (Figure 6). The town contains a small, thoroughly mapped district with a few thousand buildings and almost 100% completeness of both the levels and heights. A few steps away, in the same city, there is almost zero completeness of buildings, and those that are mapped have no information on the building height (resulting in average attribute completeness of slightly less than half at the city level, as indicated in Figure 5). This example suggests that it may be much more meaningful to look into the data on the tile-level, disregarding administrative boundaries, and indicates that completeness is concentrated into clusters covering smaller extents.
The other example is Phan Thiet, a city with 335k residents on the southeast coast of Vietnam. The city is covered by 65 tiles, with more than half of them having higher than 90% completeness of height attributes. Such a ratio signals a spatial extent with high-quality data. However, a closer inspection reveals that the city, with only 2000 buildings mapped, suffers from poor building completeness, meaning that those buildings that are mapped are carefully modelled including attributes. This example reveals the main limitation of this study: understanding the potential of generating 3D building models is intertwined with building completeness, which due to the effort required is out of scope of this preliminary paper. This shortcoming will be discussed in Section 5.
Figure 6. Highly heterogeneous mapping completeness in the town Kediri, Indonesia, suggesting that this research is more meaningful on the neighbourhood-scale.

Tile-level
The goal of this section is to understand the height attribute completeness on the micro-scale (tile-level). The highly heterogeneous availability of data and the examples from the previous sections put this scale under the spotlight. For this part, about half of the tiles in the dataset (those that contain 20 or more buildings) have been taken into account, resulting in 174k tiles for the analysis.
A query on the database indicates that only 1% of them have at least 10% building height attribute completeness. However, there is a number of tiles with more than 50% completeness, including some that reach 100% (Figure 8). The number of tiles that have a near-complete level of building height attributes is a few hundred, so together with the assumption that there are tiles that are mapped thoroughly (as is the case in Figure 6), we may conclude that a 3D building model may be automatically generated for a few hundred square kilometres of area in SE Asia, either without additional work or with some minor additional work in areas that have 80-90% completeness.
Figure 8. Completeness of height attributes on the tile-level (logarithmic scale). Most tiles have no or almost no building height attributes, but there is a number of those that are near-complete, indicating high potential for generating 3D building models on micro-scale, provided the building completeness is also high.

DISCUSSION
There are multiple findings from the data exploration in the previous section. Conceptually, OSM is highly suited for generating 3D models. In reality, the potential depends on the availability of the data, which is often lacking.
There are two angles from which to view the potential. First, considering only the completeness of the height attributes, the research question on the potential of generating 3D building models from OpenStreetMap yields a highly heterogeneous answer. Availability of the attributes is highly localised and varies from place to place. No city can be generated entirely in 3D, but the generation of 3D building models may well be viable on a micro-scale at some locations, which may be useful for 3D GIS analyses that can be performed on a precinct scale, such as noise pollution estimation (Stoter et al., 2020), understanding the urban heat island (Hofierka et al., 2020), shadow simulations (Southall and Biljecki, 2017), and energy predictions (Zirak et al., 2020). Second, half of the data required for extrusion (building footprints) are of high quality in many places in the region, so 'only' the heights need to be collected, and in some areas these are already partially available.
However, on a broader scope, the potential for generating 3D building models is an inseparable function of two arguments: building completeness and height attribute completeness. In this analysis, the latter has been investigated, with only some hints of the former. The attribute completeness has been derived as the number of buildings that have a height attribute divided by the number of buildings mapped in OSM. However, the actual height attribute completeness is a fraction that includes all buildings in reality:
\[ \frac{\text{Buildings with either levels or height attribute}}{\text{Buildings in OSM} + \text{Missing buildings not mapped in OSM}} \]
Therefore, the main limitation of this study is that it is impeded by the lack of systematic information on building completeness. Research considering building completeness would require a comprehensive undertaking given the vast area covering several countries, and it might be arduous considering the lack of authoritative data. Studying building completeness is a topic on its own, and it is mostly confined to one or a few small countries at a time (Goldblatt et al., 2020; Fan et al., 2014; Girres and Touya, 2010; Hecht et al., 2013; Brovelli and Zamboni, 2018). Thus, analysing building completeness for 11 countries is well beyond the scope of this paper. While OSM quality has been studied in some countries in the region (Husen et al., 2018; Aditya et al., 2012), research is either limited or outdated.
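To make the difference between the two ratios concrete, the short Python sketch below compares the completeness measured against mapped buildings only with the completeness measured against all buildings in reality. The function names and the numbers in the usage note are hypothetical and chosen purely for illustration; the number of unmapped buildings is exactly the quantity this study could not determine.

```python
def apparent_completeness(with_attribute: int, mapped: int) -> float:
    """Share of mapped OSM buildings that carry a levels or height attribute."""
    if mapped <= 0:
        raise ValueError("at least one mapped building is required")
    return with_attribute / mapped


def actual_completeness(with_attribute: int, mapped: int, missing: int) -> float:
    """Share of all real buildings (mapped plus unmapped) with an attribute."""
    total = mapped + missing
    if total <= 0:
        raise ValueError("at least one building is required")
    return with_attribute / total
```

For instance, with an invented tile of 2000 mapped buildings of which 1800 carry a height attribute, apparent_completeness(1800, 2000) gives 0.9; if a further 18000 buildings were unmapped, actual_completeness(1800, 2000, 18000) would drop to 0.09.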
While the lack of a building completeness study is a limitation of the work, some manually checked areas appear to have exceptionally high building completeness (e.g. see Figure 2 and Figure 6). And since building footprints present half of the ingredients required for the extrusion, generating 3D building models may still be viable, as it would require manually populating missing attributes of the heights, which in some parts may not actually require much work.
This exploration also reveals some interesting patterns when it comes to completeness, mirroring findings of other researchers that completeness is heterogeneous and highly dependent on the different types, motivations, and effort of OpenStreetMap contributors (Quinn, 2015). For example, there are clusters of high completeness of buildings and their attributes (Figure 6). These may be explained mostly by a mapping effort of local enthusiast(s). Then, there are areas of exceptionally high completeness of footprints, but they have no height information at all. For example, see Figure 2: despite the near-perfect building completeness, not a single building in the spatial extent contains the attribute on its height or number of storeys. Such situations may be a result of humanitarian mapping (the region was hit by a natural disaster several years ago), which gives less priority to attributes. Further, much of the mapping is done using aerial images (Figure 1), from which it is difficult to collect the number of floors, explaining the lack of such information. There are also cases of high completeness of height attributes, but mostly because of the low completeness of footprints (see the above equation), meaning that those buildings that are mapped are mapped quite well (a situation occurring in the area shown in Figure 7). Finally, it was noticed that some tourist hotspots in Southeast Asia, e.g. Phuket (Thailand) and Ubud (Indonesia), have high levels of attribute completeness, potentially suggesting OSM contributions by visitors.
A manual exploration of the data also suggested that the largest and most urbanised areas are not necessarily the ones with the highest building footprint completeness. For example, it was found that some rural Burmese areas (see the second pane in Figure 2) have complete or near-complete building coverage, which is higher than the completeness found in both the country's largest city (Yangon) and capital (Naypyidaw).
The statistical analysis has indicated that there is no correlation between the density of the buildings in an area and attribute completeness.
More research is needed to understand the patterns of building and attribute completeness, and this is one of the main suggested paths for future work. For example, it might be worthwhile investigating the relationship between completeness, and population density and GDP, similar to the research of Barrington-Leigh and Millard-Ball (2017), which is focused on roads.

IMPLEMENTATION IN SINGAPORE
A 3D city model in Singapore was generated and released as open data as part of this work to contribute to the increase of open 3D data in the region. While OpenStreetMap building footprint coverage in Singapore appears to be near-complete, and the share of building height attributes is highest in SE Asia (Table 1), this combination is still not sufficient to generate an urban scale 3D city model.
As an alternative, a 3D building model was generated maximising the combination of volunteered and governmental geoinformation (Figure 9). Singapore's Housing and Development Board (HDB), which is in charge of the public housing in Singapore, maintains an open dataset containing information on their buildings 7 . While the dataset does not contain the geometry of the building footprints, it includes the number of storeys of each building, together with a rich set of information. The tabular information of buildings was associated with building footprints in OpenStreetMap using the Government's geocoding API 8 , after which the footprints were extruded in 3D, preserving the attributes (Figure 10). The dataset was generated in CityJSON, and it is available publicly at the GitHub repository of the Urban Analytics Lab of the National University of Singapore 9 , together with the code (in Python) developed to generate it 10 .
Figure 9. A semantic LoD1 building model of residential buildings in Singapore generated in the frame of this research, conflating OpenStreetMap data and an open dataset released by a Singapore Government agency. The resulting data (CityJSON) and the code are released openly.
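The extrusion step itself is simple, and a minimal Python sketch of it is given here. It is not the released code: it assumes a footprint given as a counter-clockwise ring of planar coordinates in metres without a repeated closing point, a hypothetical storey height of 3.0 m to convert storeys into a height, and an output of indexed vertices and surfaces that is only loosely modelled on how CityJSON indexes geometry, without claiming to be a valid CityJSON file.

```python
from typing import List, Sequence, Tuple

# Hypothetical storey height in metres (an assumption, not from the paper).
ASSUMED_STOREY_HEIGHT_M = 3.0

Vertex = Tuple[float, float, float]
Surface = List[List[int]]  # a surface is a list of rings of vertex indices


def levels_to_height(levels: float,
                     storey_height_m: float = ASSUMED_STOREY_HEIGHT_M) -> float:
    """Use the number of storeys as a proxy for the building height."""
    return levels * storey_height_m


def extrude_footprint(ring: Sequence[Tuple[float, float]],
                      height: float) -> Tuple[List[Vertex], List[Surface]]:
    """Extrude a counter-clockwise footprint into a flat-roofed block (LoD1)."""
    n = len(ring)
    if n < 3 or height <= 0:
        raise ValueError("need at least 3 footprint points and a positive height")
    bottom = [(x, y, 0.0) for x, y in ring]
    top = [(x, y, float(height)) for x, y in ring]
    vertices = bottom + top
    # The floor is reversed so that its normal points downwards (outwards).
    floor = [list(range(n - 1, -1, -1))]
    roof = [list(range(n, 2 * n))]
    # One vertical quad per footprint edge, wound so that it faces outwards.
    walls = [[[i, (i + 1) % n, (i + 1) % n + n, i + n]] for i in range(n)]
    return vertices, [floor, roof] + walls
```

Attributes from the tabular dataset, such as the number of storeys or the year of construction, would be attached to the resulting object alongside this geometry.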
Public housing accommodates the predominant majority of Singapore's population, so the dataset covers most of the nation's residential buildings. Many of the same buildings in OSM already carry the building levels attribute, so OpenStreetMap alone could be sufficient to obtain a 3D building model; nevertheless, a decision was made to include the government dataset to take advantage of the extended set of information such as the year of construction. This example exposes another particularity: varying attribute completeness and heterogeneous potential for generating 3D building models is not necessarily only a geographic matter, as it can also vary along other dimensions such as building type. This dataset has countrywide coverage but it is partial, covering a particular subset of buildings.

CONCLUSION
Generating 3D city models from open data, such as by extruding OpenStreetMap footprints, is not a new topic, but it is not much documented in Southeast Asia. The main conclusion of this study is that the potential for large-scale generation of open 3D building models is low, both due to lack of open government data and patchy OpenStreetMap completeness (which actually may not be much different from many other regions around the world). On the other hand, generating data on the neighbourhood scale may still be a viable option, as there are some extraordinarily well mapped areas and pockets of high completeness.
While generating 3D building models from open data on a wider scale may be difficult, it may not be impossible when considering methods such as the one introduced by , which predicts heights of buildings from 2D data, enabling the generation of 3D building models without elevation data. Some of the cities with a level of completeness of 10-20% may provide a sufficient amount of data for training a regression model to predict the heights of the remaining portion of the buildings that do not contain height information. In fact, the feasibility of this idea was demonstrated by a replication of the method in the same study area of this research: Anh et al. (2018) implemented the approach in Hanoi, Vietnam.
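The idea of training on the buildings that do have a height and predicting the rest can be illustrated with a deliberately simple stand-in. The Python sketch below fits an ordinary least-squares line with a single predictor, and it is not the method referred to above, which is not reproduced here. The choice of footprint area as the predictor, the dictionary keys, and the function names are assumptions made for the example.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the predictor must not be constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fill_missing_heights(buildings: List[Dict[str, Optional[float]]]) -> List[float]:
    """Keep known heights and predict the missing ones from footprint area.

    Each building is a dict with an 'area' key (square metres, assumed) and
    an optional 'height' key (metres).
    """
    known = [b for b in buildings if b.get("height") is not None]
    slope, intercept = fit_line([b["area"] for b in known],
                                [b["height"] for b in known])
    return [b["height"] if b.get("height") is not None
            else slope * b["area"] + intercept
            for b in buildings]
```

A realistic model would use several predictors and would need validation against held-out heights before the predictions are used for extrusion.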
Another contribution of this paper is considering OSM attribute completeness through a more detailed statistical analysis, which may also be useful for understanding building height attribute completeness in general regardless of the study area. Further, the study resulted in a tangible contribution: a 3D building model was generated and released openly. Although simple and relatively easy to reproduce, the release of the dataset attracted substantial interest, suggesting strong demand for open 3D city models in Singapore and Southeast Asia, and it is already in use in research efforts by other researchers.
This research is of a preliminary nature, setting the scene for more comprehensive work: it focused on the availability of the data, but not its accuracy. For future work, building completeness should be considered as well, together with checking the quality of the heights, and it may be worth extending the work to the global scale to understand the worldwide potential of generating 3D building models from open data.