LINKED BUILDING DATA FOR CONSTRUCTION SITE MONITORING: A TEST CASE

Abstract: The automation of construction site monitoring is long overdue. One of the key challenges in tracking progress, quality and quantities is integrating the necessary observations that make these analyses possible. Research has shown that semantic web technologies can overcome the data heterogeneity issues that currently hold back automated monitoring, but this technology remains largely unexplored in the construction industry. In this paper, we therefore present a tentative framework for Linked Data usage on construction sites. Concretely, we combine observations from Lidar scans, UAV and hand-held cameras with the as-designed BIM through RDF graphs to establish a holistic analysis of the site. In the experiments, a proof of concept is presented for the structural building phase of a residential project, showing how remote sensing data can be managed during project execution.


INTRODUCTION
The construction industry is increasingly adopting Building Information Modeling (BIM) to improve design, lower failure costs, improve worker efficiency and so on (Nguyen, Choi, 2018). While BIM models are already extensively used during design, the routines for construction site monitoring are still lacking, including methods to estimate progress, quality and quantities of objects and materials on site (Golparvar-Fard et al., 2015). Additionally, the as-designed models must be updated to as-built models upon project completion, which also requires detailed remote sensing data (Jadidi et al., 2015).
The automation of construction site monitoring requires three aspects. First, there is the semantically rich 3D BIM that is now available on construction sites. The BIM serves as the reference for the analyses and contains the planned location and geometry of each element on the site. Furthermore, the attached planning yields detailed information about when an element is to be completed. Class tolerances are also present, typically described in the BIM protocol or based on international specifications such as the Level-of-Accuracy (LOA) (U.S. Institute of Building Documentation, 2019). Second, there are the sets of detailed and accurate observations. To this end, remote sensing observations are periodically acquired of the site, i.e. Lidar-based point clouds and/or images from UAVs, hand-held cameras or eXtended Reality (XR) devices. These observations are (geo)located on site and cover enough of the site to provide a reliable analysis of each target element. These observations can also be processed into derivative products such as textured meshes and orthomosaics. Third, there are the analyses themselves. For every reference element in the BIM, its corresponding observations must be associated with the object parameters and isolated for interpretation. Next, the observations are processed to establish the relevant outcomes for the quality, quantity and progress.

This research targets the third aspect. More specifically, we focus on linking the periodic observations to the reference BIM data with respect to the different coordinate systems, information repositories and metadata parameters. Our goal is to establish the framework and the codebase to combine BIM element information (geometry, planning, textures, and so on) with remote sensing inputs (point clouds, images, meshes, and so on) in a standardized manner.
To this end, we will harness semantic web technologies that can interlink the different information repositories without fully integrating them. Concretely, we present a Linked Building Data test case so that consecutive data acquisitions on site can be linked to the BIM information and be jointly analyzed. In summary, the main contributions are:
1. An extensive literature study on the Linked Data framework for construction site information management
2. An empirical study of the concepts, relations and data structure needed for multi-temporal construction site monitoring
3. A practical case study where periodic images and point clouds are linked to the as-designed BIM
The remainder of this work is structured as follows. The background and related work are presented in Section 2. In Section 3, the methodology is presented. The experimental results are discussed in Section 4. Finally, the conclusions are presented in Section 5.

Image/Lidar analyses
Both Lidar- and photogrammetric analyses must be considered since their contributions to the monitoring are complementary. Lidar-based systems, such as Terrestrial Laser Scanners (TLS) and indoor Mobile Mapping systems (iMMs), generate highly accurate point cloud data that is essential to any spatial analysis. In contrast, photogrammetric techniques based on hand-held, UAV or even tower crane cameras use advanced triangulation to reconstruct overlapping pixels in multiple images, resulting in high-quality textured geometries such as polygonal meshes and orthomosaics (Kim et al., 2019). Especially on construction sites, the resulting geolocated images are essential as they are the most detailed observations on site. Furthermore, during processing, the geolocated images can be linked to infrared, multi-spectral or hyper-spectral imagery that is essential for specialized quality control inspections (Bonifazi et al., 2019).

Figure 1: Overview of the remote sensing inputs for construction site monitoring and the Linked Data framework that will be used to exchange and jointly analyse the remote sensing data.

Construction Monitoring
Three types of analyses, i.e. progress, quality and quantity, can be achieved with remote sensing. The assessment of the progress is considered as a binary query, i.e. whether the construction element is completed or not. This analysis is typically conducted on segmented point clouds of which the number of points falling within an object geometry is evaluated (Roh et al., 2011). Machine learning methods have also been presented that attempt to distinguish between consecutive stages of the building process, i.e. by training a classification model to look at the point signatures and distribution in various directions. Geolocated images are also a key input as computer vision algorithms have matured to the point that they can reliably distinguish between (un)completed objects (Cuypers et al., 2021). Where progress estimation is more of a state estimation task that requires sufficient observations, quality control requires significantly more accurate data. As such, Lidar-based point clouds are a must-have for spatial quality control in buildings. For horizontal objects, such as in road construction, photogrammetric point clouds can achieve LOA30 accuracy (errors ≤ 0.015 m), but only in isolated cases. Geolocated images again can be used, i.e. to match the outlines of objects to a set of detected outlines in the images. Furthermore, cracks and damages can be more reliably detected from imagery than from point cloud data (Valença et al., 2017). Lastly, quantity estimation requires both visual and geometric inspections, i.e. surface areas of pavement, facade finishes and so on. As such, geolocated imagery, textured meshes and orthomosaics are the preferred inputs alongside Lidar point clouds.
Overall, the five inputs (Lidar point clouds, geolocated images, photogrammetric point clouds, textured meshes and orthomosaics) each have their use in construction monitoring. Future monitoring frameworks should therefore combine and integrate these datasets to formulate reliable and accurate analysis procedures.

Linked Building Data (LBD)
A Linked Data approach to construction site monitoring is the next logical step. Linked Data offers significant benefits in terms of data propriety for different stakeholders since information repositories are linked rather than integrated (Pauwels, Terkaj, 2016, Schulz et al., 2021). Linked Data also benefits from a variable scalability that can stretch over the entire life-cycle of an asset, whereas proprietary data formats typically do not span across construction domains and are meant for singular purposes, which limits the data's life expectancy. While Linked Data is relatively unexplored for construction site monitoring itself, there are several construction industry projects that already leverage Linked Data for planning and collaboration. For instance, in the iCONS project, Aalto University has developed location-based planning and controlling approaches, and integrated them through Linked Data with collaborative planning, real-time tracking of resources and the use of images for reality capture (RECAP project) (Seppänen et al., 2019). VIPRA-C links IoT systems for smart real estate management. e-COGNOS aims to develop construction domain ontologies for knowledge management in infrastructure. InteliGrid developed an interoperability platform that allows semantic collaboration among large-scale engineering industries. DiCtion is a closely related project that developed digital workflows for collaborative planning and controlling using digital situational awareness empowered with Linked Data (Peltokorpi, Zheng, 2021). DRUMBEAT is a very promising project in which IFC models are converted into Web-of-Data assets where each element with a GUID is given a unique URI that can be accessed with authorization over the Web (Törmä, 2017). V-Con specifically targets road construction and maintenance and attempts to establish a common standard for this industry (Delft, 2015).
However, individual regions have also developed their own Linked Data frameworks, such as the Agency of Roads and Traffic (AWV) in Belgium (Agentschap Wegen en Verkeer, 2019), which already uses standardized assets for road design. All these projects have in common that they emphasize information exchange, planning and collaboration. In the methodology, the open source ontologies of these projects are closely inspected to determine whether they can be easily extended for the remote sensing-based construction monitoring itself.

Figure 2: (a) BIM object selection per phase or by user selection and the BIM mesh export as .obj. (b) Point cloud segmentation per BIM mesh geometry based on a Euclidean distance filter. (c) Point cloud export per BIM object as .ply conform with Open3D and the update of the RDF graph. (d) Image raytracing between BIM objects and images to estimate visibility.

Remote sensing inputs
As discussed above, four types of remote sensing inputs should be considered for a holistic construction monitoring (Fig. 1). In this first section, the theoretical framework for the data structure and analysis is outlined. Concerning the data storage, the raw remote sensing data is stored per session and per sensor. Notice that this chronological data structure is storage-independent and can be used with cloud-based solutions, dedicated databases or conventional folder structures. First, the sensory data is stored per sensor along with several project folders to process the data. For instance, the raw TLS data of the Leica P30 is stored under "Session X P30". In that same folder, a subfolder REGISTER360 contains the project folder of the Leica Register360 that the data was processed with, including the project archive and temporary files. The final remote sensing data sets are stored in separate folders (PCD, MESH, ORTHO, IMG) where the individual assets are assigned URI references and typically also contain more information about the processing settings that were used to generate them, e.g. "tls raw", "tls unified 2cm" and "sfm dense high". Each group of assets has an accompanying Resource Description Framework (RDF) graph that governs its relations to other assets, its coordinate system and the metadata of the assets such as their properties and processing parameters. The following metadata is stored for every input: Geolocated imagery: The LBD concepts for the geolocated imagery are taken from the Arpentry (ARP), EXIF and our own V4Design (V4dImage) ontologies. First, an initial RDF graph is created from the EXIF data of the raw image files including camera information, pixel values and also the tentative location (Carl, 2018). To store the location, we use GeoSPARQL to define and store the WGS84 latitude, longitude and height values.
Once the images are processed by the photogrammetric pipeline, an XML file is stored with the updated image locations and Structure-from-Motion metadata. Concretely, we parse the RealityCapture and Agisoft Metashape output formats to extract the updated localization and camera information, including the coordinate system, the coordinates, the adjusted focal length, and so on. Point clouds: The LBD concepts for the point cloud data are currently adopted from the libE57 API (Ackley, Coleby, 2010). Using the C++ extraction tool, each point cloud's metadata is extracted including the coordinate system, the density, whether it contains a structured or unstructured dataset, the type of sensor that was used, the bounding box, the creation date, and so on. Once the data is processed, the OMG, FOG and our own V4Design (V4d3D) ontologies are used to govern the coordinate systems, the Cartesian transform and the offsets with relation to the project coordinate system.
Orthomosaics: Much like the scripts that extract the RDF graph of the images during and after the Structure-from-Motion pipeline, an RDF graph is created for any orthomosaic that is produced. Most of the same metadata is used as with the geolocated imagery. However, the Ground Sampling Distance (GSD) and the blending method are also stored, as well as the bounding box in the corresponding coordinate system.
Polygonal Meshes: Analogous to the other photogrammetric outputs, the RDF graph for the meshes is generated from the project folders of RealityCapture. The same metadata as for the point cloud data is stored, along with the face count, vertex count, presence of UV mapping, materials, texture relations and the file extension.

Monitoring data
In a second stage, the four types of remote sensing inputs (Lidar and photogrammetric point clouds are considered as the same type) are processed into monitoring data (Fig. 2).
To this end, a set of Rhino Grasshopper scripts is developed that exchange, segment and link the individual assets in RDF graphs. Rhino Grasshopper is used as the main processing platform along with the Rhino.Inside.Revit API to directly manipulate the Revit geometries. As also described above, the RDFlib (Carl, 2018) and RDFsharp (De Salvo, 2016) APIs for the Linked Data functionality and libE57 (Ackley, Coleby, 2010) for the point cloud functionality are integrated with our own Scan-to-BIM API and the Open3D and Volvox APIs (Ochmann et al., 2019).
BIM geometry extraction A first Rhino Grasshopper script is developed that converts a set of selected BIM objects to polygonal meshes and exports that geometry to .obj files under a new URI (Fig. 2a). A new monitoringGraph RDF graph is constructed along with the exported assets that stores, for each asset, the export path of the geometry, the link to the IFC or Revit file, the bounding box, the IFC class (e.g. IfcWallStandardCase) and the coordinate system information. Additionally, placeholder resources are created for state estimates, quality metrics and quantities. These resources will be further elaborated upon once the analysis methods start to take shape. Similarly, the four types of remote sensing inputs are embedded as child relationships of the BIM geometry resource, i.e. v4d:hasPCD, v4d:hasMesh, v4d:hasImage and v4d:hasOrtho.
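An illustrative Turtle fragment of such a monitoringGraph entry is shown below. The prefixes, instance IRIs and all property names other than the four child relations named above are assumptions, not the project's actual vocabulary:

```turtle
@prefix v4d:  <https://example.org/v4d#> .
@prefix inst: <https://example.org/project/> .

inst:2O2Fr9qzn0H8bB3kzq6rHz a v4d:BIMObject ;
    v4d:ifcClass       "IfcWallStandardCase" ;
    v4d:exportPath     "BIM/2O2Fr9qzn0H8bB3kzq6rHz.obj" ;
    v4d:hasPCD         inst:pcd_blk_w22_0012 ;
    v4d:hasImage       inst:IMG_0451 , inst:IMG_0452 ;
    v4d:hasMesh        inst:mesh_w22 ;
    v4d:hasOrtho       inst:ortho_w22 ;
    v4d:stateEstimate  inst:state_placeholder ;
    v4d:qualityMetric  inst:quality_placeholder .
```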
Point cloud segmentation A second Rhino Grasshopper script segments and exports the point cloud P per BIM object n ∈ N in the session (Fig. 2b). To this end, a nearest neighbor analysis is conducted between the selected point cloud and the set of BIM objects. For every point pi ∈ P, the Euclidean distance ∥pi − pn∥ to the nearest point pn on the boundary surface of each BIM object n ∈ N is established. Conditioned on a distance threshold td, subsets Q ⊂ P are isolated (Eq. 1):

Q = { pi ∈ P | min n∈N ∥pi − pn∥ ≤ td }  (1)

Figure 4: Overview of the monitoring data structure and RDF graphs generated from the week 22 remote sensing data and the as-designed BIM.
At this stage, only a Euclidean distance filter of 0.5 m is applied to isolate the neighborhood of the objects. These subsets will then be fed to the progress, quality and quantity estimation methods, such as those proposed in previous works. For each segmented Q, a new URI resource is created to which the export path, point count, bounding box, source point cloud (URI), td, accuracy and processing time are assigned as triples. Per input point cloud, a new pcdnode resource is constructed and serialized in the same graph as the exported point clouds. Furthermore, the monitoring RDF graph is enriched with the URIs of the segmented point clouds (Fig. 2c).
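The distance filter of Eq. 1 can be sketched as follows. This stand-in uses SciPy's KD-tree over uniformly sampled boundary points instead of the Rhino/Open3D pipeline described above, so the helper function and the toy wall geometry are illustrative only:

```python
import numpy as np
from scipy.spatial import cKDTree

def segment_by_distance(points, surface_samples, t_d=0.5):
    """Keep the subset Q of `points` whose nearest distance to the sampled
    BIM boundary surface is at most t_d (the Euclidean distance filter)."""
    dist, _ = cKDTree(surface_samples).query(points, k=1)
    mask = dist <= t_d
    return points[mask], mask

# Toy stand-in for one BIM object: a 4 m x 3 m wall sampled on the z = 0 plane.
grid = np.stack(np.meshgrid(np.linspace(0, 4, 41), np.linspace(0, 3, 31)), -1).reshape(-1, 2)
wall = np.column_stack([grid, np.zeros(len(grid))])
# Scattered "session" point cloud around the wall.
cloud = np.random.default_rng(0).uniform([-1, -1, -2], [5, 4, 2], size=(5000, 3))
Q, mask = segment_by_distance(cloud, wall, t_d=0.5)
```

The boolean mask makes it straightforward to export the segmented subset per object and to attach the point count and bounding box as triples.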

Image selection
The image segmentation is split into two consecutive steps: (a) an initial selection of the images that depict a certain object and (b) a pixel segmentation of each image to isolate the observations of that object. The former is established by evaluating the RDF graph of the images together with the BIM geometries of already constructed objects. For every BIM object n ∈ N, a set of rays Rn ∈ R is established between each image's focal point c ∈ C and the nearest point pn on n (Fig. 2d). Each ray is tested for collisions with N \ n and against a Euclidean distance threshold tGSD to only select images that are within a certain range of the object (Eq. 2):

Rn = { cpn | c ∈ C, pn = argmin p∈n ∥c − p∥, ∥c − pn∥ ≤ tGSD }
In = { ic ∈ I | rc ∈ Rn, rc ∩ (N \ n) = ∅ }  (2)

with In the set of images of which each image ic depicts the object n. Analogous to the point cloud segmentation, the monitoring RDF graph is updated with the URIs of the matching images. In a second step, the image masks will be established for each image, along with a virtual image of the BIM with the as-designed texture of the objects. This second step will be part of the continued research and will be a core feature of the monitoring analyses.
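A minimal sketch of this visibility selection is given below, with axis-aligned bounding boxes standing in for the occluding BIM geometry N \ n; the slab-test helper, the camera positions and the box are all illustrative assumptions rather than the actual Grasshopper raytracing:

```python
import numpy as np

def ray_hits_aabb(origin, direction, box_min, box_max, t_max):
    """Slab test: does the ray segment [0, t_max] along the unit
    vector `direction` intersect the axis-aligned box?"""
    with np.errstate(divide="ignore", invalid="ignore"):
        t1 = (box_min - origin) / direction
        t2 = (box_max - origin) / direction
    t_near = np.minimum(t1, t2).max()
    t_far = np.maximum(t1, t2).min()
    return bool(t_near <= t_far and t_far >= 0 and t_near <= t_max)

def select_images(cameras, p_n, occluders, t_gsd):
    """Indices of cameras whose ray to the object point p_n is shorter
    than t_gsd and not blocked by any other object (Eq. 2)."""
    selected = []
    for i, c in enumerate(cameras):
        dist = np.linalg.norm(p_n - c)
        if dist > t_gsd:
            continue
        direction = (p_n - c) / dist
        # Shrink the segment slightly so the target surface itself is not a "hit".
        if not any(ray_hits_aabb(c, direction, bmin, bmax, 0.999 * dist)
                   for bmin, bmax in occluders):
            selected.append(i)
    return selected

# Two cameras looking at the origin; a box occludes the second camera's ray.
cameras = [np.array([0.0, 0.0, 5.0]), np.array([10.0, 0.0, 5.0])]
occluder = (np.array([4.0, -1.0, 1.0]), np.array([6.0, 1.0, 4.0]))
visible = select_images(cameras, np.array([0.0, 0.0, 0.0]), [occluder], t_gsd=20.0)
```

In the pipeline, the selected indices would map back to image URIs that are then attached to the object's resource in the monitoring graph.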

RESULTS AND DISCUSSION
To narrow the scope, we focus on building construction but foresee the expansion towards industrial and infrastructure projects as well. Concretely, we develop the monitoring framework on a residential project in an urban area (Fig. 3). The site (100 m x 60 m) is comprised of three apartment buildings. Four general databases are stored and linked: one for each apartment block and one for the central underground parking. 18 periodic measurement sessions/time series were captured of the site, mostly during the structure phase of the parking and building 3. A combination of UAV flights (DJI Phantom 4), Lidar measurements (Leica P30 and BLK) and hand-held images (Canon EOS 5D Mark II) was captured. During the experiments, the monitoring data inputs of week 22 were structured and generated.
In week 22, both the hand-held imagery from the Canon (IMG RGB) and the Leica BLK point clouds (PCD BLK) were captured and stored on a local server. The BLK data was exported both as separate e57 scans and as a unified dataset, along with their respective .ttl RDF graphs in the PCD folder. The metadata of both files was extracted and stored as XML using the e57xmldump function from the libE57 API (Ackley, Coleby, 2010), including the bounding box, normals, variable types, timestamp, point count, and so on.
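Reading the relevant fields back out of such a dump can be sketched with the standard library. The XML fragment below is a trimmed, hand-made stand-in for real e57xmldump output (a real dump additionally carries the ASTM E57 namespace and many more nodes), so the tag names and values shown are illustrative:

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative fragment of an e57xmldump output (namespaces omitted).
sample = """
<e57Root>
  <data3D>
    <vectorChild>
      <points recordCount="1250347"/>
      <cartesianBounds>
        <xMinimum>-12.4</xMinimum><xMaximum>38.1</xMaximum>
        <yMinimum>-3.7</yMinimum><yMaximum>55.9</yMaximum>
        <zMinimum>-1.2</zMinimum><zMaximum>16.8</zMaximum>
      </cartesianBounds>
    </vectorChild>
  </data3D>
</e57Root>
"""

def scan_metadata(xml_text):
    """Extract the point count and bounding box per scan from an e57 XML dump."""
    root = ET.fromstring(xml_text)
    scans = []
    for child in root.iter("vectorChild"):
        bounds = child.find("cartesianBounds")
        bbox = {tag: float(bounds.find(tag).text)
                for tag in ("xMinimum", "xMaximum", "yMinimum",
                            "yMaximum", "zMinimum", "zMaximum")}
        scans.append({"pointCount": int(child.find("points").get("recordCount")),
                      "bbox": bbox})
    return scans

meta = scan_metadata(sample)
```

The extracted dictionary maps directly onto the triples stored in the per-point-cloud RDF graph.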
The imagery was processed with RealityCapture, after which the image and localization information was stored. RealityCapture provides an exporter that stores all the metadata of the alignment as XML, which is similar to the libE57 XML data structure. In case no SfM pipeline is run, or when the metadata of the imagery is already sufficiently accurate (such as with RTK-positioned imagery), the EXIF metadata is directly parsed and stored as a Turtle file using the RDFlib API (Carl, 2018) in the same folder. The resulting SfM point clouds are stored under PCD and are serialized using the same e57xmldump function or from their file header. The exported mesh is stored in the MESH folder. Currently, no additional parsing of the metadata of the mesh is implemented as the .obj files can be directly parsed and all their metadata properties are inherently present in the RhinoCommon mesh class. Finally, the orthomosaic is stored in ORTHO and its RDF graph stems from the RealityCapture XML.
To establish the monitoring data, the exported BIM objects are stored as separate meshes under their IFC GUID as .obj files. The central monitoring RDF graph is stored in a central location. From the point cloud segmentation procedure, the BLK and photogrammetric point clouds are stored as .ply files along with their RDF graphs. From the image selection procedure, only the images that see at least one object and thus contain location information (either from their initial EXIF or from the SfM process) are re-exported. For both processes, the monitoring graph is updated with the resource URIs of the point clouds and the images. Fig. 4 shows an overview of the resulting file structure and an example monitoringGraph.ttl. Notice that images can contain multiple objects and thus the image URIs are non-exclusive over the selected BIM objects.

CONCLUSION
This paper presents a novel Linked Data framework for construction site monitoring. The presented methods outline the framework in which periodic remote sensing data can be acquired from construction sites and interlinked with the BIM models to assess the progress, quality and quantities on site. The goal of this research is to establish a holistic Linked Data framework in which Lidar- and photogrammetric inputs can be jointly processed to conduct construction site monitoring in an unsupervised manner. The presented procedure is the first step in a series that achieves this standardized interlinking and analysis. The main contribution is the set of theoretical requirements for the Linked Data framework, along with the existing and new ontologies that will be used or created.
A real test case is set up as a demonstrator for future analyses and the expansion across construction domains. Concretely, the data from a residential project is organized and serialized to RDF conforming to Linked Data standards. The framework for the expansion of the V4Design ontology is also presented for the processed inputs such as segmented point clouds, photogrammetric reconstructions and polygonal meshes. Special attention is given to the geospatial and temporal aspects of the monitoring, i.e. the coordinate system information of each repository and the consecutive data acquisition sessions.
In future work, we will commence the implementation of the standardized Linked Data ontology along with the analysis methods to interact with both the BIM and the remote sensing inputs. The codebase will be based on both C# and Python and will build upon RhinoCommon, Open3D and CV, and, for the Linked Data, on RDFsharp and RDFlib. Once this pipeline is established, a range of advanced tracking algorithms will be developed to help better manage construction sites.

ACKNOWLEDGEMENTS
This project has received funding from the VLAIO COOCK programme (grant agreement HBC.2019.2509), the FWO Postdoc grant (grant agreement: 1251522N) and the Geomatics research group of the Department of Civil Engineering, TC Construction at the KU Leuven in Belgium.