Data modeling for operation and maintenance of utility networks: Implementation and testing

The organisational data models that support the information needs of utility network managers are proprietary and domain-specific, while the emerging national standards in this field often lack lifecycle data representation capabilities. However, multiple types of utility networks can be comprehensively represented with the free and open-source Utility Network Application Domain Extension (ADE) of the international standard CityGML. The Operation & Maintenance (O&M) Domain Ontology is a proposed extended version of the Utility Network ADE that allows for consistent and comprehensive processing, storage and exchange of O&M-related utility network data. So far, this ontology has not been implemented in a spatial-relational database. Consequently, the support it offers during routine utility asset management tasks has remained untested. This paper, therefore, tests the support of the O&M Domain Ontology for asset management and proposes a database implementation of this data model. To this end, it models and loads two utility networks from the campus of the University of Twente, the Netherlands. It tests the ontology's support for asset management by simulating a street reconstruction project and retrieving necessary project information in relation to a utility's (a) maintenance history and performance, and (b) site conditions and valve locations. Results show that the implemented model supports projects with rapid, comprehensive, and consistent information about semantic details of utilities. Such data have yet to be collected and registered systematically to enable future data-driven asset management practices.


INTRODUCTION
The supply and disposal of the commodities that sustain society are realized through utility networks. The lifecycle management of utility networks is fragmented vertically, horizontally, and longitudinally. First, vertical or inter-functional fragmentation happens as companies focus on their core competences and outsource specialized tasks (Steenhuisen et al., 2009). In-house integrated ownership, management and execution of Operation & Maintenance (O&M) of utility networks thus becomes increasingly infrequent. Second, horizontal or inter-disciplinary fragmentation occurs because utility networks transporting distinct commodities are owned by different parties, and they are constructed with the combined efforts of several trades (e.g. design, piping, surveying, systems, etc.). Finally, longitudinal fragmentation exists while information and knowledge about a utility network and its components do not flow seamlessly through the different life phases and stakeholders that manage the network. This may be because of the inability to integrate historical asset records.
As a result, asset information is dispersed among multiple organisations and captured in different data models, which are often proprietary and, possibly, closed-source. There is a great number of such models at national, industry and organizational level, and each has its own conceptualization of reality (Becker et al., 2011) and specific data storage format. Consequently, utility data are often not interoperable, and organizations must transform formats (e.g. shapefile to CAD to database) and semantics (i.e. ontology to ontology) when exchanging asset information. Because of longitudinal fragmentation, asset information is not easy to retrieve and is sometimes even lost. An example of this is the scarce availability of the z dimension in utility datasets, due to the lack of a placeholder in the 2D-oriented historical cadastres (Ossko, 2002).
These fragmentation issues hamper the response to growing societal pressure to safeguard the reliability of public infrastructure. To increase levels of service, maintain aging infrastructure, and reduce costs, infrastructure owners are moving towards the lifecycle asset management (AM) paradigm (Wijnia and Herder, 2010). This paradigm requires that asset owners make decisions that increase infrastructure quality while also minimizing costs. It requires them to mindfully register their utilities' lifecycle data in comprehensive models.
Comprehensive data models can integrate asset data and minimize integration issues with dispersed datasets, to eventually support data-driven AM decisions. Examples of data models are the IMKL, which is the underground utility standard for the Netherlands (Geonovum, 2019), and the one in development in Singapore (Yan et al., 2019). These models do not, however, cover the detailed information needs that asset owners and managers have while making decisions about Operation and Maintenance of a utility. The models, for example, lack support for the representation of performance or maintenance history. Further, they do not support the representation of the surrounding soil and groundwater levels that are necessary to plan the trenching and dewatering tasks. We thus posit that the sector lacks a standardized data model that supports lifecycle asset management of utility networks.
One solution to this problem may be the open-source and free O&M Domain Ontology. This model is an improved version of CityGML Utility Network ADE 0.9.2 as it adds asset management concepts to the base model (ter Huurne, 2019). All of CityGML, the Utility Network ADE and the O&M Domain Ontology are based on the ISO 19100 standards family. The content of the O&M-model is based on different data models that practitioners currently use to represent Dutch utility networks, and it incorporates elements of IMKL, such as related party and component identifiers. The model, however, lacks a technical database implementation and testing and hence has not demonstrated its practical value as a standard for asset managers.
To address this, we present (a) the workflow that implements the O&M Domain Ontology in a PostgreSQL-based database, (b) the data pre-processing steps, (c) the workflow that semantically transforms and enriches the data and populates the database, and (d) the formalization of database queries. We demonstrate a test case of two utility networks that are located at the campus of the University of Twente to show how the model supports a typical street reconstruction project. We show the planning support that the O&M Domain Ontology enables with its Utility-Network-ADE-inherited topological module.

THEORETICAL BACKGROUND
Utility network asset information is typically stored digitally using formats varying in complexity (e.g. scanned records, pdf files, CAD files, geo-databases, etc.). The representation and storage of utility networks requires the use of ontologies to achieve a standardized output (Xu and Cai, 2020). An ontology can be described as an underlying data modelling standard that is specific to a domain. In the past two decades, several technologies have been developed to help overcome interoperability issues by unifying ontologies.
One of these technologies is associated with the Geography Markup Language (GML). GML is a vendor-neutral standard from the Open Geospatial Consortium (OGC). It defines the way in which spatial features should be represented digitally, without describing specific features (Lake et al., 2004). CityGML is an application schema of GML that digitally represents cities, including a great variety of above-ground objects (buildings, bridges, tunnels, etc.), their appearance, geometry, and other semantic attributes (Gröger and Plümer, 2012). CityGML currently has three encodings, namely GML, JSON, and a SQL-based spatial-relational database called 3D City Database or, in short, 3DCityDB (3DCityDB Development Team, 2016). The CityGML data model can be extended modularly to add concepts from the utilities domain by using the Application Domain Extension (ADE) mechanism. Several ADEs exist (Biljecki et al., 2018), ranging from energy to augmented reality (Zamyadi et al., 2013). The Utility Network ADE is an extension of CityGML that provides the necessary classes and relations to represent different utility network types (e.g. water, electricity, gas, etc.) both topographically and topologically (Becker et al., 2012).
Further, the Utility Network ADE has a database encoding that extends the 3DCityDB. The latest version of the 3DCityDB tools allows for an automatic derivation of the ADE-related database using the ADE's XSD (XML Schema Definition) file (Yao et al., 2018). As of spring 2020, the Utility Network ADE is still in development and has no formal documentation. Consequently, there are no clear recommendations on best practices for modelling network topology. As the Utility Network ADE was not developed for Asset Management specifically, it lacks operation & maintenance concepts. Similarly, O&M-related attributes are also lacking in the IMKL data model of the KLIC-WIN program in the Netherlands (ter Huurne, 2019). Table 1 compares the most important capabilities and supported features of the Utility Network ADE, the O&M Domain Ontology, and IMKL. We based the selection of elements on the requirements for the Utility Network ADE, as specified by Becker et al. (2011), and on the lifecycle asset management needs identified from the empirical observations that ter Huurne (2019) conducted in a utilities contractor firm. We established whether a capability is covered by checking for the presence of a class and attributes that capture the knowledge about the properties and capabilities listed in the table. For example, we checked whether the models supported multiple utility types by checking whether their classes included a representation of water, data, etc. The table shows that the Utility Network ADE can represent network and component hierarchies, store topographies, represent topology in detail, and connect to city models. IMKL can represent attributes such as depth, related party, physical labels, and precautionary measures. The O&M ontology includes most of the representation capabilities of both the Utility Network ADE and IMKL.
Topography is only considered partially supported by IMKL 1.2 because this model only stores the topography of the utility network components, and not a Digital Elevation Model (DEM) of the terrain's surface. Topology is also considered only partially covered by IMKL 1.2 because it lacks the 'feature graph' concept used by the Utility Network ADE. Mapping components to feature graphs allows for more faithful representations of the topology of a network (Becker et al., 2011). In addition, the O&M Domain Ontology adds new classes and relations that provide additional capabilities for the missing asset management concepts. Figures 1 and 2 provide an example of a selection of these classes, and serve as a visual aid for readers unfamiliar with the Utility Network ADE:
• 'RelatedParty' (Figure 1) is a new class that is related to 'Network', 'AbstractNetworkFeature', and 'MaintenanceActivity'. This class stores the name of an organisation and individual, their contact information, and the role of this party in the maintenance of the network or its components.
• 'MaintenanceActivity' (Figure 1) allows the standardized storage of the maintenance records related to components such as pipes, cables, and appurtenances.
• 'SurroundingSoilProperties' and 'GroundWaterProperties' (Figure 2) characterise the soil surrounding a utility using terms like type, strength, permeability, and groundwater level. With this data the cost and safety measures during trenching can be estimated.
Additional classes and attributes store: the milestones in the lifecycle of a component; its colour, tags and other visual information to help practitioners onsite to recognize a component; quantified environmental, societal and economic impacts of a component; and how components perform based on types (e.g. engineering and environmental), service level target scores, and actual scores. Finally, 'MeasuredDepthProperties' adds the possibility of recording surveyed depth, measurement location, the reference level and the survey date.
To date, however, the new classes from the O&M Domain Ontology have been neither implemented in a database nor tested, so their support for asset management remains unverified.

CASE STUDY AND METHODOLOGY
Approximately 300 km of utility networks lie in the 146-hectare park-like campus of the University of Twente, the Netherlands. Its department Campus & Facility Management (C&FM) manages them. C&FM consolidated all its real-estate assets into a spatial-relational database using PostgreSQL/PostGIS as backend and QGIS as frontend. This system replaced the old 2D CAD files that held all their utility data. Even though the GIS database system is a substantial improvement over the previous one, C&FM still lacks a broader database to store their O&M-related information. This spurred the development of the O&M Domain Ontology (ter Huurne, 2019). This case study offers the chance to encode the O&M Domain Ontology into a database to populate and test it. To make the steps of the proposed methodology easier to follow, the theoretical part and the implementation part based on the case study are presented together in the following.
The proposed approach consists of four steps. First, we derived the spatial-relational database that serves as backend of the system using the Unified Modelling Language (UML) class diagram of the O&M Domain Ontology (Section 3.1). Second, we pre-processed C&FM's utility shapefiles to correct digitization mistakes and draped these over a Digital Elevation Model (DEM) of the campus (3.2). Third, we transformed the data semantically and loaded them into the database (3.3). Finally, we formalized queries related to two use cases and visualized the results in a GIS application (3.4).

Database derivation from the class diagram
The O&M Domain Ontology's original class diagram contains an unnecessarily large number of many-to-many relationships between classes. To avoid complexity due to an overload of association tables in the proposed database, we first simplified the defined relationships of the class diagram based on preliminary tests and dialogues with stakeholders. We enforced stricter rules on how to derive new classes from the featureType or dataType stereotypes to reduce the cardinality of many relations in the revised model. This was expected to lead to a minimal loss of functionality.
ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume VI-4/W1-2020, 2020 3rd BIM/GIS Integration Workshop and 15th 3D GeoInfo Conference, 7-11 September 2020, London, UK
The automatic database derivation with the 3DCityDB tooling resulted in an overly complex output, so we derived the database tables manually as a next step. To prepare for this, we (a) created an empty database in PostgreSQL, (b) extended it with PostGIS, (c) installed 3DCityDB v.3.3, and (d) installed the database utilities package and metadata module of 3DCityDB "Plus" (Agugiaro, 2019) to add stored procedures to 3DCityDB.
Then, we adapted the database encoding of the Utility Network ADE 0.9.2 by extending it with O&M-related relations. To this end, we designed the database manually following the general principles of 3DCityDB (3DCityDB Development Team, 2016), and the ADE-specific guidelines available for the database encoding of Energy ADE (Agugiaro and Holcik, 2017). In order to keep the number of tables in check we merged classes and relations into one table where reasonably possible. Figure 2 provides an example of this, where the classes 'SurroundingSoilProperties' and 'GroundWaterProperties' are mapped to the table 'uom5_soil_and_groundwater'. We also created stored procedures to facilitate data insertion and deletion.
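The idea of mapping two classes to one table and wrapping insertion in a helper can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, so it is not the actual encoding. Only the table name 'uom5_soil_and_groundwater' comes from this work; every column name and the helper function are hypothetical placeholders.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL/PostGIS; this is only a sketch.
conn = sqlite3.connect(":memory:")

# One table holds the attributes of both 'SurroundingSoilProperties' and
# 'GroundWaterProperties'. All column names below are hypothetical.
conn.execute(
    """
    CREATE TABLE uom5_soil_and_groundwater (
        id INTEGER PRIMARY KEY,
        network_feature_id INTEGER NOT NULL,  -- hypothetical link to a component
        soil_type TEXT,
        soil_strength REAL,
        soil_permeability REAL,
        groundwater_level REAL
    )
    """
)


def insert_soil_and_groundwater(conn, feature_id, soil_type, strength,
                                permeability, groundwater_level):
    """Insert one merged soil/groundwater record and return its new id.

    Plays the role that a stored procedure would play in the real database.
    """
    cursor = conn.execute(
        "INSERT INTO uom5_soil_and_groundwater "
        "(network_feature_id, soil_type, soil_strength, "
        " soil_permeability, groundwater_level) "
        "VALUES (?, ?, ?, ?, ?)",
        (feature_id, soil_type, strength, permeability, groundwater_level),
    )
    return cursor.lastrowid


# Dummy values, invented for illustration only.
new_id = insert_soil_and_groundwater(conn, 101, "sand", 35.0, 1e-4, -1.8)
row = conn.execute(
    "SELECT soil_type, groundwater_level "
    "FROM uom5_soil_and_groundwater WHERE id = ?",
    (new_id,),
).fetchone()
```

Merging the two classes keeps soil and groundwater attributes of one location in a single row, at the cost of nullable columns when only one of the two is known.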

Pre-processing of utility networks and DEM
C&FM provided data of 8 different network types via Web Feature Service. These are: gas low pressure, electricity low voltage, electricity medium voltage, telecommunications, gravity sewers, pressurized sewers, district heating, and fresh water. The data consisted of two main geometry types and associated attributes: point features, consisting of appurtenances, and linear features, consisting of pipes, cables, and protective elements. We used the gas low pressure (GLP) and district heating supply and return (DH) networks as test case for this study, since these were recently reconstructed on campus, and thus could help to test a typical and actual utility maintenance use-case.
To facilitate the initial visual inspection of the downloaded data and allow for simple editing in QGIS, we created several shapefiles containing the linear and point features. We carried out two types of pre-processing tasks to correct the data and conform them to a simple topological representation. First, we corrected digitization issues such as: line over- and undershoots for pipes (Figure 3), and points that were not placed exactly over a line; pipes that were classified with an incorrect network or functional status; and duplicated lines. Second, we partitioned continuous pipes into segments if: (a) one pipe segment was protected by more than one protective element, and (b) a pipe segment was intersected by another pipe halfway rather than at its end. During the initial evaluation of the datasets, we also found that many attributes were empty in the original datasets, and that O&M-related attributes were lacking. Finally, we generated a CityGML-compliant, TIN-based DEM of the study area for later import into the 3DCityDB.
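One of the digitization issues above, points that do not lie exactly on a line, can be illustrated geometrically. The following is a minimal sketch in plain Python and not the editing procedure followed in QGIS; the function names and the tolerance value are hypothetical.

```python
import math


def snap_point_to_segment(point, seg_start, seg_end):
    """Return the location on segment seg_start-seg_end closest to point (2D)."""
    px, py = point
    ax, ay = seg_start
    bx, by = seg_end
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        # Degenerate segment: both ends coincide.
        return (float(ax), float(ay))
    # Parameter of the orthogonal projection, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def snap_if_close(point, seg_start, seg_end, tolerance):
    """Snap point onto the segment only if it lies within the tolerance."""
    snapped = snap_point_to_segment(point, seg_start, seg_end)
    distance = math.hypot(snapped[0] - point[0], snapped[1] - point[1])
    return snapped if distance <= tolerance else point


# An appurtenance digitized 0.2 m off a pipe is moved onto the pipe,
# while a point 3 m away is left untouched.
near = snap_if_close((5.0, 0.2), (0.0, 0.0), (10.0, 0.0), tolerance=0.5)
far = snap_if_close((5.0, 3.0), (0.0, 0.0), (10.0, 0.0), tolerance=0.5)
```

A tolerance guards against snapping points that genuinely belong elsewhere; a suitable value depends on the accuracy of the source data.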

Semantic transformation and database populating
We created an FME workbench that transforms the utility network data to the O&M data model and imports these in the extended 3DCityDB. It also transforms and imports data representing the streets and trees at the campus. This is possible because the department C&FM also manages a database with topographic features such as buildings, trees and streets. The O&M-database can store a representation of those objects natively because it extends the 3DCityDB of CityGML. The workbench contains circa 300 transformers and writes data into both the "normal" CityGML tables and the O&M-derived ones. This process assigns IDs, maps the input attributes to the corresponding ones in the O&M Domain Ontology, and performs the transformation processes that are explained below. Given the relatively sparse documentation of the Utility Network ADE on how to generate topology and the different topological configurations from existing datasets, some of the steps that we followed in this work are listed and documented here. We generated and stored the connectivity of the network according to the structure of the O&M (and of the underlying Utility Network ADE) data model. The Utility Network ADE allows the use of internal nodes and auxiliary interior feature links to model a single pipe, which in turn permits the use of complex multi-element 'feature graphs' while keeping the pipes in the original system unpartitioned (Figure 4B). However, we decided to partition the geometries representing pipes (Figure 4A) at every node and at every intersection with another pipe before generating the topology, and thereby obtained a simplified topological representation with fewer links and nodes (Figure 4C).
The practical processing steps in FME are: (1) partitioning pipes into independent segments at intersections with other pipes and at intersections with nodes by generating one node at the beginning of every loose end and at every intersection; (2) matching existing appurtenances to the generated nodes, and classifying unmatched nodes (those with no information available in the original dataset) as auxiliary appurtenances; (3) multiplying the nodes corresponding with appurtenances to generate exterior nodes of pipes; (4) generating interior feature links between the exterior nodes of each pipe; (5) generating inter-feature links between exterior nodes of pipes and nodes corresponding to appurtenances; (6) finally, draping all 2D features onto the DEM to give them an elevation, and applying a vertical offset of -1 m to the underground features. We left the geometrical representation of the links and nodes of the topology at 0 m elevation (Figure 5), as suggested by Boates et al. (2018).
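Steps (2) to (5) can be sketched outside FME to make the resulting structure explicit. The following plain-Python sketch is an illustration under simplifying assumptions and not the workbench itself: pipes are assumed to be already partitioned (step 1), each pipe is reduced to its two end coordinates, and the draping of step (6) is omitted. The function name, identifiers and data layout are hypothetical.

```python
def build_topology(pipes, appurtenances):
    """Derive a simplified topology from partitioned pipes.

    pipes: {pipe_id: (start_xy, end_xy)}, already split at intersections.
    appurtenances: {xy: appurtenance_id} for known point features.
    Returns the node at each location, the interior feature links
    (one per pipe) and the inter-feature links (pipe end to appurtenance).
    """
    node_at = {}
    interior_links = []
    inter_feature_links = []
    aux_count = 0
    for pipe_id, (start, end) in pipes.items():
        exterior_nodes = []
        for label, xy in (("start", start), ("end", end)):
            if xy not in node_at:
                if xy in appurtenances:
                    # Step 2: match an existing appurtenance to the node.
                    node_at[xy] = appurtenances[xy]
                else:
                    # Step 2: an unmatched node becomes an auxiliary appurtenance.
                    aux_count += 1
                    node_at[xy] = f"aux_{aux_count}"
            # Step 3: every pipe end gets its own exterior node.
            exterior = f"{pipe_id}_{label}"
            exterior_nodes.append(exterior)
            # Step 5: link the exterior node to the appurtenance node.
            inter_feature_links.append((exterior, node_at[xy]))
        # Step 4: one interior feature link between the pipe's exterior nodes.
        interior_links.append(tuple(exterior_nodes))
    return node_at, interior_links, inter_feature_links


# Two pipe segments meeting at a known valve; the loose ends are unmatched.
pipes = {"p1": ((0, 0), (10, 0)), "p2": ((10, 0), (20, 0))}
appurtenances = {(10, 0): "valve_1"}
nodes, interior, inter_feature = build_topology(pipes, appurtenances)
```

In this simplified representation every pipe contributes exactly one interior feature link and two inter-feature links, which is what keeps the number of links and nodes low.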

Figure 4. A) Pipe and appurtenance representation in the input shapefiles. B) Complete topological representation possible in Utility Network ADE (Kutzner et al., 2018). C) Simplified topology, adopted in this work.
Figure 5. Geometric (red) and topological (blue) representations of the gas low pressure network at different heights.

Use-cases and derivation of queries
A typical and recent operation and maintenance case for C&FM is a street reconstruction project. C&FM decomposes such a project into two phases, each associated with different information needs. These are as follows:
1. Phase I. Identification & Decision: What network components are in the project area? What actions should be taken with respect to the existing utilities?
2. Phase II. Execution: What are the site conditions? Where are the relevant valves of each network?
Due to the lack of values for maintenance, surrounding soil, measured depth, related parties and performance, we first enriched the dataset with dummy (but realistic) attributes. We chose QGIS as database frontend for data inspection and visualisation, and prepared SQL queries as follows.

Simulation of a street reconstruction
In the first step of phase I, we define the extents of the construction area by selecting a street at the University of Twente campus in QGIS. The second step performs a spatial query using the QGIS plugin 'DB Manager' to retrieve the pipes and appurtenances located inside the street, and to load the result as a QGIS layer (Figure 6). The results of this 'identification' part of phase I match the type of information that would be obtained via a KLIC-request, the compulsory system for requesting utility location information in the Netherlands. Next, we retrieve the records needed by C&FM to decide whether to inspect, rehabilitate, replace or do nothing with the existing components within the construction area.
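The logic of the identification step can be sketched in plain Python. This is a simplified stand-in for the spatial query run in the database, not the query itself: it uses a ray-casting point-in-polygon test and selects a pipe if any of its vertices falls inside the street polygon, so a pipe that crosses the street without a vertex inside would be missed. All names and coordinates are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point lies inside the polygon (list of xy)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def pipes_in_area(pipes, polygon):
    """Return ids of pipes with at least one vertex inside the polygon."""
    return [
        pipe_id
        for pipe_id, vertices in pipes.items()
        if any(point_in_polygon(v, polygon) for v in vertices)
    ]


# Hypothetical street outline and pipes (coordinates in metres).
street = [(0, 0), (100, 0), (100, 10), (0, 10)]
pipes = {
    "glp_1": [(10, 5), (50, 5)],    # fully inside the street
    "dh_7": [(10, 50), (50, 50)],   # outside the street
    "glp_2": [(-5, 5), (5, 5)],     # enters the street
}
selected = pipes_in_area(pipes, street)
```

In a spatial database the same selection would normally rely on a spatial predicate and a spatial index rather than on a vertex test.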
Phase II represents a planned maintenance work. The asset manager uses the system to extract relevant data stored in the model and send detailed assignments to the contractors that are responsible for the street and utility reconstruction work. The addition of operational on-site aspects such as the soil type and the presence of groundwater allows the contractor to better estimate the costs and plan the works, in terms of employing adequate shoring for the sides of the trenches and estimating dewatering requirements. Furthermore, a priority in this stage is to avoid excavation damage and to minimize the consequences of a possible pipe strike. Thus, to reduce the chances of a pipe strike, the asset manager needs to identify the location, depth, and colour of the pipes. Further, to minimize the consequences of a pipe strike, the asset manager needs to determine which valves must be closed to isolate the components from the source of the commodity.

RESULTS
The results of phase I are summarized in Tables 2 to 4. Table 2 shows data that characterize the pipes, such as id, colour, diameter, depth, etc. A similar query could be run for appurtenances.
The next queries exemplify the case when the asset manager uses supporting maintenance and performance records to decide on the future actions on the existing infrastructure. As the maintenance history and performance information are non-geometrical, they are presented in simple tables. Table 3 presents how the maintenance history results from the query. It contains the components' ID, maintenance timeline, the type of maintenance, the executed maintenance activity, task, dates, and the related party. Table 4 shows performance information such as the dates of installation and of performance measurement, the required and actual performance, an indication on whether the performance is sufficient, and extra information.
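The shape of such a maintenance-history query can be sketched as follows. The example uses Python's built-in sqlite3 module instead of PostgreSQL, and the table names, column names and records are hypothetical placeholders invented for illustration; they are not the schema or data of the implemented database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, heavily reduced tables with invented dummy records.
conn.executescript(
    """
    CREATE TABLE related_party (
        id INTEGER PRIMARY KEY,
        name TEXT,
        role TEXT
    );
    CREATE TABLE maintenance_activity (
        id INTEGER PRIMARY KEY,
        component_id TEXT,
        maintenance_type TEXT,
        activity TEXT,
        date_executed TEXT,
        related_party_id INTEGER REFERENCES related_party(id)
    );
    INSERT INTO related_party VALUES (1, 'Contractor A', 'contractor');
    INSERT INTO maintenance_activity VALUES
        (1, 'pipe_12', 'corrective', 'leak repair', '2018-03-02', 1),
        (2, 'pipe_12', 'preventive', 'inspection', '2016-09-15', 1),
        (3, 'pipe_40', 'preventive', 'inspection', '2017-01-20', 1);
    """
)


def maintenance_history(conn, component_ids):
    """Return maintenance records of the given components, oldest first."""
    placeholders = ",".join("?" for _ in component_ids)
    sql = (
        "SELECT m.component_id, m.date_executed, m.maintenance_type, "
        "       m.activity, p.name "
        "FROM maintenance_activity AS m "
        "JOIN related_party AS p ON p.id = m.related_party_id "
        f"WHERE m.component_id IN ({placeholders}) "
        "ORDER BY m.component_id, m.date_executed"
    )
    return conn.execute(sql, list(component_ids)).fetchall()


# Components found in the identification step would be passed in here.
history = maintenance_history(conn, ["pipe_12"])
```

Joining the activity records to the related party in one query returns the timeline and the responsible organisation together, which matches the tabular form in which the results are presented.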
To obtain the information necessary during phase II, we developed two queries. The first one retrieves the site conditions, location, depth and pipe characteristics (colour, shape, size) that are necessary for the contractor to better estimate the costs and to execute the work safely. Table 5 shows the non-graphical part of the query's output, which also includes the geometric representation of the elements in a map. The second query exploits a variant of the common Dijkstra algorithm in the pgRouting extension of PostgreSQL to locate the first valve in all the different paths that lead back to the source of the commodity (Figure 7). The query (shown in the appendix) selects the distinct valve appurtenances that are located in the 100 shortest paths from the pipe to the source by using the pgr_KSP (Dijkstra with 'K' shortest paths) function. It looks for the connectivity information in the custom views uom5_view_pipe_topology and uom5_view_appurtenance_topology.
Table 2. Identification information of pipes in the study area. Subset of columns from the full query.
Table 3. Maintenance history of pipes in the study area. Subset of columns from the full query.
Table 4. Performance history of pipes in the study area. Subset of columns from the full query.
Table 5. Site conditions and information necessary for a safe excavation.
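The goal of the valve query, finding the first valve on every path back to the source, can also be illustrated without pgRouting. The sketch below deliberately swaps the K-shortest-paths approach for a plain breadth-first search that stops at valves; it is an alternative illustration and not the query from the appendix. The graph layout, node names and function name are hypothetical.

```python
from collections import deque


def first_valves_towards_source(graph, start, source, valves):
    """Return the valves that separate start from source.

    graph: {node: [neighbouring nodes]} for an undirected network.
    The search spreads from start and never passes through a valve, so the
    valves it touches are the first ones on every path leaving the component.
    Valves on branches that do not lead to the source are discarded.
    """
    region = {start}          # nodes reachable without crossing a valve
    boundary_valves = set()   # first valves met in any direction
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour in valves:
                boundary_valves.add(neighbour)
            elif neighbour not in region:
                region.add(neighbour)
                queue.append(neighbour)
    if source in region:
        raise ValueError("no valve lies between the component and the source")

    def reaches_source(valve):
        # Search beyond the valve, staying outside the isolated region.
        seen = {valve}
        pending = deque([valve])
        while pending:
            node = pending.popleft()
            if node == source:
                return True
            for neighbour in graph[node]:
                if neighbour not in seen and neighbour not in region:
                    seen.add(neighbour)
                    pending.append(neighbour)
        return False

    return sorted(v for v in boundary_valves if reaches_source(v))


# Hypothetical looped network: the pipe is fed from both sides, and a third
# valve guards a dead-end branch that does not lead to the source.
graph = {
    "source": ["v1", "c"],
    "v1": ["source", "a"],
    "a": ["v1", "pipe"],
    "pipe": ["a", "b"],
    "b": ["pipe", "v2", "v3"],
    "v2": ["b", "c"],
    "c": ["v2", "source"],
    "v3": ["b", "d"],
    "d": ["v3"],
}
to_close = first_valves_towards_source(graph, "pipe", "source", {"v1", "v2", "v3"})
```

Unlike a search limited to a fixed number of shortest paths, this traversal covers every route out of the component, at the cost of ignoring path lengths altogether.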

DISCUSSION
This study implements and tests the O&M Domain Ontology extension of the Utility Network ADE. This ontology contains new concepts from the domain of asset management and was tested using datasets from the campus of the University of Twente. As described in the previous sections and shown in the accompanying Tables and Figures, we simulated a street reconstruction project and formalized the network manager's information requirements through the phases of such a project into queries. The added value of this work is twofold.
First, we demonstrate how, through its implementation, the O&M Domain Ontology can support typical and real-life asset management tasks for utility networks. We show how systematic registration of maintenance history and performance can support asset owners in their planning of work. Further, we show how data needs for the on-site execution of construction work, such as the characteristics of the surrounding soil, the presence of groundwater, the depth of the components, and the means to identify them (e.g. their colour), can be retrieved from the O&M database.
Second, this study performed a topological analysis enabled by the underlying topological module of the Utility Network ADE. It demonstrates how, besides geographical information, also topological information is relevant for the planning of operational construction work by showing that a routing analysis enables the identification of the nearest valves that restrict commodity flows to a specific component.
In addition, large quantities of data are needed for supporting network operation and maintenance. Even though the O&M Domain Ontology already considers these data, our case study shows that the data present in the existing datasets were not sufficient to perform all O&M-based queries. Some of the O&M-related attribute data used in this study were assumed due to the current lack of information held by C&FM. Further, there are various alternative use cases, such as those related to impact and risk data retrieval, that could also have been tested. So, although this study is a proof of concept that shows how the O&M-model enables faster and systematic generation of critical information for maintenance and construction work planning, it remains critical for practitioners to fill databases with this information. The O&M-model implemented in this study provides guidance on how and what type of information asset owners should additionally collect and store.
The experience gained so far allows us to recommend two further practical steps towards the implementation of the O&M-model.
First, we suggest that practitioners implement and iteratively refine the classes in the current O&M-model by using the outlined implementation processes in this study. Second, we suggest that asset managers define their AM-decision needs specifically, so that we can better identify how the database can support these decisions. A Decision Support System could be finally developed to automatically use data from the database to automate tasks and decision processes. One possible approach to this would be to perform DB-queries and compare queried output with defined performance level thresholds to identify components that need maintenance or replacement.
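The threshold comparison suggested above can be sketched in a few lines of Python. This is only an illustration of the idea, not a proposed Decision Support System: the record structure, field names and scores are hypothetical, and the rule (flag a component when an actual score falls below its target) is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class PerformanceRecord:
    """One queried performance record (hypothetical structure)."""
    component_id: str
    performance_type: str
    target_score: float
    actual_score: float


def components_needing_attention(records):
    """Return ids of components whose actual score is below target.

    Components are ordered by their largest shortfall, worst first.
    """
    worst_shortfall = {}
    for record in records:
        shortfall = record.target_score - record.actual_score
        if shortfall > 0:
            previous = worst_shortfall.get(record.component_id, 0.0)
            worst_shortfall[record.component_id] = max(previous, shortfall)
    return sorted(worst_shortfall, key=worst_shortfall.get, reverse=True)


# Dummy records standing in for the output of a database query.
records = [
    PerformanceRecord("pipe_1", "engineering", 7.0, 5.0),
    PerformanceRecord("pipe_2", "engineering", 7.0, 8.0),
    PerformanceRecord("pipe_3", "environmental", 6.0, 2.0),
    PerformanceRecord("pipe_1", "environmental", 6.0, 5.5),
]
flagged = components_needing_attention(records)
```

Ranking by shortfall gives the asset manager a first priority list; which action follows (inspect, rehabilitate or replace) would still depend on criteria defined by the asset owner.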
Further refinement and testing (by means of more use cases) of the O&M-model would help to strengthen claims about its utility. Another relevant step would be to compare this model with others regarding its ability to represent information about the surrounding environment of the networks. One comparable model could be the Model for Underground Data Definition and Integration (MUDDI) (Liebermann, 2019); however, little detailed information is available at the time of writing on how MUDDI plans to represent surrounding soil characteristics such that they support utility asset management meaningfully. Nevertheless, a sound comparison seems one of the most reasonable next steps to take.

CONCLUSIONS
The aim of the project was to test the support for asset management enabled by the O&M Domain Ontology and to encode it into a database. The test dataset contained real data of two utility networks, as well as dummy attribute values for maintenance, performance, and surrounding soil information.
We implemented a free and open-source database encoding of the O&M Domain Ontology data model, which contributes to overcoming some limits of closed-source and domain-specific existing standards. The tests showed that the O&M-model not only enables the representation of asset management related concepts but also helps to identify the missing data that need to be collected.
C&FM requested a decision support tool to help them define what to do with the existing ageing utilities at campus when they are uncovered during street reconstructions, among other uses. During the simulation of the street reconstruction we showed how the database can provide information to support the decision process in the form of maintenance and performance history of components. Moreover, the O&M-model and its database also allow the representation of the location and characterization data in the Dutch IMKL data model, which is used for exchanging utilities' information via a (compulsory) KLIC-request. Thus, the proposed system provides utility owners with functionalities that are required by law in the Netherlands.
Overall, this study contributes to the ongoing developments in data-driven management of the underground space by showing a utility network owner's information requirements and a proposed technical solution.

RESOURCES
All necessary data to recreate this work are available online and comprise: the raw data from campus, the UML class diagram and the XSD file of the data model, the database configuration files, the database diagrams, the FME workbench, and the Python scripts to insert dummy data into the database (Fossatti, 2020).