Using a space filling curve approach for the management of dynamic point clouds

Point cloud usage has increased over the years. The development of low-cost sensors now makes it possible to acquire frequent point cloud measurements over short time periods (day, hour, second). Based on the requirements coming from the coastal monitoring domain, we have developed, implemented and benchmarked a spatio-temporal point cloud data management solution. To this end, we make use of the flat model approach (one point per row) in an Index Organised Table within an RDBMS and an improved spatio-temporal organisation using a Space Filling Curve approach. Two variants coming from two extremes of the space-time continuum are also taken into account, along with two treatments of the z dimension: as an attribute or as part of the space filling curve. By executing a benchmark we elaborate on the performance (loading and querying time) and the storage required by those different approaches. Finally, we validate the correctness and suitability of our method by comparing it with an out-of-the-box way of managing dynamic point clouds.


INTRODUCTION
Since the introduction of LIDAR technology in the 1960s, the volume of point cloud data has seen a rapid increase, and it is anticipated that it will continue to increase exponentially in the years to follow. This growth is mainly the result of developments in point cloud acquisition technologies, the most important of which are terrestrial and airborne laser scanning, mobile mapping and multi-beam echo-sounding techniques (Otepka et al., 2013).
Over the last years many easy-to-use and inexpensive sensors mounted in mobile devices have become widely available. Examples are Microsoft's Kinect sensor (Izadi et al., 2011), Google's Project Tango (Schöps et al., 2015), Structure from Motion (SfM) techniques (Westoby et al., 2012), etc. The advent of these technologies has allowed repeated scans of the same area on a regular basis, leading to massive spatio-temporal point clouds having both very high spatial and temporal resolution. However, the management and querying of these massive point clouds is a challenge (van Oosterom et al., 2015). The reason for this is the generally unstructured nature of the points (compared to raster data) and the multiple attributes that can be attached to them. Depending on the acquisition technique, point clouds can contain time information, intensity, return number, number of returns, classification, colour, etc. These attributes can be present in different combinations, making even simple storage and selections non-trivial.
In the majority of today's applications point clouds are managed using file-based solutions. Typical file-based solutions include desktop applications (usually vendor-specific) and command-line executables, like Rapidlasso's LAStools (mixed-source) or the Point Data Abstraction Library (PDAL) (an open-source project). Within these solutions, the workflow consists of reading one or more files, processing the data and writing files back to the user.
The database community, commercial and open source, provides point cloud specific data structures. In particular, Oracle (Spatial and Graph) and PostgreSQL (PostGIS) follow a similar organisation technique for point cloud data. Their storage model is based on the physical reorganisation of the data into groups of spatially close points, called blocks (Ravada et al., 2010; Ramsey, 2014), and provides efficient management and query response times (van Oosterom et al., 2015; Cura et al., 2015). However, the data structures available in Database Management Systems (DBMS) are not designed for applications of a dynamic nature. They consider point clouds as static objects and do not include time as part of the organisation. This is a very important limitation because, for specific applications, time is as selective as the spatial component or is needed in integrated space-time selections (change detection). These requirements suggest that storing time as an attribute does not offer efficient query response times. Finally, the structures do not scale well with the accumulation of time-dependent data. With such voluminous data, performance (in terms of loading and query time, as well as storage) is very important.
In this paper we investigate how effectively time-varying point clouds can be stored and queried in a relational DBMS. More specifically, we investigate different options of using a Space Filling Curve (SFC) to capture both time and space in one efficient data-containing index.
of point clouds (Wijga-Hoefsloot, 2012; Höfle et al., 2006). Later on, it was argued that these approaches present significant drawbacks in terms of storage overhead, memory intensive operations and difficult updates and insertions (Ott, 2012).
The database community currently provides specific data structures suitable for the management of point clouds. Apart from the blocked model available in Oracle and PostGIS, a second organisation is also possible. This is the flat model, where each point is stored separately in one row. This model is easy to implement in all database systems (van Oosterom et al., 2015), relational or not (Martinez-Rubi et al., 2014). Compared to the flat model, the block-based approach provides better scalability, less overhead per point and potentially good compression (van Oosterom et al., 2015), which is directly related to the block size. However, blocks are less efficient in terms of updating, and further insertions of points lead to overlapping blocks. This is not ideal when managing dynamic point clouds, as within the current structures the indexes are used independently of each other and the query optimiser will first filter on space and then on time. The flat model, on the other hand, handles updates and insertions more easily. An improved organisation of the flat model uses spatial clustering techniques, specifically SFCs. SFCs have the ability to keep points that are close in reality close on the curve. The improved organisation is introduced in van Oosterom et al. (2015) and extended in Martinez-Rubi et al. (2015) for the two spatial dimensions.

Spatio-temporal point cloud management
The organisation of point clouds has for the most part focused only on the spatial dimensions of the points. This means that time is stored as an attribute and takes no part in the organisation of the points into blocks or into the SFC. However, point clouds are used for spatio-temporal analysis and therefore an integrated space and time approach is needed when organising the points. Integrating space with time is, nonetheless, not an easy task. The challenge lies in the different semantics and nature of the two concepts. In the design phase, two aspects should always be taken into account:
1. The resolution of time, meaning how the line of time is partitioned.
2. The granularity of time, meaning at which level of the spatial phenomenon (point data) the time dimension is added. For example, time can be attached to the whole or a subset of a dataset, or can be part of each spatial object (point).
Implementations of spatio-temporal point cloud databases can be found in academia. More specifically, Fox et al. (2013) used a NoSQL database for managing time-dependent point data. Their implementation is based on interleaving parts of the geohash with parts of the string representation of time. A geohash is an implementation of the Morton SFC, ultimately generating a recursive quadtree of the world. The derived string is used as the key for indexing the point data lexicographically. One disadvantage of the method is that it is very platform-specific, as it accommodates the column key requirements of the Accumulo database (a key-value store). At the same time, however, the system can provide efficient insert and update operations. Tian et al. (2015), also following a clustering approach, interleave the bits of the x, y and time dimensions to derive the Morton key, which is later used to create a 3D Morton R-Tree inside a relational database. Their prototype gives efficient query response times, also under concurrent queries. However, their approach lacks the ability to insert new data afterwards. Finally, a different approach is followed by Richter and Döllner (2014). Their system handles point cloud updates by using an "incremental database update process". For this they use change detection techniques to determine which parts of the new point cloud have changed since the previous state. Only the changed parts are then inserted into the database. Although their implementation has reduced storage requirements and allows efficient change detection from one moment to another, it makes it very hard to restore the state at a specific moment, as a number of change entries have to be applied to the initial state.
From the literature it becomes clear that a SFC approach is a logical way to proceed when managing dynamic point clouds.However, different options need to be investigated to find an optimal solution; specifically treating time (and z) as dimensions used in the SFC computation (or not) and the scaling of space and time.

METHODOLOGY
As explained in Section 2.1, the flat storage model is a very flexible solution for the management of point clouds, dynamic or not. The method can be used either as a final storage model or as an intermediate stage in order to efficiently create blocks. However, being able to query the database efficiently depends directly on the data structure and access methods used. Therefore, a clustering technique, and more specifically an SFC, can be used for the efficient sorting of the points. An SFC has the ability to apply a linear ordering to a multi-dimensional domain. Many SFCs have been developed through the decades, each of which preserves a different degree of proximity in the data. Two very commonly used orderings are the Morton and the Hilbert curve. The curves are shown in Figure 1a and 1b respectively for the 2D case. Both curves are quadrant-recursive, which means that the cells in any sub-quadrant have consecutive SFC values. SFCs, which can be extended to nD space, have been proven to be very relevant for multidimensional storage (Lawder, 2000). A 3D implementation of the Morton curve is given in Figure 1c. One important advantage of utilising SFCs is that the derived one-dimensional values can be indexed using a B-Tree.

Storage Model
Diverging from the blocked models commonly used by DBMSs, we explore the flat model with an improved spatio-temporal organisation. For this we use an integrated space and time approach and an SFC, the Morton curve, for the organisation of the points. Instead of using a (heap) table with a B-Tree index, we use an Index Organised Table (IOT), which avoids storing a large, separate index and therefore does not require a join between index and data during query execution.
The Morton curve (also called Z-order or N-order curve) is based on interleaving the bits of the binary representation of the n coordinates. For the SFC calculation, we define the curve for the full resolution of the point cloud domain. This allows us to avoid storing the x, y (, z) and t values, since those can be recovered from the key. This storage model requires significantly less disk space, but needs a decoding function that can recover the original dimensions.
One very important aspect when using SFCs is that they are based on hypercubes, so all dimensions present in the curve should have the same cardinality, i.e. be of the same size. For the spatial components, which have the same nature, this assumption is not problematic. However, time has a different nature compared to the spatial components; it is measured in years, months, days, hours, minutes or seconds. Space, on the other hand, is measured in degrees, metres, centimetres or millimetres. The correspondence of those two (the relative scaling) can be considered as the factor of how much time is integrated with how much space. This integration should, nonetheless, be constrained by the fact that additional data need to be added into the database. Therefore, space on the hypercube should be reserved for new data to be added in the future.
Structuring space and time to support dynamic point clouds is not a trivial problem. The reason for this is that two contradictory requirements need to be taken into account: (1) points close in space and time should be stored close together (clustered) for fast spatio-temporal retrieval, but (2) in a way that already organised data is not affected by new data (to achieve fast loading). The clash of these two requirements takes place when adding new data. The new points, in order to preserve space-time locality, will have to be inserted between the already stored points. As a result, the data already organised will have to be moved, which is a very costly operation. For this reason, two integrations of space and time are explored: the integrated space and time approach, where space and time have an equal part in the Morton key calculation (Figure 2a), and the non-integrated space and time approach, where time dominates over space and is stored as a separate column (Figure 2b). The two options actually represent two extremes of a continuum, which is traversed by appropriately scaling the time dimension relative to space.
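The bit interleaving described above, together with the decoding function it requires, can be sketched as follows. This is a minimal illustration and not the code of our prototype; the function names and the convention that the first coordinate occupies the lowest bit of each group are assumptions made for the example.

```python
def morton_encode(coords, bits):
    """Interleave the lowest `bits` bits of each non-negative integer coordinate.

    Assumed convention: coordinate 0 takes the lowest bit of every group.
    """
    n = len(coords)
    key = 0
    for bit in range(bits):
        for dim, value in enumerate(coords):
            key |= ((value >> bit) & 1) << (bit * n + dim)
    return key


def morton_decode(key, n, bits):
    """Recover the n original coordinates from a Morton key."""
    coords = [0] * n
    for bit in range(bits):
        for dim in range(n):
            coords[dim] |= ((key >> (bit * n + dim)) & 1) << bit
    return tuple(coords)


# Round trip for a hypothetical (x, y, z, t) point; Python integers are
# unbounded, so keys wider than 64 bits need no special handling here.
point = (4_123_456, 1_234_567, 10_250, 9_000)
key = morton_encode(point, bits=23)
restored = morton_decode(key, n=4, bits=23)
```

Because the key is a lossless rearrangement of the coordinate bits, storing the key alone is sufficient as long as the number of dimensions and the bits per dimension are known at decoding time.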
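The difference between the two integrations can be sketched in a few lines. The sketch assumes integer coordinates in millimetres, time in days, a time scaling factor applied before interleaving, and 32 bits per dimension; the function names and default values are illustrative only.

```python
def interleave(values, bits):
    """Morton key of non-negative integers (first value takes the lowest bit)."""
    key = 0
    for bit in range(bits):
        for dim, value in enumerate(values):
            key |= ((value >> bit) & 1) << (bit * len(values) + dim)
    return key


def integrated_key(x_mm, y_mm, day, time_scale=10_000, bits=32):
    """Integrated approach: scaled time is a full dimension of the curve."""
    return interleave((x_mm, y_mm, day * time_scale), bits)


def non_integrated_key(x_mm, y_mm, day, bits=32):
    """Non-integrated approach: time is a separate leading column, so the
    composite (time, Morton) key sorts on time first."""
    return (day, interleave((x_mm, y_mm), bits))


# Two hypothetical points: far corner on day 1, origin on day 2.
a_sep = non_integrated_key(4_000_000, 4_000_000, 1)
b_sep = non_integrated_key(0, 0, 2)
a_int = integrated_key(4_000_000, 4_000_000, 1)
b_int = integrated_key(0, 0, 2)
```

With the composite key every point of day 1 sorts before every point of day 2, so new days are appended at the end of the index. With the integrated key the order also depends on location, which is why later data may have to be inserted between points that are already stored.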

Loading procedure
The loading procedure of our methodology is divided into two phases. The data are physically structured according to their position on the space filling curve and organised with a data-containing B-Tree. The steps followed are:
1. Preparation: the data are read from the files and converted to Morton keys. This conversion depends on the type of integration of space and time, the dimensions used in the Morton key calculation and the scaling of time. The data are bulk loaded into a normal heap table.
2. Loading: the data are read from the heap table, sorted based on the key and stored in the IOT. In this way, the index is created once, which ensures that the data are clustered. An incremental approach can replace this step. Incremental means that the data are added in batches and the index is reorganised with every batch.
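The two phases can be illustrated with a small, self-contained sketch. It uses SQLite from the Python standard library as a stand-in, where a WITHOUT ROWID table plays the role of the IOT; the table and column names are hypothetical, keys are assumed to be unique and to fit in 64 bits, and none of this is the Oracle code of our prototype.

```python
import sqlite3


def load_batch(conn, keyed_points):
    """Stage (morton, z) rows in a heap table, then move them sorted into
    a table whose rows live inside the primary-key B-Tree."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS staging (morton INTEGER, z REAL)")
    cur.execute(
        "CREATE TABLE IF NOT EXISTS pc_iot ("
        " morton INTEGER PRIMARY KEY, z REAL) WITHOUT ROWID"
    )
    # Phase 1 (preparation): bulk load the converted points into the heap.
    cur.executemany("INSERT INTO staging VALUES (?, ?)", keyed_points)
    # Phase 2 (loading): read the heap sorted on the key and fill the IOT.
    cur.execute(
        "INSERT INTO pc_iot SELECT morton, z FROM staging ORDER BY morton"
    )
    cur.execute("DELETE FROM staging")
    conn.commit()


conn = sqlite3.connect(":memory:")
load_batch(conn, [(39, 1.2), (7, 0.4), (18, 2.0)])
load_batch(conn, [(12, 0.9)])  # a later batch, added incrementally
rows = conn.execute(
    "SELECT morton, z FROM pc_iot"
    " WHERE morton BETWEEN 0 AND 20 ORDER BY morton"
).fetchall()
```

Calling `load_batch` a second time corresponds to the incremental variant: the new batch is merged into the existing key order by the index itself.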

Query procedure
Since our use case (presented in Section 5.1) comes from the coastal monitoring domain, we surveyed the most important queries used in those applications (de Boer et al., 2012; Lodder and van Geer, 2012), which are (Figure 3):
(1) Space only queries, which request all the spatio-temporal objects located in a specific area.
(2) Space-time range queries, which request all the spatio-temporal objects located in a specific area during a specific time range.
(3) Time only queries, which request all the spatio-temporal objects existing during a specific time range.
Because all the above-mentioned queries correspond to a kind of multi-dimensional range query, within our method we make use of the relationship between the Morton curve and the Quadtree (van Oosterom and Vijlbrief, 1996; Gargantini, 1982), or 2^n trees for higher dimensions. This means that the query geometry needs to be translated into a number of continuous runs on the curve. The maximum depth of the tree affects (1) the number of Morton ranges that compose the query, and (2) the approximation of the query geometry. Requesting only the higher levels will give coarser 2^n tree cells, resulting in additional points. The query procedure used is as follows:
1. Filtering: The 2^n tree cells that intersect with the query region, up to a specific depth, are identified (Figure 4a). Note the mixture of big and small ranges returned, with the smaller ones located mostly near the boundary. The cells are then translated into the equivalent Morton ranges and the neighbouring ranges on the curve are merged (Figure 4b). Note that the direct neighbour merging can either have no effect on the cells or create non-rectangular ones. Unless specified otherwise, the ranges are merged further in case they exceed a specified maximum number. Merging of non-direct neighbours will always result in additional tree cell space added to the original situation. The returned ranges are used for fetching the data. The result is an approximation of the query being asked because the ranges are not formed from the finest 2^n tree cells. Two examples with a different degree of merging (30 and 20) are presented in Figure 4c and 4d respectively. The number of tree cells after the direct neighbour merging is 42. With the application of the merging one can observe two things. First, the 2^n tree approximation is very accurate, especially when compared to the commonly used Minimum Bounding Rectangle. Second, when applying a merging there is a certain loss in the accuracy of the approximation (Figure 4e and 4f), but also a gain when the number of ranges is a bottleneck.
2. Decoding and storing: The previous result is decoded back to the original x, y, z and time dimensions and stored temporarily in a table. This is needed because there are not yet functions inside the database to perform the decoding on-the-fly.
For the implementation of our methodology we have developed sets of Python scripts that perform the loading and querying procedures according to the specified parameters. The source code can be found at: https://github.com/stpsomad/DynamicPCDMS. Selected parts of the loading scripts, along with some explanations, can be found in the Appendix of this paper.
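The filtering step can be sketched for the 2D case as follows. The sketch simplifies the query geometry to an axis-aligned rectangle with inclusive integer bounds, and it assumes that further merging always closes the smallest gap between consecutive ranges; the text above does not prescribe a merging strategy, so that choice and all function names are illustrative.

```python
def cell_ranges(query, bits, max_depth):
    """Morton ranges of the quadtree cells intersecting a rectangle.

    query = (xmin, ymin, xmax, ymax); the domain is [0, 2**bits) squared.
    """
    qx0, qy0, qx1, qy1 = query
    ranges = []

    def visit(x0, y0, size, key0, depth):
        x1, y1 = x0 + size - 1, y0 + size - 1
        if x1 < qx0 or x0 > qx1 or y1 < qy0 or y0 > qy1:
            return  # cell and query are disjoint
        inside = qx0 <= x0 and x1 <= qx1 and qy0 <= y0 and y1 <= qy1
        if inside or depth == max_depth or size == 1:
            # All keys of a cell are consecutive (quadrant recursion).
            ranges.append((key0, key0 + size * size - 1))
            return
        half = size // 2
        for child in range(4):  # Morton order: x is the low bit, y the high bit
            visit(x0 + (child & 1) * half, y0 + (child >> 1) * half,
                  half, key0 + child * half * half, depth + 1)

    visit(0, 0, 1 << bits, 0, 0)
    return ranges  # already sorted, because children are visited in key order


def merge_adjacent(ranges):
    """Merge ranges that are direct neighbours on the curve."""
    merged = []
    for lo, hi in ranges:
        if merged and merged[-1][1] + 1 == lo:
            merged[-1] = (merged[-1][0], hi)
        else:
            merged.append((lo, hi))
    return merged


def merge_to_limit(ranges, max_ranges):
    """Merge non-direct neighbours (smallest gap first) down to a maximum."""
    ranges = list(ranges)
    while len(ranges) > max_ranges:
        gaps = [ranges[i + 1][0] - ranges[i][1] for i in range(len(ranges) - 1)]
        i = gaps.index(min(gaps))
        ranges[i:i + 2] = [(ranges[i][0], ranges[i + 1][1])]
    return ranges


fine = merge_adjacent(cell_ranges((1, 1, 2, 2), bits=2, max_depth=2))
coarse = merge_adjacent(cell_ranges((1, 1, 2, 2), bits=2, max_depth=1))
limited = merge_to_limit(fine, max_ranges=2)
```

On this 4 x 4 grid the full-depth result covers exactly the four queried cells, the depth-1 result degenerates to the whole domain, and limiting the number of ranges to two pulls in the keys lying in the closed gaps. These are the extra points that the refinement step has to remove.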

Database selection
The system chosen for the validation of the methodology is the Oracle Database. The version used for the tests is Oracle Database 12c Enterprise Edition Release 12.1.0.1.2, 64 bit Production. The system is chosen because of the availability of the IOT. With an IOT, the data itself is stored in a B-Tree index structure and physically clustered. This means that, contrary to the usual way, IOTs do not store the table and the index separately.
Another reason for choosing the Oracle database was that the full Morton keys can very easily become larger than 64 bit integers.The Oracle database includes the NUMBER type that can handle numbers up to 38 decimal digits, enough for 128 bit keys.
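How quickly a full-resolution key outgrows 64 bits can be checked with a short calculation. The sketch assumes that every dimension receives as many bits as the largest dimension needs (the hypercube requirement), and the extents used in the example (4.5 km in mm, about 9,000 days scaled by 10,000) are illustrative values rather than the exact figures of our implementation.

```python
def morton_key_bits(extents):
    """Bits of a Morton key when every dimension gets the width of the largest."""
    bits_per_dim = max(int(extent).bit_length() for extent in extents)
    return bits_per_dim * len(extents)


extent_mm = 4_500_000          # 4.5 km expressed in mm
scaled_days = 9_000 * 10_000   # hypothetical day count times the time scaling

xy_bits = morton_key_bits([extent_mm, extent_mm])
xyt_bits = morton_key_bits([extent_mm, extent_mm, scaled_days])
fits_in_64 = xyt_bits <= 64
```

Under these assumptions the two spatial dimensions alone fit in a 64 bit integer, but adding the scaled time dimension already exceeds it, before z is considered.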

The representation of space and time
One of the issues faced when using time in general, and inside SFCs specifically, is its unique nature. Time is usually represented with the date format inside the database. However, integers can be sorted more efficiently, a characteristic that is very important for our methodology. Furthermore, SFCs are implemented with integers. Therefore, both space and time need to be converted to and represented by integer values.
For space this issue is solved by applying a linear transformation to the spatial coordinates (translation and scale). Time can be represented as an integer in many different ways, corresponding to different time resolutions (seconds, days, years). We can express time simply as an integer of format yyyymmdd for day resolution or yyyy for year resolution. This expression, however, leads to time gaps as the resolution becomes finer. Another option is to use Unix time, which gives the number of seconds since 00:00:00, 1/1/1970. This option can be very useful for datasets that are streamed every second, but is very verbose for day or year resolution. For day resolution we can choose to store the days since a specific epoch, e.g. 1/1/1990.
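The three integer representations of time can be compared with a few lines of Python. The function names are illustrative, and the Unix time conversion assumes that a date refers to midnight UTC.

```python
from datetime import date, datetime, timezone

EPOCH = date(1990, 1, 1)  # example epoch for day resolution


def as_yyyymmdd(d):
    """Calendar digits as one integer; consecutive days are not consecutive."""
    return d.year * 10_000 + d.month * 100 + d.day


def as_unix_seconds(d):
    """Seconds since 1970-01-01 00:00:00 UTC for midnight of the given date."""
    midnight = datetime(d.year, d.month, d.day, tzinfo=timezone.utc)
    return int(midnight.timestamp())


def as_days_since_epoch(d, epoch=EPOCH):
    """Dense day counter: consecutive days differ by exactly one."""
    return (d - epoch).days


last_of_january = date(2015, 1, 31)
first_of_february = date(2015, 2, 1)
gap_yyyymmdd = as_yyyymmdd(first_of_february) - as_yyyymmdd(last_of_january)
gap_days = (as_days_since_epoch(first_of_february)
            - as_days_since_epoch(last_of_january))
```

The yyyymmdd form jumps by 70 between two consecutive days at a month boundary, which wastes part of the time axis of the curve, whereas the day counter stays dense.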

Hardware and Software
For the tests described in this document we have used a server with the following details: HP DL380p Gen8 server with 2 x 8-core Intel Xeon processors at 2.9 GHz, 128 GB of main memory, and a RHEL 6 operating system. The directly attached disk storage consists of a 400 GB SSD, 5 TB SAS 15K rpm (/work) in RAID 5 configuration, and 2 x 41 TB SATA 7200 rpm in RAID 5 configuration (/pak1 and /pak2 respectively). To minimise mixed read/write operations on the same disk, especially during the loading procedure, we have distributed our data files and database tables over different disks. Within the DBMS this is achieved using different tablespaces. The chosen distribution is available in Table 1. The decision was made as follows: The data are stored in the /pak2 file system. To avoid disk contention, the /pak2 file system (INDX tablespace) should not be used for loading the data into the heap table. Therefore, the USERS tablespace is used (=/pak1). For the creation of the IOT, data is read from the USERS tablespace and sorting takes place in the TEMP (=/work) tablespace. To avoid disk contention in the final stages of this phase (the writing of the resulting IOT), the INDX tablespace is used for storing the IOT.

Loading
In order to test the performance of our storage model, we designed and executed a benchmark. The benchmark is designed to measure the performance in terms of storage space, loading times and query response times. In addition to that, it includes the test datasets used and the description of the queries both in geometry and time.
The data (provided by Deltares) originate from the Sand Engine use case: 21 million cubic metres of sand deposited at the coast of the province of South Holland in the Netherlands. The region has a size of 4.5 x 4.5 km. The purpose is to investigate how nature spreads this amount of sand along the coast as the years go by. For this reason, the area is measured at irregular intervals (mostly after the occurrence of storms). So, although it is not measured every single day, the time resolution is set to days in order to offer the best possible organisation. In Table 2 we present the details of the benchmark stages. The data itself is available as a set of LAZ files, with approximately 100,000 points per file. It is important to mention that the spatial extent remains more or less the same, while the time extent is increasing. In addition to that, years 2000 to 2011 are artificially created from the subsequent measurements. With this type of dataset we aim to compare the scaling in size of the stored data and the effect of adding new temporal data in batches between the benchmarks, as well as the query response times. In addition to comparing our storage model between the three benchmark sizes, the two integrations of space and time have to be compared with each other. Furthermore, for both the integrated and non-integrated approach, the z dimension can be part of the SFC value (z added) or not (z attribute). This leads to 4 different loading options, for each of which the benchmark is repeated as mentioned previously. Finally, as mentioned in Section 3.1, SFCs are defined on hypercubes and thus the relative scaling between the space and time dimensions needs to be defined. Scaling, however, only makes sense for the integrated approach. In our implemented system, the user can choose different degrees of integration, from a complete to a less deep integration.
All the tests are carried out in order to gain insight into which organisation is optimal. The notation used throughout the tables in the next subsection is: xy for the non-integrated approach with z as attribute, xyz for the non-integrated approach with z in the key, xyt for the integrated approach with z as attribute, and xyzt for the integrated approach with z in the key.

Queries
The queries which are executed are described in detail in Table 3.
Type is the type of query as defined in Section 3.3. Start and End are respectively the start and end date requested for retrieval. The time type indicates whether the previously mentioned start and end date are continuous (i.e. between start date and end date) or discrete (i.e. only start date and end date). Finally, the area gives an indication of the space covered. The spatial representation of the queries used within our benchmark is shown in Figure 5.
For testing purposes we query areas of different geometry, like rectangle, polygon, line with buffer and point with buffer. We also test different sizes of those geometries. This will give us an insight as to whether certain geometries behave differently using the same storage model.
Table 3: The description of the queries. In type: s-t stands for space-time, t for time and s for space. In time type: c stands for continuous and d for discrete.

BENCHMARK RESULTS
In this section we present the results of loading the datasets and performing the above-mentioned queries according to the three benchmark stages, two integrations of space and time and two treatments of z. Space is encoded in mm and time in days since 1/1/1990. It must be noted that, when testing the performance of the various scalings of time for the integrated approach, we found that a scaling of 10,000 gave the best performance for this specific use case. Such a scaling means that, for a certain day, the area grouped close together on disk is 10 m by 10 m (10,000 mm in each spatial direction). In the following tests only the results of this scaling are presented.

Loading results
During the benchmark, the files are processed one by one both for their conversion to the Morton key and the loading to the heap.
To take into consideration the growing nature of our scenario, the medium and large benchmarks do not include a fresh reloading of the previous stage, but the new data is added to the already stored points.For our benchmark execution each approach is tested separately from the others (until both loading and querying are completed).
In that the code that we use so far is non-optimised Python code. Also, it is easy to see that from approach to approach (xy → xyz or xyt → xyzt) the conversion gets more costly. This can be explained by the fact that the complexity of the algorithm increases with the addition of more dimensions. The loading into the heap tables is more or less of the same magnitude for the four approaches. As for the loading into the IOT, the treatment of z as an attribute seems to be more expensive in terms of time. This may be because one more column needs to be organised (compared to the treatment of z as part of the Morton key). Finally, comparing the storage requirements of the different approaches, it is easy to see that the treatment of z as an attribute requires in general more space. However, the differences are not big. Also, using a separate attribute for time or integrating it in the key appears not to influence the storage.

Query results
The queries introduced in Section 5.2 are executed for each of the four storage organisations for all three data sizes (12 combinations). For all the queries we run both cold and hot runs. In contrast to the cold run, the hot run means that the query has already been executed and caching takes place. Within our benchmark each query is repeated 6 times before moving to the next one. The execution order of the queries (as presented in Table 3) is as follows: ST-A, ST-B, S-A, ST-C, T-A, ST-D, S-B, S-C, ST-E, T-B, S-D, ST-F. For the results presented here we ignore the most and least expensive response and calculate the average from the rest. As a result, the presented numbers correspond to hot runs.
Only the first fetching (filtering) of the data (along with the number of points in the refinement stage and the percentage of extra points obtained between the two querying steps) is given in the following table. This step is the most crucial because it is directly related to the depth of the 2^n tree and the maximum number of ranges specified (degree of merging). The rest of the steps can be optimised further and are, therefore, of secondary importance. Finally, it is very important to mention that two different methods of posing the queries are used for the two integrations. In the integrated approach we load the ranges into a separate IOT and perform a join to fetch the data. This method, however, does not provide efficient response times for the non-integrated approach (where the index is composed of 2 columns), and as a result the keys are specified in the WHERE predicate (see A.2.2). Because there is a limit to the number of ranges we can ask for in a SQL query, a limit of 200 ranges is applied. This is not the case for the integrated approach, where in theory there are no limits when performing joins. Nevertheless, a maximum of 1 million ranges is set for practical reasons.
The query results (filter step) for all integrations of space and time, treatments of z and benchmark stages can be found in Table 5. All queries are executed using only one process (no parallelisation). This is based upon the observation that, in certain test cases, although parallelisation in Oracle was enabled, only one core was actually being used during query execution. Only queries that do not use the primary key ([time, ]Morton) to fetch the data, i.e. space queries in the non-integrated approach, seem to enable the parallel option. However, for consistency within our results, no parallelisation is enabled.
Table 5: Query response times, the percentage of false hits compared to the actual number of points and the points returned by the queries.
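The aggregation of the six timings per query amounts to a trimmed mean, which can be sketched as follows; the function name and the sample timings are made up for illustration.

```python
def trimmed_mean(times):
    """Average after discarding the single fastest and slowest measurement."""
    if len(times) < 3:
        raise ValueError("need at least three measurements")
    kept = sorted(times)[1:-1]
    return sum(kept) / len(kept)


# Six hypothetical response times in seconds; the first (cold) run is slowest.
runs = [5.0, 1.0, 1.2, 1.1, 0.9, 1.0]
reported = trimmed_mean(runs)
```

Since the cold run is normally the most expensive of the six, it is the one that gets discarded, which is why the reported values reflect hot runs.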
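Posing the non-integrated query with the ranges in the WHERE predicate can be sketched as below. This is not the statement listed in A.2.2: the table name `pc_iot`, the column names `time` and `morton`, the bind variable naming and the helper itself are assumptions made for the illustration.

```python
def build_range_query(time_range, morton_ranges, max_ranges=200):
    """Build one SELECT with a BETWEEN clause per Morton range.

    Returns the SQL text and a dictionary of named bind values.
    """
    if not morton_ranges:
        raise ValueError("at least one Morton range is required")
    if len(morton_ranges) > max_ranges:
        raise ValueError("merge the ranges further before building the query")
    binds = {"t0": time_range[0], "t1": time_range[1]}
    clauses = []
    for i, (lo, hi) in enumerate(morton_ranges):
        clauses.append(f"morton BETWEEN :lo{i} AND :hi{i}")
        binds[f"lo{i}"] = lo
        binds[f"hi{i}"] = hi
    sql = (
        "SELECT time, morton FROM pc_iot "
        "WHERE time BETWEEN :t0 AND :t1 AND (" + " OR ".join(clauses) + ")"
    )
    return sql, binds


# Hypothetical space-time query: days 9000 to 9100 and two Morton ranges.
sql, binds = build_range_query((9000, 9100), [(3, 9), (12, 12)])
```

The statement grows linearly with the number of ranges, which is the reason a cap has to be enforced before the query text is built.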
• In general rectangles and polygons have faster response times in the non-integrated approach.
• In the non-integrated approach there are some differences in the response times between the two treatments of z, especially for rectangles and polygons.Adding z in the key without using it slows down the query execution.
• In the integrated approach having z as an attribute or as part of the key does not have a big effect on the execution time of the data.
• Response times of time queries are of the same magnitude per treatment of z.
• Clearly the results present good scalability (constant response times) for the benchmark stages used, since doubling the size of the data does not affect the query execution time.However, we must keep in mind that the specific use case is not massive and therefore not a good indicator for scalability.For this reason, we also tested our proposed methodology with a dataset of 2 billion points.The results (not presented here, but in a soon to be published MSc thesis) confirm the constant scalability of the method.
• Comparing the % of extra points received from the filtering step between the four approaches, we can observe that line queries receive the most extra points. Further, adding z in the key comes at the cost of more extra points. This can be solved by moving deeper in the 2^n tree, which requires a more dynamic algorithm for identifying the maximum depth. Clearly the non-integrated approach presents the largest amount of extra points, mostly because of the way that the query is posed (WHERE clause with 200 maximum ranges).
Moving on to space queries, it is important to specify that they are different from space-time and time queries in that the number of points is increasing between the benchmarks, simply because more data is added. From the results we can observe that space queries perform better in the integrated approach, while the worst case is the non-integrated approach when z is part of the key. Comparing the percentage of extra points, we see that there is no distinct pattern (i.e. line buffers do not necessarily give more extra points). However, in the integrated approach, and specifically in the large benchmark, the number of extra points doubles. Although not shown in this table, the number of ranges with which the join is performed is smaller than in the small and medium benchmark. Again, this can be resolved by developing an algorithm that identifies the maximum depth in a more dynamic way.
During our tests we have also investigated the effect of two parameters: (1) the depth of the tree and (2) the maximum number of ranges used. For this we present Figure 6, which shows the effect of going deeper in the 2^n tree on the number of extra points received for the three different types of queries in the integrated approach. It is clear that using more ranges significantly decreases the number of extra points.
Finally, to show the reason why we limit to 200 ranges in the WHERE clause of the non-integrated approach, we present Table 6. For this, we merge the same ranges down to a maximum of 10, 100, 1,000 and 10,000 ranges for posing the SQL statement.
There it is clear that adding more ranges in the SQL statement significantly slows down the fetching of the data, especially for space queries.However, increasing the number of ranges decreases the percentage of extra points received during the filtering phase.To have a balance between time and extra points, the maximum number of 200 is chosen.
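To make the shape of such a statement concrete, the following sketch builds a WHERE clause from a list of Morton ranges and runs it. It rests on several assumptions that are not part of the paper: SQLite (from the Python standard library) stands in for Oracle, a WITHOUT ROWID table serves as a rough analogue of an Index Organised Table, and the table name pc_xy, the helper build_range_predicate and the sample keys are hypothetical.

```python
import sqlite3


def build_range_predicate(ranges, column="morton"):
    """Return an OR-of-BETWEEN predicate with bind placeholders and its values."""
    if not ranges:
        raise ValueError("at least one range is required")
    predicate = " OR ".join(f"({column} BETWEEN ? AND ?)" for _ in ranges)
    params = [bound for key_range in ranges for bound in key_range]
    return predicate, params


# Stand-in for the non-integrated storage model: time and Morton key
# form the primary key, z is kept as an attribute.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pc_xy (time INTEGER, morton INTEGER, z REAL, "
    "PRIMARY KEY (time, morton)) WITHOUT ROWID"
)
conn.executemany(
    "INSERT INTO pc_xy VALUES (?, ?, ?)",
    [(4681, key, 0.0) for key in range(20)],
)

# Two hypothetical Morton ranges produced by the filter preparation.
ranges = [(2, 5), (9, 10)]
predicate, params = build_range_predicate(ranges)
rows = conn.execute(
    f"SELECT time, morton, z FROM pc_xy WHERE {predicate} ORDER BY morton",
    params,
).fetchall()
fetched_keys = [morton for _, morton, _ in rows]
print(predicate)
print(fetched_keys)
```

Each additional range adds one more BETWEEN term to the statement, so the statement grows linearly with the number of ranges.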

Validation and comparison
To validate that our implemented prototype returns the right number of points, and for comparison purposes, the out-of-the-box approach of using Oracle spatial and date data types is implemented. For this we use a 3D SDO POINT and a 2D R-Tree for fast spatial access. To also allow fast access in the time dimension, a B-Tree index is built on the time column. The benchmark defined before is executed for this case as well. An overview of the SQL code used can be found in A.3 and A.4.
The results of the loading procedure are presented in Table 7 in terms of time and in Table 8 in terms of storage requirements. The same incremental loading as before is used. Comparing them with any of the proposed storage models, we can see that the total execution time of the loading procedure in the validation case is 3 to 6 times more expensive. The process is mostly affected by the R-Tree generation, while building the B-Tree index is the least expensive operation. Moving to the storage requirements, the Oracle spatial approach requires 5 times more space than the proposed storage model with the highest storage requirements.
The query procedure is executed with the same configurations as the proposed methodology. Each query is executed 6 times and the results presented in Table 9 are calculated by excluding the most and least expensive response times from the average. Because of the existence of the R-Tree index, the Oracle database internally follows a two step approach during query execution. However, contrary to the implemented methodology, this two step process is transparent to the user. For this reason, the results presented in Table 9 represent the total query execution time (both filter and refinement step). Note that our implemented prototype studies only the fetching of the filter step. On close inspection we can confirm that our implemented prototype is indeed returning the correct number of points. In addition, our implemented prototype gives more expensive response times when all steps in the query procedure are considered (not shown here). This has to do with the programming language used (Python) for such intensive operations and the movement of data between the application and the database. Both parts can be considerably improved (see Future work). However, the out-of-the-box ("naive") approach does not provide constant execution times between the benchmark stages, mostly because of the presence of two indexes and the nature of the 2D R-Tree. This is a crucial observation that makes the method unsuitable for managing dynamic point clouds.
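Averaging repeated runs after excluding the most and least expensive response times can be written down in a few lines. This is a minimal sketch, not the benchmark code itself; the function name trimmed_mean and the sample timings are made up for illustration.

```python
def trimmed_mean(timings):
    """Average the timings after dropping the single lowest and highest value."""
    if len(timings) < 3:
        raise ValueError("need at least three timings to drop two of them")
    kept = sorted(timings)[1:-1]
    return sum(kept) / len(kept)


# Six hypothetical response times in seconds, one of them a cold-cache outlier.
runs = [1.0, 2.0, 3.0, 4.0, 5.0, 60.0]
print(trimmed_mean(runs))
```

Dropping the two extremes keeps a single slow or fast run from dominating the reported value.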

Conclusion
In this paper we have presented the design and execution of a benchmark appropriate for the data management of dynamic point clouds. We have investigated two integrations of space and time, which are essentially two extremes of the space-time continuum.
Within this, we have also tested two treatments of the z dimension: using it as an attribute, or as part of the SFC calculation.
We have also validated and compared our method with the out-of-the-box approach of using POINT and date data types. The ultimate goal is to investigate the most appropriate structure for managing time evolving point clouds. For this we have considered a use case coming from the coastal monitoring domain. All the developed code can be found at https://github.com/stpsomad/DynamicPCDMS.
The main finding of our work is that the integrated approach has in general better query response times compared to the non-integrated approach. Both treatments of z are also appropriate for the specific use case. Further, in the non-integrated approach, having z as part of the key significantly slows down query execution and increases the number of extra points received. A key aspect of our solution is the use of the IOT.

Future work
For the future, several issues need to be investigated and addressed. These include:
• implementing functionality inside the database (encoding, decoding SFC, range generation) or using a compiled language (e.g. https://github.com/kwan2004/SFCLib). The former would minimise the data movement now taking place between the database and the application during query execution. With the latter C++ library, our preliminary results for the loading phase show significant improvement (6 times faster) during the SFC preparation phase of the xyzt case (see Table 10).
• investigating parallel processing in all of the steps.
• using even more massive point cloud data for executing benchmarks.
• investigating the value of higher dimensional SFC keys, meaning what is the benefit of including more dimensions in the key, e.g. Level of Detail, colour etc.
• investigating delta (change detection) queries.Delta queries are very important for coastal applications monitoring change.
• using a different SFC. For this, comparisons between the Hilbert and Morton curve are the most appropriate, since the former is considered to have higher clustering capabilities. Investigating the number of ranges during the query process would give insight into this, as well as into whether there is a price to pay for the better clustering during encoding or decoding.
• improving the refinement step. Although the refinement stage is not examined within this paper, we believe that it would be significantly aided by following a different approach after the filter stage. The approach includes having two sets of ranges: 1. ranges completely inside the query geometry, which are for sure part of the final points and do not need to be further refined (Figure 7, white cells) and, 2. ranges partly inside the query geometry that need to go to the refinement stage (Figure 7, grey cells).
• investigating the creation of blocks using the same space and time integrations, which will allow more efficient storage and compression. Within this, researching the degree of overlapping/non-overlapping blocks and the percentage of full/underfull blocks when dealing with time evolving point clouds would add insight to the current point cloud data management storage models.
A.2.2 Space only queries in the non-integrated approach filter on the Morton keys during the filtering step. Here, to avoid congestion in the paper, we present only 2 of the 87 Morton ranges that were needed for this space query. This particular query requests all the time information for a polygonal geometry. In the integrated approach the same JOIN procedure as with time queries is used.
Because no functions are available inside the database to perform the decoding of the Morton keys, this procedure is performed

Figure 1: The Morton and Hilbert space filling curves.

Figure 2: The two integrations of space and time.

Figure 3: An overview of the important queries.
Multidimensional selections using SFCs require a modified query algorithm that takes into account the space filling organisation. This means that the query geometry needs to be translated into a number of continuous runs on the curve. Because all the abovementioned queries correspond to a kind of multi-dimensional range query, within our method we make use of the relationship between the Morton curve and the Quadtree (van Oosterom and Vijlbrief, 1996; Gargantini, 1982), or 2^n trees for higher dimensions. The maximum depth of the tree affects (1) the number of Morton ranges that compose the query, and (2) the approximation of the query geometry. Only requesting higher levels will give coarser 2^n tree cells, resulting in additional points. The query procedure used is as follows:
1. Filtering: The 2^n tree cells that intersect with the query region, up to a specific depth, are identified (Figure 4a). Note the mixture of big and small ranges returned, with the smaller ones located mostly near the boundary. The cells are then translated into the equivalent Morton ranges and the neighbouring ranges on the curve are merged (Figure 4b). Note that the direct neighbour merging can either have no effect on the cells or create non-rectangular ones. Unless differently defined, the ranges are further merged in case they exceed a specified maximum amount. Merging of non-direct neighbours will always result in additional tree cell space added to the original situation. The returned ranges are used for fetching the data. The result is an approximation of the query being asked because the ranges are not formed from the finest 2^n tree cells. Two examples with a different degree of merging (30 and 20) are present in Figure 4c and 4d respectively. The number of the tree cells after the direct neighbour merging is 42. With the application of the merging one can observe two things: First, that the 2^n tree approximation is very accurate, especially when compared to the commonly used Minimum Bounding Rectangle. Second, when applying a merging there is a certain loss in the accuracy of the approximation (Figure 4e and 4f) but also a gain when the number of ranges is a bottleneck.
2. Decoding and storing: The previous result is decoded back to the original x, y, z and time dimensions and stored temporarily in a table. This is needed because there are not yet functions inside the database to perform the decoding on-the-fly.
3. Refinement: The final query result is obtained by performing a point-in-polygon operation or filtering out time and z.
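The two merging stages of the filtering step can be sketched as follows. This is an illustration under stated assumptions rather than the method as implemented: ranges are inclusive (low, high) pairs of integer Morton keys, and the reduction to a maximum number of ranges closes the smallest gap on the curve first, which is one plausible strategy that the text does not specify. All function names are hypothetical.

```python
def merge_direct_neighbours(ranges):
    """Merge ranges that touch or overlap on the curve; adds no extra key space."""
    merged = []
    for low, high in sorted(ranges):
        if merged and low <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return merged


def merge_to_maximum(ranges, max_ranges):
    """Reduce the number of ranges to max_ranges by closing the smallest gaps.

    Assumption: the gap between two consecutive ranges is used as the merge
    criterion. Every such merge adds the keys in the gap to the result.
    """
    if max_ranges < 1:
        raise ValueError("max_ranges must be at least 1")
    merged = merge_direct_neighbours(ranges)
    while len(merged) > max_ranges:
        # Index of the consecutive pair separated by the smallest gap.
        i = min(range(len(merged) - 1),
                key=lambda k: merged[k + 1][0] - merged[k][1])
        merged[i:i + 2] = [(merged[i][0], merged[i + 1][1])]
    return merged


def covered_keys(ranges):
    """Number of keys covered by a list of non-overlapping inclusive ranges."""
    return sum(high - low + 1 for low, high in ranges)


# Hypothetical Morton ranges of four tree cells.
cells = [(0, 3), (8, 11), (12, 15), (32, 35)]
direct = merge_direct_neighbours(cells)
limited = merge_to_maximum(cells, 2)
print(direct, limited, covered_keys(limited) - covered_keys(cells))
```

In this example the direct neighbour merging joins (8, 11) and (12, 15) at no cost, while limiting the result to two ranges also pulls in the four keys between 3 and 8, which is the extra tree cell space mentioned in the text.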

Figure 4: The different steps in the preparation of the filter step: Tree cell identification, direct neighbour merging and, merging to maximum number.Cases (c) and (d) depict different degrees of merging applied to the tree cells of case (b).The expansion of the area according to the two degrees of merging (30 and 20) is shown in cases (e) and (f) respectively

Figure 6: The effect of increasing the depth of the tree on the percentage (%) of extra points obtained during the filtering phase of the integrated approach. The x axis is logarithmic.

Figure 7: Separating internal ranges and ranges on boundary. White cells are inside the query region. Grey cells are partially inside the query region.

Table 1: The distribution of files and tables in the available disks according to the specific purpose.
ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume IV-2/W1, 2016. 11th 3D Geoinfo Conference, 20-21 October 2016, Athens, Greece. This contribution has been peer-reviewed. The double-blind peer-review was conducted on the basis of the full paper. doi:10.5194/isprs-annals-IV-2-W1-107-2016

Table 2: Benchmark stages of the Sand Engine use case. The size represents the size of the LAZ files in the filesystem.

Table 4: The incremental loading times for the two integrations of space and time and the two treatments of z.
In Table 4 the loading times are presented. No parallel processing is present in any of the steps. From the table it is easy to see that the conversion to the Morton key (SFC prep.) is in general the most expensive step in the procedure. The reason for this is
Figure 5: The spatial representation of the queries used within the benchmark.

Table 6: Using different magnitudes of maximum ranges in the WHERE statement of the non-integrated approach.

Table 7: The incremental loading times of the validation. Preparation refers to the reading of the LAZ files and their transformation to suitable representations for loading.

Table 8: The storage requirements of the validation.

Table 9: The query response times and the number of points returned by the queries of the validation benchmark.
This storage structure significantly reduces I/O operations as the data are contained in the index (compact and no effort/time to combine index and data from table when querying). Our final conclusion is that the flat table model is indeed a very flexible solution for managing dynamic point clouds.

Table 10: Using C++ code for the SFC transformation.
on-line code. The setup for the treatment of z as key is different only in the absence of the z column. Starting with the preparation phase, we create the heap table where the data will be stored temporarily. For the bulk loading of the data we use the SQLLDR utility of the Oracle database. SQLLDR requires the creation of a set of files, including the control file which specifies how to load the data. The conversion from the files to the Morton keys is initialised by our developed Morton converter. The conversion is then pipelined with the SQLLDR utility (command line). After the data have been inserted, unsorted, into the table, it is possible to proceed to the loading phase. The heap table is used to populate the IOT using the following command:
In case more data need to be added into the database, the new data are inserted into a heap table and then the two sources (old and new data) are combined together:
Within our Python scripts we have implemented a work-flow that maps the spatio-temporal queries to Morton ranges, time or height predicates. The procedure depends on the specified integration and type of query. Again the SQL codes are given for the non-integrated and integrated approaches (respectively) for the treatment of z as attribute. The setup for the treatment of z as a key is slightly different. The queries are loaded into a table called QUERIES. When a certain query needs to be executed, the implemented scripts read the required information from the QUERIES table. The table contains the following information: 1. query ID, 2. dataset, 3. type of query, 4. geometry, 5. start and end date, 6. minimum and maximum height (if any).
A.2.1 Time only queries in the non-integrated approach do not require the identification of Morton keys. This is because time is stored as a separate column. The first filtering gives directly the refined points. This query requests points between two moments in time. These are then decoded back to their original dimensions and stored in a table.
SELECT time, morton, z FROM IOT xy WHERE (time BETWEEN 4681 AND 4682);
In the integrated approach time queries follow the two step query process. First, the Morton ranges are loaded into an IOT named RANGES. The data table and the RANGES table are joined. This concludes the filtering step. The data are decoded to their original coordinates and stored into a table. This table is then used to proceed to the refinement step, by imposing predicates on the time dimension.
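The two step process for time queries in the integrated approach (join with the ranges, decode, refine on time) can be sketched end to end. The code below is a simplified illustration and not the SQL used in the paper: SQLite from the Python standard library stands in for Oracle, WITHOUT ROWID tables serve as a rough analogue of IOTs, the key interleaves x, y and t in that (assumed) order, and the table names pc_xyt and ranges, the column names and the sample points are hypothetical.

```python
import sqlite3


def morton_encode(coords, bits=10):
    """Interleave the bits of the integer coordinates into a single key."""
    ndims = len(coords)
    key = 0
    for bit in range(bits):
        for dim, value in enumerate(coords):
            key |= ((value >> bit) & 1) << (ndims * bit + dim)
    return key


def morton_decode(key, ndims, bits=10):
    """Inverse of morton_encode: recover the integer coordinates from a key."""
    coords = [0] * ndims
    for bit in range(bits):
        for dim in range(ndims):
            coords[dim] |= ((key >> (ndims * bit + dim)) & 1) << bit
    return tuple(coords)


# Hypothetical (x, y, t) points, already scaled to small integers.
points = [(1, 2, 3), (4, 5, 6), (7, 0, 2), (3, 3, 3)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pc_xyt (morton INTEGER PRIMARY KEY, z REAL) WITHOUT ROWID"
)
conn.execute(
    "CREATE TABLE ranges (key_low INTEGER, key_high INTEGER, "
    "PRIMARY KEY (key_low, key_high)) WITHOUT ROWID"
)
conn.executemany(
    "INSERT INTO pc_xyt VALUES (?, ?)",
    [(morton_encode(p), 0.0) for p in points],
)
# A single coarse range covering the whole key space, for illustration only.
conn.execute("INSERT INTO ranges VALUES (?, ?)", (0, 2 ** 30 - 1))

# Filter step: join the data table with the table of Morton ranges.
fetched = conn.execute(
    "SELECT d.morton, d.z FROM pc_xyt d "
    "JOIN ranges r ON d.morton BETWEEN r.key_low AND r.key_high"
).fetchall()

# Decoding: done in the application, since the database cannot decode keys.
decoded = [morton_decode(key, 3) + (z,) for key, z in fetched]

# Refinement step: impose the time predicate on the decoded points.
t_start, t_end = 3, 3
refined = sorted(p for p in decoded if t_start <= p[2] <= t_end)
print(refined)
```

Because the single range here is deliberately coarse, the filter step returns every point and all the selectivity comes from the refinement; with tighter ranges most of the work shifts to the join.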