A VOMR-TREE BASED PARALLEL RANGE QUERY METHOD ON DISTRIBUTED SPATIAL DATABASE

Spatial indexing strongly affects the efficiency of spatial queries in a distributed spatial database. In this paper, we introduce a parallel spatial range query algorithm based on the VoMR-tree index, which incorporates Voronoi diagrams into the MR-tree and benefits from nearest-neighbor information. We first augment the MR-tree to store the nearest neighbors and construct the VoMR-tree index from a Voronoi diagram. We then propose a novel range query algorithm based on the VoMR-tree index. For processing a range query, we discuss a data partition method that improves efficiency through parallelization in a distributed database. A verification strategy is then proposed. We show the superiority of the proposed method through extensive experiments using data sets of various sizes. The experimental results reveal that the proposed method improves the performance of range query processing by up to three times in comparison with the widely used R-tree variants.


INTRODUCTION
Distributed spatial database technology, with its advantages of powerful data management capability, scalability and low location constraints, plays an important part in geodatabase applications. It improves data processing efficiency by separating data into pieces (Yang C, 2009a). However, the following problems generally occur in the spatial query of a distributed geodatabase.
(i) Spatial query relies on the spatial relationships of entities. Part of the topological relation information is lost owing to distributed storage. (ii) Processing mass data by centralized calculation after a complex transmission takes too much time and too many hardware resources. (iii) The spatial index of a distributed geodatabase causes a large amount of data redundancy, which reduces processing efficiency. Consequently, a high-efficiency method of range query is studied in this paper.
A range query is a common database operation that retrieves all records where some value lies between a lower and an upper boundary. As the basic operation of spatial query, the range query can be regarded as the first step of distributed spatial analysis. The R-tree and its variants (modified R-trees) are widely used in spatial queries. R-tree (Guttman, 1984b, Huang, 2001a), R*-tree (Beckmann, 1990b), R+-tree (Sellis, 1987b), MB-tree (Li, 2006b) and MR-tree (Yang Y, 2009a) are typical examples. The most suitable index for a range query is the MR-tree, which augments the standard R-tree by computing a hash over the concatenation of the binary representation of all the entries in a tree node.
The MR-tree combines concepts from the MB-tree and the R*-tree. Figure 1 illustrates the tree structure. Leaf nodes are identical to those of the R*-tree: each entry P_i corresponds to a data object. A digest is computed on the concatenation of the binary representation of all objects in the node. Internal nodes contain entries of the form (p_i, MBR_i, H_i), signifying the pointer, minimum bounding rectangle, and digest of the child, respectively. The digest summarizes the child nodes' MBRs (MBR_1 to MBR_f) in addition to their digests (H_1 to H_f). When performing a range query Q (shown in Figure 2), the procedure runs a depth-first traversal from the root node. If the MBR of N_i does not overlap Q, none of the children of N_i need to be traversed. This effectively reduces redundant processing.
Figure 2. Range query on MR-tree
There are still shortcomings in the MR-tree as a result of hash concatenation. The more random the grouping, the harder it is to minimize the area of each N_i. This increases the likelihood of overlap between Q and N_i, so that more nodes are accessed and efficiency is reduced. This paper introduces a parallel spatial range query algorithm based on the VoMR-tree index, which incorporates Voronoi diagrams into the MR-tree and benefits from nearest-neighbor information.
The organization of the paper is as follows. Section 2 briefly introduces the VoMR-tree index and its construction method. Section 3 proposes an algorithm for range query processing using the VoMR-tree in a distributed spatial database, and discusses a strategy for verification of the result set. Section 4 presents the experimental results estimating the performance of the proposed algorithms. Finally, Section 5 summarizes and concludes the paper.
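The digest scheme of the MR-tree described above can be sketched as follows. This is only a minimal illustration: SHA-256 as the hash function, the fixed-width packing of coordinates, and the helper names (`pack_point`, `pack_mbr`, `leaf_digest`, `internal_digest`, `mbr_of`) are assumptions made for the example and are not specified in the paper.

```python
import hashlib
import struct

def pack_point(p):
    """Binary representation of a 2-D point (assumed: two big-endian doubles)."""
    return struct.pack(">dd", p[0], p[1])

def pack_mbr(mbr):
    """Binary representation of an MBR given as (xmin, ymin, xmax, ymax)."""
    return struct.pack(">dddd", *mbr)

def mbr_of(points):
    """Minimum bounding rectangle of a list of points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

def leaf_digest(points):
    """Digest over the concatenated binary representation of all objects in a leaf."""
    return hashlib.sha256(b"".join(pack_point(p) for p in points)).digest()

def internal_digest(children):
    """Digest over the children's MBRs and digests: H(MBR_1 | H_1 | ... | MBR_f | H_f)."""
    payload = b"".join(pack_mbr(mbr) + h for mbr, h in children)
    return hashlib.sha256(payload).digest()

# Two illustrative leaves and their parent entry list (made-up coordinates).
leaf_a = [(0.0, 0.0), (1.0, 2.0)]
leaf_b = [(5.0, 5.0), (6.0, 7.0)]
children = [(mbr_of(leaf_a), leaf_digest(leaf_a)),
            (mbr_of(leaf_b), leaf_digest(leaf_b))]
root = internal_digest(children)
```

Because the parent digest covers both the MBRs and the digests of its children, changing any object in a leaf changes every digest on the path to the root.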

Subset Partition
The efficiency of depth-first traversal is strongly affected by the ratio of overlap. When the query Q frequently overlaps the MBRs of nodes, the query process becomes costly because the traversal goes much deeper. To reduce the probability of overlap between Q and an MBR, a better idea is to minimize the area of each MBR. The major drawback of the MR-tree is that its subset partition depends on a hash ordering, so the area of each MBR cannot be kept as small as possible. Nor does it offer a good way to sort a dataset along the axes of a multidimensional space.
An optimal partition method for the range query is shown in Figure 3. Compared with the case in Figure 2, the MBRs in Figure 3 are smaller. When the MBR of N_2 does not overlap Q, the process does not traverse N_3, N_5, P_1, P_2, P_4, P_7, P_8, and P_9. In other words, half of the nodes in the MR-tree do not need to be traversed. In this method, nearest neighbors are partitioned into the same subset, for instance P_1, P_2, and P_4 into N_3. This is an effective way to reduce the cost of Euclidean distance calculation. We use the term "VoMR-tree" for the structure that combines a Voronoi diagram with the MR-tree.
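The effect of the partition on MBR size can be illustrated with a small sketch. The six points, the two groupings and the query rectangle below are made up for the example; the helper names (`mbr`, `area`, `overlaps`, `hits`) are hypothetical.

```python
def mbr(points):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a list of points."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def overlaps(a, b):
    """True if rectangles a and b intersect (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def hits(groups, q):
    """Number of group MBRs that the query rectangle q overlaps."""
    return sum(overlaps(mbr(g), q) for g in groups)

# Two clusters of three points each (illustrative data).
points = [(0, 0), (1, 1), (0, 1), (9, 9), (10, 10), (9, 10)]

# Grouping that ignores proximity: every second point goes to the same subset.
arbitrary = [points[0::2], points[1::2]]
# Grouping by nearest neighbors: each cluster becomes one subset.
by_neighbor = [points[:3], points[3:]]

q = (4, 4, 6, 6)  # query rectangle lying between the two clusters

total_arbitrary = sum(area(mbr(g)) for g in arbitrary)      # large MBRs
total_by_neighbor = sum(area(mbr(g)) for g in by_neighbor)  # small MBRs
```

With the proximity-based grouping, the query overlaps no MBR and both subtrees can be pruned, whereas the arbitrary grouping forces both subsets to be visited.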

Distance Judgement
Constructing a Voronoi diagram facilitates nearest-neighbor queries. During dataset partitioning, it is possible that all the nearest neighbors of a spatial point have already been partitioned into an existing subset. For example, as shown in Figure 7, assume each subset contains four leaves and object P_1 has two nearest neighbors, P_2 and P_4. The question is then which point should become the fourth point of this subset.
In this case, we choose a nearest point P_x among the neighbors of P_2 and P_4 as the fourth point. To minimize the MBR of this subset, P_x should be as close as possible to P_1, P_2 and P_4 taken together, so the problem comes down to finding a geometric centroid.
Given the nearest points q_i (i = 1, 2, …, n) of a spatial object p, the geometric centroid of the region Q formed by the q_i can be computed as follows:

$$ x = \frac{1}{n}\sum_{i=1}^{n} q_i.x, \qquad y = \frac{1}{n}\sum_{i=1}^{n} q_i.y \tag{2} $$

where x, y = coordinates of the geometric centroid
q_i.x, q_i.y = coordinates of the nearest point q_i

The direction vector of the geometric centroid can serve as an auxiliary criterion in the nearest point search. The partial derivatives of the distance between p and the geometric centroid Q of the q_i are computed. According to formula 2, the direction vector is computed as follows:

$$ \partial_x = \frac{\partial\, adist(q,Q)}{\partial x} = \sum_{i=1}^{n} \frac{x - x_i}{\sqrt{(x - x_i)^2 + (y - y_i)^2}}, \qquad \partial_y = \frac{\partial\, adist(q,Q)}{\partial y} = \sum_{i=1}^{n} \frac{y - y_i}{\sqrt{(x - x_i)^2 + (y - y_i)^2}} \tag{3} $$

where ∂_x, ∂_y = the direction vector of the geometric centroid
adist(q,Q) = the Euclidean distance between object q and region Q
x_i, y_i = coordinates of q_i
x, y = coordinates of p

Evaluating formula 3 at point p_1 gives a direction vector d_1. A ray r_1 drawn from p_1 in direction d_1 enters the Voronoi cell of p_2, intersecting its boundary at point x_1. The direction d_2 at x_1 is computed and the same process is repeated with a ray r_2 originating from x_1 in direction d_2, which enters VC(p) at x_2. Once we are inside VC(p), which includes the centroid q, all subsequent rays circulate inside VC(p). On detecting this situation, we return p as the closest point to q.
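The selection of the additional point can be sketched as below. Note that this sketch swaps in a plain linear scan over the candidate points in place of the ray walk through Voronoi cells described above; the function names and the sample coordinates are hypothetical.

```python
import math

def centroid(points):
    """Geometric centroid of a list of 2-D points (mean of the coordinates)."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def closest_to_centroid(members, candidates):
    """Pick the candidate nearest to the centroid of the current subset members.

    Assumption: a brute-force scan replaces the Voronoi ray walk of the paper.
    """
    c = centroid(members)
    return min(candidates, key=lambda p: math.dist(p, c))

# Current subset (e.g. P_1, P_2, P_4) and candidate neighbors (made-up data).
members = [(0, 0), (2, 0), (1, 3)]
candidates = [(1, 2), (5, 5), (-3, 0)]

c = centroid(members)
chosen = closest_to_centroid(members, candidates)
```

Choosing the candidate nearest to the centroid keeps the enlarged subset compact, which is what keeps its MBR small.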

Initial Index
Given a spatial dataset P, the first step in initializing the VoMR-tree index is to construct a Voronoi diagram. Next, we sort all the objects in the dataset by x coordinate and choose the minimum object as the start point P_1. Assuming the number of objects in each subset is k, the result of a (k-1)-nearest-neighbor search for P_1 forms a subset N_1. If this subset contains fewer than k objects, we perform the distance judgment described in Section 3.3. We then remove the objects in N_1 from the source dataset P and repeat the steps above. When all the objects have been removed from the source dataset P, P is re-initialized as P = {N_1, N_2, …, N_n}. The steps above are repeated until P contains only one object. Finally, the VoMR-tree is constructed to store all the values: each leaf node records the MBR and Voronoi diagram of entity P_i, while each parent node records the MBR of the corresponding subset N_i. Figure 6 illustrates the structure of the VoMR-tree, which is based on the MR-tree and augments the stored content to record the Voronoi diagram. For the same range query, compared with the MR-tree (Figure 1), the depth-first traversal goes through far fewer nodes (highlighted in shadow) in the tree structure.
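The bottom-up construction can be sketched as follows. This is an illustration under stated assumptions: the nearest-neighbor search is done by brute force on MBR centers instead of through the Voronoi diagram, the distance judgment and the digests are omitted, and nodes are plain dictionaries with hypothetical keys (`mbr`, `children`, `point`).

```python
import math

def union_mbr(rects):
    """MBR enclosing a list of rectangles (xmin, ymin, xmax, ymax)."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def center(r):
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)

def build_level(nodes, k):
    """Group nodes into subsets of at most k: seed with minimum x, add k-1 nearest."""
    remaining = sorted(nodes, key=lambda n: center(n["mbr"])[0])
    parents = []
    while remaining:
        seed = remaining.pop(0)  # object with the minimum x coordinate
        sc = center(seed["mbr"])
        # Brute-force (k-1)-nearest-neighbor search (assumption: no Voronoi lookup).
        remaining.sort(key=lambda n: math.dist(center(n["mbr"]), sc))
        group = [seed] + remaining[:k - 1]
        remaining = remaining[k - 1:]
        parents.append({"mbr": union_mbr([n["mbr"] for n in group]),
                        "children": group})
        remaining.sort(key=lambda n: center(n["mbr"])[0])
    return parents

def build_tree(points, k):
    """Repeat the grouping level by level until a single root remains."""
    if k < 2:
        raise ValueError("k must be at least 2")
    level = [{"mbr": (x, y, x, y), "children": [], "point": (x, y)}
             for x, y in points]
    while len(level) > 1:
        level = build_level(level, k)
    return level[0]

# Two clusters of three points each (made-up data), subsets of size k = 3.
pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
tree = build_tree(pts, 3)
child_boxes = sorted(c["mbr"] for c in tree["children"])
```

In this example each cluster ends up in its own subset, so the two first-level MBRs are small and disjoint.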

Update
Because the stored content is augmented, it is necessary to discuss how the VoMR-tree is updated: the Voronoi diagram is stored in the database together with the index when the index is initialized.
Insertion
The next step is to iterate over all the neighboring objects that share Voronoi edges with p_1 in a counter-clockwise fashion (e.g., starting with p_2). The procedure then draws the bisector between p_2 and p_4 and retrieves the next intersection at w_3. This process is repeated until all the edges of the new Voronoi cell (of p_4) are computed.
The above process can also be performed in a clockwise manner. Finally, the new object is augmented with authentication and neighborhood information, and the server simply inserts it in the corresponding spatial index.
Deletion
Deleting an object (e.g., p_1) follows a similar approach, as shown in Figure 7. The procedure first locates p_1 and its neighbors, and divides VC(p_1) with the bisectors between the neighboring pairs of p_1. Next, it updates the Voronoi neighbors of p_1 with the new neighboring information, and transmits all affected objects (with their new signatures) to the server. Additionally, the server removes p_1 from the corresponding spatial index.

Range Query Algorithm
For a range query Q, the algorithm includes the following steps. First, the process performs a depth-first traversal of the tree from the root node. For each visited node, if its MBR is contained in Q, all leaves under the node are added to the candidate set and its children need not be tested individually. If its MBR does not intersect Q, the children of the node are not visited either. When the MBR partially overlaps Q, the node's children are accessed and the steps above are repeated.

Input: Query Q, VoMR_nodes Nodes
Output: Set of Candidate Objects CS
...
For each entry N in NodeCursor
    If N is leaf
        If Q contains N.MBR
            ...
        Remove N from Nodes;
    ...
    ElseIf N.MBR contained by Q
        Append all leafs in N to CS;
        ...
        Remove N from Nodes;
Return CS;

Consequently, redundant processing in the range query is reduced and the efficiency of the process is improved.
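The traversal described above can be sketched as a short recursive function. This is a minimal illustration, not the authors' implementation: nodes are dictionaries with hypothetical keys (`name`, `mbr`, `children`), and a leaf whose MBR only partially overlaps Q is kept as a candidate for later verification, which is an assumption since the corresponding steps of the listing are not recoverable.

```python
def contains(q, r):
    """True if rectangle r lies entirely inside query rectangle q."""
    return q[0] <= r[0] and q[1] <= r[1] and r[2] <= q[2] and r[3] <= q[3]

def overlaps(q, r):
    """True if rectangles q and r intersect."""
    return q[0] <= r[2] and r[0] <= q[2] and q[1] <= r[3] and r[1] <= q[3]

def leaves(node):
    """All leaf nodes below (or equal to) the given node."""
    if not node["children"]:
        return [node]
    out = []
    for child in node["children"]:
        out.extend(leaves(child))
    return out

def range_query(node, q):
    """Depth-first candidate search: prune disjoint nodes, bulk-add contained ones."""
    box = node["mbr"]
    if not overlaps(q, box):
        return []              # MBR disjoint from Q: skip the whole subtree
    if contains(q, box):
        return leaves(node)    # MBR inside Q: take every leaf without further tests
    if not node["children"]:
        return [node]          # partially overlapping leaf: candidate (assumption)
    out = []
    for child in node["children"]:
        out.extend(range_query(child, q))
    return out

def leaf(name, x, y):
    return {"name": name, "mbr": (x, y, x, y), "children": []}

def inner(name, children):
    boxes = [c["mbr"] for c in children]
    box = (min(b[0] for b in boxes), min(b[1] for b in boxes),
           max(b[2] for b in boxes), max(b[3] for b in boxes))
    return {"name": name, "mbr": box, "children": children}

# Small illustrative tree with two subsets of two points each.
n1 = inner("N1", [leaf("P1", 1, 1), leaf("P2", 2, 2)])
n2 = inner("N2", [leaf("P3", 8, 8), leaf("P4", 9, 9)])
root = inner("root", [n1, n2])

cs_a = [n["name"] for n in range_query(root, (0, 0, 3, 3))]
cs_b = [n["name"] for n in range_query(root, (1.5, 1.5, 8.5, 8.5))]
```

In the first query N1 is fully contained in Q and N2 is pruned without visiting its leaves; in the second query both subsets overlap Q only partially, so their leaves are tested one by one.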

Data Partition
Processing range queries over mass data with a single procedure is unreasonable, so distributed computing should be used.
In a distributed computing environment, the chief issue is how to distribute data to each computing node.
Given a set of data points sorted by x coordinate, each procedure reads an input split in the format <data_point, d_value> (i.e., a <key, value> pair) (Akdogan, 2010b). Note that d_value does not serve any purpose; it is used only to follow the input format of distributed computing. Subsequently, each procedure generates a data unit for the data points in its split, marks the boundary polygons to be used later in the merge phase, and emits the generated data unit in the form of a <key, value> pair where key denotes the split number. The constant key is common to all data units, so that all units can be grouped together and merged in the subsequent step. When the query process completes in the computing nodes, a single procedure collects all the splits and merges them into a whole.
Figure 9 shows the data partition of a VoMR-tree. To process mass data efficiently, distributed computing should be taken into consideration. The dataset is partitioned into several parts, which are sent to the computing nodes for processing. This raises a new problem: how to balance the distributed data. The simplest way is to partition the dataset by a single level of the VoMR-tree. However, if the number of nodes at this level is smaller than the number of computing nodes, some equipment will be idle. Conversely, if it is larger, there is not enough equipment for the distribution.
A solution to this problem is a tentative depth-first traversal from the root node. When it is impossible to balance the data distribution at the current level of nodes, the next level of nodes is accessed and the traversal goes deeper. Because several nodes are excluded from the visit at each level, the data distribution can eventually be balanced. Once it is balanced, the traversal stops immediately and each data unit is sent to a distributed computing node for parallel processing. Figure 10 shows the algorithm of the parallel process.

Input: Query Q, VoMR_nodes Nodes, Computing Node Count C;
Output: Set of Candidate Objects CS
...
For each entry N in NodeCursor
    If N is not leaf
    ...
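The tentative descent used to balance the partition can be sketched as follows. The stopping rule (descend until at least C units overlap Q, or only leaves remain) and the round-robin assignment are assumptions chosen for the illustration; the paper's own listing is only partly recoverable. Node dictionaries and helper names are hypothetical.

```python
def overlaps(q, r):
    """True if rectangles q and r intersect."""
    return q[0] <= r[2] and r[0] <= q[2] and q[1] <= r[3] and r[1] <= q[3]

def partition(root, q, c):
    """Descend level by level until at least c units overlap q, then deal them out."""
    frontier = [root] if overlaps(q, root["mbr"]) else []
    # Go one level deeper while there are too few units and something can be expanded.
    while 0 < len(frontier) < c and any(n["children"] for n in frontier):
        nxt = []
        for n in frontier:
            if n["children"]:
                # Children that do not overlap q are dropped at each level.
                nxt.extend(ch for ch in n["children"] if overlaps(q, ch["mbr"]))
            else:
                nxt.append(n)
        frontier = nxt
    # Round-robin assignment of units to the c computing nodes (assumption).
    buckets = [[] for _ in range(c)]
    for i, n in enumerate(frontier):
        buckets[i % c].append(n)
    return buckets

def leaf(name, x, y):
    return {"name": name, "mbr": (x, y, x, y), "children": []}

def inner(name, children):
    boxes = [ch["mbr"] for ch in children]
    box = (min(b[0] for b in boxes), min(b[1] for b in boxes),
           max(b[2] for b in boxes), max(b[3] for b in boxes))
    return {"name": name, "mbr": box, "children": children}

# Illustrative tree: two subsets with two points each.
root = inner("root", [inner("N1", [leaf("P1", 1, 1), leaf("P2", 2, 2)]),
                      inner("N2", [leaf("P3", 8, 8), leaf("P4", 9, 9)])])
q = (0, 0, 10, 10)

two = [[n["name"] for n in b] for b in partition(root, q, 2)]
three = [len(b) for b in partition(root, q, 3)]
```

With two computing nodes the descent stops at the first level below the root; with three it has to go down to the leaves before every computing node receives work.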

Verification
In the VoMR-tree and all the R-tree variants, objects are typically approximated using minimum bounding rectangles, which require less storage space than the full object, resulting in faster processing and less expensive I/O operations (Jacox, 2007a). Because an MBR is only an approximation of the spatial object, we need to discuss verification strategies for processing a range query in the VoMR-tree.
The VoMR-tree augments the stored content with the Voronoi diagram, so the Voronoi diagram can be used to remove incorrect results from the candidate result set. According to Property 3 and Property 4 of the Voronoi diagram, each object has on average no more than six neighboring Voronoi cells. Therefore, if all neighboring Voronoi cells of an object are contained in query Q, the object must be a correct result. This lemma reduces the processing cost, because a Voronoi cell is described by no more than six points on average and, like an MBR, requires less expensive I/O operations than the full object. When the result cannot be verified in this way, the process finally reads all the points of the object and applies a topological operator. Figure 11 illustrates the algorithm for verifying a range query result set.

RS={φ};
For each entry e in CS
    ...
    For each VN in e.Neighbors
        If VN overlaps Q
        ...
    Append e to RS;
    ...
    For each point in e.points
    If all points contained by Q
        Append e to RS;
    ...
    Remove e from CS;
Return RS;

Figure 11. The algorithm of verifying a range query
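The two-stage verification can be sketched as below. This is an illustration under assumptions: Q is an axis-aligned rectangle, each candidate carries the vertex lists of its neighboring Voronoi cells and its own points, and the dictionary keys (`name`, `neighbor_cells`, `points`) are hypothetical. The cheap test follows the lemma stated above; the fallback reads all points of the object.

```python
def point_in(q, p):
    """True if point p lies inside the query rectangle q (borders included)."""
    return q[0] <= p[0] <= q[2] and q[1] <= p[1] <= q[3]

def polygon_in(q, polygon):
    """A polygon lies inside a rectangle iff all of its vertices do (q is convex)."""
    return all(point_in(q, v) for v in polygon)

def verify(candidates, q):
    """Filter the candidate set CS into the result set RS."""
    results = []
    for e in candidates:
        cells = e["neighbor_cells"]
        if cells and all(polygon_in(q, cell) for cell in cells):
            results.append(e)   # cheap test: all neighboring Voronoi cells inside Q
        elif all(point_in(q, p) for p in e["points"]):
            results.append(e)   # fallback: exact test on every point of the object
    return results

q = (0, 0, 10, 10)
candidates = [
    # Accepted by the Voronoi test alone.
    {"name": "e1", "points": [(2, 2)],
     "neighbor_cells": [[(1, 1), (3, 1), (2, 3)], [(4, 4), (6, 4), (5, 6)]]},
    # A neighboring cell leaves Q, but the object itself is inside: fallback accepts.
    {"name": "e2", "points": [(9, 9), (9.5, 9.5)],
     "neighbor_cells": [[(8, 8), (12, 8), (10, 12)]]},
    # The object crosses the border of Q: rejected.
    {"name": "e3", "points": [(9, 1), (11, 1)],
     "neighbor_cells": [[(8, -2), (12, 0), (11, 3)]]},
]
rs = [e["name"] for e in verify(candidates, q)]
```

Only candidates that fail the Voronoi test pay the cost of reading their full geometry.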

EXPERIMENT AND ANALYSIS
We deploy a simulation of the distributed service system and extract terrain and traffic data totalling 739,744 elements and 535 MB for the experiments.

Cost Models
The important performance metrics for spatial index structures are (i) index construction time, (ii) index size, (iii) query processing cost, (iv) size of the VO, and (v) verification time. Table 1 shows the costs calculated by the above-mentioned equations using the typical values of Table 1. The R-tree incurs about 2 times the overhead of the MR-tree for computing the query process information (in the entire tree), and is 9 times larger. The VoMR-tree is also significantly better in terms of result set and verification cost. The latter is particularly important because the clients are distributed devices with limited computing power. The only aspect in which the two structures are similar is the size of the index. In the following section we present a test of parallel efficiency.
Table 1. Symbols and values in analysis

Parallel Efficiency
The efficiency of the algorithm is evaluated with four measures: object repeat rate, result repeat rate, task of fluctuation and transfer ratio. The experimental results are shown in Figure 12 and Figure 13. As the line graphs show, the VoMR-tree keeps a more stable result repeat rate at a value of 0.2, while the R-tree and the VoMR-tree are close to each other in object repeat rate. The R*-tree also benefits from parallel processing, so we compare the R*-tree with the VoMR-tree for processing redundancy. Figure 14 illustrates the sampling result for tasks from 0 to 10000. The R*-tree clearly keeps a more stable task of fluctuation. However, the fluctuation of the VoMR-tree is reduced when the task count exceeds 2000. Thus, the VoMR-tree shows lower redundancy when processing mass data. Figure 15 shows the result of testing the transfer ratio for the VoMR-tree and the MR-tree. In the line graph, both the VoMR-tree and the MR-tree have a similar transfer ratio, with values between 1 and 2. When the throughput increases, the transfer ratio of the MR-tree begins to fluctuate while that of the VoMR-tree remains more stable.

CONCLUSION
The VoMR-tree augments the MR-tree by computing hash values on the concatenation of the nearest neighbors and MBRs of all the entries in a tree node. The leaf nodes are sorted by nearest-neighbor relationship during the construction of the spatial index. During query processing, a depth-first traversal is executed. When MBRs overlap, the procedure replaces the recursive query with a nearest-neighbor search in order to limit the space cost. The VoMR-tree has advantages in processing cost and average size of the result set, and is highly efficient in both time and space under parallelization. As future work, we plan to extend our solutions to other types of spatial queries and to develop a universal spatial index for spatial databases.

Figure 3. An optimal partition of a dataset
Figure 5. Voronoi diagram

Figure 8. Range query algorithm on VoMR-tree
Figure 9. Data partition
3.3 Paralleling Process of Query
Object repeat rate (formula 4), where
r_o = object repeat rate
C_i = the count of objects in a subset
C = the count of objects in a spatial dataset

Result repeat rate, where
C_i = the count of results of each computing node
C = the count of objects in a result set

Task of fluctuation, where
M = maximum count in process of each node
M = minimum count in process of each node
u = average count in process of each node

Transfer ratio, where
Q_r = the size of source dataset
Q_s = the size of result set
Q_i = the size of result set in computing node

We extract data of equal size to evaluate the object repeat rate, and process a single range query to check the result repeat rate of the R-tree and the VoMR-tree.

Figure 14. Task of fluctuation

ISPRS
Figure 15. Transfer ratio
Figure 1. Depth travel and structure of MR-tree
Delaunay triangulation

2.2 Voronoi Diagram
Given a set of distinct objects P={p_1, p_2, …, p_n} in space R, the Voronoi diagram of P, denoted as VD(P), partitions the space of R into n disjoint regions, such that each object p_i in P belongs to only one region and every point in that region is closer to p_i than to any other object of P in the Euclidean space. The region around p_i is called the Voronoi cell of p_i, denoted as VC(p_i), and p_i is the generator of the Voronoi cell. Therefore, the Voronoi diagram of P is the union of all Voronoi cells VD(P) = {VC(p_1), VC(p_2), …, VC(p_n)}. If two generators share a common edge, they are Voronoi neighbors. If we connect all the Voronoi neighbors, we get the Delaunay triangulation DT(P), which is the dual graph of VD(P) (Hu, 2010b).
Property 1. Given a set of distinct points P={p_1, p_2, …, p_n} in R, the Voronoi diagram VD(P) and the corresponding Delaunay triangulation DT(P) of P are unique.
Property 2. The average number of Voronoi edges per Voronoi polygon does not exceed six. That is, the average number of Voronoi neighbors per generator does not exceed six.
Property 3. Given the Voronoi diagram of P, the nearest neighbor of a query point q is p, if and only if q∈VC(p).
Property 4. Let p_1, p_2, …, p_k be the k (k > 1) nearest neighbors in P to a query point q. Then, p_k is a Voronoi neighbor of at least one point p_i ∈ {p_1, p_2, …, p_{k-1}}.
Figure 5 shows a Voronoi diagram constructed on a spatial dataset. A Voronoi diagram is extremely efficient for searching a nearest-neighbor region. It divides the two-dimensional space into several parts, and each pair of neighboring parts shares a unique edge. The nearest relationship is recorded with this edge (Hu, 2010b).
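The definition of a Voronoi cell and Property 3 can be illustrated with a few lines of code. The sketch below does not build the diagram; it only uses the fact that a point belongs to the cell of its nearest generator. The function name `cell_owner` and the sample generators are hypothetical.

```python
import math

def cell_owner(q, generators):
    """Generator p whose Voronoi cell VC(p) contains q, i.e. the nearest generator."""
    return min(generators, key=lambda p: math.dist(p, q))

# Three generators (made-up coordinates).
generators = [(0, 0), (4, 0), (0, 4)]

owner_a = cell_owner((1, 1), generators)    # closest to (0, 0)
owner_b = cell_owner((3, 0.5), generators)  # closest to (4, 0)

# A point on the shared Voronoi edge is equidistant from both neighboring generators.
edge_point = (2, 0)
d0 = math.dist(edge_point, generators[0])
d1 = math.dist(edge_point, generators[1])
```

Locating the cell that contains a query point therefore answers the nearest-neighbor query directly, which is what the index construction relies on.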