A STRUCTURE-BASED APPROACH FOR MATCHING ROAD JUNCTIONS WITH DIFFERENT COORDINATE SYSTEMS

Effective matching of multi-source road networks is vital for their integration, updating and maintenance. Traditional matching methods rely on buffering and geometric distances to search for potential matches, and therefore require the coordinate systems of the datasets to be similar. These methods may fail if the coordinate systems are unknown. This paper therefore proposes a novel approach that compares the structures of urban road networks based on skeleton extraction. First, the similarity between two junctions is measured by comparing the clusters formed by each junction and its neighbouring junctions. Then, hierarchical strokes are recognized as the global structure to be matched, which eliminates the effect of the different coordinate systems. Finally, the graphs converted from the hierarchical skeletons of the road networks in different coordinate systems are compared, and the most probable junction correspondences are established with a maximum common subgraph algorithm. Based on the corresponding junctions, an affine transformation can be established between the two unknown coordinate systems, and the remaining junctions are matched by traditional geometric methods. An experiment on matching road networks is carried out without any other geographic positional information. The result shows that, however large the difference between the coordinate systems, the approach is still able to find the correct matches, which is impossible with traditional methods.

Xuechen Luan (1985): Ph.D. candidate, majoring in data matching, modelling and LoD representation in urban road networks.


INTRODUCTION
Nowadays, volunteered geographic information (VGI) is changing the way road map data are produced. Data users are also producers, who can create crowdsourced, massive and up-to-date road datasets. However, these datasets are unstructured and heterogeneous. Road matching establishes correspondences between the points or lines that represent identical real-world roads in different datasets. It is an important preparation for road data integration, updating and maintenance, which keeps road network databases valid and up to date for modern application requirements such as location-based services (LBS), geographical analysis, and multi-resolution/multi-representation databases. Previous road matching approaches concentrated on matching road networks that come from the same coordinate system and have approximately identical geographic positions. However, in some cases the geographic positioning may be inaccurate, for example because of disparate production processes or data encryption, or the datasets may even be in different and unknown coordinate systems. Matching road network datasets from such different sources accurately is challenging. Different coordinate systems cause geometrical methods such as buffering and geometric distances (e.g., Euclidean, Hausdorff, and Fréchet) to fail, so the matching must rely on other properties of the road networks. Most current road matching research concentrates on matching networks from databases with different levels of detail (LoDs) (Volz 2006; Safra et al. 2006; Zhang and Meng 2007, 2008; Mustière and Devogele 2008), but, to the authors' knowledge, little of it addresses different and unknown coordinate systems.
When the coordinate systems are known, reprojection allows traditional matching methods to be used; the problem becomes critical only when the coordinate systems are both different and unknown. We therefore concentrate on the literature that deals with matching networks in different coordinate systems. Xiong and Sperling (2004) presented a cluster concept to match nodes in different road networks without accurate geometric adjustment. The cluster is defined as an associated sub-network spanning from a seed node within a distance threshold, and node matching is converted into a comparison of two clusters. Based on that, they established correspondences from nodes to edges and segments between two networks. The cluster matching approach relies on the compared networks having the same scale and orientation, and it cannot produce accurate matching results in completely different coordinate systems. Chen et al. (2006) proposed a method to automatically match road networks in unknown coordinate systems. Every two point pairs are used to derive a transformation, and the transformation with the maximum overlap between the two compared networks is taken as the optimum one. Instead of comparing all possible corresponding point pairs, the method uses spatial attributes such as point connectivity, angles of points, angles between points and distance to reduce the search range. However, a two-point correspondence can only deal with a stretching transformation by calculating scaling parameters on the x and y axes between the different coordinate systems, and the compared road networks must not have rotation or shear distortions. To solve the problem of matching road networks with completely different coordinate systems, we propose a structure-based approach in this paper.

JUNCTION SIMILARITY MEASUREMENT
Before the structures of urban networks are matched, a junction similarity measure must be designed to reveal the structural relationship between the compared junctions. When uniform positioning information is lacking between multi-source datasets, geometric distances cannot effectively indicate the similarity of two junctions. We therefore adopt a structural pattern named junction cluster, proposed by Xiong and Sperling (2004), to guide the road matching process. The basic idea of junction cluster comparison comes from manual matching. When matching two networks manually, we first tend to search for corresponding road junctions according to their shapes. In the human mind, the shape of a junction depends not only on the angles of the roads connected to it, but also on the local network formed together with its neighbouring junctions. Therefore, the cluster consists of several nodes and edges associated with a specific junction. Based on the junction cluster, a structural measure named Minimum Cluster Edit Distance (MiCED) is used in this paper to evaluate the similarity between two road junctions.

Structural pattern of junction cluster
A junction cluster (JC) is defined as a local network consisting of a road junction and the neighbouring junctions connected to it. Based on the JC, each junction in the road network is described as a unique network, and the similarity of two junctions can be converted into a comparison of the two clusters around their central junctions. Theoretically, the JC can uniquely describe a specific junction in the whole urban network if the spanning distance is large enough. In practice, a three-step extension is sufficient to describe a junction in its local area.
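The three-step extension can be illustrated with a breadth-first traversal. The following is a minimal sketch, assuming the road network is available as an adjacency mapping from junction identifiers to neighbouring junction identifiers; the function name `junction_cluster` and the toy network are hypothetical and not part of the original method description.

```python
from collections import deque

def junction_cluster(adjacency, seed, max_steps=3):
    """Collect the junctions and roads reachable from `seed` within `max_steps` hops."""
    depth = {seed: 0}          # hop count from the seed junction
    edges = set()              # undirected roads stored as frozensets
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if depth[node] == max_steps:
            continue           # junctions on the boundary are not expanded further
        for neighbour in adjacency.get(node, ()):
            edges.add(frozenset((node, neighbour)))
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return set(depth), edges

# Toy network: a chain 1-2-3-4-5 with a side branch 2-6.
roads = {1: [2], 2: [1, 3, 6], 3: [2, 4], 4: [3, 5], 5: [4], 6: [2]}
nodes, edges = junction_cluster(roads, seed=1, max_steps=3)
```

In the toy network, junction 5 lies four steps from the seed and is therefore left out of the cluster.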

Minimum cluster edit distance for road network matching
Once the local network or cluster associated with each junction is built, we use a cluster changing measurement named Minimum Cluster Edit Distance (MiCED), proposed by Luan et al. (2011), to evaluate the similarities between compared junctions. The main concept of MiCED is to measure the shortest sequence of editing operations that transforms one junction cluster into another. The distortion model measures the dissimilarity of two junction clusters JC_1 and JC_2 with three editing operations: insertion, deletion, and substitution of an edge. Given a set of editing operations together with their costs, MiCED is defined as the minimum total cost of an edit sequence that transforms JC_1 into JC_2. Here, dr and weight are two cost functions: dr is a distance calculated from the length and orientation differences between matched edges, and weight is a constant value. In road matching, insertion and deletion operations are more important than substitution for indicating the difference between two junction clusters. Hence, weight is always set larger than dr, which means that the costs of insertion and deletion are larger than the cost of substitution. MiCED is normalized to the range from 0 to 1 by the total road length length(JC_1) + length(JC_2) of the compared clusters. Obviously, the shorter this edit sequence is, the more similar the two junction clusters are. MiCED is thus a flexible measure that is suitable for evaluating the similarity of junctions in different networks.
To find the substituted points between clusters effectively, we use a tree search approach to establish the substitution pairs. The searching process consists of the following steps:
1) Match the central junctions and the adjacent roads of the two clusters as the initial corresponding junctions and roads;
2) For each corresponding road pair adjacent to the central junction of one cluster, search for the end junctions of each road. If the distance between the two end junctions is small enough, match them as corresponding junctions. Regard the matched junctions as new start junctions and continue to search for the next corresponding junctions;
3) If the distance between the two end junctions is larger than the threshold, extend the shorter road along the minimum deviation angle and compare the subsequent possible corresponding points;
4) Repeat the searching process iteratively until there is no junction left to search in the local network.
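The way the edit costs accumulate can be sketched for one given set of substitution pairs. The sketch below does not perform the tree search and does not reproduce the published MiCED formula: the form of `dr`, the value `weight = 1.0`, the per-length insertion and deletion cost, and the representation of a cluster as a mapping from edge identifier to (length, bearing in radians) are all assumptions chosen so that the normalized cost stays between 0 and 1.

```python
import math

def dr(edge_a, edge_b):
    """Assumed substitution cost of two matched edges given as (length, bearing)."""
    len_a, ang_a = edge_a
    len_b, ang_b = edge_b
    # Smallest absolute angle between the two bearings, in [0, pi].
    diff = abs(ang_a - ang_b) % (2 * math.pi)
    diff = min(diff, 2 * math.pi - diff)
    # Length difference plus an orientation penalty scaled by the shared length.
    return abs(len_a - len_b) + min(len_a, len_b) * diff / math.pi

def cluster_edit_cost(cluster_1, cluster_2, matches, weight=1.0):
    """Normalized edit cost for one candidate substitution set (no minimization)."""
    cost = sum(dr(cluster_1[i], cluster_2[j]) for i, j in matches)
    matched_1 = {i for i, _ in matches}
    matched_2 = {j for _, j in matches}
    # Unmatched edges of cluster_1 are deleted, unmatched edges of cluster_2 inserted.
    cost += weight * sum(l for k, (l, _) in cluster_1.items() if k not in matched_1)
    cost += weight * sum(l for k, (l, _) in cluster_2.items() if k not in matched_2)
    total_length = (sum(l for l, _ in cluster_1.values())
                    + sum(l for l, _ in cluster_2.values()))
    return cost / total_length

jc_1 = {"e1": (100.0, 0.0), "e2": (50.0, math.pi / 2)}
jc_2 = {"f1": (100.0, 0.0), "f2": (50.0, math.pi / 2)}
identical = cluster_edit_cost(jc_1, jc_2, [("e1", "f1"), ("e2", "f2")])
unrelated = cluster_edit_cost(jc_1, jc_2, [])
```

Under these assumptions, two identical clusters with all edges substituted yield a cost of 0, and two clusters without any substitution yield a cost of 1. The minimum over all admissible substitution sets would be taken by the tree search described in the steps above.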

GLOBAL STRUCTURE EXTRACTION FROM URBAN ROAD NETWORK
Based on the MiCED measurement, we can compare every pair of junctions between the datasets and take the most similar junction as the match for a specific junction. However, an exhaustive search of two road networks is time-consuming and actually unnecessary. Once a sufficient number of corresponding junctions is found, we can establish a transformation between the two urban road networks. The remaining junctions are then matched by traditional matching strategies. Generally, at the urban extent, reliable datasets seldom exhibit irregular spatial distortion. Hence a global affine transformation is sufficient to register the two coordinate systems. The next problem is to define these key junctions and match them across different coordinate systems. Since positional information is unavailable, road matching in our approach relies on structural information to establish the correspondences between the two road networks. The main concept of our approach is to transform the road network matching problem into the maximum common subgraph (MCS) problem in graph theory.
To achieve that, the global structure of each urban road network should first be extracted to constitute graphs. The purpose of structure-based matching is to establish some key junction correspondences between the compared global urban structures derived from different network datasets. These corresponding key junctions are then used to derive complete node matches between the two road networks.
In our structure-based matching process, we use the road hierarchies to extract the global structure. This is based on a characteristic of urban road networks: although the details of a road network vary between databases, its high-hierarchy roads or strokes are similar to a certain extent (Jiang 2009). Here a stroke is defined as a natural functional unit of a network (Thomson 2006). It is constructed by aggregating street segments according to a number of different properties, such as street name or the angle between neighbouring street segments. These hierarchies can be extracted through network analysis (Jiang and Claramunt 2002; Jiang and Harrie 2004; Tomko et al. 2008; Porta et al. 2006a, 2006b). We use the approach proposed by Yang et al. (2011) to calculate stroke length, degree, closeness and betweenness centralities as parameters for selecting high-hierarchy roads. Degree centrality is calculated as the number of strokes connected to a given stroke. This is a local value reflecting the connecting ability of one stroke. Closeness centrality is calculated as in Equation (2), where n is the number of strokes in the street network and d_ij is the smallest number of edges on a path from stroke i to stroke j. Betweenness centrality is a global value and measures the mediator effect of one stroke in an urban street network. Tomko et al. (2008) showed that betweenness centrality is a credible way of exploring the hierarchical characteristics of urban street networks. It is calculated as in Equation (3), where n_jk is the number of shortest paths from j to k and n_jk(i) is the number of shortest paths from j to k that pass through i.
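For reference, the symbols defined above are consistent with the standard closeness and betweenness centralities of network analysis. The forms below are given under that assumption; the normalization actually used in Equations (2) and (3) may differ.

```latex
C_i = \frac{n - 1}{\sum_{j = 1,\; j \neq i}^{n} d_{ij}},
\qquad
B_i = \sum_{\substack{j \neq k \\ j \neq i,\; k \neq i}} \frac{n_{jk}(i)}{n_{jk}}
```

A stroke that is few steps away from all other strokes has a large C_i, and a stroke that lies on many shortest paths between other strokes has a large B_i.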
The integration function is described as follows:

$$\mathrm{Level} = 0.2\,\mathrm{Length} + 0.2\,\mathrm{Degree} + 0.2\,\mathrm{Closeness} + 0.4\,\mathrm{Betweenness} \tag{4}$$

where Length, Degree, Closeness, and Betweenness are normalized values ranging from 0 to 1. The weights are determined on the basis of the operator's expertise regarding the influence of the different geometric and topological values on the total level. The aggregation of such measures is a difficult problem that can be further optimized within our approach, for example by using the CRITIC criterion (see e.g. Yang et al. 2011).
For each stroke, the degree, closeness and betweenness centralities and the stroke length are thus integrated to evaluate its hierarchy. High-level strokes constitute the skeleton of the road network. Road junctions are then also classified into several levels according to the connection relationships of the hierarchical strokes.
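The integration in formula (4) amounts to a weighted sum of normalized indicators. The following is a minimal sketch; the min-max normalization and the function names `normalise` and `stroke_levels` are assumptions, since the text only states that the four values are normalized to the range from 0 to 1.

```python
def normalise(values):
    """Min-max scale a list of numbers to [0, 1]; a constant list maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def stroke_levels(length, degree, closeness, betweenness):
    """Weighted sum of the four normalized indicators, one level per stroke."""
    columns = [normalise(c) for c in (length, degree, closeness, betweenness)]
    weights = (0.2, 0.2, 0.2, 0.4)  # weights of formula (4)
    return [sum(w * col[i] for w, col in zip(weights, columns))
            for i in range(len(length))]

# Three hypothetical strokes described by raw indicator values.
levels = stroke_levels(
    length=[100.0, 500.0, 300.0],
    degree=[1, 5, 3],
    closeness=[0.2, 0.6, 0.4],
    betweenness=[0.0, 0.8, 0.4],
)
```

Strokes whose level exceeds a chosen cut-off would then form the skeleton; the cut-off itself is not specified in the text.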

GLOBAL URBAN STRUCTURES MATCHING
The next step of structure-based matching is to find the optimum correspondence of high-level junctions between the two compared global urban structures in different road networks. It can be converted into a graph matching problem known as the maximum common subgraph (MCS) problem. Before global urban structures can be compared and matched, the graph of each global urban structure must first be created. Junctions are regarded as nodes, and roads are represented as edges between nodes. In this urban network, dual roads and complex junction patterns have been abstracted as single edges and nodes using semantic or geometric information (Yang et al. 2011). A single edge connects each of these patterns to other junctions in the graph. Moreover, junctions located on the same stroke are also connected by edges in the graph. For instance, junctions 23 and 31 are not connected directly in Figure 1, but both of them are located on the continuous stroke 13-23-27-31. They are therefore also connected by an edge in the graph. Although structure-based matching compares the similarity of global structures derived from urban networks, the relative positions of junctions are also of vital importance for road matching. Hence the orientation and length differences of roads are stored in two-component vectors as the weights of the corresponding edges in the graph.

Figure 1. Global structure of road networks
After the graph has been created for each global urban structure, an MCS search for the optimum correspondence between the two graphs follows. MCS is a classical graph matching problem in graph theory. A common subgraph is maximal if it cannot be extended; in other words, a maximal common subgraph is not a proper subgraph of another common subgraph. Mathematically, the problem of finding the MCS of two graphs can be reduced to the problem of finding the maximum clique (a fully connected subgraph) in a suitably defined association graph (Koch 2001). In particular, the most famous clique detection algorithm, developed by Bron and Kerbosch (1973) and also cited as ACM Algorithm #457, is adopted in this paper for the MCS search. The algorithm is based on tree search with backtracking. Junction pairs are first chosen to construct a product graph with respect to their MiCED similarity measure and the node connection relations in each graph (Figure 2).

Figure 2. Product graph of two graphs
To find the maximum clique, the partial match set is initialized as empty, and the algorithm uses heuristic conditions on labels and connection relations to prune conflicting search paths. When the algorithm finds a complete clique, it backtracks until it finds another partial match. The algorithm terminates when all possible search paths have been tried.
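The reduction from MCS to maximum clique can be sketched in a few lines. The sketch below builds a product graph from two small graphs and searches it with the basic Bron-Kerbosch recursion without pivoting. It assumes the standard induced-subgraph construction, in which two node pairs are linked when they agree on connectivity in both graphs; the compatibility test is a stand-in for a MiCED threshold check, and the function names are hypothetical.

```python
from itertools import combinations

def product_graph(adj_1, adj_2, compatible):
    """Nodes are compatible junction pairs; edges link mutually consistent pairs."""
    nodes = [(u, v) for u in adj_1 for v in adj_2 if compatible(u, v)]
    adj = {pair: set() for pair in nodes}
    for (u1, v1), (u2, v2) in combinations(nodes, 2):
        if u1 == u2 or v1 == v2:
            continue  # one junction cannot match two different junctions
        # Assumption: pairs are consistent when connectivity agrees in both graphs.
        if (u2 in adj_1[u1]) == (v2 in adj_2[v1]):
            adj[(u1, v1)].add((u2, v2))
            adj[(u2, v2)].add((u1, v1))
    return adj

def maximum_clique(adj):
    """Basic Bron-Kerbosch recursion that keeps the largest clique found."""
    best = []

    def expand(clique, candidates, excluded):
        nonlocal best
        if not candidates and not excluded:
            if len(clique) > len(best):
                best = clique
            return
        for v in list(candidates):
            expand(clique + [v], candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}  # backtrack: v has been fully explored
            excluded = excluded | {v}

    expand([], set(adj), set())
    return best

# G1 is a triangle a-b-c; G2 is a path A-B-C in which A and C are disconnected.
g1 = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
g2 = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
pg = product_graph(g1, g2, lambda u, v: u.upper() == v)
matches = maximum_clique(pg)
```

Because a and c are connected in G1 while A and C are not connected in G2, the pairs (a, A) and (c, C) are not linked in the product graph, so the largest clique contains two junction pairs, one of which is (b, B).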

CONTEXTUAL RECTIFICATION OF STRUCTURAL MATCHING
Most junction pairs produced by the MCS search are correct matches. However, in some cases false pairings may also be established, if the structures of one region in a dataset are similar to those of another region. Further strategies are therefore needed to rectify these false matching pairs.
In this paper, we use a robust estimation approach named M-estimators to rectify the false matches (Huber 1964). It works under the assumption that most structure-based matching pairs are correct. Therefore, the incorrect matching pairs can be rectified with a statistical analysis of the geometric distortions.
Let r_i be the residual (i.e. the Euclidean distance) between the junctions of the corresponding pair i detected by the MCS search described above. The standard least-squares method minimizes the sum of the squared residuals of all corresponding pairs, $\sum_i r_i^2$, which is unstable if incorrect matches distort the estimated function. M-estimators suppress the incorrect correspondences by replacing the squared residuals $r_i^2$ with a function of the residuals:

$$\min \sum_i \rho(r_i) \tag{5}$$

where ρ is a symmetric, positive-definite function with a unique minimum at zero, chosen to increase less rapidly than the square. We can define a weight function ω(x) and implement the estimation as an iterated reweighted least-squares procedure. According to the M-estimator approach, solving formula (5) is equivalent to solving the following iterated reweighted least-squares problem:

$$\min \sum_i \omega\left(r_i^{(k-1)}\right) r_i^2 \tag{6}$$

where the superscript k indicates the iteration number. The weight $\omega(r_i^{(k-1)})$ should be recomputed after each iteration in order to be used in the next iteration.
The weight function ω(x) measures the influence of a corresponding high-hierarchy junction pair on the estimate of the coordinate system transformation parameters. Structural matching shows that most of the corresponding junctions are correct and that incorrect matches form only a small part. Hence, in robust estimation we should use the reliable information from structural matching and keep its initial weight. For suspicious matches, we should decrease the weight, while abnormal results should be excluded from the iteration. The weight function should therefore consist of three parts. Based on this concept, a weight function proposed by Yang (1994) is chosen, which distinguishes three intervals. When the residual r is smaller than a, the weight equals 1, that is, the pair enters iteration k with the same influence as in iteration k-1; we call this the normal interval. When r is larger than a but smaller than b, the weight decreases; we call this the suspicious interval. When r is larger than b, the pair is considered significantly abnormal and is assigned a weight of zero to exclude it from the iteration; we call this the eliminative interval.
Based on the MCS search algorithm and the robust estimation strategy, the high-level junctions of the two road network datasets can be matched automatically. An affine transformation is then established by using these matched junctions as control points. Eventually, the two unknown coordinate systems are projected into a uniform one, and the remaining junctions can be matched using traditional strategies, i.e. buffering, geometric distances, or topological searching that starts from the control points and follows the connection relationships to the unmatched junctions.
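The interplay of the affine fit and the three-interval weighting can be sketched as an iterated reweighted least-squares loop. This is a minimal pure-Python sketch under stated assumptions: the linear taper in the suspicious interval is a placeholder and not the weight function of Yang (1994), the thresholds a and b are chosen by hand, and all function names are hypothetical. If too many pairs receive a weight of zero, the normal equations become singular and the sketch fails.

```python
import math

def solve3(m, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [r] for row, r in zip(m, rhs)]
    for c in range(3):
        piv = max(range(c, 3), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(c + 1, 3):
            f = a[r][c] / a[c][c]
            for k in range(c, 4):
                a[r][k] -= f * a[c][k]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][k] * x[k] for k in range(r + 1, 3))) / a[r][r]
    return x

def weighted_affine(src, dst, w):
    """Weighted least-squares fit of x' = p0*x + p1*y + p2 and y' = q0*x + q1*y + q2."""
    rows = [(x, y, 1.0) for x, y in src]
    n = [[sum(wi * r[i] * r[j] for wi, r in zip(w, rows)) for j in range(3)]
         for i in range(3)]
    bx = [sum(wi * r[i] * d[0] for wi, r, d in zip(w, rows, dst)) for i in range(3)]
    by = [sum(wi * r[i] * d[1] for wi, r, d in zip(w, rows, dst)) for i in range(3)]
    return solve3(n, bx), solve3(n, by)

def three_part_weight(r, a, b):
    """Weight 1 up to a, assumed linear decrease between a and b, 0 from b onwards."""
    if r <= a:
        return 1.0
    if r >= b:
        return 0.0
    return (b - r) / (b - a)

def robust_affine(src, dst, a, b, iterations=10):
    """Refit the affine transformation, recomputing the weights after each iteration."""
    w = [1.0] * len(src)
    for _ in range(iterations):
        px, py = weighted_affine(src, dst, w)
        residuals = [math.hypot(px[0] * x + px[1] * y + px[2] - u,
                                py[0] * x + py[1] * y + py[2] - v)
                     for (x, y), (u, v) in zip(src, dst)]
        w = [three_part_weight(r, a, b) for r in residuals]
    return px, py, w

# Nine control points on a grid plus one false match shifted by (50, 50).
src = [(float(x), float(y)) for x in (0, 10, 20) for y in (0, 10, 20)] + [(12.0, 8.0)]
dst = [(2 * x - y + 10, x + 3 * y - 5) for x, y in src]
dst[-1] = (dst[-1][0] + 50.0, dst[-1][1] + 50.0)
px, py, weights = robust_affine(src, dst, a=20.0, b=50.0)
```

In this example the false match falls into the eliminative interval after the first fit, so the subsequent iterations recover the transformation from the nine correct control points only.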

EXPERIMENT AND DISCUSSION
To test the feasibility of the structure-based matching approach, we carried out a rigorous experiment on the matching of road networks. The road datasets were obtained from the National Fundamental Geographic Information System (NFGIS) and the NavInfo™ navigation datasets of Wuhan City in China. In this experiment, the original geographical coordinate systems were modified manually. Therefore, we cannot use any positional information, and the only matching criterion is road structure. The MiCED threshold is set to 0.4 in this experiment, which means that the minimum similarity should be larger than 60%. For each junction, any junction in the other dataset whose MiCED value with it is within the threshold is considered a potential correspondence. The global structures extracted from the original urban networks are depicted in Figure 3. Figure 4 shows the matching results between the two global structures obtained by the MCS search and robust estimation. To make the matching results clearer, the two coordinate systems are projected into a uniform one. As illustrated in Figure 5, junctions contained in the same ellipse indicate a matched pair. Despite the unknown coordinate systems, the structural matching strategy using the MiCED measure and robust estimation can establish correct correspondences by comparing the shapes of junctions. 128 junctions in the NFGIS dataset were selected from the experimental area; the recall ratio is 93 / 103 = 90.29% and the accuracy ratio is 93 / 108 = 86.11%. It can be seen that the MiCED measure is able to match road networks from different geographical positions, with a recall of more than 90%. The main drawback is that the complexity of the compared road networks yields high computational costs. In the MiCED computing stage, an average of 18 nodes within three steps of a junction must be examined to judge the similarity between junctions. In the global structure matching stage, the number of operations is upper bounded by O(n^3), where n is the number of potentially matched nodes in the MCS algorithm. Our experiment was run on a personal computer with a 3.06 GHz CPU and 1 GB of RAM. It takes 13.5 minutes to deal with 700 junctions. The computation time will increase dramatically as the data extent grows. Semantic information, where available, may improve the matching results and efficiency. For example, if two compared junctions are each crossed by roads with the same names, they can be regarded directly as a match.
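The two ratios can be recomputed directly from the counts quoted above. In this short sketch, the interpretation of 103 as the number of reference correspondences and of 108 as the number of reported correspondences is an assumption based on the usual definitions of recall and precision; the variable names are hypothetical.

```python
correct_matches = 93
reference_pairs = 103   # assumed: correspondences that should have been found
reported_pairs = 108    # assumed: correspondences returned by the matching

recall = correct_matches / reference_pairs
precision = correct_matches / reported_pairs
print(f"recall = {recall:.2%}, accuracy (precision) = {precision:.2%}")
```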

CONCLUSION
This paper proposes a structure-based approach for matching road junctions between urban road networks with different coordinate systems. The proposed approach is an improvement over previous approaches. It extracts hierarchical strokes from the original urban road networks and converts them into graphs. Meanwhile, a junction-derived cluster is built for each junction to measure the similarity of junctions. Based on that, a graph matching algorithm is implemented to find the optimum correspondences between the compared graphs. The incorrect matches are rectified with a robust estimation approach. Finally, an affine transformation is applied to transform the different coordinate systems into a uniform one. Two road network datasets were selected to verify the validity of the proposed approach. The main contributions of our approach are:
• Depicting the shape of a junction with a junction-derived cluster;
• Proposing the MiCED measure to evaluate shape similarity;
• Extracting the hierarchical strokes as skeletons to describe the whole structure of an urban road network;
• Matching road network skeletons correctly with a graph matching approach; and
• Rectifying incorrect graph matching results with a robust estimation strategy.
The proposed approach not only matches junctions in different coordinate systems, but can also detect changes, namely junctions that have no corresponding junction in the other dataset. Moreover, in future work the matched junctions can be used for matching urban road lines between networks, integrating multi-source networks, and updating spatial databases for the maintenance of urban street networks.
excellent Ph.D. Candidates funded by Ministry of Education of China (5052011619019).
Figure 2 gives an example of the construction of the product graph PG of two graphs G1 and G2. The nodes in PG are formed by compatible node pairs of G1 and G2, for example the node pairs (a, a') and (b, b'). For the node pair (a, b'), suppose that their MiCED measure is larger than the threshold; the two nodes are then incompatible. The edges in PG are formed by compatible edge pairs of G1 and G2. For instance, if nodes b and c are connected in G1 and nodes b' and c' are connected in G2, the node pairs (b, b') and (c, c') are connected in PG. If nodes a and c are connected in G1 but nodes a' and c' are disconnected in G2, the node pairs (a, a') and (c, c') are disconnected in PG. Based on the product graph PG, a maximum clique in PG corresponds to a maximum common subgraph of G1 and G2.

Figure 4. Matching result from an original view