GEOMETRICAL ADJUSTMENT TOWARDS THE ALIGNMENT OF VECTOR DATABASES

Comparison of geospatial databases covering a similar spatial extent might show substantial differences. This is the consequence of different factors, such as accuracy, scale, data collection and processing methods, level of detail, and data models, to name a few. The differences are reflected in the geometric structure of objects, their location, topology and the accompanying information. Geometric discrepancies emerge, and sometimes even contradictions exist between the various data sources. Thus, the demand for processes that enable alignment of different data sources while maintaining spatial consistency is growing. Global solution strategies, such as an affine transformation, are incomplete solutions, since they cannot account for the remaining errors caused by local distortions. In order to account for the resulting random distortions, e.g., geometric conflicts, a localized geometric alignment process is implemented in this research. During this process the distortions (deviations) are quantified locally via sets of specifically selected observation constraints, to assure the spatial consistency of the vector data. This strategy exploits local spatial topologic and geometric relationships between corresponding line features prior to the implementation of Least Squares Adjustment for the alignment, and observes local distortions and ambiguities that might exist. The outcome presents a significant improvement of the initial state by resolving local geometric distortions and discrepancies, suggesting a reliable solution for the problem on a statistically sound basis.


INTRODUCTION

Inconsistencies in Geodatabases
Reliable geospatial vector databases play an essential role in a variety of activities and applications, as do positional certainty and reliability in the manner by which the data is perceived and used. Data stored in vector databases results from extensive data collection and compilation carried out over decades or even centuries. Throughout this process the collected data is usually highly influenced by various factors, such as the available surveying tools and techniques, the processing methods, or the quality assurance practice that was implemented. Furthermore, the Spatial Data Infrastructure (SDI) on which these databases are based has also changed considerably in terms of its availability and quality.
Comparing heterogeneous vector databases over a restricted or extensive area, with the aim of producing a unified, homogeneous SDI, might lead to inconsistent results and thus violate the basic assumption of spatial consistency, i.e., that the position of a topographic object does not vary. Gradually, differences emerge, and sometimes even contradictions exist between the various data sources. Thus, the demand for processes that enable alignment of different data sources is growing. At the same time, tools for updating and improving the data are essential. The most influential differences between vector databases are as follows:
1. Large differences in the location and number of supporting points as a result of the level of accuracy, level of detail, generalization, and different scaling;
2. Objects are divided into sub-objects at different positions, so that similar objects are segmented differently; this stems from differences in the data catalogue;
3. The points in comparable objects differ, e.g., they may appear in a different order and/or distribution; and
4. Information can be presented in different coordinate systems or different map projections.

Solution Strategies
A global solution strategy should bring the databases into the same datum (thus eliminating some systematic inconsistency); still, geometric discrepancies in the overlapping area of the databases are likely to exist. Proper averaging of the overlapping vector data may reduce discrepancies, but it may also introduce distortions and thereby violate relationships between data elements (such as parallelism or perpendicularity). To reduce the remaining differences, local solutions based on the shape and position of objects and their relation to other objects are used. They include techniques from, among others, shape similarity and pattern recognition. These techniques can be divided into feature-based alignment, where the matching is based on the geometry of the objects, and relational alignment, where the neighbors of an object are taken into account. To handle the resulting random distortions, i.e., geometric conflicts, a localized geometric alignment process based on the relation of points to homologous line features of numerous objects is introduced in this research (Boljen, 2010).

Alignment in General
Matching of geospatial information first appeared in the mid-1980s, in a joint project of the U.S. Geological Survey and the U.S. Bureau of the Census (Rosen and Saalfeld, 1985). It was carried out to combine the databases of the two agencies into a new and better database, consisting of the information existing in both. The project concentrated on the development of mutual adjustment processes, where the solution was based on matching homologous points at road intersections through an iterative process. After finding the matching points, a transformation process based on rubber sheeting was implemented. Several approaches were developed to improve this procedure (Cobb et al., 1997; Xiong, 2000).
The main assumption behind point-based adjustment methods is that the databases are isomorphic: for each element in one database there exists exactly one matching object in the other database, i.e., a 1:1 correspondence (for which the matching solution is relatively trivial). When topologic differences exist, several candidates might exist, i.e., 1:n or n:m correspondences; using a greedy approach might then yield an unreliable solution. Li and Goodchild (2010) proposed a more general solution for selecting the most consistent relation. Point-based alignment might not be reliable enough for these cases, since a more complex solution is required, which might include a broader combination of geometry, statistics, and topological analysis. Kampshoff and Benning (2005) used a Least Squares Adjustment (LSA) to harmonize data, making it possible to include specific constraints, such as the preservation of straight lines.

Feature based Alignment (Geometric)
Feature-based alignment is based on examining the structure of objects. The degree of compatibility between objects can be determined by geometrical shape, size, area, area overlap, length, and distance (Euclidean, Fréchet, Hausdorff). The process analyzes the structure of one set of objects and compares it with a similar structural analysis of the candidate matches in the other data set (e.g., Belongie et al., 2002; Butenuth et al., 2007).
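As an illustration of one of the distance measures listed above, the following minimal sketch computes the discrete Hausdorff distance between the vertex sets of two line features. It is not taken from the cited works; the function name and the restriction to vertices (rather than continuous lines) are assumptions made for brevity.

```python
import math

def discrete_hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty vertex lists.

    Assumption: only the stored vertices are compared, not the line
    segments between them.
    """
    def directed(src, dst):
        # Largest distance from a vertex in src to its nearest vertex in dst.
        return max(
            min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in dst)
            for p in src
        )
    return max(directed(a, b), directed(b, a))

# Hypothetical coordinates of a subject and a reference polyline.
subject = [(0.0, 0.0), (5.0, 0.5), (10.0, 0.0)]
reference = [(0.0, 1.0), (10.0, 1.0)]
print(discrete_hausdorff(subject, reference))
```

A small value suggests that the two features are candidates for a match; on its own, however, it says nothing about topology, which is why relational measures are used alongside it.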

Relational Alignment
Relational alignment, or relational matching, is a process that takes various relationships between elements in the data sets into consideration. A primary example is the topological relations between objects, or the different topological and geometrical relations between the internal parts that make up the objects. When considering two objects for correspondence, this can verify the fit and may even determine clearly that the objects are compatible, because they are neighbors of identical objects at a similar relative location. The object correspondences can be evaluated based on buffer methods (Walter and Fritsch, 1999; Volz, 2005) or relaxation methods (Zhang, 2009; Siriba et al., 2011). This can also be done with imperfect knowledge using statistical methods (Mustière and Devogele, 2008; Shnaidman et al., 2011).

Geometrical Dependency
The coordinates of two vector databases x_1 and x_2 (one is considered the subject and the other the reference), including their stochastic modeling matrices Σ_{xx,1} and Σ_{xx,2}, are known. To achieve the geometric alignment, the perpendicular gaps d_ij (where i and j index two homologous linear features) between corresponding features from both databases are reduced (minimized). The perpendicular gaps are formed by calculating vertex-to-line distances, e.g., by coupling each vertex with the nearest line segment(s) of the corresponding linear feature. The algorithm is designed to improve the given coordinates in order to remove the gaps (regarded as contradictions) in the iterative LSA process.
In most cases, the number of vertices and line segments stored in each database is not equal; thus a coarser feature structure might not be aligned completely to a finer feature structure. Geometric conflicts between databases of different vertex densities are eliminated by introducing a geometric subdivision of the coarser database features. This gives the coarse features higher flexibility during their alignment. Establishing intermediate vertices as predetermined breaking points within the line profile, to be used in the later adaptation, leads to a more homogeneous geometrical adaptation of features.
In order to solve the adaptation, we look at the length of the perpendiculars d_ij from a given point P_j in x_2 to the corresponding line segment between the two points P_i and P_{i+1} in x_1. The coordinates of both databases are adjusted such that the quadratic sum of the coordinate corrections is minimized under the condition of eliminating the perpendiculars. The geometric dependencies are shown in Figure 1. The offset along the line segment of x_1 is called p_ij and is defined together with d_ij, as depicted in Equation 1, where i = 1, ..., n_d and j = 1, ..., n_p. The azimuth of the line segment P_i P_{i+1} also determines the direction of the perpendiculars d_ij.
Figure 1. Geometric dependency between two line features
During the LSA, P_j and P_i P_{i+1} are moved onto a common straight line. The scale of the adjustment depends on the number and distribution of the perpendiculars, as well as on the stochastic model of the coordinates.
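The two quantities d_ij and p_ij can be illustrated with a short sketch. The version below is an assumption about the geometry described in the text, not a reproduction of Equation 1: it uses plane coordinates (x, y), measures p_ij from P_i along the segment, and takes d_ij as positive when P_j lies to the left of the direction P_i to P_{i+1}. The function name is hypothetical.

```python
import math

def perpendicular_and_offset(p_i, p_i1, p_j):
    """Return (d, p, s) for point p_j relative to the segment p_i -> p_i1.

    d: signed perpendicular distance (assumed positive to the left),
    p: offset of the foot of the perpendicular measured from p_i,
    s: total length of the segment.
    """
    dx, dy = p_i1[0] - p_i[0], p_i1[1] - p_i[1]
    s = math.hypot(dx, dy)
    if s == 0.0:
        raise ValueError("degenerate segment: P_i and P_{i+1} coincide")
    ux, uy = dx / s, dy / s              # unit vector along the segment
    rx, ry = p_j[0] - p_i[0], p_j[1] - p_i[1]
    p = rx * ux + ry * uy                # projection onto the segment
    d = ux * ry - uy * rx                # 2D cross product: signed gap
    return d, p, s

# Hypothetical example: a horizontal reference segment and one subject vertex.
d, p, s = perpendicular_and_offset((0.0, 0.0), (10.0, 0.0), (3.0, 2.0))
print(d, p, s)
```

A value of p between 0 and s means that the foot of the perpendicular falls on the segment itself; values outside that range fall on its extension.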

Least Squares Adjustment
Based on the functional model for the perpendiculars d_ij (Equation 1), an LSA is defined based on conditional observations. The conflict between the two datasets, denoted w_x, is described by the value of the perpendiculars. The functional dependency between the coordinate corrections, denoted v_x, and the existing conflict w_x is described by matrix B_x, depicted in Equation 3; since this dependency is nonlinear, it has to be linearized. A stochastic model is derived from the covariance matrix of the point coordinates Σ_{ll,x}. The cofactor matrix of the point coordinates Q_{ll,x} is then built by eliminating the variance factor σ_0^2. Each row of B_x corresponds to one perpendicular link between the two databases. The matrices and vectors can be split into two parts, each containing only the functional dependency on the coordinates of x_1 or x_2. The algorithm requires that no correlations exist between the two datasets; therefore Q_{ll,x} can be split into a cofactor matrix for the coordinates of x_1, Q_{xx,1}, and a cofactor matrix for x_2, Q_{xx,2}. The linearized model derived from Equation 1 describes the relationships for the coordinates depicted in Equation 4. The solution v_x, depicted in Equation 5, is derived using the cofactor matrix of the conflicts Q_ww. The stochastic information of the adjusted coordinates is contained in the cofactor matrix of the adjusted observations, and is determined by variance propagation.
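The structure of such an adjustment with conditions can be sketched in a few lines. The code below is a minimal illustration and not the authors' implementation: it uses the textbook solution of a condition adjustment, Q_ww = B Q B^T and v = -Q B^T Q_ww^{-1} w, and it assumes a diagonal cofactor matrix and an already linearized matrix B. All function and variable names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[r]] for r, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < 1e-12:
            raise ValueError("Q_ww is singular: dependent condition rows")
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def condition_adjustment(B, q_diag, w):
    """Corrections v minimizing v^T Q^-1 v subject to B v + w = 0.

    Assumption: Q is diagonal and given by its diagonal q_diag.
    """
    n_c, n_u = len(B), len(q_diag)
    # Cofactor matrix of the conflicts: Q_ww = B Q B^T.
    q_ww = [[sum(B[r][u] * q_diag[u] * B[c][u] for u in range(n_u))
             for c in range(n_c)] for r in range(n_c)]
    k = solve_linear(q_ww, w)            # k = Q_ww^-1 w
    # v = -Q B^T k
    return [-q_diag[u] * sum(B[r][u] * k[r] for r in range(n_c))
            for u in range(n_u)]

# One perpendicular of length 2 onto a horizontal segment, foot at p/s = 0.3.
# Unknowns (hypothetical ordering): y_i, y_{i+1}, y_j.
B = [[-0.7, -0.3, 1.0]]
w = [2.0]
print(condition_adjustment(B, [1.0, 1.0, 1.0], w))   # equal weights
print(condition_adjustment(B, [0.0, 0.0, 1.0], w))   # reference held fixed
```

With equal cofactors both datasets move towards each other; setting the cofactors of the reference coordinates to zero moves only the subject vertex. A singular Q_ww corresponds to the dependent-row situation the text later describes for invalid assignments.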

Geometrical Assignments
The LSA minimizes the quadratic sum of the coordinate corrections under the condition of eliminating the perpendiculars d_ij. In order to get a valid solution, the assignments for the perpendiculars have to be chosen according to several criteria. First, the width of the target line segment for each perpendicular has to be defined via a buffer around every line segment. The buffer is governed by the maximum length of the perpendicular, d_c, and by the length p_c, which defines the maximum extension of a line segment; s_ij describes the total length of the line segment. The definition of these two thresholds leads to the conditions depicted in Equation 6. All perpendiculars that are not valid with respect to Equation 6 are not used in the adjustment process. The buffer for valid assignments is visualized in Figure 2, where the red d_ij depicts an invalid assignment that is not used. The thresholds p_c and d_c have to be adjusted to the task at hand.
Figure 2. Buffer (dashed blue lines) of valid areas for assignment
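The buffer test can be sketched as a simple predicate. The exact form of Equation 6 is not reproduced here; the sketch assumes that an assignment is valid when the absolute perpendicular length does not exceed d_c and the foot of the perpendicular lies on the segment or on its extension by at most p_c at either end. The function name is hypothetical.

```python
def is_valid_assignment(d, p, s, d_c, p_c):
    """Buffer test for one perpendicular (assumed reading of the thresholds).

    d: signed perpendicular length, p: offset of the foot from P_i,
    s: segment length, d_c: maximum perpendicular length,
    p_c: maximum permitted extension beyond either segment end.
    """
    return abs(d) <= d_c and -p_c <= p <= s + p_c

# Hypothetical thresholds: 5 m perpendicular length, 1 m segment extension.
print(is_valid_assignment(d=2.0, p=3.0, s=10.0, d_c=5.0, p_c=1.0))   # inside
print(is_valid_assignment(d=6.0, p=3.0, s=10.0, d_c=5.0, p_c=1.0))   # too far
print(is_valid_assignment(d=2.0, p=11.5, s=10.0, d_c=5.0, p_c=1.0))  # past end
```

Larger thresholds admit more candidate links at the cost of more ambiguous assignments that have to be filtered afterwards.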

Structural Adaptation
The quality of approximation for an object is given by the number and density of recorded points. A structure with a high density of points is always adjustable to a structure with a low density, but not vice versa; there is a bias towards the structure with the higher density of points, as it has more observations. This effect can be avoided by increasing the density through the interpolation of new points. For this process, every point of x_1 is projected via the perpendicular onto the corresponding line of x_2. A new point is interpolated and added to the line segment in x_2 if the perpendicular length d_ij and the distance to the next point on the line segment, p_ji, are valid with respect to the conditions depicted in Equation 7. The position of the new point on the corresponding line segment is given by the ratio p_ji / s_ji. An example of such a situation is depicted in Figure 2, shown as the green dashed d_{i+1,j+1}.
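The densification step can be sketched as follows. Equation 7 is not reproduced here; the sketch assumes that a new vertex is created only when the perpendicular is shorter than d_c and its foot lies at least a minimum distance away from both existing end points of the segment. The parameter min_gap and the function name are hypothetical.

```python
import math

def interpolate_vertex(q_a, q_b, point, d_c, min_gap):
    """Return the new vertex on segment q_a -> q_b for `point`, or None.

    Assumed validity test: |d| <= d_c and min_gap <= p <= s - min_gap.
    """
    dx, dy = q_b[0] - q_a[0], q_b[1] - q_a[1]
    s = math.hypot(dx, dy)
    if s == 0.0:
        return None
    rx, ry = point[0] - q_a[0], point[1] - q_a[1]
    p = (rx * dx + ry * dy) / s          # offset of the foot along the segment
    d = (dx * ry - dy * rx) / s          # signed perpendicular length
    if abs(d) > d_c or p < min_gap or p > s - min_gap:
        return None
    t = p / s                            # position given by the ratio p / s
    return (q_a[0] + t * dx, q_a[1] + t * dy)

# Hypothetical example: the first vertex creates a new point, the second is
# too close to an existing end point of the segment.
print(interpolate_vertex((0.0, 0.0), (10.0, 0.0), (4.0, 1.0), 2.0, 1.0))
print(interpolate_vertex((0.0, 0.0), (10.0, 0.0), (0.5, 1.0), 2.0, 1.0))
```

Keeping a minimum gap avoids creating near-duplicate vertices, which would otherwise produce nearly dependent conditions in the adjustment.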

Invalid Assignments
If a point P_j can be assigned to more than one line segment, all valid combinations of line segments P_i P_{i+1} share one common point P_i. To find the valid combination of assignments, one has to identify the P_i that appears most often. The valid assignments are the ones whose corresponding line segments contain the identified common point P_i. All other assignments are disregarded in the adjustment process, as depicted in Equation 8.
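The selection of the common point can be sketched as a simple vote. The representation of candidate segments as pairs of vertex indices and the function name are assumptions made for illustration; Equation 8 itself is not reproduced.

```python
from collections import Counter

def keep_segments_with_common_vertex(candidates):
    """Keep the candidate segments that contain the most frequent vertex.

    candidates: list of (i, i_next) vertex-index pairs, all of them
    candidate line segments for the same point P_j.
    """
    if not candidates:
        return []
    counts = Counter(v for segment in candidates for v in segment)
    common_vertex, _ = counts.most_common(1)[0]
    return [segment for segment in candidates if common_vertex in segment]

# Hypothetical example: vertex 4 is shared by two candidates, so the
# unrelated segment (8, 9) is dropped.
print(keep_segments_with_common_vertex([(3, 4), (4, 5), (8, 9)]))
```

When several vertices are equally frequent, this sketch simply keeps the one encountered first; a production implementation would need an explicit tie-breaking rule.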
After the definition of valid assignments, cases may occur where a point P_j still has several assignments, especially at junctions, as in the example depicted in Figure 3. As the perpendiculars are calculated to all possible corresponding line segments, the combination in which two perpendiculars are closest to perpendicular to each other has to be identified. After this process, there is a maximum of two assignments available for each point P_j. In the case of two possible assignments, the intersection angle of the two perpendiculars d_ij and d_{i+1,j} has to be checked for rectangularity: given a threshold that describes the permitted deviation of the intersection angle from a right angle, the intersection angle has to lie within the range given by the criterion of Equation 9. If this criterion is not fulfilled, one of the assignments has to be eliminated from the adjustment process (as in the case of parallel or nearly parallel line segments).
In addition, cases may occur where a vertex breaks into two linear features. As connections to line extensions are valid (Equation 6), two connections from the same vertex are built to nearly the same point position, leading to a singularity problem in the adjustment model (the B_x matrix has two dependent rows). In this case, only the connection that falls on the line segment, and not on its extension, is regarded as valid.
Figure 3. Example of several assignments for point P_j
Additionally, cases have to be considered where one point P_j has connections to line segments meeting at a sharp angle. If these connections are introduced into the adjustment model, the solution minimizing the quadratic sum of the coordinate corrections under the condition of eliminating d_ij will move P_j to the intersection point of the two line segments. This leads to an undesired 'out-of-bounds' shift of the point (Figure 4, top). A different case occurs when the corresponding line segments are parallel. The only possible solution that fulfills the condition of eliminating the d_ij is to move the reference and let both line segments collapse onto point P_j (Figure 4, bottom). As these solutions are not valid, the bad correspondences have to be removed from the adjustment model. The candidates are examined with respect to the line segment that point P_j is a part of (e.g., connectivity to points P_{j-1} and P_{j+1}), in order to choose the one that has the best topological and geometrical similarity. A similar process is carried out in case the reference line segments are not parallel but still have a joint vertex that is out-of-bounds (these cases are identified when d_ij << s_ij). Statistical measures are incorporated in the process to identify ambiguous correspondences between features in case more than one reference exists for a specific subject. The aim is to identify the most likely corresponding reference feature, and thus to validate which line segment to use, i.e., which correspondence to put into the adjustment solution. This is based on the existing (1:1) feature correspondences of the subject features, which help evaluate the ambiguous ones (1:n).
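The angular checks used in this section (rectangularity of two perpendiculars, and the detection of parallel or sharply intersecting segments) can be sketched as follows. Since each perpendicular is rotated by 90 degrees relative to its line segment, the angle between two perpendiculars equals the angle between the two segments, so segment azimuths are used directly. The threshold value, the use of radians and all names are assumptions; Equation 9 is not reproduced.

```python
import math
from itertools import combinations

def deviation_from_right_angle(az_a, az_b):
    """Deviation (radians) of the intersection angle of two directions
    from 90 degrees; 0 means perpendicular, pi/2 means parallel."""
    diff = abs(az_a - az_b) % math.pi
    acute = min(diff, math.pi - diff)    # fold into [0, pi/2]
    return math.pi / 2 - acute

def passes_rectangularity(az_a, az_b, max_deviation):
    """True if the two directions are close enough to perpendicular."""
    return deviation_from_right_angle(az_a, az_b) <= max_deviation

def most_rectangular_pair(azimuths):
    """Indices of the two candidate directions closest to perpendicular."""
    return min(
        combinations(range(len(azimuths)), 2),
        key=lambda ij: deviation_from_right_angle(azimuths[ij[0]],
                                                  azimuths[ij[1]]),
    )

# Hypothetical candidate segment azimuths (radians) at a junction.
candidates = [0.0, 0.2, 1.5]
print(most_rectangular_pair(candidates))
print(passes_rectangularity(0.0, 0.1, math.radians(30)))  # nearly parallel
```

Pairs that fail the test correspond to the sharp-angle and parallel configurations described above, for which one of the two assignments has to be dropped.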

Data
The algorithm was implemented and analyzed on several datasets. First, it was used to harmonize cadastral data of two neighbouring states (Figure 5). The geometric differences of the support points were between zero and several meters. The aim of this experiment was to fill the existing gaps and to remove overlaps. Second, it was tested on linear features (Figures 6 and 7) representing two databases: one is a road network derived from a topographic digital map designed for a scale of 1:25,000, while the other is a road network derived semi-automatically from analogue 1:2,500 cadastre databases. The aim was to align the cadastre database (subject) to the topographic one (reference). Third, cadastral and topographic data were aligned (Figures 8 and 9). The scale of the cadastral data was 1:5,000, and the topographic data was designed for a scale of 1:25,000. Discrepancies exist not only because of generalization but also due to different data collection methods. The aim of the alignment was to fit border lines with topographic objects.

Results
Figure 5 depicts an example from the first dataset obtained by implementing the proposed algorithm. The left image shows some common geometric discrepancies, such as gaps and broken continuity of features, that exist when trying to harmonize and align different vector databases. After applying the algorithm (right), the geometric discrepancies are reduced to zero and corresponding features (borders) are aligned. Not only are all gaps closed; owing to the constraints, corresponding line segments from different objects are also perfectly aligned.
Figure 8 depicts the alignment of subject polygon features from a cadastral database (dashed blue) with a topographic reference one (dashed green). While the previous experiments showed results where one dataset is the reference (high accuracy) and the other the subject (low accuracy), in this experiment both databases were considered to have the same accuracy, and thus both had the same weights in the process. The top image depicts the entire area, showing geometric discrepancies of up to 10 meters before the alignment. The bottom image depicts the alignment results after three iterations, presenting a complete alignment of corresponding features (the subject dataset overlays the reference perfectly). Table 1 shows that during the three iterations applied, the existing residuals (v^T v) are reduced for both databases as the process progresses, thus converging to the optimal mutual alignment solution.
Figure 9 depicts a subject dataset that is the result of a global affine transformation (translation and rotation only) carried out on the topographic reference dataset. The reference has a higher accuracy, such that the result should bring the subject dataset into alignment with the reference one. This enables a quantitative analysis of the proposed process, in which the transformation parameters are calculated from the magnitudes of the vector fields and compared to the ones used. The left image shows an extract before alignment: subject (dashed blue) and reference (dashed green) datasets superimposed with displacement values (vector fields as black arrows). The right image shows the final result. Displacements for all features after alignment are close to 0 meters. Small standard deviation values exist for the calculated parameters (on the order of +/-0.1 m in t_x and t_y); these are explained by the fact that the alignment model suggested here is local in nature, together with the fact that the adjustment equations identify point-to-line correspondences, as opposed to the point-to-point ones underlying the synthetic global transformation model used.
It is worth noting that the results are not affected when the number of features to be aligned is modified, i.e., when the data amount is increased or the data coverage is expanded. The geometric adjustment results do not differ since the strategy presented here exploits only local spatial topologic and geometric relationships that exist between corresponding features prior to the implementation of LSA, i.e., no global transformation or alignment is derived during the process.
Minor geometrical alterations might exist, but these are only local ones that have a restricted effect on the overall solution.
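The text does not state how the translation and rotation parameters were recovered from the displacement vectors. One common way, shown here purely as an assumed sketch, is a least-squares fit of a rigid transformation to the vertex positions before and after alignment. The function name and the sample coordinates are hypothetical.

```python
import math

def estimate_rigid_transform(src, dst):
    """Least-squares rotation angle and translation mapping src onto dst.

    src, dst: equally long lists of corresponding (x, y) points, e.g. the
    start and end points of the displacement vectors.
    Returns (angle_in_radians, t_x, t_y) such that dst ~ R(angle) src + t.
    """
    n = len(src)
    cx_s = sum(p[0] for p in src) / n
    cy_s = sum(p[1] for p in src) / n
    cx_d = sum(p[0] for p in dst) / n
    cy_d = sum(p[1] for p in dst) / n
    dot = cross = 0.0
    for (xs, ys), (xd, yd) in zip(src, dst):
        ax, ay = xs - cx_s, ys - cy_s    # centered source point
        bx, by = xd - cx_d, yd - cy_d    # centered target point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    angle = math.atan2(cross, dot)
    c, s = math.cos(angle), math.sin(angle)
    t_x = cx_d - (c * cx_s - s * cy_s)
    t_y = cy_d - (s * cx_s + c * cy_s)
    return angle, t_x, t_y

# Synthetic check: rotate and shift four points, then recover the parameters.
true_angle, true_t = 0.01, (2.0, -3.0)
before = [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0), (0.0, 5.0)]
after = [(math.cos(true_angle) * x - math.sin(true_angle) * y + true_t[0],
          math.sin(true_angle) * x + math.cos(true_angle) * y + true_t[1])
         for x, y in before]
print(estimate_rigid_transform(before, after))
```

Because the alignment itself works on point-to-line correspondences, displacement vectors derived from it need not coincide exactly with point-to-point displacements, which is consistent with the small parameter deviations reported above.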

CONCLUSIONS AND FUTURE WORK
A geometrical adjustment approach for the alignment and harmonization of vector databases was presented. This strategy exploits local spatial topologic and geometric relationships between corresponding line features prior to the implementation of LSA, and observes local distortions and ambiguities that might exist, in contrast to a global transformation and alignment, which cannot account for them. As depicted and analyzed in the examples, the outcome presents a significant improvement of the initial state by quantifying local geometric distortions and discrepancies. This suggests a qualitative and reliable solution to the problem of spatial inconsistency that is evident when comparing different vector databases.
Future work will entail adding more constraints to fine-tune problems that arise from a poor geometric constellation of the data, i.e., sparsely distributed points as well as excessively dense ones. Larger databases that have wider coverage areas will also be analyzed, which will mainly have an effect on the size of the normal equation system to be solved. In order to reduce the number of features involved, one possibility is to apply a hierarchical partitioning of the space, e.g., using the major road network as objects on the high level, and subsequently adjusting the features within each mesh of such a high-level network.
There are also cases where a mere geometric analysis of possible corresponding features is not successful, as several neighboring objects are possible. Therefore, in the next stage we will make use of supplementary data to 'enrich' the current geometric process, such as semantics, attributes and other topological characteristics, to better identify corresponding features for alignment.

Figure 5. Geometric alignment of cadastral databases along the border of two neighbouring states: before (left) and after (right)
Figure 6 depicts the alignment of linear features derived from a road network: subject in dashed blue and reference in dashed green. The left image depicts an extract of the data showing geometric discrepancies of up to 50 meters before the implementation. The right image depicts the alignment results, presenting an almost complete geometric overlay and consistency of corresponding features. The overall alignment result is robust and accurate. A detail from Figure 6 is depicted in Figure 7: before (left), during (middle, including vector fields as black arrows), and after (right) implementation. The vector fields are derived from the local geometric spatial displacements extracted. Perfect alignment of linear features is clear (right): the subject dataset overlays seamlessly with the reference one.

Figure 6. Alignment of road network databases with up to 50 meters of discrepancies: before (left) and after (right)
Table 2 depicts the parameter values used, the values calculated, and their differences; the extracted parameter values are very close to those used, hence a precise alignment of features is achieved.

Table 2. Affine transformation used and calculated via the process