MATCHING AERIAL IMAGES TO 3D BUILDING MODELS BASED ON CONTEXT-BASED GEOMETRIC HASHING

In this paper, we propose a new model-to-image framework that automatically aligns a single airborne image with existing 3D building models using geometric hashing. As a prerequisite for applications such as data fusion, object tracking, change detection and texture mapping, the proposed registration method determines accurate exterior orientation parameters (EOPs) of a single image. The model-to-image matching process consists of three steps: 1) feature extraction, 2) similarity measure and matching, and 3) adjustment of the EOPs of the single image. For feature extraction, we propose two types of matching cues: edged corner points, which represent the saliency of building corner points with their associated edges, and contextual relations among the edged corner points within an individual roof. These matching features are extracted from both the 3D building models and the single airborne image. A set of matched corners is found with a given proximity measure through geometric hashing, and optimal matches are then determined by maximizing a matching cost that encodes the contextual similarity between matching candidates. The final matched corners are used to adjust the EOPs of the single airborne image by the least squares method based on the collinearity equations. The results show that the proposed registration approach can achieve acceptable accuracy for the EOPs of a single image, offering an alternative to the labour-intensive manual registration process.


INTRODUCTION
In recent years, a large number of mega cities have provided detailed building models that represent their static environment to support critical decisions in smart city applications. However, a city is a dynamic entity whose environment changes continuously, so its virtual models also need to be updated in a timely manner to support accurate model-based decisions. In this regard, a framework for continuous city modelling by integrating multiple data sources was discussed by Sohn et al. (2013). A first important step in facilitating this task is to coherently register remotely sensed data taken at different epochs with existing building models. A large research effort has been made to address problems related to image registration; comprehensive literature reviews can be found in Brown (1992) and Zitova and Flusser (2003). Fonseca and Manjunath (1996) also conducted a comparative study of different registration techniques using multisensor remotely sensed imagery. Although most registration methods show promising success in a controlled environment, Zitova and Flusser (2003) pointed out that registration is a challenging vision task due to the diverse nature of remote sensing data (resolution, accuracy, signal-to-noise ratio, spectral bands, scene complexity and occlusions). These variables, which affect registration performance, make it very difficult to generalize a registration method. Even though designing a universal method applicable to all registration tasks is almost impossible, the majority of existing registration methods consist of three typical steps: feature extraction, similarity measure and matching, and transformation (Brown, 1992; Habib et al., 2005). Thus, successful registration depends on establishing a proper strategy for each step.
Recently, advances in aerial image acquisition technology have made direct geo-referencing possible. Even though the EOPs obtained by direct geo-referencing provide sufficient accuracy for certain types of applications (coarse localization and visualization), they need to be further adjusted for the many applications that require engineering-grade accuracy, including continuous city modelling. Traditionally in photogrammetry, accurate EOPs are determined by a bundle adjustment procedure with known ground control points (GCPs). However, obtaining or surveying GCPs over a large-scale area is labour intensive and time-consuming. An alternative is to use known points instead of directly surveyed GCPs. Nowadays, large-scale 3D building models have been generated over the major cities of the world, so the corners of these existing building models can be used for this purpose. However, the quality of building models varies from building to building and is often unknown. In addition, the computational overhead of matching airborne imagery with large-scale building models must be considered in order to build an effective model-to-image matching pipeline.
To address these issues, we propose a new registration method between a single image and existing building models. We introduce a new feature, the edged corner feature, which consists of a corner and its arms. In addition to this single feature, a context feature is used to make the matching more robust. Our matching method is based on geometric hashing, a well-known indexing-based object recognition technique. However, because standard geometric hashing has its own limitations, we modify the method by introducing several constraints and the geometric properties of the context feature.

Related Works
Registration can be viewed as the process of finding correspondences between datasets by establishing a relation between them. Brown (1992) classified existing registration methods into area-based and feature-based methods according to their nature. The area-based approach uses image intensity values extracted from image patches and deals with images without attempting to detect salient objects. Correspondence can be determined within a sliding window of a specific size, or over the entire image, by correlation-like methods, Fourier methods, mutual information methods, and so forth. In contrast, feature-based methods use salient objects such as points, lines, and polygons to establish a relation between two different datasets. Feature-based methods generally consist of feature extraction, feature matching, and transformation. In model-to-image registration, most methods are feature-based because models have no texture information, whereas salient objects can be extracted from both the models and the image.
Point features such as line intersections, corners and centroids of regions can be easily extracted from models and images. Wunsch and Hirzinger (1996) used the iterative closest point (ICP) algorithm to register a model to 3D data. In a similar way, Avbelj et al. (2010) used point features to align a 3D wire-frame building model with infrared video sequences using a subsequent closeness-based matching algorithm. However, Frueh et al. (2004) pointed out that point features extracted from an image cause false correspondences due to a large number of outliers.
As building models and man-made objects are mainly described by linear structures, many researchers have used lines or line segments as features for registration. Hsu et al. (2000) used line features to estimate the 3D pose of a video, where the coarse pose was refined by aligning a projected 3D model of line segments to oriented image gradient energy pyramids. Frueh et al. (2004) proposed a model-to-image registration method for texture mapping of 3D models with oblique aerial images using line segments as features; correspondence between line segments was computed by a rating function consisting of slope and proximity. Eugster and Nebiker (2009) also used line features for real-time geo-registration of video streams from unmanned aircraft systems (UAS). They applied relational matching, which considers not only the agreement between an image feature and a model feature but also the relations between features. However, Tian et al. (2008) pointed out several reasons why using lines or edge segments for registration is difficult. First, edges or lines are extracted incompletely and inaccurately, so that an ideal edge might be broken into two or more small segments. Second, there is no strong disambiguating geometric constraint.
Building models, on the other hand, are reconstructed with certain regularities such as orthogonality and parallelism, and utilizing prior knowledge of building structures can reduce matching ambiguities and the search space. Ding et al. (2008) used 2D orthogonal corners (2DOC) as features to recover the camera pose for texture mapping of 3D building models. Correspondences between image 2DOCs and DSM 2DOCs were determined using the Hough transform and generalized M-estimator sample consensus. Wang and Neumann (2009) pointed out that 2DOC features are not very distinctive because the feature is described only by an orthogonal angle. Instead of 2DOC, they proposed 3 connected segments (3CS) as a more distinctive and repeatable feature. For putative feature matches, they applied a two-level RANSAC, consisting of a local and a global RANSAC, for robust matching.

REGISTRATION METHOD
To register a single image with existing 3D building models, edged corner features are extracted from both datasets and their corresponding matches are computed by an enhanced geometric hashing method. The EOPs of the image are then adjusted based on the established feature correspondences and updated through an iterative process. Figure 1 illustrates the outline of our approach.

Feature Extraction
Feature extraction is the first step of the registration task. The selection of salient features should consider the properties of the datasets used, the application, the required registration accuracy, and so forth. In our study, we use a corner and its arms as a single feature because it can be detected and distinguished in both the image and the building model and carries structural information about a building object. In the building model, it is straightforward to extract edged corner features because each vertex of a building polygon can be regarded as a corner. In the image, which has rich texture information, various corner detectors and line detectors can be used to extract the feature. In addition, context features are used to achieve more accurate and robust matching by adding relative geometric information between edged corner features. In this section, we explain the extraction of edged corner features from a single image and the properties of context features.

Edged Corner Feature Extraction from Image
Edged corner features are extracted from a single image in three separate steps: 1) extraction of straight lines, 2) extraction of edged corner points, and 3) verification of the extracted features.
The process starts with the extraction of straight lines from the image by applying a straight line detector. In this study, we used Kovesi's algorithm, which relies on the calculation of phase congruency to localize and link edges (Kovesi, 2011).
Corners are extracted by finding the intersections of the extracted straight lines that satisfy a proximity constraint with a given distance threshold ($T_d$ = 20 pixels). Corner arms have a fixed length (20 pixels), and their directions are determined by the two straight lines used. This process may produce incorrect features because only the proximity constraint is considered. Thus, a verification process removes incorrectly extracted features based on geometric and radiometric constraints. As the geometric constraint, the inner angle between the two corner arms is calculated, and features with a very acute inner angle are filtered out using an inner angle threshold ($T_\theta$ = 10°). This is because buildings are constructed according to certain geometric regularities (e.g., orthogonality and parallelism), so small acute angles are uncommon. For the radiometric constraint, we analyze the radiometric values (digital number (DN) or colour values) of the left and right flanking regions of the corner arms with flanking width $\epsilon$, as used in Ok et al. (2012). Figure 2(a) shows the configuration of an edged corner feature and the concept of flanking regions. In a correctly extracted corner, the average DN (or colour) difference between $F_1^L$ and its neighbouring flanking region is likely to be small, reflecting the homogeneity of the two regions, while the average DN difference between the opposite flanking regions $F_1^L$ and $F_2^L$ should be large, reflecting the heterogeneity of the two regions. Thus, we measure two radiometric properties: the minimum DN difference of two neighbouring flanking regions as the homogeneity measure, $D_{homo}^{min}$, and the maximum DN difference of two opposite flanking regions as the heterogeneity measure, $D_{hetero}^{max}$.
A corner is therefore accepted as an edged corner feature if its $D_{homo}^{min}$ is smaller than a threshold $T_{homo}$ and its $D_{hetero}^{max}$ is larger than a threshold $T_{hetero}$. To determine the thresholds for the two radiometric properties, we assume that the intersection points are generated from both correct and incorrect corners and that the two types of intersection points have different distributions of the radiometric properties. Because the DN difference values thus fall into two classes (correct corner and incorrect corner), Otsu's binarization method (Otsu, 1979) can be used to determine an appropriate threshold automatically. The method was originally designed to extract an object from its background for binary image segmentation based on the histogram distribution. It calculates the optimum threshold separating the two classes (foreground and background) so that their intra-class variance is minimal. In our study, a histogram of the homogeneity values (or heterogeneity values) of all intersection points is generated, and the optimal threshold for homogeneity (or heterogeneity) is then determined automatically by Otsu's binarization method.
Because the context feature is invariant under scale, translation, and rotation, it provides several advantages in the matching process.
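The automatic threshold selection by Otsu's method can be illustrated with a minimal sketch. It assumes the homogeneity (or heterogeneity) values of all intersection points are available as a one-dimensional array; the function name `otsu_threshold` and the number of histogram bins are illustrative choices, not part of the original method description. The sketch maximizes the between-class variance, which is equivalent to minimizing the intra-class variance.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Return a threshold splitting `values` into two classes (Otsu, 1979).

    Hypothetical helper: `values` would be the D_homo (or D_hetero)
    measurements of all intersection points.
    """
    hist, edges = np.histogram(np.asarray(values, dtype=float), bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    p = hist / hist.sum()                 # normalized histogram
    w0 = np.cumsum(p)                     # probability of the lower class
    w1 = 1.0 - w0                         # probability of the upper class
    mu = np.cumsum(p * centers)           # cumulative first moment
    mu_total = mu[-1]
    valid = (w0 > 0) & (w1 > 0)           # skip cuts that leave a class empty
    between = np.zeros_like(w0)
    between[valid] = (mu_total * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    k = int(np.argmax(between))           # cut after bin k
    return edges[k + 1]

# Two well-separated groups of DN differences (synthetic values).
samples = np.r_[np.full(50, 10.0), np.full(50, 100.0)]
threshold = otsu_threshold(samples)
```

In the setting of this paper, the function would be applied once to the homogeneity values and once to the heterogeneity values to obtain $T_{homo}$ and $T_{hetero}$.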

Similarity Measure and Primitives Matching
Similarity measurement and matching take place in the image space after the existing 3D building models are back-projected into the image space using the collinearity equations with the initial EOPs (or the updated EOPs). In order to find reliable and accurate correspondences between edged corner features extracted from the single image and from the building models, we propose an enhanced geometric hashing method in which the vote counting scheme of standard geometric hashing is supplemented by a newly developed similarity score function.
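As a minimal sketch of the back-projection step, the collinearity equations can be written as follows. The rotation matrix `R` is assumed to rotate object-space vectors into the camera frame, and the function name, argument layout and sign convention are assumptions made for illustration; the convention used in the actual implementation may differ.

```python
import numpy as np

def back_project(points_xyz, camera_xyz, R, focal, principal=(0.0, 0.0)):
    """Project 3D object points into image coordinates (collinearity equations).

    points_xyz : (N, 3) object coordinates of model vertices
    camera_xyz : (3,) perspective centre (X0, Y0, Z0)
    R          : (3, 3) rotation from object space to camera space (assumed)
    focal      : focal length, in the same unit as the image coordinates
    """
    pts = np.asarray(points_xyz, dtype=float)
    d = (pts - np.asarray(camera_xyz, dtype=float)) @ np.asarray(R, dtype=float).T
    x = principal[0] - focal * d[:, 0] / d[:, 2]
    y = principal[1] - focal * d[:, 1] / d[:, 2]
    return np.column_stack([x, y])

# Nadir-looking camera 100 m above the origin, identity rotation.
image_xy = back_project([[10.0, 0.0, 0.0]], [0.0, 0.0, 100.0], np.eye(3), focal=0.1)
```

With the six EOPs (three position and three rotation parameters) updated after each iteration, the same function would be called again to re-project the model vertices.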

Geometric Hashing
Geometric hashing is a model-based object recognition technique for retrieving objects in scenes from a constructed database (Wolfson and Rigoutsos, 1997). In geometric hashing, an object is represented as a set of geometric features, such as points and lines, and by geometric relations among them that are invariant under a certain transformation. Since only local invariant geometric features are used, geometric hashing can handle partly occluded objects. Geometric hashing consists of two main stages: pre-processing and recognition. The pre-processing stage encodes the representation of the objects in a database and stores it in a hash table.

hashing table entries with base pair, (c) all hashing table entries with all base pairs
In the second stage, recognition, the invariants derived from geometric features in a scene are used as indexing keys to access the previously constructed hash table for matching with the stored models. In a similar way to the pre-processing stage, two points from the set of points in the scene are selected as a base pair. The remaining points are mapped to the hash table, and all entries in the corresponding hash table bin receive a vote.
Correspondences are determined by a vote counting scheme, which produces candidate matches. Although geometric hashing can solve matching problems for rotated, translated and partly occluded objects, it has some limitations. First, the method is sensitive to the bin size used to quantize the hash table: a large bin size cannot separate two close points, while a small bin size cannot tolerate position errors of the points. Second, geometric hashing can produce redundant solutions because it is based on a vote counting scheme (Wolfson and Rigoutsos, 1997). Although it can significantly reduce the number of candidate hypotheses, a verification step or an additional fine matching step is required to find optimal matches. Third, geometric hashing is weak in cases where the scene contains many features of similar shape at different scales and rotations. Without constraints (e.g., on position, scale and rotation) based on prior knowledge about the object, geometric hashing may produce incorrect matches due to matching ambiguity. Fourth, the processing complexity increases with the number of base pairs and the number of features in the scene (Lamdan and Wolfson, 1988). To address these limitations, we enhance standard geometric hashing by changing the vote counting scheme and by adding several constraints, such as the scale difference of a base pair and a specific selection of bases.
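The two stages of standard geometric hashing can be sketched as follows. This is a simplified illustration for 2D points under similarity transformations, not the enhanced method proposed in this paper; the function names, the dictionary-based hash table and the bin size of 0.1 are assumptions made for the example. The reference frame places the base pair at unit length, with its midpoint at the origin and its direction along the x axis.

```python
import math
from collections import defaultdict
from itertools import permutations

def to_base_frame(p, a, b):
    """Coordinates of p in the frame defined by the ordered base pair (a, b)."""
    mx, my = (a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0
    dx, dy = b[0] - a[0], b[1] - a[1]
    s = math.hypot(dx, dy)                  # base length becomes the unit
    ux, uy = dx / s, dy / s                 # x axis of the reference frame
    px, py = p[0] - mx, p[1] - my
    return ((px * ux + py * uy) / s, (-px * uy + py * ux) / s)

def quantize(q, bin_size):
    return (round(q[0] / bin_size), round(q[1] / bin_size))

def build_table(models, bin_size=0.1):
    """Pre-processing: store (model id, base i, base j) under each hashed location."""
    table = defaultdict(list)
    for model_id, pts in models.items():
        for i, j in permutations(range(len(pts)), 2):
            for k, p in enumerate(pts):
                if k not in (i, j):
                    key = quantize(to_base_frame(p, pts[i], pts[j]), bin_size)
                    table[key].append((model_id, i, j))
    return table

def vote(table, scene, base, bin_size=0.1):
    """Recognition: count votes for every (model, base pair) entry."""
    a, b = scene[base[0]], scene[base[1]]
    votes = defaultdict(int)
    for k, p in enumerate(scene):
        if k not in base:
            for entry in table.get(quantize(to_base_frame(p, a, b), bin_size), []):
                votes[entry] += 1
    return votes

models = {"square": [(0, 0), (2, 0), (2, 2), (0, 2)]}
# The same square rotated by 90 degrees, scaled by 3 and shifted by (5, 5).
scene = [(5, 5), (5, 11), (-1, 11), (-1, 5)]
table = build_table(models)
votes = vote(table, scene, base=(0, 1))
```

Because the square is symmetric, several model base pairs receive the same number of votes here, which illustrates the redundancy of pure vote counting noted above.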

Enhanced Geometric Hashing
In our study, we describe the building model objects and the scene by sets of edged corner features. Edged corner features derived from the existing building models are used to construct the hash table in the pre-processing stage, while edged corner features derived from the single image are used in the recognition stage. Each building model in the reference data consists of several planes. Thus, in the pre-processing stage, we select two edged corner features that belong to the same plane of a building model as a base pair. This reduces the complexity of the hashing table and ensures that the base pair retains the spatial information of the plane. The selected base pair is scaled, rotated, and translated to define the reference frame, and the remaining corner points are transformed together with the base pair. In contrast to standard geometric hashing, our hashing table contains model IDs, feature IDs of the base pair, the scale of the base pair (the ratio of the real distance of the base pair), an index for the member edged corner features, and context features generated by combinations of edged corner features. Figure 4(b) shows an example of the information stored in the hashing table.
Once all entries with possible base pairs are set, the recognition stage tries to retrieve corresponding features based on the designed score function. To reduce the search space, two corner points from the image are selected as a base pair under two constraints: 1) a scale constraint and 2) a position constraint. For the scale constraint, we assume that the scales of the base pairs from the model and from the image are similar because the initial EOPs provide an approximate scale of the image. Thus, a base pair from the image is filtered out if the scale ratio between the base pairs from the image and from the model is smaller than a user-defined threshold. In addition to the scale constraint, the possible positions of the end points of a base pair can be restricted to a proper search space, which is determined by error propagation with the assumed errors (calculated by the iterative process) of the initial EOPs (or updated EOPs) of the image and of the models.
The newly designed score function consists of a unary term, which measures the position differences of the matched points, and a contextual term, which measures the length and angle differences of corresponding context features. The score function includes an indicator function that enforces the minimum number of features to be matched, determined by $T_c$ ($T_c$ = 0.5, i.e., at least 50% of the corners in the model should be matched with corners from the image), so that not all features of the model need to be detected in the image; $n$ and $m$ are the numbers of matched edged corner features and context features, respectively; and $w$ is a weight that balances the unary term and the contextual term ($w$ = 0.5).
Unary term: The unary term measures the position distance between edged corner features derived from the model and from the image in the reference frame. The position difference between an edged corner feature in the model and its corresponding feature in the image is normalized by the distance $N_i^P$ calculated by error propagation.
Contextual term: This term is designed to capture the relationship between neighbouring features (that is, the context feature) in terms of length and four angles. The contextual term is calculated for all context features generated from matched edged corner features. For the length difference, the difference between the lengths of a context feature in the model and in the image is normalized by the length $N_{ij}^L$ of the context feature in the model. For the angle differences, the difference between the corresponding inner angles of a context feature in the model and in the image is also normalized.
For each model, the base pair and its corresponding corners that maximize the score function are selected as the optimal matches. Note that if the maximum score is smaller than a certain threshold ($T_m$ = 0.6), the matches are not accepted as matched corners. Once all correspondences are determined, the EOPs of the image are adjusted through space resection using pairs of object coordinates from the existing building models and the newly derived image coordinates from the matching process.
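The way the indicator, the unary term and the contextual term could interact is sketched below. The exact functional form of the score is not reproduced in this text, so the form used here (a weighted sum of averaged similarities, each computed as one minus a normalized difference and clipped at zero) is an assumption for illustration only, as are the function and argument names.

```python
def similarity(diff, norm):
    """Map a normalized difference to a similarity in [0, 1] (assumed form)."""
    return max(0.0, 1.0 - abs(diff) / norm)

def match_score(pos_diffs, pos_norms, ctx_diffs, ctx_norms,
                n_model, w=0.5, t_c=0.5):
    """Score one (model, base pair) hypothesis.

    pos_diffs / pos_norms : position differences of the n matched corners and
                            their normalizers (N_i^P in the text)
    ctx_diffs / ctx_norms : length or angle differences of the m matched
                            context features and their normalizers
    n_model               : number of edged corner features in the model
    """
    n = len(pos_diffs)
    # Indicator: require at least a fraction t_c of the model corners to match.
    if n_model == 0 or n / n_model < t_c:
        return 0.0
    unary = sum(similarity(d, s) for d, s in zip(pos_diffs, pos_norms)) / n
    m = len(ctx_diffs)
    contextual = (sum(similarity(d, s) for d, s in zip(ctx_diffs, ctx_norms)) / m
                  if m else 0.0)
    return w * unary + (1.0 - w) * contextual

perfect = match_score([0.0, 0.0], [5.0, 5.0], [0.0], [10.0], n_model=4)
too_few = match_score([0.0], [5.0], [], [], n_model=4)
halfway = match_score([2.5, 2.5], [5.0, 5.0], [5.0], [10.0], n_model=2)
```

Under this assumed form, a hypothesis would be accepted only if its score reaches $T_m$ = 0.6, so the `halfway` case above would be rejected.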

EXPERIMENTAL RESULTS
The proposed registration method was tested on the Toronto downtown dataset provided by ISPRS Commission III, WG3/4 (Rottensteiner et al., 2012). Both the reference building models manually digitized by a human operator and the LiDAR-driven building models reconstructed by Sohn et al. (2012) were used in turn as the existing building models, in order to investigate the effect of modelling errors in the models used. The image used for the test covers most of the existing building models (Figure 6(a)). A total of 16 check points, well distributed over the image, were used to evaluate the accuracy of the EOPs.
From the image, a total of 90,951 straight lines were extracted, and 258,486 intersection points were then obtained by intersecting any two straight lines under the proximity constraint (20 pixels). Of these, 57,767 intersection points were selected as edged corner features; approximately 15% and 60% of the intersection points were removed by the geometric constraint ($T_\theta$ = 10°) and the radiometric constraint ($T_{homo}$ = 26 and $T_{hetero}$ = 55, determined by Otsu's binarization method), respectively (Table 1). After the existing building models were back-projected to the image using the error-contained EOPs, edged corner features were extracted from the vertices of the building models in the image space. As shown in Figure 6, some edged corner features extracted from both sets of existing building models were not observed in the image due to occlusions caused by neighbouring building planes. Also, some edged corner features extracted from the LiDAR-driven building models did not match edged corner features derived from the image because of modelling errors. Thus, correspondences between features from the image and from the existing building models are likely to be only partly established.
The proposed geometric hashing method was applied to find correspondences between features derived from the image and from the existing building models. When the manually digitized building models were used as the existing building models, a total of 693 edged corner features were matched, whereas a total of 381 edged corner features were matched for the LiDAR-driven building models (Table 1). The number of matched features is thus affected by the quality of the existing building models used. Based on the matched features, the EOPs of the image were calculated by applying the least squares method.
For qualitative assessment, the existing models were back-projected to the image with the refined EOPs. Figure 7 and Figure 8 show the back-projected building models with the error-contained EOPs (a) and with the refined EOPs (c). In the figures, the boundaries of the existing building models match the building boundaries in the image well when the refined EOPs are used. For quantitative evaluation, we computed the RMSE of the check points back-projected to the image space with the refined EOPs (Table 2). When $T_m$ = 0.6, the result with the manually digitized building models shows average differences in the x and y directions of -0.27 and 0.33 pixels, with RMSE of ±0.68 and ±0.71 pixels, respectively. The result with the LiDAR-driven building models shows average differences in the x and y directions of -1.03 and 1.93 pixels, with RMSE of ±0.95 and ±0.89 pixels, respectively. Even when the LiDAR-driven building models are used, the accuracy at the check points is better than 2 pixels. Considering that one pixel corresponds to approximately 15 cm in ground sample distance (GSD), that is, less than about 30 cm on the ground, the refined EOPs provide improved accuracy for engineering applications.
Across the tested values of $T_m$ (Table 3), the number of matched features increases as $T_m$ decreases. Interestingly, the effect of $T_m$ on accuracy depends on the quality of the building models used. When accurately digitized building models are used, the accuracy remains consistently good regardless of $T_m$. However, the results with the LiDAR-driven building models show that the accuracy gets worse as $T_m$ decreases. Also, when a high value is assigned to $T_m$, the number of matched features is too small to recover accurate EOPs of the image. Therefore, the results indicate that better-quality existing building models lead to better accuracy of the EOPs.
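The check-point evaluation can be sketched as follows, assuming the measured image coordinates of the check points and their back-projected positions are available as lists of (x, y) pairs in pixels. The function name and the sample coordinates are hypothetical and do not reproduce the values in Table 2.

```python
import math

def check_point_stats(measured, projected):
    """Mean difference and RMSE per axis between projected and measured points."""
    dx = [p[0] - m[0] for m, p in zip(measured, projected)]
    dy = [p[1] - m[1] for m, p in zip(measured, projected)]
    count = len(dx)
    mean = (sum(dx) / count, sum(dy) / count)
    rmse = (math.sqrt(sum(v * v for v in dx) / count),
            math.sqrt(sum(v * v for v in dy) / count))
    return mean, rmse

# Synthetic example with two check points.
measured = [(100.0, 200.0), (300.0, 400.0)]
projected = [(101.0, 202.0), (301.0, 398.0)]
mean_diff, rmse = check_point_stats(measured, projected)
```

Multiplying the pixel values by the GSD (approximately 0.15 m here) converts them to an approximate ground distance.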

CONCLUSIONS
In this study, we proposed a model-to-image registration method that aligns a single image with existing 3D building models. Two types of matching cues, the edged corner feature and the context feature, were proposed for robust registration.
From the image, edged corner features are extracted by calculating the intersections of neighbouring straight lines and are then verified using geometric and radiometric properties. For similarity measurement and matching, an enhanced geometric hashing method was proposed to compensate for the limitations of standard geometric hashing. The qualitative assessment shows that the boundaries of the existing building models were aligned with the building boundaries in the image when the refined EOPs were used. The quantitative assessment shows that existing building models can be used to determine the EOPs of an image with acceptable and reliable accuracy. An analysis of the effect of the threshold value was also conducted. The results show that the more accurate the building models used, the more reliable the accuracy that can be achieved. In future work, we will conduct further analyses to confirm the performance of the proposed method in various respects.

Figure 1. Flowchart of the proposed model-to-image registration method
Figure 2. (a) Edged corner feature and flanking regions, (b) context feature

Context Feature
While an edged corner feature provides only local structure information about a building corner, context features partly impart global structure information about the configuration of the building object. A context feature is formed by selecting any two adjacent edged corner features and is described by the line connecting the two corners and four angles, as shown in Figure 2(b). Note that each angle is measured relative to the line connecting the two corners ($l$).
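A context feature of this kind can be computed with a short sketch. It assumes each edged corner is given by its position and the direction vectors of its two arms, and it measures each of the four angles between an arm and the connecting line $l$ as seen from that arm's own corner; the function names and this particular angle convention are assumptions for illustration.

```python
import math

def angle_between(u, v):
    """Unsigned angle in degrees between two 2D vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    cross = u[0] * v[1] - u[1] * v[0]
    return math.degrees(math.atan2(abs(cross), dot))

def context_feature(corner_a, arms_a, corner_b, arms_b):
    """Length of the connecting line l and the four arm-to-line angles."""
    line = (corner_b[0] - corner_a[0], corner_b[1] - corner_a[1])
    line_rev = (-line[0], -line[1])        # l as seen from corner_b
    length = math.hypot(line[0], line[1])
    angles = ([angle_between(arm, line) for arm in arms_a] +
              [angle_between(arm, line_rev) for arm in arms_b])
    return length, angles

# Two corners of a rectangular roof edge.
length, angles = context_feature((0, 0), [(1, 0), (0, 1)],
                                 (10, 0), [(-1, 0), (0, 1)])
# The same configuration rotated by 90 degrees and scaled by 2.
length2, angles2 = context_feature((0, 0), [(0, 1), (-1, 0)],
                                   (0, 20), [(0, -1), (-1, 0)])
```

The angles are unchanged by the rotation and scaling, and the length scales with the image, which is why the length is compared after the base pair has been normalized in the reference frame.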
Given a set of object points, a pair of points $p_i$ and $p_j$ is selected as a base pair (Figure 3(a)). The base pair is scaled, rotated, and translated into the reference frame. In the reference frame, the magnitude of the base pair equals 1, the midpoint between $p_i$ and $p_j$ is placed at the origin, and the vector $\overrightarrow{p_i p_j}$ coincides with the unit vector of the x axis. The remaining points of the model are located in the coordinate frame defined by the corresponding base pair (Figure 3(b)). The locations (to be used as an index) are recorded in the form (model, used base pair ID, corner index (x and y coordinates in the reference frame)) in the hash table, which is quantized with a proper bin size. For all possible base pairs, all entries of corner points are recorded in the hash table in the same way.

Figure 3. Geometric hashing
Figure 4. (a) The model points, (b) information to be stored in the hashing table (dotted lines represent context features to be stored in the hashing table)

After a possible base pair is selected from the image, the remaining points in the image are transformed based on the selected base pair. Optimal matches are then determined by comparing a similarity score that combines the vote counting scheme with the geometric properties of context features. The process starts by generating context features from the model and the image in the reference frame. Given a model consisting of five edged corner features (black), ten context features can be generated, as shown in Figure 5. Note that not all corners are matched with corners from the image (red). Thus, only the matched corners and their corresponding context features (the 6 long-dashed context features in Figure 5) are used to calculate the similarity score function.

Figure 5. Context features to be used for calculating the score function
Figure 6. Feature extraction: (a) image and existing building models, (b) image lines (black) and edged corner features (blue) from the image, (c) back-projected model (red) and edged corner features (cyan) from manually digitized models and (d) LiDAR-driven models.

Figure 7. Results with manually digitized building models: (a) with error-contained EOPs, (b) matching relations (magenta) between edged corner features extracted from the image (blue) and the models (cyan) and (c) with refined EOPs

In this study, the threshold $T_m$ affects the accuracy of the EOPs. To evaluate this effect, we measured the RMSE of the check points with different values of $T_m$ (Table 3).

Table 1. Extracted features and matched features

Table 2. Quantitative assessment with check points (unit: pixel)