ORIENTATION BASED BUILDING OUTLINE EXTRACTION IN AERIAL IMAGES

Abstract: The goal of this paper is to automatically extract building contours regardless of their shape. Extracting these contours makes detection results more accurate and provides useful information about urban areas, which is important for many tasks, such as map updating and disaster management. First, we extract local feature points from the image using a modification of the saliency function of the Harris detector, which represents urban areas and buildings effectively. This point set is then used to define the main orientation of the buildings, which characterizes an urban region well and defines the directions in which object contours have to be searched. Second, we apply a shearlet-based approach to extract edges in the defined directions. This results in an edge map, which helps us determine the point subsets belonging to the same building. The convex hulls of the point subsets are used for contour initialization, and then the region-based Chan-Vese active contour method is applied to extract accurate building outlines.


INTRODUCTION
Automatic evaluation of aerial photographs is a very important research topic, as manual analysis is very time consuming. Nowadays, there are many approaches for multispectral or synthetic aperture radar (SAR) images, but developing methods for optical aerial or satellite images is also of great importance (Zhong and Wang, 2007), (Dey et al., 2011). When working on optical photographs, the challenge is the large variety of features: images can be grayscale or contain poor color information, and they may be scanned in different seasons and under varying lighting conditions. In this case, pixel neighborhood processing techniques like multilayer difference images or background modeling (Benedek and Szirányi, 2008) cannot be adopted efficiently, since details are not comparable.
In our work, we concentrate on building detection, which is a very important task, as land areas may change dynamically and continuous, periodic administration is necessary to keep information up to date. This is very useful for urban development analysis and map updating, and it also helps in disaster management, vegetation monitoring and discovering illegal surface-forming activities. The challenges of building detection partly come from the aforementioned diverse imaging circumstances, which cause different color, contrast and shadow conditions. Moreover, building shapes vary considerably, which requires sophisticated techniques to obtain accurate results.
There is a wide range of publications in remote sensing on urban area and building detection. Here we concentrate on recent approaches, which will be used for comparison later. State-of-the-art building detection approaches can be divided into two main groups. The first group contains methods which only localize buildings without providing any shape information (Sirmacek and Unsalan, 2009) and (Sirmaçek and Ünsalan, 2011). In these approaches, only the location of each building is detected. In (Sirmacek and Unsalan, 2009), a SIFT salient point based approach is introduced for urban area and building detection (denoted by SIFT-graph in the experimental part). This method uses two templates (a light and a dark one) for detecting buildings. After extracting feature points representing buildings, graph-based techniques are used to detect the urban area. The given templates help to divide the point set into separate building subsets, from which the building locations are defined. However, in many cases the buildings cannot be represented by such templates; moreover, it is sometimes hard to distinguish them from the background based on the given features. (Sirmaçek and Ünsalan, 2011) proposes a method to detect building positions in aerial and satellite images based on Gabor filters (marked as Gabor filter in the experimental part), where different local feature vectors are used to localize buildings with data and decision fusion techniques.
The other group contains approaches which use shape templates (e.g., rectangles) for detecting buildings (Song et al., 2006), (Sirmaçek and Ünsalan, 2008) and (Benedek et al., 2012). In this case, besides the location, additional information is given about the size, orientation and shape. In (Song et al., 2006), a segment-merge technique (Segment-Merge) is introduced, which represents a distinct trend. This method considers building detection as a region-level problem and assumes that buildings are homogeneous areas (regarding either color or texture), which can therefore be distinguished from the background. In the first step, the background is subtracted, then shape and size constraints are applied to define building objects. However, the success of the approach depends on its basic assumptions: sometimes buildings cannot be distinguished effectively from the background using color and texture features, and then the further steps fail as well.
(Sirmaçek and Ünsalan, 2008) (named Features-Canny in the experimental part) combines roof color, shadow and edge information in a two-step process. First, building candidates are defined based on color and shadow features, then a rectangle template is fitted using a Canny edge map. This sequential method is very sensitive to the deficiencies of both steps: inappropriate shadow and color information results in false candidates, and accurate detection is not possible with a malfunctioning edge map.
A novel building detection approach is introduced in (Benedek et al., 2012) (marked later as bMBD), using a global optimization process that considers observed data, prior knowledge and interactions between neighboring building parts. The method uses low-level features (like gradient orientation, roof color, shadow and roof homogeneity), which are then integrated into object-level features. After obtaining object (building part) candidates, a configuration energy is defined based on a data term (integrating the object-level features) and a prior term, which handles the interactions of neighboring objects and penalizes the overlap between them. The optimization is then performed by a bi-layer multiple birth and death process.
Although the second group provides some information about the shape, it is still only an approximation. Therefore, our aim is to construct a method which can deal with the diversity of shapes. In the first step, we generate a feature point set based on our modification of the Harris corner detector (Kovács and Szirányi, 2012), which is able to represent object contours effectively. Next, we calculate the main directions in the surroundings of the feature points to obtain orientation information, which also characterizes the urban area. Then, an improved edge map is constructed by strengthening the edges only in the calculated main directions with the shearlet method (Easley et al., 2009). Based on the edge map, the feature point set is divided into subsets by graph-based connectivity detection (a similar approach was introduced in (Sirmacek and Unsalan, 2009)), where each subset represents a building candidate. Using the convex hull of each point subset for initialization, the Chan-Vese active contour method (Chan and Vese, 2001) detects the final boundary of the building. Evaluated on aerial images provided by the Hungarian Institute of Geodesy, Cartography and Remote Sensing, the initial results show that our proposed method detects buildings more efficiently and is competitive with other state-of-the-art methods.

Original Harris detector
The detector was introduced in 1988 (Harris and Stephens, 1988) and is based on the principle that at corner points, intensity values change strongly in multiple directions. Consider a local window in the image and the average change of image intensity that results from shifting the window by a small amount in various directions: for a corner point, all shifts result in a large change. Thus, a corner can be detected by checking whether the minimum change produced by any of the shifts is large.
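The method first computes the Harris matrix $M$ for each pixel of the image, consisting of the products of the first-order derivatives $I_x$ and $I_y$, smoothed by a Gaussian window $G$:

$$M = G * \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix} \tag{1}$$

Then, instead of computing the eigenvalues of $M$, a corner response $R$ is defined:

$$R = \mathrm{Det}(M) - k \cdot \mathrm{Tr}^2(M) \tag{2}$$

with Det and Tr denoting the determinant and trace of $M$, and $k$ a coefficient, usually around 0.04.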
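This characteristic function $R$ is used to detect corners: $R$ is large and positive in corner regions and negative in edge regions (see Figure 1(b)). By searching for local maxima of $R$, the Harris keypoints can be found (see Figure 1(c)).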
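The Harris response described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the paper: the derivatives come from `np.gradient`, the Gaussian window is a separable kernel with an assumed σ = 1 truncated at 3σ, and k = 0.04 as mentioned in the text.

```python
import numpy as np

def gaussian_smooth(img, sigma=1.0):
    """Separable Gaussian smoothing (the window G), kernel truncated at 3*sigma."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, rows)

def harris_response(img, sigma=1.0, k=0.04):
    """Harris corner response R = Det(M) - k * Tr(M)^2 for every pixel."""
    iy, ix = np.gradient(img.astype(float))  # first-order derivatives (rows, cols)
    # Entries of the smoothed Harris matrix M = [[a, b], [b, c]]
    a = gaussian_smooth(ix * ix, sigma)
    b = gaussian_smooth(ix * iy, sigma)
    c = gaussian_smooth(iy * iy, sigma)
    det = a * c - b * b
    trace = a + c
    return det - k * trace ** 2
```

On a synthetic bright square, this response is positive at the square's corners, negative along its sides and zero in flat regions, matching the behavior described above.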

Modified Harris based feature map
When working on contour detection, contour points have to be emphasized. In our previous work (Kovács and Szirányi, 2012), we introduced a modification of the original Harris method which emphasizes edges and corners equally and can therefore be applied efficiently to generate a feature map for active contour approaches. The proposed modification is:

$$R_{logmax} = \max\bigl(0, \log \max(\lambda_1, \lambda_2)\bigr) \tag{3}$$

where $\lambda_1$ and $\lambda_2$ denote the eigenvalues of $M$. Since both corners and edges have at least one large eigenvalue, the $\max(\lambda_1, \lambda_2)$ function separates flat and non-flat regions accurately. To produce a steady feature map, the dynamics of the characteristic function should be compressed into a balanced distribution while keeping the necessary strength of the main attractors.
The natural logarithm (log) satisfies this condition: it gives a balanced output for both corner and edge saliency. The target set of $R_{logmax}$ is the positive domain (when it is used as a feature map), so the outer max function replaces the negative values produced by small eigenvalues ($\lambda < 1$, points in flat regions) with zeros.
By calculating the local maxima of the proposed $R_{logmax}$ function, the modified feature point set is defined (see Figure 2).
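The modified function and the feature point extraction can be sketched as follows. The Gaussian window (σ = 1), the 8-neighborhood used for the local maxima and the guard against log(0) are assumptions of this sketch, and the function names are hypothetical. Note that the logarithm makes the output depend on the intensity scale: with 8-bit intensities (0 to 255) the eigenvalues on edges are large, while flat regions fall to zero.

```python
import numpy as np

def gaussian_smooth(img, sigma=1.0):
    """Separable Gaussian smoothing, kernel truncated at 3*sigma."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, rows)

def r_logmax(img, sigma=1.0):
    """R_logmax = max(0, log(max(lambda1, lambda2))) of the Harris matrix."""
    iy, ix = np.gradient(img.astype(float))
    a = gaussian_smooth(ix * ix, sigma)
    b = gaussian_smooth(ix * iy, sigma)
    c = gaussian_smooth(iy * iy, sigma)
    # Larger eigenvalue of the symmetric 2x2 matrix [[a, b], [b, c]]
    lam_max = 0.5 * (a + c) + np.sqrt((0.5 * (a - c)) ** 2 + b ** 2)
    # Guard log(0); flat regions end up at 0 through the outer max
    return np.maximum(0.0, np.log(np.maximum(lam_max, np.finfo(float).tiny)))

def local_maxima(resp):
    """Feature points (row, col): positive pixels not exceeded by any 8-neighbor."""
    h, w = resp.shape
    padded = np.pad(resp, 1, mode="constant", constant_values=-np.inf)
    neighbors = np.stack([padded[dy:dy + h, dx:dx + w]
                          for dy in range(3) for dx in range(3)])
    return np.argwhere((resp >= neighbors.max(axis=0)) & (resp > 0))
```

For example, `local_maxima(r_logmax(image))` returns an array of (row, col) feature point coordinates.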

Urban area detection
The advantage of the (original and modified) Harris detector is its invariance to rotation and illumination variation and its robustness to image noise at a fixed scale. Therefore, it can be an efficient tool for aerial image segmentation and can handle the varying characteristics of different images. The first question in applying the proposed function to building detection was whether the extended feature point set could represent the urban area. Is it possible to use these points for building detection?
To answer these questions, we evaluated the point set for urban area detection. For testing, we used spatial voting, which was proposed in (Sirmaçek and Ünsalan, 2010) for Gabor filter based local feature points. The method assumes that an urban area contains many local feature points located close to each other in the spatial domain, so around these points there is a high probability of urban features. Therefore, the constructed voting matrix has the highest vote at the location $(x_i, y_i)$ of the $i$th feature point, and the vote decreases around it with the spatial distance:

$$V(x, y) = \sum_{i=1}^{K} \frac{1}{2\pi\sigma_i^2} \exp\left(-\frac{(x - x_i)^2 + (y - y_i)^2}{2\sigma_i^2}\right) \tag{4}$$

where $K$ is the number of feature points and $\sigma_i$ is the voting proximity parameter for point $(x_i, y_i)$.
After calculating V for every pixel in the image, Otsu thresholding (Otsu, 1979) was applied to distinguish the urban area from the background.
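A minimal sketch of spatial voting followed by Otsu thresholding is shown below. For simplicity it uses one shared voting parameter σ for all points (the method allows a separate $\sigma_i$ per point) and a normalized Gaussian vote; σ = 3 and the 256-bin histogram are illustrative values, not the paper's settings.

```python
import numpy as np

def voting_matrix(points, shape, sigma=3.0):
    """Spatial voting: a Gaussian vote centred on every feature point (row, col)."""
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    V = np.zeros(shape, dtype=float)
    for y, x in points:
        d2 = (rows - y) ** 2 + (cols - x) ** 2
        V += np.exp(-d2 / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)
    return V

def otsu_threshold(values, bins=256):
    """Histogram threshold maximising the between-class variance (Otsu, 1979)."""
    hist, edges = np.histogram(values.ravel(), bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(hist).astype(float)          # pixels in bins 0..k
    w1 = w0[-1] - w0                            # pixels in bins k+1..end
    s0 = np.cumsum(hist * centers)
    m0 = s0 / np.maximum(w0, 1.0)
    m1 = (s0[-1] - s0) / np.maximum(w1, 1.0)
    between = w0 * w1 * (m0 - m1) ** 2
    k = int(np.argmax(between))
    return edges[k + 1]                         # lower edge of the upper class

def urban_mask(points, shape, sigma=3.0):
    """Binary urban-area mask: pixels whose vote reaches the Otsu threshold."""
    V = voting_matrix(points, shape, sigma)
    return V >= otsu_threshold(V)
```

A dense cluster of feature points produces a compact high-vote region that survives the threshold, while isolated background pixels receive almost no votes.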
Testing the method with the modified Harris feature points on many different aerial images showed that the urban area is well represented by this point set. A typical result can be seen in Figure 3, where a grayscale image with limited color information was used, and some cloud shadows generate false contours on the left side of the image, which makes detection harder. However, the generated point set is mostly situated in the urban area, and spatial voting detects the area quite accurately.

ORIENTATION BASED BUILDING DETECTION
Our tests have shown that the proposed Harris point set is able to represent the urban area; therefore, the features of the points can also be used to extract information about the urban area to improve building detection. Our idea was to extract the main direction in a small neighborhood of each feature point, which characterizes the urban area. Edges in the defined direction are strengthened with the shearlet method (Yi et al., 2009). Building candidates are then determined based on the improved edge map and the feature point set. The accurate building contours are generated with the Chan-Vese active contour method (Chan and Vese, 2001).

Orientation estimation
Buildings in a small urban area usually have related orientations. In most cases, houses are oriented according to some larger structure (e.g., the road network), therefore a main orientation of the area can be defined. As the proposed modified Harris point set represents the area, our idea was to calculate the main direction of the buildings in the area based on this point set. (Benedek et al., 2012) used a low-level feature called local gradient orientation density, where the surroundings of a pixel were examined for perpendicular edges. We now use a similar feature to calculate the main direction of a feature point's neighborhood. Let us denote the gradient vector at pixel $r$ by $\nabla g_r$, with magnitude $\|\nabla g_r\|$ and orientation $\varphi^{\nabla}_r$. Defining the $n \times n$ neighborhood of the $i$th feature point as $W_n(i)$ (where $n$ depends on the resolution), the weighted density of the gradient orientations around the point is:

$$\lambda_i(\varphi) = \frac{1}{N_i} \sum_{r \in W_n(i)} \frac{1}{h} \|\nabla g_r\| \cdot \kappa\left(\frac{\varphi - \varphi^{\nabla}_r}{h}\right), \qquad N_i = \sum_{r \in W_n(i)} \|\nabla g_r\| \tag{5}$$

where $\kappa(\cdot)$ is a kernel function with bandwidth $h$. The direction of the $i$th point is the maximum of $\lambda_i(\varphi)$. After calculating the direction for all $K$ feature points, we define the density function $\vartheta$ of their orientation:

$$\vartheta(\varphi) = \sum_{i=1}^{K} H_i(\varphi) \tag{6}$$

where $H_i(\varphi)$ is a logical function:

$$H_i(\varphi) = \begin{cases} 1, & \text{if } \varphi = \arg\max_{\phi} \lambda_i(\phi) \\ 0, & \text{otherwise} \end{cases} \tag{7}$$

We expect the density function $\vartheta$ to have two main peaks (because of the perpendicular edges of buildings), see Figure 4. This can be measured by correlating $\vartheta$ with a bimodal density function:

$$\alpha(m) = \int_{-90}^{+90} \vartheta(\varphi)\, \eta_2(\varphi, m, d_\vartheta)\, d\varphi \tag{8}$$

where $\eta_2(\cdot)$ is a two-component Gaussian mixture with mean values $m$ and $m + 90$ and the same standard deviation $d_\vartheta$ for both components. The position $\theta$ of the maximal correlation is obtained as:

$$\theta = \arg\max_{m} \alpha(m) \tag{9}$$

and the corresponding orthogonal direction $\theta_{ortho}$ is the other peak of $\alpha(m)$. Thus, we expect building edges to lie in the calculated main orientations, and we try to enhance edges in these directions. We tested the orientation estimation approach on different datasets and with different window sizes ($n \times n$ neighborhoods). The results show that with a larger window ($n = 15$) the density function is smoother (see Figure 4(b)), while with a smaller one ($n = 5$) it is rougher, but the main characteristics and the main peaks were obvious in both cases; therefore, only the result for the $15 \times 15$ neighborhood is shown in Figure 4.
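The orientation estimation can be sketched as follows. Assumptions of this sketch: orientations are discretized into 1° bins over [−90°, 90°), the kernel in $\lambda_i(\varphi)$ is a Gaussian with bandwidth h = 5°, $d_\vartheta$ = 5°, and θ is searched in [−90°, 0°) with $\theta_{ortho}$ = θ + 90°. Because gradient directions are perpendicular to edges, the two returned directions may be swapped relative to the edge directions, which does not matter for a perpendicular pair.

```python
import numpy as np

ANGLES = np.arange(-90, 90)  # 1-degree orientation bins covering [-90, 90)

def angle_diff(a, b):
    """Signed difference of two line orientations in degrees (period 180)."""
    return (np.asarray(a, dtype=float) - b + 90.0) % 180.0 - 90.0

def point_orientation(grad_y, grad_x, y, x, n=15, h=5.0):
    """Dominant orientation of the n x n window around (y, x): argmax of lambda_i."""
    r = n // 2
    win = (slice(max(y - r, 0), y + r + 1), slice(max(x - r, 0), x + r + 1))
    gy, gx = grad_y[win].ravel(), grad_x[win].ravel()
    mag = np.hypot(gx, gy)
    phi = np.degrees(np.arctan2(gy, gx))
    # Magnitude-weighted kernel density; a Gaussian kernel of bandwidth h is assumed
    d = angle_diff(ANGLES[:, None], phi[None, :])
    lam = (mag[None, :] * np.exp(-0.5 * (d / h) ** 2)).sum(axis=1) / max(mag.sum(), 1e-12)
    return int(ANGLES[np.argmax(lam)])

def main_orientation(point_orients, d=5.0):
    """Correlate the point-orientation histogram (vartheta) with a two-component
    mixture centred at m and m + 90; return (theta, theta_ortho)."""
    orients = np.asarray(point_orients)
    counts = np.array([np.sum(orients == a) for a in ANGLES], dtype=float)
    best_m, best_alpha = None, -np.inf
    for m in ANGLES[ANGLES < 0]:
        eta2 = (0.5 * np.exp(-0.5 * (angle_diff(ANGLES, m) / d) ** 2)
                + 0.5 * np.exp(-0.5 * (angle_diff(ANGLES, m + 90) / d) ** 2))
        alpha = float(np.sum(counts * eta2))
        if alpha > best_alpha:
            best_m, best_alpha = int(m), alpha
    return best_m, best_m + 90
```

In use, `point_orientation` would be called for every feature point (with `grad_y, grad_x = np.gradient(image)`), and the resulting list passed to `main_orientation`.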

Edge detection with shearlet transform
Now that the main orientations have been defined, the next step is to enhance edges in these directions to extract useful edge information from the image, which can later be combined efficiently with the feature point set. Several approaches use directional information, like Canny edge detection (Canny, 1986), which uses the gradient orientation, or (Perona, 1998), which is based on anisotropic diffusion but cannot handle multiple orientations (like corners). Other single-orientation methods exist, like (Mester, 2000) and (Bigun et al., 1991), but the main problem with these methods is that they calculate orientation at pixel level and lose the multiscale nature of orientation, so they cannot be used for edge detection. In our case, we need to enhance edges formed by connected pixels, so we searched for an edge detection method which can handle orientation as well. Moreover, since we are searching for building contours, the algorithm must handle corner points too. The shearlet transform (Yi et al., 2009) has recently been introduced for efficient edge detection: unlike wavelets, shearlets are theoretically optimal for representing images with edges and, in particular, can fully capture directional and other geometrical features.
For an image $u$, the shearlet transform is a mapping

$$u \rightarrow SH_\psi u(a, s, x) \tag{11}$$

providing a directional scale-space decomposition of $u$, where $a > 0$ is the scale, $s$ is the orientation and $x$ is the location:

$$SH_\psi u(a, s, x) = \langle u, \psi_{asx} \rangle \tag{12}$$

where $\psi_{asx}$ are well-localized waveforms at various scales and orientations. As we are working with a discrete transform, a discrete set of possible orientations is used, for example $s = 1, \dots, 16$.
In our case, the main orientation $\theta$ of the image has been calculated (see Section 3.1); therefore, our aim is to strengthen the components in the given directions on different scales, as we only want to detect edges in the main orientations. The first step is to determine the subbands $s$ and $s_{ortho}$ whose orientation ranges include $\theta$ and $\theta_{ortho}$, respectively. After this, the $SH_\psi u(a, s, x)$ and $SH_\psi u(a, s_{ortho}, x)$ subbands are strengthened: weak coefficients are eliminated with a hard threshold, and only the strong coefficients are amplified.
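No shearlet toolbox is assumed here; the sketch below only illustrates the subband selection and the hard-threshold-and-amplify step on an array of precomputed coefficients $SH_\psi u(a, s, x)$. The array layout (scales × orientations × height × width), the mapping from angles to subband indices, the gain of 2 and the decision to leave the other subbands unchanged are all hypothetical choices.

```python
import numpy as np

def subband_index(angle, n_orient=16):
    """Index of the orientation subband containing `angle` (degrees).

    Hypothetical layout: subband s covers [-90 + s*w, -90 + (s+1)*w) with
    w = 180 / n_orient; a real shearlet toolbox may order its subbands differently.
    """
    width = 180.0 / n_orient
    return int(((angle + 90.0) % 180.0) // width)

def strengthen_subbands(coeffs, theta, theta_ortho, thresh, gain=2.0):
    """Hard-threshold and amplify the two main-direction subbands.

    coeffs: array of shape (n_scales, n_orient, H, W).
    Other subbands are returned unchanged (an assumption of this sketch).
    """
    out = np.array(coeffs, dtype=float, copy=True)
    n_orient = out.shape[1]
    for s in {subband_index(theta, n_orient), subband_index(theta_ortho, n_orient)}:
        band = out[:, s]
        # Weak coefficients -> 0, strong coefficients -> gain * value (all scales)
        out[:, s] = np.where(np.abs(band) >= thresh, gain * band, 0.0)
    return out
```

The strengthened coefficients would then be passed to the inverse transform of whatever shearlet implementation is used.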
Finally, the inverse shearlet transform (see Eq. 12) is applied to get the reconstructed image, which has strengthened edges in the main directions. The strengthened edges can easily be detected with Otsu thresholding; results can be seen in Figure 5. While the pure Canny method sometimes detects edges with discontinuities, the shearlet-based edge strengthening helps to eliminate these problems. As building colors may vary widely and the shadow effect has to be reduced to eliminate false contours when detecting buildings, we used two color channels in the edge strengthening step: for red buildings, we used the u* component of the CIE L*u*v* space, as advised in (Muller and Zaum, 2005); for grey buildings, we applied the Cb component of the YCbCr space, which was found in (Tsai, 2006) to separate grey colored objects from their shadows most effectively. First, we extracted the red building outlines based on the u* edge map; after that, the remaining buildings were detected in the Cb-based map.
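The two color channels can be computed with standard formulas, sketched below for 8-bit sRGB input. The full-range (JPEG) YCbCr coefficients and the sRGB/D65 conversion to CIE L*u*v* are common conventions assumed here; the paper does not state which variants were used.

```python
import numpy as np

def cb_channel(rgb):
    """Cb of full-range (JPEG) YCbCr from 8-bit RGB; grey pixels map to 128."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b

def u_star_channel(rgb):
    """u* of CIE L*u*v* from 8-bit sRGB, with the D65 white derived from the matrix."""
    c = np.asarray(rgb, dtype=float) / 255.0
    lin = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)  # undo gamma
    m = np.array([[0.4124, 0.3576, 0.1805],      # linear sRGB -> XYZ (D65)
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    xyz = lin @ m.T
    X, Y, Z = xyz[..., 0], xyz[..., 1], xyz[..., 2]
    Xn, Yn, Zn = m.sum(axis=1)                   # XYZ of RGB white (1, 1, 1)
    un = 4 * Xn / (Xn + 15 * Yn + 3 * Zn)
    t = Y / Yn
    L = np.where(t > (6 / 29) ** 3, 116 * np.cbrt(t) - 16, (29 / 3) ** 3 * t)
    denom = np.maximum(X + 15 * Y + 3 * Z, 1e-12)  # avoid 0/0 for black pixels
    return 13 * L * (4 * X / denom - un)
```

Grey pixels give Cb = 128 and u* ≈ 0, while strongly red pixels give a large positive u*, which is why u* is suitable for red roofs.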

Building contour detection
After the main orientation has been defined for the extended Harris point set of the urban region, the shearlet-based edge strengthening approach enhances the edges in the main directions, resulting in an edge map $S$. In the next step, we combine the feature point set and the edge map with a graph-based representation, which was introduced in our previous work (Kovács and Szirányi, 2010): based on the generated edge map, connected feature point subgraphs are determined, indicating building candidates. The edge set $E$ of the graph $G = (V, E)$ is constructed by connecting $v_i = (x_i, y_i)$ and $v_j = (x_j, y_j)$, the $i$th and $j$th vertices of the feature point set $V$, if they satisfy the following conditions:
1. $S(x_i, y_i) = 1$,
2. $S(x_j, y_j) = 1$,
3. there exists a finite path between $v_i$ and $v_j$ in $S$.
The result is a graph composed of many separate subgraphs, where each subgraph indicates a building candidate. However, there might be some isolated points and smaller subgraphs (points and the edges connecting them) indicating noise. To discard them, we keep only the subgraphs whose number of points exceeds a given threshold.
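Since condition 3 simply states that both points lie in the same connected component of $S$, the graph does not have to be built edge by edge: labeling the components of $S$ and grouping the feature points by label yields the subgraphs directly. A minimal sketch, assuming 8-connectivity and a hypothetical minimum subgraph size of 3 points:

```python
from collections import deque

import numpy as np

def label_components(S):
    """Label the 8-connected components of the binary edge map S (0 = background)."""
    h, w = S.shape
    labels = np.zeros((h, w), dtype=int)
    current = 0
    for y0 in range(h):
        for x0 in range(w):
            if S[y0, x0] and labels[y0, x0] == 0:
                current += 1
                labels[y0, x0] = current
                queue = deque([(y0, x0)])
                while queue:  # breadth-first flood fill of one component
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and S[ny, nx] and labels[ny, nx] == 0):
                                labels[ny, nx] = current
                                queue.append((ny, nx))
    return labels

def building_candidates(points, S, min_points=3):
    """Group feature points (row, col) into the connected subgraphs of the edge map."""
    labels = label_components(S)
    groups = {}
    for y, x in points:
        lab = labels[y, x]
        if lab > 0:  # conditions 1-2: the point lies on an edge pixel
            groups.setdefault(int(lab), []).append((y, x))
    # Condition 3: same label <=> a path exists in S; small groups are treated as noise
    return [g for g in groups.values() if len(g) >= min_points]
```

Each returned group corresponds to one building candidate; points off the edge map and groups below the size threshold are dropped.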
To detect the accurate contours of the buildings, we use the Chan-Vese active contour algorithm (Chan and Vese, 2001) and initialize the contour of each building candidate as the convex hull of the vertices of its subgraph.
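The convex hull initialization can be sketched with Andrew's monotone chain algorithm, followed by rasterizing the hull into a binary mask that could serve as the initial region for a Chan-Vese implementation. Points are (row, col) pairs; the function names are hypothetical and the Chan-Vese step itself is not shown.

```python
import numpy as np

def convex_hull(points):
    """Andrew's monotone chain; hull vertices positively oriented in (row, col)
    coordinates, without repeating the first vertex."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def hull_mask(hull, shape):
    """Binary mask of the pixels inside or on a convex hull with >= 3 vertices."""
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    inside = np.ones(shape, dtype=bool)
    for (r0, c0), (r1, c1) in zip(hull, hull[1:] + hull[:1]):
        # Keep pixels on the left of (or on) every directed hull edge
        inside &= (r1 - r0) * (cols - c0) - (c1 - c0) * (rows - r0) >= 0
    return inside
```

For one building candidate, `hull_mask(convex_hull(group), image.shape)` gives the initial region; collinear and interior points are discarded by the hull.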
Main directional edge emphasis may also enhance road and vegetation contours, and some feature points can also be located on these edges. Therefore, after the contour extraction step, the results have to be verified to filter out misdetections. For false objects, like road parts or land section borders, the edges in the detected area are unidirectional, unlike buildings, which have either orthogonal or multidirectional contours. Thus, the directional distribution of edges is evaluated in the extracted area (using the technique of Section 3.1) and unidirectional hits are eliminated. Here, we again use the correlation with a bimodal density function (Eq. 8), then measure and threshold the $\alpha$ value to select multidirectional hits.
Figure 6 shows the result of the building detection with the detected and filtered contours. Based on the contours, we can estimate the locations of the buildings, which are used for evaluation and comparison in the following.

EXPERIMENTS
We have evaluated our proposed method on the Szada dataset provided by the Hungarian Institute of Geodesy, Cartography and Remote Sensing. This dataset was also used in (Benedek et al., 2012) for evaluation and for comparison with different methods ((Sirmacek and Unsalan, 2009), (Sirmaçek and Ünsalan, 2011), (Song et al., 2006) and (Sirmaçek and Ünsalan, 2008)), therefore quantitative test results are available. Table 1 shows the quantitative results for the Szada dataset; the names of the compared methods are abbreviated as marked in the introduction. The complete dataset contains 57 buildings, of which our method detects 55 (i.e., 2 missing objects) with no false positive objects. Here we used the locations of the buildings (see Figure 6(b)) for evaluation. Compared with the other approaches, our method yields both the fewest missing and the fewest false objects.
For qualitative evaluation we used the detected outlines (see Figure 6(a)). A part of this image was cut out and enlarged to compare our method qualitatively with (Benedek et al., 2012); Figure 7 shows the detailed differences. Although rectangular templates provide a very close estimate of the shape of the buildings, the fine details are lost. Unlike shape templates, active contour based techniques do not apply any restrictions on the shape and are able to detect the varying contour parts more accurately.
Although active contours are able to cope with varying shapes, they sometimes suffer from a lack of contrast between the building and the background and have difficulties in detecting contours (e.g., missing a part of the building outline, see the building in the bottom middle of Figure 6(a)). Moreover, some buildings are oriented differently from the main directions, and the edge strengthening step misses their edges, which may result in misdetections.

CONCLUSION AND FUTURE WORK
In this paper, we have introduced a novel orientation based method for building detection in aerial images. The proposed method first calculates a feature point set based on a modification of the well-known Harris corner detector. In the first step, we demonstrated the detector's ability to characterize urban areas by testing the point set with a voting matrix technique for urban area detection. The orientation feature of the point set is then used to define the main directions of the urban area, in which the edges are strengthened. The shearlet transform has been applied for the orientation based edge enhancement, as it is able to handle orientation information, even in multidirectional cases (like corners). The improved edge information is combined with the feature point set, and a graph-based technique was introduced to obtain feature point subgraphs as building candidates. Finally, the Chan-Vese nonparametric active contour approach was applied to extract the building contours. The proposed method has been compared with other state-of-the-art algorithms quantitatively and qualitatively. The results showed that our approach detects buildings more effectively than the others, both for localization and for contour extraction.
However, this paper is only the first step of this work: it introduces the main principles, and the method needs more evaluation on different databases. Moreover, there are still some open questions, which have to be answered in the near future. If the aerial image has a lower resolution and covers a larger urban area, the orientation of the buildings may vary, which has to be handled, for example, by correlating the orientation distribution function with multiple bimodal Gaussian functions. Furthermore, the active contour method might have difficulties when building contours are hidden by other structures, like trees. In this case, some prior constraints (like edge parts running in the defined main orientations) can be introduced.

Table 1: Quantitative results for the Szada dataset (57 buildings)

| Method | Missing objects | False objects |
|---|---|---|
| SIFT-graph (Sirmacek and Unsalan, 2009) | 17 | 26 |
| Gabor filter (Sirmaçek and Ünsalan, 2011) | 17 | 23 |
| Features-Canny (Sirmaçek and Ünsalan, 2008) | 10 | 18 |
| Segment-Merge (Song et al., 2006) | 11 | 5 |
| bMBD (Benedek et al., 2012) | 4 | 1 |
| Proposed | 2 | 0 |
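Table 1 reports counts only; the detection rate (recall) and precision can be derived from them, assuming that every detected object that is not a false object corresponds to a distinct reference building. The snippet below is only this arithmetic on the Table 1 values, not an evaluation protocol from the paper.

```python
# Missing and false objects per method, copied from Table 1 (Szada, 57 buildings)
TABLE_1 = {
    "SIFT-graph": (17, 26),
    "Gabor filter": (17, 23),
    "Features-Canny": (10, 18),
    "Segment-Merge": (11, 5),
    "bMBD": (4, 1),
    "Proposed": (2, 0),
}

def detection_rates(missing, false_objects, total=57):
    """Recall and precision derived from missing/false object counts."""
    detected = total - missing
    recall = detected / total
    precision = detected / (detected + false_objects)
    return recall, precision

for name, (miss, fp) in TABLE_1.items():
    recall, precision = detection_rates(miss, fp)
    print(f"{name:15s} recall={recall:.3f} precision={precision:.3f}")
```

For the proposed method this gives a recall of 55/57 ≈ 0.965 and a precision of 1.0.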

Figure 1: Operation of the Harris detector: (a) shows the original image; (b) is the R characteristic function; (c) shows the keypoints chosen as local maxima of R.

Figure 2: Operation of the modified Harris detector: (a) shows the original image; (b) is the proposed $R_{logmax}$ function; (c) shows the keypoints chosen as local maxima of $R_{logmax}$.

Figure 3: Urban area detection with spatial voting: (a) shows the original image; (b) is the detected urban area.

Figure 4: Orientation estimation for an image: (a) shows the feature points in yellow; (b) shows the ϑ(ϕ) orientation density function of the points in blue, calculated for a 15 × 15 neighborhood. The horizontal axis is ϕ ∈ [−90°, +90°] and the vertical axis is the number of points. The two-component Gaussian mixture η2(.) is shown in red; the detected peaks are θ = −47 and θ_ortho = +53.

Figure 5: Comparing the edge maps for the u* channel: (a) shows the result of pure Canny edge detection; (b) is the result of the shearlet-based edge strengthening.

Figure 6: Result of the building detection: (a) shows the detected contours; (b) is the estimated locations of the detected buildings.

Figure 7: Qualitative comparison of MPP-based and proposed method: (a) is the original image part; (b) shows the result of MPP-based method; (c) is the result of the proposed approach.