SPECTRAL INFORMATION RETRIEVAL FOR SUB-PIXEL BUILDING EDGE DETECTION

Building extraction from imagery has been an active research area for decades. However, the precise building detection from hyperspectral (HI) images solely is less often addressed research question due to the low spatial resolution of data. The building boundaries are usually represented by spectrally mixed pixels, and classical edge detector algorithms fail to detect borders with sufficient completeness. The idea of the proposed method is to use fraction of materials in mixed pixels to derive weights for adjusting building boundaries. The building regions are detected using seeded region growing and merging in a HI image; for the initial seed point selection the digital surface model (DSM) is used. Prior to region growing, the seeds are statistically tested for outliers on the basis of their spectral characteristics. Then, the border pixels of building regions are compared in spectrum to the seed points by calculating spectral dissimilarity. From this spectral dissimilarity the weights for weighted and constrained least squares (LS) adjustment are derived. We used spectral angle mapper (SAM) for spectral similarity measure, but the proposed boundary estimation method could benefit from soft classification or spectral unmixing results. The method was tested on a HI image with spatial resolution of 4 m, and buildings of rectangular shape. The importance of constraints to the relations between building parts, e.g. perpendicularity is shown on example with a building with inner yards. The adjusted building boundaries are compared to the laser DSM, and have a relative accuracy of boundaries 1⁄4 of a pixel.


INTRODUCTION
Automatic man made object detection from aerial and space images has been an active area of research for a few decades (Mayer, 1999;Gruen et al. 1995). Building extraction provides essential information for e.g. urban planning, generating 3D building models, and also change detection in cases of natural disasters. Images can be taken with different types of sensors according to the spectral, spatial and radiometric resolution. The advantage of hyperspectral (HSI) sensors is that they detect surface reflection in many narrow and contiguous spectral bands, e.g. 10 nm, and enable to identify surface materials by comparing spectral signatures to the spectra of known materials (Shaw and Manolakis, 2002). The similarity between spectra are measured with similarity measures, such as Euclidean distance or distances designed especially for HSI data, the Spectral Angle Mapper (SAM, Kruse et al., 1993), spectral information divergence (Chang, 2000). The spectral signatures of materials can be measured in laboratories with spectrometers, and stored in spectral libraries, e.g. the ASTER spectral library (Baldridge, et al., 2009). In contrast to the laboratory spectral libraries HSI images are influenced by the atmosphere. Thus, modeling of atmospheric influences is needed before comparing image spectra to laboratory spectra. Building a spectral library from image spectra evades the problem of atmospheric corrections; so such a library is image and sensor dependant, usually requiring an expert to select appropriate spectra. The spectral libraries are used for supervised classification of HSI images. Classification techniques such as support vector machines (Camps-Valls and Bruzzone 2005) or artificial neural networks (Chen and Tran, 1994) has been successfully developed and applied to HSI data. The spatial resolution of HSI images is generally lower compared to panchromatic (PAN) or multispectral (MS) images. Consequently pixels with building edges are usually spectrally mixed, so the pixel wise classification is not sufficient for precise building boundary extraction. A mixed pixel is a spatial pixel in a MS or a HSI image and is the result of a reflection from surface covered with more than one material. Soft classification techniques allow to assign more classes to one pixel, expressing membership of each defined class, but not a fraction of present materials (Villa et al., 2011). The fractions or abundances of mixed pixels are estimated by spectral unmixing algorithms using a collection of endmembers, e.g. spectral signatures of distinct material substance (Keshava, 2003). The basis for our idea for sub-pixel boundary detection is that fraction of materials in mixed pixels can support boundary estimation on the assumption that all the roofing materials are known. Sohn and Dowman (2007) describe the automatic building extraction as a hierarchical process, starting with feature extraction, then searching only the features relevant to the buildings and finally reconstructing the building. The geometrical features describing buildings, such as points and lines are difficult to detect in a HSI image due to the low spatial resolution. Many edge detection algorithms, e.g. Canny edge detector (Canny, 1986) are developed only for grayscale images or 3-band colour images. However, the Robust Colour Morphological Gradient (RCGM) edge detector (Evans and Liu 2006) is defined for any number of bands. Tarabalka et al. (2008) classify HSI images using two methods; first, they apply watershed segmentation to the gradient of HSI image, computed with the RCGM and second, using a SVM classifier. Then, they calculate majority voting between both results, and so improve classification by combining the spatial information from watershed segmentation with the spectral information from SVM classification. In their following work (Tarabalka et al., 2010) they use several classifiers to classify a HSI image and by combination of the classification results select markers, i.e. seeds, to build minimum spanning forests for final classification map. One of their proposed techniques is segmentation, based on iterative hierarchical stepwise optimization region growing, using SAM to join spectrally similar adjacent and non-adjacent regions.
Our main goal is the building boundary detection and not image classification. Therefore, we propose a similar technique of seeded region growing and merging for the first definition of a building region. In order to avoid using existing spectral libraries or large manual training of classifiers, the seeds are automatically selected from the digital surface model (DSM). Our contribution to the seed detection is in statistical testing of spectral similarity for group of seeds. Braun et al. (2011) classify roofing materials using SVM based technique with kernel composition; they fuse HSI image and laser data, by computing kernels for each of them and add the kernels before SVM classification. Huertas et al. (1999) model larger buildings from HSI and PAN images. The line segments, detected in the PAN image, are overlaid over the HSI image, and the roof hypotheses are tested only in the areas where buildings were detected in the HSI image. Thus, the HSI image approximately defines the position of the buildings and the final buildings outline is modeled on the basis of the higher resolution PAN image.
A main focus of our research is the estimation of building boundaries based solely on spectral information. A line in a grayscale image can be estimated with weighted least squares (LS) adjustment defining weights from the pixel's gray values (Sohn and Dowman, 2007). The rich spectral information of HSI images allows us to derivate weights by calculating the dissimilarity between roof pixels and pixels on the roof's border. Furthermore, we implement the knowledge about building shapes, e.g. rectangularity and perpendicularity between building edges by adding constraints to the LS adjustment. The results of the method are building boundaries estimated with sub-pixel accuracy.

METHOD
The proposed method consists of three main steps, (2.1) seed point selection, (2.2) region growing and merging, and (2.3) the building boundary estimation with sub-pixel accuracy. The main focus of the developed method is on the third part, estimating the building boundaries using the spectral information.

Seed point selection
The spectra of roofing materials can be defined by selecting pixels on the roofs in the HSI image. To select these seed pixels automatically we exploit the height data from DSM. The seed pixels should lie in the middle of the building roofs representing only one roofing material. Local high points in urban areas are mainly building roofs and peaks of high vegetation. So, the candidate pixels for seeds are selected as the local highest points in DSM by grayscale reconstruction (Vincent, 1993). These seed point candidates are mapped to the HSI image, and then the candidate seed points representing high vegetation are suppressed by calculating the Normalized Difference Vegetation Index (NDVI). The remaining seed point candidates include predominantly pixels belonging to the roofs. The candidate pixels are connected according to the spatial neighborhood into sets. The spectral similarity between all pairs of pixels x jk in each set is calculated. For the spectral similarity measure we use the SAM (Kruse et al., 1993), defined as where r is reference spectra, t is image spectra and n is number of channels in an image.
The spectra r and t are treated as vectors with n dimensions. If vectors r and t are equal, the SAM between them equals 0. The SAM is insensitive to the illumination because it calculates the angle between two vectors, regardless of their length. This characteristic of similarity measure is beneficial when dealing with building roofs where illumination differences are likely to appear.
where m is number of points in a set and Then, outliers are searched in each set of connected seed point candidates with corresponding similarities X by statistical testing using modified Thompson Tau statistics (eq. 4). Several statistical outlier detectors were tried, but modified Thompson Tau outperformed the others. The possible outliers in each set of points are inspected iteratively. In each iteration step, the pixel with the absolute maximal difference or maximal spectral dissimilarity denoted by j is tested where e jk is a vector of sums of SAM values between point j in the set and all other points and e j is mean value of vector e jk . The outlier is defined and removed from the set of points when S is the standard deviation of the sample of the vector e j and  is the critical value of modified Thompons's Tau statistics t /2 is the critical value for Student's t distribution with m-2 degrees of freedom, and confidence level .
The sets of connected seed point candidates with less than three points are not considered and removed prior to the outlier detection. This statistical testing is needed when projected seed point candidates are lying on the edges of the building or the set includes a larger object of different material. Pixels nearer to the edges of the buildings are more probably spectrally mixed, because they could be a result of reflection from a roof and e.g. a sidewalk. The mixed pixels are spectrally less similar to the pixels on the roof top and are therefore removed from the candidate seed pixels.

Region growing and merging in hyperspectral image
To estimate the building boundaries, first the building region must be identified. The sets of seed points are defining the material of each roof and are the start points for seeded region growing.
For each set of seed points, the mean spectrum q of seed points is computed. Furthermore, the stopping criterion for region growing is separately defined as the maximum spectral distance between seeds in the set (eq. 3) multiplied by a constant β. The constant β is an optional parameter of any positive real value typically 1.0-3.0. The more spectral homogenous sets of seeds get a smaller number as stopping criterion and vice versa. By setting β=1, the stopping criterion is set to the maximum spectral distance among the pixels in the set.
The 4-or 8-pixel neighborhood of seed points is searched and the SAM angles between region mean spectrum q and all the neighboring pixels are computed. Further on, the regions are grown and simultaneously merged if two neighboring regions are spectrally close. The results of this step are regions representing the building roofs of one material.

Sub-pixel building boundary estimation
The building regions are showing the shapes of the building polygons. Polygon sides are approximated using a weighted LS method. In chapter 3 we show an example for rectangular shaped building and set the required condition equations with constraint.
The building regions are the result of seeded region growing in the HSI image and subsequently pixels belonging to one region are spectrally similar. Furthermore, the bordering pixels to the building region, do not belong to it, but might include a part of roofing material; and as well the region border pixels might include some other material apart from a roof. This mixed pixels and their spectral dissimilarity compared to the spectrum of region seed points provide information about presence of roofing material. From the presence of the roofing material in mixed pixels the boundary of a known shape e.g. a line segment can be estimated. We propose a weighted and constrained LS technique to estimate sub-pixel building boundary (Mikhail and Ackermann, 1976).
Let's set that buildings have a rectangular shape and so each of the building regions can be approximated by a rectangle. The distance from mixed pixels to the line segments of the rectangle should be minimized, using the presence of roofing material for weights. The weight is defined as The functional model consists of four lines subsequently orthogonal to each other which form a rectangle (Figure 1). Each of this lines is described by a vector pair (n, d), where n is the normal vector to the line and d is the vector from the origin of the coordinate system to the line. The orientation of the rectangle is given by the normal vector n, and therefore the angles of the vectors d are not unknowns in the functional model. The length of the vector d is denoted by d. The condition that a normal vector must be of length one is a constraint to the equations (eq. 10). To describe a rectangle, five unknowns are defined, these are: distances d 1 , d 2 , d 3 , d 4 and the normal vector n with components n x and n y (eq. 9).An alternative way to count the unknowns of the functional model is, to count each of distances d as one unknown, and the components of the normal vector n separately, which are altogether six unknowns. The coordinates of each mixed pixel are observations with weights as defined in eq. 5. We set two sets of condition equations and the constraint to fit the rectangle (eq. 6). To find the best fitting rectangle by the means of LS, the quadratic form  (eq. 7) must be minimized.
where A, B, C are the matrices of numerical coefficients, v is the vector of residuals of observations,  is the vector of unknown parameters, W is the weight matrix of observations derived from spectral information, k, k c are the vectors of Lagrange multipliers and, f, g are the constant-term vectors. Every mixed pixel, i.e. every observation pair must be assigned to at least one of the lines of the rectangle and maximal to two lines, when dealing with corner points. The condition equations are dependant on the point assignment to the lines of the rectangle and therefore must be made prior to adjustment. We propose two simple techniques to assign points to the lines of the rectangle. First, diagonals of minimum bounding box of each region divide the plane in four quarters and accordingly the points are assigned (Figure 1). The second option is to assign each point to the nearest line. What is more, the minimum bounding box of each region is also used to calculate the first approximation of values of unknowns (d 1 , d 2 , d 3 , d 4 , n).
For each point x (observation pair) assigned to one of the rectangle lines, one equation is set i, j, m, p = 1, 2, …, number of points assigned to the edge of the rectangle and x is a vector of 2D coordinates of each observation.
The above system of equations (eq. 8-10) corresponds to the rectangular building. However, other geometrical shapes can also be approximated, e.g. any rectilinear building (U-, L-Cshaped), under the assumption that the shape of the border is known. For instance, the borders of rectangular building with a rectangular inner yard can be approximated by above described method twice independently, or when the inner yard borders are parallel to the building borders only once, adding additional four unknowns. Additionally, the adjustment of conic curves does not require any constraints and assigning points to lines, but buildings in shapes of e.g. ellipse are rare.

Data Set
The used HSI image was acquired with the airborne hyperspectral sensor HyMap with 125 channels in the spectral range 0.4-2.5 m and with the spatial resolution of 4 m. The HyMap image was prior to use geometrically and atmospherically corrected. Due to the high noise the last five channels of the HyMap image were removed before testing the proposed method.
Two different DSMs were used for the candidate seed point detection, i.e. CartoSat DSM and DSM calculated from multi view WorldView-2 (WV2) images with the spatial resolution of 5 m and about 1 m, respectively. The estimated building boundaries are compared to the sizes of buildings manually measured from aerial laser DSM (laser DSM). All used data sets are georeferenced, additional registration was not applied.

Experiment
The developed method described in chapter 2 is tested on two examples. For the first example, the seed points are detected from both, WV2 and CartoSat DSMs and for the second example only WV2 DSM is used. The step-wise method is shown on Figures 2-3 and the results are discussed in the subchapter 3.3. All colour images are true colour composites of the HSI image channels with the wavelengths 0.455, 0.528 and 0.675 m.
First the candidate seed points are detected in the DSMs (Figure 2), and superimposed on HSI image. Due to the different spatial resolution of DSMs and the HSI image, the candidate seed points are resampled to the spatial resolution of the HSI image (4 m) using nearest neighbour interpolation. Then, the seed point candidates are connected according to the 4-pixel neighbourhood. Subsequently, the regions with less than three pixels and the seed point candidates on high vegetation are removed using a threshold NDVI>0.4. The removed seed point candidates can be observed comparing Figures 2 and 3. For example southeast of the building in the centre is an area with high vegetation where seed point candidates are removed. Furthermore, the seed pixel regions are statistically refined (confidence level  = 0.05) by iteratively searching for outliers. The blue points are seeds and are used as an input for region growing and merging (Figure 2, left). Figure 3 (right) is a result of region growing in 120 channels of the HyMap image, by searching spectrally similar pixels in the 4-pixel neighbourhood. The minimum bounding box of the building region is defined, and pixels on the border, i.e. probably mixed pixels are assigned to the sides of the rectangle. Assigning pixels according to the diagonal of the minimum bounding box or according to the minimal distance, did not show any significant difference in the results. The building boundaries are computed using weighted LS adjustment. In Figure 4 (left) the outbound of the building and both inner yards are independently calculated. The rotation of the northern inner yard compared to the building's main orientation is clearly seen. In Figure 4 (right), we set that the outside boundaries must be parallel to the inner yards, and therefore no rotation is present. Figure 4: Result of building boundary approximation with subpixel accuracy (yellow lines). Outer and inner building boundaries are: independently adjusted (left), and adjusted with the condition that they have the same orientation (right).
On the second example, an outbound of the U-shape building is approximated. The seed points are selected from WV2 DSM, because they could not be chosen from CartoSat DSM due to the high noise. The adjusted boundaries of the building outbound are marked yellow in Figure 5.

Discussion
The building regions are needed to localize and describe the shapes of the buildings. The regions are defined by the seeded region growing and are dependant on the quality of the initial seed selection. Thus, the outliers among seeds are removed using statistical testing. The Thompson Tau statistics penalize more points as outliers as minimum needed and is therefore an appropriate test statistics. Furthermore, the seeds are selected as local highest points in the DSMs. Both used DSMs were automatically computed and include inaccuracies and noise ( Figure 6) which are influencing the selection of local high points. The buildings in CartoSat DSM (blue) have rounded borders, often reaching beyond the real boundaries. This can be observed in Figure 6 by comparing the widths of the building on the left or in the middle to the other DSMs (grey, green). In addition, some errors appear, e.g. a part of the building in the middle of Figure 2 (left) is missing. The seed selection from CartoSat DSM is possible only for the larger and wider buildings, due to the 5 m resolution. The WV2 DSM has a spatial resolution of about 1 m, so seeds on smaller buildings are found. However, the consequence of high frequency noise present in the WV2 DSM ( Figure 6, grey) is that some local high points are not on the building roofs, but on the streets (Figure 2, right). The applied region growing is highly dependant on the used DSM. To define building regions more robust, other classification techniques should be considered. However, the main focus of this research is the boundary estimation in HSI images using spectral information, and not the classification of HSI images. The comparison between vertical profiles of all used DSMs and the adjusted building boundaries is shown in Figure 6. The seed points are selected from CartoSat DSM (blue), and the height of the adjusted building (red) is set to the highest point of this building in the CartoSat DSM. The shift between the profiles is due to the accuracy of georeferencing. The width of the estimated building boundaries in the profile is similar compared to the reference laser DSM (green). The numerical comparison of sizes, i.e. width and length of estimated buildings with the sizes in reference is given in Table 1. For both examples, the outbound of the buildings are estimated better than a pixel. The difference in the roof area between reference and approximated building boundaries is 4% ( Figure 4) and 1% ( Figure 5, left).

Approximated no condition with condition
Building (Figure 4)  The used LS adjustment is accurate, if redundancy of the measurements is high and observations are free of outliers. The sensitivity of LS adjustment to the low number of redundant measurements is seen on Figure 4 (left) where the north inner yard is rotated and shrinked (Table 1). Only three pixels were assigned to the west border of the north inner yard. In addition, when closely observing the spectral signatures of the mixed pixels on the border of this inner yard, their shape is significantly more similar to the roof pixels and not to the pavement material. We assume this is a consequence of the sensor or not removed atmospheric influences. The similar effect is present on the borders of the first building southeast of the approximated building ( Figure 5). The used spectral similarity SAM for weights has a benefit of being nearly illumination independent, but does not provide the real material abundances.

CONCLUSIONS AND FUTURE WORK
We developed an automatic method for sub-pixel building edge detection in HSI images using spectral and spatial information. The roofing materials in HSI images are defined solely with support of height data and without training samples or existing spectral libraries. The building edges are approximated with sub-pixel accuracy using weighted LS method. The weights are derived for each building separately, using spectral information of the mixed pixels on the building borders.
The first results of the proposed building boundary estimation method are promising. The relative accuracy of estimated building outbound for the shown examples is ¼ of a pixel, which corresponds to 1 m for the used HSI data. A further investigation and comparison of building boundary approximation should be made, defining weights with spectral unmixing, other similarity measures or soft classification techniques. Furthermore, the method should be tested on larger urban areas and extended to approximate more complex buildings, e.g. rectilinear or polygonal.
We expect to receive this year the HSI data from Hyspex sensor with 0.5 m spatial resolution. Thus, the potential of the proposed method is to estimate the building boundaries with accuracy comparable to laser DSM. Furthermore, the building roofs can be modelled automatically using combination of the HSI and DSM data.