TOWARDS A SEMANTIC INTERPRETATION OF URBAN AREAS WITH AIRBORNE SYNTHETIC APERTURE RADAR TOMOGRAPHY

In this paper, we introduce a method to detect and reconstruct building parts from tomographic Synthetic Aperture Radar (SAR) airborne data. Our approach extends recent works in two ways: first, the radiometric information is used to guide the extraction of geometric primitives. Second, building facades and roofs are extracted thanks to geometric classification rules. We demonstrate our method on a 3 image L-Band airborne dataset over the city of Dresden, Germany. Experiments show how our technique allows to use the complementarity between the radiometric image and the tomographic point cloud to extract buildings parts in challenging situations.


INTRODUCTION
A Synthetic Aperture Radar (SAR) is a powerful Earth observation tool.By sending and receiving pulses in the microwave region of the electromagnetic spectrum, it allows to retrieve information related to the physical and geometrical backscattering properties of objects.Being an active system, it is operational independently of the weather and illumination conditions, unlike optical sensors.However, due to the side-looking acquisition geometry, interpretation of SAR images is more difficult than optical ones.A SAR image is a 2D projection of scattering phenomena occurring in the 3D space.The position of an object in the image is due to the time delay between the sent pulse and the received echo, which depends on its distance to the sensor.Therefore, echoes due to objects located at the same distance of the antenna will be superposed in the same pixel.This effect is commonly called layover (Gini et al., 2002).
Fortunately, techniques employing several images can solve the layover problem and localize several scatterers in the 3D space.In the case of 2 images, SAR interferometry allows to relate phase differences to heights but is limited to only one scattering phase center (Bamler and Hartl, 1998).SAR tomography extends this approach to several scatterers by using 3 images or more.With this technique, applications such as imaging of urban areas (Sauer et al., 2011) as well as retrieval of the vertical structure of forested areas (Tebaldini, 2010) have arisen.
Very few works have addressed information retrieval from the 3D points obtained with tomographic SAR.First attempts to automatically interpret such data are geometric primitive extraction from airborne data (D'Hondt et al., 2013a) and facade extraction from very high resolution spaceborne data (Shahzad and Zhu, 2015).
In this paper, we introduce a technique to detect and reconstruct building parts from tomographic SAR data.Unlike the work of (Shahzad and Zhu, 2015) which is suitable for very high resolution data where point-like scatterers are assumed, our method is designed for moderate resolution (of the order of a few meters) airborne images.It uses tomographic heights but also incorporates radiometric information from the reflectivity images and exploits pixel connectivity in the slant range geometry.
The type of data we consider is affected by speckle, due to the interaction between the electromagnetic wave and random media (Goodman, 1984).Decorrelation effects between images may also occur.For these reasons an incoherent target model has to be assumed (Huang et al., 2012).In this case, the information of interest for TomoSAR is the covariance matrix which has to be estimated at each pixel position.
Our method relies on a planar primitive extraction similar to the one of (D'Hondt et al., 2013a) but improves the approach in two aspects.First, the segmentation of the tomographic height map is guided by the radiometric reflectivity information from the images.Adaptive filtering of covariance and pre-segmentation of the radiometric images allow this improvement.Second, a classification step of the obtained geometric primitives follows, to recognize building parts.This last step relies on geometric rules to identify facades and roofs.A visual summary of the main steps of our technique is shown on Figure 1. Figure 2 shows a more detailed scheme of the method.
The paper is organized as follows: Section 2. presents the dataset and the area we selected to test our method.Section 3. describes the principles of SAR tomography.Section 4. introduces our building extraction method.Section 5. shows the results of our experiments on the selected dataset.Finally, conclusion and perspectives are exposed in Section 6.

DATASET: AIRBORNE L-BAND E-SAR IMAGES
Three SAR images have been acquired in L-Band over the city of Dresden, Germany, by the E-SAR airborne sensor of the German Aerospace Center (DLR) in 2000.The resolutions are 3 m in azimuth and 2.2 m in range.We have cropped an area of interest of approximate size 700 ⇥ 1000 pixels that contains a wide variety of buildings with different dimensions and orientations.Figure 3 shows the SAR intensity image of the cropped area and an optical image of roughly the same area (due to the geometric  distortions in the SAR image, the crops cannot correspond perfectly).Many vegetated areas can be seen on the corresponding optical image and the presence of trees between buildings can be noted.At this frequency band, tomographic imaging of buildings and ground through the canopy is possible due to the penetration of the microwaves.However, due to the low number of images, vegetation makes the tomographic imaging of such buildings difficult.Moreover, geometric distortions due to the acquisition geometry and the presence of speckle corrupting the intensity make a semantic interpretation challenging.

Acquisition Geometry and Layover
A SAR image is acquired thanks to a side-looking antenna located on a moving platform such as a plane or a satellite.The  Figure 4: Illustration of tomographic SAR principle: due to the acquisition geometry of a SAR sensor, the backscattering of several targets may be superposed in the same pixel.This effect is called layover.Moreover, the altitudes are lost.To separate and localize these scatterers in the 3D space, SAR tomography combines several images acquired with slightly different incidence angles.
between the imaged objects, called targets, and the sent pulse.These echoes are then combined to form an image in a stage called SAR focusing.Description of this procedure is beyond the scope of this paper and can be found in several publications, such as (Oliver and Quegan, 1998).SAR produces 2D complex images where the pixel amplitude is due to the power backscattered to the sensor and the phase mainly depends on the distance to the object.
The image geometry is described by two coordinates, the azimuth along the direction of flight and the slant range which is the lineof-sight direction.Considering the platform at a particular azimuth position, the received signal is time-gated in order to separate the echoes of the different targets into pixels according to their corresponding time delays.Because the wave travels at the known speed of light, there is a straightforward relationship between time delay and slant range coordinate in the SAR image.
Hence, objects are projected from the 3D space to the azimuth/ slant range coordinate system of the SAR.Therefore, the value of a pixel corresponds to the coherent sum of echoes from scatterers located within a given slant range interval, as shown in the idealized drawing of Figure 4.As depicted, if a non-flat structure is present, several targets with distinct 3D positions may have their contributions added into the same pixel.This effect is called layover.
In order to separate such contributions, SAR tomography uses several images of the same scene under slightly different view angles.The following section briefly explains how using an appropriate signal model and spectral methods allows to localize the 3D positions of targets.
Furthermore, multiple bounce scattering between ground, facade, tree trunks, etc may occur.These affect the phase and may cause wrongly located points in the tomographic point clouds.Double bounce scattering between ground and facade are typically characterized by bright lines in the image.Besides, due to the side looking geometry, building and ground parts may be located in the shadows of other structures, which are seen as dark areas in the image.Examples of such phenomena are illustrated in Figure 5.

Signal Model
We consider K SAR images of the same scene acquired with slightly different angles.Assuming all the images have been coregistered to the geometry of one of them, called master, we have a tomographic image stack with K-dimensional complex pixels.In SAR images with a resolution of the order of meters, a target may contain many elementary scatterers due to the randomness of media.This results in a phenomenon called speckle due to constructive and destructive interference between all elementary scatterers, which can be modelled as a multiplicative noise (Goodman, 1984).Consequently, for the target vector k containing the K complex values of a pixel in the tomographic stack we assume the following signal model (Lombardini et al., 2003): where represents the Hadamard product, Ns is the number of targets, hi and ⌧i are the height and reflectivity of target i, xi is speckle vector of density CN (0, ⌃x i ), n is a Gaussian additive noise vector of density CN (0, 2 n I) and a is called the steering vector.Its elements are a l = exp ( j 4⇡ d l (hi)) and contain phase information related to target i and its distance d l (hi) to sensor l, where is the wavelength.The covariance ⌃x i has diagonal elements equal to 1 and off-diagonal elements which depend on decorrelation effects.

Sample Covariance Matrix
First, standard steps such as de-ramping and phase calibration, to compensate possible errors on antenna location, are performed on the image stack.More details can be found in (Reigber and Moreira, 2000).From this signal, SAR tomography's goal is to retrieve the vertical positions of the targets.This problem can be recast as a Direction-Of-Arrival problem, solved by standard spectral estimation procedures such as Beamforming, Capon and MUSIC (Stoica and Moses, 1997).These methods all require the estimation of the covariance matrix ⌃ = E(kk H ).This matrix being unknown, it has to be estimated from the data by performing an average over L i.i.d.samples (i.e.pixels): where j denotes the pixel index.This estimate can be obtained by applying a moving average window or pre-summing, at the cost of a spatial resolution loss.

Tomographic Processing
In this work we choose to use the MUSIC method assuming one scatterer per pixel (MUSIC-1).
In (Sauer et al., 2011), it is shown to give the best results on this data for single polarization when compared to other approaches.
The MUSIC method consists in maximizing the pseudo-power: where G is the matrix of covariance eigenvectors which span the noise subspace and a(h) is the steering vector computed at height h.This pseudo-power is called tomogram and its Ns local maxima give the wanted positions of the scatterers.
MUSIC-1 allows to extract for each pixel the height of the dominant scattering mechanism which leads to a 2D height map with the same dimensions as the image.Once the target heights are known, the points may be reprojected from the SAR geometry into the original ground geometry (x, y, z), leading to a 3D point cloud.In the following we refer to the terms height map when the points are in the SAR azimuth/slant range geometry and point cloud when the points have been re-projected in the original geometry.

Adaptive Covariance Estimation
Estimation of the covariance matrix by equation ( 2) leads to a trade-off between variance reduction and degradation of spatial resolution.If a high number of samples is used, edges between objects are blurred which may cause problems for the tomographic inversion and object detection.In order to mitigate this effect, we apply an adaptive procedure to improve the estimation of the covariance matrix without degrading edges.This filter was originally developed for polarimetric SAR data (D'Hondt et al., 2013b) and is based on the well-known bilateral filter.Applying this filtering for covariance estimation has two advantages.First, it reduces speckle on intensity which is useful for our radiometry guided segmentation.Second, it improves the quality of the phase used in tomographic inversion, leading to a better quality height map.
Figure 6 shows an example comparing the original intensity and the despeckled one obtained after applying the bilateral approach.
Figure 7 shows how filtering the data prior to tomographic processing improves the quality of the 3D point cloud.
Note that a pre-estimation of the covariance matrix has first to be provided according to equation (2) on very small number of samples, which is a common practice in SAR image processing.
In this work we perform a pre-summing of 3 along the azimuth coordinate.

Segmentation Driven Region Growing
Our building extraction method has been motivated by a few observations: • Many buildings in the SAR image appear as bright structures.This is due to the fact that the backscattering of large facades facing the sensor tend to dominate the overall response in layover areas.For these pixels, the estimated tomographic heights are those of facades as MUSIC-1 estimates the height of the dominant scattering mechanism.• Roofs of buildings which are not facing the sensor are well visible in the height maps.However, their reflectivity is low and they cannot easily be detected in the radiometric image. • The assumption of planar patches to decompose building roofs into geometric primitives is made.Due to the large distance between the sensor and the imaged objects, the projection of planar facades into the SAR geometry will also be well approximated by planes.Due to a smooth topography, the ground may also be approximated as locally planar.
• Most buildings have an elongated shape and rectilinear edges.
Example of facades and roofs seen in intensity image and tomographic height map are shown in Figure 8.
Based on these observations, we have developed a building detection method exploiting the complementarity between the radiometric image and the purely 3D geometric information.Operations are carried on in the SAR geometry: incoherent tomographic processing leads to dense height maps where the natural image connectivity of pixels can be used for segmentation.
In (D'Hondt et al., 2013a) a region growing procedure was introduced to extract the planar parts of buildings and ground.The idea is to collect pixels in a window and compute at each position plane parameters of type h(i, j) = ai + bj + c where i and j are 2D pixel indices, and h is the measured height at this position.Assuming the heights are affected by white noise, defines the standard deviation of the points around the estimated plane.
Then the patch with lowest is retained to start growing a region.Points connected to the patch are progressively added if their distance to the fitted plane is less than a threshold (which was set to 3.5 in the cited work).Every time the area of the region doubles, the plane parameters and are updated.When no pixels can be added to the region, its corresponding subset is removed from the data and the whole procedure is repeated until the dataset contains less points than a user defined number (this number corresponds to the expected proportion of outliers, i.e. points that do not fit any plane).
In this work, we extend the previous approach by using the radiometric image information to guide the region growing procedure.This allows a significant improvement in the quality of the extracted regions.We first perform a simple pre-segmentation on the filtered image: the intensity image (the trace of the filtered covariance matrix) is computed and thresholded in order to separate high reflectivity areas, which are likely to contain building facades, from low reflectivity ones.Then we apply independently the region growing procedure to these two pixel subsets.
Prior to applying region growing to segmented high reflectivity regions, we eliminate regions that do not have rectilinear edges by applying the LSD (Line Segment Detector) to the intensity image (von Gioi et al., 2010).All connected components that do not intersect with a detected segment are discarded.The stages of our pre-segmentation procedure are illustrated in Figure 9.
Unfortunately, this procedure cannot be used for the subset of low reflectivity pixels since these structures do not have strong edges in the intensity image.Therefore, region shapes are examined in the classification step of the next paragraph to decide if they can be identified as building parts.

Identification of Building Parts
The region growing approach allows to identify planar primitives in the image.They potentially belong to roofs, facades and ground.Roofs can simply be distinguished from ground by their height.This is however not possible for facades.Therefore, we first identify facades by using the fact that they should be vertical when projected in the ground geometry.Hence, we compute the normal vector for each projected primitive and classify as facades the segments for which the absolute value of the z component is lower than a threshold.
If a segment has been identified as a facade, it is added to the set of detected building parts.If not, we classify the segment as a roof if its mean height is higher than a threshold.For this data, the ground component is almost flat and roofs can be extracted this way.The approach could be extended in future works to terrains with stronger slope by identifying the ground components (as for example the components with the highest number of points) and computing the distance between a primitive and the plane corresponding to the closest ground component.
This classification step is applied to both high and low reflectivity primitives.The result of detection is finally the union of both classifications.
Additionally, because segments belonging to the low reflectivity subset have not been extracted with line segment information, we retain only regions with an elongated shape.For this, the eccentricity of the ellipse having the same second order moments as the region is computed and compared to a threshold.

EXPERIMENTS
We have applied our approach to the dataset presented in Section 2. Figure 10 shows the intensity image of the filtered covariances and the tomographic height map extracted for the whole area.The intensity is obtained by computing the trace of the covariance matrix for each pixel.For a better visualization of the height map, we have masked the points with very low MUSIC pseudo-power (they mainly belong to the river and building shadows).Extracting information of such data is challenging due to the low number of images and the moderate spatial resolution.The height map contains artifacts which may be due to the lack of precision in the flight path coordinates, decorrelation effects affecting the phase quality, occlusion of buildings by vegetation and multiple bounce scattering.
Figure 11 shows the segmentation labels obtained with our method.
For the classification step, we used thresholds of 0.3 on the absolute z component of the plane normals and 20 meters above the data's minimum height for the roof detection.For low reflectivity regions, we experimentally set the threshold on eccentricity to 0.92.Unfortunately reference data was not available at the time of the experiment.It is however possible to visually assess our detection by comparing the results with the optical image, height map and SAR intensity image.
It is interesting to note that most segments extracted in the low reflectivity areas have an orientation that corresponds to buildings not facing the sensor.These segments mainly correspond to roofs that are not well visible in the intensity image.
It can be observed that the facades which have been correctly reconstructed by the tomographic processing are successfully extracted by our method.Furthermore, the algorithm results in a significant improvement of the facade reconstruction thanks to the re-projection of points on their respective plane, as shown in Figure 12.
Figure 13 shows the outputs of the different stages of our method on a zoomed area: pre-segmentation, region growing over the   ground and vegetated areas are discarded from the final segmentation.The region growing segmentation of the low reflectivity regions is of lower quality due to the absence of image edge information to guide the procedure.However, several roofs that are not well visible in the radiometric image but appear in the tomographic height map are successfully segmented by the method.
For comparison, Figure 14 shows the result of region growing without pre-segmentation.This corresponds to the method pre-sented in (D'Hondt et al., 2013a).In this case, several buildings are not well separated from the ground.
Finally, Figure 15 shows the result of the extraction method on the area presented in Figure 8 and the corresponding 3D point cloud of the reconstructed building parts.Here, the union of classification over high and low reflectivity subsets is shown.

Limitations
Our method is suited for buildings that either have a large facade facing the sensor or high roofs with a strongly anisotropic footprint.In the case of buildings with a non-rectilinear footprint or roofs below the vegetation level, detection will fail due to the use of thresholds which are set globally.Our method also suffers from the data resolution and quality: pixels belonging to small buildings (e.g.houses) which are surrounded by vegetation are likely to contain volume scattering from the canopy and multiple reflections due to tree trunks.With this low number of images, the tomographic resolution of many scatterers is not possible.For this reason, some small buildings will not even appear in the height map.
The reconstruction of 3D planes also depends on the data quality: some facades may appear slanted due to the low number of points and be possibly wrongly located due to multiple bounce scattering effects.

CONCLUSION
In this work we have proposed an approach to segment and reconstruct building parts in tomographic SAR data which is suited for airborne data with moderate resolution.Unlike previous approaches, we have considered points in the original image geometry.This allowed us to exploit edge information from the radiometric image in order to guide the segmentation of 3D planar regions.This edge information was incorporated through a presegmentation based on image reflectivity.The segmentation was refined by the use of a line segment detection in the radiometric image.In the height map, planar primitive identification was performed by a region growing approach exploiting the natural connectivity of pixels in the image geometry.The obtained primitives were then classified thanks to geometric tests to determine if they were building parts.
This approach was applied to experimental data and has allowed to extract elongated buildings.In the high reflectivity areas, most of the reconstructed structures are facades, whereas roofs are mostly present in the low reflectivity areas.
The obtained results are encouraging.Future works will consider the use of energy based methods to combine different the different types of information in a single segmentation step as well as regularization by Markov Random Fields.Detection of nonrectilinear and non-elongated buildings will also be addressed.This could be done by including more radiometric and geometric features in a machine learning based approach.Discrimination of vegetated areas could also be considered by incorporating texture information.

Figure 1 :
Figure 1: Visual illustration of the main steps in our method.

Figure 2 :
Figure 2: Principle of our building extraction method.

Figure 3 :
Figure 3: (a) SAR and (b) optical images of the selected area.Dimensions of the cropped SAR image are approximately of 700 pixels in azimuth and 1000 pixels in range.The optical image is from c Microsoft Bing Maps, 2015 antenna emits an electromagnetic pulse that illuminates a terrain patch and receives the backscattered signal due to the interaction

Figure 5 :
Figure 5: Example of other artifacts in SAR data.Multiple bounce and shadow occur (here we show the example of double bounce between a ground and a facade).

Figure 6 :Figure 7 :
Figure 6: Comparison of (a) original intensity image with speckle and (b) filtered image thanks to adaptive covariance estimation.The trace of the covariance matrix is displayed.

Figure 8 :
Figure 8: Comparison of roof and facades as they appear in (a) the intensity image and (b) the height maps obtained with tomography.Facade appear as bright structures in the radiometric image and heights ramps in the point cloud.Roofs have low amplitude in the image but are well visible in the height map.

Figure 9 :
Figure 9: Illustration of image pre-segmentation: (a) detected lines (in red) superimposed with the radiometric image, (b) segmentation by thresholding intensity, (c) segmentation with line segment information.

Figure 11 :
Figure 10: (a) Filtered covariance (intensity image) and (b) tomographic height map for the whole area of interest.

Figure 12 :Figure 13 :Figure 14 :
Figure 12: Examples of facade reconstructions.(a) and (c) show the original points, (b) and (d) show the reconstructed points after applying our extraction method.We have manually outlined the facades in the original point cloud for a better visualization.
Figure 14: Results of region growing where no pre-segmentation is used.Black rectangles show examples of bad segmentation that can be compared to results of Figures 13 and 15.

Figure 15 :
Figure 15: Segmentation result zoomed over the roof and facade areas of Figure 8.(a) Final segments (combined view of both high and low reflectivity areas).(b) 3D reconstructions of the corresponding building parts reprojected in the (x, y, z) geometry.