BUILDING EDGE DETECTION USING SMALL-FOOTPRINT AIRBORNE FULL-WAVEFORM LIDAR DATA

The full-waveform lidar technology allows a complete access to the information related to the emitted and backscattered laser signals. Although most of the common applications of full-waveform lidar are currently dedicated to the study of forested areas, some recent studies have shown that airborne full-waveform data is relevant for urban area analysis. We extend the field to pattern recognition with a focus on retrieval. Our proposed approach combines two steps. In a first time, building edges are coarsely extracted. Then, a physical model based on the lidar equation is used to retrieve a more accurate position of the estimated edge than the size of the lidar footprint. Another consequence is the estimation of more accurate planimetric positions of the extracted echoes.


Context and objectives
Information related to the emitted and backscattered laser signals can be completely accessed with the full-waveform (FW) lidar technology.Such signals gather the contribution of one or several objects that have been hit by the laser beam.An offline post-processing step extracts maxima of the recorded signals.These maxima are called "echoes" and will be turned into 3D points thanks to a georeferencing process.Each of them corresponds to one or several objects closer than the sensor spatial resolution along the axis of the laser beam.A majority of the applications of full-waveform lidar deals with forested areas, but some recent works based on small-footprint data (diameter d < 1 m) have demonstrated the relevance of lidar waveforms in urban areas, mainly for land-cover classification (Mallet et al., 2011), and vegetation detection (Höfle et al., 2012).The literature has barely investigated the field of pattern recognition (Jutzi and Stilla, 2005b).Building edges, and more generally 3D lines, are patterns of high interest for many applications such as strip registration, building detection, change detection or database updating (Lee et al., 2007).For that purpose, small-footprint lidar waveforms may be relevant.Typical footprint size and digitization rate are around 0.5 m and 1 GHz : each echo within the waveform is likely to correspond to a specific target.Information from several objects will not be mixed since the distance between the object is large enough.This is all the more interesting when detecting building edges as, in case of high altimetric shift between the targets, echoes will not overlap, and particular contributions may be specifically analysed.Two issues are tackled in this paper.Firstly, we aim to detect building edges using both georeferenced 1D signals and extracted 3D points.Only linear structures are considered.Secondly, based on the assumption that building roofs are locally homogeneous in terms of geometry and radiometry, we use the knowledge of the coarse position of the edges to estimate the correct position of the 3D point within the lidar footprint, and, finally, re-estimate more precisely the 3D edge segments.In addition to the standard "waveform processing step" which mainly add information in the altimetric component of the 3D points, our work is thus also dedicated to the planimetric improvement of the points ex-tracted from the waveforms.This is particularly important when the lidar beam is not very focused.

Existing works
FW-based building edge detection has only be carried out in (Jutzi and Stilla, 2005a).However, plethora of papers exist when dealing with standard multiple-pulse lidar data.Two major kinds of approaches are possible: image and 3D-based approaches.Raster methods are mainly based on the analysis of Digital Surface Models (DSM) or normalized DSM (Rutzinger et al., 2009).Morphological filters, rank filters, robust hierarchical interpolation or gradient computation can be used to detect building edges.Vegetation areas are then commonly filtered by subtracting the last-echo DSM to the first-echo DSM.Classification can also be carried out by calculating the local curvature or variance of the normal vectors for each area of interest (Arefi, 2009).Furthermore, the intensity or amplitude value may also provide discriminant images for building edge detection.However, such information requires a calibration step (Höfle and Pfeifer, 2007), and has not yet been optimally used.Visible or Infra-red geospatial images are therefore often preferred as complementary information (Rottensteiner et al., 2005;Matikainen et al., 2007).The second possible approach is based on the geometrical study of the 3D point cloud.Linear features or significant vertical discontinuities can be directly detected in the point cloud (Sampath and Shan, 2007).Such analysis can be preceded and facilitated by a building focusing step, traditionally 3D point classification as in (Zhang et al., 2006;Poullis and You, 2011).Mesh-based methods such Zhou and Neumann (2010) may allow to compute accurate edges while loosing the semantic information.Only few specific methods are dedicated to the lidar signal analysis.The aim is to estimate a more accurate position of the edges than the size of the lidar footprint.Jutzi and Stilla (2005a) have initiated the field with airborne simulated data, and managed to improve edge location to one to seven tenth of a pixel according to the measurement noise.Their method is based on the fact that the building edge orientation cannot be estimated with a single waveform.For each waveform, the distance to the building edge is estimated.The 3D point positioned on the roof can be anywhere on a circle centered on the waveform line of sight.Several possible building edges are extracted and the ambiguity problem is solved thanks to the power of the first backscattered echo.3D point posi-ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume I-3, 2012 XXII ISPRS Congress, 25 August -01 September 2012, Melbourne, Australia tion improvement has only been tackled with real data for terrestrial datasets (Neilsen, 2011).In the proposed method, the lidar point cloud is corrected from the surface response, and particularly the 3D position is refined with the backscattered waveform.Unfortunately, there are major differences between aerial and terrestrial lidar issues.This method assumes all surface to be homogeneous within the lidar footprint, which may be inaccurate with airborne large scale data.The paper is organised as follows.The proposed strategy will be first presented as well as the available dataset.Then, each step of the workflow will be described and commented.Finally, results will be presented, and conclusions will be drawn.

Methodology
The proposed methodology is based on the assumption that buildings can be approximated by polygons, i.e., fragmented in segments.Given a set of lidar measurements, our method is decomposed into three steps: • Step 1: Segmentation of boundary regions.Firstly, the location of building areas is coarsely retrieved, in order to decompose the problem and offer parallelization opportunities.Main vegetated areas are removed and not examined in the following steps.Secondly, for each focus area, lidar waveforms are segmented in "boundary region" or not.The process is not limited to waveforms with multiple echoes. • Step 2: Initial boundary extraction and diagnostic.A coarse determination of the 3D locations of the segments is first performed.3D point clouds are first extracted from the waveforms.Then, 3D boundary segments are estimated with the standard RANSAC algorithm, adapted to deal with remaining vegetated areas and with focus areas composed of several buildings.Finally, segments are qualified, and vegetation points are filtered.Only the most reliable segments are fed into the adjustment process.
• Step 3 : Boundary adjustment.Based on a physical model, the positions of the 3D segments and support 3D points are improved, up to twice more accurately than the lidar footprint.

Data
Full-waveform data has been acquired over the city of Amiens, France, in February 2008, under leaf-off conditions (OPTECH 3100-EA sensor).The point density is 2 points/m 2 .We will focus our study on the city center, which corresponds to 5 million backscattered waveforms covering 0.7 km 2 .For each lidar pulse, both emitted (pulse width = 4.8 ns) and backscattered waveforms (1 GHz digitization rate), GPS time, position and attitude of the sensor are available.Therefore, waveforms can be georeferenced, as well as 3D subsequent points (see Section 4.1).The French national 2D topographic database is available in order to qualify the results.The building layer has a planimetric accuracy of around 1 m.A 0.5 m orthoimage is also used to visually assess the extraction process.
Due to the scan pattern, the 3D genuine spatial sampling of the data is not regular, which does not facilitate the local analysis, required in Section 3. Knowledge of sensor direction and position for each waveform allows to adopt a sensor topology strategy for neighbourhood computation (David et al., 2008).A georeferenced image is created where each pixel represents the conical region of the landscape illuminated by the laser beam.One line represents one scan line, whereas one column corresponds to one angle of incidence.The major advantage of this topology is its execution time for retrieving adjacent waveforms.

Acquisition of building edges
Depending on the angle of incidence of the laser beam and the position of the sensor, edges of a building will not be all similarly acquired.Two cases can be distinguished: • "Single echo" (SE): Edges corresponding to walls illuminated by the laser beam will create waveforms with a single echo (ground, facade or roof).
• "Multiple echo" (ME): The backscattered waveform is composed of two echoes : one on the roof, and one on the ground.
Our strategy mainly relies on the detection of "multiple echo" waveforms, but Step 1 is designed to retrieve all the roof waveforms close to the gutter heights.We consider that the other lidar strips may take measures on the building from the reverse point of view, thus offering the adequate complementary information for the full contour description.

Focus on boundary areas
The method is based of the computation of vertical discontinuities in 3D waveform data.For each waveform, if the vertical discontinuity is superior to a theoretical minimal building gutter height, hmin (set to 2 m to favour a high detection rate), it may lie on an edge.Computing such discontinuity as only the difference of elevation between the first and the last echoes of one waveform is not adequate, since it may discard many waveforms and provide noisy results.Therefore, a 4-connexity neighborhood V in the sensor topology is used to detect the two illumination cases.B is filled, a binary regular occupancy grid is generated with a resolution of 1 m.Since only a coarse segmentation is expected, this value is sufficient.The sparse grid is densified by applying a morphological closing (3×3 structuring element).Segments that may correspond to building areas are retrieved by connectivity constraint.This fast procedure allows to decompose the problem into local sub-issues.As illustrated in Figure 1, the approach is successful: all building edges are in a single focus area.However, one area may include several buildings, that will require specific constraints in the segment extraction step.One can note that some areas focus on vegetation: such segments will be removed afterwards.

Waveform segmentation
The goal of this part is to select waveforms from the p th focus area (w ∈ Fp) which really lying a building edge by favouring a high detection rate.Selecting waveforms with two echoes will result in segmenting both building edges, chimneys, and vegetation.The segment extraction may be highly corrupted by such false detections.
Two assumptions are made for this segmentation : the height (hi) and the gutter line elevation (Ei) are approximately constant by piece for the i th building edge.So, for each wall, waveforms have a first echo elevation of Ei and a difference of elevation between their first echoes and one of the echoes of their neighbouring waveforms of hi.Thus, one of the major difficulties is to compute Ei and hi.
An area can contain several buildings, and in case of sloped terrains, may have walls with different heights.Thus, an iterative procedure should be set up, starting with the most prominent wall until finding the lowest one.In one focus area, waveforms with two echoes and largest differences of elevation between the first and the last echoes are likely to be located on building edges.
Building height can then be computed among the most important differences of elevation.Since trees can be taller than buildings, the largest differences of elevation may sometimes be discarded.
A rank filter is applied on the candidate waveforms: the 70 th percentile (P70) is selected, and provides the hi value.Thus, for each waveform w ∈ Γi (Γi is the set of waveforms which lying the i th building edge), we have: Such value allows to select all multiple-echo waveforms that exhibit similar differences.A buffer χ around hi is introduced to gather all relevant waveforms.χ is set to 0.5 m, which corresponds to half of the maximum variation of the wall height for a building.Moreover, the median (Md) altitude of the first echo of these waveforms is a good approximation of the elevation of the tallest gutter line (Ei).
Wall height and gutter line altitude of the tallest building in the focus area are then known.Waveforms lying on this building can now be selected.A waveform w is labelled as "building edge" lying on Γi if: • Its first echo altitude is in the buffer γ around the gutter line elevation Ei. γ is set to 0.2 m, which is the addition of the spatial resolution of waveforms (0.15 m), and an altimetric tolerance on the gutter line elevation (0.05 m); • The difference of elevation between its first (or single) echo and one echo of the neighbouring waveforms (in the sensor topology, 4-connexity) is close to the wall height hi (±χ).
In practice, the procedure starts with the highest building.The corresponding waveforms are then removed, and the process iterates until no significant differences of elevations are found in one area.

3D point extraction
Once waveforms lying on building edges have been segmented, 3D segments can be computed.Two strategies are conceivable: processing the spatio-temporal data volume of the georeferenced waveforms as proposed in (Jutzi and Stilla, 2005b) or the 3D points extracted from the waveforms.In order to deal with large scale issues, the second possibility is selected.For that purpose, the Gaussian decomposition is performed on the waveform (Mallet and Bretar, 2009), and only the first detected echo is preserved.
Figure 2 shows the location of the extracted 3D points for an area of interest.One can notice that, building edges are well described but also that many points remain on trees.Since the following steps will deal with this problem, the segmentation step is considered as satisfactory: a high true positive rate is achieved for "building edge" points.These "building edge" points are evaluated with the available ground truth.Around 50% of the extracted 3D points are located on building edges (Figure 2).
Figure 2: First echoes extracted from waveforms labelled as "building edge" in Section 3.2.Yellow (resp.red) points are located on building (resp.vegetation) areas, with respect to the ground truth.

Segment extraction
In order to fit segments on the 3D points, a robust algorithm is required.These 3D points are positioned in the center of the 0.8 m lidar footprint diameter.Furthermore, for various configurations and objects (balconies, antennas etc.), multiple echoes are outliers and may appear.The iterative RANdom SAmple Consensus (RANSAC) algorithm is appropriate to achieve this step (Fischler and Bolles, 1981).However, it has to been adapted to cope with two main issues: points located in vegetation, and separated but aligned buildings within a single focus area.For the latter case, a single segment would be extracted for two distinct buildings.Consequently, two constraints are introduced.3D points have to fulfil both criteria to yet be considered as inliers.(cf. Figure 3, red segments).
• Orientation consistency of lines.For a given building edge, the roof is always in the same side i.e., the vertical discontinuity has always the same orientation (Poullis and You, 2011).The orientation of the building edge is thus constrained to be in the same direction along the segment.A tolerance of ±45 • is sufficient to remove false detections in vegetation.(false positive rate: 59% → 34%).
• A maximum horizontal range between two successive points projected on the extracted line.It is set to 2 m to prevent the extraction of a unique segment on both sides of a street.

Boundary diagnostic
The final adjustment procedure requires a correct initialization.Furthermore, some remaining segments are still lying on vegetated areas, and should be removed.Consequently, the boundary diagnostic is decomposed into two steps.Firstly, segments in vegetated areas are removed, and then filtered segments are diagnosed (geometrical accuracy and reliability).The segments in vegetation areas are filtered by using two criteria: • Planimetric scattering: Vegetated areas are known to generate backscattered lidar signals with a larger width than the width of the emitted pulse (here 4.8 ns).Indeed, several objects closer than the sensor range resolution will contribute to the same echo.Such echo will be wider because of the spatial extent of these targets.This is particularly true for the first echo which corresponds to the tree canopy with many material.Conversely, if the waveform hits a building edge, a single object will contribute to the first echo but the proportion of roof within the footprint is not sufficient to notice a big influence on the echo width (even if the roof is inclined).Thus, roof echoes are narrower.Consequently, segments where RANSAC inliers were extracted from waveform which have an average width superior to a threshold of 5.5 ns.Such value has been retrieved using a study of waveform widths in vegetation areas.
• Volumetric scattering: This criterion is point density-based.Vegetation generates more 3D points, that are in addition spatially scattered.A segment is labelled as "volumetric" whether : Where N3q and Nq are the numbers of 3D points located in a buffer respectively of 3q and q meters around the segment.This criterion is similar to (Höfle et al., 2012).For building areas, point density does not vary much while considering increasing areas of interest: N3q Nq.In this paper q=1 m so 3q is lesser than street width.The 40% increase reflects the ability of lidar beams to penetrate vegetated areas, and thus generating multiple scatterings.The buffer size has to be empirically adapted to the lidar footprint and the increasing rate needs to be adjusted to the point cloud density.

TRUE FALSE TRUE
88 12 FALSE 7 93 Table 1: Confusion matrix for segments corresponding to building edges.
A segment is considered as "planar" if it fulfils these two criterion.Non-planar segments are removed.As illustrated wit the Table 1 and on Figure 3, the global filtering step permits to efficiently remove vegetated segments.The ratio of segments in vegetation areas decreases from 34 % to 7%.Few isolated residual false detections are still noticeable, but we consider that an optional regularization step that turns segments to polygonal areas, posterior to our workflow, is likely to remove these errors.
The final quality of the extracted segments can be estimated with two criterion : • Geometrical accuracy: The Root Mean Square (RMS) value is used.An segment is "accurate" when: RMS < 0.5 d, where d is the lidar footprint.
Figure 3: Vegetation filtering.Red segments are discarded using orientation criterion (RANSAC procedure), blue segments with planar criterion.Yellow segments are can be diagnosed.
• Reliability: A segment with a point density higher than 0.5 inlier/m is considered as "reliable".
The segment adjustment step is only processing accurate and reliable segments.

Principle
This step is inspired from the theoretical inversion on simulated data of Jutzi and Stilla (2005a).3D extracted points have the same order of accuracy than the lidar footprint radius.The Gaussian decomposition permits to retrieve a correct vertical accuracy but their horizontal position need to be adjusted.For that purpose, the lidar equation (Wagner, 2010) is used and inverted to exactly find their position.A Lambertian assumption allows a clear simplification of this equation but is only valid for echoes lying on building roofs : with Ccal is a calibration constant, R is the range between the sensor and the target, α is the angle between the local normal of the roof and the direction of incidence of the laser beam, ξ is the footprint area, Pe and Pr are the emitted and received power.K gathers other constant values.One can notes that area of the target can't be approximated by a disk for waveforms which lying the building edge, as illustrated on Figure 5 .
Roof are locally planar and radiometrically homogeneous inside the lidar footprint, but this is not the case for ground and vegetation echoes.Only first echoes are considered to adjust a segment.Our proposal method consists to iteratively : • Use the knowledge of segment orientation to adjust its points (inliers).• Estimate a new segment (and therefore a new orientation).
The proposed approach is a numerical Expectation-Maximisation algorithm, as illustrated in Figure 4.

Initialisation
For the j th accurate and reliable segment (Section 4.3), the initial coarse orientation (θ 0 j ) and position (r 0 j ) are already estimated in Section 4.2.The normal N of each part of the roof edge can be extracted from the 3D point cloud using RANSAC.The first set Initialisation ?Expectation ? of points Θ 0 j is composed of the 3D points extracted from the first echo of waveforms lying the building edge.The local reflectance can be interpolated (up to a calibration factor) from the one echo waveforms of the neighbourhood: with sw and pw are, respectively, the amplitude and the width of the echo.WSE is the set of waveforms with a single echo.
K j can be estimated with one echo waveforms, lying building roofs using: (8)

Expectation
The goal of this step is to adjust the horizontal position of points extracted from multiple-echo waveforms and located on roof edges.In a first time, the ellipsis resulting from the intersection between the laser beam and the roof plane is computed.This requires the knowledge of N , the beam divergence β, the position and orientation of the sensor, and the standard deviation of the beam σR.σR, in case of spatial Gaussian beam, is the distance between the center of the line of sight and the roof, where the amplitude is divided by e 2 (σR = R(2 ln 2) −1/2 ).The ellipsis is then spatially sampled (A(τ ) = 1 cm 2 ).An angular decomposition is performed in order to have a regular mesh in the orthogonal direction of propagation.A Kronecker test δ(τ ), based on orientation, is run for each cell τ in order to know whether it belongs to the roof.In a second time, the laser beam is temporally sampled (1 GHz).
For a fixed time t, the intersection between each discretized ray and the roof is computed.Then, the associated transmitted incident power for each cell can be calculated: where d is the orthogonal distance between the laser beam axis and the intersection between the discretized ray and the roof, c(R) is the amplitude diminution factor due to the distance R between the sensor and the roof.For each discretized ray integration, the associated backscattered power Pr th can be calculated by using the simplified lidar equation (Mallet and Bretar, 2009): where K is a constant which depends on the wavelength and the aperture of the laser.By an integration over time, the backscattered power of each laser beam is known.
With the fixed orientation, θ k−1 j , different positions of the building edge are simulated for the given ellipsis position.Calculated backscattered powers Pr th are compared to the measured backscattered power Pr.The position of the good simulated building edge is obtained when the sign of (Pr th − Pr) change.The adjusted point is positioned in the orthogonal position to the laser beam axis on the good simulated position of the building edge.

Maximisation
Adjusted segment can be computed on the set of adjusted points with a least-square algorithm (Θj).Then, the new orientation θ k j and position are r k j derived.Such value can be used to iterate on the 3D point positions.The algorithm stops when the change of the orientation is negligible (η = 5 o ).As illustrated on Figure 6, the adjusted segment has a better accuracy (in average the RMS decrease by 0.15 m).

RESULTS
Segment extraction before adjustment provides very exhaustive results.A comparison with a topographic database shows that the major part (87 %) of building edges are detected.Extracted segments are positioned on edges between building and street.False detection rate is therefore very low (12 %), corresponding to few isolated segments inside vegetation.These segments will be deleted if a polygonalization algorithm (merging segments to obtain closed polylines) is performed.Furthermore, borders that separate two connected buildings are not detected because the vertical distance between each side of these edges is not important enough.Segment adjustment results are less exhaustive (about 5%).The major parts of segments cannot be adjusted because the one material planar roof is a too restrictive assumption.However, when the procedure is possible, the extraction is improved: the RMS increases of 0.15 m, which is satisfactory with respect to the footprint radius (0.4 m). Figure 7 provides an overview of the full process applied to the whole dataset.One can notice that once results from different strips are merged, building are correctly outlined.

CONCLUSION
Building edge detection using airborne full-waveform lidar is a problem which have never been addressed with real data before.The approach described in this paper proposes a solution providing reliable boundaries segments with very low false positive rate.We have tried to benefit from the particular specification of the acquisition process with 0.8 m footprint size, delivering a large number of waveforms with multiple and well separated echoes.The algorithm is decomposed into several independent parts that can be easily tuned to fit to the specification of other surveys.The main concern of the paper was to provide fast methods to be able to efficiently process large amount of data.The first limitation of the proposed approach is that only segments are considered.However, other primitives (such as circles, curves) may be inserted for specific landscapes.One just has to add a model selection step within the RANSAC procedure.Furthermore, the adjustment step is limited because of the too strict roof modelling.Future work focusing therefore in improving the roof reflection model, by providing a more complete mathematical formulation.This would increase the rate of adjusted primitives, and being quantitatively assessed with simulated datasets.Finally, it would be interest to add a final step that would turn the set of adjusted primitives into polygons to obtain geometric forms similar to cadastral data.
Figure 1: Results on the focusing step : one color per area.

Figure 5 :
Figure 5: Illustration of the iterative segment adjustment algorithm.

Figure 7 :
Figure 7: Extracted feature before vegetation filtering : one colour per strip.