FACADE RECONSTRUCTION WITH GENERALIZED 2 . 5 D GRIDS

Reconstructing fine facade geometry from MMS lidar data remains a challenge: In addition to being inherently sparse, the point cloud provided by a single street point of view is necessarily incomplete. We propose a simple framework to estimate the facade surface with a deformable 2.5d grid. Computations are performed in a ”sensor-oriented” coordinate system that maximizes consistency with the data. the algorithm allows to retrieve the facade geometry without priori knowledge. It can thus be automatically applied to a large amount of data in spite of the variability of encountered architectural forms. The 2.5d image structure of the output makes it compatible with storage and real-time constraints of immersive navigation.


Motivations
We aim to automatically reconstruct a fine photorealistic facade model (lod3).In this paper, we focus on the geometrical modeling.The geometry is extracted from lidar data and has to be in accordance with the optical images acquired in the same time and used for texture mapping.Apart from being photo-consistent, the model has also to be sufficiently compact for scaling up.

Related Work
Facade Modeling is a vastly studied topic (Musialski et al., 2012b).Numerous articles offer to reconstruct the facades using knowledge based approaches, looking for semantic objects such as windows (Pu and Vosselman, 2007), (Wang et al., 2011), assuming they are regularly arranged in a grid structure (Müller et al., 2007).The structure information of facades is considered as necessary for a good reconstruction (Becker and Haala, 2009).Indeed, the search for well-defined objects (size, shape ...) is not obvious in point cloud data (PCD).Grammar rules can be used to palliate the lack of information.However the apparent simplicity of the facade structures, and the underlying grammars is actually complex to automatically detect, to the extent that an operator is deemed necessary to get the desired results (Musialski et al., 2012a).
Other methods use more general assumptions like the predominance of right angles.Chauve et al. (2010) proposes to build a 3d model from images by an arrangements of planes, adding horizontal or vertical planes where there is an absence of information.Although such approaches are adapted to human constructions, objects that do not fit the model cause strange artifacts (Furukawa et al., 2009).
Another line of research is the detection of repetitions and symmetries (Pauly et al., 2008).The repetitions that are not necessarily sequenced according to a grid structure (Shen et al., 2011) may form a kind of low-level grammar.Detecting repetitions require to deal with the uneven sampling of the ranks scan (Friedman and Stamos, 2011).Density varies depending on the distance of echoes to the sensor, but also on occluders.Facades are partially hidden, and density decreases toward the top of the buildings.Instead of replacing missing parts by flat or smooth surfaces, some papers offer to "consolidate" and fill the holes thanks to the repetitive structures (Zheng et al., 2010), (Nan et al., 2010).But here again, the detection of repetitive elements in PCD is complex and requires manual assistance (Li et al., 2011).
Although such knowledge based approaches are promising, Detecting facade structures and grammars remains difficult, as evidenced by the need for human intervention.In addition, such methods cannot handle old style facades, complex ornaments, and any unexpected object which does not match the model.Maybe hybrid reconstructions (surface/shape) are more appropriate to these contexts and could help the facade structure to emerge (Lafarge et al., 2013).
We opted for a low level approach, consistent as possible with the data.Indeed, we believe that a processing step is missing in facade modeling.Trying to detect structure into PCD imply to manage the density variation that can both depend on underlying objects and acquisition context (fig: 1).A more "acquisition independent" data is needed.A surface may fulfill this role.Many approaches are looking for the facade surface (Deschaud, 2010), but start with an irregular mesh and fill the holes in a second step.Our approach assumes the presence of a continuous surface that is iteratively adjusted toward the 3d points.
Carving approaches use the information provided by the lidar sensor.Even without knowing the position of the source (Shalom ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume II-5/W2, 2013ISPRS Workshop Laser Scanning 2013, 11 -13 November 2013, Antalya, Turkey et al., 2010), it is possible to constrain the surface reconstruction by estimating the angle of view of the sensor.Surfaces thus generated (Hadsell et al., 2009) ensure consistency with the acquired data.Rather than a purely carving method, we propose to adopt the sensor point of view.Namely, we place ourself in a coordinate system adapted to the acquisition method.Using such a coordinate system allows efficient and accurate results, as for the calculation of normal (Badino et al., 2011).The benefits and challenges of using the acquisition coordinate system are well shown by Frueh et al. (2005) in particular, they explain the reversal scan order that causes topological problems when trying to mesh points according to acquisition order.Their choice is to work with depth images from path segments that contain no reversal order.In section 2.1, we will discuss the use of such depth images where each pixel represents a lidar point (fig: 3).We also propose to reconstruct a 2.5d grid that leverages the acquisition geometry and overrides the topological problems, but our approach allows more freedom because one pixel can contain zero, one or several points, and the grid can be computed from any path segment.A 2.5d grid is fitted to the geometrical primitives.In the 2.5d grid coordinate system, any 3d point is defined by u and v that locate the point on the grid, and h that is the depth from the u-v plane.

Overview
The main steps and the inputs of the algorithm are depicted thereafter and in figure 2. Initialization.A vertical rectangle approximating the main facade wall is calculated as in (Demantké et al., 2012).It provides the u-v plane and the grid boundaries.The 2.5d grid is placed on this rectangle and the depth of each grid pixel is set to zero.
Primitive accumulation in the grid pixels.The primitives (lidar points, triangles or line segments) are projected onto the u-v plane in order to accumulate the primitive depths h into the grid pixels: for any kind of primitive, whenever the intersection between a primitive and a pixel is not empty, the primitive depth is accumulated in the pixel.
Iterative deformation of the grid.The grid is then iteratively deformed to draw near the primitives.An optimal h is searched for each pixel according to the depths of the primitives that are projected in, and according the neighboring pixel depths.Only the h coordinates are changed, the pixels therefore move along their iso-uv.This process is performed several times in order to converge to a satisfactory solution, despite bad initializations, outliers and empty pixels.The strategy for pixel depth computation serves to handle cases where many primitives have been accumulated in one pixel.The depth interpolation for empty pixel is made locally, according to a smooth parameter explained in section 3.Each input is independent from the others.As well, the strategy for pixel depth computation can be chosen separately, however if the grid resolution or some other parameter is changed between two iterations, the primitive accumulation must be performed again.

COORDINATE SYSTEMS
The choice of the 2.5d grid coordinate system and the corresponding projection is important because it affects the primitives accumulation and it determines the direction of the pixel displacements (orthogonal to the u-v plane if Cartesian coordinate system).Our first intuition was to use a Cartesian coordinate system.This favors perpendicular angles, which are prevalent in human buildings.However, it causes problems that could be avoided using a more "sensor-oriented" coordinate system.That is why we propose a generalization of the orthogonal projection called "prismatic" projection and the associated coordinate system.The prismatic coordinate system can be adapted to the facade and the laser beam orientations.
In this section, we analyze the sensor coordinate system corresponding to an acquisition with a rotating laser, scanning vertically and perpendicular to the trajectory.We list the benefits and the limitations of working in such a coordinate system.Then the prismatic coordinate system is described.Finally, orthogonal and prismatic coordinate systems are compared in the section 2.3.

Sensor coordinate system: relevance and limitations
Finding topological structure of objects from an unorganized point cloud is complex.The most probable surface is often searched, leaving decisions of point linking and hole filling to thresholds or assumptions.In fact, the objects are not sampled from every point of view.The sensor generally sees only one side of the objects, whereas a hidden part remains unknown and impossible to reconstruct.The acquisition result is thus a partial view of the scene.Rather than detecting the whole object volumes, the lidar discovers an empty volume between the sensor and the objects, "carved" by the laser beams.The object surfaces stop the laser beams and bound the sensor visibility area.Lidar echoes are thus located on object surfaces, and more generally at the interface between what is seen and not seen by the sensor.Assuming that this interface is a surface, reconstructible surface pieces (corresponding to the visible part of the objects) are also pieces of this interface.Therefore, we consider this interface and the ways to reconstruct it.This is an image in sensor projection.Points can be simply and rapidly meshed following the neighborhood relationships in sensor coordinate system.This "naïve" meshing has some shortcomings, but more sophisticated techniques adopting the sensor viewpoint may permit to benefit from sensor coordinate system.
Our acquisition vehicle is equipped as in (Paparoditis et al., 2012).
The laser scans perpendicularly to the trajectory, with a vertical angle Θ.Let P (x, y, z) an echo acquired at τ time, Sτ is the position of the sensor at τ .R is the distance between Sτ and P .Θ is the angle between the horizontal plane and the laser beam.
V is a vertical unit vector.Uτ = Tτ × V , with Tτ the unit vector of the vehicle displacement at τ .Each term has specific features that imply various accuracies and value distributions as shown in table 1.The lidar sweep produces points at regular intervals of time τ and firing angles Θ.Hence, point distribution in (τ, Θ) space is very homogeneous and structured.A regular mesh appears if each point is linked with the next point acquired, and with the point on the next scanline that has the same Θ.Such a mesh may fold on itself when projected into (x, y, z) space (fig: 3).This is caused by the reversal scan order as shown in fig: 4. The curvature of the vehicle trajectory is the source of laser beam crossings that make points out of sequence along τ , whereas the order is always kept along Θ(fig: 5).In addition to this order reversal problem, the neighborhood relationships established according to the sensor geometry does not necessarily make sense in 3d space.Some objects are not surfacic, like trees, generating scattered points.The windows, both transparent and reflective, induce the detection of several surfaces for the same (τ, Θ) area.In these cases, the regular mesh is no longer adequate to represent the data.Mention may also be made of the multi-echoes that share the same (τ, Θ), and of the laser shots oriented toward the sky that generate "no returning" beams.In sum, the nearest neighbors in sensor geometry are not necessarily the most suitable, because the sensor geometry is trajectory dependant (mesh folding), and because the geometry of the underlying objects may be too complex or sub-sampled.Nevertheless, it seems interesting to work in sensor coordinate system that highlights the almost-regular and almost-2.5ddata structure.The main hurdle is the absence of bijection due to the possible laser beam crossings.That is why we propose the prismatic coordinate system that imitates the sensor coordinate system: the iso-uv are aligned along the laser beams as far as possible, while ensuring a bijection between (x, y, z) and (u, v, h): no iso-uv crossings.

Θ
Figure 5: Scan order is kept when passing from (Θ, R) to (x, z).This is not the case when passing from (τ, R) to (x, y).

Prismatic coordinate system
We propose a new way to accumulate the points in the facade grid.The idea is to use a virtual sensor grid.For each pixel of the facade grid, there is a corresponding pixel on the sensor grid.Ideally, the following property is ensured: any laser beam passing through a facade pixel passes through the corresponding sensor pixel (fig: 6).The virtual sensor grid is positioned in order to approach this ideal.The function that associates prismatic coordinates (u, v, h) to the space coordinates (x, y, z) is explained in the 2d and 3d cases.Then a method to place the grid is proposed.The straight line (ab) is thus the iso-u of P .
The position of P on [ab) is given by h.We propose to use the following equation that sets h as the signed distance to b normalized by |ab|, b being the virtual intersection of the facade plane and the laser beam. (3) We will show how to find u for a given point P in order to obtain the couple (u, h).In this context, P , A, A , B and B are known.
As P lies on (ab), we have:

− →
Placing the grids.The rectangle with the B, B and B corners is placed along the facade wall.The rectangle with the A, A and A corners is placed along the trajectory in order to align laser beams with iso-uv.For this purpose, we only adapt the horizontal position of the rectangle.It is vertical with an height close to zero because the location of the sensor is not supposed to move vertically.
− − → AA is fitted with RANSAC, by selecting randomly pairs of laser beams.Each pair of laser beams is a potential couple AB, A B defining a prismatic coordinate system.The score of each pair must reflect the alignment between the laser beams and the iso-uv.The score is thus the sum of the dot products between the laser beam and the iso-uv of each 3d point.

Comparison between Cartesian and prismatic coordinate systems
The orthogonal and prismatic coordinate systems are compared through some examples in figures 9, 10, 11 and 12.Many problems in pixel accumulation and surface reconstruction can be avoided using a "sensor-oriented" coordinate system such as the prismatic one.
(  The iso-uv (purple) are along the laser beams, points are distributed homogeneously in the pixels.This is the best configuration for surface reconstruction: there is one point per pixel, there is no ambiguity in depth computation.
(c) Orthogonal projection, iso-uv (pink) orthogonal to the wall.Three points are accumulated in the same pixel (2) while there is no point in some other pixels (1).

Iterative deformation of the grid
We solve the problem of finding an optimal depth h for each grid pixel as an energy minimization with a data term that shifts h to the primitives, and a smoothness term that retains h close to the neighboring pixels.Each iteration, the depth of each pixel is moved from hn to hn+1.(5) hn : previous h hpixel : h estimated from primitives accumulated in the pixel.hneigh : h estimated from the neighboring pixels.

Strategy for pixel depth computation
The strategy for hpixel and hneigh calculation may vary according to the data or the desired output.In our case, we use a bilateral filter that allows discontinuities like at window edges, but smoothes the wall flat surfaces.Pixel depth is computed thanks to the following equations, where exp(− Where hi is the depth value of the i th primitive accumulated in the pixel.The Gaussian function weights the contribution of each hi.If an hi is far from the current depth, its contribution is low.This allows a relative independence to the outliers.The smoothness term is provided by the depth values of the eight neighbor pixels.Hence, the neighbors depth is given by: hi is the hpixel value of the i th neighbor pixel.di is the distance between the pixel center (u, v) and the i th neighbor pixel center (ui, vi).If the depth of a neighbor pixel is far from the current pixel depth, its contribution is low, allowing a discontinuity at the edge between these pixels.The values of σ h and σ d have to be fixed, one can choose the grid accuracy that is, the length of a pixel side.Some other strategies are possible.For instance, the closest point from the facade plane can be kept.The median would give a robust estimation if there are many point.Other quantile values would be used if the closest or furthest points are looked for.

Primitives
Points / Lidar echos: If many pixels are empty of 3d points, the iterative process may be slow to converge.A solution is to start with a low resolution grid (large pixels).This allows to initialize a more precise grid.This procedure can be repeated many times, the same way a scale-space pyramid is built.
Triangles: The advantage of the triangles over the points is that they provide surfaces.This allows the algorithm to converge more rapidly.Thus, the triangle soup provided by sensor geometry can be used as input.
Line segments: Line segments extracted from scan lines can be accumulated in grid pixels.This option is mentioned because many papers perform processes such as classification on scanlines (Hadjiliadis and Stamos, 2010).simplified scanlines are more compact than points and their projections could intersect more pixels.
Mixed primitives: Nothing prevents to accumulate heterogeneous primitives.

RESULTS
Figure 13: Grid constructed for a facade (left).Detail of the grid (right).
ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume II-5/W2, 2013ISPRS Workshop Laser Scanning 2013, 11 -13 November 2013, Antalya, Turkey We display some visual results, showing the precision of the proposed method, despite the low point density (fig: 13).Parameter adjustments permits a compromise between a smooth surface as in figure 14 (a), and more rugged and closer to the points, 14 (c).

CONCLUSION AND FUTURE WORK
We presented an approach to reconstruct facade geometry from lidar data.We aimed at surfaces consistent with the data, especially the optical images acquired in the same time.The proposed framework allows to reconstruct 2.5d grids in a coordinate system adapted to the sensor point of view.For this purpose, we introduced a prismatic projection that is a generalization of the orthogonal projection and the corresponding coordinate system.It allows to benefit from the sensor geometry while providing a coordinate system topologically consistent with the 3d space.The output 2.5d grids are suitable for immersive navigation and other applications that need compact and detailed 3d models.They could also be used as inputs for facade structure analysis, in particular for window detection or grammar rules extraction.
In future work, we would like to texture the grid with the optical images.For this task, our framework should allow us to deal with images, that are also 2.5d grids in image coordinate system, and to project image pixels onto the reconstructed grid.

Figure 1 :
Figure1: Point density depends on sensor location and orientation (left), and on surfaces geometry (middle).It is also decreased by occluders like trees in front of facades (right).Thus it depends both on acquisition (viewing angle, scanning frequency ...) and scanned objects.Nevertheless, point density is the essence of information given by PCD.

Figure 2 :
Figure 2: Workflow of grid fitting to the primitives.

Figure 6 :
Figure 6: The points are accumulated in the pixels of the facade grid.The accumulation volume of a given pixel is shown in red.It extends to infinity behind the grid.
Figure 7: 2d formulation 2d formulation.We place ourselves in the horizontal plane (fig:7).We look for the (u, h) coordinates of a point P (x, y).[AA ] symbolizes the sensor location, and [BB ] the facade footprint.The laser beam is approximated by a ray [ab) that passes through P and intersects (AA ) in a and (BB ) in b, such that − → Aa = u − − → AA and − → Bb = u − − → BB (2) Figure 8: 3d formulation 3d formulation.As in figure 8, − − → AA ⊥ −−→ AA and − − → BB ⊥ − −− → BB .For our application, we assume that −−→ AA and − −− → BB are colinear.u and v are the horizontal and vertical coordinates of P in both the basis {AA , AA } and {BB , BB }. h is the signed distance to b.

Figure 9 :
Figure 9: The facade in profile and a wall light are sketched.The points are accumulated on the vertical plane, according to an orthogonal projection (a), (b) and a prismatic projection (c), (d).Orthogonal projection leads to accumulating points from distant beams in a same pixel (a), causing an ambiguity in the choice of the surface (b).prismatic projection is more consistent with the scan (c) and leads to a surface reconstruction with less ambiguity (d).

Figure 10 :Figure 11 :
Figure10: Laser beams (a).Surface reconstructed in a Cartesian coordinate system (b) and a prismatic coordinate system (c).The use of Cartesian coordinate system may leads to a surface that "goes through" the wall because it is attracted by points of the room ceiling behind the facade (1).The surface is also attracted to a wall light (2).The surface reconstructed using the prismatic coordinate system follows the beam trajectory, it goes trough the window (1) and wraps the wall light (2).

Figure 12 :
Figure 12: Inconsistency between the surface and the laser beams (a) An object hides a part of the wall behind.(b) Prismatic coordinate system: The reconstructed surface does not fit everywhere with the actual surfaces, but is not pierced by the laser beams.(c) Cartesian coordinate system: The reconstructed surface is intersected by two laser beams (1).

Figure 14 :
Figure 14: Three parameterizations for the same facade piece (three-quarters and profile view).From left to right, the data term increases and the smoothness term decreases.The surface becomes less simple and more rugged, but closer to the lidar points.(a) λpixel = 0.01 and λneigh = 0.99 (b) λpixel = 0.10 and λneigh = 0.90 (c) λpixel = 0.90 and λneigh = 0.10