QUALITY ASSESSMENT OF BUILDING TEXTURES EXTRACTED FROM OBLIQUE AIRBORNE THERMAL IMAGERY

The thermal properties of the building hull have become an important topic over the last decade. Combining thermal data with building models makes it possible to analyze the thermal data in a 3D scene. In this paper we combine thermal images with 3D building models by texture mapping. We present a method for texture extraction from oblique airborne thermal infrared images. We put emphasis on the quality assessment of these textures and on the evaluation of their usability for thermal inspections. The quality measures used for the assessment are divided into resolution, occlusion and matching quality.


INTRODUCTION

Motivation
Urban areas and buildings are among the most important subjects investigated using photogrammetric and remote sensing methods. Often, images taken from a flying platform are used for such purposes. Oblique airborne images taken from different directions make it possible to map the building hull onto 3D data, such as point clouds (Höhle, 2014) or 3D building models (Früh et al., 2004). Such textured 3D data can be used to investigate diverse properties of buildings and related phenomena. Visible images are used for façade reconstruction (Ripperda and Brenner, 2006) and damage assessment (Kerle et al., 2014). Using a thermal camera, the energy efficiency of buildings can be assessed and heat leakages can be detected (Hoegner et al., 2007, Borrmann et al., 2012, Westfeld et al., 2015).
In recent years, many research projects have concentrated on thermal properties and heat losses (Meng and Stilla, 2007, Fondazione Bruno Kessler, 2014). Mobile mapping systems may be used to quickly acquire thermal data for entire districts (Hoegner et al., 2007) or cities (Chandler, 2011). Thermal data can also be combined with a point cloud. Such a method was presented for an indoor application (ThermalMapper, 2013): a laser scanner and a thermal camera are mounted on a robot for mobile mapping of building interiors, and the thermal information is mapped onto the acquired point cloud. In another research project, the Automated Rapid Thermal Imaging Systems Technology (ARTIST) was developed to quickly identify inefficient buildings by detecting heat losses through walls, roofs, doors and windows (Phan, 2012). Thermal data collected in urban areas can also be used for an online system with open access. HEAT (Heat Energy Assessment Technologies), a GeoWeb service, is provided in Calgary; it can be used by house owners to view their building quality, or by maintenance companies to verify building quality and to monitor it over space and time (HEAT, 2013). This system stores thermal images of building roofs together with address information and detected hot spots. The cost per day of heating the home and the CO2 emissions are estimated from the thermal data, which was acquired with a TABI-320 thermal pushbroom scanner delivering stripes 320 pixels wide. A similar system is available for the island of Jersey in the Channel Islands (States of Jersey, 2013).
Thermal images can be combined with 3D building models via texture mapping. Due to narrow streets in urban areas, acquiring the data from a terrestrial platform requires relative orientation, matching with the 3D building models using a generated point cloud, and automatic mosaicing of oblique image sequences in order to create high-resolution thermal textures (Hoegner, 2014). When the thermal camera is mounted on a flying platform, the resolution of the extracted textures is usually lower, but artifacts in the textures that result from mosaicing are avoided. In addition, from the bird's eye view, the roofs and the façades in inner yards, which are not visible from the street, can be captured.

Related work
Texture mapping on 3D models is a widely used technique, especially in computer graphics, and results in adding an image to the existing 3D geometry. Photorealistic (Weinhaus and Devarajan, 1997, Allen et al., 2001) and non-photorealistic (Klein et al., 2000) textures can be distinguished. For photorealistic texturing, the images of a real scene have to be assigned to the corresponding 3D model.
Texture extraction has already been implemented in several commercial software packages and imaging systems. One such system, the Semi-Automated Modeling and Texturing System (SAMATS), was presented by Hegarty and Carswell (2009). This system produces textured building models from a set of geo-referenced terrestrial images. Similarly, the Toposcopy system (Groneman, 2004) has been developed to create photorealistic 3D models. It uses photogrammetric methods for linking a 2D map to terrestrial images. Grenzdörffer et al. (2008) use MultiVision, a commercial software package, to texture the created 3D models semi-automatically.
Various imaging systems have been applied for texture mapping. Wang et al. (2008) used the Pictometry system for this purpose. This system consists of five cameras, one nadir-looking and four oblique-looking, which are mounted on a flying platform. It has found a variety of applications, including urban planning, 3D modeling, and emergency response (Karbo and Schroth, 2009). The PFIFF system (Grenzdörffer et al., 2008) is also based on an oblique-looking camera, which is integrated with a GPS receiver. Oblique view geometry requires special treatment for flight planning (Grenzdörffer et al., 2008) or measurements (Höhle, 2008). Texture mapping is also possible using a push-broom instrument. Lorenz and Döllner (2006b) textured 3D building models using the High Resolution Stereo Camera (HRSC) mounted on an aircraft.
In most studies, texture quality was introduced as a value used to select the best texture. The quality calculated for the selection procedure can be stored with the texture. It is an abstract value, however, which can be used to compare the quality between faces but does not give any information about the level of detail of the texture or its fit to the 3D model. Some researchers therefore calculate the local resolution for every pixel. Lorenz and Döllner (2006a) analyzed the quality of textures extracted from airborne images taken with an HRSC camera and created quality maps consisting of the local effective resolution. Similar resolution maps for textures are presented in Hoegner and Stilla (2007). Hoegner et al. (2012) assess the matching quality between the image sequence and the building model by analyzing the extracted textures. Textures from different sequences, acquired at different times and with different orientation parameters, are compared through correlation and assessed visually. This method does not provide an independent measure that expresses the quality of fit between the model and the extracted texture.

Paper Overview
In this paper we present a method for texture extraction from oblique airborne thermal infrared images, which is described in Section 2. We put emphasis on the quality assessment of these textures and on the evaluation of their usability for thermal inspections. The quality measures used for the assessment are divided into resolution, occlusion and matching quality and are presented in Section 3. Experimental results are presented in Section 5 and discussed in Section 6.

TEXTURE EXTRACTION
Textures are extracted from thermal images based on the projection of the 3D model. This procedure is shown schematically in Fig. 1. The 3D building model is projected into the image using the initial exterior orientation given by the camera calibration and the navigation data. First, the visibility is checked in order to select the visible faces of the model for each frame.
Then, model-to-image matching is carried out in key-frames. The applied matching is line-based and is supported by tracking lines between the frames. As a result of the matching procedure, corrected exterior orientation parameters are calculated for every frame. This method is described in detail in Anonymous (2014). In the next step, thermal textures are extracted. Typically, buildings or building complexes need to be captured from multiple views in order to obtain textures for all faces. Thermal images are often taken in a sequence, so some faces may appear in many frames. Therefore, a best texture selection procedure is implemented. This procedure is based on a general texture quality calculated from distance, occlusion and viewing angle, and results in one-to-one assignments between the faces and the images. It is described in more detail in Anonymous (2010).
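The model projection and a first visibility test described in this section can be illustrated with a minimal sketch. It assumes a plain pinhole camera that looks along the +z axis of the camera frame, a rotation matrix R from world to camera coordinates, and model faces whose vertices are ordered counter-clockwise when seen from outside the building. The function names, the sign conventions and the numeric values in the usage lines are illustrative assumptions, not the implementation used in this work. The back-face test only removes faces turned away from the camera; occlusions between faces still require a depth-based test.

```python
import numpy as np

def project_points(X, R, X0, c_k, pixel_size, principal_point):
    """Project 3D points of shape (N, 3) into pixel coordinates (pinhole model).

    R rotates world coordinates into the camera frame (assumed to look
    along +z), X0 is the projection center in world coordinates, c_k is
    the camera constant and pixel_size the pixel pitch (same metric unit).
    """
    Xc = (R @ (np.asarray(X, dtype=float) - X0).T).T   # points in the camera frame
    x = c_k * Xc[:, 0] / Xc[:, 2]                      # metric image coordinates
    y = c_k * Xc[:, 1] / Xc[:, 2]
    u = x / pixel_size + principal_point[0]            # pixel coordinates
    v = y / pixel_size + principal_point[1]
    return np.column_stack([u, v]), Xc[:, 2]           # pixels and depth along the optical axis

def face_is_front_facing(vertices, X0):
    """Back-face test: True if the outward face normal points toward the camera."""
    vertices = np.asarray(vertices, dtype=float)
    normal = np.cross(vertices[1] - vertices[0], vertices[2] - vertices[0])
    return float(np.dot(normal, X0 - vertices[0])) > 0.0

# Hypothetical example values (not taken from the flight campaign).
R = np.eye(3)
X0 = np.zeros(3)
pixels, depth = project_points([[0.0, 0.0, 10.0], [1.0, 0.0, 10.0]],
                               R, X0, c_k=0.01, pixel_size=1e-5,
                               principal_point=(320.0, 256.0))
```

Faces that pass the back-face test are candidates for the subsequent occlusion check and texture extraction.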

Resolution
The texture's level of detail depends on its resolution. The resolution of 3D objects seen on the image plane is usually not uniform along their surfaces. Uniform resolution is possible only for planar objects that are parallel to the image plane. In nadir view photogrammetry, the ground resolution of the images is usually expressed using the ground sampling distance (GSD), which is the distance between the pixel centers on the ground. It is calculated using the intercept theorem:

$$\frac{s}{H} = \frac{s'}{c_k} \qquad (1)$$

where $s$ is a distance on the ground, $s'$ is its image in the sensor, $c_k$ is the camera constant, and $H$ is the flight height. If $s' = 1$ pix, then $s$ is the ground sampling distance. Here, it is assumed that the ground is parallel to the sensor; therefore, all pixels have the same ground resolution. In an oblique view, the GSD varies significantly within the image; it is smaller in the foreground and larger in the background. The GSD does not give any information about the resolution of 3D objects, such as façades or roofs, which is the most interesting aspect for texture mapping. Therefore, a local resolution for every object is defined as the length of a line segment placed on this object that is depicted within one pixel. This line segment is parallel to one of the axes of the image coordinate system. Accordingly, two resolutions can be calculated for one pixel: in the x- and in the y-direction of the camera coordinate system.
An oblique view is equivalent to a nadir view of a sloped surface. This representation is suitable not only for ground surfaces but also for other surfaces, e.g. façades or roofs. In this representation, a planar surface can be defined for each pixel. This surface is parallel to the sensor and intersects the photographed surface at the point where the ray from the middle of the pixel meets the photographed surface (Fig. 2a).
Figure 2: Detailed geometry for calculations of the resolution

If the distance $D_i$ from the projection center to the photographed surface is known, the resolution of this parallel surface can be easily calculated using (1) by replacing $H$ with $D_i$, which results in

$$s_i = \frac{D_i}{c_k}\, s' \qquad (2)$$

Here, the index $i$ denotes the pixel. In many cases, however, the photographed object is rotated by an angle

$$\gamma_i = \arccos\frac{\vec{n} \cdot \vec{z}}{|\vec{n}|\,|\vec{z}|} \qquad (3)$$

where $\vec{n}$ is the normal vector of the photographed surface and $\vec{z} = [0, 0, 1]$. For every $\gamma_i > 0$, the length of the line segment on the photographed object is $l_i > s_i$. The ray from the middle of the pixel does not intersect the line segment on the photographed object in its middle, but instead divides it into two line segments with the lengths $l_{i-1}$ and $l_{i-2}$, respectively (Fig. 2b). To calculate $l_i$, the triangles $\Delta A_1B_1P$ and $\Delta A_2B_2P$ should be solved. Using the Law of Sines, $l_{i-1}$ is calculated from (4), where $\alpha_{i-1} = 180^\circ - (90^\circ \ldots$, and $l_{i-2}$ from (5), where $\alpha_{i-2} = 90^\circ \ldots$ Here $\delta_{i-1} = \varphi_i - \varphi_{i-1}$ and $\delta_{i-2} = \varphi_{i-2} - \varphi_i$. The length $l_i$ is calculated as the sum of $l_{i-1}$ and $l_{i-2}$:

$$l_i = l_{i-1} + l_{i-2} \qquad (6)$$

The angle $\varphi_i$ is calculated by solving the triangle $\Delta OO'P$ (7). If $s' = 1$ pix, then $\delta_{i-1}$ and $\delta_{i-2}$ are very small angles. If we assume that $\delta_{i-1}$ ... , $l_i$ can be calculated in a simplified way (10).

Another simplification is presented in Fig. 3. Here $l_i$ is the length of the line segment that has to be orthogonally projected onto the surface parallel to the sensor to fill one pixel:

$$l_i = \frac{s_i}{\cos\gamma_i} = \frac{D_i\, s'}{c_k \cos\gamma_i} \qquad (11)$$

Figure 3: Simplified geometry for calculations of the resolution

The difference between the three ways to calculate $l_i$ (6, 10 and 11) is not significant. Only for very large $\gamma$ does $l_i$ differ noticeably between these three equations. However, textures with a very large $\gamma$ are not useful and should not be used for texture mapping. Thus, in practice, the simplest form (11) can be used. $D_i$ is given by the depth image, while the camera constant $c_k$ and the pixel size $s'$ come from the camera calibration.
The resolution is calculated for every pixel in the x- and in the y-direction and is stored in the form of resolution maps. The resolution maps are textures of the same size as the corresponding thermal texture and store the x- and y-resolution for each pixel of the texture.
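As an illustration, a resolution map can be computed from a depth image with a few lines of NumPy. The sketch below assumes the simplest variant, in which the segment on the object is the pixel footprint on the sensor-parallel surface divided by the cosine of the rotation angle, i.e. l_i = D_i s' / (c_k cos γ_i). The function name, the per-pixel angle array and the numeric camera values are hypothetical and are not taken from the camera calibration used in this work.

```python
import numpy as np

def resolution_map(depth, gamma, c_k, pixel_size):
    """Per-pixel object resolution l_i = D_i * s' / (c_k * cos(gamma_i)).

    depth      : array of distances D_i from the depth image [m]
    gamma      : array of rotation angles gamma_i for one image axis [rad]
    c_k        : camera constant [m]
    pixel_size : pixel size s' on the sensor [m]
    """
    depth = np.asarray(depth, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    s_parallel = depth * pixel_size / c_k   # resolution on the sensor-parallel surface
    return s_parallel / np.cos(gamma)       # stretch by the surface rotation

# Hypothetical values: 400 m distance, 25 mm camera constant, 25 micrometer pixels.
res_x = resolution_map([[400.0, 400.0]], [[0.0, np.pi / 3]],
                       c_k=0.025, pixel_size=25e-6)
```

Calling the function once with the angles for the x-direction and once with those for the y-direction gives the two layers of a resolution map. Pixels with γ close to 90° produce very large values and can be masked out before the textures are used.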

Occlusion
In general, self-occlusions and extrinsic occlusions can be distinguished. Self-occlusion is the occlusion of some faces of the 3D model by other faces and is calculated from the 3D model and the acquisition geometry. Extrinsic occlusion is the occlusion of model faces by objects that are not part of the 3D model, such as cars, pedestrians, or trees. Extrinsic occlusions are typically detected from multiple views (Böhm, 2004) or from additional data (Bénitez et al., 2010).
Self-occlusions can be permanent or temporal. Permanent occlusions occur independently of the viewing direction, for example between two touching buildings. Temporal self-occlusions depend on the viewing direction of the camera and can change from frame to frame (Anonymous, 2010). These occlusions are treated separately and stored together with the textures as occlusion maps.
The occlusion is also expressed as a single quality measure per face per frame, which indicates what fraction of the texture can be seen in a frame. The occlusion factor $o_{ij}$ is defined as

$$o_{ij} = \frac{n_{vis}}{N} \qquad (12)$$

where $n_{vis}$ is the number of visible pixels of face $j$ in frame $i$, and $N$ is the number of pixels occupied by face $j$. The quality $o_{ij} \in [0, 1]$ takes the value $o_{ij} = 1$ for fully visible textures.
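The occlusion factor can be computed directly from two binary masks, as the following minimal sketch shows. It assumes that the pixels occupied by the projected face and the pixels of that face which remain visible are available as boolean arrays of the same size; the function and variable names are illustrative.

```python
import numpy as np

def occlusion_factor(face_mask, visible_mask):
    """Fraction of the pixels occupied by a face that are visible in a frame."""
    face_mask = np.asarray(face_mask, dtype=bool)
    visible_mask = np.asarray(visible_mask, dtype=bool)
    n_total = int(face_mask.sum())                  # N: pixels occupied by the face
    if n_total == 0:
        raise ValueError("face does not occupy any pixel in this frame")
    n_vis = int((face_mask & visible_mask).sum())   # n_vis: visible pixels of the face
    return n_vis / n_total

# Toy example: the face covers four pixels, one of which is occluded.
face = [[1, 1, 0],
        [1, 1, 0]]
visible = [[1, 1, 1],
           [1, 0, 1]]
o_ij = occlusion_factor(face, visible)
```

The same masks can be stored alongside the texture as its occlusion map.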

Matching Quality
Matching quality is introduced to measure how precisely the model was projected onto the image and how accurate the model itself is. Inaccuracies in the data acquisition, the creation process, or the generalization can result in a faulty model. The matching quality of a texture $t_{ij}$ assigned to face $p_j$ in frame $f_i$ is calculated using (13), where $A_{ijk}$ denotes the area between the projected model line segment and the actual corresponding line segment in the image, $l^{IM}_{ijk}$ denotes the length of the projected model line segment, and $K_j$ denotes the number of sides of the face polygon $p_j$ (Fig. 4).
To assess the matching quality in a single frame $f_i$, (14) is used. To assess the matching quality for a face $p_j$ among all frames, (15) is calculated. Combining (14) and (15), the matching quality ν of the whole matching process is computed in (16). In (14) to (16), $J$ denotes the number of visible faces in a frame $f_i$, and $I$ denotes the number of frames.
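The per-texture measure can be sketched in code under explicit assumptions. The sketch below takes the area between a projected model edge and its reference edge, divides it by the length of the model edge (which yields a value in pixels), and averages over the sides of the face polygon. The plain averaging, the function names and the requirement that both polygons list corresponding vertices in the same order are assumptions made for illustration; the exact form of the equations is not reproduced here.

```python
import numpy as np

def _cross(u, v):
    """z-component of the 2D cross product."""
    return u[0] * v[1] - u[1] * v[0]

def area_between_segments(m1, m2, r1, r2):
    """Area enclosed between model segment m1-m2 and reference segment r1-r2.

    Endpoints are assumed to correspond (m1 to r1, m2 to r2).
    """
    m1, m2, r1, r2 = (np.asarray(p, dtype=float) for p in (m1, m2, r1, r2))
    s1, s2 = _cross(m2 - m1, r1 - m1), _cross(m2 - m1, r2 - m1)
    s3, s4 = _cross(r2 - r1, m1 - r1), _cross(r2 - r1, m2 - r1)
    if s1 * s2 < 0 and s3 * s4 < 0:
        # The segments cross: sum the two triangles meeting at the crossing point.
        x = r1 + s1 / (s1 - s2) * (r2 - r1)
        return 0.5 * (abs(_cross(r1 - m1, x - m1)) + abs(_cross(r2 - m2, x - m2)))
    # Otherwise the four endpoints form a simple quadrilateral (shoelace formula).
    quad = np.array([m1, m2, r2, r1])
    xs, ys = quad[:, 0], quad[:, 1]
    return 0.5 * abs(np.dot(xs, np.roll(ys, -1)) - np.dot(ys, np.roll(xs, -1)))

def texture_matching_quality(model_polygon, reference_polygon):
    """Mean of A_k / l_k over the K sides of a face polygon, in pixels."""
    model = np.asarray(model_polygon, dtype=float)
    ref = np.asarray(reference_polygon, dtype=float)
    K = len(model)
    ratios = []
    for k in range(K):
        m1, m2 = model[k], model[(k + 1) % K]
        r1, r2 = ref[k], ref[(k + 1) % K]
        ratios.append(area_between_segments(m1, m2, r1, r2) / np.linalg.norm(m2 - m1))
    return float(np.mean(ratios))

# Toy example: a projected square face and a reference polygon shifted by one pixel.
model = [(0, 0), (10, 0), (10, 10), (0, 10)]
reference = [(1, 1), (11, 1), (11, 11), (1, 11)]
nu_t = texture_matching_quality(model, reference)
```

For parallel edges the ratio of area to length equals the perpendicular offset between the two edges, which is why the measure can be read as a mean displacement in pixels.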

DATA DESCRIPTION
In order to capture all of the façades and roofs in the test area, an appropriate flight campaign was designed and conducted. During this flight campaign, the helicopter flew four times above the test area, recording a sequence of oblique TIR images. As a result, four crossing strips were captured. The flight campaign took place in autumn in the early afternoon. The test area is a densely built-up downtown area, where the buildings form ring-shaped structures with inner yards.

RESULTS
For each face of the model, a TIR texture was created. These textures were mapped onto the 3D geometry in CityGML format and displayed using Autodesk LandXplorer (Fig. 5).
Figure 5: 3D building model with TIR textures

The matching quality was used to evaluate the performance of the co-registration. For this purpose, reference polygons were drawn manually in 70 frames. These reference polygons were used to calculate the quality measure ν from (16) with and without line tracking.
First, the quality measure was computed for the initial model projection using the exterior orientation parameters after system calibration. Then, the quality measure ν was calculated for the projection using the exterior orientation parameters corrected by matching. These results are presented in Tab. 1. The quality measure ν was also calculated after tracking. This experiment was carried out for varying key-frame intervals, and the results are presented in Fig. 6.
Figure 6: Quality measure ν calculated after tracking with varying key-frame interval

DISCUSSION
The introduced quality measures make it possible to evaluate both single textures and the whole texturing process.
In this paper, matching quality was used to evaluate the usability of tracking as a means to speed up the matching procedure without losing accuracy. By tracking the line segments assigned to the 3D model from frame to frame, the search area is restricted, and the time needed for calculation is reduced.
Analysis of the matching quality showed that tracking is sufficient for finding the line correspondences needed for camera pose estimation.

Figure 4: Calculation of matching quality

The thermal images were taken with the TIR camera AIM 640 QLW FLIR with a frame rate of 25 images per second, which was mounted on a platform carried by a helicopter. The flying height was approximately 400 m above ground level. The camera was forward looking with an oblique view of approximately 45°. The size of the chip was 640 × 512 pix. The helicopter flew over the test area four times, recording four strips of IR image sequences. Each strip consists of almost 130 frames.

Table 1: Quality measure ν calculated after system calibration and after matching in every frame

ν [pix] after system calibration    ν [pix] after matching in every frame
4.74                                1.48