TERRAIN AWARE IMAGE CLIPPING FOR REAL-TIME AERIAL MAPPING

Many remote sensing applications demand a fast and efficient way of generating orthophoto maps from raw aerial images. One prerequisite is direct georeferencing, which makes it possible to geolocate aerial images at their geographic position on the earth's surface. But this is only half the story. When dealing with a large quantity of highly overlapping images, a major challenge is to select the most suitable image parts in order to generate seamless aerial maps of the captured area. This paper proposes a method that quickly determines such an optimal (rectangular) section for each single aerial image, which in turn can be used for generating seamless aerial maps. Its key approach is to clip aerial images depending on their geometric intersections with a terrain elevation model of the captured area, which is why we call it terrain aware image clipping (TAC). The method has a modest computational footprint and is therefore applicable even to rather limited embedded vision systems. It can be applied both to real-time aerial mapping applications using data links and to rapid map generation right after landing without any postprocessing step. For real-time applications, the method also minimizes the transmission of redundant image data. The proposed method has already been demonstrated in several search-and-rescue scenarios and real-time mapping applications using a broadband data link and different kinds of camera and carrier systems. Moreover, a patent for this technology is pending.


INTRODUCTION AND MOTIVATION
Generating digital aerial maps, or (true) orthophoto mosaics, from aerial images is a common but usually complex and time-consuming process. The computational complexity is mainly caused by image matching procedures, which are required by aerotriangulation and stereo-photogrammetric processing to derive point correlations and a 3D model of the considered scene. Conversely, more and more applications require an immediate situation picture from an aerial perspective, either directly after landing or even in real time using an airborne data downlink. Next to the ability to precisely measure the position and orientation of the image sensor (direct georeferencing), these kinds of application need fast image mosaicing algorithms that can operate in step with the raw image acquisition.
Simply put, aerial image mosaicing is to project and adjust overlapping images in a preferably seamless and accurate manner. The necessary image overlap may be small or large, depending on flight speed, flight altitude, terrain height, the camera's field of view and the trigger rate. When it comes to rendering maps using projective mapping techniques, a key issue is the selection of suitable image parts of the overlapping aerial images. An optimal image area selection provides a minimum of redundant image pixels to be rendered while keeping a seamless coverage of the whole captured area. Such an optimal image area selection can increase the visualization performance, since redundant image areas can be excluded from processing or render stages. Furthermore, such a selection represents a minimum of relevant image data of the captured area. This in turn enables a minimal data stream, which is important for real-time applications using data links of limited capacity. Finally, a good image area selection may also increase the overall quality and accuracy of the generated aerial maps by selecting only the best image areas, e.g. in terms of radiometric or geometric properties or a preferred angle of view.

STATE OF TECHNOLOGY
There is a wide range of established commercial products that provide tools for generating maps and other geospatial data products, such as true orthophoto mosaics (TOM), digital surface models (DSM), point clouds or 3D meshes from aerial imagery (ESRI Inc., California, 2018b, ESRI Inc., California, 2018a, PrecisionHawk Inc., North Carolina, 2018, CGI Systems GmbH, Germany, 2018, nFrames GmbH, Germany, 2018, Vexcel Imaging GmbH, Austria, 2018).
When it comes to mapping or image mosaicing, one can distinguish two different approaches: stitching methods that work solely in the texture domain and photogrammetric methods that work in the spatial domain exploiting the imaging geometry (Ghosh and Kaabouch, 2016, Tjahjadi et al., 2017). Both approaches usually use some kind of image correlation, resulting for example in a set of matched tie points. While the former methods use these points for stitching procedures (Pravenaa, 2016, Cubic MotionDSP Inc., 2018), the latter use them for bundle adjustment followed by stereo-photogrammetric methods or structure from motion (SfM) approaches (Pix4D SA, Switzerland, 2018, Agisoft LLC, Russia, 2018, Trimble Inc., California, 2018, Tomasi and Kanade, 1992). Almost all approaches have two characteristics in common: they need a completed image record and they require a major computational effort due to subsequent image processing steps. In contrast, only few approaches address real-time mapping applications, which provide geospatial products just after landing or even during flight without the need for computationally complex processing stages.
Kurz et al. demonstrated an aerial camera system with a broadband data downlink for real-time mapping applications and subsequent image analysis for real-time traffic monitoring applications (Avbelj and Kurz, 2016, Kurz et al., 2012a). This system transmitted the whole image to a ground station, which, in combination with the 54 MBit/s downlink used, is why the trigger frequency of the aerial camera system was set to only about 0.1 Hz (Kurz et al., 2012b). A similar system was described by Römer et al. in 2013, in which several on-board computers calculated rectified and JPEG-compressed ortho images in order to send them via a data downlink to a ground station (Römer et al., 2013). Here again, no overlap analysis or geometric clipping was applied, thus full frame images had to be transmitted.
Another real-time mapping system was shown in 2016 within the Clockwork Ocean project by IGI mbH (Helmholtz-Zentrum Geesthacht, 2016). In this experiment, thermal airborne imagery was mapped onto a moving map during flight. The images were not transmitted to a ground station but presented to an on-board operator. Unfortunately, IGI does not provide any information about the applied mapping technology. Kekec et al. proposed an approach to real-time mosaicing of aerial images that uses collision detection techniques to identify overlapping image sections (Kekec et al., 2014). A subsequent feature matching between those overlapping sections is then used for a continuous homography estimation between a limited number of the most recently captured aerial images. By processing only potentially overlapping image sections and limiting the number of pairwise matched images, the computational effort is minimized while preserving real-time capabilities.

Prerequisites
The proposed algorithm is based on the spatial intersection between camera rays (i.e. rays of the corresponding pinhole camera model in space) and an elevation model of the captured area. It therefore requires an inner orientation (i.e. focal length, sensor size, optionally a distortion model) of the applied camera(s), an exterior orientation (i.e. position and orientation) for every single aerial image and an existing terrain elevation model E, specifying an elevation value for every two-dimensional geographic coordinate within the considered target area (i.e. E : R² → R) (Hartley and Zisserman, 2004, Heckbert, 1989, Kannala and Brandt, 2006).
The measurement of the exterior orientation is usually denoted as direct georeferencing (Cramer et al., 2000, Cramer, 2001). It relies on the simultaneous determination and integration of the absolute 3D position by means of Global Navigation Satellite Systems (GNSS) and the camera's orientation measured by inertial sensors (IMU). Since we do not rely on subsequent bundle adjustment in our real-time applications, we require the accuracy of direct georeferencing to be sufficient to enable seamless image mosaics. The projection accuracy depends primarily on the measurement accuracy of the inertial sensor, on the flight altitude (i.e. height above ground), the camera's angle of view and the precision of the applied elevation model. The inertial sensor and elevation model we applied for our evaluations are noted in section 4.
Intersecting all four sensor corner rays of an aerial image with a given elevation model results in a rough footprint approximation of the image's captured area. Generally, the footprints of two consecutively recorded aerial images overlap to a greater or lesser extent, depending on the aforementioned conditions (Hein et al., 2017, Snyder, 1987). The aim of the algorithm we propose is to cut out these overlapping areas by determining appropriate (rectangular) image areas (regions of interest, ROI) for every single overlapping image.
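The ray-terrain intersection used for this footprint approximation can be sketched with a simple ray-marching scheme. The following fragment is a minimal illustration, not the actual implementation; the analytic `elevation` function merely stands in for a real DEM such as SRTM-90, and step size and ranges are arbitrary assumptions:

```python
import math

def elevation(x, y):
    # Hypothetical smooth terrain standing in for a real DEM (e.g. SRTM-90).
    return 100.0 + 20.0 * math.sin(x / 500.0) * math.cos(y / 500.0)

def intersect_ray_dem(origin, direction, E, step=5.0, max_dist=10000.0):
    """March along a camera ray until it dips below the terrain surface,
    then refine the intersection point by bisection."""
    ox, oy, oz = origin
    dx, dy, dz = direction
    prev_t = 0.0
    t = step
    while t <= max_dist:
        x, y, z = ox + t * dx, oy + t * dy, oz + t * dz
        if z <= E(x, y):                      # ray passed below the surface
            lo, hi = prev_t, t
            for _ in range(40):               # bisection refinement
                mid = 0.5 * (lo + hi)
                mx, my, mz = ox + mid * dx, oy + mid * dy, oz + mid * dz
                if mz <= E(mx, my):
                    hi = mid
                else:
                    lo = mid
            m = 0.5 * (lo + hi)
            return (ox + m * dx, oy + m * dy, oz + m * dz)
        prev_t = t
        t += step
    return None                               # no terrain hit within range
```

Applying this routine to the four corner rays of an image yields the four footprint corners on the ground.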

Algorithm
The following algorithm determines sections for two overlapping aerial images such that the overlapping area is minimized while a seamless coverage of the captured area between both (clipped) images is kept. Let us assume two aerial images A and B with the positions PA and PB and orientations OA and OB of the corresponding camera at the time of acquisition (see figure 1a). Furthermore, and without loss of generality¹, let us assume that A and B overlap within their captured area along their images' y-axis, so that the upper region of A and the lower region of B captured the same area on the ground (from slightly different perspectives). The full frame areas are named roi(A) and roi(B). The optimized sections roi′(A) and roi′(B) and their projections A′ and B′, which minimize the overlapping area, result from the following steps:

1. Determine the horizontal geographic centre M between both camera centres: M = (PA + PB)/2 (see figure 1b).
2. Determine the base point M′ on the elevation model's surface below M. This base point is formed by the horizontal components (x, y) of M and, as vertical component, the elevation E(M) at M as defined by the applied elevation model. This 3D point will serve as the pivot element for the following intersection steps.
3. Reproject M′ back into both camera sensors, yielding one pixel coordinate for each aerial image, pA(M′) and pB(M′).
4. Identify the horizontal pixel lines lA and lB in both images that contain the pixel coordinates pA(M′) and pB(M′). Determine the four pixel rays r(lA1), r(lA2), r(lB1) and r(lB2) of the lines' endings, i.e. the rays of the left-most and the right-most pixel of each line lA and lB.
5. Intersect the four obtained rays with the elevation model, yielding four intersection points on the ground, IA1, IA2, IB1 and IB2 (see figure 1c).
6. Determine the centres C1 and C2 between the corresponding line endings, i.e. C1 between IA1 and IB1 and C2 between IA2 and IB2 (see figure 1e).
7. Reproject C1 and C2 back into both camera sensors, yielding the pixel coordinates pA(C1), pA(C2), pB(C1) and pB(C2).

The section roi′(A) for A results from the smallest lower area of A that contains pA(C1) and pA(C2). The section roi′(B) for B results accordingly from the smallest upper area of B that contains pB(C1) and pB(C2). Note that since the described algorithm minimizes the overlap only along the images' y-axis, both picture sections extend from the left-most to the right-most side of the image. The upper section includes the first pixel row, the lower section the last pixel row (see figure 1f).

While the first three steps roughly generate a first 'marker' for a potential 'cutting line' in both images, the following four steps (4-7) serve as a refinement of this cutting line. Without this refinement, possible rotations between A and B, in particular around the vertical (yaw) axis, could result in gaps between the clipped images at their edges (see figure 1d). The refinement ensures a seamless coverage of the clipped images even at the edges by 'widening' both picture sections where necessary (see figure 1f).

¹ refer to section 3.4
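As an illustration, the geometric core of the first steps (midpoint, base point on the terrain, reprojection into both sensors, candidate cutting rows) can be sketched as follows. The simple pinhole projection, the flat-terrain elevation function in the usage note and all parameter values are simplifying assumptions for the sketch, not the paper's implementation:

```python
import numpy as np

def project_to_pixel(world_pt, cam_pos, R, f, cx, cy):
    """Project a 3D world point into pixel coordinates of an ideal pinhole
    camera. R maps world directions into the camera frame (camera looks
    along its +z axis); f is the focal length in pixels, (cx, cy) the
    principal point."""
    v = R @ (np.asarray(world_pt, dtype=float) - np.asarray(cam_pos, dtype=float))
    return np.array([f * v[0] / v[2] + cx, f * v[1] / v[2] + cy])

def tac_cut_rows(PA, PB, RA, RB, f, cx, cy, E):
    """Steps 1-4 of the clipping procedure: horizontal midpoint M, base
    point M' on the terrain, reprojection into both sensors and the
    resulting candidate cutting rows lA and lB."""
    M = 0.5 * (np.asarray(PA, dtype=float) + np.asarray(PB, dtype=float))  # step 1
    M_base = np.array([M[0], M[1], E(M[0], M[1])])                         # step 2
    pA = project_to_pixel(M_base, PA, RA, f, cx, cy)                       # step 3
    pB = project_to_pixel(M_base, PB, RB, f, cx, cy)
    lA = int(round(float(pA[1])))                                          # step 4
    lB = int(round(float(pB[1])))
    return lA, lB
```

For two nadir cameras 100 m apart at 400 m altitude over flat terrain (E = 0) with f = 1000 px, the sketch yields cutting rows symmetric about the principal point, as expected from the geometry.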

Generating maps
By repeated application of this procedure to every pair of consecutively recorded aerial images, the overlapping image areas are minimized while a seamless coverage of the whole captured area is kept. Figure 2 illustrates the overlap minimization for consecutively captured aerial images in profile view. Figure 3 demonstrates a sample scene and the application of the algorithm with a before-and-after comparison. The scene comprises 392 aerial images (6.15 billion pixels = 11.5 GB raw data), taken with a triple sensor camera head (oblique leftwards, nadir centre, oblique rightwards). The upper image displays the raw (full frame) overlapping footprints of all aerial images. The second image displays the full frame projection of all images. The third image shows the footprints after application of the presented algorithm. The lower image shows the textured footprints. Our algorithm reduced the data size of the overall image mosaic by about 86% (resulting in 873 million pixels = 1.6 GB raw data). Note that in this sample some areas at the borders are not covered after image clipping. This effect may occur in cases of strongly alternating roll angles. For standard meandering straight flight lines with a more or less steady roll attitude, covered areas will not be affected.
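As a quick sanity check, the stated pixel counts before and after clipping indeed correspond to a reduction of about 86%:

```python
# Pixel counts for the 392-image sample scene as stated in the text.
full_pixels = 6.15e9       # full-frame mosaic, ~11.5 GB raw data
clipped_pixels = 873e6     # after TAC clipping, ~1.6 GB raw data

reduction = 1.0 - clipped_pixels / full_pixels
print(f"{reduction:.1%}")  # ~85.8%, i.e. "about 86%"
```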

Additional remarks
The proposed algorithm makes some assumptions regarding the relative arrangement of any two overlapping images, such as the overlapping image edges or the translation axes between both images. Generally, however, the algorithm applies to any configuration of two overlapping aerial images, including images from different camera sensors and even oblique set-ups. The corresponding axes and directions along which the overlap is to be optimized define the mapping between the camera set-up and the formalized elements in the described algorithm.

RESULTS
For all samples, applications and demonstrations referred to in this paper, we used the SRTM-90 dataset as elevation model (The Shuttle Radar Topography Mission (SRTM), 2000). It covers the entire globe and is free of charge. Nevertheless, other (e.g. more precise) elevation models could be used as well. In general, a more precise elevation model will increase the projection accuracy, particularly in mountainous terrain. However, when using nadir aerial images, i.e. when the viewing direction is close to the vertical axis, height errors will only have a moderate effect on the result.
The nadir mounted camera systems we have used so far were all equipped with a high-grade² GNSS receiver and mid-grade³ MEMS-based IMU sensors, resulting in a typical projection precision in the order of decimetres at flight levels of about 300 to 500 metres above ground, referred to consecutive images. The absolute positional accuracy of direct georeferencing with the aforementioned set-up has been evaluated in (Brauchle et al., 2018). It states a typical horizontal position error of less than 2 metres for the configuration with a standard utility aircraft and a flight altitude of 2,600 ft (approx. 780 m) above ground.

Comparison
We further evaluated the processing time and the results of generating quick orthophoto mosaics between our approach and two commercial photogrammetric solutions, AgiSoft PhotoScan Professional⁴ and Pix4DMapper⁵, which implement a structure-from-motion (SfM) based approach (Tomasi and Kanade, 1992).
The sample scene comprises 254 aerial images, each with a resolution of 4864 × 3232 pixels (16 MPix) and 12-bit radiometric depth, provided as 16-bit TIFF images. The average spatial resolution (ground sampling distance, GSD) is about 2.7 cm; the captured area is about 464 m × 607 m (0.28 km²). Each approach was provided with the very same input data, consisting of the 254 TIFF images, a WGS84 position and orientation⁶ for each image file and nominal camera parameters such as pixel pitch, sensor size and focal length. In both commercial tools, each setting was set to the fastest option, which means in particular no color correction, no compression and no hole filling where available. No ground control points (GCPs) were used. The GSD of the orthophoto mosaic to be generated was set to the native resolution of about 2.7 cm per pixel. All tests were performed on the same machine, a decent workstation with an NVidia GTX1070 graphics card, which is used by AgiSoft and Pix4DMapper for hardware-accelerated image processing steps. For each tool, the overall processing duration was measured from image loading up to the completion of the orthophoto mosaic export. Finally, all approaches generated a georeferenced orthophoto mosaic as a TIFF image file with a resolution of about 16,000 × 23,000 pixels. Regarding positional accuracy, all mosaics coincide with a maximum variation below 2 metres. Due to the lack of GCPs, an absolute position accuracy evaluation was not performed. The generated mosaics are shown in figure 4, the corresponding processing times in figure 5.
As seen in figure 5, both commercial solutions required considerably more time to complete the orthophoto mosaic generation. This is particularly due to the expensive pixel-based image co-registration steps required by both SfM-based approaches. In contrast, TAC relies completely on the direct georeferencing solution and thus does not perform any computer vision processing. In fact, generating orthophoto mosaics with TAC is hardly more than the successive painting of non-affine transformed clipped aerial images.

APPLICATION
Two kinds of application benefit from the TAC algorithm: rapid mapping applications, providing a fast aerial situation picture, and real-time mapping applications, which require instantaneous mapping and possibly image data transmission during flight. The proposed method was implemented and successfully tested within both scenarios, as presented in the following.

Rapid Mapping
The presented algorithm allows for generating quick orthophoto mosaics of captured areas. Since it prefers the nadir-looking image parts of overlapping aerial images, it reduces occlusion effects caused by adverse perspectives. Furthermore, in the case of nadir mounted cameras, the algorithm generally favours central image areas. This minimizes geometric discontinuities between adjacent images caused by radial distortion. Finally, central image parts are generally less affected by vignetting effects, thus resulting in radiometrically more homogeneous mosaics.
The algorithm is implemented as part of the MACS-SaR rapid mapping system for disaster management (Hein et al., 2017). This system was demonstrated and successfully evaluated for search-and-rescue missions within ACHILLES 2017, an international rescue exercise led by the United Nations in Switzerland (NRZ, 2017). It enables the generation and visualization of high-resolution aerial maps of captured areas of up to several square kilometres within minutes after landing. Figure 6 shows the UAV carrying the aerial camera system, purpose-built for rapid mapping applications, and the usage of on-site generated maps for disaster assessment, coordination and management tasks. In total, several regions with a size of some square kilometres were captured in a few thousand aerial images during this exercise. For each region, an aerial map was generated immediately after landing and distributed to the task forces (Kraft et al., 2018).

Real-time Mapping
When it comes to real-time mapping using (capacity limited) broadband data downlinks, the presented algorithm provides a redundancy-optimized selection of image data. Thus, the image data to be transmitted is reduced to a minimum while keeping full coverage and the full geometric and radiometric resolution of the images. A streaming version of the TAC algorithm clips every recorded image with respect to its previously and its subsequently captured image before it is compressed and transmitted via a data downlink. This procedure was implemented and demonstrated within a real-time monitoring experiment in August 2017 over the North Sea off Cuxhaven, Germany (Brauchle et al., 2017). It was shown that the redundancy-optimized image data allows a seamless real-time mapping of the recorded area with full geometric and radiometric resolution through an average data link rate of about 5-10 MBit/s. Throughout the entire experiment, roughly 12 GB of raw image data was transmitted via the downlink. From this image data, real-time aerial maps were continuously rendered in the ground station, spanning a covered area of more than 100 square kilometres (see figure 7).
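Such a streaming clipper can be sketched as a sliding three-image window. Here `clip_pair` is a hypothetical stand-in for the pairwise TAC clipping step, and a real implementation would additionally flush the first and last image of a recording:

```python
from collections import deque

def stream_clip(images, clip_pair):
    """Sliding three-image window: each image is clipped against both its
    predecessor and its successor before being handed to the compression
    and downlink stages. `clip_pair(other, cur)` is assumed to return the
    part of `cur` that does not redundantly overlap `other`.
    Note: this sketch introduces one frame of latency and, for brevity,
    never emits the very first and very last image of the stream."""
    window = deque(maxlen=3)
    for img in images:
        window.append(img)
        if len(window) == 3:
            prev, cur, nxt = window
            cur = clip_pair(prev, cur)   # clip against the previous image
            cur = clip_pair(nxt, cur)    # clip against the subsequent image
            yield cur
```

With an identity `clip_pair`, a stream of five images yields its three middle images, one step behind acquisition, which mirrors the small (about 2 seconds, see figure 7) end-to-end latency reported for the experiment.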

CONCLUSION AND OUTLOOK
The proposed algorithm reduces the redundancies of overlapping aerial imagery for rapid mapping purposes. Depending on the degree of overlap, it may considerably reduce the amount of image data to be rendered or transmitted (in the case of applications requiring data transmission links), while keeping a seamless coverage of the captured area. At the same time, due to the preference for central image areas, it may increase the quality of the generated maps in terms of radiometric homogeneity and/or geometric accuracy. The algorithm was implemented and tested for various platforms and use cases. It was successfully demonstrated within several projects, ranging from rapid mapping applications, such as on-site disaster management support, to real-time applications using data downlinks for just-in-time aerial map generation and visualization. It is applicable to any aerial camera set-up that provides an (approximate) pinhole camera model and position and orientation data (direct georeferencing) for every captured image to be rendered and/or transmitted. Furthermore, the algorithm has a very small computational footprint with constant time complexity and does not require any image analysis, which allows its application even on lightweight embedded imaging systems like small UAV-based aerial cameras. Given a sufficient position and orientation quality, seamless orthophoto mosaics can be generated significantly faster than with structure-from-motion based approaches, without any need for hardware acceleration.
However, several extensions may broaden the range of application and are currently being discussed and tested. The proposed algorithm prefers nadir views over oblique image areas. In the case of oblique aerial imaging with a focus on e.g. façade visualization, the proposed pivot element M could be replaced by any other non-nadir surface point. This pivot element could even be determined by a pixel ray of a specified angle of view.
The proposed algorithm is currently applied only to pairs of consecutively recorded images. In the case of meandering flight patterns or even arbitrarily arranged sets of aerial images covering a common area, one may intersect any overlapping pair of aerial images to optimize the image areas to be rendered. Data structures like quadtrees make it easy and fast to find overlapping images within the set.
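Such an overlap lookup could, for instance, be sketched with a minimal quadtree over the images' footprint bounding boxes. This is an illustrative data structure sketch under simplifying assumptions (axis-aligned boxes, planar coordinates), not the system's actual implementation:

```python
class Quadtree:
    """Minimal quadtree over axis-aligned footprint boxes
    (xmin, ymin, xmax, ymax) with attached payloads (e.g. image IDs).
    Boxes spanning several quadrants are stored in each of them;
    `query` deduplicates via a set."""

    def __init__(self, bounds, capacity=4, depth=0, max_depth=8):
        self.bounds = bounds
        self.capacity = capacity
        self.depth = depth
        self.max_depth = max_depth
        self.items = []          # (box, payload) pairs
        self.children = None

    @staticmethod
    def _overlaps(a, b):
        return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

    def insert(self, box, payload):
        if self.children is not None:
            for c in self.children:
                if self._overlaps(c.bounds, box):
                    c.insert(box, payload)
            return
        self.items.append((box, payload))
        if len(self.items) > self.capacity and self.depth < self.max_depth:
            self._split()

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        xm, ym = 0.5 * (x0 + x1), 0.5 * (y0 + y1)
        self.children = [Quadtree(b, self.capacity, self.depth + 1, self.max_depth)
                         for b in ((x0, y0, xm, ym), (xm, y0, x1, ym),
                                   (x0, ym, xm, y1), (xm, ym, x1, y1))]
        items, self.items = self.items, []
        for box, payload in items:
            self.insert(box, payload)

    def query(self, box):
        """Return the payloads of all stored boxes overlapping `box`."""
        out = set()
        if not self._overlaps(self.bounds, box):
            return out
        for b, p in self.items:
            if self._overlaps(b, box):
                out.add(p)
        if self.children is not None:
            for c in self.children:
                out |= c.query(box)
        return out
```

Inserting each image's footprint box once allows candidate overlap partners for a new image to be found in logarithmic time on average, instead of testing against every image in the set.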
Another extension of our basic approach addresses the granularity of the geometric projection. While the currently proposed algorithm extracts one (rectangular) section for every single aerial image, a "tiling version" of this algorithm, possibly supporting level-of-detail techniques, could extract several (distinct) image areas and their projections. This would increase the positional accuracy of the computed projections, in particular for highly structured areas like mountains or deep valleys.
Furthermore, the current implementations do not apply any further image registration procedures. Similar to the approach of Kekec et al. (Kekec et al., 2014), lightweight image matching methods could be used to enhance the real-time georeferencing solution. The overlap analysis done with our algorithm could limit the image parts to be analysed, which in turn would minimize the computational effort of the image co-registration methods.
Finally, we are evaluating the application of fast and lightweight sparse matching techniques to derive an approximate elevation model in real time. This could eliminate the need for a priori knowledge of a terrain elevation model of the area to be captured. Furthermore, such a real-time derived height model with a better spatial resolution could also enhance the overall quality of the generated maps and further derived geospatial data products.

Figure 1. Terrain aware image clipping (TAC) algorithm: a) Full frame projection of two overlapping aerial images. b) Projection M′ of the centre M between both cameras and reprojection of M′ into both sensors (pA(M′) and pB(M′)) with the corresponding sensor lines lA and lB. c) Projection of the sensor line endings to the ground: IA1, IA2, IB1 and IB2. d) Cropping of the aerial images by application of lA and lB and their projections. e) Determination of the centres C1 and C2 between the line endings of the projections of lA and lB and reprojection of C1 and C2 into the sensors: pA(C1), pA(C2), pB(C1) and pB(C2). f) Final clipping roi′(A) and roi′(B) by application of these reprojection points and their projections.

Figure 2. Aerial imaging of mountainous terrain and overlap visualization of its view cones (profile view). Upper figure: full frame coverage and view cone overlap. Lower figure: minimized view cone overlap of the clipped aerial images after application of the TAC algorithm.

Figure 3. Map generation using the TAC algorithm: sample scene comprising 392 aerial images, captured by a triple sensor camera head. Full footprint rendering (upper images) and clipped images by the TAC algorithm (lower images). By application of the TAC algorithm, the image data to be rendered was reduced by 86%.

Figure 6. Rapid mapping with the TAC algorithm for disaster management: international rescue exercise ACHILLES 2017 led by the UN in Switzerland with application of an integrated rapid mapping system. Autonomous UAV (maximum take-off weight: 14 kg, wingspan: 3.50 m, vertical take-off and landing) with the MACS aerial camera system and a mobile map viewer for disaster assessment, analysis and coordination.

Figure 7. Real-time mapping using the TAC algorithm: aerial images are captured, clipped and transmitted via a 10 MBit/s data downlink to a ground station (upper figure), where mapping software generates seamless aerial maps from the transmitted image data (lower figure). The latency between image acquisition and visualization is about 2 seconds.