OPTICAL FLOW FOR GLACIER MOTION ESTIMATION

Quantitative measurements of glacier flow over time are an important ingredient for glaciological research, for example to determine the mass balances and the evolution of glaciers. Measuring glacier flow in multi-temporal images involves the estimation of a dense set of corresponding points, which in turn define the flow vectors. Furthermore glaciers exhibit rather difficult radiometry, since their surface usually contains homogeneous areas as well as weak texture and contrast. To date glacier flow is usually observed by manually measuring a sparse set of correspondences, which is labor-intensive and often yields rather irregular point distributions, with the associated problems of interpolating over large areas. In the present work we propose to densely compute motion vectors at every pixel, by using recent robust methods for optic flow computation. Determining the optic flow, i.e. the dense deformation field between two images of a dynamic scene, has been a classic, long-standing research problem in computer vision and image processing. Sophisticated methods exist to optimally balance data fidelity with smoothness of the motion field. Depending on the strength of the local image gradients these methods yield a smooth trade-off between matching and interpolation, thereby avoiding the somewhat arbitrary decision which discrete anchor points to measure, while at the same time mitigating the problem of gross matching errors. We evaluate our method by comparing with manually measured point wise ground truth.


INTRODUCTION
It has become common practice to use photogrammetric tools to acquire and analyze the motion of glaciers.The corresponding points in the images, which define the glacier motion, are still often measured manually.In this work we show how to automate the process by applying optical flow techniques to the problem.Optical flow algorithms have advanced to a state at which they can densely compute motion vectors at every pixel despite the difficult radiometry of glaciers.In this work we compare several methods on a data set with available manual measurements and perform a thorough evaluation using different error metrics.The evaluation shows that in areas where the glacier surface is properly visible modern optical flow methods are competitive with human observers.

Glacier Motion
The motion observed at the surface of a glacier is due to gravitational deformation of the ice and sliding at the base.The resulting ice flow transports mass from areas of snow accumulation to lower areas with mass loss by melt.The magnitude of the internal ice deformation depends on thickness and surface slope, whereas basal sliding depends on the basal conditions, which are mainly influenced by the seasonally varying water pressure (Paterson, 1994).This results in an overall smooth spatial pattern of the surface flow field.Observed temporal variations in glacier motion range from hours to seasons and decades (Iken, 1977).Thus, knowledge of surface flow fields of glaciers contributes to understanding glacier dynamics and is important data for many glaciological applications.
Direct measurements by geodetic methods (GPS, total station) can only be performed on a limited number of accessible points.Remote sensing has the potential to provide dense spatial coverage over large areas.A wide range of methods (optical/radar, feature tracking/interferometric, airborne/spaceborne platforms) have been applied especially on the polar ice sheets of Greenland and Antarctica (Lucchitta and Ferguson, 1986, Bindschadler and Scambos, 1991, Joughin et al., 1998, e.g.).Investigations of spaceborne systems are restricted to the time of satellite passes and appropriate orbits, which limits application for mountain glaciers with more complex and small scale flow fields.
Ortho-images are a standard product of digital photogrammetric analysis and are suitable for periodically repeated glacier mapping.Areas free of seasonal snow show a variety of features and texture suitable for motion analysis.Early glaciological applications have been implemented for semi-automatic and manual use with analytical stereo plotters (Flotron, 1979).More recently photogrammetric methods became popular to determine fully digital deformation measurements (Kaab, 2002).Analysis of pairs or sequences of ortho-images acquired at different points in time allow one to evaluate the horizontal motion field.The vertical component of the ice motion and the mass balance at the surface can then be separated by subsequent numerical flow modeling (Gudmundsson and Bauder, 1999).

Optical Flow
The problem of image alignment and 2D motion estimation has been studied in computer vision over the past 30 years and has reached an impressive level of reliability and accuracy.Numerous variants are already used in the industry performing tasks such as driver assistance, medical image registration and human motion analysis.In this section we give a short overview of the most common approaches for optical flow estimation, and introduce the basic terminology.
In general there are dense as well as sparse techniques.Sparse optical flow is often preferred in time critical applications and usually performs some kind of feature tracking (Tomasi and Kanade, 1991).In a way sparse techniques have already been applied to glacier flow (Debella-Gilo and Kaab, 2011).Here correspondences are established from correlation coefficients, later mismatches are eliminated by thresholding and manual inspection.
Dense techniques on the other hand compute a motion vector at every pixel.It is usually assumed that the observed motion between adjacent frames is small, although there are techniques which explicitly try to compute large displacement optical flow.A classic model for optical flow estimation was introduced by (Horn and Schunck, 1981).Many contemporary techniques are variants of this method, see the popular benchmark (Baker et al., 2007), thus we use it as basis for this chapter.
Instead of treating each motion independently, a regularizationbased framework is established, which enforces data fidelity and spatial smoothness at the same time.In the original paper the data term is expressed on a per pixel basis, using the well known brightness constancy assumption (BCA).The BCA states that the gray value of a moving pixel remains constant throughout the explored frames (Eq.1): where I denotes the image brightness at a given pixel and timestep, and (u, v) t denotes the optical flow vector.Taylor series expansion at (u0,v0) yields the optical flow constraint Here (Ix,Iy) t = ∇I and It = I(x+u0,y+v0,t+1)−I(x, y, t) is the temporal derivative.Using a variational formulation the data term is finalized as: Integration is performed over the image domain Ω.In the original formulation the penalty function ρD is the quadratic penalty ρD(x)=|x| 2 .From Eq. 2 we can directly observe that the problem is ill-posed, since only a single constraint per pixel can be extracted.In the literature this is known as the aperture problem.The problem is not limited to this specific formulation but also exists when image regions are to be matched, for instance a patch containing a single edge or a textureless area.Furthermore noise can degrade the quality of a matching process.Therefore a smoothness constraint on the motion field is introduced, resulting in an overall energy The parameter λ controls the amount of smoothing.The Euler-Lagrange equations of the energy term, derived with variational calculus, yield necessary conditions for a minimum of the energy functional.Since the functional is given in terms of continuous variables, variational methods naturally lead to sub-pixel accuracy.The resulting energy minimization problem is highly non-convex, because the optical flow constraint, Eq. 2, is only valid in a small local neighborhood.Therefore minimization is performed in a hierarchical coarse-to-fine approach.Using image pyramids, solutions from a lower pyramid level initialize the next-finer level.
The original formulation of (Horn and Schunck, 1981) suffers from the fact that quadratic error functions ρD and ρS lead to over-smoothing at motion boundaries.Different error functions have been proposed.The most popular one nowadays is ρ(x)= √ x 2 + 2 , a differentiable variant of the L1-norm, which is both robust and convex.Larger displacements can be estimated thanks to image warping and postponing any linearisation to the numer-ical scheme (Brox et al., 2004).Certain versions of variational optic flow can be computed in real-time (Wedel et al., 2008).
In contrast to variational methods, which minimize the energy functional by continuous optimization, combinatorial methods which are dominant in stereo matching are less popular in the literature.Mainly because of the enlarged two-dimensional search space.Notable exceptions are (Lei andYang, 2009, Lempitsky et al., 2008).Finally, local methods exist, which do not minimize a global energy function at all e.g.(Rhemann et al., 2011).

ALGORITHMS
In this section we describe in more detail the methods selected for our evaluation.We choose three different algorithms as representatives of the large number of optical flow methods developed over the years.All three algorithms are parallelizable and efficient in terms of memory and speed, which is important due to the sheer size of aerial images used for glaciology.

Total variation optical flow
The first algorithm in our set is a representative of the popular variational approaches.The energy functional is based on the classical formulation already presented in Eqs. 3, 4 and 5. However instead of approximating the L1-norm in order to make the functional differentiable, the integral equations are addressed directly, using a primal-dual scheme.In (Zach et al., 2007) it was proposed to add additional auxiliary variables v to the energy function, to simplify the optimization: where the data term is given by the optical flow constraint Eq. 2, ρ(v):=∇I,v−v0+I(x+v0,t+1)−I(x,t) and θ is a small constant restricting v to be a close approximation of u.The convex energy can be minimized by alternating steps updating either u or v. Fixing v, optimization with respect to u is achieved using the primal-dual algorithm proposed in (Chambolle, 2004).For a fixed u the optimization yields a sum of decoupled pixel-wise energies, which can be minimized individually.More recently an improved primal-dual scheme was introduced by (Chambolle and Pock, 2011), superseding the need for auxiliary variables.Moreover a slightly improved optical flow constraint is used which additionally models varying illumination: The parameter β controls the influence of the illumination term.
The function w is assumed to be smooth and therefore requires additional regularization.The final energy was already proposed in (Shulman and Hervé, 1989), Details of the optimization scheme go beyond the scope of this paper, please refer to (Chambolle and Pock, 2011).

Cost-Volume Filtering
The second algorithm we have selected considers optical flow estimation as a labeling problem.The space of possible solutions for the flow vectors is given as a discrete set, and each pixel is assigned to one element of the set.Because the solution space is no longer continuous, the task reduces to a search problem.A naive solution would be to select the label with the lowest data error for each pixel.Filter based methods, e.g.(Yoon et al., 2006), instead apply a local filter on the energy values for different flow vectors (the cost volume) before assigning the labels.These techniques can also be seen as an approximation to discrete energy based approaches.In contrast to the spatially global smoothing of a conditional random field, smoothness of the flow field is only defined locally.The intuition why similar solutions are achieved is that pixels far apart usually have only little influence on each other.The method we choose for evaluation was proposed recently by (Rhemann et al., 2011).The algorithm mainly differs from previous filter-based techniques by the choice of the filter weights.The guided filter introduced by (He et al., 2010) preserves edges in the input image and has a runtime independent of the filter size.Incorporated into the filter framework the technique achieves highquality solutions, and is competitive with energy-based methods on standard benchmarks.The method can potentially handle both small scale motion structure and large displacements.
More formally, let the set of labels be denoted by L = {1,...L} and l ∈Lbe a label, in our case a displacement (u, v) t .A cost C(i, l) is assigned to each pixel i and label l by evaluating the data term.For a label l the filtered cost at pixel i, C(i, l), is the weighted sum over the neighborhood N (i) of i: wi,j(I)C(j, l).
The final labeling is now given by taking the label li with minimal filtered cost at each pixel i, li = argmin C(i, l) .The key to high quality results is to use an edge-preserving filter.The weights of the guided filter depend on a guidance image I, in our case the reference image: Here µ k and σ 2 k are the mean and variance of the region N (k) centered at pixel k in I.The sum is composed of all image regions containing both pixel i and j.The edge preservation can be seen by considering a region with a single edge.If pixel i and j are on the same side of the edge, the weight wi,j(I) becomes 2, and close to 0 otherwise.For flat regions with σ 2 k <we have wi,j(I)=1for all pixel in the region, which results in a simple averaging filter.An extension to color images is straightforward.For more details please refer to (He et al., 2010).
As data term we select two different popular cost functions.Firstly the truncated absolute difference of the gray values and gradient at the matching points in the reference image I(•,t) and the displaced image I(•,t +1), and secondly the negative normalized cross correlation (NCC).The truncated absolute difference, Eq. 11, has been shown to be robust against illumination changes: The truncation values are denoted by τ1 and τ2 in the equation.
In contrast to this pixel-wise data cost the NCC is defined as a sum over a region N (i) centered at pixel i: where µi,t and µi+1,t+1 are the means of images I(•,t) and I(•,t +1 )over the region N (i) and σ 2 i,t and σ 2 i+l,t+1 are the respective variances.
The major challenge in the approach is the huge label space to be evaluated.In the examples the flow vectors extend over a region of {−75,...,75} 2 pixel, which leads to 22500 labels.That problem is tackled by very fast implementations of the filtering: the weighted filtering can be achieved by a sequence of box-filters with a time complexity linear in the number of pixels.Sub-pixel precision is accomplished by upscaling the second image by an integral factor and evaluating the cost function accordingly with respect to the new, increased label space.

Pyramid Lucas-Kanade Optical Flow
Another widely used approach for optical flow estimation was first introduced by (Lucas and Kanade, 1981).Originally developed for image registration the method is still commonly used for feature tracking and template matching.An image patch around a pixel i undergoes a deformation governed by a parameter vector p, in order to minimize the squared difference to a template T : Here f denotes a warping function parameterized by p.The key idea, compare also Eq. 2, is to perform gradient descent on the sum of squared distance (SSD) energy function, using a Taylor series expansion of the image function at p0: Iteratively solving the corresponding normal equations with respect to p delivers a solution for the parameter vector.In the 2D case it is common to restrict the parameter set to allow for pure translations or affine deformations only (Baker and Matthews, 2004).In order to achieve dense flow the scheme is applied at each pixel individually and embedded into a hierarchical coarseto-fine framework.
The scheme estimates the motion at each pixel independently, therefore the implementation can be easily parallelized.Care has to be taken if the normal matrix is close to singular, for instance in textureless regions.

RESULTS AND DISCUSSION
We evaluate the different optical flow methods on ortho-photos of the Unteraargletscher located in the Bernese Alps, Switzerland.This large valley glacier has two main tributary forming the common tongue with an area of about 23 km 2 and a length of 13 km.
The grayscale ortho-images are divided into three parts of 26, 20.5 and 12 mega-pixel, consisting of the individual tributaries and the tongue.One pixel corresponds to one meter ground resolution.Image pairs were acquired in a temporal distance of one whole year, at 1970/71, 1982/83 and 1997/98 (Fig. 1).Details about the acquisition and image processing chain can be found in (Bauder, 2001).The maximal motion observed over the one year period is about 40m at the tongue area and 90m in the area of the tributaries.
We study three algorithms: total variation (TV-L1), cost volume filtering (CF) and pyramid Lucas-Kanade (Pyr.-LK).Quantitative evaluation is done by comparing the results with manually measured, sparse correspondences (Bauder, 2001).The dataset consists of 5606 measurements on a 50 pixel grid and is only available for the years 1997/98.The accuracy of the manual measurements is evaluated by measuring several feature points twice and independently, see table 1.For all images the differences are in the range of 1 to 3 pixel (=meters).
Figure 1: Reconstruction of all image pairs.Flow vectors are color coded w.r.t.their magnitude (blue=slow flow, red=fast flow).
Because of the large temporal gap between the acquisitions, changes in image appearance are inevitable, which make optical flow estimation challenging.Difficulties include varying snow coverage, non-uniform melt out or crevasse patterns, moving and tumbling rocks, shadows and generally different illumination conditions.The biggest problem is posed by the varying snow cover extent.
The data term measuring visual similarity cannot identify corresponding parts correctly as their appearance is too different.
The second major problem are large textureless regions e.g.due to snow coverage or shadows.Those parts carry only little, if any, information about the motion.Usually optical flow methods deal with textureless segments by propagating information from neighboring areas into those regions.In our case however, the varying borders of the snow cover and shadow borders are the main source of information here.Since these are completely independent from the motion of the glacier, the quality of the reconstructions is low in these regions (compare Fig. 3).

Error Metrics
Popular error metrics in the optical flow literature (Baker et al., 2007) are the average end point error AEP and the average angular error AAE.The former measures the distance between two flow vectors in 2D.The latter compares the angle between two flow vectors (u, v, 1) in homogeneous 3D space.The errors are averaged over all test points.We also report the normalized root mean squared error NRMS of the end point error, where the normalization is performed w.r.t. the difference between the maximal and minimal flow magnitude in the image.Further we also report quantiles of the AEP, since the metrics are typically dominated by few gross outliers.
We compare the results of the optical flow algorithms with the sparse manual measurements.The results are shown in table 2.
The TV-L1 algorithm performs best in almost all cases, except for the image Unteraar l .Here the algorithm cannot handle the varying snow cover at the top of the glacier.A close-up of that part is shown in the first row of Fig. 3, and also on the right of Fig. 4.
Cost filtering (CF) with NCC performs better in that area, although only for parts where the data function delivers reasonable costs.Using the SAD data cost already produces unacceptable results, although the cost function works well for smaller images with relaxed environmental conditions.In all cases Pyramid-LK performs worst and is not further considered in the evaluation.Without any regularization, the algorithm is not competitive in these complex conditions.In Fig. 4 (left) we visualize the difference of the solution of the TV-L1 algorithm and the expert's solution.Areas where the estimates diverge are marked in red, and where they agree in green.The regions of greatest disagreement are found in spots showing limited visual correspondence in the original images.In Fig. 3 we visualize several of those areas.Both algorithms are misled by the ill-posed data term and can not estimate the flow correctly.In the case of the CF algorithm these regions have limited impact on neighboring areas, however the estimates within the regions are completely wrong.This can be explained by the edge preserving property of the filter.Vectors from regions separated by an image edge have little influence on each other.On the other hand, the TV-L1 algorithm generally seems to handle those parts better, and in most cases still produces reasonable motion estimates.Potentially the influence of erroneous areas could be restricted by the use of an anisotropic smoothing kernel.

Illumination Changes
Due to different environmental conditions the assumption of brightness constancy between successive image acquisitions is violated.
To alleviate the negative influence of illumination changes, sensor noise, or shadows on the data term, three different methods were tested in case of the TV-L1 algorithm.Structure-texture decomposition (STT) (Aujol et al., 2006), has been shown to be a successful pre-processing step for 2D optical flow estimation (Wedel et al., 2008).Images are decomposed into a structural part, corresponding to larger connected regions of the image, and a texture part containing fine scale details.Further it is assumed that using the textural part for the computation of the optical flow is more

Unteraarz
Figure 2: Image similarity at corresponding points according to TV-L1 and to manual flow estimates.NCC scores are overlain onto the images using color-code.Areas in red mark higher similarity with TV-L1, green areas higher scores of the manual measurements.robust against shadow and shading artefacts.A study comparing various filters is given in (Vaudrey et al., 2009).
The second method evaluated is an improved optical flow constraint in which the varying illumination is modeled by an additive term (Eq.7).Additionally the data term is extended to also include gradient images.In table 3 we summarize the experiment.
It is obvious that all methods significantly improve our results.In all other experiments the setup "STT&additive function" is used, which worked best overall.

Qualitative Evaluation
In a further experiment we attempt to evaluate the quality of the estimated flow fields by investigating the similarity of corresponding image regions.We compute the NCC coefficients in a 11×11 and a 5×5 window located at matching positions.Note that NCC is not used in the TV-L1 algorithm, so in that case it constitutes an independent check, whereas the CF method uses NCC, which constitutes a bias of the evaluation.We compute correlation scores at all pixel containing manual estimates.Table 4 shows that on average the score produced by the algorithms is noticeably higher than the score of the expert.Results for the TV-L1 algorithm are consistently between 11% and 28% better than the human expert.The CF algorithm achieves at most 16% improvement, and is 3% worse than the human in one case.We conclude that in terms of visual coherence the automatically produced results are at least at good as the annotated ground truth.
In Fig. 4 we visually compare the matching quality of the TV-L1 algorithm to the human expert.For this purpose we warp patches of size 200×200 using the flow vectors.Ideally the warps should be identical to their counterpart in the reference image.For simple comparison we show the difference image between reference and warped patch.In all cases, warps created by the algorithm appear more similar to the reference.The global deviation is smaller and the error patches appear smoother.
Fig. 2 compares NCC-scores of the TV-L1 flow and the human expert.In regions colored red the score of the algorithm was higher, green regions mark areas with a higher NCC-score for the expert.Clusters of green spots are located in areas affected by changes in environmental conditions, where expert knowledge is needed to overrule the observations.Otherwise red (TV-L1 better) clearly dominates.

CONCLUSIONS AND FUTURE WORK
We have evaluated three different optical flow techniques on the task of estimating the motion of glaciers.Especially results obtained from the TV-L1 based method look promising.Based on an independent image matching metric, and also by visual inspection, the estimates appear to be more accurate than manually measured correspondences.Regions with little visual coherence however can lead to distorted results.A solution could be to simply detect and exclude these areas from the estimation process.Going further, the regularization term could be adapted to the task by adopting a numerical model from glaciology.Finally an extension of the algorithm to the 3D domain could be interesting.

Figure 3 :
Figure 3: Depicted are patches in which the computed flow-fields differ grossly from the manual estimates.Errors occur if the true motion is not visible on the surface (col.1,2).Here the objective to match corresponding image content can distort the flow fields (col.3,4).Flow vectors are color coded with respect to their magnitude (blue=slow flow, red=fast flow).

Table 1 :
Deviations of the control sample set of the manual measurements (unit: pixel=m)Unteraarz Unteraarr Unteraar l

Table 4 :
Mean NCC scores of different algorithms