Combine Markov Random Fields and Marked Point Processes to Extract Building from Remotely Sensed Images

Automatic building extraction from remotely sensed images is a research topic much more significant than ever. One of the key issues is object and image representation. Markov random fields usually referring to the pixel level can not represent high-level knowledge well. On the contrary, marked point processes can not represent low-level information well even though they are a powerful model at object level. We propose to combine Markov random fields and marked point processes to represent both low-level information and high-level knowledge, and present a combined framework of modelling and estimation for building extraction from single remotely sensed image. At high level, rectangles are used to represent buildings, and a marked point process is constructed to represent the buildings on ground scene. Interactions between buildings are introduced into the the model to represent their relationships. At the low level, a MRF is used to represent the statistics of the image appearance. Histograms of colours are adopted to represent the building's appearance. The high-level model and the low-level model are combined by establishing correspondences between marked points and nodes of the MRF. We adopt reversible jump Markov Chain Monte Carlo (RJMCMC) techniques to explore the configuration space at the high level, and adopt a Graph Cut algorithm to optimize configuration at the low level. We propose a top-down schema to use results from high level to guide the optimization at low level, and propose a bottom-up schema to use results from low level to drive the sampling at high level. Experimental results demonstrate that better results can be achieved by adopting such hybrid representation.


INTRODUCTION
With the progresses in image capture techniques, more and more remotely sensed images with high spatial, spectral, temporal and radiometric resolutions are available.With the popularity of tools such as Google Map, more and more up-to-date geoinformation are demanded by people.As a traditional research topic in photogrammetry and remote sensing, building extraction from remotely sensed images is a topic much more significant than ever.
In spite of the research efforts of the past decades fully automatic extraction is still a challenging task.The key issue is representation of objects and images (Mayer, 1999, Sowmya and Trinder, 2000, Baltsavias, 2004).Statistical approaches provide a strong framework of modelling and estimation.Markov random fields and marked point processes represent context-dependent entities well (Winkler, 1995, Li, 2009, Baddeley and van Lieshout, 1993).Based on Markov random fields (MRF), low-level information referring to the single image pixels and interaction between neighbouring pixels are represented concisely.However, high-level knowledge, such as free semantic structures and variable topology, can not be represented by MRFs conveniently.Based on spatial point process, high-level knowledge can be introduced via marks attached to the points and the relationships between neighbouring points.While specific shapes can be represented by geometric marks, general shape can not be determined based on image content.This problem results from the weakness of representing low-level information.
Motivated by the complementary characteristics of Markov random fields and marked point processes, we combine them to represent both low-level information and high-level knowledge.
Based on this representation, we propose an automatic approach for extracting buildings from single remotely sensed image.

Markov random fields based representation
Markov random fields provide a natural representation of contextdependent entities (Besag, 1974, Geman andGeman, 1984).A set of sites are used to represent pixels or primitives, and a set of labels attached to each site are used to denote events that may happen at the site.Furthermore, a neighbourhood system is used to describe the interrelationships between sites.Benefiting from the equivalence between MRF's and Gibbs distribution (Hammersley and Clifford, 1971), MRF can be described by local characteristics.Moreover, MRF-MAP estimation can be rigorously achieved in case of binary classification by Graph Cut algorithm (Boykov et al., 2001) and approximately e. g. by belief propagation (Felzenszwalb and Huttenlocher, 2004).
Although there are some high-level MRF models (Li, 1994), their structures are fixed in modelling and estimation.They are not flexible enough to represent random structures.Successes are demonstrated in low level, especially for regular lattices corresponding to image grids.For example, image restoration (Felzenszwalb and Huttenlocher, 2004), stereo matching (Tappen and Freeman, 2003), image segmentation (Rother et al., 2004) or clustering (Zabih and Kolmogorov, 2004) are formulated as pixel labelling problems with labels denoting pixel intensity, disparity or object category respectively.
ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume I-3, 2012 XXII ISPRS Congress, 25 August -01 September 2012, Melbourne, Australia Researchers introduced strong priori information into MRFs to improve its performance of representation.(Kumar et al., 2010) combined pictorial structures and MRFs and proposed an object category specific MRFs model for detecting and segmenting instances of a particular object category.The object-specific shape prior is represented using pictorial structures, and it relies on a large library of exemplars.(Winn and Shotton, 2006) adopted a part labelling which densely covers the object and proposed a Layout Consistent Random Field (LayoutCRF) model to impose asymmetric local spatial constraints on these labels to ensure the consistent layout of parts.It can detect and segment partially occluded objects of a known category.They introduced shape priori for objects and layout of object parts.Therefore, single object or a small number of objects can be segmented cleanly.
Since the number of objects presented in remotely sensed image simultaneously is large, object shapes and scene topology are too complex to be modelled using above mentioned approaches.On the other hand, it is important to utilize such information to segment object precisely.

Marked point processes based representation
Marked point processes provide a useful representation of spatially distributed objects.A set of points randomly distributed are used to represent objects.The number of points, their positions, and their interactions are random.Furthermore, marks are attached to each point to represent high-level knowledge such as category or geometric shape.Marked point processes are flexible enough to model the scene at the object level.Given a proposed model, reversible jump Markov Chain Monte-Carlo (Green, 1995) can be adopted to explore the configuration space and annealing schema can be adopted to simulate the objective distribution to find the optimal solution.
The Ariana Research Group CNRS/INRIA/UNSA introduced marked point process into remotely sensed image analysis.They demonstrated that marked point processes have a great potential in object extraction from remotely sensed image.(Ortner et al., 2007, Ortner et al., 2008) adopted rectangle to represent building footprint, and proposed an approach for building footprint extraction from DSM. (Lafarge et al., 2008) adopted a 3D model to represented buildings, and proposed an approach for building reconstruction from DSM based on a library of 3D models.(Tournaire et al., 2010) adopted above framework and formulated the energy in an efficient way, easy to parameterization and fast to compute.
The devised marks, however, can only represent specific shapes.Due to computational limitation, it is impossible to adopt a huge number of marks to represent general shapes.And, general shapes can not be determined based on image content adaptively.Moreover, images are linked with the model via a data term computed using hypothesis testing schema, which can not make full use of low-level information.

BAYESIAN FRAMEWORK FOR BUILDING EXTRACTION
As a whole, we represent buildings as foreground X and the rest as background X, and formulate building extraction as foreground/background segmentation in a Bayesian manner.
Given the observed image I, the buildings X can be estimated by maximizing the posterior: where, Θ is a set of models and parameters, P (X, X|Θ) is the priori probability of a specific configuration X, X conditioned on Θ, and P (I|X, X; Θ) is the likelihood of observing an image I given the configuration X, X conditioned on Θ.
We will address issues of modelling and estimation in section 4 and section 5 respectively.

Hybrid representation
We combine marked point processes and Markov random fields and propose a hybrid representation for building extraction.As illustrated in Fig. 1, marked point process is adopted to represent the high-level knowledge, i.e. the buildings and their distribution; Markov random field is adopted to represent the low-level information, i.e. the properties of all pixels.

High-level model
At high level, only buildings are represented explicitly, the rests are represented implicitly.The buildings are modelled as a marked point process.

Marked point process
A rigorous definition of spatial point process involves measure theory which is difficult for the readers who have not studied the subject.Instead, we present only a simple description in this section.
Let S ⊂ R 2 be a compact set, and Ωn be the set of configurations X = {x1, . . ., xn} that consist of n unordered points of S, the probability density of a specific configuration X is defined as follows: (3) where, α is the normalizing constant, n(X) is the number of points, β is a positive constant, and φ (k) (xi 1 , . . ., xi k ) reflects the interactions among k-tuplets neighbouring points.
By attaching a geometric mark mi = (li, wi, diri) to each point xi ∈ X, we can augment a spatial point process to be a marked point process, where, the triplet denotes the length, width and main direction of a rectangle.The marked point is denoted as xi.All buildings are represented by a rectangle-marked point process.
Return to Bayesian formulation in section 3, h(X) is computed from a prior term and a data term: where, the prior term hp(X) measures the priori probability of different scenes, the data term h d (X) measures the coherence between the scene and the image, which is identified with the likelihood term.hp(X) and h d (X) correspond to P (X, X|Θ) and P (I|X, X; Θ) in Eq.( 2) respectively.

Priori
Most existing approaches rely on very specific priori (Ortner et al., 2007, Lafarge et al., 2008).(Lafarge et al., 2010, Benedek et al., 2010) adopted the relation overlap as a general interaction and developed a concise priori model.
In this paper, we argue that buildings can not overlap with each other and use this condition as a hard constraint.It is realized by defining the following density: where, θ0 = −∞ prevents overlapped rectangles to appear in configuration, o(X) is the number of pairs of overlapped rectangles: where, To reflect the sparse distribution of buildings, small distances between neighbouring buildings are penalized.We augment above model by defining the density of valid configuration as unconditional Strauss process (Strauss, 1975): where θ1 and θ2 are two parameters of this model, n(X) is the number of points in X, p(X, r) is the number of pairs of points that are nearer than r: p(X, r) = where, d(xi, xj) is the distance between xi and xj.
To reflect parallel of buildings, large difference of directions between neighbouring buildings are penalized.We augment above model by adding a direction term: where, q(X, r) is sum of square of direction differences between neighbouring buildings no far than r: q(X, r) = where, α( xi, xj) is the difference of the directions of xi and xj: where, dir( xi) is the main direction of the rectangle attaching to xi.Furthermore, α( xi, xj) is compared with 0, π/2, • • • , 2π and the minimal difference is adopted as the result.This measure can reflect both parallel and orthogonal of buildings.
Finally, we combine above models and get a full distribution as follows: hp(X) ∝ exp(θ0o(X) + θ1n(X) + θ2p(X, r) + θ3q(X, r)) where the first term assures that neighbouring buildings do not overlap with each other, the second and third terms assure that all building distribute sparsely, the last term assures that neighbouring buildings align with each other.θ0 ∈ Θ, θ1 ∈ Θ, θ2 ∈ Θ, θ3 ∈ Θ, r ∈ Θ are all parameters of the model.

Data term
The data term measures the coherence between the scene and the image, i.e. the likelihood of presenting an image given the scene: Since the low-level model links both high-level model and image, it serves as an intermediate model between marked point process and image, the data term is calculated based on low-level model.Details will be presented in section 4.4.2.

Low-level model
At the low level, both foreground and background are modelled explicitly and together as a Markov Random Field.

Markov random field
Since foreground and background can be denoted by two labels, we model them as an Ising model (Li, 2009).The probability of presenting a specific configuration f is computed as follows: where, Z is a normalizing constant called the partition function, which is common to all configurations and can be ignored in computation.Up(f ) and U d (f ) correspond to P (X, X|Θ) and P (I|X, X; Θ) in Eq.( 2) respectively.

Priori
The priori energy Up(f ) for Ising model is calculated as follows: where, i and j are horizontally or vertically neighbouring pixels, all neighbouring pixels with different labels contribute to the total energy.
Above priori term is based on the assumption that the random field varies smoothly everywhere.Every pair of pixels with different labels will increase the priori energy and decrease the probability.In fact, neighbouring pixels in foreground or background regions should have the same labels.neighbouring pixels across the region boundaries should have different labels.In other words, label field should be allowed to change at region boundaries without increasing the priori energy.
To reflect such priori knowledge, neighbouring pixels across regions should contribute nothing to the priori energy, while as neighbouring pixels within regions should contribute to the priori energy as traditional way.We augment above priori term to be: where, if i and j across foreground and background regions, βi,j should be 0; otherwise, it should be 1.
Such discontinuity preserving constraint is more reasonable than the simplest constraint making configuration varies smoothly everywhere.However, It is difficult to express the discontinuity preserving constraints because nothing is known in advance about the regions and their boundaries.In the proposed hybrid representation, the high-level model provides a guess of such knowledge, it can be utilized to calculate βi,j.Details will be presented in section 4.4.1.

Data term
The data term U d (f ) corresponds to the likelihood is defined as follows: where, the energy U (Ii|f ) is calculated as follows: where, H b ∈ Θ and H f ∈ Θ, are two normalized histograms for background and foreground respectively.P (Ii|H b ) and P (Ii|H f ) measures the likelihood of the pixel with colour Ii belonging to background and foreground respectively.

Linking high-level and low-level models
Each marked point at the high level denotes one building, and it corresponds to one rectangular region in the Markov random field at the low level.The high-level model and the low-level model are combined together by establishing correspondences of marked points at the high level and regions (each one consists of a set of pixels) at the low level.High-level knowledge is introduced as a priori term in the MRF and low-level information is introduced into data term in the marked point process.In this way, a flexible and robust representation is achieved.

Priori
As pointed out in section 4.3.2,high-level knowledge can be utilized to construct a discontinuity preserving a priori term for the low-level model.
Suppose that there are a set of marked points at the high level, we can get a set of rectangular regions at low level by projecting the marked points on to the grid of Markov random field.Each pixel i has one label fi which denotes its class, i. e. foreground or background.Without loss of generality, suppose that i and j are neighbouring pixels.The discontinuity preserving priori term is constructed by defining βi,j as follows: i. e. the label configurations of the Markov random field are not evaluated at the borders induced by the marked point process.

Data term
Suppose that there is a set of marked points at the high level.We can obtain a set of rectangular regions at the low level by projecting the marked points on to the grid of Markov random field.Each pixel i has one label fi which denotes its class, i. e. foreground or background.Using Eq.( 19), we can calculate the data term for the high level model.
More specifically, summing the data terms over one rectangular region, we get the data term corresponding to one marked point at the high level.

OPTIMIZATION
We adopt simulated annealing to simulate the posterior distribution so that an optimal configuration can be achieved as the temperature gradually approaches zero.It iteratively simulates the distribution h 1 T (X) with T gradually decreasing to 0.
Furthermore, we use reversible jump Markov random Monte Carlo (rjMCMC) techniques to explore the configuration space at the high level, an uniform birth and death kernel and a translation kernel are developed to generate new states according the current state.The former randomly generates a point together with a rectangle (length, width and direction) in the image region, while the latter randomly selects an existing rectangle and adjusts its parameters randomly.The new state is adopted with an accept rate to keep the detailed balance.
What distinguishes our approach from existing ones is that a topdown schema and a bottom-up schema are proposed for random sampling.In each sample, it first generates a new state at high level, then uses it to guide the optimization at low level, then uses the optimization results to adjust state at high level.

Top down schema
Since there are a large number of buildings presented in remotely sensed image simultaneously, previous MRF based approaches need seed pixels provided manually.Otherwise, neighbouring buildings can not be distinguished well without knowledge about their spatial distribution.We, however, use the results at the high level to provide the information about the spatial distribution and approximate shapes of buildings.
Low-level optimization is conducted only when a new marked point is birthed.Given a new states, i. e. a new rectangle-marked point.we use it to guide the optimization at low level: 1. Project the rectangle into image to get projected regions; 2. Construct discontinuity preserving priori term for low-level model; 3. Adopt Graph Cut algorithm (Greig et al., 1989) to optimize the proposed object function with discontinuity preserving priori term, the optimization is conducted in the projected regions and it results in some building regions.

Bottom up schema
Motivated by the data driven MCMC (Tu and Zhu, 2002), we use results at low level to drive the sampling, i. e. compute new state according results of optimization at low level.In their data driven MCMC, the low-level results are computed by edge detection or segmentation and are fixed in the process of MCMC sampling.In our approach, the low-level results are computed from low level optimization and vary in the process of MCMC sampling.
Given the optimization results achieved at low level, i. e. some building regions at low level, we use them to adjust the new state at high level: 1. Select the largest one and find a minimal rectangle enclosing the selected region; 2. Adjust the new marked point to be the rectangular region found above; 3. Calculate the data term based on the optimization result instead of the original marked point; 4. Calculate the accept rate using the data term calculated above; 5. Move the current state to the (adjusted) new state with the accept rate.Benefited from optimization at low level, more precise building region can be found in each random sampling, this may improve the definition of both shape and data term.As a result, this schema may drive rjMCMC sampling to achieve better results.Theoretical foundation is guaranteed by simulated annealing and rjMCMC technique, which is our computation framework.

Results and comparison
We apply our approach to extract buildings from three satellite images of developed urban or suburban areas.For comparison, we also apply marked point processes based approach and Markov random fields based approach to extract buildings.
The original images, manually annotated reference images, and extraction results are presented in Fig.
(2).As illustrated, there are some clear errors in the third row achieved by marked point processes based approach.Since the data term for a rectangle is calculated as a whole but not pixelwise, some regions that contain buildings may be recognized as a building region by mistake.Theoretically, infinite sampling can remove such cases, however, it can not be achieved in practice.
As illustrated, there are many buildings missed in the fourth row achieved by Markov random fields based approach.Since the priori term (smooth term) drive the results as smooth as possible, some regions with low density of being buildings are segmented as background by mistake.On the contrary, some regions between neighboring buildings are segmented as foreground.
As illustrated, the last row achieved by our hybrid representation is better than above rows.Benefited from the hybrid representation, both high-level knowledge and low-level information are well-represented and utilized in the process of building extraction.The point distribution at high level provides a topology structure of the scene.Based on the topology, the optimization at the low level is expected to achieve robust results.The detailed data terms computed at low level improve foreground and background distinguishing at high level.
We also recorded the number of building pixels extracted as building pixels, building pixels extracted as non-building pixels, and non-building pixels extracted as building pixels.They are divided by the total number of building pixels and presented in Tab.
(1, 2 and 3) respectively.As illustrated, the results achieved by hybrid representation are much better than those of Markov random fields based approach, they are also better than those of marked point processes based approach, especially indicated by False Negative, which means the number of non-building pixels extracted as building pixels.To our knowledge, it is the first work on the combination of marked point process and Markov random fields.Therefore, there are many issues to be investigated in the near future.First, the optimization schema can be improved greatly since the interactions between high-level model and low-level model are not fully utilized.Second, much more information from the image data need to be explored to improve the extraction quality since histograms of colours do not fully represent information contained in the images, this can be seen in the density images calculated using histograms.
Figure 2: Extraction results: the first row illustrates the original images, the second row illustrates the reference extraction results, which are annotated manually, the rest rows illustrate extraction results based on Marked point processes, Markov random fields, and hybrid representation.

Table 1 :
Quantitative evaluation on first image

Table 2 :
Quantitative evaluation on second image True False Positive False NegativeThis paper presents a hybrid representation for buildings in remotely sensed image and an approach for building extraction from single remotely sensed image.First, it formulates building extraction in a Bayesian framework.Then, it addresses modelling issue and optimization issue respectively.Buildings are modelled at two levels.At the high level, marked point processes are used to represent such high-level knowledge as topology structure of a scene.At the low level, a Markov random field is used to represent pixel colour and interaction.After establishing a link between high-level model and low-level model, it proposes a topdown schema and a bottom-up schema optimizing an objective function.Benefited from the hybrid representation and optimization schema, good extraction results are achieved as demonstrated by experiments presented in this paper.