41 research outputs found

    Submodular relaxation for inference in Markov random fields

    Full text link
    In this paper we address the problem of finding the most probable state of a discrete Markov random field (MRF), also known as the MRF energy minimization problem. The task is known to be NP-hard in general and its practical importance motivates numerous approximate algorithms. We propose a submodular relaxation approach (SMR) based on a Lagrangian relaxation of the initial problem. Unlike the dual decomposition approach of Komodakis et al., 2011 SMR does not decompose the graph structure of the initial problem but constructs a submodular energy that is minimized within the Lagrangian relaxation. Our approach is applicable to both pairwise and high-order MRFs and allows to take into account global potentials of certain types. We study theoretical properties of the proposed approach and evaluate it experimentally.Comment: This paper is accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligenc

    A Combinatorial Solution to Non-Rigid 3D Shape-to-Image Matching

    Get PDF
    We propose a combinatorial solution for the problem of non-rigidly matching a 3D shape to 3D image data. To this end, we model the shape as a triangular mesh and allow each triangle of this mesh to be rigidly transformed to achieve a suitable matching to the image. By penalising the distance and the relative rotation between neighbouring triangles our matching compromises between image and shape information. In this paper, we resolve two major challenges: Firstly, we address the resulting large and NP-hard combinatorial problem with a suitable graph-theoretic approach. Secondly, we propose an efficient discretisation of the unbounded 6-dimensional Lie group SE(3). To our knowledge this is the first combinatorial formulation for non-rigid 3D shape-to-image matching. In contrast to existing local (gradient descent) optimisation methods, we obtain solutions that do not require a good initialisation and that are within a bound of the optimal solution. We evaluate the proposed method on the two problems of non-rigid 3D shape-to-shape and non-rigid 3D shape-to-image registration and demonstrate that it provides promising results.Comment: 10 pages, 7 figure

    Optimization of Markov Random Fields in Computer Vision

    No full text
    A large variety of computer vision tasks can be formulated using Markov Random Fields (MRF). Except in certain special cases, optimizing an MRF is intractable, due to a large number of variables and complex dependencies between them. In this thesis, we present new algorithms to perform inference in MRFs, that are either more efficient (in terms of running time and/or memory usage) or more effective (in terms of solution quality), than the state-of-the-art methods. First, we introduce a memory efficient max-flow algorithm for multi-label submodular MRFs. In fact, such MRFs have been shown to be optimally solvable using max-flow based on an encoding of the labels proposed by Ishikawa, in which each variable XiX_i is represented by ℓ\ell nodes (where ℓ\ell is the number of labels) arranged in a column. However, this method in general requires 2 ℓ22\,\ell^2 edges for each pair of neighbouring variables. This makes it inapplicable to realistic problems with many variables and labels, due to excessive memory requirement. By contrast, our max-flow algorithm stores 2 ℓ2\,\ell values per variable pair, requiring much less storage. Consequently, our algorithm makes it possible to optimally solve multi-label submodular problems involving large numbers of variables and labels on a standard computer. Next, we present a move-making style algorithm for multi-label MRFs with robust non-convex priors. In particular, our algorithm iteratively approximates the original MRF energy with an appropriately weighted surrogate energy that is easier to minimize. Furthermore, it guarantees that the original energy decreases at each iteration. To this end, we consider the scenario where the weighted surrogate energy is multi-label submodular (i.e., it can be optimally minimized by max-flow), and show that our algorithm then lets us handle of a large variety of non-convex priors. Finally, we consider the fully connected Conditional Random Field (dense CRF) with Gaussian pairwise potentials that has proven popular and effective for multi-class semantic segmentation. While the energy of a dense CRF can be minimized accurately using a Linear Programming (LP) relaxation, the state-of-the-art algorithm is too slow to be useful in practice. To alleviate this deficiency, we introduce an efficient LP minimization algorithm for dense CRFs. To this end, we develop a proximal minimization framework, where the dual of each proximal problem is optimized via block-coordinate descent. We show that each block of variables can be optimized in a time linear in the number of pixels and labels. Consequently, our algorithm enables efficient and effective optimization of dense CRFs with Gaussian pairwise potentials. We evaluated all our algorithms on standard energy minimization datasets consisting of computer vision problems, such as stereo, inpainting and semantic segmentation. The experiments at the end of each chapter provide compelling evidence that all our approaches are either more efficient or more effective than all existing baselines

    Hinge-Loss Markov Random Fields and Probabilistic Soft Logic: A Scalable Approach to Structured Prediction

    Get PDF
    A fundamental challenge in developing impactful artificial intelligence technologies is balancing the ability to model rich, structured domains with the ability to scale to big data. Many important problem areas are both richly structured and large scale, from social and biological networks, to knowledge graphs and the Web, to images, video, and natural language. In this thesis I introduce two new formalisms for modeling structured data, distinguished from previous approaches by their ability to both capture rich structure and scale to big data. The first, hinge-loss Markov random fields (HL-MRFs), is a new kind of probabilistic graphical model that generalizes different approaches to convex inference. I unite three views of inference from the randomized algorithms, probabilistic graphical models, and fuzzy logic communities, showing that all three views lead to the same inference objective. I then derive HL-MRFs by generalizing this unified objective. The second new formalism, probabilistic soft logic (PSL), is a probabilistic programming language that makes HL-MRFs easy to define, refine, and reuse for relational data. PSL uses a syntax based on first-order logic to compactly specify complex models. I next introduce an algorithm for inferring most-probable variable assignments (MAP inference) for HL-MRFs that is extremely scalable, much more so than commercially available software, because it uses message passing to leverage the sparse dependency structures common in inference tasks. I then show how to learn the parameters of HL-MRFs using a number of learning objectives. The learned HL-MRFs are as accurate as traditional, discrete models, but much more scalable. To enable HL-MRFs and PSL to capture even richer dependencies, I then extend learning to support latent variables, i.e., variables without training labels. To overcome the bottleneck of repeated inferences required during learning, I introduce paired-dual learning, which interleaves inference and parameter updates. Paired-dual learning learns accurate models and is also scalable, often completing before traditional methods make even one parameter update. Together, these algorithms enable HL-MRFs and PSL to model rich, structured data at scales not previously possible

    A Projected Gradient Descent Method for CRF Inference allowing End-To-End Training of Arbitrary Pairwise Potentials

    Full text link
    Are we using the right potential functions in the Conditional Random Field models that are popular in the Vision community? Semantic segmentation and other pixel-level labelling tasks have made significant progress recently due to the deep learning paradigm. However, most state-of-the-art structured prediction methods also include a random field model with a hand-crafted Gaussian potential to model spatial priors, label consistencies and feature-based image conditioning. In this paper, we challenge this view by developing a new inference and learning framework which can learn pairwise CRF potentials restricted only by their dependence on the image pixel values and the size of the support. Both standard spatial and high-dimensional bilateral kernels are considered. Our framework is based on the observation that CRF inference can be achieved via projected gradient descent and consequently, can easily be integrated in deep neural networks to allow for end-to-end training. It is empirically demonstrated that such learned potentials can improve segmentation accuracy and that certain label class interactions are indeed better modelled by a non-Gaussian potential. In addition, we compare our inference method to the commonly used mean-field algorithm. Our framework is evaluated on several public benchmarks for semantic segmentation with improved performance compared to previous state-of-the-art CNN+CRF models.Comment: Presented at EMMCVPR 2017 conferenc

    Discrete Optimization in Early Vision - Model Tractability Versus Fidelity

    Get PDF
    Early vision is the process occurring before any semantic interpretation of an image takes place. Motion estimation, object segmentation and detection are all parts of early vision, but recognition is not. Some models in early vision are easy to perform inference with---they are tractable. Others describe the reality well---they have high fidelity. This thesis improves the tractability-fidelity trade-off of the current state of the art by introducing new discrete methods for image segmentation and other problems of early vision. The first part studies pseudo-boolean optimization, both from a theoretical perspective as well as a practical one by introducing new algorithms. The main result is the generalization of the roof duality concept to polynomials of higher degree than two. Another focus is parallelization; discrete optimization methods for multi-core processors, computer clusters, and graphical processing units are presented. Remaining in an image segmentation context, the second part studies parametric problems where a set of model parameters and a segmentation are estimated simultaneously. For a small number of parameters these problems can still be optimally solved. One application is an optimal method for solving the two-phase Mumford-Shah functional. The third part shifts the focus to curvature regularization---where the commonly used length and area penalization is replaced by curvature in two and three dimensions. These problems can be discretized over a mesh and special attention is given to the mesh geometry. Specifically, hexagonal meshes in the plane are compared to square ones and a method for generating adaptive meshes is introduced and evaluated. The framework is then extended to curvature regularization of surfaces. Finally, the thesis is concluded by three applications to early vision problems: cardiac MRI segmentation, image registration, and cell classification

    Optimization for Image Segmentation

    Get PDF
    Image segmentation, i.e., assigning each pixel a discrete label, is an essential task in computer vision with lots of applications. Major techniques for segmentation include for example Markov Random Field (MRF), Kernel Clustering (KC), and nowadays popular Convolutional Neural Networks (CNN). In this work, we focus on optimization for image segmentation. Techniques like MRF, KC, and CNN optimize MRF energies, KC criteria, or CNN losses respectively, and their corresponding optimization is very different. We are interested in the synergy and the complementary benefits of MRF, KC, and CNN for interactive segmentation and semantic segmentation. Our first contribution is pseudo-bound optimization for binary MRF energies that are high-order or non-submodular. Secondly, we propose Kernel Cut, a novel formulation for segmentation, which combines MRF regularization with Kernel Clustering. We show why to combine KC with MRF and how to optimize the joint objective. In the third part, we discuss how deep CNN segmentation can benefit from non-deep (i.e., shallow) methods like MRF and KC. In particular, we propose regularized losses for weakly-supervised CNN segmentation, in which we can integrate MRF energy or KC criteria as part of the losses. Minimization of regularized losses is a principled approach to semi-supervised learning, in general. Our regularized loss method is very simple and allows different kinds of regularization losses for CNN segmentation. We also study the optimization of regularized losses beyond gradient descent. Our regularized losses approach achieves state-of-the-art accuracy in semantic segmentation with near full supervision quality

    A learning framework for higher-order consistency models in multi-class pixel labeling problems

    No full text
    Recently, higher-order Markov random field (MRF) models have been successfully applied to problems in computer vision, especially scene understanding problems. One successful higher-order MRF model for scene understanding is the consistency model [Kohli and Kumar, 2010; Kohli et al., 2009] and earlier work by Ladicky et al. [2009, 2013] which contain higher-order potentials composed of lower linear envelope functions. In semantic image segmentation problems, which seek to identify the pixels of images with pre-defined labels of objects and backgrounds, this model encourages consistent label assignments over segmented regions of images. However, solving this MRF problem exactly is generally NP-hard; instead, efficient approximate inference algorithms are used. Furthermore, the lower linear envelope functions involve a number of parameters to learn. But, the typical cross-validation used for pairwise MRF models is not a practical method for estimating such a large number of parameters. Nevertheless, few works have proposed efficient learning methods to deal with the large number of parameters in these consistency models. In this thesis, we propose a unified inference and learning framework for the consistency model. We investigate various issues and present solutions for inference and learning with this higher-order MRF model as follows. First, we derive two variants of the consistency model for multi-class pixel labeling tasks. Our model defines an energy function scoring any given label assignments over an image. In order to perform Maximum a posteriori (MAP) inference in this model, we minimize the energy function using move-making algorithms in which the higher-order problems are transformed into tractable pairwise problems. Then, we employ a max-margin framework for learning optimal parameters. This learning framework provides a generalized approach for searching the large parameter space. Second, we propose a novel use of the Gaussian mixture model (GMM) for encoding consistency constraints over a large set of pixels. Here, we use various oversegmentation methods to define coherent regions for the consistency potentials. In general, Mean shift (MS) produces locally coherent regions, and GMM provides globally coherent regions, which do not need to be contiguous. Our model exploits both local and global information together and improves the labeling accuracy on real data sets. Accordingly, we use multiple higher-order terms associated with each over-segmentation method. Our learning framework allows us to deal with the large number of parameters involved with multiple higher-order terms. Next, we explore a dual decomposition (DD) method for our multi-class consistency model. The dual decomposition MRF (DD-MRF) is an alternative method for optimizing the energy function. In dual decomposition, a complex MRF problem is decomposed into many easy subproblems and we optimize the relaxed dual problem using a projected subgradient method. At convergence, we expect a global optimum in the dual space because it is a concave maximization problem. To optimize our higher-order DD-MRF exactly, we propose an exact minimization algorithm for solving the higher-order subproblems. Moreover, the minimization algorithm is much more efficient than graph-cuts. The dual decomposition approach also solves the max-margin learning problem by minimizing the dual losses derived from DD-MRF. Here, our minimization algorithm allows us to optimize the DD learning exactly and efficiently, which in most cases finds better parameters than the previous learning approach. Last, we focus on improving labeling accuracies of our higher-order model by combining mid-level features, which we call region features. The region features help customize the general envelope functions for individual segmented regions. By assigning specified weights to the envelope functions, we can choose subsets of highly likely labels for each segmented region. We train multiple classifiers with region features and aggregate them to increase prediction performance of possible labels for each region. Importantly, introducing these region features does not change the previous inference and learning algorithms
    corecore