40 research outputs found

    M-Best-Diverse Labelings for Submodular Energies and Beyond

    We consider the problem of finding the M best diverse solutions of energy minimization problems for graphical models. Contrary to the sequential method of Batra et al., which greedily finds one solution after another, we infer all M solutions jointly. It was shown recently that such jointly inferred labelings not only have smaller total energy but also qualitatively outperform the sequentially obtained ones. The only obstacle to using this new technique is the complexity of the corresponding inference problem, since the corresponding algorithm is considerably slower than the method of Batra et al. In this work we show that the joint inference of the M best diverse solutions can be formulated as a submodular energy minimization if the original MAP-inference problem is submodular, hence fast inference techniques can be used. In addition to the theoretical results, we provide practical algorithms that outperform the current state of the art and can be used in both the submodular and non-submodular cases.
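
    To make the joint formulation concrete, here is a minimal sketch of its general form, in our own notation rather than the paper's: the M labelings are scored by their total energy minus a weighted diversity term,

        \min_{y^1,\dots,y^M} \; \sum_{m=1}^{M} E(y^m) \;-\; \lambda \sum_{m < m'} \Delta\big(y^m, y^{m'}\big)

    where E is the original energy of the graphical model, \Delta is a diversity measure (for example a node-wise Hamming distance) and \lambda > 0 trades off energy against diversity. The paper's claim, restated, is that for submodular E this joint problem can itself be cast as a submodular energy minimization; the choice of \Delta here is purely illustrative.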

    Advances in Graph-Cut Optimization: Multi-Surface Models, Label Costs, and Hierarchical Costs

    Computer vision is full of problems that are elegantly expressed in terms of mathematical optimization, or energy minimization. This is particularly true of low-level inference problems such as cleaning up noisy signals, clustering and classifying data, or estimating 3D points from images. Energies let us state each problem as a clear, precise objective function. Minimizing the correct energy would, hypothetically, yield a good solution to the corresponding problem. Unfortunately, even for low-level problems we are confronted by energies that are computationally hard (often NP-hard) to minimize. As a consequence, a rather large portion of computer vision research is dedicated to proposing better energies and better algorithms for minimizing them. This dissertation presents work along the same lines, specifically new energies and algorithms based on graph cuts. We present three distinct contributions. First, we consider biomedical segmentation where the object of interest comprises multiple distinct regions of uncertain shape (e.g. blood vessels, airways, bone tissue). We show that this common yet difficult scenario can be modeled as an energy over multiple interacting surfaces, and can be globally optimized by a single graph cut. Second, we introduce multi-label energies with label costs and provide algorithms to minimize them. We show how label costs are useful for clustering and robust estimation problems in vision. Third, we characterize a class of energies with hierarchical costs and propose a novel hierarchical fusion algorithm with improved approximation guarantees. Hierarchical costs are natural for modeling an array of difficult problems, e.g. segmentation with hierarchical context, simultaneous estimation of motions and homographies, or detecting hierarchies of patterns.
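
    As a hedged illustration of the second contribution, a multi-label energy with label costs typically has the following shape (notation is ours, not the dissertation's): a standard pairwise energy plus a cost h_l that is paid once whenever label l appears in the solution,

        E(f) \;=\; \sum_{p} D_p(f_p) \;+\; \sum_{(p,q)\in\mathcal{N}} V_{pq}(f_p, f_q) \;+\; \sum_{l\in\mathcal{L}} h_l \,\big[\exists\, p : f_p = l\big]

    where D_p are data terms, V_{pq} are smoothness terms over the neighbourhood \mathcal{N}, and [\cdot] is the Iverson bracket. Charging for each label that is actually used is what makes such energies suitable for clustering and robust estimation, where the number of models is not known in advance.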

    Higher-order inference in conditional random fields using submodular functions

    Higher-order and dense conditional random fields (CRFs) are expressive graphical models which have been very successful in low-level computer vision applications such as semantic segmentation and stereo matching. These models are able to capture long-range interactions and higher-order image statistics much better than pairwise CRFs. This expressive power comes at a price, though: inference problems in these models are computationally very demanding. This is a particular challenge in computer vision, where fast inference is important and the problems involve millions of pixels. In this thesis, we look at how submodular functions can help us design efficient inference methods for higher-order and dense CRFs. Submodular functions are special discrete functions that have important properties from an optimisation perspective, and are closely related to convex functions. We use submodularity in a two-fold manner: (a) to design an efficient MAP inference algorithm for a robust higher-order model that generalises the widely-used truncated convex models, and (b) to glean insights into a recently proposed variational inference algorithm, which give us a principled approach for applying it efficiently to higher-order and dense CRFs.
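
    For context, the submodularity property the thesis builds on can be stated compactly (these are standard definitions, not specific to this work): a set function f over a ground set V is submodular if

        f(A) + f(B) \;\ge\; f(A \cup B) + f(A \cap B) \quad \text{for all } A, B \subseteq V

    and, in the special case of pairwise binary energies, submodularity reduces to the condition

        \theta_{pq}(0,0) + \theta_{pq}(1,1) \;\le\; \theta_{pq}(0,1) + \theta_{pq}(1,0)

    on every pairwise term, which is exactly the regime in which such energies can be minimised globally with a graph cut.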

    Complexity of Discrete Energy Minimization Problems

    Discrete energy minimization is widely used in computer vision and machine learning for problems such as MAP inference in graphical models. The problem, in general, is notoriously intractable, and finding the globally optimal solution is known to be NP-hard. However, is it possible to approximate this problem with a reasonable ratio bound on the solution quality in polynomial time? We show in this paper that the answer is no. Specifically, we show that general energy minimization, even in the 2-label pairwise case, and planar energy minimization with three or more labels are exp-APX-complete. This finding rules out the existence of any approximation algorithm with a sub-exponential approximation ratio in the input size for these two problems, including constant factor approximations. Moreover, we collect and review the computational complexity of several subclass problems and arrange them on a complexity scale consisting of three major complexity classes (PO, APX, and exp-APX), corresponding to problems that are solvable, approximable, and inapproximable in polynomial time. Problems in the first two complexity classes can serve as alternative tractable formulations to the inapproximable ones. This paper can help vision researchers select an appropriate model for an application or guide them in designing new algorithms. (Comment: accepted at ECCV'16.)
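
    For reference, the problem analysed here is the standard pairwise discrete energy minimization (MAP inference) problem; in common notation, used here only for illustration, it reads

        \min_{x \in \mathcal{L}^{V}} \; E(x) \;=\; \sum_{i \in V} \theta_i(x_i) \;+\; \sum_{(i,j) \in \mathcal{E}} \theta_{ij}(x_i, x_j)

    where \mathcal{L} is the label set, V the set of variables (e.g. pixels) and \mathcal{E} the edges of the graphical model. The exp-APX-completeness results above concern this problem with two labels and arbitrary pairwise terms, and the planar case with three or more labels.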

    Exploring Aspects of Image Segmentation: Diversity, Global Reasoning, and Panoptic Formulation

    Image segmentation is the task of partitioning an image into meaningful regions. It is a fundamental part of the visual scene understanding problem, with many real-world applications such as photo editing, robotics, navigation, autonomous driving and bio-imaging. It has been extensively studied for several decades and has transformed into a set of problems which define the meaningfulness of regions differently. The set includes two high-level tasks: semantic segmentation (each region is assigned a semantic label) and instance segmentation (each region represents an object instance). Due to their practical importance, both tasks attract a lot of research attention. In this work we explore several aspects of these tasks and propose novel approaches and new paradigms. While most research efforts are directed at developing models that produce a single best segmentation, we consider the task of producing multiple diverse solutions given a single input image. This allows us to hedge against the intrinsic ambiguity of the segmentation task. We propose a new global model that yields multiple solutions for a trained segmentation model; this new model generalizes previously proposed approaches for the task. We present several approximate and exact inference techniques that suit a wide spectrum of possible applications and demonstrate superior performance compared to previous methods. Then, we present a new bottom-up paradigm for the instance segmentation task. The new scheme is substantially different from previous approaches that produce each instance independently. Our approach, named InstanceCut, reasons globally about the optimal partitioning of an image into instances based on local clues. We use two types of local pixel-level clues extracted by efficient fully convolutional networks: (i) an instance-agnostic semantic segmentation and (ii) instance boundaries. Despite the conceptual simplicity of our approach, it demonstrates promising performance. Finally, we put forward a novel Panoptic Segmentation task, which unifies the semantic and instance segmentation tasks. The proposed task requires generating a coherent scene segmentation that is rich and complete, an important step towards real-world vision systems. While early work in computer vision addressed related image/scene parsing tasks, these are not currently popular, possibly due to the lack of appropriate metrics or associated recognition challenges. To address this, we first offer a novel panoptic quality metric that captures performance for all classes (stuff and things) in an interpretable and unified manner. Using this metric, we perform a rigorous study of both human and machine performance for panoptic segmentation on three existing datasets, revealing interesting insights about the task. The aim of our work is to revive the interest of the community in a more unified view of image segmentation.
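
    As an illustration of the proposed metric, the panoptic quality (PQ) measure can be sketched in its commonly published form (details taken from the published panoptic segmentation work rather than this abstract; predicted and ground-truth segments are assumed matched when their IoU exceeds 0.5):

        PQ \;=\; \frac{\sum_{(p,g) \in TP} \mathrm{IoU}(p, g)}{|TP| + \tfrac{1}{2}|FP| + \tfrac{1}{2}|FN|}

    where TP, FP and FN are the matched, unmatched predicted and unmatched ground-truth segments, respectively. PQ is computed per class and then averaged, so stuff and thing classes are treated in the same, interpretable way.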