M-Best-Diverse Labelings for Submodular Energies and Beyond
We consider the problem of finding the M best diverse solutions of energy minimization problems for graphical models. Contrary to the sequential method of Batra et al., which greedily finds one solution after another, we infer all M solutions jointly. It was shown recently that such jointly inferred labelings not only have smaller total energy but also qualitatively outperform the sequentially obtained ones. The only obstacle to using this new technique is the complexity of the corresponding inference problem, whose known algorithms are considerably slower than the method of Batra et al. In this work we show that the joint inference of the M best diverse solutions can be formulated as submodular energy minimization if the original MAP-inference problem is submodular, hence fast inference techniques can be used. In addition to the theoretical results, we provide practical algorithms that outperform the current state of the art and can be used in both the submodular and non-submodular cases.
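The joint-versus-sequential distinction can be illustrated by brute force on a tiny model. The chain energy, the Hamming-distance diversity measure, and the weight `lam` below are illustrative assumptions, not the paper's actual setup:

```python
from itertools import product

# Toy pairwise energy on a chain of 3 binary variables:
# unary costs favour label 0; a Potts term penalises neighbour disagreement.
unary = [[0.0, 1.0], [0.0, 0.4], [0.0, 1.0]]  # unary[i][label]
w = 0.5    # Potts weight
lam = 0.6  # diversity weight (illustrative)

def energy(y):
    e = sum(unary[i][yi] for i, yi in enumerate(y))
    return e + sum(w for a, b in zip(y, y[1:]) if a != b)

def hamming(y, z):
    return sum(a != b for a, b in zip(y, z))

labelings = list(product([0, 1], repeat=3))

# Sequential (Batra et al. style): take the MAP solution first, then the
# labeling minimising energy minus diversity w.r.t. the fixed first solution.
y1 = min(labelings, key=energy)
y2 = min((y for y in labelings if y != y1),
         key=lambda y: energy(y) - lam * hamming(y, y1))
seq_obj = energy(y1) + energy(y2) - lam * hamming(y1, y2)

# Joint: minimise the same total objective over ordered pairs simultaneously.
joint = min(((a, b) for a in labelings for b in labelings if a != b),
            key=lambda p: energy(p[0]) + energy(p[1]) - lam * hamming(*p))
joint_obj = energy(joint[0]) + energy(joint[1]) - lam * hamming(*joint)

# The sequential pair is one feasible joint pair, so joint_obj <= seq_obj.
print(seq_obj, joint_obj)
```

On larger models this brute force is infeasible, which is exactly the inference problem the abstract addresses.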
Advances in Graph-Cut Optimization: Multi-Surface Models, Label Costs, and Hierarchical Costs
Computer vision is full of problems that are elegantly expressed in terms of mathematical optimization, or energy minimization. This is particularly true of low-level inference problems such as cleaning up noisy signals, clustering and classifying data, or estimating 3D points from images. Energies let us state each problem as a clear, precise objective function. Minimizing the correct energy would, hypothetically, yield a good solution to the corresponding problem. Unfortunately, even for low-level problems we are confronted by energies that are computationally hard—often NP-hard—to minimize. As a consequence, a rather large portion of computer vision research is dedicated to proposing better energies and better algorithms for energies. This dissertation presents work along the same line, specifically new energies and algorithms based on graph cuts.
We present three distinct contributions. First we consider biomedical segmentation where the object of interest comprises multiple distinct regions of uncertain shape (e.g. blood vessels, airways, bone tissue). We show that this common yet difficult scenario can be modeled as an energy over multiple interacting surfaces, and can be globally optimized by a single graph cut. Second, we introduce multi-label energies with label costs and provide algorithms to minimize them. We show how label costs are useful for clustering and robust estimation problems in vision. Third, we characterize a class of energies with hierarchical costs and propose a novel hierarchical fusion algorithm with improved approximation guarantees. Hierarchical costs are natural for modeling an array of difficult problems, e.g. segmentation with hierarchical context, simultaneous estimation of motions and homographies, or detecting hierarchies of patterns.
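As a minimal illustration of the graph-cut machinery underlying these contributions, the sketch below minimises a binary pairwise energy exactly via an s-t min cut, computed with a plain Edmonds-Karp max-flow. The 1-D denoising example and all weights are illustrative assumptions, not the dissertation's models:

```python
from collections import deque, defaultdict

def max_flow_min_cut(cap, s, t):
    """Edmonds-Karp max-flow; returns the source side of a minimum s-t cut."""
    flow = defaultdict(int)
    def bfs():
        parent = {s: None}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in cap[u]:
                if v not in parent and cap[u][v] - flow[(u, v)] > 0:
                    parent[v] = u
                    if v == t:
                        return parent
                    q.append(v)
        return None
    while True:
        parent = bfs()
        if parent is None:
            break
        v, aug = t, float("inf")          # bottleneck along the path
        while parent[v] is not None:
            u = parent[v]
            aug = min(aug, cap[u][v] - flow[(u, v)])
            v = u
        v = t                             # push the augmenting flow
        while parent[v] is not None:
            u = parent[v]
            flow[(u, v)] += aug
            flow[(v, u)] -= aug
            v = u
    side, q = {s}, deque([s])             # residual reachability from s
    while q:
        u = q.popleft()
        for v in cap[u]:
            if v not in side and cap[u][v] - flow[(u, v)] > 0:
                side.add(v)
                q.append(v)
    return side

def segment(unary, pairwise):
    """Minimise sum_i unary[i][x_i] + sum_(i,j,w) w*[x_i != x_j], x in {0,1}^n."""
    cap = defaultdict(lambda: defaultdict(int))
    s, t = "s", "t"
    def add(u, v, c):
        cap[u][v] += c
        cap[v][u] += 0                    # ensure the reverse residual edge exists
    for i, (c0, c1) in enumerate(unary):
        add(s, i, c1)                     # edge cut iff x_i = 1: pay unary[i][1]
        add(i, t, c0)                     # edge cut iff x_i = 0: pay unary[i][0]
    for i, j, w in pairwise:
        add(i, j, w)
        add(j, i, w)
    side = max_flow_min_cut(cap, s, t)
    return [0 if i in side else 1 for i in range(len(unary))]

# Denoise a binary 1-D signal: data cost 3 per flipped pixel, smoothness 2.
obs = [0, 0, 1, 0, 0, 1, 1, 1]
unary = [(3 * o, 3 * (1 - o)) for o in obs]
edges = [(i, i + 1, 2) for i in range(len(obs) - 1)]
labels = segment(unary, edges)
print(labels)  # [0, 0, 0, 0, 0, 1, 1, 1] -- the isolated spike is smoothed away
```

The same s-t construction is exact only for submodular (here, Potts) pairwise terms, which is why submodularity keeps recurring in this line of work.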
Higher-order inference in conditional random fields using submodular functions
Higher-order and dense conditional random fields (CRFs) are expressive graphical
models which have been very successful in low-level computer vision applications
such as semantic segmentation and stereo matching. These models are able to
capture long-range interactions and higher-order image statistics much better
than pairwise CRFs. This expressive power comes at a price though - inference
problems in these models are computationally very demanding. This is a
particular challenge in computer vision, where fast inference is important and
the problem involves millions of pixels.
In this thesis, we look at how submodular functions can help us design
efficient inference methods for higher-order and dense CRFs. Submodular
functions are special discrete functions that have important properties from
an optimisation perspective, and are closely related to convex functions. We
use submodularity in a two-fold manner: (a) to design an efficient MAP
inference algorithm for a robust higher-order model that generalises the
widely-used truncated convex models, and (b) to glean insights into a
recently proposed variational inference algorithm, which give us a principled
approach for applying it efficiently to higher-order and dense CRFs.
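For intuition, a set function f is submodular when f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B) for all subsets A and B, and graph cuts are the textbook example. A brute-force checker (illustrative, not code from the thesis):

```python
from itertools import chain, combinations

def powerset(V):
    return chain.from_iterable(combinations(V, r) for r in range(len(V) + 1))

def is_submodular(f, V):
    """Check f(A) + f(B) >= f(A | B) + f(A & B) for all A, B subsets of V."""
    subsets = [frozenset(S) for S in powerset(V)]
    return all(f(A) + f(B) >= f(A | B) + f(A & B) - 1e-12
               for A in subsets for B in subsets)

# Cut functions are submodular: count edges leaving a subset of nodes.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
cut = lambda S: sum((u in S) != (v in S) for u, v in edges)
print(is_submodular(cut, {0, 1, 2, 3}))   # True

# Cardinality squared is supermodular, so it fails the check.
sq = lambda S: len(S) ** 2
print(is_submodular(sq, {0, 1, 2}))       # False
```

Exhaustive checking is exponential in |V|; the point of the theory is that submodular functions can nevertheless be minimised in polynomial time.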
Complexity of Discrete Energy Minimization Problems
Discrete energy minimization is widely-used in computer vision and machine
learning for problems such as MAP inference in graphical models. The problem,
in general, is notoriously intractable, and finding the global optimal solution
is known to be NP-hard. However, is it possible to approximate this problem
with a reasonable ratio bound on the solution quality in polynomial time? We
show in this paper that the answer is no. Specifically, we show that general
energy minimization, even in the 2-label pairwise case, and planar energy
minimization with three or more labels are exp-APX-complete. This finding rules
out the existence of any approximation algorithm with a sub-exponential
approximation ratio in the input size for these two problems, including
constant factor approximations. Moreover, we collect and review the
computational complexity of several subclass problems and arrange them on a
complexity scale consisting of three major complexity classes -- PO, APX, and
exp-APX, corresponding to problems that are solvable, approximable, and
inapproximable in polynomial time. Problems in the first two complexity classes
can serve as alternative tractable formulations to the inapproximable ones.
This paper can help vision researchers select an appropriate model for an
application or guide them in designing new algorithms. Comment: ECCV'16 accepted
Exploring Aspects of Image Segmentation: Diversity, Global Reasoning, and Panoptic Formulation
Image segmentation is the task of partitioning an image into meaningful regions. It is a fundamental part of the visual scene understanding problem, with many real-world applications such as photo editing, robotics, navigation, autonomous driving and bio-imaging. It has been extensively studied for several decades and has evolved into a set of problems that define the meaningfulness of regions differently. The set includes two high-level tasks: semantic segmentation (each region assigned a semantic label) and instance segmentation (each region representing an object instance). Due to their practical importance, both tasks attract a lot of research attention. In this work we explore several aspects of these tasks and propose novel approaches and new paradigms.
While most research efforts are directed at developing models that produce a single best segmentation, we consider the task of producing multiple diverse solutions given a single input image. This allows us to hedge against the intrinsic ambiguity of the segmentation task. We propose a new global model that yields multiple solutions for a trained segmentation model and generalizes previously proposed approaches for the task. We present several approximate and exact inference techniques that suit a wide spectrum of possible applications and demonstrate superior performance compared to previous methods.
Then, we present a new bottom-up paradigm for the instance segmentation task. The new scheme is substantially different from previous approaches, which produce each instance independently. Our approach, named InstanceCut, reasons globally about the optimal partitioning of an image into instances based on local cues. We use two types of local pixel-level cues extracted by efficient fully convolutional networks: (i) an instance-agnostic semantic segmentation and (ii) instance boundaries. Despite its conceptual simplicity, our approach demonstrates promising performance.
Finally, we put forward a novel Panoptic Segmentation task, which unifies the semantic and instance segmentation tasks. The proposed task requires generating a coherent scene segmentation that is rich and complete, an important step towards real-world vision systems. While early work in computer vision addressed related image/scene parsing tasks, these are not currently popular, possibly due to a lack of appropriate metrics or associated recognition challenges. To address this, we first offer a novel panoptic quality (PQ) metric that captures performance for all classes (stuff and things) in an interpretable and unified manner. Using this metric, we perform a rigorous study of both human and machine performance for panoptic segmentation on three existing datasets, revealing interesting insights about the task. The aim of our work is to revive the community's interest in a more unified view of image segmentation.
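The panoptic quality (PQ) metric combines segmentation and recognition quality over matched segments, PQ = Σ_TP IoU / (|TP| + ½|FP| + ½|FN|), with matches defined by IoU > 0.5. A minimal sketch, assuming segments are given as pixel sets:

```python
def iou(a, b):
    """Intersection over union of two pixel sets."""
    return len(a & b) / len(a | b)

def panoptic_quality(pred, gt):
    """PQ = sum of IoU over matched pairs / (|TP| + 0.5*|FP| + 0.5*|FN|).

    `pred` and `gt` are lists of pixel sets, one per segment. Segments within
    each list are non-overlapping, so IoU > 0.5 yields a unique matching and
    a simple double scan suffices.
    """
    matched_iou, matched_pred, matched_gt = [], set(), set()
    for i, p in enumerate(pred):
        for j, g in enumerate(gt):
            q = iou(p, g)
            if q > 0.5:
                matched_iou.append(q)
                matched_pred.add(i)
                matched_gt.add(j)
    tp = len(matched_iou)
    fp = len(pred) - len(matched_pred)   # predicted segments left unmatched
    fn = len(gt) - len(matched_gt)       # ground-truth segments left unmatched
    if tp + fp + fn == 0:
        return 1.0
    return sum(matched_iou) / (tp + 0.5 * fp + 0.5 * fn)

# Toy 4-pixel "image": one perfectly matched segment, one missed segment.
gt = [{0, 1}, {2, 3}]
pred = [{0, 1}]
print(panoptic_quality(pred, gt))  # 1.0 / (1 + 0 + 0.5) = 0.666...
```

In practice the metric is computed per class and averaged; this sketch shows only the per-class matching and scoring step.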