Higher Order Energies for Image Segmentation
A novel energy minimization method for general higher-order binary energy functions is proposed in this paper. We first relax a discrete higher-order function to a continuous one and use the Taylor expansion to obtain an approximate lower-order function, which is optimized with quadratic pseudo-boolean optimization (QPBO) or other discrete optimizers. The minimum solution of this lower-order function is then used as a new local point, around which we expand the original higher-order energy function again. Our algorithm is not restricted to any specific form of higher-order binary function and does not introduce extra auxiliary variables. For concreteness, we show an application to segmentation with the appearance entropy, which is efficiently solved by our method. Experimental results demonstrate that our method outperforms state-of-the-art methods.
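The relax-and-expand loop described above can be sketched as follows. This is only an illustrative reconstruction, not the paper's implementation: derivatives are taken by finite differences, the quadratic surrogate is minimized by brute force over {0,1}^n as a tiny stand-in for QPBO, and the toy energy and function names are assumptions.

```python
import itertools
import numpy as np

def taylor_minimize(f, x0, n_iters=20, eps=1e-2):
    """Repeatedly replace the (continuously relaxed) higher-order energy f
    by its second-order Taylor expansion around the current labelling and
    minimize that quadratic surrogate over binary labellings."""
    n = len(x0)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        f0 = f(x)
        g = np.zeros(n)            # finite-difference gradient
        H = np.zeros((n, n))       # finite-difference mixed second derivatives
        for i in range(n):
            ei = np.eye(n)[i] * eps
            g[i] = (f(x + ei) - f(x - ei)) / (2 * eps)
            for j in range(i + 1, n):
                ej = np.eye(n)[j] * eps
                H[i, j] = H[j, i] = (f(x + ei + ej) - f(x + ei - ej)
                                     - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps**2)
        quad = lambda y: f0 + g @ (y - x) + 0.5 * (y - x) @ H @ (y - x)
        # Exhaustive minimization of the quadratic surrogate (QPBO stand-in).
        cand = min((np.array(b, float) for b in itertools.product([0, 1], repeat=n)),
                   key=quad)
        if f(cand) >= f0:          # accept the move only if the true energy drops
            break
        x = cand
    return x.astype(int)

# Toy third-order energy whose global minimum is (1, 1, 0).
energy = lambda x: -2 * x[0] * x[1] + 3 * x[0] * x[1] * x[2] + x[2]
print(taylor_minimize(energy, [0, 0, 1]))  # [1 1 0]
```

Note that each accepted step lowers the true energy, so the loop terminates even though the surrogate is only a local approximation.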
Variational Methods for Human Modeling
A large part of computer vision research is devoted to building models
and algorithms aimed at understanding human appearance and behaviour
from images and videos. Ultimately, we want to build automated systems
that are at least as capable as people when it comes to
interpreting humans. Most of the tasks that we want these systems to
solve can be posed as a problem of inference in probabilistic
models. Although probabilistic inference in general is a very hard
problem in its own right, there exists a very powerful class of inference
algorithms, variational inference, which allows us to build efficient
solutions for a wide range of problems.
In this thesis, we consider a variety of computer vision problems
targeted at modeling human appearance and behaviour, including
detection, activity recognition, semantic segmentation and facial
geometry modeling. For each of those problems, we develop novel methods
that use variational inference to improve the capabilities
of the existing systems.
First, we introduce a novel method for detecting multiple potentially
occluded people in depth images, which we call DPOM. Unlike many other
approaches, our method performs probabilistic reasoning jointly, and
thus allows us to propagate knowledge about one part of the image
evidence when reasoning about the rest. This is particularly
important in crowded scenes involving many people, since it helps to
handle ambiguous situations resulting from severe occlusions. We
demonstrate that our approach outperforms existing methods on multiple
datasets.
Second, we develop a new algorithm for variational inference that
works for a large class of probabilistic models, which includes, among
others, DPOM and some of the state-of-the-art models for semantic
segmentation. We provide a formal proof that our method converges,
and demonstrate experimentally that it brings better performance than
the state-of-the-art on several real-world tasks, which include
semantic segmentation and people detection. Importantly, we show that
parallel variational inference in discrete random fields can be seen
as a special case of proximal gradient descent, which allows us to
benefit from many of the advances in gradient-based optimization.
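As a rough illustration of the parallel updates in question, here is a minimal sketch of parallel mean-field inference on a tiny discrete random field. The chain, the potentials, and the function names are hypothetical; the models treated in the thesis are far richer.

```python
import numpy as np

def parallel_mean_field(unary, pairwise, edges, n_iters=50):
    """Parallel mean-field for a discrete random field: every marginal q_i
    is updated simultaneously from the current neighbour marginals. It is
    this simultaneous update that can be read as a (proximal) gradient
    step on the variational objective."""
    n, k = unary.shape
    q = np.full((n, k), 1.0 / k)          # uniform initial marginals
    for _ in range(n_iters):
        m = unary.copy()                   # expected energy of each label
        for i, j in edges:
            m[i] += pairwise @ q[j]
            m[j] += pairwise @ q[i]
        q = np.exp(-m)
        q /= q.sum(axis=1, keepdims=True)
    return q

# Tiny 3-node chain with an attractive (Potts) pairwise term.
unary = np.array([[0.0, 2.0], [1.0, 1.0], [0.0, 2.0]])  # endpoints prefer label 0
potts = 1.0 - np.eye(2)                                  # penalize disagreement
q = parallel_mean_field(unary, potts, edges=[(0, 1), (1, 2)])
print(q.argmax(axis=1))  # [0 0 0]
```

The middle node has no unary preference, so its label is decided entirely by the smoothing term, which pulls it to agree with its neighbours.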
Third, we propose a unified framework for multi-human scene
understanding which simultaneously solves three tasks: multi-person
detection, individual action recognition and collective activity
recognition. Within our framework, we introduce a novel multi-person
detection scheme, which relies on variational inference and
jointly refines detection hypotheses instead of relying on
suboptimal post-processing. Ultimately, our model takes as input a
frame sequence and produces a comprehensive description of the
scene. Finally, we experimentally demonstrate that our method brings
better performance than the state-of-the-art.
Fourth, we propose a new approach for learning facial geometry with
deep probabilistic models and variational methods. Our model is based
on a variational autoencoder with multiple sets of hidden variables,
which capture various levels of deformation, ranging from
global to local, high-frequency ones. We experimentally demonstrate
the power of the model on a variety of fitting tasks. Our model is
completely data-driven and can be learned from a relatively small
number of individuals.
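The layered decomposition can be caricatured with a linear decoder in which a global code drives smooth deformations of the whole shape while local codes drive compactly supported, high-frequency detail. The real model uses learned nonlinear mappings with variational training, so everything below is only a structural sketch with made-up bases.

```python
import numpy as np

n_verts = 12                                  # hypothetical 1-D "face"
# Global basis: smooth modes that move every vertex.
B_global = np.stack([np.sin(np.linspace(0, np.pi, n_verts)),
                     np.linspace(-1.0, 1.0, n_verts)], axis=1)
# Local bases: high-frequency modes with compact support (4 vertices each).
B_local = np.zeros((n_verts, 3))
for k in range(3):
    B_local[4 * k: 4 * (k + 1), k] = [1.0, -1.0, 1.0, -1.0]
mean_shape = np.zeros(n_verts)

def decode(z_global, z_local):
    """Coarse-to-fine decoder: the global code produces a smooth
    deformation, the local codes add localized high-frequency detail."""
    return mean_shape + B_global @ z_global + B_local @ z_local

face = decode(np.array([0.5, -0.2]), np.array([0.1, 0.0, 0.3]))
# Changing one local code only perturbs its own patch of the face.
bumped = decode(np.array([0.5, -0.2]), np.array([0.1, 0.0, 0.9]))
assert np.allclose(face[:8], bumped[:8])      # first two patches unchanged
```

The locality of `B_local` is what lets the model separate global shape variation from fine detail; in the actual variational autoencoder this separation is learned rather than hand-built.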
Mean-Field methods for Structured Deep-Learning in Computer Vision
In recent years, Machine-Learning-based Computer Vision techniques have made impressive progress. These algorithms have proved particularly efficient for image classification or detection of isolated objects. From a probabilistic perspective, these methods can predict marginals, over single or multiple variables, independently, with high accuracy.
However, in many tasks of practical interest, we need to predict jointly several correlated variables.
Practical applications include people detection in crowded scenes, image segmentation, surface reconstruction, 3D pose estimation and others. A large part of the research effort in today's computer-vision community aims at finding task-specific solutions to these problems, while leveraging the power of Deep-Learning based classifiers. In this thesis, we present our journey towards a generic and practical solution based on mean-field (MF) inference.
Mean-field is a Statistical Physics-inspired method which has long been used in Computer Vision as a variational approximation to posterior distributions over complex Conditional Random Fields. Standard
mean-field optimization is based on coordinate descent
and in many situations can be impractical.
We therefore propose a novel proximal gradient-based
approach to optimizing the variational objective. It
is naturally parallelizable and easy to implement.
We prove its convergence, and then demonstrate that, in
practice, it yields faster convergence and often finds better
optima than more traditional mean-field optimization techniques.
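A minimal sketch of such a damped, parallelizable update follows, assuming a mirror-descent form of the proximal step (with step size 1 it reduces to the plain parallel update). The toy problem and all names are illustrative, not the thesis implementation.

```python
import numpy as np

def prox_mean_field(unary, pairwise, edges, step=0.5, n_iters=100):
    """Damped parallel mean-field via a proximal (mirror-descent) step on
    the variational free energy: the new log-marginals mix the old ones
    with the expected energies, so step < 1 damps the plain update."""
    n, k = unary.shape
    q = np.full((n, k), 1.0 / k)
    for _ in range(n_iters):
        m = unary.copy()                       # expected energies
        for i, j in edges:
            m[i] += pairwise @ q[j]
            m[j] += pairwise @ q[i]
        logq = (1 - step) * np.log(q) - step * m
        q = np.exp(logq - logq.max(axis=1, keepdims=True))
        q /= q.sum(axis=1, keepdims=True)
    return q

# Toy chain: endpoints prefer label 0, the coupling is repulsive,
# so the middle node should flip to label 1.
unary = np.array([[0.0, 2.0], [0.0, 0.0], [0.0, 2.0]])
repulse = -(1.0 - np.eye(2))
q = prox_mean_field(unary, repulse, edges=[(0, 1), (1, 2)])
print(q.argmax(axis=1))  # [0 1 0]
```

All nodes are updated simultaneously from the previous marginals, which is what makes the scheme trivially parallelizable.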
Then, we show that the fully factorized mean-field distribution can be replaced by a weighted mixture of such distributions that similarly minimizes the KL-divergence to the true posterior. Our extension of the clamping method proposed in previous works allows us both to produce a more descriptive approximation of the true posterior and, inspired by the diverse-MAP paradigm, to fit a mixture of mean-field approximations. We demonstrate that this positively impacts real-world algorithms that initially relied on mean-field inference.
One of the important properties of mean-field inference algorithms is that the closed-form updates are fully differentiable operations. This naturally allows us to do parameter learning by simply unrolling multiple iterations of the updates, the so-called back-mean-field algorithm. We derive a novel and efficient structured learning method for multi-modal posterior distributions based on the Multi-Modal Mean-Field approximation, which can be seamlessly combined with modern gradient-based learning methods such as CNNs.
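The unrolling idea can be illustrated with a toy forward pass whose pairwise weight is a learnable parameter. Finite differences stand in for the autograd machinery a real deep-learning framework would provide; everything here is a hypothetical sketch, not the thesis code.

```python
import numpy as np

def unrolled_mf(unary, w, edges, T=5):
    """T unrolled mean-field iterations. Every step is a smooth function
    of the pairwise weight w, so a loss on the final marginals can be
    differentiated through the whole chain."""
    n, k = unary.shape
    potts = w * (1.0 - np.eye(k))        # learnable pairwise strength
    q = np.full((n, k), 1.0 / k)
    for _ in range(T):
        m = unary.copy()
        for i, j in edges:
            m[i] += potts @ q[j]
            m[j] += potts @ q[i]
        q = np.exp(-m)
        q /= q.sum(axis=1, keepdims=True)
    return q

unary = np.array([[0.0, 1.0], [0.2, 0.0], [0.0, 1.0]])
edges = [(0, 1), (1, 2)]
target = np.array([0, 0, 0])             # ground-truth labelling

def loss(w):
    q = unrolled_mf(unary, w, edges)
    return -np.log(q[np.arange(3), target]).sum()

# Finite differences mimic what autograd does through the unrolled chain.
eps = 1e-5
grad_w = (loss(1.0 + eps) - loss(1.0 - eps)) / (2 * eps)
print(grad_w < 0)  # stronger smoothing fits this all-zeros target better
```

Because the gradient with respect to `w` is well defined, the inference block can be trained jointly with the network that produces the unary terms.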
Finally, we explore in more detail the specific problem of structured learning and prediction for multiple-people detection in crowded scenes. We then present a mean-field-based structured deep-learning detection algorithm that provides state-of-the-art results on this task.
Optimization for Image Segmentation
Image segmentation, i.e., assigning each pixel a discrete label, is an essential task in computer vision with many applications. Major techniques for segmentation include, for example, Markov Random Fields (MRF), Kernel Clustering (KC), and the nowadays-popular Convolutional Neural Networks (CNN). In this work, we focus on optimization for image segmentation. Techniques like MRF, KC, and CNN optimize MRF energies, KC criteria, or CNN losses, respectively, and their corresponding optimization procedures are very different. We are interested in the synergy and the complementary benefits of MRF, KC, and CNN for interactive segmentation and semantic segmentation. Our first contribution is pseudo-bound optimization for binary MRF energies that are high-order or non-submodular. Secondly, we propose Kernel Cut, a novel formulation for segmentation, which combines MRF regularization with Kernel Clustering. We show why to combine KC with MRF and how to optimize the joint objective. In the third part, we discuss how deep CNN segmentation can benefit from non-deep (i.e., shallow) methods like MRF and KC. In particular, we propose regularized losses for weakly supervised CNN segmentation, in which we can integrate MRF energies or KC criteria as part of the losses. Minimization of regularized losses is, in general, a principled approach to semi-supervised learning. Our regularized-loss method is very simple and allows different kinds of regularization losses for CNN segmentation. We also study the optimization of regularized losses beyond gradient descent. Our regularized-losses approach achieves state-of-the-art accuracy in semantic segmentation with near-full-supervision quality.
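The regularized-loss idea can be sketched as a partial cross-entropy over the few labelled pixels plus a pairwise term evaluated on the network's soft predictions. Here an expected Potts energy plays the role of the MRF regularizer (a KC criterion could be plugged in the same way); the toy values and names are illustrative assumptions.

```python
import numpy as np

def regularized_loss(probs, labels, edges, lam=0.5):
    """Weakly supervised segmentation loss: partial cross-entropy on
    labelled pixels (label -1 marks unlabelled ones) plus an expected
    Potts regularizer, i.e. the probability that neighbours disagree."""
    labelled = [(p, l) for p, l in enumerate(labels) if l >= 0]
    pce = -sum(np.log(probs[p, l]) for p, l in labelled) / len(labelled)
    potts = sum(1.0 - probs[i] @ probs[j] for i, j in edges) / len(edges)
    return pce + lam * potts

# Toy 4-pixel chain: only the endpoints carry scribble labels.
probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.4, 0.6], [0.1, 0.9]])
labels = [0, -1, -1, 1]
edges = [(0, 1), (1, 2), (2, 3)]
print(round(regularized_loss(probs, labels, edges), 3))  # 0.332
```

Both terms are differentiable in `probs`, so the regularizer simply joins the supervised term in the network's training loss; no extra inference machinery is needed at training time.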