A Projected Gradient Descent Method for CRF Inference allowing End-To-End Training of Arbitrary Pairwise Potentials
Are we using the right potential functions in the Conditional Random Field
models that are popular in the Vision community? Semantic segmentation and
other pixel-level labelling tasks have made significant progress recently due
to the deep learning paradigm. However, most state-of-the-art structured
prediction methods also include a random field model with a hand-crafted
Gaussian potential to model spatial priors, label consistencies and
feature-based image conditioning.
In this paper, we challenge this view by developing a new inference and
learning framework which can learn pairwise CRF potentials restricted only by
their dependence on the image pixel values and the size of the support. Both
standard spatial and high-dimensional bilateral kernels are considered. Our
framework is based on the observation that CRF inference can be achieved via
projected gradient descent and consequently, can easily be integrated in deep
neural networks to allow for end-to-end training. It is empirically
demonstrated that such learned potentials can improve segmentation accuracy and
that certain label class interactions are indeed better modelled by a
non-Gaussian potential. In addition, we compare our inference method to the
commonly used mean-field algorithm. Our framework is evaluated on several
public benchmarks for semantic segmentation with improved performance compared
to previous state-of-the-art CNN+CRF models.
Comment: Presented at EMMCVPR 2017 conference
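As a rough illustration of CRF inference via projected gradient descent, the sketch below relaxes the labelling to per-node probability vectors, takes a gradient step on a small Potts-style energy, and projects each vector back onto the probability simplex. The energy, the names `project_simplex` and `pgd_inference`, the step size and the toy chain are all assumptions chosen for illustration; this is not the paper's implementation and contains no learned potentials.

```python
def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex
    (standard sort-based method)."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for k, uk in enumerate(u, 1):
        css += uk
        t = (css - 1.0) / k
        if uk - t > 0:
            theta = t  # threshold from the last index that satisfies the condition
    return [max(x - theta, 0.0) for x in v]


def pgd_inference(unary, weights, steps=50, lr=0.5):
    """Projected gradient descent on the relaxed (assumed) energy
    E(q) = sum_i <unary_i, q_i> + sum_{i<j} w_ij * (1 - <q_i, q_j>).
    unary[i][l] is the cost of label l at node i; weights is symmetric."""
    n, num_labels = len(unary), len(unary[0])
    q = [[1.0 / num_labels] * num_labels for _ in range(n)]
    for _ in range(steps):
        new_q = []
        for i in range(n):
            # Gradient of E with respect to q_i.
            grad = [
                unary[i][l]
                - sum(weights[i][j] * q[j][l] for j in range(n) if j != i)
                for l in range(num_labels)
            ]
            stepped = [q[i][l] - lr * grad[l] for l in range(num_labels)]
            new_q.append(project_simplex(stepped))
        q = new_q
    return q


# Toy chain of three nodes and two labels. The middle node's unary term
# slightly prefers label 1, but both neighbours strongly prefer label 0.
unary = [[0.0, 2.0], [1.0, 0.9], [0.0, 2.0]]
weights = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
q = pgd_inference(unary, weights)
labels = [row.index(max(row)) for row in q]
print(labels)
```

Because every step is a differentiable operation followed by a projection, an iteration of this kind can be unrolled as layers of a network, which is the property the abstract relies on for end-to-end training. The relaxed energy is not convex in general, so the result is a local solution.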
Efficient Relaxations for Dense CRFs with Sparse Higher Order Potentials
Dense conditional random fields (CRFs) have become a popular framework for
modelling several problems in computer vision such as stereo correspondence and
multi-class semantic segmentation. By modelling long-range interactions, dense
CRFs provide a labelling that captures finer detail than their sparse
counterparts. Currently, the state-of-the-art algorithm performs mean-field
inference using a filter-based method but fails to provide a strong theoretical
guarantee on the quality of the solution. A question naturally arises as to
whether it is possible to obtain a maximum a posteriori (MAP) estimate of a
dense CRF using a principled method. Within this paper, we show that this is
indeed possible. We will show that, by using a filter-based method, continuous
relaxations of the MAP problem can be optimised efficiently using
state-of-the-art algorithms. Specifically, we will solve a quadratic
programming (QP) relaxation using the Frank-Wolfe algorithm and a linear
programming (LP) relaxation by developing a proximal minimisation framework. By
exploiting labelling consistency in the higher-order potentials and utilising
the filter-based method, we are able to formulate the above algorithms such
that each iteration has a complexity linear in the number of classes and random
variables. The presented algorithms can be applied to any labelling problem
using a dense CRF with sparse higher-order potentials. In this paper, we use
semantic segmentation as an example application as it demonstrates the ability
of the algorithm to scale to dense CRFs with large dimensions. We perform
experiments on the Pascal dataset to indicate that the presented algorithms are
able to attain lower energies than the mean-field inference method.
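The Frank-Wolfe step mentioned above can be sketched on a toy problem. The example below minimises a small Potts-style quadratic energy over a product of simplices: each iteration linearises the energy, picks the best one-hot labelling per node, and moves towards it. The energy, the function name `frank_wolfe`, the step size 2/(t+2) and the dense pairwise loop are assumptions for illustration; the paper's filter-based evaluation and higher-order terms are not reproduced here.

```python
def frank_wolfe(unary, weights, steps=50):
    """Frank-Wolfe on the relaxed (assumed) energy
    E(q) = sum_i <unary_i, q_i> + sum_{i<j} w_ij * (1 - <q_i, q_j>),
    with each q_i constrained to the probability simplex."""
    n, num_labels = len(unary), len(unary[0])
    q = [[1.0 / num_labels] * num_labels for _ in range(n)]
    for t in range(steps):
        # Gradient at the current point, computed for all nodes first.
        grads = [
            [
                unary[i][l]
                - sum(weights[i][j] * q[j][l] for j in range(n) if j != i)
                for l in range(num_labels)
            ]
            for i in range(n)
        ]
        gamma = 2.0 / (t + 2.0)  # standard diminishing step size
        for i in range(n):
            # Linear minimisation over a simplex is attained at a vertex:
            # the one-hot vector of the label with the smallest gradient.
            best = grads[i].index(min(grads[i]))
            q[i] = [
                (1.0 - gamma) * q[i][l] + (gamma if l == best else 0.0)
                for l in range(num_labels)
            ]
    return q


# Same style of toy chain: the middle node is pulled to label 0 by its neighbours.
unary = [[0.0, 2.0], [1.0, 0.9], [0.0, 2.0]]
weights = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
q = frank_wolfe(unary, weights)
labels = [row.index(max(row)) for row in q]
print(labels)
```

No projection is needed because each update is a convex combination of feasible points, so the iterate stays inside the simplex by construction.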
Blending Learning and Inference in Structured Prediction
In this paper we derive an efficient algorithm to learn the parameters of
structured predictors in general graphical models. This algorithm blends the
learning and inference tasks, which results in a significant speedup over
traditional approaches, such as conditional random fields and structured
support vector machines. For this purpose we utilize the structures of the
predictors to describe a low dimensional structured prediction task which
encourages local consistencies within the different structures while learning
the parameters of the model. Convexity of the learning task provides the means
to enforce the consistencies between the different parts. The
inference-learning blending algorithm that we propose is guaranteed to converge
to the optimum of the low dimensional primal and dual programs. Unlike many of
the existing approaches, the inference-learning blending allows us to
efficiently learn high-order graphical models over regions of any size and with
a very large number of parameters. We demonstrate the effectiveness of our
approach by presenting state-of-the-art results in stereo estimation, semantic
segmentation, shape reconstruction, and indoor scene understanding.
Efficient SDP Inference for Fully-connected CRFs Based on Low-rank Decomposition
Conditional Random Fields (CRF) have been widely used in a variety of
computer vision tasks. Conventional CRFs typically define edges on neighboring
image pixels, resulting in a sparse graph such that efficient inference can be
performed. However, these CRFs fail to model long-range contextual
relationships. Fully-connected CRFs have thus been proposed. While there are
efficient approximate inference methods for such CRFs, usually they are
sensitive to initialization and make strong assumptions. In this work, we
develop an efficient, yet general algorithm for inference on fully-connected
CRFs. The algorithm is based on a scalable SDP algorithm and the low-rank
approximation of the similarity/kernel matrix. The core of the proposed
algorithm is a tailored quasi-Newton method that takes advantage of the
low-rank matrix approximation when solving the specialized SDP dual problem.
Experiments demonstrate that our method can be applied to fully-connected CRFs
that could not be solved previously, such as pixel-level image co-segmentation.
Comment: 15 pages. A conference version of this work appears in Proc. IEEE
Conference on Computer Vision and Pattern Recognition, 201
Deep Occlusion Reasoning for Multi-Camera Multi-Target Detection
People detection in single 2D images has improved greatly in recent years.
However, comparatively little of this progress has percolated into multi-camera
multi-people tracking algorithms, whose performance still degrades severely
when scenes become very crowded. In this work, we introduce a new architecture
that combines Convolutional Neural Nets and Conditional Random Fields to
explicitly model those ambiguities. One of its key ingredients is the use of
high-order CRF terms that model potential occlusions and give our approach its
robustness even when many people are present. Our model is trained end-to-end
and we show that it outperforms several state-of-the-art algorithms on
challenging scenes.
Bethe Projections for Non-Local Inference
Many inference problems in structured prediction are naturally solved by
augmenting a tractable dependency structure with complex, non-local auxiliary
objectives. This includes the mean field family of variational inference
algorithms, soft- or hard-constrained inference using Lagrangian relaxation or
linear programming, collective graphical models, and forms of semi-supervised
learning such as posterior regularization. We present a method to
discriminatively learn broad families of inference objectives, capturing
powerful non-local statistics of the latent variables, while maintaining
tractable and provably fast inference using non-Euclidean projected gradient
descent with a distance-generating function given by the Bethe entropy. We
demonstrate the performance and flexibility of our method by (1) extracting
structured citations from research papers by learning soft global constraints,
(2) achieving state-of-the-art results on a widely-used handwriting recognition
task using a novel learned non-convex inference procedure, and (3) providing a
fast and highly scalable algorithm for the challenging problem of inference in
a collective graphical model applied to bird migration.
Comment: minor bug fix to appendix. Appeared in UAI 201
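To give a feel for non-Euclidean projected gradient descent, the sketch below runs entropic mirror descent on a single probability vector. It is a deliberate simplification: the distance-generating function here is the plain negative entropy of one categorical variable, not the Bethe entropy over a structured model, and the function name `entropic_mirror_descent` and the linear toy objective are assumptions for illustration.

```python
import math


def entropic_mirror_descent(grad_fn, dim, steps=200, lr=0.5):
    """Mirror descent on the probability simplex with negative entropy as the
    distance-generating function. The update is multiplicative
    (exponentiated gradient) followed by renormalisation."""
    p = [1.0 / dim] * dim
    for _ in range(steps):
        g = grad_fn(p)
        unnorm = [p[l] * math.exp(-lr * g[l]) for l in range(dim)]
        z = sum(unnorm)
        p = [x / z for x in unnorm]
    return p


# Toy objective f(p) = <c, p>, whose gradient is the constant vector c.
# The minimiser over the simplex puts all mass on the cheapest entry.
c = [1.0, 0.2, 0.5]
p = entropic_mirror_descent(lambda p: c, dim=3)
print([round(x, 3) for x in p])
```

With an entropy-based distance the iterate stays strictly positive and normalised without an explicit Euclidean projection, which is one reason this geometry is a natural fit for marginals.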