Learning Sparse High Dimensional Filters: Image Filtering, Dense CRFs and Bilateral Neural Networks
Bilateral filters are widely used because of their edge-preserving
properties. The common use case is to manually choose a parametric filter type,
usually a Gaussian filter. In this paper, we generalize the
parametrization and, in particular, derive a gradient descent algorithm so that the
filter parameters can be learned from data. This derivation allows us to learn
high-dimensional linear filters that operate in sparsely populated feature
spaces. We build on the permutohedral lattice construction for efficient
filtering. The ability to learn more general forms of high-dimensional filters
can be used in several diverse applications. First, we demonstrate their use in
applications where a single filter application is desired for runtime reasons.
Further, we show how this algorithm can be used to learn the pairwise
potentials in densely connected conditional random fields and apply these to
different image segmentation tasks. Finally, we introduce layers of bilateral
filters in CNNs and propose bilateral neural networks for use with
high-dimensional sparse data. This view provides new ways to encode model
structure into network architectures. A diverse set of experiments empirically
validates the usage of general forms of filters.
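A bilateral filter can be read as a Gaussian filter applied in a joint feature space of position and intensity: pixels on opposite sides of an edge are far apart in that space, so they barely mix. The following is a minimal brute-force sketch under that reading, assuming a 1-D signal and a fixed Gaussian kernel; the function name and the parameters `sigma_space` and `sigma_range` are illustrative assumptions. It costs O(N²) and does not use the permutohedral lattice or learn the filter, as the paper does.

```python
import math


def bilateral_filter_1d(signal, sigma_space, sigma_range):
    """Brute-force O(N^2) bilateral filter on a 1-D signal.

    Equivalent to a normalized Gaussian filter in the joint feature space
    (position / sigma_space, value / sigma_range).
    """
    # Scale each sample into the joint feature space.
    features = [(i / sigma_space, v / sigma_range) for i, v in enumerate(signal)]
    out = []
    for fi in features:
        num = 0.0
        den = 0.0
        for fj, vj in zip(features, signal):
            # Squared Euclidean distance in the joint feature space.
            dist2 = (fi[0] - fj[0]) ** 2 + (fi[1] - fj[1]) ** 2
            weight = math.exp(-0.5 * dist2)
            num += weight * vj
            den += weight
        # Normalize so that a constant signal is left unchanged.
        out.append(num / den)
    return out
```

For example, on a step signal `[0.0] * 5 + [10.0] * 5` with `sigma_space=2.0` and `sigma_range=1.0`, the two sides of the step stay essentially untouched, whereas a very large `sigma_range` makes the filter behave like a plain spatial Gaussian blur.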
Video Propagation Networks
We propose a technique that propagates information forward through video
data. The method is conceptually simple and can be applied to tasks that
require the propagation of structured information, such as semantic labels,
based on video content. We propose a 'Video Propagation Network' that processes
video frames in an adaptive manner. The model is applied online: it propagates
information forward without the need to access future frames. In particular, we
combine two components: a temporal bilateral network for dense, video-adaptive
filtering, followed by a spatial network to refine features and
increase flexibility. We present experiments on video object segmentation and
semantic video segmentation and show improved performance compared to the
best previous task-specific methods, while having a favorable runtime.
Additionally, we demonstrate our approach on an example regression task of color
propagation in a grayscale video. Comment: Appearing in Computer Vision and Pattern Recognition, 2017 (CVPR'17)
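The online, forward-only propagation idea can be illustrated with a fixed Gaussian kernel. Each pixel of the current frame receives a weighted average of the label distributions of pixels from past frames, with weights computed in a (time, position, intensity) feature space. This is a minimal sketch under those assumptions. The function name `propagate_labels`, the feature layout, and the `scales` values are hypothetical, and it stands in for neither the learned temporal bilateral network nor the spatial refinement network described above.

```python
import math


def propagate_labels(past_feats, past_probs, cur_feats, scales):
    """Propagate label distributions from past-frame pixels to current-frame pixels.

    past_feats / cur_feats: feature tuples, e.g. (t, x, intensity).
    past_probs: one label distribution (list of floats) per past pixel.
    scales: per-feature Gaussian bandwidths (hypothetical values).
    Only past frames are used, so the procedure can run online.
    """
    num_labels = len(past_probs[0])
    out = []
    for fc in cur_feats:
        acc = [0.0] * num_labels
        total = 0.0
        for fp, probs in zip(past_feats, past_probs):
            # Scaled squared distance in the joint (time, space, color) space.
            d2 = sum(((a - b) / s) ** 2 for a, b, s in zip(fc, fp, scales))
            weight = math.exp(-0.5 * d2)
            total += weight
            for k, p in enumerate(probs):
                acc[k] += weight * p
        # Normalize to obtain a label distribution for the current pixel.
        out.append([a / total for a in acc])
    return out
```

In this sketch, the small intensity bandwidth stops labels from leaking across object boundaries when the object moves slightly between frames. In the paper, that role is played by the video-adaptive filter weights.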
On a fast bilateral filtering formulation using functional rearrangements
We introduce an exact reformulation of a broad class of neighborhood filters,
including bilateral filters, in terms of two functional rearrangements:
the decreasing and the relative rearrangements.
Independently of the spatial dimension of the image (one-dimensional signal, image,
volume of images, etc.), we reformulate these filters as integral operators
defined in a one-dimensional space corresponding to the measures of the level sets.
We prove the equivalence between the usual pixel-based version and the
rearranged version of the filter. When restricted to the discrete setting, our
reformulation of bilateral filters extends previous results for the so-called
fast bilateral filtering. In addition, we prove that the solution of the
discrete setting, understood as constant-wise interpolators, converges to the
solution of the continuous setting.
Finally, we numerically illustrate computational aspects concerning approximation
quality and execution time provided by the rearranged formulation. Comment: 29 pages, Journal of Mathematical Imaging and Vision, 2015. arXiv
admin note: substantial text overlap with arXiv:1406.712
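The level-set idea is easiest to see in the special case with no spatial kernel, a "global" range filter that averages over all pixels, weighted only by intensity similarity. There, the output at a pixel depends only on its own intensity. The sum over pixels therefore collapses into a sum over distinct levels, each weighted by the measure (pixel count) of its level set. Because positions never enter, the same code applies to a flattened signal, image, or volume. This is a minimal sketch under that assumption; the function names are hypothetical, and the general bilateral case, which requires the relative rearrangement, is not shown.

```python
import math
from collections import Counter


def range_filter_pixelwise(image, sigma):
    """Global range filter computed pixel by pixel (O(N^2) for N pixels)."""
    out = []
    for u in image:
        weights = [math.exp(-((v - u) ** 2) / (2 * sigma ** 2)) for v in image]
        out.append(sum(w * v for w, v in zip(weights, image)) / sum(weights))
    return out


def range_filter_levelsets(image, sigma):
    """Same filter as a sum over distinct levels weighted by level-set measure.

    Cost is O(L^2 + N) for L distinct intensity levels.
    """
    measure = Counter(image)  # level value -> number of pixels at that level
    levels = sorted(measure, reverse=True)  # distinct levels in decreasing order
    value_at_level = {}
    for u in levels:
        weights = {l: measure[l] * math.exp(-((l - u) ** 2) / (2 * sigma ** 2)) for l in levels}
        value_at_level[u] = sum(weights[l] * l for l in levels) / sum(weights.values())
    # Every pixel at the same level receives the same filtered value.
    return [value_at_level[u] for u in image]
```

Both functions return the same values up to floating-point rounding. The second is cheaper whenever the number of distinct levels is much smaller than the number of pixels.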
Efficient Linear Programming for Dense CRFs
The fully connected conditional random field (CRF) with Gaussian pairwise
potentials has proven popular and effective for multi-class semantic
segmentation. While the energy of a dense CRF can be minimized accurately using
a linear programming (LP) relaxation, the state-of-the-art algorithm is too
slow to be useful in practice. To alleviate this deficiency, we introduce an
efficient LP minimization algorithm for dense CRFs. To this end, we develop a
proximal minimization framework, where the dual of each proximal problem is
optimized via block coordinate descent. We show that each block of variables
can be efficiently optimized. Specifically, for one block, the problem
decomposes into significantly smaller subproblems, each of which is defined
over a single pixel. For the other block, the problem is optimized via
conditional gradient descent. This has two advantages: 1) the conditional
gradient can be computed in a time linear in the number of pixels and labels;
and 2) the optimal step size can be computed analytically. Our experiments on
standard datasets provide compelling evidence that our approach outperforms all
existing baselines, including the previous LP-based approach for dense CRFs. Comment: 24 pages, 10 figures and 4 tables
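The two properties claimed for the conditional-gradient block, a cheap linear minimization step and a closed-form optimal step size, can be seen on a toy problem. Here, conditional gradient (Frank-Wolfe) minimizes a convex quadratic f(p) = ½ pᵀAp + bᵀp over a single probability simplex. For a quadratic, exact line search along the direction d gives the step −∇f(p)ᵀd / (dᵀAd), clipped to [0, 1]. This is a generic sketch, not the paper's dual proximal subproblem, whose exact form is not reproduced here. `A`, `b`, and the function name are hypothetical.

```python
def frank_wolfe_simplex(A, b, iters=1000):
    """Minimize 0.5 * p^T A p + b^T p over the probability simplex.

    Uses conditional gradient with an exact (closed-form) line search.
    A is assumed symmetric positive semidefinite.
    """
    n = len(b)
    p = [1.0 / n] * n  # start at the barycenter of the simplex
    for _ in range(iters):
        grad = [sum(A[i][j] * p[j] for j in range(n)) + b[i] for i in range(n)]
        # Linear minimization oracle: the best simplex vertex is the
        # coordinate with the smallest gradient entry.
        k = min(range(n), key=lambda i: grad[i])
        d = [(1.0 if i == k else 0.0) - p[i] for i in range(n)]
        slope = sum(g * di for g, di in zip(grad, d))
        if slope >= -1e-12:
            break  # Frank-Wolfe gap is (numerically) zero: p is optimal
        curvature = sum(d[i] * A[i][j] * d[j] for i in range(n) for j in range(n))
        # Exact minimizer of the quadratic along d, clipped to stay feasible.
        step = 1.0 if curvature <= 0 else min(1.0, -slope / curvature)
        p = [pi + step * di for pi, di in zip(p, d)]
    return p
```

Each iterate is a convex combination of simplex points, so no projection is needed. In a dense CRF, the expensive part would be computing the gradient, which the paper reports can be done in time linear in the number of pixels and labels.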
Learning Task-Specific Generalized Convolutions in the Permutohedral Lattice
Dense prediction tasks typically employ encoder-decoder architectures, but
the prevalent convolutions in the decoder are not image-adaptive and can lead
to boundary artifacts. Different generalized convolution operations have been
introduced to counteract this. We go beyond these by leveraging guidance data
to redefine their inherent notion of proximity. Our proposed network layer
builds on the permutohedral lattice, which performs sparse convolutions in a
high-dimensional space allowing for powerful non-local operations despite small
filters. Multiple features with different characteristics span this
permutohedral space. In contrast to prior work, we learn these features in a
task-specific manner by generalizing the basic permutohedral operations to
learnt feature representations. As the resulting objective is complex, a
carefully designed framework and learning procedure are introduced, yielding
rich feature embeddings in practice. We demonstrate the general applicability
of our approach in different joint upsampling tasks. When adding our network
layer to state-of-the-art networks for optical flow and semantic segmentation,
boundary artifacts are removed and the accuracy is improved. Comment: To appear at GCPR 201
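Filtering in the permutohedral lattice (Adams, Baek and Davis, 2010) starts by embedding each d-dimensional feature vector into the hyperplane where the coordinates sum to zero in R^(d+1). It then locates the enclosing lattice simplex, whose d+1 vertices receive the point's value with barycentric weights (the "splat"). Blurring along lattice directions and slicing back are omitted here. This is a from-scratch sketch of the embedding and simplex search only. It omits the global scale factor used to match a target filter width, the function names are hypothetical, and the features are taken as given rather than learned as in the paper.

```python
import math


def elevate(position):
    """Isometrically embed a d-dim feature vector into {x in R^(d+1) : sum(x) = 0}."""
    d = len(position)
    elevated = [0.0] * (d + 1)
    running = 0.0
    for i in range(d, 0, -1):
        scaled = position[i - 1] / math.sqrt(i * (i + 1))
        elevated[i] = running - i * scaled
        running += scaled
    elevated[0] = running
    return elevated


def splat_weights(position):
    """Return the d+1 vertices of the enclosing lattice simplex and their barycentric weights."""
    d = len(position)
    n = d + 1
    elevated = elevate(position)

    # Round each coordinate to the nearest multiple of n ("remainder-0" point).
    greedy = []
    for value in elevated:
        down = math.floor(value / n) * n
        up = math.ceil(value / n) * n
        greedy.append(float(up) if up - value < value - down else float(down))
    coord_sum = round(sum(greedy) / n)

    # Rank coordinates by residual: rank 0 = largest residual.
    rank = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if elevated[i] - greedy[i] < elevated[j] - greedy[j]:
                rank[i] += 1
            else:
                rank[j] += 1

    # Shift the rounded point back onto the hyperplane and update ranks.
    if coord_sum > 0:
        for i in range(n):
            if rank[i] >= n - coord_sum:
                greedy[i] -= n
                rank[i] += coord_sum - n
            else:
                rank[i] += coord_sum
    elif coord_sum < 0:
        for i in range(n):
            if rank[i] < -coord_sum:
                greedy[i] += n
                rank[i] += n + coord_sum
            else:
                rank[i] += coord_sum

    # Barycentric weights from the sorted residuals.
    bary = [0.0] * (n + 1)
    for i in range(n):
        delta = (elevated[i] - greedy[i]) / n
        bary[d - rank[i]] += delta
        bary[n - rank[i]] -= delta
    bary[0] += 1.0 + bary[n]

    # Vertex r: the remainder-0 point plus the canonical offset for remainder r.
    vertices = [[greedy[i] + (r if rank[i] <= d - r else r - n) for i in range(n)]
                for r in range(n)]
    return vertices, bary[:n]
```

The weights are non-negative and sum to one, and the weighted sum of the returned vertices reproduces the embedded point. This property lets a value splatted onto only d+1 lattice vertices represent a point in a high-dimensional feature space.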
Sample and Filter: Nonparametric Scene Parsing via Efficient Filtering
Scene parsing has attracted a lot of attention in computer vision. While
parametric models have proven effective for this task, they cannot easily
incorporate new training data. By contrast, nonparametric approaches, which
bypass any learning phase and directly transfer the labels from the training
data to the query images, can readily exploit new labeled samples as they
become available. Unfortunately, because of the computational cost of their
label transfer procedures, state-of-the-art nonparametric methods typically
filter out most training images to only keep a few relevant ones to label the
query. As such, these methods throw away many images that still contain
valuable information and generally obtain an unbalanced set of labeled samples.
In this paper, we introduce a nonparametric approach to scene parsing that
follows a sample-and-filter strategy. More specifically, we propose to sample
labeled superpixels according to an image similarity score, which allows us to
obtain a balanced set of samples. We then formulate label transfer as an
efficient filtering procedure, which lets us exploit more labeled samples than
existing techniques. Our experiments demonstrate the benefits of our approach over
state-of-the-art nonparametric methods on two benchmark datasets. Comment: Please refer to the CVPR-2016 version of this manuscript
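The sample-and-filter strategy can be illustrated in two steps. First, draw labeled superpixels with probability proportional to an image-similarity score. Then label each query superpixel by Gaussian-weighted voting over the drawn samples, a brute-force stand-in for the efficient filtering step. This is a minimal sketch under stated assumptions. Drawing the same number of samples per class (with replacement) is one hypothetical way to obtain a "balanced" set. The feature tuples, scores, and function names are illustrative, and the paper's actual filtering procedure is not reproduced.

```python
import math
import random


def sample_balanced(pool, per_class, rng):
    """Draw `per_class` labeled superpixels per label, with replacement.

    pool: list of (feature_tuple, label, image_similarity_score).
    Sampling probability within a label is proportional to the score.
    """
    by_label = {}
    for item in pool:
        by_label.setdefault(item[1], []).append(item)
    samples = []
    for items in by_label.values():
        weights = [score for _, _, score in items]
        samples.extend(rng.choices(items, weights=weights, k=per_class))
    return samples


def transfer_labels(samples, queries, bandwidth):
    """Label each query feature by Gaussian-weighted voting over the samples."""
    result = []
    for q in queries:
        votes = {}
        for feat, label, _ in samples:
            d2 = sum((a - b) ** 2 for a, b in zip(q, feat)) / bandwidth ** 2
            votes[label] = votes.get(label, 0.0) + math.exp(-0.5 * d2)
        result.append(max(votes, key=votes.get))
    return result
```

Because every label contributes the same number of samples in this sketch, frequent classes cannot dominate the vote simply by being more common in the retrieved training images.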