Video Propagation Networks
We propose a technique that propagates information forward through video
data. The method is conceptually simple and can be applied to tasks that
require the propagation of structured information, such as semantic labels,
based on video content. To this end, we develop a 'Video Propagation Network' that processes
video frames in an adaptive manner. The model is applied online: it propagates
information forward without the need to access future frames. In particular, we
combine two components: a temporal bilateral network for dense, video-adaptive
filtering, followed by a spatial network that refines features and increases
flexibility. We present experiments on video object segmentation and semantic
video segmentation and show improved performance compared to the best previous
task-specific methods, while maintaining favorable runtime.
Additionally we demonstrate our approach on an example regression task of color
propagation in a grayscale video.
Comment: Appearing in Computer Vision and Pattern Recognition, 2017 (CVPR'17)
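To make the filtering step concrete, below is a minimal NumPy sketch of the underlying idea: propagating soft labels to a new frame by Gaussian bilateral filtering in a position-color-time feature space. It is a naive O(N²) stand-in under illustrative assumptions of ours (function names, sigmas, data layout), whereas the actual VPN learns this filter and evaluates it efficiently on a permutohedral lattice.

```python
import numpy as np

def bilateral_propagate(prev_frames, prev_labels, cur_frame,
                        sigma_xy=8.0, sigma_rgb=0.1, sigma_t=1.0):
    """Propagate soft labels to `cur_frame` by Gaussian filtering in an
    (x, y, r, g, b, t) feature space. Naive O(N^2) illustration; use
    tiny frames only.

    prev_frames: (T, H, W, 3) RGB in [0, 1]
    prev_labels: (T, H, W, C) soft label maps of the previous frames
    cur_frame:   (H, W, 3)    frame that receives the propagated labels
    """
    T, H, W, _ = prev_frames.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)

    def feats(frame, t):
        # Scaled (x, y, rgb, t) features, one row per pixel.
        f = np.stack([xs / sigma_xy, ys / sigma_xy,
                      frame[..., 0] / sigma_rgb,
                      frame[..., 1] / sigma_rgb,
                      frame[..., 2] / sigma_rgb,
                      np.full((H, W), t / sigma_t)], axis=-1)
        return f.reshape(-1, 6)

    src = np.concatenate([feats(prev_frames[t], t) for t in range(T)])
    lab = prev_labels.reshape(-1, prev_labels.shape[-1])
    dst = feats(cur_frame, T)

    out = np.empty((dst.shape[0], lab.shape[1]))
    for i, f in enumerate(dst):                          # each target pixel
        w = np.exp(-0.5 * ((src - f) ** 2).sum(axis=1))  # Gaussian weights
        out[i] = w @ lab / (w.sum() + 1e-8)              # normalized average
    return out.reshape(H, W, -1)
```

Replacing the fixed Gaussian with learned weights, and the brute-force sum with a lattice, yields the kind of bilateral network component the paper combines with a spatial refinement network.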
Semantic Video CNNs through Representation Warping
In this work, we propose a technique to convert CNN models for semantic
segmentation of static images into CNNs for video data. We describe a warping
method that can be used to augment existing architectures with very little
extra computational cost. We call this module NetWarp and demonstrate its
use for a range of network architectures. The main design principle is to use
the optical flow of adjacent frames to warp internal network representations
across time. A key insight of this work is that fast optical flow methods can
be combined with many different CNN architectures for improved performance and
end-to-end training. Experiments validate that the proposed approach incurs
only a small extra computational cost while improving performance when video
streams are available. We achieve new state-of-the-art results on the CamVid
and Cityscapes benchmark datasets and show consistent improvements over
different baseline networks. Our code and models will be available at
http://segmentation.is.tue.mpg.de
Comment: ICCV 2017
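The central operation, warping intermediate activations of the previous frame along optical flow, can be sketched in a few lines of PyTorch. This is a minimal sketch under assumptions of ours: the flow is in pixel units and maps each current-frame pixel to its source location in the previous frame, and the module's subsequent learned combination of warped and current activations is omitted.

```python
import torch
import torch.nn.functional as F

def warp_features(feat_prev, flow):
    """Bilinearly warp CNN activations from frame t-1 into frame t.

    feat_prev: (N, C, H, W) intermediate activations at frame t-1
    flow:      (N, 2, H, W) flow from frame t to t-1, in pixels (x, y)
    """
    n, _, h, w = feat_prev.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack([xs, ys]).to(feat_prev)    # (2, H, W) pixel grid
    coords = base.unsqueeze(0) + flow             # where to sample from
    # grid_sample expects coordinates normalized to [-1, 1].
    x_n = 2.0 * coords[:, 0] / (w - 1) - 1.0
    y_n = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack([x_n, y_n], dim=-1)        # (N, H, W, 2)
    return F.grid_sample(feat_prev, grid, mode="bilinear",
                         align_corners=True)
```

Because bilinear sampling is differentiable, a module built around such a warp can be trained end-to-end together with the host network, which is what keeps the approach cheap to bolt onto existing architectures.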
Learning Sparse High Dimensional Filters: Image Filtering, Dense CRFs and Bilateral Neural Networks
Bilateral filters have widespread use due to their edge-preserving
properties. The common use case is to manually choose a parametric filter type,
usually a Gaussian filter. In this paper, we generalize the
parametrization and in particular derive a gradient descent algorithm so that
the filter parameters can be learned from data. This derivation makes it
possible to learn high-dimensional linear filters that operate in sparsely populated feature
spaces. We build on the permutohedral lattice construction for efficient
filtering. The ability to learn more general forms of high-dimensional filters
can be used in several diverse applications. First, we demonstrate its use in
applications where a single filter application is desired for runtime reasons.
Further, we show how this algorithm can be used to learn the pairwise
potentials in densely connected conditional random fields and apply these to
different image segmentation tasks. Finally, we introduce layers of bilateral
filters in CNNs and propose bilateral neural networks for use on
high-dimensional, sparse data. This view provides new ways to encode model
structure into network architectures. A diverse set of experiments empirically
validates the use of these general forms of filters.
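To illustrate what "learning the filter" means here, the PyTorch sketch below replaces the fixed Gaussian kernel with a small learned function of feature differences, trainable by gradient descent. It is a naive dense stand-in of ours, not the paper's construction, which learns filter values on a permutohedral lattice to stay efficient in sparse, high-dimensional feature spaces.

```python
import torch
import torch.nn as nn

class LearnedBilateralFilter(nn.Module):
    """Filter a sparse signal with weights that are a learned function of
    feature-space differences (naive O(N^2) version for illustration)."""

    def __init__(self, feat_dim, hidden=32):
        super().__init__()
        # Learnable replacement for the fixed Gaussian kernel.
        self.kernel = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, 1))

    def forward(self, feats, values):
        # feats:  (N, D) positions in the high-dimensional feature space
        # values: (N, V) signal to be filtered
        diff = feats[:, None, :] - feats[None, :, :]  # (N, N, D) pairwise
        w = self.kernel(diff).squeeze(-1)             # (N, N) learned weights
        w = torch.softmax(w, dim=-1)                  # normalize per output
        return w @ values                             # filtered signal
```

Because every step is differentiable, the kernel parameters receive gradients from any downstream loss, which is exactly the property that lets such filters sit inside a dense CRF or a neural network.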
Learning Inference Models for Computer Vision
Computer vision can be understood as the ability to perform 'inference' on image data. Breakthroughs in computer vision technology are often marked by advances in inference techniques, as model design itself is often dictated by the complexity of inference. This thesis proposes learning-based inference schemes and demonstrates their application in computer vision. We propose techniques for inference in both generative and discriminative computer vision models.
Despite their intuitive appeal, the use of generative models in vision is hampered by the difficulty of posterior inference, which is often too complex or too slow to be practical. We propose techniques for improving inference in two widely used schemes: Markov Chain Monte Carlo (MCMC) sampling and message-passing inference. Our strategy is to learn separate discriminative models that assist Bayesian inference in the generative model. Experiments on a range of generative vision models show that the proposed techniques accelerate inference and/or converge to better solutions.
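As one concrete instance of this strategy for MCMC, the sketch below runs Metropolis-Hastings with a mixture proposal: with some probability it proposes from a learned discriminative predictor, otherwise it takes a local random-walk step. `inf_sample` and `inf_logpdf` stand in for the trained discriminative model and, like all constants here, are assumptions of this illustration rather than the thesis's exact construction.

```python
import numpy as np

def informed_mh(log_post, inf_sample, inf_logpdf, x0,
                steps=1000, mix=0.7, step=0.1, seed=0):
    """Metropolis-Hastings whose proposal mixes a learned 'informed'
    independence proposal with a Gaussian random walk. The acceptance
    ratio uses the full mixture density, so the chain still targets
    exp(log_post) exactly."""
    rng = np.random.default_rng(seed)
    d = x0.size

    def q_logpdf(x_to, x_from):
        # log of: mix * q_informed(x_to) + (1 - mix) * N(x_to; x_from, step^2)
        local = (-0.5 * np.sum((x_to - x_from) ** 2) / step ** 2
                 - d * np.log(step * np.sqrt(2 * np.pi)))
        return np.logaddexp(np.log(mix) + inf_logpdf(x_to),
                            np.log(1 - mix) + local)

    x, lp = x0, log_post(x0)
    chain = []
    for _ in range(steps):
        if rng.random() < mix:
            x_new = inf_sample(rng)                  # learned proposal
        else:
            x_new = x + step * rng.standard_normal(d)  # local random walk
        lp_new = log_post(x_new)
        log_alpha = lp_new - lp + q_logpdf(x, x_new) - q_logpdf(x_new, x)
        if np.log(rng.random()) < log_alpha:
            x, lp = x_new, lp_new
        chain.append(x)
    return np.asarray(chain)
```

A well-trained informed proposal lands near posterior modes in a single jump, while the random-walk component preserves local exploration; the mixture keeps the sampler exact even when the predictor is wrong.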
A major complication in the design of discriminative models is the inclusion of prior knowledge in a principled way. For better inference in discriminative models, we propose techniques that modify the original model itself, since inference then amounts to a simple evaluation of the model. We concentrate on convolutional neural network (CNN) models and propose a generalization of standard spatial convolutions, the basic building blocks of CNN architectures, to bilateral convolutions. First, we generalize the existing use of bilateral filters, and then propose new neural network architectures with learnable bilateral filters, which we call 'Bilateral Neural Networks'. We show how the bilateral filtering modules can be used to modify existing CNN architectures for better image
segmentation and propose a neural network approach for temporal information propagation in videos. Experiments demonstrate the potential of the proposed bilateral networks on a wide range of vision tasks and datasets.
In summary, we propose learning-based techniques for better inference in several computer vision models, ranging from inverse graphics to freely parameterized neural networks. In generative vision models, our inference techniques alleviate some of the crucial hurdles in Bayesian posterior inference, paving new ways for the use of model-based machine learning in vision. In discriminative CNN models, the proposed filter generalizations aid in the design of new neural network architectures that can handle sparse, high-dimensional data and provide a way to incorporate prior knowledge into CNNs.
Permutohedral Lattice CNNs
This paper presents a convolutional layer that is able to process sparse
input features. For image recognition problems, for example, this allows
efficient filtering of signals that lie not on a dense grid (like pixel
positions) but in a more general feature space (such as color values). The presented
algorithm makes use of the permutohedral lattice data structure. The
permutohedral lattice was introduced to efficiently implement a bilateral
filter, a commonly used image processing operation. Its use allows for a
generalization of the convolution type found in current (spatial) convolutional
network architectures.
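To give a feel for lattice-based filtering of scattered points, here is a NumPy sketch of the same splat-convolve-slice pipeline on a plain hashed integer grid. This simplified stand-in is ours: the permutohedral lattice instead splats barycentrically onto a simplex lattice in d+1 dimensions, which is what keeps the operation cheap as the feature dimension grows.

```python
import numpy as np
from collections import defaultdict

def sparse_grid_conv(feats, values, stencil, cell=1.0):
    """Splat-convolve-slice over a hashed integer grid.

    feats:   (N, D) feature-space positions of the input signal
    values:  (N, V) signal values at those positions
    stencil: dict {integer offset tuple of length D: scalar weight},
             the (learnable) convolution kernel on the grid
    """
    V = values.shape[1]
    key = lambda f: tuple(int(k) for k in np.round(f / cell))

    # 1. Splat: accumulate each point's value at its nearest grid vertex.
    grid = defaultdict(lambda: np.zeros(V))
    for f, v in zip(feats, values):
        grid[key(f)] += v

    # 2. Convolve: apply the stencil over neighboring grid vertices.
    out = defaultdict(lambda: np.zeros(V))
    for k, v in grid.items():
        for off, w in stencil.items():
            out[tuple(a + b for a, b in zip(k, off))] += w * v

    # 3. Slice: read the filtered signal back at the input positions.
    return np.stack([out[key(f)] for f in feats])
```

With a 1-D feature space, `stencil = {(-1,): 0.25, (0,): 0.5, (1,): 0.25}` reproduces a small blur over, say, color values rather than pixel positions; making the stencil entries learnable parameters recovers the spirit of the generalized convolution the paper describes.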
Consensus Message Passing for Layered Graphical Models
Generative models provide a powerful framework for probabilistic reasoning.
However, in many domains their use has been hampered by the practical
difficulties of inference. This is particularly the case in computer vision,
where models of the imaging process tend to be large, loopy and layered. For
this reason, bottom-up conditional models have traditionally dominated in such
domains. We find that widely-used, general-purpose message passing inference
algorithms such as Expectation Propagation (EP) and Variational Message Passing
(VMP) fail on the simplest of vision models. With these models in mind, we
introduce a modification to message passing that learns to exploit their
layered structure by passing 'consensus' messages that guide inference towards
good solutions. Experiments on a variety of problems show that the proposed
technique leads to significantly more accurate inference results, not only when
compared to standard EP and VMP, but also when compared to competitive
bottom-up conditional models.
Comment: Appearing in Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS) 2015
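As a toy illustration of the consensus idea, consider many observations of a shared latent z seen through a nonlinear likelihood: pooling per-observation predictions from a learned regressor yields one extra Gaussian message that pulls inference toward a good region. Everything below, the cubic model, the `np.cbrt` stand-in for the trained regressor, and the constants, is a hypothetical sketch of ours, not one of the paper's vision models.

```python
import numpy as np

rng = np.random.default_rng(1)

def consensus_message(x, predictor):
    """Pool per-observation predictions of the shared parent z into one
    Gaussian 'consensus' message (mean, variance of the mean)."""
    preds = predictor(x)
    return preds.mean(), preds.var(ddof=1) / len(preds) + 1e-6

def combine(messages):
    """Multiply Gaussian messages: precision-weighted average."""
    precs = np.array([1.0 / v for _, v in messages])
    means = np.array([m for m, _ in messages])
    var = 1.0 / precs.sum()
    return var * (precs * means).sum(), var

# Toy layered model: z ~ N(0, 1); each observation x_i = z**3 + noise.
# The cubic likelihood is awkward for standard EP/VMP updates.
z_true = 0.8
x = z_true ** 3 + 0.05 * rng.standard_normal(20)

predictor = np.cbrt                    # stand-in for a trained regressor
belief = combine([(0.0, 1.0), consensus_message(x, predictor)])
print("consensus-guided belief over z: mean %.3f, var %.5f" % belief)
```

In the paper, such consensus messages are produced by regressors trained on samples from the model and injected at each layer of the graph, steering EP and VMP toward solution basins that the standard updates fail to reach on their own.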