32,737 research outputs found
Learning how to be robust: Deep polynomial regression
Polynomial regression is a recurrent problem with a large number of
applications. In computer vision it often appears in motion analysis. Whatever
the application, standard methods for regression of polynomial models tend to
deliver biased results when the input data is heavily contaminated by outliers.
Moreover, the problem is even harder when outliers have strong structure.
Departing from problem-tailored heuristics for robust estimation of parametric
models, we explore deep convolutional neural networks. Our work aims to find a
generic approach for training deep regression models without the explicit need
of supervised annotation. We bypass the need for a tailored loss function on
the regression parameters by attaching to our model a differentiable hard-wired
decoder corresponding to the polynomial operation at hand. We demonstrate the
value of our findings by comparing with standard robust regression methods.
Furthermore, we demonstrate how to use such models for a real computer vision
problem, i.e., video stabilization. The qualitative and quantitative
experiments show that neural networks are able to learn robustness for general
polynomial regression, with results that well overpass scores of traditional
robust estimation methods.Comment: 18 pages, conferenc
Video Frame Interpolation via Adaptive Separable Convolution
Standard video frame interpolation methods first estimate optical flow
between input frames and then synthesize an intermediate frame guided by
motion. Recent approaches merge these two steps into a single convolution
process by convolving input frames with spatially adaptive kernels that account
for motion and re-sampling simultaneously. These methods require large kernels
to handle large motion, which limits the number of pixels whose kernels can be
estimated at once due to the large memory demand. To address this problem, this
paper formulates frame interpolation as local separable convolution over input
frames using pairs of 1D kernels. Compared to regular 2D kernels, the 1D
kernels require significantly fewer parameters to be estimated. Our method
develops a deep fully convolutional neural network that takes two input frames
and estimates pairs of 1D kernels for all pixels simultaneously. Since our
method is able to estimate kernels and synthesizes the whole video frame at
once, it allows for the incorporation of perceptual loss to train the neural
network to produce visually pleasing frames. This deep neural network is
trained end-to-end using widely available video data without any human
annotation. Both qualitative and quantitative experiments show that our method
provides a practical solution to high-quality video frame interpolation.Comment: ICCV 2017, http://graphics.cs.pdx.edu/project/sepconv
Geodesic Distance Histogram Feature for Video Segmentation
This paper proposes a geodesic-distance-based feature that encodes global
information for improved video segmentation algorithms. The feature is a joint
histogram of intensity and geodesic distances, where the geodesic distances are
computed as the shortest paths between superpixels via their boundaries. We
also incorporate adaptive voting weights and spatial pyramid configurations to
include spatial information into the geodesic histogram feature and show that
this further improves results. The feature is generic and can be used as part
of various algorithms. In experiments, we test the geodesic histogram feature
by incorporating it into two existing video segmentation frameworks. This leads
to significantly better performance in 3D video segmentation benchmarks on two
datasets
Visual motion processing and human tracking behavior
The accurate visual tracking of a moving object is a human fundamental skill
that allows to reduce the relative slip and instability of the object's image
on the retina, thus granting a stable, high-quality vision. In order to
optimize tracking performance across time, a quick estimate of the object's
global motion properties needs to be fed to the oculomotor system and
dynamically updated. Concurrently, performance can be greatly improved in terms
of latency and accuracy by taking into account predictive cues, especially
under variable conditions of visibility and in presence of ambiguous retinal
information. Here, we review several recent studies focusing on the integration
of retinal and extra-retinal information for the control of human smooth
pursuit.By dynamically probing the tracking performance with well established
paradigms in the visual perception and oculomotor literature we provide the
basis to test theoretical hypotheses within the framework of dynamic
probabilistic inference. We will in particular present the applications of
these results in light of state-of-the-art computer vision algorithms
- …