95,800 research outputs found
Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks
We study the problem of synthesizing a number of likely future frames from a
single input image. In contrast to traditional methods, which have tackled this
problem in a deterministic or non-parametric way, we propose a novel approach
that models future frames in a probabilistic manner. Our probabilistic model
makes it possible for us to sample and synthesize many possible future frames
from a single input image. Future frame synthesis is challenging, as it
involves low- and high-level image and motion understanding. We propose a novel
network structure, namely a Cross Convolutional Network to aid in synthesizing
future frames; this network structure encodes image and motion information as
feature maps and convolutional kernels, respectively. In experiments, our model
performs well on synthetic data, such as 2D shapes and animated game sprites,
as well as on real-wold videos. We also show that our model can be applied to
tasks such as visual analogy-making, and present an analysis of the learned
network representations.Comment: The first two authors contributed equally to this wor
Video Frame Interpolation via Adaptive Separable Convolution
Standard video frame interpolation methods first estimate optical flow
between input frames and then synthesize an intermediate frame guided by
motion. Recent approaches merge these two steps into a single convolution
process by convolving input frames with spatially adaptive kernels that account
for motion and re-sampling simultaneously. These methods require large kernels
to handle large motion, which limits the number of pixels whose kernels can be
estimated at once due to the large memory demand. To address this problem, this
paper formulates frame interpolation as local separable convolution over input
frames using pairs of 1D kernels. Compared to regular 2D kernels, the 1D
kernels require significantly fewer parameters to be estimated. Our method
develops a deep fully convolutional neural network that takes two input frames
and estimates pairs of 1D kernels for all pixels simultaneously. Since our
method is able to estimate kernels and synthesizes the whole video frame at
once, it allows for the incorporation of perceptual loss to train the neural
network to produce visually pleasing frames. This deep neural network is
trained end-to-end using widely available video data without any human
annotation. Both qualitative and quantitative experiments show that our method
provides a practical solution to high-quality video frame interpolation.Comment: ICCV 2017, http://graphics.cs.pdx.edu/project/sepconv
Visual Dynamics: Stochastic Future Generation via Layered Cross Convolutional Networks
We study the problem of synthesizing a number of likely future frames from a
single input image. In contrast to traditional methods that have tackled this
problem in a deterministic or non-parametric way, we propose to model future
frames in a probabilistic manner. Our probabilistic model makes it possible for
us to sample and synthesize many possible future frames from a single input
image. To synthesize realistic movement of objects, we propose a novel network
structure, namely a Cross Convolutional Network; this network encodes image and
motion information as feature maps and convolutional kernels, respectively. In
experiments, our model performs well on synthetic data, such as 2D shapes and
animated game sprites, and on real-world video frames. We present analyses of
the learned network representations, showing it is implicitly learning a
compact encoding of object appearance and motion. We also demonstrate a few of
its applications, including visual analogy-making and video extrapolation.Comment: Journal preprint of arXiv:1607.02586 (IEEE TPAMI, 2019). The first
two authors contributed equally to this work. Project page:
http://visualdynamics.csail.mit.ed
- …