Visual Dynamics: Stochastic Future Generation via Layered Cross Convolutional Networks
We study the problem of synthesizing a number of likely future frames from a
single input image. In contrast to traditional methods that have tackled this
problem in a deterministic or non-parametric way, we propose to model future
frames in a probabilistic manner. Our probabilistic model makes it possible for
us to sample and synthesize many possible future frames from a single input
image. To synthesize realistic movement of objects, we propose a novel network
structure, namely a Cross Convolutional Network; this network encodes image and
motion information as feature maps and convolutional kernels, respectively. In
experiments, our model performs well on synthetic data, such as 2D shapes and
animated game sprites, and on real-world video frames. We present analyses of
the learned network representations, showing it is implicitly learning a
compact encoding of object appearance and motion. We also demonstrate a few of
its applications, including visual analogy-making and video extrapolation.
Comment: Journal preprint of arXiv:1607.02586 (IEEE TPAMI, 2019). The first
two authors contributed equally to this work. Project page:
http://visualdynamics.csail.mit.ed
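The cross-convolution idea described in this abstract can be sketched as follows. This is a hedged illustration, not the authors' implementation: an image encoder yields feature maps, a motion encoder yields one convolutional kernel per map, and the two are combined by convolving each map with its own kernel. All shapes and the random "encoder outputs" are invented placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder encoder outputs (invented shapes for illustration):
n_channels, h, w, k = 4, 32, 32, 5
feature_maps = rng.standard_normal((n_channels, h, w))    # image pathway
motion_kernels = rng.standard_normal((n_channels, k, k))  # motion pathway

def conv2d_same(x, ker):
    """2-D cross-correlation with 'same' zero padding (deep-learning style)."""
    p = ker.shape[0] // 2
    xp = np.pad(x, p)
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + ker.shape[0], j:j + ker.shape[1]] * ker)
    return out

def cross_convolve(maps, kernels):
    """Apply each sampled motion kernel to its corresponding feature map."""
    return np.stack([conv2d_same(m, ker) for m, ker in zip(maps, kernels)])

out = cross_convolve(feature_maps, motion_kernels)
print(out.shape)  # (4, 32, 32)
```

Because the motion kernels are sampled per future, re-sampling them and re-running the cross convolution produces a different plausible motion for the same input image, which is what makes the model probabilistic.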
Kernelized Similarity Learning and Embedding for Dynamic Texture Synthesis
Dynamic texture (DT) exhibits statistical stationarity in the spatial domain
and stochastic repetitiveness in the temporal dimension, indicating that
different frames of a DT possess a high similarity correlation, which
constitutes critical prior knowledge. However, existing methods cannot
effectively learn a promising synthesis model for high-dimensional DT from a
small amount of training data.
In this paper, we propose a novel DT synthesis method, which makes full use of
similarity prior knowledge to address this issue. Our method is based on the
proposed kernel similarity embedding, which can not only mitigate the
high-dimensionality and small-sample issues, but also has the advantage of
modeling nonlinear feature relationships. Specifically, we first propose two
hypotheses that are essential for a DT model to generate new frames using
similarity correlation. Then, we integrate kernel learning and the extreme
learning machine into a unified synthesis model to learn a kernel similarity
embedding for
representing DT. Extensive experiments on DT videos collected from the internet
and two benchmark datasets, i.e., Gatech Graphcut Textures and Dyntex,
demonstrate that the learned kernel similarity embedding can effectively
exhibit the discriminative representation for DT. Accordingly, our method
preserves the long-term temporal continuity of the synthesized DT sequences
with excellent sustainability and generalization. Meanwhile, it generates
realistic DT videos at higher speed and lower computational cost than
state-of-the-art methods. The code and more synthesis videos
are available at our project page
https://shiming-chen.github.io/Similarity-page/Similarit.html.
Comment: 13 pages, 12 figures, 2 tables
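The frame-to-frame regression idea behind this abstract can be illustrated with a minimal kernel ridge regression sketch (standing in for the paper's kernelized extreme learning machine, which it is not): fit a kernel map from each frame to its successor, then generate new frames autoregressively. The toy "video" (a point on a circle) and all parameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian kernel matrix between row vectors of A and B."""
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

# Toy "video": each frame is a 2-D feature of a point moving on a circle.
t = np.linspace(0, 2 * np.pi, 50, endpoint=False)
frames = np.stack([np.cos(t), np.sin(t)], axis=1)

X, Y = frames[:-1], frames[1:]          # current frame -> next frame
K = rbf_kernel(X, X)
# Ridge-regularized solution: alpha maps kernel similarities to next frames.
alpha = np.linalg.solve(K + 1e-6 * np.eye(len(X)), Y)

def step(frame):
    """Predict the next frame from the current one via kernel similarities."""
    return (rbf_kernel(frame[None, :], X) @ alpha)[0]

# Synthesize new frames autoregressively from the last training frame.
frame = frames[-1]
synth = []
for _ in range(5):
    frame = step(frame)
    synth.append(frame)
synth = np.stack(synth)
print(np.linalg.norm(synth, axis=1))  # stays near the unit circle
```

The similarity (Gram) matrix here plays the role of the abstract's "similarity correlation" prior: new frames are expressed entirely through their kernel similarity to training frames, which keeps the generated sequence on the learned manifold.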
Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks
We study the problem of synthesizing a number of likely future frames from a
single input image. In contrast to traditional methods, which have tackled this
problem in a deterministic or non-parametric way, we propose a novel approach
that models future frames in a probabilistic manner. Our probabilistic model
makes it possible for us to sample and synthesize many possible future frames
from a single input image. Future frame synthesis is challenging, as it
involves low- and high-level image and motion understanding. We propose a novel
network structure, namely a Cross Convolutional Network to aid in synthesizing
future frames; this network structure encodes image and motion information as
feature maps and convolutional kernels, respectively. In experiments, our model
performs well on synthetic data, such as 2D shapes and animated game sprites,
as well as on real-world videos. We also show that our model can be applied to
tasks such as visual analogy-making, and present an analysis of the learned
network representations.
Comment: The first two authors contributed equally to this work.