Kernelized Similarity Learning and Embedding for Dynamic Texture Synthesis
Dynamic texture (DT) exhibits statistical stationarity in the spatial domain and stochastic repetitiveness in the temporal dimension, indicating that the different frames of a DT possess a high similarity correlation, which is critical prior knowledge. However, existing methods cannot effectively learn a promising synthesis model for high-dimensional DT from a small amount of training data. In this paper, we propose a novel DT synthesis method that makes full use of this similarity prior knowledge to address the issue. Our method is based on the proposed kernel similarity embedding, which not only mitigates the high-dimensionality and small-sample issues but also models nonlinear feature relationships. Specifically, we first posit two hypotheses that are essential for a DT model to generate new frames using similarity correlation. Then, we integrate kernel learning and the extreme learning machine into a unified synthesis model to learn a kernel similarity embedding for representing DT. Extensive experiments on DT videos collected from the internet and on two benchmark datasets, i.e., Gatech Graphcut Textures and DynTex, demonstrate that the learned kernel similarity embedding effectively yields a discriminative representation of DT. Accordingly, our method preserves the long-term temporal continuity of the synthesized DT sequences with excellent sustainability and generalization, and it generates realistic DT videos faster and with lower computational cost than state-of-the-art methods. The code and more synthesis videos are available at our project page https://shiming-chen.github.io/Similarity-page/Similarit.html
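To make the kernel-similarity idea concrete, here is a minimal sketch, not the authors' code: frames are flattened into vectors, an RBF kernel stands in for the learned similarity embedding, and next-frame prediction is solved in closed form as in a kernelized extreme learning machine (i.e., kernel ridge regression). All names and hyperparameters are illustrative.

```python
# Hedged sketch: kernel ridge / kernelized-ELM next-frame prediction for DT.
import numpy as np

def rbf_kernel(X, Y, gamma=1e-4):
    # Pairwise RBF similarities between rows of X and Y.
    d2 = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * d2)

def fit_kernel_synthesizer(frames, reg=1e-2):
    # frames: (T, D) flattened frames; learn the map frame_t -> frame_{t+1}.
    X, Y = frames[:-1], frames[1:]
    K = rbf_kernel(X, X)
    # Closed-form kernelized-ELM / kernel ridge solution.
    alpha = np.linalg.solve(K + reg * np.eye(len(X)), Y)
    return X, alpha

def synthesize(X_train, alpha, seed_frame, n_frames):
    out, f = [], seed_frame
    for _ in range(n_frames):
        f = (rbf_kernel(f[None, :], X_train) @ alpha).ravel()
        out.append(f)
    return np.stack(out)

# Toy usage on random data standing in for real DT frames.
frames = np.random.rand(20, 64)                 # 20 frames, 64 pixels each
X_train, alpha = fit_kernel_synthesizer(frames)
video = synthesize(X_train, alpha, frames[-1], n_frames=10)
print(video.shape)                              # (10, 64)
```

Synthesis proceeds autoregressively: each predicted frame is fed back as the query for the next kernel evaluation, which is what makes long-term temporal continuity the interesting property to test.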
Two-Stream Convolutional Networks for Dynamic Texture Synthesis
This thesis introduces a two-stream model for dynamic texture synthesis. The model is based on pre-trained convolutional networks (ConvNets) that target two independent tasks: (i) object recognition, and (ii) optical flow regression. Given an input dynamic texture, statistics of filter responses from the object recognition and optical flow ConvNets encapsulate the per-frame appearance and dynamics of the input texture, respectively. To synthesize a dynamic texture, a randomly initialized input sequence is optimized to match the feature statistics from each stream of an example texture. In addition, the synthesis approach is applied to combine the texture appearance from one texture with the dynamics of another to generate entirely novel dynamic textures. Overall, the proposed approach generates high-quality samples that match both the frame-wise appearance and the temporal evolution of the input texture. Finally, a quantitative evaluation of the proposed dynamic texture synthesis approach is performed via a large-scale user study.
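The two-stream optimization can be sketched as follows; this is a hedged illustration, not the thesis code. Small untrained conv nets stand in for the pretrained object-recognition and optical-flow ConvNets, and Gram matrices of filter responses serve as the matched statistics.

```python
# Hedged sketch: optimize a random video to match two-stream Gram statistics.
import torch
import torch.nn as nn

def gram(feat):
    # Gram matrix of filter responses, averaged over frames: (C, C).
    n, c, h, w = feat.shape
    f = feat.reshape(n, c, h * w)
    return (f @ f.transpose(1, 2)).mean(0) / (c * h * w)

# Stand-ins for the pretrained appearance and optical-flow ConvNets.
appearance = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
dynamics = nn.Sequential(nn.Conv2d(6, 16, 3, padding=1), nn.ReLU())

exemplar = torch.rand(8, 3, 32, 32)             # toy exemplar video (T, C, H, W)
with torch.no_grad():
    g_app = gram(appearance(exemplar))
    g_dyn = gram(dynamics(torch.cat([exemplar[:-1], exemplar[1:]], dim=1)))

# Optimize a randomly initialized sequence to match both sets of statistics.
synth = torch.rand_like(exemplar, requires_grad=True)
opt = torch.optim.Adam([synth], lr=0.05)
for step in range(200):
    opt.zero_grad()
    loss = ((gram(appearance(synth)) - g_app) ** 2).mean()
    pairs = torch.cat([synth[:-1], synth[1:]], dim=1)  # consecutive frame pairs
    loss = loss + ((gram(dynamics(pairs)) - g_dyn) ** 2).mean()
    loss.backward()
    opt.step()
```

Swapping the appearance exemplar for one texture and the dynamics exemplar for another yields the appearance/dynamics transfer described above.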
Dynamic Variational Autoencoders for Visual Process Modeling
This work studies the problem of modeling visual processes by leveraging deep generative architectures to learn linear, Gaussian representations from observed sequences. We propose a joint learning framework combining a vector autoregressive model with Variational Autoencoders. This results in an architecture that allows Variational Autoencoders to simultaneously learn a non-linear observation model as well as a linear state model from sequences of frames. We validate our approach on artificial sequences and dynamic textures.
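A minimal sketch of such a joint objective is below, under my own assumptions about the architecture (the paper's exact networks and loss weighting may differ): a VAE reconstruction term and KL term are combined with a squared-error fit of a first-order vector autoregressive transition in latent space.

```python
# Hedged sketch: VAE with a jointly learned linear (VAR) latent transition.
import torch
import torch.nn as nn

class DynamicVAE(nn.Module):
    def __init__(self, d_obs=64, d_lat=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_obs, 32), nn.ReLU(),
                                 nn.Linear(32, 2 * d_lat))
        self.dec = nn.Sequential(nn.Linear(d_lat, 32), nn.ReLU(),
                                 nn.Linear(32, d_obs))
        self.A = nn.Linear(d_lat, d_lat, bias=False)  # linear VAR(1) transition

    def loss(self, x):                                # x: (T, d_obs) sequence
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        rec = ((self.dec(z) - x) ** 2).mean()                 # observation model
        kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).mean()
        dyn = ((self.A(z[:-1]) - z[1:]) ** 2).mean()          # linear state model
        return rec + kl + dyn

model = DynamicVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(50, 64)                                # toy sequence of frames
for _ in range(100):
    opt.zero_grad()
    model.loss(x).backward()
    opt.step()
```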
Learning Generative ConvNets via Multi-grid Modeling and Sampling
This paper proposes a multi-grid method for learning energy-based generative
ConvNet models of images. For each grid, we learn an energy-based probabilistic
model where the energy function is defined by a bottom-up convolutional neural
network (ConvNet or CNN). Learning such a model requires generating synthesized
examples from the model. Within each iteration of our learning algorithm, for
each observed training image, we generate synthesized images at multiple grids
by initializing the finite-step MCMC sampling from a minimal 1 x 1 version of
the training image. The synthesized image at each subsequent grid is obtained
by a finite-step MCMC initialized from the synthesized image generated at the
previous coarser grid. After obtaining the synthesized examples, the parameters
of the models at multiple grids are updated separately and simultaneously based
on the differences between synthesized and observed examples. We show that this
multi-grid method can learn realistic energy-based generative ConvNet models,
and it outperforms the original contrastive divergence (CD) and persistent CD.
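The coarse-to-fine loop can be sketched as follows, with toy energy networks and hyperparameters of my own choosing: synthesis starts from a 1 x 1 average of each image, each grid refines the synthesis with finite-step Langevin MCMC, and each grid's model is updated from the gap between observed and synthesized energies.

```python
# Hedged sketch: multi-grid learning of energy-based ConvNet models.
import torch
import torch.nn as nn
import torch.nn.functional as F

sizes = [1, 4, 16]                                   # coarse-to-fine grids
nets = [nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(8, 1, 1)) for _ in sizes]
opts = [torch.optim.Adam(n.parameters(), lr=1e-4) for n in nets]

def langevin(net, x, steps=10, step_size=0.01):
    # Finite-step MCMC: noisy gradient descent on the energy.
    for _ in range(steps):
        x = x.detach().requires_grad_(True)
        grad, = torch.autograd.grad(net(x).sum(), x)
        x = x - 0.5 * step_size**2 * grad + step_size * torch.randn_like(x)
    return x.detach()

obs = torch.rand(4, 3, 16, 16)                       # toy training batch
synth = F.adaptive_avg_pool2d(obs, 1)                # minimal 1 x 1 initialization
for net, opt, s in zip(nets, opts, sizes):
    synth = F.interpolate(synth, size=s)             # init from the coarser grid
    synth = langevin(net, synth)
    obs_s = F.adaptive_avg_pool2d(obs, s)
    # Lower the energy of observed examples, raise it on synthesized ones.
    loss = net(obs_s).mean() - net(synth).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Initializing each grid's sampler from the previous grid's output is what replaces the long MCMC chains that CD and persistent CD would otherwise need.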
Motion-Based Generator Model: Unsupervised Disentanglement of Appearance, Trackable and Intrackable Motions in Dynamic Patterns
Dynamic patterns are characterized by complex spatial and motion patterns.
Understanding dynamic patterns requires a disentangled representational model
that separates the factorial components. A commonly used model for dynamic
patterns is the state space model, where the state evolves over time according
to a transition model and the state generates the observed image frames
according to an emission model. To model the motions explicitly, it is natural
for the model to be based on the motions or the displacement fields of the
pixels. Thus in the emission model, we let the hidden state generate the
displacement field, which warps the trackable component in the previous image
frame to generate the next frame while adding a simultaneously emitted residual
image to account for the change that cannot be explained by the deformation.
The warping of the previous image captures the trackable part of the change between frames, while the residual image captures the intrackable part. We use a maximum likelihood algorithm to learn the model, iterating between inferring the latent noise vectors that drive the transition model and updating the parameters given the inferred latent vectors. Meanwhile, we adopt a regularization term that penalizes the norms of the residual images, encouraging the model to explain the change between frames by trackable motion. Unlike existing methods for dynamic patterns, we learn our model in an unsupervised setting, without ground-truth displacement fields. In addition, our model defines a notion of intrackability via the separation of the warped and residual components in each image frame. We show that our method can synthesize realistic dynamic patterns and disentangle appearance, trackable motions, and intrackable motions. The learned models are useful for motion transfer, and it is natural to use them to define and measure the intrackability of a dynamic pattern.
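The emission step described above can be sketched as follows; the network shapes and penalty are my assumptions, not the paper's exact design. A latent state emits a displacement field that warps the previous frame (the trackable part), plus a residual image (the intrackable part) whose norm is penalized.

```python
# Hedged sketch: warp-plus-residual emission model for dynamic patterns.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Emission(nn.Module):
    def __init__(self, d_lat=16, h=32, w=32):
        super().__init__()
        self.h, self.w = h, w
        self.flow = nn.Linear(d_lat, 2 * h * w)      # trackable: displacement field
        self.res = nn.Linear(d_lat, 3 * h * w)       # intrackable: residual image

    def forward(self, z, prev_frame):                # prev_frame: (1, 3, h, w)
        h, w = self.h, self.w
        disp = self.flow(z).view(1, h, w, 2)
        # Base sampling grid in [-1, 1], displaced by the emitted field.
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                                torch.linspace(-1, 1, w), indexing="ij")
        grid = torch.stack([xs, ys], dim=-1)[None] + disp
        warped = F.grid_sample(prev_frame, grid, align_corners=True)
        residual = self.res(z).view(1, 3, h, w)
        penalty = residual.abs().mean()              # push change into warping
        return warped + residual, penalty

em = Emission()
frame, penalty = em(torch.randn(1, 16), torch.rand(1, 3, 32, 32))
```

The relative size of the penalized residual versus the warped component is one natural way to operationalize the intrackability measure the abstract mentions.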