Learning Dynamic Generator Model by Alternating Back-Propagation Through Time
This paper studies the dynamic generator model for spatial-temporal processes
such as dynamic textures and action sequences in video data. In this model,
each time frame of the video sequence is generated by a generator model as a
non-linear transformation of a latent state vector, where the non-linear
transformation is parametrized by a top-down neural network. The sequence of
latent state vectors follows a non-linear auto-regressive model, where the
state vector of the next frame is a non-linear transformation of the state
vector of the current frame as well as an independent noise vector that
provides randomness in the transition. The non-linear transformation of this
transition model can be parametrized by a feedforward neural network. We show
that this model can be learned by an alternating back-propagation through time
algorithm that iteratively samples the noise vectors and updates the parameters
in the transition model and the generator model. We show that our training
method can learn realistic models for dynamic textures and action patterns.
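In outline, one iteration of such an alternating scheme might look like the following PyTorch sketch: Langevin-style inference of the per-frame noise vectors by back-propagation through time, followed by a parameter update given the inferred vectors. Layer sizes, step counts, and all names here are illustrative placeholders, not the authors' released code.

import torch
import torch.nn as nn

class DynamicGenerator(nn.Module):
    def __init__(self, state_dim=32, noise_dim=32, frame_dim=64 * 64):
        super().__init__()
        # Transition model: next state from current state and a noise vector.
        self.transition = nn.Sequential(
            nn.Linear(state_dim + noise_dim, 128), nn.ReLU(),
            nn.Linear(128, state_dim))
        # Generator (emission) model: image frame from latent state.
        self.emission = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, frame_dim))

    def unroll(self, s, noises):
        frames = []
        for eps in noises:  # one independent noise vector per time step
            s = self.transition(torch.cat([s, eps], dim=-1))
            frames.append(self.emission(s))
        return torch.stack(frames)  # (T, batch, frame_dim)

def abptt_step(model, opt, video, noises, sigma=0.3, k=20, delta=0.1):
    """One iteration; video: (T, 1, frame_dim), noises: list of (1, noise_dim)."""
    # Inference step: short-run Langevin updates of the noise vectors,
    # with gradients obtained by back-propagation through time.
    noises = [e.detach().requires_grad_(True) for e in noises]
    s0 = torch.zeros(1, 32)
    for _ in range(k):
        recon = model.unroll(s0, noises)
        neg_log_post = ((video - recon) ** 2).sum() / (2 * sigma ** 2) \
            + 0.5 * sum((e ** 2).sum() for e in noises)
        grads = torch.autograd.grad(neg_log_post, noises)
        with torch.no_grad():
            for e, g in zip(noises, grads):
                e += -0.5 * delta ** 2 * g + delta * torch.randn_like(e)
    # Learning step: update transition and generator parameters
    # given the inferred noise vectors.
    opt.zero_grad()
    recon = model.unroll(s0, [e.detach() for e in noises])
    (((video - recon) ** 2).sum() / (2 * sigma ** 2)).backward()
    opt.step()
    return [e.detach() for e in noises]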
Dynamic Variational Autoencoders for Visual Process Modeling
This work studies the problem of modeling visual processes by leveraging deep
generative architectures for learning linear, Gaussian representations from
observed sequences. We propose a joint learning framework, combining a vector
autoregressive model and Variational Autoencoders. This results in an
architecture that allows Variational Autoencoders to simultaneously learn a
non-linear observation model as well as a linear state model from sequences of
frames. We validate our approach on artificial sequences and dynamic textures.
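As a rough sketch of the joint objective (assuming PyTorch; layer sizes, the unit-weight loss terms, and all names are placeholders rather than the paper's architecture), the VAE supplies the non-linear observation model while a jointly trained matrix A plays the role of the linear, Gaussian VAR state model:

import torch
import torch.nn as nn

class DynamicVAE(nn.Module):
    def __init__(self, frame_dim=64 * 64, latent_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(frame_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, frame_dim))
        # Linear state model: z_{t+1} ~ N(A z_t, const), a VAR(1) transition.
        self.A = nn.Linear(latent_dim, latent_dim, bias=False)

    def loss(self, frames):  # frames: (T, frame_dim)
        h = self.enc(frames)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparametrize
        recon = ((frames - self.dec(z)) ** 2).sum()              # observation term
        kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum()  # VAE regularizer
        trans = ((z[1:] - self.A(z[:-1])) ** 2).sum()            # linear dynamics fit
        return recon + kl + trans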
Motion-Based Generator Model: Unsupervised Disentanglement of Appearance, Trackable and Intrackable Motions in Dynamic Patterns
Dynamic patterns are characterized by complex spatial and motion patterns.
Understanding dynamic patterns requires a disentangled representational model
that separates the factorial components. A commonly used model for dynamic
patterns is the state space model, where the state evolves over time according
to a transition model and the state generates the observed image frames
according to an emission model. To model the motions explicitly, it is natural
for the model to be based on the motions or the displacement fields of the
pixels. Thus in the emission model, we let the hidden state generate the
displacement field, which warps the trackable component in the previous image
frame to generate the next frame while adding a simultaneously emitted residual
image to account for the change that cannot be explained by the deformation.
The warping of the previous image captures the trackable part of the change
between frames, while the residual image captures the intrackable part. We use
a maximum likelihood learning algorithm that iterates
between inferring latent noise vectors that drive the transition model and
updating the parameters given the inferred latent vectors. Meanwhile, we adopt
a regularization term that penalizes the norms of the residual images,
encouraging the model to explain the change of image frames by trackable
motion. Unlike existing methods on dynamic patterns, we learn our model in an
unsupervised setting, without ground-truth displacement fields. In addition,
our model defines a notion of intrackability through the separation of the
warped and residual components in each image frame. We show that our method
can synthesize realistic dynamic patterns and disentangle appearance,
trackable and intrackable motions. The learned models are useful for motion
transfer, and they naturally define and measure the intrackability of a
dynamic pattern.
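For concreteness, the warp-plus-residual emission can be sketched as below, a minimal PyTorch illustration assuming a pixel-space displacement field; the function name is hypothetical, and in the paper the displacement field and residual are generated from the hidden state.

import torch
import torch.nn.functional as F

def emit_next_frame(prev_frame, displacement, residual):
    """prev_frame, residual: (N, C, H, W); displacement in pixels: (N, H, W, 2)."""
    n, _, h, w = prev_frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack([xs, ys], dim=-1).float().expand(n, h, w, 2)
    coords = base + displacement  # where each output pixel samples from
    # grid_sample expects sampling coordinates normalized to [-1, 1].
    coords = 2 * coords / torch.tensor([w - 1, h - 1]).float() - 1
    warped = F.grid_sample(prev_frame, coords, align_corners=True)
    return warped + residual  # trackable warp + intrackable residual

The regularization described above would then amount to adding a penalty on, e.g., residual.abs().sum() to the learning objective, pushing as much of the change as possible into the warp.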
Kernelized Similarity Learning and Embedding for Dynamic Texture Synthesis
Dynamic texture (DT) exhibits statistical stationarity in the spatial domain
and stochastic repetitiveness in the temporal dimension, indicating that
different frames of DT exhibit a strong similarity correlation that serves as
critical prior knowledge. However, existing methods cannot effectively learn a
promising synthesis model for high-dimensional DT from a small amount of
training data. In this paper, we propose a novel DT synthesis method that
makes full use of this similarity prior knowledge. Our method is based on the
proposed kernel similarity embedding, which not only mitigates the
high-dimensionality and small-sample issues but also models nonlinear feature
relationships. Specifically, we first propose two hypotheses that are
essential for a DT model to generate new frames using similarity correlation.
Then, we integrate kernel learning and an extreme learning machine into a
unified synthesis model that learns a kernel similarity embedding for
representing DT. Extensive experiments on DT videos collected from the internet
and two benchmark datasets, i.e., Gatech Graphcut Textures and Dyntex,
demonstrate that the learned kernel similarity embedding effectively yields a
discriminative representation of DT. Accordingly, our method preserves the
long-term temporal continuity of the synthesized DT sequences with excellent
sustainability and generalization. Meanwhile, it generates realistic DT videos
at higher speed and lower computational cost than state-of-the-art methods.
The code and more synthesis videos
are available at our project page
https://shiming-chen.github.io/Similarity-page/Similarit.html.
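The paper's model couples kernel learning with an extreme learning machine; as a loose stand-in for the underlying idea of predicting frames through their similarity to training frames, the sketch below uses plain RBF kernel ridge regression (NumPy). All names are illustrative, and this is not the authors' formulation.

import numpy as np

def rbf_kernel(X, Y, gamma=1e-3):
    # Similarity matrix between the row vectors of X and Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit(frames, lam=1e-2, gamma=1e-3):
    """frames: (T, D) flattened video; learn to map frame t to frame t+1."""
    X, Y = frames[:-1], frames[1:]
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), Y)  # ridge solution
    return X, alpha

def synthesize(X, alpha, seed, steps, gamma=1e-3):
    """Autoregressively roll out new frames from a seed frame of shape (D,)."""
    out, f = [], seed
    for _ in range(steps):
        f = (rbf_kernel(f[None, :], X, gamma) @ alpha)[0]  # similarity-weighted
        out.append(f)
    return np.stack(out)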
A Tale of Two Latent Flows: Learning Latent Space Normalizing Flow with Short-run Langevin Flow for Approximate Inference
We study a normalizing flow in the latent space of a top-down generator
model, in which the normalizing flow model plays the role of the informative
prior model of the generator. We propose to jointly learn the latent space
normalizing flow prior model and the top-down generator model by a Markov chain
Monte Carlo (MCMC)-based maximum likelihood algorithm, where a short-run
Langevin sampling from the intractable posterior distribution is performed to
infer the latent variables for each observed example, so that the parameters of
the normalizing flow prior and the generator can be updated with the inferred
latent variables. We show that, under the scenario of non-convergent short-run
MCMC, the finite-step Langevin dynamics is a flow-like approximate inference
model and the learning objective actually follows a perturbation of maximum
likelihood estimation (MLE). We further point out that the learning
framework seeks to (i) match the latent space normalizing flow and the
aggregated posterior produced by the short-run Langevin flow, and (ii) bias the
model from MLE such that the short-run Langevin flow inference is close to the
true posterior. Empirical results of extensive experiments validate the
effectiveness of the proposed latent space normalizing flow model in the tasks
of image generation, image reconstruction, anomaly detection, supervised image
inpainting and unsupervised image recovery.
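To illustrate the inference step, here is a minimal PyTorch sketch of short-run Langevin dynamics targeting the posterior p(z | x) proportional to p_flow(z) N(x; g(z), sigma^2 I); generator and flow_log_prob stand in for the learned top-down model and flow prior, and the step size and count are placeholders.

import torch

def short_run_langevin(z, x, generator, flow_log_prob,
                       steps=20, delta=0.1, sigma=0.3):
    """Finite-step Langevin flow from an initial z toward the posterior p(z | x)."""
    z = z.detach().requires_grad_(True)
    for _ in range(steps):
        # log p(z | x) up to a constant: flow prior plus Gaussian likelihood.
        log_post = flow_log_prob(z) \
            - ((x - generator(z)) ** 2).sum() / (2 * sigma ** 2)
        grad, = torch.autograd.grad(log_post, z)
        with torch.no_grad():
            z += 0.5 * delta ** 2 * grad + delta * torch.randn_like(z)
    return z.detach()

The inferred z would then be used to update both the generator parameters and the flow prior with the inferred latent variables, as the abstract describes.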