Recycle-GAN: Unsupervised Video Retargeting
We introduce a data-driven approach for unsupervised video retargeting that
translates content from one domain to another while preserving the style native
to a domain, i.e., if contents of John Oliver's speech were to be transferred
to Stephen Colbert, then the generated content/speech should be in Stephen
Colbert's style. Our approach combines both spatial and temporal information
along with adversarial losses for content translation and style preservation.
In this work, we first study the advantages of using spatiotemporal constraints
over spatial constraints for effective retargeting. We then demonstrate the
proposed approach on problems where information in both space and time
matters, such as face-to-face translation, flower-to-flower, wind and cloud
synthesis, and sunrise and sunset.
Comment: ECCV 2018; please refer to the project webpage for videos -
http://www.cs.cmu.edu/~aayushb/Recycle-GA
Factorized Inference in Deep Markov Models for Incomplete Multimodal Time Series
Integrating deep learning with latent state space models has the potential to
yield temporal models that are powerful, yet tractable and interpretable.
Unfortunately, current models are not designed to handle missing data or
multiple data modalities, which are both prevalent in real-world data. In this
work, we introduce a factorized inference method for Multimodal Deep Markov
Models (MDMMs), allowing us to filter and smooth in the presence of missing
data, while also performing uncertainty-aware multimodal fusion. We derive this
method by factorizing the posterior p(z|x) for non-linear state space models,
and develop a variational backward-forward algorithm for inference. Because our
method handles incompleteness over both time and modalities, it is capable of
interpolation, extrapolation, conditional generation, label prediction, and
weakly supervised learning of multimodal time series. We demonstrate these
capabilities on both synthetic and real-world multimodal data under high levels
of data deletion. Our method performs well even with more than 50% missing
data, and outperforms existing deep approaches to inference in latent time
series.
Comment: 8 pages, 4 figures, accepted to AAAI 2020, code available at:
https://github.com/ztangent/multimodal-dm
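The abstract above does not spell out how the uncertainty-aware fusion is computed. One common way to fuse per-modality estimates of a latent state while skipping missing inputs is to multiply Gaussian densities, and the minimal sketch below assumes that approach. The function name `fuse_gaussians`, the scalar latent state, and the `(mean, variance)` representation are illustrative assumptions, not taken from the paper or its code.

```python
from typing import Optional, Sequence, Tuple

Gaussian = Tuple[float, float]  # (mean, variance) of a scalar latent state


def fuse_gaussians(prior: Gaussian,
                   experts: Sequence[Optional[Gaussian]]) -> Gaussian:
    """Fuse a prior with per-modality Gaussian estimates by multiplying densities.

    A missing modality is passed as None and skipped, so the result falls
    back to the prior when nothing is observed. (Hypothetical helper; the
    product-of-Gaussians form is an assumption, not the paper's stated method.)
    """
    prior_mean, prior_var = prior
    precision = 1.0 / prior_var              # precisions add under a product
    weighted_mean = prior_mean / prior_var   # precision-weighted means add too
    for expert in experts:
        if expert is None:                   # modality missing at this step
            continue
        mean, var = expert
        precision += 1.0 / var
        weighted_mean += mean / var
    fused_var = 1.0 / precision
    return weighted_mean * fused_var, fused_var


if __name__ == "__main__":
    # Two modalities, the second one missing.
    print(fuse_gaussians((0.0, 1.0), [(2.0, 1.0), None]))
```

In this sketch a more confident modality (smaller variance) pulls the fused mean harder, and each additional observed modality shrinks the fused variance.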
Understanding the Perceived Quality of Video Predictions
The study of video prediction models is believed to be a fundamental approach
to representation learning for videos. While a plethora of generative models
for predicting the future frame pixel values given the past few frames exist,
the quantitative evaluation of the predicted frames has been found to be
extremely challenging. In this context, we study the problem of quality
assessment of predicted videos. We create the Indian Institute of Science
Predicted Videos Quality Assessment (IISc PVQA) Database consisting of 300
videos, obtained by applying different prediction models on different datasets,
and accompanying human opinion scores. We collected subjective ratings of
quality from 50 human participants for these videos. Our subjective study
reveals that human observers were highly consistent in their judgments of
quality of predicted videos. We benchmark several popularly used measures for
evaluating video prediction and show that they do not adequately correlate with
these subjective scores. We introduce two new features to effectively capture
the quality of predicted videos: motion-compensated cosine similarities of deep
features of predicted frames with past frames, and deep features extracted from
rescaled frame differences. We show that our feature design leads to
state-of-the-art quality prediction in accordance with human judgments on our IISc PVQA
Database. The database and code are publicly available on our project website:
https://nagabhushansn95.github.io/publications/2020/pvqa
Comment: Project website:
https://nagabhushansn95.github.io/publications/2020/pvqa.htm
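The first feature described in the abstract above builds on cosine similarity between feature vectors of a predicted frame and of past frames. The sketch below illustrates only that similarity step, under simplifying assumptions: features are plain lists of floats, and the deep feature extraction and motion compensation are omitted. The function names are hypothetical and do not come from the project's code.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero features
    return dot / (norm_a * norm_b)


def mean_similarity_to_past(pred_feat: Sequence[float],
                            past_feats: Sequence[Sequence[float]]) -> float:
    """Average similarity of one predicted-frame feature to past-frame features."""
    if not past_feats:
        raise ValueError("need at least one past frame")
    sims = [cosine_similarity(pred_feat, past) for past in past_feats]
    return sum(sims) / len(sims)


if __name__ == "__main__":
    predicted = [1.0, 0.0]
    past = [[1.0, 0.0], [0.0, 1.0]]
    print(mean_similarity_to_past(predicted, past))
```

Averaging over past frames is one possible way to pool the similarities; the abstract does not say how the authors aggregate them.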
A General Method for Amortizing Variational Filtering
We introduce the variational filtering EM algorithm, a simple, general-purpose method for performing variational inference in dynamical latent variable models using information from only past and present variables, i.e., filtering. The algorithm is derived from the variational objective in the filtering setting and consists of an optimization procedure at each time step. By performing each inference optimization procedure with an iterative amortized inference model, we obtain a computationally efficient implementation of the algorithm, which we call amortized variational filtering. We present experiments demonstrating that this general-purpose method improves performance across several deep dynamical latent variable models.
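To make the idea of an optimization procedure at each time step concrete, here is a minimal sketch on a toy scalar linear-Gaussian model, which is an assumption and not a model from the paper. Plain gradient ascent on the approximate posterior mean stands in for the learned iterative amortized inference model, and the variance is set in closed form for brevity. All names and parameter values are illustrative.

```python
from typing import List, Sequence, Tuple


def filter_step(x: float, prior_mean: float, prior_var: float,
                obs_var: float, n_steps: int = 100,
                lr: float = 0.2) -> Tuple[float, float]:
    """Optimize a Gaussian approximate posterior over z_t for one time step.

    The objective is E_q[log p(x_t | z_t)] - KL(q(z_t) || prior), with
    x_t ~ N(z_t, obs_var). Only the mean is optimized iteratively.
    """
    mu = prior_mean  # start the optimization from the prior prediction
    for _ in range(n_steps):
        # Gradient of the per-step objective with respect to the mean.
        grad = (x - mu) / obs_var - (mu - prior_mean) / prior_var
        mu += lr * grad
    var = 1.0 / (1.0 / obs_var + 1.0 / prior_var)  # closed-form optimum
    return mu, var


def variational_filter(xs: Sequence[float], a: float = 1.0,
                       trans_var: float = 0.5, obs_var: float = 1.0,
                       init: Tuple[float, float] = (0.0, 1.0)
                       ) -> List[Tuple[float, float]]:
    """Run the per-step optimization using only past and present observations."""
    mean, var = init
    estimates = []
    for x in xs:
        # Push the previous estimate through z_t = a * z_{t-1} + noise.
        prior_mean = a * mean
        prior_var = a * a * var + trans_var
        mean, var = filter_step(x, prior_mean, prior_var, obs_var)
        estimates.append((mean, var))
    return estimates


if __name__ == "__main__":
    for mean, var in variational_filter([2.0, 1.5, 1.0]):
        print(round(mean, 4), round(var, 4))
```

For this toy model the per-step optimum coincides with the Kalman filter update, which gives a convenient check on the loop. In deep nonlinear models no such closed form exists, which is where a learned inference model would replace the hand-written gradient steps.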