Learning to Generate Time-Lapse Videos Using Multi-Stage Dynamic Generative Adversarial Networks
Taking a photo outside, can we predict the immediate future, e.g., how would
the cloud move in the sky? We address this problem by presenting a generative
adversarial network (GAN) based two-stage approach to generating realistic
time-lapse videos of high resolution. Given the first frame, our model learns
to generate long-term future frames. The first stage generates videos of
realistic content for each frame. The second stage refines the video generated
by the first stage by pushing it closer to real videos with regard to
motion dynamics. To further encourage vivid motion in the final generated
video, a Gram matrix is employed to model the motion more precisely. We build a
large-scale time-lapse dataset, and test our approach on this new dataset.
Using our model, we are able to generate realistic videos of up to resolution for 32 frames. Quantitative and qualitative experiment results
have demonstrated the superiority of our model over the state-of-the-art
models.
Comment: To appear in Proceedings of CVPR 201
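The abstract above mentions a Gram matrix for modeling motion but gives no formula. The following is a minimal sketch of the general idea, assuming a feature map of shape (channels, height, width) and a mean-squared comparison between Gram matrices. The function names `gram_matrix` and `gram_loss`, the normalization, and the choice of features are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Channel-by-channel correlation of a (channels, height, width) feature map."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    # Normalizing by the number of elements is an assumed convention.
    return flat @ flat.T / (c * h * w)


def gram_loss(real: np.ndarray, fake: np.ndarray) -> float:
    """Mean squared difference between the Gram matrices of two feature maps."""
    return float(np.mean((gram_matrix(real) - gram_matrix(fake)) ** 2))


rng = np.random.default_rng(0)
real = rng.standard_normal((4, 8, 8))  # stand-in for features of a real video
fake = rng.standard_normal((4, 8, 8))  # stand-in for features of a generated video
print(gram_matrix(real).shape)
print(gram_loss(real, fake))
```

Because the Gram matrix discards spatial positions and keeps only channel co-activation statistics, a loss of this kind compares how features vary together rather than where they occur.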
Dynamic Facial Expression Generation on Hilbert Hypersphere with Conditional Wasserstein Generative Adversarial Nets
In this work, we propose a novel approach for generating videos of the six
basic facial expressions given a neutral face image. We propose to exploit the
face geometry by modeling the facial landmarks motion as curves encoded as
points on a hypersphere. By proposing a conditional version of manifold-valued
Wasserstein generative adversarial network (GAN) for motion generation on the
hypersphere, we learn the distribution of facial expression dynamics of
different classes, from which we synthesize new facial expression motions. The
resulting motions can be transformed to sequences of landmarks and then to
image sequences by editing the texture information using another conditional
Generative Adversarial Network. To the best of our knowledge, this is the first
work that explores manifold-valued representations with GAN to address the
problem of dynamic facial expression generation. We evaluate our proposed
approach both quantitatively and qualitatively on two public datasets:
Oulu-CASIA and MUG Facial Expression. Our experimental results demonstrate the
effectiveness of our approach in generating realistic videos with continuous
motion, realistic appearance and identity preservation. We also show the
efficiency of our framework for dynamic facial expression generation, dynamic
facial expression transfer and data augmentation for training improved emotion
recognition models.
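The abstract above describes landmark motion curves encoded as points on a hypersphere without stating the encoding. As a rough sketch of what "a point on a hypersphere" means here, the code below flattens a landmark sequence and rescales it to unit norm, then measures the great-circle distance between two such points. Plain unit normalization is a simplifying assumption and may differ from the representation used in the paper; `to_hypersphere` and `geodesic_distance` are hypothetical names.

```python
import numpy as np


def to_hypersphere(curve: np.ndarray) -> np.ndarray:
    """Flatten a (frames, landmarks, 2) motion curve and scale it to unit L2 norm."""
    flat = np.asarray(curve, dtype=float).reshape(-1)
    norm = np.linalg.norm(flat)
    if norm == 0:
        raise ValueError("a zero curve has no direction on the sphere")
    return flat / norm


def geodesic_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Great-circle distance (the angle in radians) between two unit vectors."""
    cos_angle = np.clip(np.dot(p, q), -1.0, 1.0)  # clip guards against rounding
    return float(np.arccos(cos_angle))


rng = np.random.default_rng(0)
smile = to_hypersphere(rng.standard_normal((16, 68, 2)))  # 68 landmarks is an assumed count
frown = to_hypersphere(rng.standard_normal((16, 68, 2)))
print(np.linalg.norm(smile))
print(geodesic_distance(smile, frown))
```

Working on the sphere means distances are angles between 0 and pi, which is why a manifold-aware model is needed rather than one that treats the points as unconstrained vectors.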
Every Smile is Unique: Landmark-Guided Diverse Smile Generation
Each smile is unique: one person surely smiles in different ways (e.g.,
closing/opening the eyes or mouth). Given one input image of a neutral face,
can we generate multiple smile videos with distinctive characteristics? To
tackle this one-to-many video generation problem, we propose a novel deep
learning architecture named Conditional Multi-Mode Network (CMM-Net). To better
encode the dynamics of facial expressions, CMM-Net explicitly exploits facial
landmarks for generating smile sequences. Specifically, a variational
auto-encoder is used to learn a facial landmark embedding. This single
embedding is then exploited by a conditional recurrent network which generates
a landmark embedding sequence conditioned on a specific expression (e.g.,
spontaneous smile). Next, the generated landmark embeddings are fed into a
multi-mode recurrent landmark generator, producing a set of landmark sequences
still associated with the given smile class but clearly distinct from each other.
Finally, these landmark sequences are translated into face videos. Our
experimental results demonstrate the effectiveness of our CMM-Net in generating
realistic videos of multiple smile expressions.
Comment: Accepted as a poster in Conference on Computer Vision and Pattern
Recognition (CVPR), 201
Deep learning approach to Fourier ptychographic microscopy
Convolutional neural networks (CNNs) have achieved tremendous success in solving complex inverse problems. The aim of this work is to develop a novel CNN framework to reconstruct video sequences of dynamic live cells captured using a computational microscopy technique, Fourier ptychographic microscopy (FPM). The unique feature of the FPM is its capability to reconstruct images with both wide field-of-view (FOV) and high resolution, i.e. a large space-bandwidth-product (SBP), by taking a series of low-resolution intensity images. For live cell imaging, a single FPM frame contains thousands of cell samples with different morphological features. Our idea is to fully exploit the statistical information provided by these large spatial ensembles so as to make predictions in a sequential measurement, without using any additional temporal dataset. Specifically, we show that it is possible to reconstruct high-SBP dynamic cell videos by a CNN trained only on the first FPM dataset captured at the beginning of a time-series experiment. Our CNN approach reconstructs a 12800×10800 pixel phase image in only ∼25 seconds, a 50× speedup compared to the model-based FPM algorithm. In addition, the CNN further reduces the required number of images in each time frame by ∼6×. Overall, this significantly improves the imaging throughput by reducing both the acquisition and computational times. The proposed CNN is based on the conditional generative adversarial network (cGAN) framework. We further propose a mixed loss function that combines the standard image domain loss and a weighted Fourier domain loss, which leads to improved reconstruction of the high-frequency information. We also exploit transfer learning so that our pre-trained CNN can be further optimized to image other cell types.
Our technique demonstrates a promising deep learning approach to continuously monitor large live-cell populations over an extended time and gather useful spatial and temporal information with sub-cellular resolution.
We would like to thank NVIDIA Corporation for supporting us with the GeForce Titan Xp through the GPU Grant Program.
First author draf
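The mixed loss described in the abstract above (an image domain term plus a weighted Fourier domain term) could look roughly like the sketch below. The L1 distance for both terms, the single scalar `fourier_weight`, and its default value are assumptions made for illustration; the abstract does not specify them, and the weighting in the paper may instead vary per frequency.

```python
import numpy as np


def mixed_loss(pred: np.ndarray, target: np.ndarray, fourier_weight: float = 0.1) -> float:
    """Image-domain L1 loss plus a weighted L1 loss on the 2-D Fourier spectra."""
    image_loss = np.mean(np.abs(pred - target))
    # norm="ortho" keeps the spectrum on the same energy scale as the image.
    pred_f = np.fft.fft2(pred, norm="ortho")
    target_f = np.fft.fft2(target, norm="ortho")
    fourier_loss = np.mean(np.abs(pred_f - target_f))
    return float(image_loss + fourier_weight * fourier_loss)


rng = np.random.default_rng(0)
target = rng.standard_normal((32, 32))                  # stand-in for a reference phase image
pred = target + 0.05 * rng.standard_normal((32, 32))    # stand-in for a network output
print(mixed_loss(pred, target))
```

Setting `fourier_weight` to zero recovers the plain image domain loss, so the weight controls how strongly spectral errors, including high-frequency ones, are penalized.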
Deep learning approach to Fourier ptychographic microscopy
Convolutional neural networks (CNNs) have achieved tremendous success in
solving complex inverse problems. The aim of this work is to develop a novel
CNN framework to reconstruct video sequences of dynamic live cells captured
using a computational microscopy technique, Fourier ptychographic microscopy
(FPM). The unique feature of the FPM is its capability to reconstruct images
with both wide field-of-view (FOV) and high resolution, i.e. a large
space-bandwidth-product (SBP), by taking a series of low-resolution intensity
images. For live cell imaging, a single FPM frame contains thousands of cell
samples with different morphological features. Our idea is to fully exploit the
statistical information provided by this large spatial ensemble so as to make
predictions in a sequential measurement, without using any additional temporal
dataset. Specifically, we show that it is possible to reconstruct high-SBP
dynamic cell videos by a CNN trained only on the first FPM dataset captured at
the beginning of a time-series experiment. Our CNN approach reconstructs a
12800×10800 pixel phase image in only ~25 seconds, a 50× speedup compared
to the model-based FPM algorithm. In addition, the CNN further reduces the
required number of images in each time frame by ~6×. Overall, this
significantly improves the imaging throughput by reducing both the acquisition
and computational times. The proposed CNN is based on the conditional
generative adversarial network (cGAN) framework. We also exploit
transfer learning so that our pre-trained CNN can be further optimized to image
other cell types. Our technique demonstrates a promising deep learning approach
to continuously monitor large live-cell populations over an extended time and
gather useful spatial and temporal information with sub-cellular resolution.