Hierarchical Video Generation from Orthogonal Information: Optical Flow and Texture
Learning to represent and generate videos from unlabeled data is a very
challenging problem. To generate realistic videos, it is important not only to
ensure that the appearance of each frame is realistic, but also that the
video's motion is plausible and its appearance is consistent over time. The
process of video generation should be divided according to these intrinsic
difficulties. In this study, we focus on motion and appearance information as
two important orthogonal components of a video, and propose
Flow-and-Texture-Generative Adversarial Networks (FTGAN), consisting of
FlowGAN and TextureGAN. To avoid a huge annotation cost, we must learn from
unlabeled data; we therefore employ optical flow as the motion information for
generating videos. FlowGAN generates optical flow, which contains only the
edges and motion of the videos to be generated. TextureGAN, in turn,
specializes in adding texture to the optical flow generated by FlowGAN. This
hierarchical approach yields more realistic videos with plausible motion and
consistent appearance. Our experiments show that our model generates videos
with more plausible motion and also achieves significantly improved
performance on unsupervised action classification in comparison to previous
GAN work. In addition, because our model generates videos from two independent
sources of information, it can generate new combinations of motion and
attributes that are not seen in the training data, such as a video of a person
doing sit-ups on a baseball field.

Comment: Our supplemental material is available at
http://www.mi.t.u-tokyo.ac.jp/assets/publication/hierarchical_video_generation_sup/
Accepted to AAAI 2018.
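A minimal sketch of the two-stage pipeline the abstract describes: one generator maps noise to an optical-flow video, a second adds appearance on top of it. All module names, layer sizes, and tensor shapes here are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class FlowGAN(nn.Module):
    """Maps a latent code to an optical-flow video: (B, 2, T, H, W)."""
    def __init__(self, z_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose3d(z_dim, 64, kernel_size=(2, 4, 4)),
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(64, 2, kernel_size=(2, 4, 4), stride=(2, 4, 4)),
            nn.Tanh(),  # flow components normalized to [-1, 1]
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1, 1))

class TextureGAN(nn.Module):
    """Adds appearance to a flow video: (B, 2, T, H, W) -> (B, 3, T, H, W)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(2, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(64, 3, kernel_size=3, padding=1),
            nn.Tanh(),
        )

    def forward(self, flow):
        return self.net(flow)

z = torch.randn(4, 100)
flow = FlowGAN()(z)         # motion only (edges + flow)
video = TextureGAN()(flow)  # motion plus appearance
print(flow.shape, video.shape)
```

The hierarchy is the point: the second network never sees the noise directly, only the generated flow, so motion and texture remain two separable factors.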
Semi-supervised FusedGAN for Conditional Image Generation
We present FusedGAN, a deep network for conditional image synthesis with
controllable sampling of diverse images. Fidelity, diversity and controllable
sampling are the main quality measures of a good image generation model. Most
existing models are insufficient in all three aspects. The FusedGAN can perform
controllable sampling of diverse images with very high fidelity. We argue that
controllability can be achieved by disentangling the generation process into
various stages. In contrast to stacked GANs, where multiple stages of GANs are
trained separately with full supervision of labeled intermediate images, the
FusedGAN has a single-stage pipeline with a built-in stacking of GANs. Unlike
existing methods, which require full supervision with paired conditions and
images, the FusedGAN can effectively leverage more abundant images without
corresponding conditions during training to produce more diverse samples with
high fidelity. We achieve this by fusing two generators: one for unconditional
image generation and the other for conditional image generation, where the two
partly share a common latent space, thereby disentangling the generation. We
demonstrate the efficacy of the FusedGAN on fine-grained image generation
tasks such as text-to-image and attribute-to-face generation.
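A minimal sketch of the fused-generator idea described above: an unconditional and a conditional head share an intermediate "structure" feature, which is what disentangles the generation. Names and sizes are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SharedStem(nn.Module):
    """z -> shared structure features used by both generator heads."""
    def __init__(self, z_dim=64, feat_dim=128):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(z_dim, feat_dim), nn.ReLU(inplace=True))

    def forward(self, z):
        return self.fc(z)

class UncondHead(nn.Module):
    """Structure features -> image, no condition required."""
    def __init__(self, feat_dim=128, img_dim=3 * 32 * 32):
        super().__init__()
        self.fc = nn.Linear(feat_dim, img_dim)

    def forward(self, h):
        return torch.tanh(self.fc(h)).view(-1, 3, 32, 32)

class CondHead(nn.Module):
    """Structure features + condition (e.g., a text embedding) -> image."""
    def __init__(self, feat_dim=128, cond_dim=16, img_dim=3 * 32 * 32):
        super().__init__()
        self.fc = nn.Linear(feat_dim + cond_dim, img_dim)

    def forward(self, h, c):
        return torch.tanh(self.fc(torch.cat([h, c], dim=1))).view(-1, 3, 32, 32)

stem, g_uncond, g_cond = SharedStem(), UncondHead(), CondHead()
z = torch.randn(4, 64)
h = stem(z)                              # shared latent structure
x_u = g_uncond(h)                        # trainable on unpaired images
x_c = g_cond(h, torch.randn(4, 16))      # trainable on (condition, image) pairs
print(x_u.shape, x_c.shape)
```

Because the stem is shared, images without conditions still provide a training signal for the structure that the conditional head builds on.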
Informative sample generation using class aware generative adversarial networks for classification of chest Xrays
Training robust deep learning (DL) systems for disease detection from medical
images is challenging due to limited images covering different disease types
and severity. The problem is especially acute where there is severe class
imbalance. We propose an active learning (AL) framework that selects the most
informative samples for training our model using a Bayesian neural network.
The informative samples are then used within a novel class-aware generative
adversarial network (CAGAN) to generate realistic chest X-ray images for data
augmentation by transferring characteristics from one class label to another.
Experiments show that our proposed AL framework achieves state-of-the-art
performance using only a fraction of the full dataset, thus saving significant
time and effort over conventional methods.
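A sketch of the informative-sample selection step: uncertainty from a Bayesian approximation (here, Monte Carlo dropout) ranks unlabeled images, and the most uncertain ones are queried for training. This is an illustrative stand-in under assumed shapes, not the paper's exact criterion or model.

```python
import torch
import torch.nn as nn

# Toy classifier; dropout is kept active at inference for MC sampling.
model = nn.Sequential(
    nn.Flatten(), nn.Linear(64 * 64, 128), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(128, 2),
)

def mc_dropout_entropy(x, n_samples=20):
    """Predictive entropy under MC dropout; higher = more informative."""
    model.train()  # keep dropout stochastic across forward passes
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=1) for _ in range(n_samples)]
        ).mean(dim=0)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)

pool = torch.randn(100, 1, 64, 64)    # stand-in for the unlabeled X-ray pool
scores = mc_dropout_entropy(pool)
query = scores.topk(k=10).indices     # the 10 most informative samples
print(query)
```

In the framework the abstract describes, these queried samples would then seed the class-aware GAN that augments the minority classes.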
Deep Video Generation, Prediction and Completion of Human Action Sequences
Current deep learning results on video generation are limited: there are only
a few initial results on video prediction and no significant results on video
completion. This is due to the severe ill-posedness inherent in these three
problems. In this paper, we focus on human action videos and propose a
general, two-stage deep framework that generates human action videos under no
constraints or an arbitrary number of constraints, uniformly addressing
the three problems: video generation given no input frames, video prediction
given the first few frames, and video completion given the first and last
frames. To make the problem tractable, in the first stage we train a deep
generative model that generates a human pose sequence from random noise. In the
second stage, a skeleton-to-image network is trained, which is used to generate
a human action video given the complete human pose sequence generated in the
first stage. By introducing the two-stage strategy, we sidestep the original
ill-posed problems while producing for the first time high-quality video
generation/prediction/completion results of much longer duration. We present
quantitative and qualitative evaluation to show that our two-stage approach
outperforms state-of-the-art methods in video generation, prediction and video
completion. Our video result demonstration can be viewed at
https://iamacewhite.github.io/supp/index.html

Comment: Under review for CVPR 2018. Haoye and Chunyan have equal contribution.
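A minimal sketch of the two-stage idea described above: stage one maps noise to a human pose (skeleton) sequence; stage two renders each pose into a frame. Joint counts, sequence length, and layer sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

N_JOINTS, T = 18, 16  # assumed skeleton size and sequence length

class PoseSequenceGenerator(nn.Module):
    """Stage one: z -> pose sequence (B, T, N_JOINTS, 2)."""
    def __init__(self, z_dim=64):
        super().__init__()
        self.fc = nn.Linear(z_dim, T * N_JOINTS * 2)

    def forward(self, z):
        return torch.tanh(self.fc(z)).view(-1, T, N_JOINTS, 2)

class SkeletonToImage(nn.Module):
    """Stage two: one pose -> one RGB frame (B, 3, 32, 32)."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(N_JOINTS * 2, 3 * 32 * 32)

    def forward(self, pose):
        return torch.tanh(self.fc(pose.flatten(1))).view(-1, 3, 32, 32)

g_pose, g_frame = PoseSequenceGenerator(), SkeletonToImage()
poses = g_pose(torch.randn(4, 64))                              # stage one
video = torch.stack([g_frame(poses[:, t]) for t in range(T)], dim=1)
print(video.shape)  # (4, 16, 3, 32, 32)
```

Generation, prediction, and completion then differ only in which pose frames are constrained: none, the first few, or the first and last.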
Attribute-Guided Face Generation Using Conditional CycleGAN
We are interested in attribute-guided face generation: given a low-res input
face image and an attribute vector that can be extracted from a high-res image
(the attribute image), our new method generates a high-res face image for the
low-res input that satisfies the given attributes. To address this problem, we
condition the CycleGAN and propose conditional CycleGAN, which is designed to
1) handle unpaired training data because the training low/high-res and high-res
attribute images may not necessarily align with each other, and to 2) allow
easy control of the appearance of the generated face via the input attributes.
We demonstrate impressive results on the attribute-guided conditional CycleGAN,
which can synthesize realistic face images with appearance easily controlled by
user-supplied attributes (e.g., gender, makeup, hair color, eyeglasses). By
using the attribute image as the identity to produce the corresponding
conditional vector, and by incorporating a face verification network, the
attribute-guided network becomes an identity-guided conditional CycleGAN that
produces impressive results on identity transfer. We demonstrate three
applications of the identity-guided conditional CycleGAN: identity-preserving
face super-resolution, face swapping, and frontal face generation, which
consistently show the advantage of our new method.

Comment: ECCV 2018.
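A sketch of the conditioning mechanism the abstract describes: the generator receives the low-res input plus an attribute vector, so the same input can be mapped to different high-res faces by changing the attributes. Shapes, the broadcast scheme, and the upsampling factor are assumptions.

```python
import torch
import torch.nn as nn

class AttrConditionedGenerator(nn.Module):
    """(low-res image, attribute vector) -> high-res image (4x upsample)."""
    def __init__(self, attr_dim=8):
        super().__init__()
        self.attr_dim = attr_dim
        self.net = nn.Sequential(
            nn.Conv2d(3 + attr_dim, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=4, mode="nearest"),
            nn.Conv2d(64, 3, kernel_size=3, padding=1),
            nn.Tanh(),
        )

    def forward(self, x_lr, attrs):
        # Broadcast attributes to a spatial map and concatenate as channels.
        b, _, h, w = x_lr.shape
        attr_map = attrs.view(b, self.attr_dim, 1, 1).expand(b, self.attr_dim, h, w)
        return self.net(torch.cat([x_lr, attr_map], dim=1))

g = AttrConditionedGenerator()
x_lr = torch.randn(2, 3, 16, 16)
attrs = torch.tensor([[1., 0., 0., 1., 0., 0., 1., 0.],   # e.g., binary flags
                      [0., 1., 1., 0., 0., 1., 0., 0.]])  # for gender, makeup, ...
x_hr = g(x_lr, attrs)
print(x_hr.shape)  # (2, 3, 64, 64)
```

Swapping the attribute vector for an identity embedding from a face verification network is what turns this into the identity-guided variant.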
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks
In this paper, we propose an Attentional Generative Adversarial Network
(AttnGAN) that allows attention-driven, multi-stage refinement for fine-grained
text-to-image generation. With a novel attentional generative network, the
AttnGAN can synthesize fine-grained details at different subregions of the
image by attending to the relevant words in the natural language
description. In addition, a deep attentional multimodal similarity model is
proposed to compute a fine-grained image-text matching loss for training the
generator. The proposed AttnGAN significantly outperforms the previous state of
the art, boosting the best reported inception score by 14.14% on the CUB
dataset and 170.25% on the more challenging COCO dataset. A detailed analysis
is also performed by visualizing the attention layers of the AttnGAN. For the
first time, it shows that the layered attentional GAN is able to automatically
select the condition at the word level for generating different parts of the
image.
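A sketch of the word-level attention at the core of this approach: each image subregion attends over the word embeddings, so different regions can be conditioned on different words. Dimensions and names are illustrative, not the paper's.

```python
import torch
import torch.nn.functional as F

def word_attention(region_feats, word_embs):
    """region_feats: (B, N_regions, D); word_embs: (B, N_words, D).
    Returns per-region context vectors built from attended words."""
    scores = torch.bmm(region_feats, word_embs.transpose(1, 2))  # (B, N_r, N_w)
    attn = F.softmax(scores, dim=-1)   # each region's weights over the words
    return torch.bmm(attn, word_embs)  # (B, N_regions, D)

regions = torch.randn(2, 64, 128)  # e.g., an 8x8 grid of image features
words = torch.randn(2, 12, 128)    # embeddings of a 12-word caption
context = word_attention(regions, words)
print(context.shape)  # (2, 64, 128)
```

Visualizing `attn` is what reveals which word each image region was generated from, as in the analysis the abstract mentions.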