High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
We present a new method for synthesizing high-resolution photo-realistic
images from semantic label maps using conditional generative adversarial
networks (conditional GANs). Conditional GANs have enabled a variety of
applications, but the results are often limited to low resolution and still far
from realistic. In this work, we generate visually appealing 2048x1024 results
with a novel adversarial loss, as well as new multi-scale generator and
discriminator architectures. Furthermore, we extend our framework to
interactive visual manipulation with two additional features. First, we
incorporate object instance segmentation information, which enables object
manipulations such as removing/adding objects and changing the object category.
Second, we propose a method to generate diverse results given the same input,
allowing users to edit the object appearance interactively. Human opinion
studies demonstrate that our method significantly outperforms existing methods,
advancing both the quality and the resolution of deep image synthesis and
editing.
Comment: v2: CVPR camera ready; adds more results for the edge-to-photo example.
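As a concrete illustration of the multi-scale adversarial setup described above, here is a minimal PyTorch sketch: the same patch-based discriminator is applied to several downsampled scales of the (label map, image) pair, and an LSGAN-style loss is summed over scales. The class names, layer sizes, and loss form are illustrative assumptions, not the authors' released code.

```python
# Illustrative sketch of a multi-scale conditional discriminator with an
# LSGAN-style loss. Names and layer sizes are assumptions, not pix2pixHD code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchDiscriminator(nn.Module):
    """Small PatchGAN-style discriminator producing per-patch real/fake scores."""
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 1, 4, stride=1, padding=1),
        )

    def forward(self, x):
        return self.net(x)

class MultiScaleDiscriminator(nn.Module):
    """Run identical discriminators on progressively downsampled inputs."""
    def __init__(self, in_ch, num_scales=3):
        super().__init__()
        self.discs = nn.ModuleList(PatchDiscriminator(in_ch) for _ in range(num_scales))

    def forward(self, x):
        outs = []
        for d in self.discs:
            outs.append(d(x))
            x = F.avg_pool2d(x, 3, stride=2, padding=1)  # halve the resolution
        return outs

def d_loss(disc, label_map, real_img, fake_img):
    """Least-squares adversarial loss for D, conditioned on the label map."""
    loss = 0.0
    for out in disc(torch.cat([label_map, real_img], dim=1)):
        loss = loss + F.mse_loss(out, torch.ones_like(out))   # real -> 1
    for out in disc(torch.cat([label_map, fake_img.detach()], dim=1)):
        loss = loss + F.mse_loss(out, torch.zeros_like(out))  # fake -> 0
    return loss
```

Summing per-patch losses across scales is the usual motivation for multi-scale discriminators: the coarsest one encourages global consistency while the finest one judges texture detail, which matters at 2048x1024 resolution.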
HP-GAN: Probabilistic 3D human motion prediction via GAN
Predicting and understanding human motion dynamics has many applications,
such as motion synthesis, augmented reality, security, and autonomous vehicles.
Due to the recent success of generative adversarial networks (GAN), there has
been much interest in probabilistic estimation and synthetic data generation
using deep neural network architectures and learning algorithms.
We propose a novel sequence-to-sequence model for probabilistic human motion
prediction, trained with a modified version of improved Wasserstein generative
adversarial networks (WGAN-GP), in which we use a custom loss function designed
for human motion prediction. Our model, which we call HP-GAN, learns a
probability density function of future human poses conditioned on previous
poses. It predicts multiple sequences of possible future human poses, each
generated from the same input sequence but with a different random vector z.
Furthermore, to quantify the quality of the non-deterministic
predictions, we simultaneously train a motion-quality-assessment model that
learns the probability that a given skeleton sequence is a real human motion.
We test our algorithm on two of the largest skeleton datasets: NTU RGB+D and
Human3.6M. We train our model on both single and multiple action types. Its
predictive power for long-term motion estimation is demonstrated by generating
multiple plausible futures of more than 30 frames from just 10 frames of input.
We show that most sequences generated from the same input have a greater than
50% probability of being judged a real human sequence. We will release all the
code used in this paper on GitHub.
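The two ingredients named above lend themselves to a short sketch: drawing several futures from one observed sequence by varying z, and the WGAN-GP gradient penalty that keeps the critic approximately 1-Lipschitz. This is a minimal PyTorch sketch under assumed tensor shapes (batch, time, pose dims); the function names and z_dim are hypothetical, not the released HP-GAN code.

```python
# Illustrative sketch: multiple futures via different z, plus the WGAN-GP
# gradient penalty. Sequences are assumed shaped (batch, time, pose_dims).
import torch

def sample_futures(generator, past_poses, num_samples=5, z_dim=128):
    """Generate several plausible futures from the same past sequence."""
    futures = []
    for _ in range(num_samples):
        # A different random z yields a different predicted future.
        z = torch.randn(past_poses.size(0), z_dim, device=past_poses.device)
        futures.append(generator(past_poses, z))
    return futures

def gradient_penalty(critic, real_seq, fake_seq, lam=10.0):
    """WGAN-GP term: push the critic's gradient norm toward 1 on interpolates."""
    eps = torch.rand(real_seq.size(0), 1, 1, device=real_seq.device)
    interp = (eps * real_seq + (1 - eps) * fake_seq).requires_grad_(True)
    score = critic(interp)
    grads, = torch.autograd.grad(score.sum(), interp, create_graph=True)
    return lam * ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()
```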
Rewriting a Deep Generative Model
A deep generative model such as a GAN learns to model a rich set of semantic
and physical rules about the target distribution, but until now it has
remained unclear how such rules are encoded in the network, or how a rule can
be changed. In this paper, we introduce a new problem setting: manipulation of
specific rules encoded by a deep generative model. To address the problem, we
propose a formulation in which the desired rule is changed by manipulating a
layer of a deep network as a linear associative memory. We derive an algorithm
for modifying one entry of the associative memory, and we demonstrate that
several interesting structural rules can be located and modified within the
layers of state-of-the-art generative models. We present a user interface to
enable users to interactively change the rules of a generative model to achieve
desired effects, and we show several proof-of-concept applications. Finally,
results on multiple datasets demonstrate the advantage of our method against
standard fine-tuning methods and edit transfer algorithms.
Comment: ECCV 2020 (oral). Code at https://github.com/davidbau/rewriting; for
videos and demos see https://rewriting.csail.mit.edu.
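To make the associative-memory view concrete, the following is a simplified sketch (my own, not the authors' constrained-optimization procedure) of editing one entry of a layer W treated as a linear memory with W k ≈ v: a rank-one update makes W map a chosen key k* to a new value v* exactly, while directing the change along C^{-1} k* so that responses to other, statistically typical keys move as little as possible.

```python
# Simplified rank-one edit of a linear associative memory W (W @ k ~= v).
# An illustrative approximation, not the paper's exact algorithm.
import torch

def rank_one_edit(W, k_star, v_star, C):
    """Return W' such that W' @ k_star == v_star.

    W:      (out_dim, in_dim) weight treated as an associative memory
    k_star: (in_dim,)  key whose stored value should change
    v_star: (out_dim,) new value to associate with k_star
    C:      (in_dim, in_dim) key second-moment matrix K @ K.T / N,
            estimated from the layer's typical inputs
    """
    d = torch.linalg.solve(C, k_star)     # update direction C^{-1} k*
    err = v_star - W @ k_star             # residual at the edited key
    return W + torch.outer(err, d) / (k_star @ d)
```

Because the update is rank one and aligned with C^{-1} k*, it fixes the stored value at k* exactly (W' k* = W k* + err = v*) while, to first order, minimally disturbing the memory's responses to other keys drawn from the same input distribution.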