Non-Adversarial Image Synthesis with Generative Latent Nearest Neighbors
Unconditional image generation has recently been dominated by generative
adversarial networks (GANs). GAN methods train a generator which regresses
images from random noise vectors, as well as a discriminator that attempts to
differentiate between the generated images and a training set of real images.
GANs have shown impressive results at generating realistic-looking images. Despite
their success, GANs suffer from critical drawbacks, including unstable training
and mode-dropping. These weaknesses have motivated research into
alternatives, including variational auto-encoders (VAEs), latent embedding
learning methods (e.g. GLO), and nearest-neighbor based implicit maximum
likelihood estimation (IMLE). Unfortunately, at the moment, GANs still
significantly outperform these alternative methods for image generation. In this
work, we present a novel method - Generative Latent Nearest Neighbors (GLANN) -
for training generative models without adversarial training. GLANN combines the
strengths of IMLE and GLO in a way that overcomes the main drawbacks of each
method. Consequently, GLANN generates images of far higher quality than either
GLO or IMLE. Our method does not suffer from the mode collapse that plagues GAN
training and is much more stable. Qualitative results show that GLANN outperforms
a baseline consisting of 800 GANs and VAEs on commonly used datasets. Our models
are also shown to be effective for training truly non-adversarial unsupervised
image translation.
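For illustration, the GLO-plus-IMLE combination can be sketched in a few lines of PyTorch: a GLO step jointly optimizes the generator and per-image latent codes, and an IMLE step trains a noise-to-latent mapper by pulling the nearest mapped noise sample toward each learned code. This is a minimal sketch under assumed names; the paper uses a perceptual loss where the L1 loss stands in below.

```python
import torch
import torch.nn as nn

def glo_step(generator, codes, images, optimizer):
    """One GLO step: jointly optimize generator weights and per-image codes."""
    optimizer.zero_grad()
    loss = nn.functional.l1_loss(generator(codes), images)  # stand-in for a perceptual loss
    loss.backward()
    optimizer.step()
    with torch.no_grad():  # GLO keeps the codes on the unit sphere
        codes /= codes.norm(dim=1, keepdim=True).clamp(min=1e-8)
    return loss.item()

def imle_step(mapper, codes, optimizer, z_dim, n_noise=64):
    """One IMLE step: pull the nearest mapped noise sample toward each
    learned code, so the noise-to-latent mapping covers all codes."""
    optimizer.zero_grad()
    mapped = mapper(torch.randn(n_noise, z_dim))            # (n_noise, code_dim)
    nearest = mapped[torch.cdist(codes, mapped).argmin(dim=1)]
    loss = ((codes - nearest) ** 2).sum(dim=1).mean()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At sampling time an image is produced as generator(mapper(z)) with z drawn from a standard Gaussian, so no adversarial training is involved at any stage.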
Discriminator Rejection Sampling
We propose a rejection sampling scheme using the discriminator of a GAN to
approximately correct errors in the GAN generator distribution. We show that
under quite strict assumptions, this will allow us to recover the data
distribution exactly. We then examine where those strict assumptions break down
and design a practical algorithm - called Discriminator Rejection Sampling
(DRS) - that can be used on real datasets. Finally, we demonstrate the
efficacy of DRS on a mixture of Gaussians and on SAGAN, the state-of-the-art
model for image generation at the time this work was developed. On ImageNet, we
train an improved baseline that increases the Inception Score from 52.52 to
62.36 and reduces the Fréchet Inception Distance (FID) from 18.65 to 14.79. We
then use DRS to further improve on this baseline, raising the Inception Score
to 76.08 and lowering the FID to 13.75.
Comment: Published as a conference paper at ICLR 2019.
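The core of the method admits a compact sketch. Assuming the discriminator exposes its pre-sigmoid logit, a sample is accepted with probability given by the paper's shifted acceptance logit; the pilot-based estimate of log M, the epsilon constant, and the gamma shift are the practical adjustments mentioned above. The generate and logit callables are illustrative placeholders.

```python
import numpy as np

def drs_sample(generate, logit, n_pilot=10000, gamma=0.0, eps=1e-8):
    """Draw one approximately corrected sample by rejection sampling."""
    # Estimate log M = max_x log(p_d(x) / p_g(x)) from pilot samples.
    log_m = max(logit(generate()) for _ in range(n_pilot))
    while True:
        x = generate()
        f = logit(x)                     # discriminator logit for this sample
        log_m = max(log_m, f)            # keep the running maximum current
        # Acceptance logit; eps keeps the log argument strictly positive.
        f_hat = f - log_m - np.log(1.0 - np.exp(f - log_m - eps)) - gamma
        if np.random.rand() < 1.0 / (1.0 + np.exp(-f_hat)):  # sigmoid
            return x
```

Raising gamma makes acceptance stricter (higher sample quality, lower throughput); lowering it raises the acceptance rate.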
Safer Classification by Synthesis
The discriminative approach to classification using deep neural networks has
become the de-facto standard in various fields. Complementing recent
reservations about safety against adversarial examples, we show that
conventional discriminative methods can easily be fooled into providing incorrect
labels with very high confidence for out-of-distribution examples. We posit that
a generative approach is the natural remedy for this problem, and propose a
method for classification using generative models. At training time, we learn a
generative model for each class, while at test time, given an example to
classify, we query each generator for its most similar generation, and select
the class corresponding to the most similar one. Our approach is general and
can be used with expressive models such as GANs and VAEs. At test time, our
method accurately "knows when it does not know," providing resilience to
out-of-distribution examples while maintaining competitive performance on standard
examples.
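A minimal sketch of the test-time procedure with GAN-style generators, assuming one pretrained generator per class: search each generator's latent space for the closest generation to the query and classify by the smallest reconstruction error. The optimizer settings, latent dimension, and squared-error distance are illustrative assumptions.

```python
import torch

def classify_by_synthesis(x, generators, z_dim=100, steps=200, lr=0.05):
    """Return (predicted class, residual); a large residual flags an
    out-of-distribution input, i.e. the model 'knows when it does not know'."""
    best_class, best_err = None, float("inf")
    for c, g in enumerate(generators):
        z = torch.zeros(1, z_dim, requires_grad=True)
        opt = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            err = ((g(z) - x) ** 2).mean()   # distance to the closest generation
            err.backward()
            opt.step()
        if err.item() < best_err:
            best_class, best_err = c, err.item()
    return best_class, best_err
```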
Learning Pose Specific Representations by Predicting Different Views
The labeled data required to learn pose estimation for articulated objects is
difficult to provide in the desired quantity, realism, density, and accuracy.
To address this issue, we develop a method to learn representations, which are
very specific for articulated poses, without the need for labeled training
data. We exploit the observation that the pose of a known object is
predictive of its appearance in any known view. That is, given only the pose
and shape parameters of a hand, the hand's appearance from any viewpoint can be
approximated. To exploit this observation, we train a model that -- given input
from one view -- estimates a latent representation, which is trained to be
predictive of the appearance of the object when captured from another
viewpoint. Thus, the only necessary supervision is the second view. The
training process of this model reveals an implicit pose representation in the
latent space. Importantly, at test time the pose representation can be inferred
using only a single view. In qualitative and quantitative experiments we show
that the learned representations capture detailed pose information. Moreover,
when training the proposed method jointly with labeled and unlabeled data, it
consistently surpasses the performance of its fully supervised counterpart,
while reducing the amount of needed labeled samples by at least one order of
magnitude.
Comment: CVPR 2018 (Spotlight); Project Page at https://poier.github.io/PreView
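A minimal sketch of the training setup, assuming 64 x 64 single-channel crops: an encoder maps the input view to a low-dimensional latent code, and a decoder must reproduce the appearance from the second viewpoint, which is the only supervision. All layer sizes are illustrative, and the conditioning of the decoder on the target view is omitted for brevity.

```python
import torch
import torch.nn as nn

enc = nn.Sequential(                         # input view -> latent "pose" code
    nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(64 * 16 * 16, 30))
dec = nn.Sequential(                         # latent code -> predicted second view
    nn.Linear(30, 64 * 16 * 16), nn.ReLU(),
    nn.Unflatten(1, (64, 16, 16)),
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1))

opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-4)

def train_step(view_a, view_b):
    """view_a: input view; view_b: the same pose captured from another camera."""
    opt.zero_grad()
    loss = nn.functional.mse_loss(dec(enc(view_a)), view_b)
    loss.backward()
    opt.step()
    return loss.item()
```

At test time only enc is needed: its output is the implicit pose representation inferred from a single view.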
PixelNN: Example-based Image Synthesis
We present a simple nearest-neighbor (NN) approach that synthesizes
high-frequency photorealistic images from an "incomplete" signal such as a
low-resolution image, a surface normal map, or edges. Current state-of-the-art
deep generative models designed for such conditional image synthesis suffer from
two important limitations: (1) they are unable to generate a large set of diverse
outputs, due to the mode collapse problem, and (2) they are not interpretable,
making it difficult to control the synthesized output. We demonstrate that NN
approaches potentially address such limitations, but suffer in accuracy on
small datasets. We design a simple pipeline that combines the best of both
worlds: the first stage uses a convolutional neural network (CNN) to map the
input to an (overly smoothed) image, and the second stage uses a pixel-wise
nearest neighbor method to map the smoothed output to multiple high-quality,
high-frequency outputs in a controllable manner. We demonstrate our approach
for various input modalities, and for various domains ranging from human faces
to cats-and-dogs to shoes and handbags.
Comment: Project Page: http://www.cs.cmu.edu/~aayushb/pixelNN
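The second, pixel-wise nearest-neighbor stage can be sketched compactly: every pixel of the CNN's over-smoothed output is matched, via a local descriptor, against the smoothed version of a training exemplar, and the corresponding high-frequency exemplar pixel is copied over. Using raw 3x3 patches as descriptors and a single exemplar is an illustrative simplification.

```python
import numpy as np

def patch_descriptors(img, k=3):
    """Stack each pixel's k x k neighborhood into a flat descriptor."""
    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    h, w, c = img.shape
    desc = np.zeros((h * w, k * k * c))
    for i in range(h):
        for j in range(w):
            desc[i * w + j] = padded[i:i + k, j:j + k].ravel()
    return desc

def pixelwise_nn(smooth_out, exemplar_smooth, exemplar_sharp):
    """Replace each smoothed pixel with its nearest exemplar's sharp pixel."""
    q = patch_descriptors(smooth_out)        # one query descriptor per pixel
    db = patch_descriptors(exemplar_smooth)  # database from the exemplar
    # Brute-force squared Euclidean nearest neighbors.
    d2 = (q ** 2).sum(1)[:, None] - 2 * q @ db.T + (db ** 2).sum(1)[None, :]
    idx = d2.argmin(axis=1)
    h, w, c = smooth_out.shape
    return exemplar_sharp.reshape(-1, c)[idx].reshape(h, w, c)
```

Controllability comes from the choice of exemplars: matching against different exemplars yields different, equally plausible high-frequency outputs.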
Improved Precision and Recall Metric for Assessing Generative Models
The ability to automatically estimate the quality and coverage of the samples
produced by a generative model is a vital requirement for driving algorithm
research. We present an evaluation metric that can separately and reliably
measure both of these aspects in image generation tasks by forming explicit,
non-parametric representations of the manifolds of real and generated data. We
demonstrate the effectiveness of our metric in StyleGAN and BigGAN by providing
several illustrative examples where existing metrics yield uninformative or
contradictory results. Furthermore, we analyze multiple design variants of
StyleGAN to better understand the relationships between the model architecture,
training methods, and the properties of the resulting sample distribution. In
the process, we identify new variants that improve the state-of-the-art. We
also perform the first principled analysis of truncation methods and identify
an improved method. Finally, we extend our metric to estimate the perceptual
quality of individual samples, and use this to study latent space
interpolations.
Comment: NeurIPS 2019 final version.
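The non-parametric manifold construction admits a short sketch: each feature vector contributes a ball whose radius is the distance to its k-th nearest neighbor within its own set; precision is the fraction of generated features inside some real ball, and recall the symmetric quantity. The sketch assumes images have already been embedded by a feature extractor (the paper uses a pretrained network), and the brute-force distance computation is for clarity only.

```python
import numpy as np

def knn_radii(feats, k=3):
    """Distance from each point to its k-th nearest neighbor in its own set."""
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, k]          # column 0 is the point itself

def coverage(queries, support, radii):
    """Fraction of queries lying inside at least one support ball."""
    d = np.linalg.norm(queries[:, None, :] - support[None, :, :], axis=-1)
    return (d <= radii[None, :]).any(axis=1).mean()

def precision_recall(real, fake, k=3):
    precision = coverage(fake, real, knn_radii(real, k))   # sample quality
    recall = coverage(real, fake, knn_radii(fake, k))      # mode coverage
    return precision, recall
```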
Shape Completion using 3D-Encoder-Predictor CNNs and Shape Synthesis
We introduce a data-driven approach to complete partial 3D shapes through a
combination of volumetric deep neural networks and 3D shape synthesis. From a
partially-scanned input shape, our method first infers a low-resolution -- but
complete -- output. To this end, we introduce a 3D-Encoder-Predictor Network
(3D-EPN) which is composed of 3D convolutional layers. The network is trained
to predict and fill in missing data, and operates on an implicit surface
representation that encodes both known and unknown space. This allows us to
predict global structure in unknown areas at high accuracy. We then correlate
these intermediary results with 3D geometry from a shape database at test time.
In a final pass, we propose a patch-based 3D shape synthesis method that
imposes the 3D geometry from these retrieved shapes as constraints on the
coarsely-completed mesh. This synthesis process enables us to reconstruct
fine-scale detail and generate high-resolution output while respecting the
global mesh structure obtained by the 3D-EPN. Although our 3D-EPN outperforms
state-of-the-art completion methods, the main contribution of our work lies in
the combination of a data-driven shape predictor and analytic 3D shape
synthesis. In our results, we show extensive evaluations on a newly-introduced
shape completion benchmark for both real-world and synthetic data.
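The first stage can be sketched as a small volumetric encoder-predictor, with known and unknown space entering as separate input channels of a coarse distance-field grid. Layer sizes are illustrative, and the database retrieval and patch-based synthesis stages are omitted.

```python
import torch
import torch.nn as nn

class EncoderPredictor3D(nn.Module):
    """Predict a complete low-resolution volume from a partial scan."""
    def __init__(self):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv3d(2, 16, 4, stride=2, padding=1), nn.ReLU(),   # 32^3 -> 16^3
            nn.Conv3d(16, 32, 4, stride=2, padding=1), nn.ReLU())  # 16^3 -> 8^3
        self.predict = nn.Sequential(
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(16, 1, 4, stride=2, padding=1))     # back to 32^3

    def forward(self, tsdf, known_mask):
        # Channel 0: observed distance values; channel 1: known/unknown space.
        return self.predict(self.encode(torch.cat([tsdf, known_mask], dim=1)))
```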
Deep Forward and Inverse Perceptual Models for Tracking and Prediction
We consider the problems of learning forward models that map state to
high-dimensional images and inverse models that map high-dimensional images to
state in robotics. Specifically, we present a perceptual model for generating
video frames from state with deep networks, and provide a framework for its use
in tracking and prediction tasks. We show that our proposed model greatly
outperforms standard deconvolutional methods and GANs for image generation,
producing clear, photo-realistic images. We also develop a convolutional neural
network model for state estimation and compare the result to an Extended Kalman
Filter to estimate robot trajectories. We validate all models on a real robotic
system.
Comment: 8 pages, International Conference on Robotics and Automation (ICRA) 2018.
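The forward/inverse pairing can be sketched with two small networks, assuming a 7-DoF state and 64 x 64 grayscale frames (both dimensions are illustrative): a deconvolutional network renders an image from the state, and a CNN regresses the state back from an image.

```python
import torch.nn as nn

forward_model = nn.Sequential(               # state -> rendered frame
    nn.Linear(7, 128 * 8 * 8), nn.ReLU(),
    nn.Unflatten(1, (128, 8, 8)),
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),  # 16 x 16
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),   # 32 x 32
    nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1))               # 64 x 64

inverse_model = nn.Sequential(               # frame -> estimated state
    nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(64 * 16 * 16, 7))
```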
StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks
Although Generative Adversarial Networks (GANs) have shown remarkable success
in various tasks, they still face challenges in generating high-quality images.
In this paper, we propose Stacked Generative Adversarial Networks (StackGAN)
aiming at generating high-resolution photo-realistic images. First, we propose
a two-stage generative adversarial network architecture, StackGAN-v1, for
text-to-image synthesis. The Stage-I GAN sketches the primitive shape and
colors of the object based on the given text description, yielding low-resolution
images. The Stage-II GAN takes Stage-I results and text descriptions as inputs,
and generates high-resolution images with photo-realistic details. Second, an
advanced multi-stage generative adversarial network architecture, StackGAN-v2,
is proposed for both conditional and unconditional generative tasks. Our
StackGAN-v2 consists of multiple generators and discriminators in a tree-like
structure; images at multiple scales corresponding to the same scene are
generated from different branches of the tree. StackGAN-v2 shows more stable
training behavior than StackGAN-v1 by jointly approximating multiple
distributions. Extensive experiments demonstrate that the proposed stacked
generative adversarial networks significantly outperform other state-of-the-art
methods in generating photo-realistic images.
Comment: In IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), 2018. (16 pages, 15 figures.)
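The tree-structured multi-scale design can be sketched as a shared trunk of upsampling blocks with a to-image head at each scale, each scale paired with its own discriminator; joint training across scales is what stabilizes optimization. Block definitions and sizes below are illustrative placeholders, and the text-conditioning path is omitted.

```python
import torch
import torch.nn as nn

def up_block(cin, cout):
    return nn.Sequential(nn.Upsample(scale_factor=2, mode="nearest"),
                         nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU())

class TreeGenerator(nn.Module):
    def __init__(self, z_dim=100):
        super().__init__()
        self.stem = nn.Sequential(nn.Linear(z_dim, 64 * 8 * 8), nn.ReLU(),
                                  nn.Unflatten(1, (64, 8, 8)))
        self.branches = nn.ModuleList([up_block(64, 64) for _ in range(3)])
        self.to_image = nn.ModuleList([nn.Conv2d(64, 3, 3, padding=1)
                                       for _ in range(3)])

    def forward(self, z):
        h, images = self.stem(z), []
        for branch, head in zip(self.branches, self.to_image):
            h = branch(h)
            images.append(torch.tanh(head(h)))   # 16, 32, and 64 pixel outputs
        return images   # one image per scale, each judged by its own discriminator
```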
Learning Descriptor Networks for 3D Shape Synthesis and Analysis
This paper proposes a 3D shape descriptor network, which is a deep
convolutional energy-based model, for modeling volumetric shape patterns. The
maximum likelihood training of the model follows an "analysis by synthesis"
scheme and can be interpreted as a mode seeking and mode shifting process. The
model can synthesize 3D shape patterns by sampling from the probability
distribution via MCMC such as Langevin dynamics. The model can be used to train
a 3D generator network via MCMC teaching. The conditional version of the 3D
shape descriptor net can be used for 3D object recovery and 3D object
super-resolution. Experiments demonstrate that the proposed model can generate
realistic 3D shape patterns and can be useful for 3D shape analysis.
Comment: CVPR 2018.
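Sampling from such an energy-based descriptor network reduces to Langevin dynamics: repeated gradient descent on the energy with Gaussian noise injected at every step. The sketch below assumes energy_net maps a voxel grid to a scalar energy; step size and chain length are illustrative.

```python
import torch

def langevin_sample(energy_net, shape=(1, 1, 32, 32, 32), steps=64, delta=0.3):
    """Run a short Langevin chain from Gaussian noise toward low energy."""
    x = torch.randn(shape)
    for _ in range(steps):
        x = x.detach().requires_grad_(True)
        grad, = torch.autograd.grad(energy_net(x).sum(), x)
        # Langevin update: descend the energy, then inject noise.
        x = x - 0.5 * delta ** 2 * grad + delta * torch.randn_like(x)
    return x.detach()
```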