Non-Adversarial Image Synthesis with Generative Latent Nearest Neighbors
Unconditional image generation has recently been dominated by generative
adversarial networks (GANs). GAN methods train a generator, which regresses
images from random noise vectors, together with a discriminator that attempts
to differentiate between the generated images and a training set of real
images. GANs have shown impressive results at generating realistic-looking
images. Despite their success, GANs suffer from critical drawbacks, including
unstable training and mode-dropping. These weaknesses have motivated research
into alternatives, including variational auto-encoders (VAEs), latent
embedding learning methods (e.g. GLO), and nearest-neighbor based implicit
maximum likelihood estimation (IMLE). Unfortunately, GANs still significantly
outperform these alternative methods for image generation. In this work, we
present a novel method, Generative Latent Nearest Neighbors (GLANN), for
training generative models without adversarial training. GLANN combines the
strengths of IMLE and GLO in a way that overcomes the main drawbacks of each
method. Consequently, GLANN generates images that are far better than those
of GLO and IMLE. Our method does not suffer from the mode collapse that
plagues GAN training and is much more stable. Qualitative results show that
GLANN outperforms a baseline of 800 GANs and VAEs on commonly used datasets.
Our models are also shown to be effective for training truly non-adversarial
unsupervised image translation.
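
To make the combination concrete, the following is a minimal sketch of the
IMLE-style mapping stage that GLANN layers on top of GLO. It assumes a GLO
stage has already produced a generator `G` and a tensor `Z` of per-example
latents; the mapper architecture, pool size, and all names are illustrative
assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

def train_imle_mapper(Z, noise_dim=64, epochs=50, pool=1024, lr=1e-3):
    """Fit a mapper T: noise -> latent with an IMLE-style objective: for
    each GLO latent z_i, find its nearest mapped noise sample and pull
    that sample toward z_i. Because every data point gets a nearby
    generated latent, no mode of the data can be dropped."""
    d_z = Z.shape[1]
    T = nn.Sequential(nn.Linear(noise_dim, 256), nn.ReLU(),
                      nn.Linear(256, d_z))
    opt = torch.optim.Adam(T.parameters(), lr=lr)
    for _ in range(epochs):
        e = torch.randn(pool, noise_dim)          # candidate noise pool
        mapped = T(e)                             # pool x d_z
        nearest = torch.cdist(Z, mapped.detach()).argmin(dim=1)
        loss = ((T(e[nearest]) - Z) ** 2).mean()  # pull nearest map to z_i
        opt.zero_grad()
        loss.backward()
        opt.step()
    return T

# Sampling then composes the two stages: x = G(T(e)) with e ~ N(0, I).
```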
Discriminator Rejection Sampling
We propose a rejection sampling scheme that uses the discriminator of a GAN
to approximately correct errors in the GAN generator's distribution. We show
that under quite strict assumptions, this allows us to recover the data
distribution exactly. We then examine where those strict assumptions break
down and design a practical algorithm, called Discriminator Rejection
Sampling (DRS), that can be used on real datasets. Finally, we demonstrate
the efficacy of DRS on a mixture of Gaussians and on the SAGAN model, which
was state-of-the-art for image generation at the time this work was
developed. On ImageNet, we train an improved baseline that increases the
Inception Score from 52.52 to 62.36 and reduces the Frechet Inception
Distance from 18.65 to 14.79. We then use DRS to further improve on this
baseline, improving the Inception Score to 76.08 and the FID to 13.75.
Comment: Published as a conference paper at ICLR 2019.
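
The core rejection step can be sketched in a few lines. This is a simplified
reading of the scheme, assuming a `generator()` callable that draws one
sample and a `disc_logit(x)` callable returning the discriminator's
pre-sigmoid output, so that exp(logit) approximates the density ratio
p_data/p_g for a near-optimal discriminator; the burn-in estimate of the
maximum ratio and the `gamma` shift are simplifications of the paper's
practical algorithm.

```python
import numpy as np

def drs_sample(generator, disc_logit, n_samples, burn_in=10000, gamma=0.0):
    """Accept a generated sample x with probability proportional to the
    estimated density ratio exp(disc_logit(x)), capped by the largest
    ratio seen during a burn-in phase."""
    # Estimate log M, the log of the maximum density ratio.
    log_m = max(disc_logit(generator()) for _ in range(burn_in))

    accepted = []
    while len(accepted) < n_samples:
        x = generator()
        # gamma > 0 lowers acceptance (stronger correction, slower sampling).
        p = np.exp(min(disc_logit(x) - log_m - gamma, 0.0))
        if np.random.rand() < p:
            accepted.append(x)
    return accepted
```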
Visually-Aware Fashion Recommendation and Design with Generative Image Models
Building effective recommender systems for domains like fashion is
challenging due to the high level of subjectivity and the semantic complexity
of the features involved (i.e., fashion styles). Recent work has shown that
approaches to 'visual' recommendation (e.g., clothing, art, etc.) can be made
more accurate by incorporating visual signals directly into the
recommendation objective, using 'off-the-shelf' feature representations
derived from deep networks. Here, we seek to extend this contribution by
showing that recommendation performance can be significantly improved by
learning 'fashion-aware' image representations directly, i.e., by training
the image representation (from the pixel level) and the recommender system
jointly; this contribution is related to recent work using Siamese CNNs,
though we are able to show improvements over state-of-the-art recommendation
techniques such as BPR and variants that make use of pre-trained visual
features. Furthermore, we show that our model can be used generatively, i.e.,
given a user and a product category, we can generate new images (i.e.,
clothing items) that are most consistent with their personal taste. This
represents a first step towards building systems that go beyond recommending
existing items from a product corpus and can instead suggest styles and aid
the design of new products.
Comment: 10 pages, 6 figures. Accepted by ICDM'17 as a long paper.
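
A minimal sketch of the joint training idea follows: the image tower and the
recommender are optimized together under a BPR objective, rather than feeding
frozen 'off-the-shelf' features into the ranker. The small CNN, embedding
sizes, and batch interface are illustrative assumptions, not the paper's
architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualBPR(nn.Module):
    def __init__(self, n_users, feat_dim=128):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, feat_dim)
        # A tiny CNN stands in for the Siamese image tower trained from pixels.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim))

    def bpr_loss(self, users, pos_imgs, neg_imgs):
        u = self.user_emb(users)                  # batch x d
        s_pos = (u * self.cnn(pos_imgs)).sum(-1)  # score of purchased item
        s_neg = (u * self.cnn(neg_imgs)).sum(-1)  # score of a sampled negative
        # BPR: maximize the probability that each user ranks pos above neg;
        # gradients flow into both the embeddings and the CNN.
        return -F.logsigmoid(s_pos - s_neg).mean()
```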
Generative Latent Flow
In this work, we propose the Generative Latent Flow (GLF), an algorithm for
generative modeling of the data distribution. GLF uses an Auto-encoder (AE) to
learn latent representations of the data, and a normalizing flow to map the
distribution of the latent variables to that of simple i.i.d. noise. In contrast
to some other Auto-encoder based generative models, which use various
regularizers that encourage the encoded latent distribution to match the prior
distribution, our model explicitly constructs a mapping between these two
distributions, leading to better density matching while avoiding
over-regularizing the latent variables. We compare our model with several related
techniques, and show that it has many relative advantages including fast
convergence, single stage training and minimal reconstruction trade-off. We
also study the relationship between our model and its stochastic counterpart,
and show that our model can be viewed as a vanishing noise limit of VAEs with
flow prior. Quantitatively, under standardized evaluations, our method achieves
state-of-the-art sample quality among AE based models on commonly used
datasets, and is competitive with GAN benchmarks.
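
The recipe can be summarized in a short sketch: an autoencoder provides
latents, a normalizing flow is fit to them by maximum likelihood, and
sampling inverts the flow before decoding. A single affine coupling layer
stands in for the full flow, and all names and sizes are illustrative
assumptions.

```python
import torch
import torch.nn as nn

class Coupling(nn.Module):
    """One affine coupling layer: half the latent conditions an affine
    transform of the other half, so the Jacobian is triangular."""
    def __init__(self, d):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d // 2, 128), nn.ReLU(),
                                 nn.Linear(128, d))   # scale and shift

    def forward(self, z):                  # z -> u, plus log|det J|
        z1, z2 = z.chunk(2, dim=-1)
        s, t = self.net(z1).chunk(2, dim=-1)
        return torch.cat([z1, z2 * torch.exp(s) + t], -1), s.sum(-1)

    def inverse(self, u):                  # u -> z
        u1, u2 = u.chunk(2, dim=-1)
        s, t = self.net(u1).chunk(2, dim=-1)
        return torch.cat([u1, (u2 - t) * torch.exp(-s)], -1)

def flow_nll(flow, z):
    """Negative log-likelihood of AE latents under a standard normal base
    distribution (up to an additive constant)."""
    u, logdet = flow(z)
    return (0.5 * (u ** 2).sum(-1) - logdet).mean()

# Sampling: x = decoder(flow.inverse(torch.randn(batch, d)))
```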
Large Scale GAN Training for High Fidelity Natural Image Synthesis
Despite recent progress in generative image modeling, successfully generating
high-resolution, diverse samples from complex datasets such as ImageNet remains
an elusive goal. To this end, we train Generative Adversarial Networks at the
largest scale yet attempted, and study the instabilities specific to such
scale. We find that applying orthogonal regularization to the generator renders
it amenable to a simple "truncation trick," allowing fine control over the
trade-off between sample fidelity and variety by reducing the variance of the
Generator's input. Our modifications lead to models which set the new state of
the art in class-conditional image synthesis. When trained on ImageNet at
128x128 resolution, our models (BigGANs) achieve an Inception Score (IS) of
166.5 and Frechet Inception Distance (FID) of 7.4, improving over the previous
best IS of 52.52 and FID of 18.65.
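
The "truncation trick" itself is simple to sketch: resample any component of
the latent vector whose magnitude falls outside a threshold, shrinking the
effective variance of the generator's input. The function below is a
plausible reading, with the threshold value and usage purely illustrative.

```python
import torch

def truncated_noise(batch, dim, threshold=0.5):
    """Draw z ~ N(0, I) and resample every component with |z| > threshold.
    Smaller thresholds trade variety for fidelity; threshold -> infinity
    recovers ordinary sampling."""
    z = torch.randn(batch, dim)
    mask = z.abs() > threshold
    while mask.any():
        z[mask] = torch.randn(int(mask.sum()))
        mask = z.abs() > threshold
    return z

# Illustrative usage with a class-conditional generator G:
#   x = G(truncated_noise(64, 128), class_labels)
```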
Safer Classification by Synthesis
The discriminative approach to classification using deep neural networks has
become the de-facto standard in various fields. Complementing recent
reservations about safety against adversarial examples, we show that
conventional discriminative methods can easily be fooled into providing
incorrect labels with very high confidence for out-of-distribution examples.
We posit that
a generative approach is the natural remedy for this problem, and propose a
method for classification using generative models. At training time, we learn a
generative model for each class, while at test time, given an example to
classify, we query each generator for its most similar generation, and select
the class corresponding to the most similar one. Our approach is general and
can be used with expressive models such as GANs and VAEs. At test time, our
method accurately "knows when it does not know," providing resilience to
out-of-distribution examples while maintaining competitive performance on
standard examples.
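
The test-time procedure can be sketched as follows. The `class_generators`
dict and the random-search query loop are illustrative assumptions; in
practice one would optimize a latent code per generator (e.g. by gradient
descent for VAEs or GANs) rather than sampling at random.

```python
import numpy as np

def classify_by_synthesis(x, class_generators, n_queries=256):
    """Pick the class whose generative model can synthesize the sample
    most similar to x; class_generators[c]() draws one sample from the
    model trained on class c."""
    best_class, best_dist = None, np.inf
    for c, gen in class_generators.items():
        # Distance from x to this class's closest generation found so far.
        d = min(np.linalg.norm(x - gen()) for _ in range(n_queries))
        if d < best_dist:
            best_class, best_dist = c, d
    # A large best_dist flags an out-of-distribution input: the method
    # "knows when it does not know" by thresholding this distance.
    return best_class, best_dist
```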
Learning Pose Specific Representations by Predicting Different Views
The labeled data required to learn pose estimation for articulated objects is
difficult to provide in the desired quantity, realism, density, and accuracy.
To address this issue, we develop a method to learn representations that are
highly specific to articulated poses, without the need for labeled training
data. We exploit the observation that the pose of a known object is
predictive of its appearance from any known viewpoint. That is, given only
the pose and shape parameters of a hand, the hand's appearance from any
viewpoint can be approximated. To exploit this observation, we train a model
that, given input from one view, estimates a latent representation that is
trained to be predictive of the object's appearance when captured from
another viewpoint. Thus, the only necessary supervision is the second view. The
training process of this model reveals an implicit pose representation in the
latent space. Importantly, at test time the pose representation can be inferred
using only a single view. In qualitative and quantitative experiments we show
that the learned representations capture detailed pose information. Moreover,
when training the proposed method jointly with labeled and unlabeled data, it
consistently surpasses the performance of its fully supervised counterpart,
while reducing the amount of needed labeled samples by at least one order of
magnitude.
Comment: CVPR 2018 (Spotlight); project page at https://poier.github.io/PreView
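
The supervision signal is easy to state in code: an encoder maps the first
view to a low-dimensional latent, a decoder must reconstruct the second view
from it, and the latent becomes the pose representation. The fully connected
architecture and the 64x64 single-channel views below are illustrative
assumptions; the actual model is convolutional and conditions on camera
information.

```python
import torch
import torch.nn as nn

class PreViewSketch(nn.Module):
    def __init__(self, latent_dim=30):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 256),
                                 nn.ReLU(), nn.Linear(256, latent_dim))
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 64 * 64))

    def forward(self, view1):
        z = self.enc(view1)            # implicit pose representation
        return self.dec(z).view(-1, 64, 64), z

def view_prediction_loss(model, view1, view2):
    """The only supervision: predict the *other* view of the same hand."""
    pred, _ = model(view1)
    return ((pred - view2) ** 2).mean()

# At test time a single view suffices: _, pose_code = model(view1)
```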
Pros and Cons of GAN Evaluation Measures
Generative models, in particular generative adversarial networks (GANs), have
received significant attention recently. A number of GAN variants have been
proposed and have been utilized in many applications. Despite large strides in
terms of theoretical progress, evaluating and comparing GANs remains a daunting
task. While several measures have been introduced, as of yet, there is no
consensus as to which measure best captures strengths and limitations of models
and should be used for fair model comparison. As in other areas of computer
vision and machine learning, it is critical to settle on one or a few good
measures to steer the progress in this field. In this paper, I review and
critically discuss more than 24 quantitative and 5 qualitative measures for
evaluating generative models with a particular emphasis on GAN-derived models.
I also provide a set of 7 desiderata followed by an evaluation of whether a
given measure or a family of measures is compatible with them.
PixelNN: Example-based Image Synthesis
We present a simple nearest-neighbor (NN) approach that synthesizes
high-frequency photorealistic images from an "incomplete" signal such as a
low-resolution image, a surface normal map, or edges. Current state-of-the-art
deep generative models designed for such conditional image synthesis lack two
important properties: (1) they are unable to generate a large set of diverse
outputs, due to the mode collapse problem, and (2) they are not interpretable,
making it difficult to control the synthesized output. We demonstrate that NN
approaches potentially address such limitations, but suffer in accuracy on
small datasets. We design a simple pipeline that combines the best of both
worlds: the first stage uses a convolutional neural network (CNN) to map the
input to an (overly-smoothed) image, and the second stage uses a pixel-wise
nearest neighbor method to map the smoothed output to multiple high-quality,
high-frequency outputs in a controllable manner. We demonstrate our approach
for various input modalities, and for various domains ranging from human faces
to cats-and-dogs to shoes and handbags.
Comment: Project page: http://www.cs.cmu.edu/~aayushb/pixelNN
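
A naive version of the second, pixel-wise nearest-neighbor stage might look
like the sketch below. The `feat_fn` descriptor extractor (in the paper,
hypercolumn-style features) and the precomputed banks `train_feats` and
`train_pixels` are assumptions; choosing the k-th neighbor instead of the
first is what yields multiple diverse outputs.

```python
import numpy as np

def pixelwise_nn(smoothed, train_feats, train_pixels, feat_fn, k=0):
    """For every pixel of the CNN's overly-smoothed output, match its local
    descriptor against a bank of descriptors from training images and copy
    the corresponding ground-truth pixel (its high-frequency detail)."""
    h, w = smoothed.shape[:2]
    out = np.zeros_like(smoothed)
    feats = feat_fn(smoothed)                    # h x w x d descriptors
    for i in range(h):
        for j in range(w):
            d = ((train_feats - feats[i, j]) ** 2).sum(-1)   # n_bank dists
            idx = np.argsort(d)[k]               # k-th neighbor -> diversity
            out[i, j] = train_pixels[idx]
    return out
```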
Improved Precision and Recall Metric for Assessing Generative Models
The ability to automatically estimate the quality and coverage of the samples
produced by a generative model is a vital requirement for driving algorithm
research. We present an evaluation metric that can separately and reliably
measure both of these aspects in image generation tasks by forming explicit,
non-parametric representations of the manifolds of real and generated data. We
demonstrate the effectiveness of our metric in StyleGAN and BigGAN by providing
several illustrative examples where existing metrics yield uninformative or
contradictory results. Furthermore, we analyze multiple design variants of
StyleGAN to better understand the relationships between the model architecture,
training methods, and the properties of the resulting sample distribution. In
the process, we identify new variants that improve the state-of-the-art. We
also perform the first principled analysis of truncation methods and identify
an improved method. Finally, we extend our metric to estimate the perceptual
quality of individual samples, and use this to study latent space
interpolations.
Comment: NeurIPS 2019, final version.
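
The manifold estimate behind the metric admits a compact sketch: each set's
support is approximated by spheres around its points, with radius equal to
the distance to the k-th nearest neighbor within the same set. The
brute-force distances and raw-feature inputs below are simplifications; the
paper computes the metric on deep feature embeddings.

```python
import numpy as np

def knn_precision_recall(real, fake, k=3):
    """Precision = fraction of fake points lying inside the estimated real
    manifold; recall = fraction of real points inside the fake manifold."""
    def radii(pts):
        d = np.linalg.norm(pts[:, None] - pts[None], axis=-1)
        return np.sort(d, axis=1)[:, k]          # column 0 is self-distance

    def covered(queries, ref, ref_radii):
        d = np.linalg.norm(queries[:, None] - ref[None], axis=-1)
        return (d <= ref_radii[None]).any(axis=1).mean()

    precision = covered(fake, real, radii(real))
    recall = covered(real, fake, radii(fake))
    return precision, recall
```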