Metric Learning-based Generative Adversarial Network
Generative Adversarial Networks (GANs), as a framework for estimating
generative models via an adversarial process, have attracted huge attention and
have proven to be powerful in a variety of tasks. However, training GANs is
well known to be delicate and unstable, partly because of the sigmoid
cross-entropy loss function used for the discriminator. To overcome this problem,
many researchers have turned their attention to various ways of measuring how close
the model distribution and the real distribution are, and have applied different
metrics as their objective functions. In this paper, we propose a novel
framework to train GANs based on distance metric learning, which we call Metric
Learning-based Generative Adversarial Network (MLGAN). The discriminator of
MLGANs can dynamically learn an appropriate metric, rather than a static one,
to measure the distance between generated samples and real samples. Afterwards,
MLGANs update the generator under the newly learned metric. We evaluate our
approach on several representative datasets, and the experimental results
demonstrate that MLGANs can achieve superior performance compared with several
existing state-of-the-art approaches. We also empirically show that MLGANs
can increase the stability of GAN training.
Generative Adversarial Mapping Networks
Generative Adversarial Networks (GANs) have shown impressive performance in
generating photo-realistic images. They fit generative models by minimizing
a distance measure between the real image distribution and the generated
data distribution. Several distance measures have been used, such as the
Jensen-Shannon divergence, f-divergences, and the Wasserstein distance, and
choosing an appropriate distance measure is very important for training the
generative network. In this paper, we choose to use the maximum mean
discrepancy (MMD) as the distance metric, which has several nice theoretical
guarantees. In fact, the generative moment matching network (GMMN) (Li, Swersky,
and Zemel 2015) is such a generative model: it contains only one generator
network, trained by directly minimizing the MMD between the real and generated
distributions. However, it fails to generate meaningful samples on challenging
benchmark datasets, such as CIFAR-10 and LSUN. To improve on GMMN, we propose
to add an extra network, called the mapper, which maps both the real data
distribution and the generated data distribution from the original data space to a
feature representation space and is trained to maximize the MMD
between the two mapped distributions in that space, while the generator
tries to minimize the MMD. We call the new model generative adversarial mapping
networks (GAMNs). We demonstrate that the adversarial mapper can help
to better capture the underlying data distribution. We also show that GAMN
significantly outperforms GMMN, and is also superior to or comparable with
other state-of-the-art GAN based methods on MNIST, CIFAR-10 and LSUN-Bedrooms
datasets. Comment: 9 pages, 7 figures
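The adversarial MMD objective described above can be illustrated with a short sketch. This is a minimal NumPy example, assuming a Gaussian kernel with a fixed bandwidth and the biased (V-statistic) estimate of squared MMD; `linear_mapper` with random weights is a hypothetical stand-in for the mapper network, and no training loop is shown. In the min-max game, the mapper's parameters would be updated to increase this quantity and the generator's parameters to decrease it.

```python
import numpy as np

def gaussian_kernel(a: np.ndarray, b: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Pairwise Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clamp tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def mmd2(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    """Biased estimate of the squared MMD between samples x and y."""
    k_xx = gaussian_kernel(x, x, sigma).mean()
    k_yy = gaussian_kernel(y, y, sigma).mean()
    k_xy = gaussian_kernel(x, y, sigma).mean()
    return float(k_xx + k_yy - 2.0 * k_xy)

def linear_mapper(data: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the mapper network: a linear map followed by tanh."""
    return np.tanh(data @ weights)

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(200, 5))
fake = rng.normal(0.5, 1.0, size=(200, 5))
weights = rng.normal(size=(5, 3))

# MMD is measured between the *mapped* real and generated samples.
print(mmd2(linear_mapper(real, weights), linear_mapper(fake, weights)))
```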
Non-Adversarial Image Synthesis with Generative Latent Nearest Neighbors
Unconditional image generation has recently been dominated by generative
adversarial networks (GANs). GAN methods train a generator which regresses
images from random noise vectors, as well as a discriminator that attempts to
differentiate between the generated images and a training set of real images.
GANs have shown amazing results at generating realistic looking images. Despite
their success, GANs suffer from critical drawbacks, including unstable training
and mode dropping. The weaknesses of GANs have motivated research into
alternatives, including variational auto-encoders (VAEs), latent embedding
learning methods (e.g. GLO) and nearest-neighbor based implicit maximum
likelihood estimation (IMLE). Unfortunately, at the moment, GANs still
significantly outperform the alternative methods for image generation. In this
work, we present a novel method, Generative Latent Nearest Neighbors (GLANN),
for training generative models without adversarial training. GLANN combines the
strengths of IMLE and GLO in a way that overcomes the main drawbacks of each
method. Consequently, GLANN generates images that are far better than GLO and
IMLE. Our method does not suffer from mode collapse, which plagues GAN training,
and is much more stable. Qualitative results show that GLANN outperforms a
baseline consisting of 800 GANs and VAEs on commonly used datasets. Our models
are also shown to be effective for training truly non-adversarial unsupervised
image translation.
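The IMLE component that GLANN builds on can be sketched as a nearest-neighbor matching objective. In this minimal NumPy sketch, the GLO latent codes are assumed to be already learned (random stand-ins here) and a hypothetical random linear map plays the role of the noise-to-latent generator. Each target code is matched to its nearest generated code, and the loss is the mean squared distance of those matches. Because every target must lie close to some generated sample, the objective penalizes dropped modes; a full implementation would update the generator by gradient descent on this loss with the matches held fixed, which is not shown.

```python
import numpy as np

def nearest_indices(targets: np.ndarray, candidates: np.ndarray) -> np.ndarray:
    """Index of the nearest candidate (squared Euclidean distance) for every target row."""
    sq_dists = np.sum((targets[:, None, :] - candidates[None, :, :]) ** 2, axis=-1)
    return np.argmin(sq_dists, axis=1)

def imle_loss(targets: np.ndarray, candidates: np.ndarray) -> float:
    """IMLE-style objective: mean squared distance from each target to its nearest candidate.

    Every target is matched to some generated candidate, so no target is ignored.
    """
    idx = nearest_indices(targets, candidates)
    return float(np.mean(np.sum((targets - candidates[idx]) ** 2, axis=1)))

# Hypothetical usage: random stand-ins for GLO codes and a random linear
# "generator" that maps 4-D noise into the 8-D latent space.
rng = np.random.default_rng(0)
glo_codes = rng.normal(size=(16, 8))
noise = rng.normal(size=(64, 4))
mapping = rng.normal(size=(4, 8))
print(imle_loss(glo_codes, noise @ mapping))
```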
An empirical study on evaluation metrics of generative adversarial networks
Evaluating generative adversarial networks (GANs) is inherently challenging.
In this paper, we revisit several representative sample-based evaluation
metrics for GANs, and address the problem of how to evaluate the evaluation
metrics. We start with a few necessary conditions for metrics to produce
meaningful scores, such as distinguishing real from generated samples,
identifying mode dropping and mode collapsing, and detecting overfitting. With
a series of carefully designed experiments, we comprehensively investigate
existing sample-based metrics and identify their strengths and limitations in
practical settings. Based on these results, we observe that kernel Maximum Mean
Discrepancy (MMD) and the 1-Nearest-Neighbor (1-NN) two-sample test seem to
satisfy most of the desirable properties, provided that the distances between
samples are computed in a suitable feature space. Our experiments also unveil
interesting properties about the behavior of several popular GAN models, such
as whether they are memorizing training samples, and how far they are from
learning the target distribution. Comment: arXiv admin note: text overlap with arXiv:1802.03446 by other authors
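The 1-NN two-sample test mentioned above can be sketched as a leave-one-out nearest-neighbor classifier over the pooled real and generated samples. This NumPy sketch computes distances directly on the input arrays; since the study stresses that distances should be computed in a suitable feature space, in practice the arrays would hold features from some pretrained network (an assumption here, not shown). An accuracy near 0.5 means nearest neighbors do not separate the two sets, while an accuracy near 1.0 means they are easy to tell apart.

```python
import numpy as np

def one_nn_accuracy(real: np.ndarray, fake: np.ndarray) -> float:
    """Leave-one-out 1-NN two-sample test accuracy.

    Real samples are labeled 1 and generated samples 0. Each sample is classified
    by the label of its nearest *other* sample, and the fraction of correct
    classifications is returned.
    """
    samples = np.concatenate([real, fake], axis=0)
    labels = np.concatenate([np.ones(len(real)), np.zeros(len(fake))])
    sq_dists = np.sum((samples[:, None, :] - samples[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq_dists, np.inf)  # a sample may not be its own neighbor
    nearest = np.argmin(sq_dists, axis=1)
    return float(np.mean(labels[nearest] == labels))

rng = np.random.default_rng(0)
print(one_nn_accuracy(rng.normal(size=(100, 2)), rng.normal(loc=2.0, size=(100, 2))))
```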
Generative Adversarial Networks (GANs): What it can generate and What it cannot?
In recent years, Generative Adversarial Networks (GANs) have received
significant attention from the research community. With a straightforward
implementation and outstanding results, GANs have been used for numerous
applications. Despite the success, GANs lack a proper theoretical explanation.
These models suffer from issues like mode collapse, non-convergence, and
instability during training. To address these issues, researchers have proposed
theoretically rigorous frameworks inspired by fields such as game theory,
statistical theory, and dynamical systems.
In this paper, we provide a structure for studying these
contributions systematically. We categorize the papers based on the
issues they raise and the kind of novelty they introduce to address them.
In addition, we provide insight into how each of the discussed articles solves the
problems it targets. We compare and contrast different results and present a
summary of theoretical contributions to GANs, with a focus on image/visual
applications. We expect this survey to give a bird's-eye view to
anyone wishing to understand the theoretical progress in GANs so far.
ShapeAdv: Generating Shape-Aware Adversarial 3D Point Clouds
We introduce ShapeAdv, a novel framework to study shape-aware adversarial
perturbations that reflect the underlying shape variations (e.g., geometric
deformations and structural differences) in the 3D point cloud space. We
develop shape-aware adversarial 3D point cloud attacks by leveraging the
learned latent space of a point cloud auto-encoder where the adversarial noise
is applied in the latent space. Specifically, we propose three variants,
including an exemplar-based one that guides the shape deformation with
auxiliary data so that the generated point cloud resembles the shape
morphing between objects in the same category. Unlike prior works, the
resulting adversarial 3D point clouds reflect the shape variations in the 3D
point cloud space while still being close to the original one. In addition,
experimental evaluations on the ModelNet40 benchmark demonstrate that our
adversaries are more difficult to defend against with existing point cloud defense
methods and exhibit higher attack transferability across classifiers. Our
shape-aware adversarial attacks are orthogonal to existing point cloud based
attacks and shed light on the vulnerability of 3D deep neural networks. Comment: 3D Point Clouds, Adversarial Learning
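Applying adversarial noise in a latent space, rather than directly to point coordinates, can be illustrated with a toy example. This NumPy sketch assumes a linear decoder and a linear softmax classifier so the gradient can be written by hand; it runs projected gradient ascent on the classifier's cross-entropy with respect to the latent code and keeps the perturbation inside an L2 ball. It is a simplified illustration of the general idea, not ShapeAdv's actual objective or architecture, and all names are hypothetical.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax of a 1-D logit vector."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()

def latent_attack(z: np.ndarray, decoder_w: np.ndarray, classifier_w: np.ndarray,
                  label: int, step: float = 0.1, radius: float = 1.0,
                  iters: int = 20) -> np.ndarray:
    """Projected gradient ascent on cross-entropy w.r.t. a latent code (toy linear models).

    decoder_w maps a latent code to a flattened point cloud; classifier_w maps the
    point cloud to class logits. Returns z + delta with ||delta||_2 <= radius.
    """
    delta = np.zeros_like(z)
    onehot = np.eye(classifier_w.shape[0])[label]
    for _ in range(iters):
        points = decoder_w @ (z + delta)          # decoded (flattened) point cloud
        probs = softmax(classifier_w @ points)    # classifier prediction
        # Gradient of cross-entropy w.r.t. the latent code for these linear models.
        grad = decoder_w.T @ (classifier_w.T @ (probs - onehot))
        delta += step * grad / (np.linalg.norm(grad) + 1e-12)
        norm = np.linalg.norm(delta)
        if norm > radius:
            delta *= radius / norm                # project back onto the L2 ball
    return z + delta
```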
FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification
Person re-identification (reID) is an important task that requires retrieving
a person's images from an image dataset, given one image of the person
of interest. For learning robust person features, the pose variation of person
images is one of the key challenges. Existing works targeting the problem
either perform human alignment or learn human-region-based representations.
Extra pose information and computational cost are generally required for
inference. To solve this issue, a Feature Distilling Generative Adversarial
Network (FD-GAN) is proposed for learning identity-related and pose-unrelated
representations. It is a novel framework based on a Siamese structure with
multiple novel discriminators on human poses and identities. In addition to the
discriminators, a novel same-pose loss is also integrated, which requires
the generated images of the same person to be similar in appearance. After learning
pose-unrelated person features with pose guidance, no auxiliary pose
information or additional computational cost is required during testing. Our
proposed FD-GAN achieves state-of-the-art performance on three person reID
datasets, which demonstrates the effectiveness and robust feature
distilling capability of the proposed FD-GAN. Comment: Accepted in Proceedings of 32nd Conference on Neural Information
Processing Systems (NeurIPS 2018). Code available:
https://github.com/yxgeee/FD-GA
Label-Removed Generative Adversarial Networks Incorporating with K-Means
Generative Adversarial Networks (GANs) have achieved great success in
generating realistic images. Most of these are conditional models, even though
acquiring class labels is expensive and time-consuming in practice. To
reduce the dependence on labeled data, we propose an unconditional generative
adversarial model, called K-Means-GAN (KM-GAN), which incorporates the idea of
updating centers in K-Means into GANs. Specifically, we redesign the framework
of GANs by applying K-Means on the features extracted from the discriminator.
With obtained labels from K-Means, we propose new objective functions from the
perspective of deep metric learning (DML). Unlike previous works, KM-GAN
treats the discriminator as a feature extractor rather than a classifier,
while the use of K-Means makes the discriminator's features
more representative. Experiments are conducted on various datasets, such as
MNIST, Fashion-10, CIFAR-10 and CelebA, and show that the quality of samples
generated by KM-GAN is comparable to that of some conditional generative adversarial
models.
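The K-Means step applied to discriminator features can be sketched with Lloyd's algorithm. This minimal NumPy version assumes the features arrive as an (n, d) array (random stand-ins in the usage line); the cluster indices it returns play the role of the pseudo-labels that KM-GAN feeds into its metric-learning objectives, which are not reproduced here. The function name and defaults are illustrative.

```python
import numpy as np

def kmeans(features: np.ndarray, k: int, iters: int = 50, seed: int = 0):
    """Lloyd's K-Means. Returns (labels, centers) for an (n, d) feature array."""
    rng = np.random.default_rng(seed)
    # Initialize centers with k distinct feature vectors chosen at random.
    centers = features[rng.choice(len(features), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assignment step: each feature goes to its nearest center.
        dists = np.sum((features[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
        labels = np.argmin(dists, axis=1)
        # Update step: move each non-empty cluster's center to its members' mean.
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    # Final assignment so the labels match the returned centers.
    dists = np.sum((features[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    return np.argmin(dists, axis=1), centers

# Hypothetical usage: 64 random 16-D vectors stand in for discriminator features.
pseudo_labels, _ = kmeans(np.random.default_rng(0).normal(size=(64, 16)), k=4)
```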
Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model
We present a novel training framework for neural sequence models,
particularly for grounded dialog generation. The standard training paradigm for
these models is maximum likelihood estimation (MLE), or minimizing the
cross-entropy of the human responses. Across a variety of domains, a recurring
problem with MLE trained generative neural dialog models (G) is that they tend
to produce 'safe' and generic responses ("I don't know", "I can't tell"). In
contrast, discriminative dialog models (D) that are trained to rank a list of
candidate human responses outperform their generative counterparts in terms of
automatic metrics, diversity, and informativeness of the responses. However, D
is not useful in practice since it cannot be deployed to have real
conversations with users.
Our work aims to achieve the best of both worlds (the practical usefulness
of G and the strong performance of D) via knowledge transfer from D to G. Our
primary contribution is an end-to-end trainable generative visual dialog model,
where G receives gradients from D as a perceptual (not adversarial) loss of the
sequence sampled from G. We leverage the recently proposed Gumbel-Softmax (GS)
approximation to the discrete distribution, specifically an RNN augmented
with a sequence of GS samplers, coupled with the straight-through gradient
estimator to enable end-to-end differentiability. We also introduce a stronger
encoder for visual dialog, and employ a self-attention mechanism for answer
encoding along with a metric learning loss to aid D in better capturing
semantic similarities in answer responses. Overall, our proposed model
outperforms the state of the art on the VisDial dataset by a significant margin
(2.67% on recall@10). The source code can be downloaded from
https://github.com/jiasenlu/visDial.pytorch. Comment: 11 pages, 3 figures
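A single GS sampling step can be sketched in NumPy as follows. NumPy has no automatic differentiation, so the sketch shows only the forward pass: the soft relaxed sample and the hard one-hot sample. In an autodiff framework, the straight-through estimator is commonly written as `hard - stop_gradient(soft) + soft`, which equals the hard sample in the forward pass while gradients flow through the soft sample. The function name and parameters are illustrative and not taken from the authors' code.

```python
import numpy as np

def gumbel_softmax_sample(logits: np.ndarray, temperature: float,
                          rng: np.random.Generator):
    """Return (soft, hard): a relaxed Gumbel-Softmax sample and its one-hot version.

    `logits` is a 1-D array of unnormalized log-probabilities over the vocabulary.
    """
    # Gumbel(0, 1) noise via inverse transform; the lower bound avoids log(0).
    uniform = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(uniform))
    # Temperature-scaled softmax of the perturbed logits (shifted for stability).
    scores = (logits + gumbel) / temperature
    scores = scores - scores.max()
    soft = np.exp(scores) / np.exp(scores).sum()
    # Forward value used by the straight-through estimator: the one-hot argmax.
    hard = np.zeros_like(soft)
    hard[np.argmax(soft)] = 1.0
    return soft, hard

rng = np.random.default_rng(0)
soft, hard = gumbel_softmax_sample(np.log(np.array([0.7, 0.2, 0.1])), 0.5, rng)
```

Lower temperatures push `soft` toward `hard`; the argmax of the perturbed logits is itself an exact sample from the categorical distribution defined by `logits` (the Gumbel-max trick).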