How Generative Adversarial Networks and Their Variants Work: An Overview
Generative Adversarial Networks (GANs) have received wide attention in the machine learning field for their potential to learn high-dimensional, complex real-world data distributions. Specifically, they do not rely on any assumptions about the distribution and can generate realistic samples from a latent space in a simple manner. This powerful property has led GANs to be applied in various applications such as image synthesis, image attribute editing, image translation, domain adaptation, and other academic fields. In this paper, we aim to discuss the details of GANs for readers who are familiar with them but do not understand them deeply, or who wish to view GANs from various perspectives. In addition, we explain how GANs operate and the fundamental meaning of the various objective functions that have been suggested recently. We then focus on how GANs can be combined with an autoencoder framework. Finally, we enumerate the GAN variants that are applied to various tasks and other fields for those who are interested in exploiting GANs in their research.
Comment: 41 pages, 16 figures. Published in ACM Computing Surveys (CSUR)
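The adversarial game the survey analyzes can be sketched numerically. The fragment below is an illustrative example of the standard non-saturating GAN losses, not code from the paper:

```python
import math

def discriminator_loss(d_real, d_fake):
    # Binary cross-entropy: push D(x) toward 1 on real data
    # and D(G(z)) toward 0 on generated samples.
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    # Non-saturating generator objective: maximize log D(G(z))
    # instead of minimizing log(1 - D(G(z))), which saturates early.
    return -math.log(d_fake)

# At the theoretical optimum the discriminator outputs 0.5 everywhere,
# and its loss equals 2 * log 2.
print(discriminator_loss(0.5, 0.5))  # ≈ 1.386
```

The generator loss falls as the discriminator is fooled more convincingly, which is what drives the two-player training dynamic the survey describes.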
Towards universal neural nets: Gibbs machines and ACE
We study, from a physics viewpoint, a class of generative neural nets, Gibbs machines, designed for gradual learning. While including variational auto-encoders, they offer a broader universal platform for incrementally adding newly learned features, including physical symmetries. Their direct connection to statistical physics and information geometry is established. A variational Pythagorean theorem justifies invoking the exponential/Gibbs class of probabilities for creating brand-new objects. Combining these nets with classifiers gives rise to a brand of universal generative neural nets: stochastic auto-classifier-encoders (ACE). ACE have state-of-the-art performance in their class, both for classification and for density estimation on the MNIST data set.
Comment: v5: added thermodynamic identities and variational error estimation; expanded references
Unified Probabilistic Deep Continual Learning through Generative Replay and Open Set Recognition
We introduce a probabilistic approach to unify open set recognition with the prevention of catastrophic forgetting in deep continual learning, based on variational Bayesian inference. Our single model combines a joint probabilistic encoder with a generative model and a linear classifier that are shared across sequentially arriving tasks. In order to successfully distinguish unseen unknown data from trained known tasks, we propose to bound the class-specific approximate posterior by fitting regions of high density on the basis of correctly classified data points. These bounds are further used to significantly alleviate catastrophic forgetting by avoiding samples from low-density areas in generative replay. Our approach requires neither storing old data nor upfront knowledge of future data, and is empirically validated on visual and audio tasks in class-incremental as well as cross-dataset scenarios across modalities.
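The idea of bounding a posterior by regions of high density can be illustrated with a simple distance-percentile rule. This is a hypothetical sketch of the rejection logic, not the paper's actual inference procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in latent codes of correctly classified "known" training points.
known_latents = rng.normal(0.0, 1.0, size=(1000, 8))

# Bound the high-density region by the 95th-percentile distance from the
# latent mean, computed on correctly classified points only.
center = known_latents.mean(axis=0)
dists = np.linalg.norm(known_latents - center, axis=1)
bound = np.percentile(dists, 95)

def is_known(z):
    # Treat anything outside the bound as unseen/unknown.
    return bool(np.linalg.norm(z - center) <= bound)

print(is_known(center))            # True: the densest point is inside
print(is_known(np.full(8, 10.0)))  # False: a far outlier is rejected
```

The same bound can also gate generative replay: samples drawn from low-density areas (outside the bound) are skipped rather than replayed.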
Preventing Posterior Collapse with Levenshtein Variational Autoencoder
Variational autoencoders (VAEs) are a standard framework for inducing latent
variable models that have been shown effective in learning text representations
as well as in text generation. The key challenge with using VAEs is the {\it
posterior collapse} problem: learning tends to converge to trivial solutions
where the generators ignore latent variables. In our Levenshtein VAE, we propose
to replace the evidence lower bound (ELBO) with a new objective which is simple
to optimize and prevents posterior collapse. Intuitively, it corresponds to
generating a sequence from the autoencoder and encouraging the model to predict
an optimal continuation according to the Levenshtein distance (LD) with the
reference sentence at each time step in the generated sequence. We motivate the
method from the probabilistic perspective by showing that it is closely related
to optimizing a bound on the intractable Kullback-Leibler divergence of an
LD-based kernel density estimator from the model distribution. With this
objective, any generator disregarding latent variables will incur large
penalties and hence posterior collapse does not happen. We relate our approach
to policy distillation \cite{RossGB11} and dynamic oracles \cite{GoldbergN12}.
By considering Yelp and SNLI benchmarks, we show that Levenshtein VAE produces
more informative latent representations than alternative approaches to
preventing posterior collapse.
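The Levenshtein distance that drives the objective is the classic edit-distance dynamic program. A minimal reference implementation (ours, not the paper's code):

```python
def levenshtein(a, b):
    # Row-by-row dynamic program: prev[j] holds the edit distance
    # between a[:i-1] and b[:j]; curr[j] between a[:i] and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```

At each generation step, the method rewards continuations that keep this distance to the reference sentence small, so ignoring the latent code becomes expensive.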
Efficient Large-Scale Domain Classification with Personalized Attention
In this paper, we explore the task of mapping spoken language utterances to
one of thousands of natural language understanding domains in intelligent
personal digital assistants (IPDAs). This scenario is observed for many
mainstream IPDAs in industry that allow third parties to develop thousands of
new domains to augment built-in ones to rapidly increase domain coverage and
overall IPDA capabilities. We propose a scalable neural model architecture with a shared encoder, a novel attention mechanism that incorporates personalization information, and domain-specific classifiers, which together solve the problem efficiently. Our architecture is designed to efficiently accommodate new domains that appear in between full model retraining cycles, with a rapid bootstrapping mechanism two orders of magnitude faster than retraining. We
account for practical constraints in real-time production systems, and design
to minimize memory footprint and runtime latency. We demonstrate that
incorporating personalization results in significantly more accurate domain
classification in the setting with thousands of overlapping domains.
Comment: Accepted to ACL 201
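One way to picture attention that incorporates personalization is an additive per-user bias over domain scores, boosting domains the user has enabled. The sketch below is hypothetical and not the paper's architecture:

```python
import numpy as np

def personalized_attention(query, domain_embs, user_bias):
    # Score each domain embedding against the utterance encoding,
    # add a per-user bias (e.g. a boost for domains the user enabled),
    # then softmax the scores into attention weights.
    scores = domain_embs @ query + user_bias
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Return the weights and a personalized summary vector.
    return weights, weights @ domain_embs

rng = np.random.default_rng(0)
query = rng.normal(size=4)
domain_embs = rng.normal(size=(3, 4))
no_bias = np.zeros(3)
boost_first = np.array([5.0, 0.0, 0.0])  # user has enabled domain 0

w0, _ = personalized_attention(query, domain_embs, no_bias)
w1, _ = personalized_attention(query, domain_embs, boost_first)
print(w1[0] > w0[0])  # True: the enabled domain gets more attention
```

Because the bias enters before the softmax, it reweights rather than overrides the content-based scores, which matches the intuition of personalization as a soft prior over domains.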
ConveRT: Efficient and Accurate Conversational Representations from Transformers
General-purpose pretrained sentence encoders such as BERT are not ideal for
real-world conversational AI applications; they are computationally heavy,
slow, and expensive to train. We propose ConveRT (Conversational
Representations from Transformers), a pretraining framework for conversational
tasks satisfying all the following requirements: it is effective, affordable,
and quick to train. We pretrain using a retrieval-based response selection
task, effectively leveraging quantization and subword-level parameterization in
the dual encoder to build a lightweight memory- and energy-efficient model. We
show that ConveRT achieves state-of-the-art performance across widely
established response selection tasks. We also demonstrate that the use of
extended dialog history as context yields further performance gains. Finally,
we show that pretrained representations from the proposed encoder can be
transferred to the intent classification task, yielding strong results across
three diverse data sets. ConveRT trains substantially faster than standard
sentence encoders or previous state-of-the-art dual encoders. With its reduced
size and superior performance, we believe this model promises wider portability
and scalability for Conversational AI applications.
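At inference time, retrieval-based response selection with a dual encoder reduces to scoring candidates against the context encoding. A toy sketch with a stand-in encoder (a real dual encoder learns this mapping; here it is a deterministic pseudo-random embedding, so only an identical string scores a perfect match):

```python
import numpy as np

def encode(text, dim=16):
    # Stand-in encoder: a deterministic pseudo-random unit vector per
    # string. ConveRT learns this map with a quantized transformer.
    seed = sum(ord(c) * 31**i for i, c in enumerate(text)) % (2**32)
    vec = np.random.default_rng(seed).normal(size=dim)
    return vec / np.linalg.norm(vec)

def select_response(context, candidates):
    # Rank candidate responses by dot product with the context
    # encoding and return the best-scoring one.
    c = encode(context)
    scores = [float(c @ encode(r)) for r in candidates]
    return candidates[int(np.argmax(scores))]

print(select_response("how are you", ["the sky is blue", "how are you"]))
```

The appeal of the dual-encoder design is exactly this split: candidate encodings can be precomputed, so selection is a batch of dot products rather than a full forward pass per pair.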
Student's t-Generative Adversarial Networks
Generative Adversarial Networks (GANs) perform well in image generation, but they need large-scale data to train the entire framework and often produce nonsensical results. We propose a new method based on the conditional GAN, which equips the latent noise with a mixture of Student's t-distributions and an attention mechanism, in addition to class information. The Student's t-distribution has long tails that can provide more diversity to the latent noise. Meanwhile, the discriminator in our model performs two tasks simultaneously: judging whether the images come from the true data distribution, and identifying the class of each generated image. The parameters of the mixture model can be learned along with those of the GAN. Moreover, we mathematically prove that any multivariate Student's t-distribution can be obtained by a linear transformation of a standard multivariate Student's t-distribution. Experiments comparing the proposed method with a typical GAN, DeliGAN, and DCGAN indicate that our method performs well at generating diverse and legible objects with limited data.
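The linear-transformation property mirrors the textbook construction of multivariate Student's t samples from a Gaussian draw and a chi-square draw. A sketch of that standard sampler (illustrative, not the paper's code):

```python
import numpy as np

def sample_student_t(mu, scale_tril, df, n, rng):
    """Draw n samples from a multivariate Student's t-distribution.

    Standard construction: x = mu + (L z) / sqrt(u / df), with
    z ~ N(0, I) and u ~ chi-square(df); L is the lower-triangular
    scale factor (e.g. a Cholesky factor).
    """
    dim = len(mu)
    z = rng.normal(size=(n, dim))
    u = rng.chisquare(df, size=(n, 1))
    return mu + (z / np.sqrt(u / df)) @ scale_tril.T

rng = np.random.default_rng(0)
samples = sample_student_t(np.zeros(2), np.eye(2), df=3.0, n=5000, rng=rng)
print(samples.mean(axis=0))  # close to the location parameter (0, 0)
```

Because the whole family is an affine image of the standard case, learning the mixture parameters reduces to learning locations and scale factors, which is what lets them be trained jointly with the GAN.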
MINE: Mutual Information Neural Estimation
We argue that the estimation of mutual information between high dimensional
continuous random variables can be achieved by gradient descent over neural
networks. We present a Mutual Information Neural Estimator (MINE) that is
linearly scalable in dimensionality as well as in sample size, trainable
through back-prop, and strongly consistent. We present a handful of
applications on which MINE can be used to minimize or maximize mutual
information. We apply MINE to improve adversarially trained generative models.
We also use MINE to implement Information Bottleneck, applying it to supervised
classification; our results demonstrate substantial improvement in flexibility
and performance in these settings.
Comment: 19 pages, 6 figures
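MINE maximizes the Donsker-Varadhan lower bound on mutual information over a neural network statistic T. The bound itself is easy to state and check; below, a fixed, hand-picked statistic stands in for the learned network (illustrative only):

```python
import numpy as np

def dv_bound(t_joint, t_marginal):
    # Donsker-Varadhan lower bound on mutual information:
    # I(X; Z) >= E_joint[T] - log E_marginal[exp(T)].
    return t_joint.mean() - np.log(np.exp(t_marginal).mean())

rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
z = x + 0.5 * rng.normal(size=n)   # correlated pair (joint samples)
z_shuf = rng.permutation(z)        # shuffling breaks the dependence (marginal samples)

# Hand-picked stand-in for MINE's learned critic network; the 0.3 scale
# is an arbitrary choice that keeps the exponential term well-behaved.
critic = lambda a, b: 0.3 * a * b

estimate = dv_bound(critic(x, z), critic(x, z_shuf))
print(estimate)  # positive: the bound detects the dependence
```

MINE replaces the fixed critic with a neural network trained by gradient ascent on this bound, which is what makes the estimator scalable in dimension and sample size.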
An Introduction to Image Synthesis with Generative Adversarial Nets
There has been rapid growth of research in Generative Adversarial Nets (GANs) in the past few years. Proposed in 2014, GANs have been applied to various applications such as computer vision and natural language processing, and achieve impressive performance. Among the many applications of GANs, image
synthesis is the most well-studied one, and research in this area has already
demonstrated the great potential of using GAN in image synthesis. In this
paper, we provide a taxonomy of methods used in image synthesis, review
different models for text-to-image synthesis and image-to-image translation,
and discuss some evaluation metrics as well as possible future research
directions in image synthesis with GANs.
Generative Creativity: Adversarial Learning for Bionic Design
Bionic design refers to an approach of generative creativity in which a
target object (e.g. a floor lamp) is designed to contain features of biological
source objects (e.g. flowers), resulting in creative biologically-inspired
design. In this work, we attempt to model the process of shape-oriented bionic
design as follows: given an input image of a design target object, the model
generates images that 1) maintain shape features of the input design target
image, 2) contain shape features of images from the specified biological source
domain, and 3) are plausible and diverse. We propose DesignGAN, a novel
unsupervised deep generative approach to realising bionic design. Specifically,
we employ a conditional Generative Adversarial Network architecture with several designated losses (an adversarial loss, a regression loss, a cycle loss, and a latent loss) that respectively constrain our model to meet the corresponding aforementioned requirements of bionic design modelling. We
perform qualitative and quantitative experiments to evaluate our method, and
demonstrate that our proposed approach successfully generates creative images
of bionic design.