124 research outputs found
On distinguishability criteria for estimating generative models
Two recently introduced criteria for estimation of generative models are both
based on a reduction to binary classification. Noise-contrastive estimation
(NCE) is an estimation procedure in which a generative model is trained to be
able to distinguish data samples from noise samples. Generative adversarial
networks (GANs) are pairs of generator and discriminator networks, with the
generator network learning to generate samples by attempting to fool the
discriminator network into believing its samples are real data. Both estimation
procedures use the same function to drive learning, which naturally raises
questions about how they are related to each other, as well as whether this
function is related to maximum likelihood estimation (MLE). NCE corresponds to
training an internal data model belonging to the discriminator network
but using a fixed generator network. We show that a variant of NCE, with a
dynamic generator network, is equivalent to maximum likelihood estimation.
Since pairing a learned discriminator with an appropriate dynamically selected
generator recovers MLE, one might expect the reverse to hold for pairing a
learned generator with a certain discriminator. However, we show that
recovering MLE for a learned generator requires departing from the
distinguishability game. Specifically:
(i) The expected gradient of the NCE discriminator can be made to match the
expected gradient of MLE, if one is allowed to use a non-stationary noise
distribution for NCE,
(ii) No choice of discriminator network can make the expected gradient for
the GAN generator match that of MLE, and
(iii) The existing theory does not guarantee that GANs will converge in the
non-convex case.
This suggests that the key next step in GAN research is to determine whether
GANs converge, and if not, to modify their training algorithm to force
convergence.
Comment: This version adds a figure that appeared on the poster at ICLR,
changes the template to say that the paper was accepted as a workshop
contribution (previously it was under review as a conference submission),
and fixes some typos
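A minimal sketch of the binary classification objective that NCE reduces to, assuming equal numbers of data and noise samples and a model with a tractable log-density. Under that assumption the classifier's probability that x is data is p_model(x) / (p_model(x) + p_noise(x)). The function names and the (log_p_model, log_p_noise) pair layout are illustrative assumptions, not taken from the paper.

```python
import math


def nce_posterior(log_p_model, log_p_noise):
    """Probability that a sample is data rather than noise.

    Equals p_model / (p_model + p_noise), written as a sigmoid of the
    log-density difference.
    """
    return 1.0 / (1.0 + math.exp(log_p_noise - log_p_model))


def nce_loss(data_pairs, noise_pairs):
    """Binary cross-entropy over data samples (label 1) and noise samples (label 0).

    Each pair is (log_p_model(x), log_p_noise(x)) evaluated at one sample x.
    """
    data_term = -sum(
        math.log(nce_posterior(m, n)) for m, n in data_pairs
    ) / len(data_pairs)
    noise_term = -sum(
        math.log(1.0 - nce_posterior(m, n)) for m, n in noise_pairs
    ) / len(noise_pairs)
    return data_term + noise_term
```

When the model density equals the noise density at every sample, the posterior is 0.5 everywhere and the loss is 2 log 2, the value for a classifier that cannot tell the two sources apart.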
Skill Rating for Generative Models
We explore a new way to evaluate generative models using insights from
evaluation of competitive games between human players. We show experimentally
that tournaments between generators and discriminators provide an effective way
to evaluate generative models. We introduce two methods for summarizing
tournament outcomes: tournament win rate and skill rating. Evaluations are
useful in different contexts, including monitoring the progress of a single
model as it learns during the training process, and comparing the capabilities
of two different fully trained models. We show that a tournament consisting of
a single model playing against past and future versions of itself produces a
useful measure of training progress. A tournament containing multiple separate
models (using different seeds, hyperparameters, and architectures) provides a
useful relative comparison between different trained GANs. Tournament-based
rating methods are conceptually distinct from numerous previous categories of
approaches to evaluation of generative models, and have complementary
advantages and disadvantages.
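A rough sketch of how a tournament win rate could be tabulated. It assumes each generator-discriminator match is summarized by the fraction of the generator's samples that the discriminator judged real; that summary, the dictionary layout, and the function name are assumptions for illustration, and the paper's exact scoring may differ.

```python
from collections import defaultdict


def tournament_win_rates(match_outcomes):
    """Average each generator's per-match win fraction over all discriminators it faced.

    match_outcomes maps (generator_id, discriminator_id) to the fraction of
    generated samples the discriminator classified as real (hypothetical layout).
    """
    per_generator = defaultdict(list)
    for (generator_id, _discriminator_id), fraction in match_outcomes.items():
        per_generator[generator_id].append(fraction)
    return {
        generator_id: sum(fractions) / len(fractions)
        for generator_id, fractions in per_generator.items()
    }


# Hypothetical outcomes for two generators against two discriminators.
outcomes = {
    ("g1", "d1"): 0.2,
    ("g1", "d2"): 0.4,
    ("g2", "d1"): 0.6,
    ("g2", "d2"): 0.8,
}
rates = tournament_win_rates(outcomes)
```

The same table could cover a single model playing against its own checkpoints by using checkpoint indices as the identifiers.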
Super-Resolution via Conditional Implicit Maximum Likelihood Estimation
Single-image super-resolution (SISR) is a canonical problem with diverse
applications. Leading methods like SRGAN produce images that contain various
artifacts, such as high-frequency noise, hallucinated colours and shape
distortions, which adversely affect the realism of the result. In this paper,
we propose an alternative approach based on an extension of the method of
Implicit Maximum Likelihood Estimation (IMLE). We demonstrate greater
effectiveness at noise reduction and preservation of the original colours and
shapes, yielding more realistic super-resolved images.
Comment: 12 pages, 7 figures
Softmax GAN
Softmax GAN is a novel variant of Generative Adversarial Network (GAN). The
key idea of Softmax GAN is to replace the classification loss in the original
GAN with a softmax cross-entropy loss in the sample space of one single batch.
In the adversarial learning of real training samples and generated
samples, the target of discriminator training is to distribute all the
probability mass to the real samples, each with equal probability (one over
the number of real samples), and distribute zero probability to generated
data. In the generator training phase, the target is to assign equal
probability to all data points in the batch, each with probability one over
the batch size. While the original GAN is closely related to
Noise Contrastive Estimation (NCE), we show that Softmax GAN is the Importance
Sampling version of GAN. We further demonstrate with experiments that this
simple change stabilizes GAN training.
Comment: NIPS 2017 submission
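The two targets described above can be sketched as cross-entropy losses against a softmax taken over one batch. This is an illustration under assumptions: scores are treated as plain logits (higher means "more real"), and the function names are hypothetical rather than the authors' code.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over the whole batch."""
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return [s - log_z for s in scores]


def cross_entropy(target, log_probs):
    """Cross-entropy between a target distribution and batch log-probabilities."""
    return -sum(t * lp for t, lp in zip(target, log_probs) if t > 0.0)


def softmax_gan_losses(real_scores, fake_scores):
    """Return (discriminator_loss, generator_loss) for one batch of scores."""
    n_real, n_fake = len(real_scores), len(fake_scores)
    log_probs = log_softmax(list(real_scores) + list(fake_scores))
    # Discriminator target: all mass spread evenly over the real samples.
    d_target = [1.0 / n_real] * n_real + [0.0] * n_fake
    # Generator target: uniform over every sample in the batch.
    g_target = [1.0 / (n_real + n_fake)] * (n_real + n_fake)
    return cross_entropy(d_target, log_probs), cross_entropy(g_target, log_probs)
```

With identical scores for every sample, both losses equal the log of the batch size; the discriminator loss falls as real samples are scored above generated ones.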
Pattern classification via unsupervised learners
We consider classification problems in a variant of the Probably Approximately Correct (PAC)-learning framework, in which an unsupervised learner creates a discriminant function over each class and observations are labeled by the learner returning the highest value associated with that observation. Consideration is given to whether this approach gains significant advantage over traditional discriminant techniques.
It is shown that PAC-learning distributions over class labels under L1 distance or KL-divergence implies PAC classification in this framework. We give bounds on the regret associated with the resulting classifier, taking into account the possibility of variable misclassification penalties. We demonstrate the advantage of estimating the a posteriori probability distributions over class labels in the setting of Optical Character Recognition.
We show that unsupervised learners can be used to learn a class of probabilistic concepts (stochastic rules denoting the probability that an observation has a positive label in a 2-class setting). This demonstrates a situation where unsupervised learners can be used even when it is hard to learn distributions over class labels - in this case the discriminant functions do not estimate the class probability densities.
We use a standard state-merging technique to PAC-learn a class of probabilistic automata and show that by learning the distribution over outputs under the weaker L1 distance rather than KL-divergence we are able to learn without knowledge of the expected length of an output. It is also shown that for a restricted class of these automata learning under L1 distance is equivalent to learning under KL-divergence.
Learning Determinantal Point Processes by Corrective Negative Sampling
Determinantal Point Processes (DPPs) have attracted significant interest from
the machine-learning community due to their ability to elegantly and tractably
model the delicate balance between quality and diversity of sets. DPPs are
commonly learned from data using maximum likelihood estimation (MLE). While
fitting observed sets well, MLE for DPPs may also assign high likelihoods to
unobserved sets that are far from the true generative distribution of the data.
To address this issue, which reduces the quality of the learned model, we
introduce a novel optimization problem, Contrastive Estimation (CE), which
encodes information about "negative" samples into the basic learning model. CE
is grounded in the successful use of negative information in machine-vision and
language modeling. Depending on the chosen negative distribution (which may be
static or evolve during optimization), CE assumes two different forms, which we
analyze theoretically and experimentally. We evaluate our new model on
real-world datasets; on a challenging dataset, CE learning delivers a
considerable improvement in predictive performance over a DPP learned without
using contrastive information.
Comment: Will appear in AISTATS 201
Improving Maximum Likelihood Training for Text Generation with Density Ratio Estimation
Auto-regressive sequence generative models trained by Maximum Likelihood
Estimation suffer from the exposure bias problem in practical finite sample
scenarios. The crux is that the number of training samples for Maximum
Likelihood Estimation is usually limited and the input data distributions are
different at training and inference stages. Many methods have been proposed to
solve the above problem (Yu et al., 2017; Lu et al., 2018), which rely on
sampling from the non-stationary model distribution and suffer from high
variance or biased estimations. In this paper, we propose ψ-MLE, a new
training scheme for auto-regressive sequence generative models, which is
effective and stable when operating in the large sample spaces encountered in text
generation. We derive our algorithm from a new perspective of self-augmentation
and introduce bias correction with density ratio estimation. Extensive
experimental results on synthetic data and real-world text generation tasks
demonstrate that our method stably outperforms Maximum Likelihood Estimation
and other state-of-the-art sequence generative models in terms of both quality
and diversity.
Comment: Accepted to International Conference on Artificial Intelligence and
Statistics 202
On the Implicit Assumptions of GANs
Generative adversarial nets (GANs) have generated a lot of excitement.
Despite their popularity, they exhibit a number of well-documented issues in
practice, which apparently contradict theoretical guarantees. A number of
enlightening papers have pointed out that these issues arise from unjustified
assumptions that are commonly made, but the message seems to have been lost
amid the optimism of recent years. We believe the identified problems deserve
more attention, and highlight the implications on both the properties of GANs
and the trajectory of research on probabilistic models. We recently proposed an
alternative method that sidesteps these problems.
Comment: 8 pages
Ligand-receptor promiscuity enables cellular addressing
In multicellular organisms, secreted ligands selectively activate, or "address," specific target cell populations to control cell fate decision-making and other processes. Key cell-cell communication pathways use multiple promiscuously interacting ligands and receptors, provoking the question of how addressing specificity can emerge from molecular promiscuity. To investigate this issue, we developed a general mathematical modeling framework based on the bone morphogenetic protein (BMP) pathway architecture. We find that promiscuously interacting ligand-receptor systems allow a small number of ligands, acting in combinations, to address a larger number of individual cell types, each defined by its receptor expression profile. Promiscuous systems outperform seemingly more specific one-to-one signaling architectures in addressing capacity. Combinatorial addressing extends to groups of cell types, is robust to receptor expression noise, grows more powerful with increasing receptor multiplicity, and is maximized by specific biochemical parameter relationships. Together, these results identify fundamental design principles governing cell addressing by ligand combinations.
Evaluating a Generative Adversarial Framework for Information Retrieval
Recent advances in Generative Adversarial Networks (GANs) have led to
their widespread application in multiple domains. A recent model, IRGAN, applies
this framework to Information Retrieval (IR) and has gained significant
attention over the last few years. In this focused work, we critically analyze
multiple components of IRGAN, while providing experimental and theoretical
evidence of some of its shortcomings. Specifically, we identify issues with the
constant baseline term in the policy gradients optimization and show that the
generator harms IRGAN's performance. Motivated by our findings, we propose two
models influenced by self-contrastive estimation and co-training which
outperform IRGAN on two out of the three tasks considered.