Generalized Gumbel-Softmax Gradient Estimator for Various Discrete Random Variables
Estimating the gradients of stochastic nodes is one of the crucial research
questions in the deep generative modeling community, which enables the gradient
descent optimization on neural network parameters. This estimation problem
becomes further complex when we regard the stochastic nodes to be discrete
because pathwise derivative techniques cannot be applied. Hence, the stochastic
gradient estimation of discrete distributions requires either a score function
method or continuous relaxation of the discrete random variables. This paper
proposes a general version of the Gumbel-Softmax estimator with continuous
relaxation, which can relax a broader range of discrete probability
distributions than only the categorical and Bernoulli distributions.
In detail, we utilize the truncation of discrete random variables
and the Gumbel-Softmax trick with a linear transformation for the relaxed
reparameterization. The proposed approach enables the relaxed discrete random
variable to be reparameterized and backpropagated through a large-scale
stochastic computational graph. Our experiments consist of (1) synthetic data
analyses, which show the efficacy of our methods; and (2) applications to a VAE
and a topic model, which demonstrate the value of the proposed estimation in
practice.
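
The abstract above describes truncating a discrete random variable and applying the Gumbel-Softmax trick followed by a linear transformation. The sketch below is one possible reading of that recipe, not the authors' implementation. The choice of a Poisson distribution, the truncation level, the temperature, and all function names are assumptions made for illustration only.

```python
import math
import random


def truncated_poisson_log_pmf(rate, n_support):
    """Log-pmf of Poisson(rate) truncated to {0, ..., n_support - 1} and renormalized.

    Assumption: Poisson is used only as an example of a non-categorical
    discrete distribution; the truncation level is picked by hand.
    """
    logs = [k * math.log(rate) - rate - math.lgamma(k + 1) for k in range(n_support)]
    log_norm = math.log(sum(math.exp(v) for v in logs))
    return [v - log_norm for v in logs]


def gumbel_softmax(log_probs, temperature, rng):
    """Relaxed one-hot sample: softmax((log_probs + Gumbel noise) / temperature)."""
    # Gumbel(0, 1) noise via the inverse CDF; clamp u away from 0 to avoid log(0).
    noisy = [lp - math.log(-math.log(max(rng.random(), 1e-12))) for lp in log_probs]
    scaled = [v / temperature for v in noisy]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def relaxed_sample(rate, n_support, temperature, rng):
    """Relaxed scalar sample: linear map of the relaxed one-hot onto the support values."""
    weights = gumbel_softmax(truncated_poisson_log_pmf(rate, n_support), temperature, rng)
    return sum(k * w for k, w in enumerate(weights))


rng = random.Random(0)
samples = [relaxed_sample(3.0, 20, 0.1, rng) for _ in range(5000)]
mean_estimate = sum(samples) / len(samples)
```

Each relaxed sample is a convex combination of the support values, so it always lies between 0 and `n_support - 1`. In an automatic-differentiation framework the same computation would be differentiable with respect to `rate`, which is the property the relaxation is meant to provide.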
Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions
Combining discrete probability distributions and combinatorial optimization
problems with neural network components has numerous applications but poses
several challenges. We propose Implicit Maximum Likelihood Estimation (I-MLE),
a framework for end-to-end learning of models combining discrete exponential
family distributions and differentiable neural components. I-MLE is widely
applicable as it only requires the ability to compute the most probable states
and does not rely on smooth relaxations. The framework encompasses several
approaches such as perturbation-based implicit differentiation and recent
methods to differentiate through black-box combinatorial solvers. We introduce
a novel class of noise distributions for approximating marginals via
perturb-and-MAP. Moreover, we show that I-MLE simplifies to maximum likelihood
estimation when used in some recently studied learning settings that involve
combinatorial solvers. Experiments on several datasets suggest that I-MLE is
competitive with and often outperforms existing approaches which rely on
problem-specific relaxations.
Comment: NeurIPS 2021 camera-ready; repo:
https://github.com/nec-research/tf-iml
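
The abstract above states that I-MLE only needs the most probable state and approximates marginals via perturb-and-MAP. The sketch below illustrates that idea on the simplest case, a categorical distribution whose MAP state is an argmax. It is not the code from the linked repository. Gumbel noise is swapped in as a standard choice instead of the novel noise class the paper introduces. The function names, the target construction `theta - lam * grad_z`, and the omission of any scaling constant are assumptions made for illustration.

```python
import math
import random


def map_state(theta):
    """Most probable state of a categorical with logits theta, as a one-hot list."""
    best = max(range(len(theta)), key=lambda i: theta[i])
    return [1.0 if i == best else 0.0 for i in range(len(theta))]


def gumbel_noise(n, rng):
    """n independent Gumbel(0, 1) draws (assumed noise; not the paper's noise class)."""
    return [-math.log(-math.log(max(rng.random(), 1e-12))) for _ in range(n)]


def perturb_and_map_marginals(theta, n_samples, rng):
    """Approximate marginals by averaging MAP states of randomly perturbed logits."""
    counts = [0.0] * len(theta)
    for _ in range(n_samples):
        eps = gumbel_noise(len(theta), rng)
        z = map_state([t + e for t, e in zip(theta, eps)])
        counts = [c + zi for c, zi in zip(counts, z)]
    return [c / n_samples for c in counts]


def map_difference_gradient(theta, grad_z, lam, n_samples, rng):
    """Unscaled gradient surrogate: mean of MAP(theta + eps) - MAP(target + eps).

    Assumption: the target logits are theta - lam * grad_z, and the same
    noise draw is shared by both MAP calls.
    """
    target = [t - lam * g for t, g in zip(theta, grad_z)]
    grad = [0.0] * len(theta)
    for _ in range(n_samples):
        eps = gumbel_noise(len(theta), rng)
        z = map_state([t + e for t, e in zip(theta, eps)])
        z_target = map_state([t + e for t, e in zip(target, eps)])
        grad = [g + a - b for g, a, b in zip(grad, z, z_target)]
    return [g / n_samples for g in grad]


rng = random.Random(0)
theta = [1.0, 0.0, -1.0]
marginals = perturb_and_map_marginals(theta, 20000, rng)
# Upstream gradient saying that the loss grows with the first coordinate of z.
grad = map_difference_gradient(theta, [1.0, 0.0, 0.0], 2.0, 2000, rng)
```

For a categorical distribution with Gumbel noise, the perturb-and-MAP marginals coincide with the softmax of the logits, which makes this toy case easy to check. For structured problems the `map_state` call would be replaced by a combinatorial solver.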