Generalized Gumbel-Softmax Gradient Estimator for Various Discrete Random Variables
Estimating the gradients of stochastic nodes is a crucial research question in
the deep generative modeling community, because it enables gradient-descent
optimization of neural network parameters. The estimation problem becomes
harder when the stochastic nodes are discrete, because pathwise derivative
techniques cannot be applied. Hence, stochastic gradient estimation for
discrete distributions requires either a score-function method or a continuous
relaxation of the discrete random variables. This paper proposes a generalized
Gumbel-Softmax estimator based on continuous relaxation, which can relax a
wider range of discrete distributions beyond the categorical and Bernoulli.
Specifically, we combine truncation of the discrete random variables with the
Gumbel-Softmax trick and a linear transformation to obtain a relaxed
reparameterization. The proposed approach enables the relaxed discrete random
variable to be reparameterized and backpropagated through a large-scale
stochastic computational graph. Our experiments consist of (1) synthetic data
analyses, which show the efficacy of our methods, and (2) applications to a VAE
and a topic model, which demonstrate the value of the proposed estimator in
practice.
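As a rough illustration of the truncation-plus-linear-transformation idea, the following Python sketch relaxes a Poisson variable. It is not the authors' implementation: the choice of a Poisson distribution, the truncation level `k_max`, the temperature value, and the helper names `truncated_poisson_log_probs` and `relaxed_poisson_sample` are assumptions made for illustration. It uses NumPy only, so it shows the forward computation; because the Gumbel noise does not depend on `rate`, the sample is a smooth function of `rate` for fixed noise, and writing the same function in an autodiff framework would let gradients reach `rate`.

```python
import math
import numpy as np

def truncated_poisson_log_probs(rate, k_max):
    """Log-pmf of Poisson(rate) truncated to {0, ..., k_max - 1} and renormalized.

    The Poisson choice and k_max are illustrative assumptions.
    """
    ks = np.arange(k_max)
    log_pmf = ks * math.log(rate) - rate - np.array([math.lgamma(k + 1.0) for k in ks])
    # Renormalize over the truncated support with a stable log-sum-exp.
    m = log_pmf.max()
    return log_pmf - (m + math.log(np.exp(log_pmf - m).sum()))

def relaxed_poisson_sample(rate, k_max, temperature, rng):
    """Hypothetical helper: Gumbel-Softmax over the truncated support,
    followed by a linear map from the relaxed one-hot vector to the count scale."""
    log_probs = truncated_poisson_log_probs(rate, k_max)
    # Gumbel(0, 1) noise via the inverse CDF; the lower bound avoids log(0).
    u = rng.uniform(low=1e-12, high=1.0, size=k_max)
    gumbel = -np.log(-np.log(u))
    logits = (log_probs + gumbel) / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()  # relaxed one-hot over outcomes 0..k_max-1
    # Linear transformation: weighted sum of the outcome values 0..k_max-1.
    return float(weights @ np.arange(k_max))

rng = np.random.default_rng(0)
x = relaxed_poisson_sample(rate=3.0, k_max=20, temperature=0.5, rng=rng)
```

Lower temperatures push the weights toward a one-hot vector, so the relaxed value approaches an integer count; the truncation level should be large enough that the discarded tail mass is negligible for the rates of interest.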
Reparameterizing the Birkhoff Polytope for Variational Permutation Inference
Many matching, tracking, sorting, and ranking problems require probabilistic
reasoning about possible permutations, a set that grows factorially with
dimension. Combinatorial optimization algorithms may enable efficient point
estimation, but fully Bayesian inference poses a severe challenge in this
high-dimensional, discrete space. To surmount this challenge, we start with the
usual step of relaxing a discrete set (here, the set of permutation matrices)
to its convex hull, the Birkhoff polytope: the set of all doubly stochastic
matrices. We then introduce two novel transformations: first,
an invertible and differentiable stick-breaking procedure that maps
unconstrained space to the Birkhoff polytope; second, a map that rounds points
toward the vertices of the polytope. Both transformations include a temperature
parameter that, in the limit, concentrates the densities on permutation
matrices. We then exploit these transformations and reparameterization
gradients to introduce variational inference over permutation matrices, and we
demonstrate its utility in a series of experiments.
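The sketch below illustrates the two ingredients in simplified form, under explicit assumptions. It substitutes Sinkhorn normalization (alternating row and column normalization) for the paper's invertible stick-breaking map because it is shorter to write; unlike the stick-breaking map, Sinkhorn is not invertible. The rounding step is a hypothetical map that moves a doubly stochastic matrix toward its nearest permutation matrix with a temperature `tau`. The function names are illustrative, and the nearest vertex is found by brute force, which is only feasible for small matrices.

```python
import itertools
import numpy as np

def sinkhorn(logits, n_iters=200):
    """Map an unconstrained square matrix (approximately) onto the Birkhoff polytope.

    Swapped-in technique: Sinkhorn normalization, NOT the invertible
    stick-breaking map described in the abstract.
    """
    k = np.exp(logits - logits.max())  # strictly positive matrix
    for _ in range(n_iters):
        k = k / k.sum(axis=1, keepdims=True)  # rows sum to 1
        k = k / k.sum(axis=0, keepdims=True)  # columns sum to 1
    return k

def nearest_permutation(m):
    """Permutation matrix P maximizing <P, M>, i.e. closest to M in Frobenius norm.

    Brute force over all n! permutations: only sensible for small n.
    """
    n = m.shape[0]
    best = max(itertools.permutations(range(n)),
               key=lambda perm: sum(m[i, perm[i]] for i in range(n)))
    p = np.zeros_like(m)
    p[np.arange(n), list(best)] = 1.0
    return p

def round_toward_vertex(m, tau):
    """Hypothetical rounding map: convex combination of M and its nearest vertex.

    tau = 1 leaves M unchanged; tau = 0 returns a permutation matrix.
    """
    return tau * m + (1.0 - tau) * nearest_permutation(m)

rng = np.random.default_rng(0)
m = sinkhorn(rng.normal(size=(4, 4)))
x = round_toward_vertex(m, tau=0.1)
```

For `0 < tau <= 1` the output stays (up to the Sinkhorn tolerance) inside the Birkhoff polytope, since a convex combination of doubly stochastic matrices is doubly stochastic; as `tau` approaches 0 it collapses onto a vertex.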
- …