Reparameterizing the Birkhoff Polytope for Variational Permutation Inference
Many matching, tracking, sorting, and ranking problems require probabilistic
reasoning about possible permutations, a set that grows factorially with
dimension. Combinatorial optimization algorithms may enable efficient point
estimation, but fully Bayesian inference poses a severe challenge in this
high-dimensional, discrete space. To surmount this challenge, we start with the
usual step of relaxing a discrete set (here, of permutation matrices) to its
convex hull, which here is the Birkhoff polytope: the set of all
doubly-stochastic matrices. We then introduce two novel transformations: first,
an invertible and differentiable stick-breaking procedure that maps
unconstrained space to the Birkhoff polytope; second, a map that rounds points
toward the vertices of the polytope. Both transformations include a temperature
parameter that, in the limit, concentrates the densities on permutation
matrices. We then exploit these transformations and reparameterization
gradients to introduce variational inference over permutation matrices, and we
demonstrate its utility in a series of experiments.
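As a concrete illustration of the second transformation, here is a minimal NumPy/SciPy sketch of rounding a doubly stochastic matrix toward a vertex of the Birkhoff polytope; the convex-combination form, the temperature parameterization, and the use of a linear assignment to find the nearest permutation are illustrative assumptions, not the paper's exact map.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def round_toward_vertex(x, tau):
    """Pull a doubly stochastic matrix x toward its nearest permutation
    matrix. As the temperature tau -> 0, the output concentrates on a
    vertex of the Birkhoff polytope, i.e., a permutation matrix.
    (Assumed convex-combination form, for illustration only.)"""
    # The nearest permutation matrix under the Frobenius norm maximizes
    # the inner product <P, x>: a maximum-weight linear assignment.
    rows, cols = linear_sum_assignment(x, maximize=True)
    p = np.zeros_like(x)
    p[rows, cols] = 1.0
    return tau * x + (1.0 - tau) * p
```

Because the map is differentiable in x away from ties, reparameterization gradients can flow through the temperature-weighted term.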
Sampling and Inference for Beta Neutral-to-the-Left Models of Sparse Networks
Empirical evidence suggests that heavy-tailed degree distributions occurring
in many real networks are well-approximated by power laws with exponents η
that may take values either less than or greater than two. Models based on
various forms of exchangeability are able to capture power laws with η < 2,
and admit tractable inference algorithms; we draw on previous results to
show that η > 2 cannot be generated by the forms of exchangeability used
in existing random graph models. Preferential attachment models generate power
law exponents greater than two, but have been of limited use as statistical
models due to the inherent difficulty of performing inference in
non-exchangeable models. Motivated by this gap, we design and implement
inference algorithms for a recently proposed class of models that generates
exponents η of all possible values. We show that although they are not exchangeable,
these models have probabilistic structure amenable to inference. Our methods
make a large class of previously intractable models useful for statistical
inference.

Comment: Accepted for publication in the proceedings of the Conference on
Uncertainty in Artificial Intelligence (UAI) 2018.
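For intuition about the exponent regimes discussed above, the following hedged NumPy sketch (not the paper's Beta Neutral-to-the-Left sampler) simulates plain preferential attachment, whose degree exponent is classically close to three, and estimates the tail index with a Hill estimator; the helper names and the cutoff k are illustrative choices.

```python
import numpy as np

def pa_degree_sequence(n, seed=None):
    """Simple preferential attachment: each arriving node attaches one
    edge to an existing node chosen proportionally to its degree.
    Returns the degree sequence of the resulting graph."""
    rng = np.random.default_rng(seed)
    endpoints = [0, 1]  # seed graph: one edge; sampling uniformly from
                        # this list is degree-proportional sampling
    for new in range(2, n):
        target = endpoints[rng.integers(len(endpoints))]
        endpoints.extend([target, new])
    return np.bincount(endpoints)

def hill_tail_index(samples, k=200):
    """Hill estimator of the tail index a in P(X > x) ~ x^(-a); the
    degree-distribution exponent eta is then roughly a + 1 (ties in
    the discrete degrees make this a rough estimate)."""
    x = np.sort(np.asarray(samples, dtype=float))[::-1]
    return 1.0 / np.mean(np.log(x[:k] / x[k]))

degrees = pa_degree_sequence(100_000, seed=0)
print(hill_tail_index(degrees))  # near 2, i.e., degree exponent near 3
```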
Invertible Gaussian Reparameterization: Revisiting the Gumbel-Softmax
The Gumbel-Softmax is a continuous distribution over the simplex that is
often used as a relaxation of discrete distributions. Because it can be readily
interpreted and easily reparameterized, it enjoys widespread use. We propose a
conceptually simpler and more flexible alternative family of reparameterizable
distributions where Gaussian noise is transformed into a one-hot approximation
through an invertible function. This invertible function is composed of a
modified softmax and can incorporate diverse transformations that serve
different specific purposes. For example, a stick-breaking procedure lets us
extend the reparameterization trick to distributions with countably infinite
support, while normalizing flows increase the flexibility of the
distribution. Our construction enjoys theoretical advantages over the
Gumbel-Softmax, such as a closed-form KL divergence, and significantly
outperforms it in a variety of experiments.
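As a sketch of the core construction, the following assumes a tempered softmax with an appended zero logit as the invertible function: this is one simple bijection from R^(K-1) to the open simplex, and may differ from the paper's exact modified softmax.

```python
import numpy as np

def gaussian_to_simplex(mu, sigma, tau=0.1, seed=None):
    """Reparameterized sample on the simplex: push a Gaussian sample
    through an invertible tempered softmax. Appending a fixed zero
    logit makes the map from R^(K-1) to the open simplex bijective;
    small tau pushes mass toward the one-hot vertices. (Illustrative
    stand-in for the paper's modified softmax.)"""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(mu.shape)  # reparameterization noise
    y = mu + sigma * eps                 # Gaussian sample in R^(K-1)
    z = np.append(y, 0.0) / tau          # fixed reference logit
    z -= z.max()                         # numerical stability
    p = np.exp(z)
    return p / p.sum()
```

Because the transformation is invertible with a tractable Jacobian, the density on the simplex follows from the change-of-variables formula, which is what makes closed-form KL terms possible.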
Learning Latent Permutations with Gumbel-Sinkhorn Networks
Permutations and matchings are core building blocks in a variety of latent
variable models, as they allow us to align, canonicalize, and sort data.
Learning in such models is difficult, however, because exact marginalization
over these combinatorial objects is intractable. In response, this paper
introduces a collection of new methods for end-to-end learning in such models
that approximate discrete maximum-weight matching using the continuous Sinkhorn
operator. Sinkhorn iteration is attractive because it functions as a simple,
easy-to-implement analog of the softmax operator. With this, we can define the
Gumbel-Sinkhorn method, an extension of the Gumbel-Softmax method (Jang et
al., 2016; Maddison et al., 2016) to distributions over latent matchings. We
demonstrate the effectiveness of our method by outperforming competitive
baselines on a range of qualitatively different tasks: sorting numbers, solving
jigsaw puzzles, and identifying neural signals in worms.
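A minimal NumPy/SciPy sketch of this construction: Gumbel-perturbed scores passed through Sinkhorn iterations (alternating row and column normalization, done in log space for stability); the iteration count and temperature are illustrative defaults.

```python
import numpy as np
from scipy.special import logsumexp

def gumbel_sinkhorn(log_alpha, tau=1.0, n_iters=20, seed=None):
    """Sample a relaxed permutation: perturb the score matrix with
    Gumbel noise, then apply the Sinkhorn operator. The result is
    approximately doubly stochastic; as tau -> 0 it approaches a hard
    matching (a permutation matrix)."""
    rng = np.random.default_rng(seed)
    x = (log_alpha + rng.gumbel(size=log_alpha.shape)) / tau
    for _ in range(n_iters):
        x = x - logsumexp(x, axis=1, keepdims=True)  # normalize rows
        x = x - logsumexp(x, axis=0, keepdims=True)  # normalize columns
    return np.exp(x)
```

Every step is differentiable, so gradients flow through the sampler end to end, mirroring how the Gumbel-Softmax relaxes categorical sampling.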