Reparameterizing the Birkhoff Polytope for Variational Permutation Inference
Many matching, tracking, sorting, and ranking problems require probabilistic
reasoning about possible permutations, a set that grows factorially with
dimension. Combinatorial optimization algorithms may enable efficient point
estimation, but fully Bayesian inference poses a severe challenge in this
high-dimensional, discrete space. To surmount this challenge, we start with the
usual step of relaxing a discrete set (here, of permutation matrices) to its
convex hull, which here is the Birkhoff polytope: the set of all
doubly-stochastic matrices. We then introduce two novel transformations: first,
an invertible and differentiable stick-breaking procedure that maps
unconstrained space to the Birkhoff polytope; second, a map that rounds points
toward the vertices of the polytope. Both transformations include a temperature
parameter that, in the limit, concentrates the densities on permutation
matrices. We then exploit these transformations and reparameterization
gradients to introduce variational inference over permutation matrices, and we
demonstrate its utility in a series of experiments.
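To make the rounding idea concrete, the following is a minimal sketch of a temperature-controlled rounding map, assuming scipy's linear_sum_assignment as the projection onto the nearest permutation matrix; it illustrates the limiting behavior only, not the paper's exact stick-breaking or rounding construction.

import numpy as np
from scipy.optimize import linear_sum_assignment

def round_toward_vertex(B, temperature):
    # Pull a doubly-stochastic matrix B toward its nearest permutation matrix.
    # As temperature -> 0 the output concentrates on a permutation vertex;
    # at temperature = 1 the input is returned unchanged.
    rows, cols = linear_sum_assignment(-B)   # max-weight matching (Hungarian)
    P = np.zeros_like(B)
    P[rows, cols] = 1.0                      # nearest permutation vertex
    return temperature * B + (1.0 - temperature) * P

# Example: a doubly-stochastic matrix formed by averaging two permutations.
B = 0.5 * np.eye(3) + 0.5 * np.roll(np.eye(3), 1, axis=1)
print(round_toward_vertex(B, temperature=0.1))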
Statistical Machine Learning Methods for the Large Scale Analysis of Neural Data
Modern neurotechnologies enable the recording of neural activity at the scale of entire brains and with single-cell resolution. However, the lack of principled approaches to extract structure from these massive data streams prevents us from fully exploiting the potential of these technologies. This thesis, divided into three parts, introduces new statistical machine learning methods to enable the large-scale analysis of some of these complex neural datasets. In the first part, I present a method that leverages Gaussian quadrature to accelerate inference of neural encoding models from a certain type of observed neural point process --- spike trains --- resulting in substantial improvements over existing methods.
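As a rough illustration of the quadrature idea in the first part, here is a sketch of a Poisson-process log-likelihood, sum_i log lam(t_i) - integral_0^T lam(t) dt, with the intractable integral replaced by a Gauss-Legendre sum; the intensity function and settings below are made-up examples, not the thesis's encoding models.

import numpy as np

def poisson_process_loglik(spike_times, intensity, T, n_nodes=20):
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    t = 0.5 * T * (nodes + 1.0)                  # map nodes from [-1, 1] to [0, T]
    integral = 0.5 * T * np.sum(weights * intensity(t))
    return np.sum(np.log(intensity(spike_times))) - integral

intensity = lambda t: 5.0 + 3.0 * np.sin(t)      # hypothetical firing rate (always > 0)
spikes = np.array([0.3, 1.1, 2.4, 4.0])          # toy spike train
print(poisson_process_loglik(spikes, intensity, T=5.0))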
The second part focuses on the simultaneous electrical stimulation and recording of neurons using large electrode arrays. There, identification of neural activity is hindered by stimulation artifacts that are much larger than spikes and overlap with them temporally. To surmount this challenge, I develop an algorithm to infer and cancel this artifact, enabling inference of the neural signal of interest. This algorithm is based on a Bayesian generative model for recordings, where a structured Gaussian process is used to represent prior knowledge of the artifact. The algorithm achieves near-perfect accuracy and enables the analysis of data hundreds of times faster than previous approaches.
The third part is motivated by the problem of inferring neural dynamics in the worm C. elegans: when taking a data-driven approach to this question, e.g., when using whole-brain calcium imaging data, one is faced with the need to match neural recordings to canonical neural identities, in practice resolved by tedious human labor. Alternatively, in a Bayesian setup this problem may be cast as posterior inference of a latent permutation. I introduce methods that enable gradient-based approximate posterior inference of permutations, overcoming the difficulties imposed by the combinatorial and discrete nature of this object. The results suggest the feasibility of automating neural identification, and demonstrate that variational inference over permutations is a sensible alternative to MCMC.
Learning Latent Permutations with Gumbel-Sinkhorn Networks
Permutations and matchings are core building blocks in a variety of latent
variable models, as they allow us to align, canonicalize, and sort data.
Learning in such models is difficult, however, because exact marginalization
over these combinatorial objects is intractable. In response, this paper
introduces a collection of new methods for end-to-end learning in such models
that approximate discrete maximum-weight matching using the continuous Sinkhorn
operator. Sinkhorn iteration is attractive because it functions as a simple,
easy-to-implement analog of the softmax operator. With this, we can define the
Gumbel-Sinkhorn method, an extension of the Gumbel-Softmax method (Jang et al.
2016, Maddison et al. 2016) to distributions over latent matchings. We
demonstrate the effectiveness of our method by outperforming competitive
baselines on a range of qualitatively different tasks: sorting numbers, solving
jigsaw puzzles, and identifying neural signals in worms.
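For intuition, here is a minimal numpy sketch of the Sinkhorn operator with Gumbel noise and a temperature, in the spirit of Gumbel-Sinkhorn; the iteration count and temperature are illustrative choices, not the paper's settings.

import numpy as np

def log_sinkhorn(log_alpha, n_iters=20):
    # Alternate row and column normalization in log space; the result is the
    # log of an (approximately) doubly-stochastic matrix.
    for _ in range(n_iters):
        log_alpha = log_alpha - np.logaddexp.reduce(log_alpha, axis=1, keepdims=True)
        log_alpha = log_alpha - np.logaddexp.reduce(log_alpha, axis=0, keepdims=True)
    return log_alpha

def gumbel_sinkhorn_sample(scores, temperature=0.5, seed=0):
    rng = np.random.default_rng(seed)
    gumbel = -np.log(-np.log(rng.uniform(size=scores.shape)))   # Gumbel(0, 1) noise
    return np.exp(log_sinkhorn((scores + gumbel) / temperature))

scores = np.array([[2.0, 0.1, 0.3],
                   [0.2, 1.5, 0.4],
                   [0.1, 0.3, 1.8]])
print(gumbel_sinkhorn_sample(scores))   # near the identity permutation for small temperature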
Modeling Orders of User Behaviors via Differentiable Sorting: A Multi-task Framework to Predicting User Post-click Conversion
User post-click conversion prediction is of high interest to researchers and
developers. Recent studies employ multi-task learning to tackle selection
bias and data sparsity, two severe challenges in post-click behavior
prediction, by incorporating click data. However, prior works mainly focused on
pointwise learning, and the order of labels (i.e., click and post-click) is
not well explored, which naturally poses a listwise learning problem. Inspired
by recent advances in differentiable sorting, in this paper we propose a novel
multi-task framework that leverages the order of user behaviors to predict user
post-click conversion in an end-to-end manner. Specifically, we define an
aggregation operator that combines the predicted outputs of different tasks into a
unified score; we then use the computed scores to model the label relations via
differentiable sorting. Extensive experiments on public and industrial datasets
show the superiority of our proposed model against competitive baselines.
Comment: The paper is accepted as a short research paper by SIGIR 202
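As background, one common differentiable-sorting relaxation (not necessarily the one used in this paper) turns scores into soft ranks through pairwise sigmoids, which a listwise loss can then compare against the desired label order:

import numpy as np

def soft_rank(scores, temperature=0.1):
    # Differentiable approximation of each score's rank:
    #   rank_i ~= 1 + sum_{j != i} sigmoid((scores_j - scores_i) / temperature),
    # so the largest score gets a rank near 1. The self term sigmoid(0) = 0.5
    # is folded into the leading constant.
    diff = (scores[None, :] - scores[:, None]) / temperature
    return 0.5 + np.sum(1.0 / (1.0 + np.exp(-diff)), axis=1)

scores = np.array([0.9, 0.2, 0.5])   # e.g., hypothetical per-item unified scores
print(soft_rank(scores))             # approximately [1, 3, 2]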
Low-variance black-box gradient estimates for the Plackett-Luce distribution
Learning models with discrete latent variables using stochastic gradient descent remains a challenge due to the high variance of gradient estimates. Modern variance reduction techniques mostly consider categorical distributions and have limited applicability when the number of possible outcomes becomes large. In this work, we consider models with latent permutations and propose control variates for the Plackett-Luce distribution. In particular, the control variates allow us to optimize black-box functions over permutations using stochastic gradient descent. To illustrate the approach, we consider a variety of causal structure learning tasks for continuous and discrete data. We show that our method outperforms competitive relaxation-based optimization methods and is also applicable to non-differentiable score functions.
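A minimal sketch of the score-function (REINFORCE) estimator this builds on, using the Gumbel trick to sample from Plackett-Luce and a simple mean baseline standing in for the paper's control variates; the objective f below is a toy example.

import numpy as np

rng = np.random.default_rng(0)

def sample_pl(logits):
    # Gumbel trick: sorting perturbed logits yields a Plackett-Luce sample.
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    return np.argsort(-(logits + gumbel))

def pl_log_prob_grad(logits, perm):
    # Gradient of log P(perm) under Plackett-Luce w.r.t. the logits: at each
    # position, +1 for the chosen item minus the softmax over items remaining.
    grad = np.zeros_like(logits)
    remaining = list(perm)
    while remaining:
        probs = np.exp(logits[remaining])
        probs /= probs.sum()
        grad[remaining] -= probs
        grad[remaining[0]] += 1.0
        remaining = remaining[1:]
    return grad

f = lambda perm: -float(np.sum((perm - np.arange(len(perm))) ** 2))  # toy objective

logits, n_samples = np.zeros(4), 32
perms = [sample_pl(logits) for _ in range(n_samples)]
rewards = np.array([f(p) for p in perms])
baseline = rewards.mean()                  # simple variance-reducing baseline
grad = np.mean([(r - baseline) * pl_log_prob_grad(logits, p)
                for r, p in zip(rewards, perms)], axis=0)
print(grad)   # ascend this to make the identity permutation more likely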
Sampling and Inference for Beta Neutral-to-the-Left Models of Sparse Networks
Empirical evidence suggests that heavy-tailed degree distributions occurring
in many real networks are well-approximated by power laws with exponents η
that may take values either less than or greater than two. Models based on
various forms of exchangeability are able to capture power laws with η < 2,
and admit tractable inference algorithms; we draw on previous results to
show that η > 2 cannot be generated by the forms of exchangeability used
in existing random graph models. Preferential attachment models generate power
law exponents greater than two, but have been of limited use as statistical
models due to the inherent difficulty of performing inference in
non-exchangeable models. Motivated by this gap, we design and implement
inference algorithms for a recently proposed class of models that generates η
of all possible values. We show that although they are not exchangeable,
these models have probabilistic structure amenable to inference. Our methods
make a large class of previously intractable models useful for statistical
inference.
Comment: Accepted for publication in the proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI) 201
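As a quick, self-contained illustration of the premise that preferential attachment produces tail exponents above two, here is a toy Barabasi-Albert-style simulation with a crude exponent estimate; this is background intuition only, not the Beta Neutral-to-the-Left model or the paper's inference algorithms.

import numpy as np

rng = np.random.default_rng(0)

def preferential_attachment(n_nodes):
    # Multiset of edge endpoints; sampling uniformly from it picks a node
    # with probability proportional to its degree.
    targets = [0, 1]                      # start with a single edge
    for new in range(2, n_nodes):
        old = targets[rng.integers(len(targets))]
        targets += [new, old]
    return np.bincount(targets)           # node degrees

deg = preferential_attachment(50_000)
tail = np.sort(deg[deg >= 10])            # degrees above an ad hoc cutoff
eta = 1.0 + len(tail) / np.sum(np.log(tail / tail[0]))   # continuous power-law MLE
print(f"estimated degree power-law exponent: {eta:.2f}")  # comfortably above two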
Mirror Sinkhorn: Fast Online Optimization on Transport Polytopes
Optimal transport is an important tool in machine learning, allowing one to
capture geometric properties of the data through a linear program on transport
polytopes. We present a single-loop optimization algorithm for minimizing
general convex objectives on these domains, utilizing the principles of
Sinkhorn matrix scaling and mirror descent. The proposed algorithm is robust to
noise, and can be used in an online setting. We provide theoretical guarantees
for convex objectives and experimental results showcasing its effectiveness on
both synthetic and real-world data.
Comment: ICML 202
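To illustrate the two ingredients, here is a minimal sketch combining an entropic mirror-descent (multiplicative) update with Sinkhorn rescaling onto a transport polytope U(r, c); for simplicity it runs a full Sinkhorn projection per step rather than the paper's single-loop interleaving, and the step size and iteration counts are illustrative.

import numpy as np

def sinkhorn_project(K, r, c, n_iters=50):
    # Rescale K so its row sums approach r and its column sums approach c.
    for _ in range(n_iters):
        K = K * (r / K.sum(axis=1))[:, None]
        K = K * (c / K.sum(axis=0))[None, :]
    return K

def mirror_sinkhorn(grad_f, r, c, n_steps=200, step=0.5):
    X = np.outer(r, c)                       # interior starting point
    for _ in range(n_steps):
        X = X * np.exp(-step * grad_f(X))    # entropic mirror-descent step
        X = sinkhorn_project(X, r, c)        # back onto the transport polytope
    return X

# Example: minimize the linear cost <C, X>; the gradient is the constant C.
C = np.array([[0.0, 1.0],
              [1.0, 0.0]])
r = c = np.array([0.5, 0.5])
print(mirror_sinkhorn(lambda X: C, r, c))    # mass concentrates on the diagonal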