A Polynomial Time MCMC Method for Sampling from Continuous DPPs
We study the Gibbs sampling algorithm for continuous determinantal point
processes. We show that, given a warm start, the Gibbs sampler generates a
random sample from a continuous -DPP defined on a -dimensional domain by
only taking number of steps. As an application, we design an
algorithm to generate random samples from -DPPs defined by a spherical
Gaussian kernel on a unit sphere in -dimensions, in time
polynomial in
Learning, Large Scale Inference, and Temporal Modeling of Determinantal Point Processes
Determinantal Point Processes (DPPs) are random point processes well-suited for modelling repulsion. In discrete settings, DPPs are a natural model for subset selection problems where diversity is desired. For example, they can be used to select relevant but diverse sets of text or image search results. Among many remarkable properties, they offer tractable algorithms for exact inference, including computing marginals, computing certain conditional probabilities, and sampling.
In this thesis, we provide four main contributions that enable DPPs to be used in more general settings. First, we develop algorithms to sample from approximate discrete DPPs in settings where we need to select a diverse subset from a large number of items.
Second, we extend this idea to continuous spaces where we develop approximate algorithms to sample from continuous DPPs, yielding a method to select point configurations that tend to be overly-dispersed.
Our third contribution is in developing robust algorithms to learn the parameters of the DPP kernels, which was previously thought to be a difficult, open problem.
Finally, we develop a temporal extension for discrete DPPs, where we model sequences of subsets that are not only marginally diverse but also diverse across time.
Approximate Inference for Determinantal Point Processes
In this thesis we explore a probabilistic model that is well-suited to a variety of subset selection tasks: the determinantal point process (DPP). DPPs were originally developed in the physics community to describe the repulsive interactions of fermions. More recently, they have been applied to machine learning problems such as search diversification and document summarization, which can be cast as subset selection tasks. A challenge, however, is scaling such DPP-based methods to the size of the datasets of interest to this community, and developing approximations for DPP inference tasks whose exact computation is prohibitively expensive.
A DPP defines a probability distribution over all subsets of a ground set of items. Consider the inference tasks common to probabilistic models, which include normalizing, marginalizing, conditioning, sampling, estimating the mode, and maximizing likelihood. For DPPs, exactly computing the quantities necessary for the first four of these tasks requires time cubic in the number of items or features of the items. In this thesis, we propose a means of making these four tasks tractable even in the realm where the number of items and the number of features are large. Specifically, we analyze the impact of randomly projecting the features down to a lower-dimensional space and show that the variational distance between the resulting DPP and the original is bounded. In addition to expanding the circumstances in which these first four tasks are tractable, we also tackle the other two tasks, the first of which is known to be NP-hard (with no PTAS) and the second of which is conjectured to be NP-hard. For mode estimation, we build on submodular maximization techniques to develop an algorithm with a multiplicative approximation guarantee. For likelihood maximization, we exploit the generative process associated with DPP sampling to derive an expectation-maximization (EM) algorithm. We experimentally verify the practicality of all the techniques that we develop, testing them on applications such as news and research summarization, political candidate comparison, and product recommendation.
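To make the "distribution over all subsets" and the cubic normalization cost concrete, here is a minimal sketch under the standard L-ensemble form, where P(Y) = det(L_Y) / det(L + I). The names `det`, `dpp_probability`, and the 3-item matrix `EXAMPLE_L` are illustrative assumptions, not taken from the thesis, and the code is written in dependency-free Python rather than tuned for speed.

```python
from itertools import combinations


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting (cubic time)."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0  # the determinant of the empty matrix is 1
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def dpp_probability(L, subset):
    """P(Y = subset) = det(L_subset) / det(L + I) for an L-ensemble DPP."""
    n = len(L)
    sub = [[L[i][j] for j in subset] for i in subset]
    l_plus_i = [[L[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                for i in range(n)]
    return det(sub) / det(l_plus_i)


# Hypothetical similarity kernel: items 0 and 1 are near-duplicates.
EXAMPLE_L = [[1.0, 0.9, 0.1],
             [0.9, 1.0, 0.1],
             [0.1, 0.1, 1.0]]

# Enumerate all 2^3 subsets; the probabilities sum to one.
all_subsets = [list(s) for r in range(4) for s in combinations(range(3), r)]
total = sum(dpp_probability(EXAMPLE_L, s) for s in all_subsets)
```

In this toy kernel the diverse pair `[0, 2]` receives more mass than the redundant pair `[0, 1]`, which is the repulsion the abstract refers to. The single `det(L + I)` call is the normalization step whose cost grows cubically with the number of items.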
Kernel quadrature with randomly pivoted Cholesky
This paper presents new quadrature rules for functions in a reproducing
kernel Hilbert space using nodes drawn by a sampling algorithm known as
randomly pivoted Cholesky. The resulting computational procedure compares
favorably to previous kernel quadrature methods, which either achieve low
accuracy or require solving a computationally challenging sampling problem.
Theoretical and numerical results show that randomly pivoted Cholesky is fast
and achieves comparable quadrature error rates to more computationally
expensive quadrature schemes based on continuous volume sampling, thinning, and
recombination. Randomly pivoted Cholesky is easily adapted to complicated
geometries with arbitrary kernels, unlocking new potential for kernel
quadrature. Comment: 19 pages, 3 figures; NeurIPS 2023 (spotlight), camera-ready version
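The abstract does not spell out the sampling procedure, so the following is only a sketch of randomly pivoted Cholesky as it is commonly described: each pivot is drawn with probability proportional to the diagonal of the current residual kernel matrix, and the residual is then updated with one Cholesky step. The function names, the Gaussian kernel, and the bandwidth are assumptions chosen for illustration. Only node selection is shown; computing quadrature weights is omitted.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=0.3):
    """Illustrative kernel choice; any positive-definite kernel could be used."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def rp_cholesky(points, kernel, k, rng):
    """Select up to k pivot indices and factor columns with K ~ sum_f f f^T."""
    n = len(points)
    diag = [kernel(p, p) for p in points]  # diagonal of the residual matrix
    factors = []                           # each entry is one length-n column
    pivots = []
    for _ in range(k):
        if sum(diag) <= 1e-12:
            break  # residual is numerically zero, nothing left to approximate
        # Draw the pivot with probability proportional to the residual diagonal.
        s = rng.choices(range(n), weights=diag)[0]
        # Column s of the residual: kernel column minus the current approximation.
        col = [kernel(points[i], points[s]) - sum(f[i] * f[s] for f in factors)
               for i in range(n)]
        if col[s] <= 1e-12:
            break
        scale = math.sqrt(col[s])
        new_factor = [c / scale for c in col]
        factors.append(new_factor)
        pivots.append(s)
        # Update the residual diagonal, clipping tiny negative round-off.
        diag = [max(d - v * v, 0.0) for d, v in zip(diag, new_factor)]
        diag[s] = 0.0  # a chosen pivot is never drawn again
    return pivots, factors


points = [i / 9 for i in range(10)]
pivots, factors = rp_cholesky(points, gaussian_kernel, k=4, rng=random.Random(0))
nodes = [points[s] for s in pivots]  # candidate quadrature nodes
```

Because only the residual diagonal and one kernel column per step are needed, the full kernel matrix is never formed, which is one reason the method adapts easily to arbitrary kernels and domains.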
Kernel interpolation with continuous volume sampling
A fundamental task in kernel methods is to pick nodes and weights, so as to
approximate a given function from an RKHS by the weighted sum of kernel
translates located at the nodes. This is the crux of kernel density estimation,
kernel quadrature, or interpolation from discrete samples. Furthermore, RKHSs
offer a convenient mathematical and computational framework. We introduce and
analyse continuous volume sampling (VS), the continuous counterpart -- for
choosing node locations -- of a discrete distribution introduced in (Deshpande
& Vempala, 2006). Our contribution is theoretical: we prove almost optimal
bounds for interpolation and quadrature under VS. While similar bounds already
exist for some specific RKHSs using ad-hoc node constructions, VS offers bounds
that apply to any Mercer kernel and depend on the spectrum of the associated
integration operator. We emphasize that, unlike previous randomized approaches
that rely on regularized leverage scores or determinantal point processes,
evaluating the pdf of VS only requires pointwise evaluations of the kernel. VS
is thus naturally amenable to MCMC samplers.
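The remark about pointwise kernel evaluations can be illustrated with a small sketch. It assumes the VS density over a node set is proportional to the determinant of the kernel matrix at those nodes, taken with respect to a uniform base measure on [0, 1]. The Gaussian kernel, the Metropolis proposal, and all function names are illustrative assumptions rather than the authors' sampler.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=0.3):
    """Illustrative kernel choice (an assumption, not from the paper)."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def log_det_psd(matrix):
    """Log-determinant via Cholesky; -inf if not numerically positive definite."""
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    log_det = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][t] * chol[j][t] for t in range(j))
            if i == j:
                if s <= 1e-14:
                    return -math.inf
                chol[i][i] = math.sqrt(s)
                log_det += 2.0 * math.log(chol[i][i])
            else:
                chol[i][j] = s / chol[j][j]
    return log_det


def vs_log_density(nodes, kernel):
    """Unnormalized log-density: log det [kernel(x_i, x_j)], kernel calls only."""
    gram = [[kernel(a, b) for b in nodes] for a in nodes]
    return log_det_psd(gram)


def metropolis_step(nodes, kernel, rng, step=0.2):
    """Perturb one node with a symmetric proposal and accept or reject it."""
    i = rng.randrange(len(nodes))
    proposal = list(nodes)
    proposal[i] = nodes[i] + rng.uniform(-step, step)
    if not 0.0 <= proposal[i] <= 1.0:
        return nodes  # outside the domain: zero density, reject
    new_log = vs_log_density(proposal, kernel)
    if new_log == -math.inf:
        return nodes
    log_ratio = new_log - vs_log_density(nodes, kernel)
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal
    return nodes


rng = random.Random(1)
nodes = [0.2, 0.5, 0.8]
for _ in range(200):
    nodes = metropolis_step(nodes, gaussian_kernel, rng)
```

The normalizing constant never appears because the acceptance ratio only compares two determinants, so the sampler needs nothing beyond evaluating the kernel at pairs of nodes.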
Optimal Sublinear Sampling of Spanning Trees and Determinantal Point Processes via Average-Case Entropic Independence
We design fast algorithms for repeatedly sampling from strongly Rayleigh
distributions, which include random spanning tree distributions and
determinantal point processes. For a graph , we show how to
approximately sample uniformly random spanning trees from in
time per sample after an initial
time preprocessing. For a determinantal point
process on subsets of size of a ground set of elements, we show how to
approximately sample in time after an initial
time preprocessing, where is
the matrix multiplication exponent. We even improve the state of the art for
obtaining a single sample from determinantal point processes, from the prior
runtime of to
.
In our main technical result, we achieve the optimal limit on domain
sparsification for strongly Rayleigh distributions. In domain sparsification,
sampling from a distribution on is reduced to sampling
from related distributions on for . We show that for
strongly Rayleigh distributions, we can achieve the optimal
. Our reduction involves sampling from
domain-sparsified distributions, all of which can be produced efficiently
assuming convenient access to approximate overestimates for marginals of .
Having access to marginals is analogous to having access to the mean and
covariance of a continuous distribution, or knowing "isotropy" for the
distribution, the key assumption behind the Kannan-Lovász-Simonovits (KLS)
conjecture and optimal samplers based on it. We view our result as a moral
analog of the KLS conjecture and its consequences for sampling, for discrete
strongly Rayleigh measures.