    A Polynomial Time MCMC Method for Sampling from Continuous DPPs

    We study the Gibbs sampling algorithm for continuous determinantal point processes. We show that, given a warm start, the Gibbs sampler generates a random sample from a continuous $k$-DPP defined on a $d$-dimensional domain in only $\text{poly}(k)$ steps. As an application, we design an algorithm to generate random samples from $k$-DPPs defined by a spherical Gaussian kernel on the unit sphere in $d$ dimensions, $\mathbb{S}^{d-1}$, in time polynomial in $k$ and $d$.
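
    A minimal sketch of one Gibbs update for a $k$-DPP with a Gaussian kernel on the unit sphere may help make the chain concrete. It is not taken from the paper: the rejection step, the bandwidth `sigma`, the `max_tries` cap, and all function names are assumptions made for illustration. The sketch relies on the fact that, with $K(y,y)=1$, the conditional density of a new point $y$ given the other $k-1$ points is proportional to the Schur complement $1 - K_{yS} K_S^{-1} K_{Sy} \in [0,1]$, so a uniform proposal on the sphere can be accepted with exactly that probability. It assumes $k \ge 2$ and says nothing about the warm start or the mixing time.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=0.5):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)) between rows of X and Y."""
    sq = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def uniform_on_sphere(rng, d):
    """Draw one point uniformly from the unit sphere in R^d."""
    v = rng.standard_normal(d)
    return v / np.linalg.norm(v)

def gibbs_step(points, rng, sigma=0.5, max_tries=10_000):
    """Resample one uniformly chosen point from its conditional given the rest.

    The conditional is sampled by rejection (an assumption of this sketch):
    propose y uniformly on the sphere, accept with probability equal to the
    Schur complement det(K_{rest + y}) / det(K_rest), which lies in [0, 1].
    """
    k, d = points.shape
    i = rng.integers(k)
    rest = np.delete(points, i, axis=0)
    K_rest = gaussian_kernel(rest, rest, sigma)
    for _ in range(max_tries):
        y = uniform_on_sphere(rng, d)
        k_y = gaussian_kernel(rest, y[None, :], sigma)[:, 0]
        schur = 1.0 - k_y @ np.linalg.solve(K_rest, k_y)
        if rng.random() < schur:
            new_points = points.copy()
            new_points[i] = y
            return new_points
    # Hypothetical fallback: keep the current state if no proposal was accepted.
    return points
```

    Starting from any configuration of distinct points and calling `gibbs_step` repeatedly yields a Markov chain whose stationary distribution is the $k$-DPP, provided the fallback branch is never reached.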

    Learning, Large Scale Inference, and Temporal Modeling of Determinantal Point Processes

    Determinantal Point Processes (DPPs) are random point processes well-suited for modelling repulsion. In discrete settings, DPPs are a natural model for subset selection problems where diversity is desired. For example, they can be used to select relevant but diverse sets of text or image search results. Among many remarkable properties, they offer tractable algorithms for exact inference, including computing marginals, computing certain conditional probabilities, and sampling. In this thesis, we provide four main contributions that enable DPPs to be used in more general settings. First, we develop algorithms to sample from approximate discrete DPPs in settings where we need to select a diverse subset from a large number of items. Second, we extend this idea to continuous spaces where we develop approximate algorithms to sample from continuous DPPs, yielding a method to select point configurations that tend to be overdispersed. Our third contribution is in developing robust algorithms to learn the parameters of the DPP kernels, which was previously thought to be a difficult, open problem. Finally, we develop a temporal extension for discrete DPPs, where we model sequences of subsets that are not only marginally diverse but also diverse across time.

    Approximate Inference for Determinantal Point Processes

    In this thesis we explore a probabilistic model that is well-suited to a variety of subset selection tasks: the determinantal point process (DPP). DPPs were originally developed in the physics community to describe the repulsive interactions of fermions. More recently, they have been applied to machine learning problems such as search diversification and document summarization, which can be cast as subset selection tasks. A challenge, however, is scaling such DPP-based methods to the size of the datasets of interest to this community, and developing approximations for DPP inference tasks whose exact computation is prohibitively expensive. A DPP defines a probability distribution over all subsets of a ground set of items. Consider the inference tasks common to probabilistic models, which include normalizing, marginalizing, conditioning, sampling, estimating the mode, and maximizing likelihood. For DPPs, exactly computing the quantities necessary for the first four of these tasks requires time cubic in the number of items or features of the items. In this thesis, we propose a means of making these four tasks tractable even in the realm where the number of items and the number of features are large. Specifically, we analyze the impact of randomly projecting the features down to a lower-dimensional space and show that the variational distance between the resulting DPP and the original is bounded. In addition to expanding the circumstances in which these first four tasks are tractable, we also tackle the other two tasks, the first of which is known to be NP-hard (with no PTAS) and the second of which is conjectured to be NP-hard. For mode estimation, we build on submodular maximization techniques to develop an algorithm with a multiplicative approximation guarantee. For likelihood maximization, we exploit the generative process associated with DPP sampling to derive an expectation-maximization (EM) algorithm. We experimentally verify the practicality of all the techniques that we develop, testing them on applications such as news and research summarization, political candidate comparison, and product recommendation.
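
    The normalization task mentioned above can be illustrated with a small sketch, assuming the standard L-ensemble form $P(Y) = \det(L_Y) / \det(L + I)$ with $L = B^\top B$ for a $D \times N$ feature matrix $B$. The function names are hypothetical, and the Gaussian projection in `project_features` is only one common choice of random projection, not necessarily the scheme analyzed in the thesis. The dual identity $\det(B^\top B + I_N) = \det(B B^\top + I_D)$ is what makes a small feature dimension useful.

```python
import itertools
import numpy as np

def dpp_prob(L, subset):
    """P(Y) = det(L_Y) / det(L + I) for an L-ensemble DPP (empty set has det 1)."""
    idx = list(subset)
    numerator = np.linalg.det(L[np.ix_(idx, idx)]) if idx else 1.0
    return numerator / np.linalg.det(L + np.eye(L.shape[0]))

def dual_log_normalizer(B):
    """log det(L + I) computed from the D x D dual kernel C = B B^T, L = B^T B."""
    C = B @ B.T
    return np.linalg.slogdet(C + np.eye(C.shape[0]))[1]

def project_features(B, d, rng):
    """Gaussian random projection of D-dimensional features down to d dimensions.

    Assumption of this sketch: entries of the projection are i.i.d. N(0, 1/d).
    """
    G = rng.standard_normal((d, B.shape[0])) / np.sqrt(d)
    return G @ B
```

    With $D \ll N$, `dual_log_normalizer` costs time cubic in $D$ rather than in $N$, and `project_features` reduces $D$ further at the price of an approximation.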

    Kernel quadrature with randomly pivoted Cholesky

    This paper presents new quadrature rules for functions in a reproducing kernel Hilbert space using nodes drawn by a sampling algorithm known as randomly pivoted Cholesky. The resulting computational procedure compares favorably to previous kernel quadrature methods, which either achieve low accuracy or require solving a computationally challenging sampling problem. Theoretical and numerical results show that randomly pivoted Cholesky is fast and achieves comparable quadrature error rates to more computationally expensive quadrature schemes based on continuous volume sampling, thinning, and recombination. Randomly pivoted Cholesky is easily adapted to complicated geometries with arbitrary kernels, unlocking new potential for kernel quadrature. Comment: 19 pages, 3 figures; NeurIPS 2023 (spotlight), camera-ready version.
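
    As a rough illustration, randomly pivoted Cholesky can be sketched on a finite kernel matrix as follows. This is a discrete simplification, not the paper's continuous procedure: the target measure is assumed to be the empirical measure on the rows of `A`, and `rp_cholesky` and `quadrature_weights` are hypothetical names. Each pivot is drawn with probability proportional to the diagonal of the current residual, and the quadrature weights solve $K_{SS}\, w = z$ with $z_i$ the mean of the kernel row at node $i$.

```python
import numpy as np

def rp_cholesky(A, k, rng):
    """Randomly pivoted Cholesky on a PSD matrix A.

    Returns F (N x k) with A ~= F @ F.T, and the list of selected pivots.
    """
    N = A.shape[0]
    F = np.zeros((N, k))
    d = np.diag(A).astype(float).copy()  # diagonal of the residual matrix
    pivots = []
    for i in range(k):
        s = rng.choice(N, p=d / d.sum())          # pivot ~ residual diagonal
        g = A[:, s] - F[:, :i] @ F[s, :i]         # residual column at the pivot
        F[:, i] = g / np.sqrt(g[s])
        d = np.clip(d - F[:, i] ** 2, 0.0, None)  # update residual diagonal
        d[s] = 0.0                                # never select a pivot twice
        pivots.append(s)
    return F, pivots

def quadrature_weights(K, pivots):
    """Weights w solving K[S, S] w = z, with z the kernel mean at the nodes.

    Assumption of this sketch: the integral is against the empirical measure
    on all N points, so z is a row mean of K.
    """
    z = K[pivots].mean(axis=1)
    return np.linalg.solve(K[np.ix_(pivots, pivots)], z)
```

    The integral of a function `f` is then approximated by `w @ f(nodes)`, where `nodes` are the points indexed by `pivots`.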

    Kernel interpolation with continuous volume sampling

    A fundamental task in kernel methods is to pick nodes and weights, so as to approximate a given function from an RKHS by the weighted sum of kernel translates located at the nodes. This is the crux of kernel density estimation, kernel quadrature, or interpolation from discrete samples. Furthermore, RKHSs offer a convenient mathematical and computational framework. We introduce and analyse continuous volume sampling (VS), the continuous counterpart -- for choosing node locations -- of a discrete distribution introduced in (Deshpande & Vempala, 2006). Our contribution is theoretical: we prove almost optimal bounds for interpolation and quadrature under VS. While similar bounds already exist for some specific RKHSs using ad-hoc node constructions, VS offers bounds that apply to any Mercer kernel and depend on the spectrum of the associated integration operator. We emphasize that, unlike previous randomized approaches that rely on regularized leverage scores or determinantal point processes, evaluating the pdf of VS only requires pointwise evaluations of the kernel. VS is thus naturally amenable to MCMC samplers.
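
    To illustrate why pointwise kernel evaluations suffice, here is a hedged Metropolis-Hastings sketch targeting a density proportional to $\det\big(k(x_i, x_j)\big)_{i,j}$ with respect to the uniform measure on $[0,1]^d$. The Gaussian kernel, the `lengthscale`, the single-node replacement proposal, and the function names are all assumptions made for illustration; the paper does not prescribe this particular sampler. Because the proposal redraws one node from the base measure, the acceptance ratio reduces to a ratio of two determinants.

```python
import numpy as np

def kernel_matrix(X, lengthscale=0.2):
    """Gaussian kernel matrix between the rows of X (assumed kernel choice)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * lengthscale ** 2))

def log_vs_density(X, lengthscale=0.2):
    """Unnormalized log-density of volume sampling: log det K(X, X)."""
    sign, logdet = np.linalg.slogdet(kernel_matrix(X, lengthscale))
    return logdet if sign > 0 else -np.inf

def mh_volume_sampling(k, d, n_steps, rng, lengthscale=0.2):
    """Metropolis-Hastings chain over k nodes in [0, 1]^d.

    Proposal: replace one uniformly chosen node by a fresh uniform draw.
    Acceptance: min(1, det K(proposal) / det K(current)).
    """
    X = rng.random((k, d))
    logp = log_vs_density(X, lengthscale)
    for _ in range(n_steps):
        proposal = X.copy()
        proposal[rng.integers(k)] = rng.random(d)
        logp_proposal = log_vs_density(proposal, lengthscale)
        if np.log(rng.random()) < logp_proposal - logp:
            X, logp = proposal, logp_proposal
    return X
```

    Only `kernel_matrix` touches the kernel, and it does so through pointwise evaluations; no eigendecomposition of the integration operator is needed.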

    Optimal Sublinear Sampling of Spanning Trees and Determinantal Point Processes via Average-Case Entropic Independence

    We design fast algorithms for repeatedly sampling from strongly Rayleigh distributions, which include random spanning tree distributions and determinantal point processes. For a graph $G=(V, E)$, we show how to approximately sample uniformly random spanning trees from $G$ in $\widetilde{O}(\lvert V\rvert)$ time per sample after an initial $\widetilde{O}(\lvert E\rvert)$ time preprocessing. For a determinantal point process on subsets of size $k$ of a ground set of $n$ elements, we show how to approximately sample in $\widetilde{O}(k^\omega)$ time after an initial $\widetilde{O}(nk^{\omega-1})$ time preprocessing, where $\omega<2.372864$ is the matrix multiplication exponent. We even improve the state of the art for obtaining a single sample from determinantal point processes, from the prior runtime of $\widetilde{O}(\min\{nk^2, n^\omega\})$ to $\widetilde{O}(nk^{\omega-1})$. In our main technical result, we achieve the optimal limit on domain sparsification for strongly Rayleigh distributions. In domain sparsification, sampling from a distribution $\mu$ on $\binom{[n]}{k}$ is reduced to sampling from related distributions on $\binom{[t]}{k}$ for $t\ll n$. We show that for strongly Rayleigh distributions, we can achieve the optimal $t=\widetilde{O}(k)$. Our reduction involves sampling from $\widetilde{O}(1)$ domain-sparsified distributions, all of which can be produced efficiently assuming convenient access to approximate overestimates for marginals of $\mu$. Having access to marginals is analogous to having access to the mean and covariance of a continuous distribution, or knowing "isotropy" for the distribution, the key assumption behind the Kannan-Lovász-Simonovits (KLS) conjecture and optimal samplers based on it. We view our result as a moral analog of the KLS conjecture and its consequences for sampling, for discrete strongly Rayleigh measures.