36 research outputs found

    Second-Order Kernel Online Convex Optimization with Adaptive Sketching

    Kernel online convex optimization (KOCO) is a framework combining the expressiveness of non-parametric kernel models with the regret guarantees of online learning. First-order KOCO methods such as functional gradient descent require only $\mathcal{O}(t)$ time and space per iteration, and, when the only information on the losses is their convexity, achieve a minimax-optimal $\mathcal{O}(\sqrt{T})$ regret. Nonetheless, many common losses in kernel problems, such as the squared loss, logistic loss, and squared hinge loss, possess stronger curvature that can be exploited. In this case, second-order KOCO methods achieve $\mathcal{O}(\log(\text{Det}(\boldsymbol{K})))$ regret, which we show scales as $\mathcal{O}(d_{\text{eff}}\log T)$, where $d_{\text{eff}}$ is the effective dimension of the problem and is usually much smaller than $\mathcal{O}(\sqrt{T})$. The main drawback of second-order methods is their much higher $\mathcal{O}(t^2)$ space and time complexity. In this paper, we introduce kernel online Newton step (KONS), a new second-order KOCO method that also achieves $\mathcal{O}(d_{\text{eff}}\log T)$ regret. To address the computational complexity of second-order methods, we introduce a new matrix sketching algorithm for the kernel matrix $\boldsymbol{K}_t$, and show that for a chosen parameter $\gamma \leq 1$ our Sketched-KONS reduces the space and time complexity by a factor of $\gamma^2$ to $\mathcal{O}(t^2\gamma^2)$ space and time per iteration, while incurring only $1/\gamma$ times more regret.
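
    As a rough illustration of the quantity driving the second-order bound, the sketch below computes the effective dimension $d_{\text{eff}} = \text{tr}(\boldsymbol{K}(\boldsymbol{K} + \lambda I)^{-1})$ for a synthetic RBF kernel and compares it to $\sqrt{T}$. This is not the paper's KONS implementation; the kernel choice, bandwidth, and regularization $\lambda$ are illustrative assumptions.

        # Minimal sketch: compare d_eff = tr(K (K + lam*I)^{-1}) with sqrt(T)
        # on a synthetic RBF kernel matrix (kernel, bandwidth and lam are assumptions).
        import numpy as np

        def rbf_kernel(X, bandwidth=1.0):
            # Pairwise squared distances -> Gaussian (RBF) kernel matrix.
            sq = np.sum(X**2, axis=1)
            d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
            return np.exp(-d2 / (2 * bandwidth**2))

        def effective_dimension(K, lam=1.0):
            # d_eff = tr(K (K + lam*I)^{-1}); small when the eigenvalues of K decay fast.
            return np.trace(K @ np.linalg.inv(K + lam * np.eye(K.shape[0])))

        rng = np.random.default_rng(0)
        for T in (100, 400, 1600):
            X = rng.normal(size=(T, 1))
            K = rbf_kernel(X)
            print(f"T={T:5d}  sqrt(T)={np.sqrt(T):7.1f}  d_eff={effective_dimension(K):7.1f}")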

    Sampling from a kk-DPP without looking at all items

    Determinantal point processes (DPPs) are a useful probabilistic model for selecting a small diverse subset out of a large collection of items, with applications in summarization, stochastic optimization, active learning and more. Given a kernel function and a subset size $k$, our goal is to sample $k$ out of $n$ items with probability proportional to the determinant of the kernel matrix induced by the subset (a.k.a. $k$-DPP). Existing $k$-DPP sampling algorithms require an expensive preprocessing step which involves multiple passes over all $n$ items, making it infeasible for large datasets. A naïve heuristic addressing this problem is to uniformly subsample a fraction of the data and perform $k$-DPP sampling only on those items; however, this method offers no guarantee that the produced sample will even approximately resemble the target distribution over the original dataset. In this paper, we develop an algorithm which adaptively builds a sufficiently large uniform sample of data that is then used to efficiently generate a smaller set of $k$ items, while ensuring that this set is drawn exactly from the target distribution defined on all $n$ items. We show empirically that our algorithm produces a $k$-DPP sample after observing only a small fraction of all elements, leading to several orders of magnitude faster performance compared to the state-of-the-art.
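
    For concreteness, the target distribution can be written down directly when $n$ is tiny: the sketch below is a brute-force reference sampler (not the paper's algorithm) that enumerates all $\binom{n}{k}$ subsets and samples one with probability proportional to the determinant of the induced kernel submatrix. The RBF kernel and the problem sizes are illustrative assumptions.

        # Brute-force reference k-DPP sampler for tiny n (NOT the paper's method):
        # P(S) is proportional to det(L[S, S]) over all subsets S of size k.
        import itertools
        import numpy as np

        def brute_force_k_dpp(L, k, rng):
            n = L.shape[0]
            subsets = list(itertools.combinations(range(n), k))
            # Unnormalized probability of each subset = principal minor det(L[S, S]).
            weights = np.array([np.linalg.det(L[np.ix_(S, S)]) for S in subsets])
            probs = weights / weights.sum()
            return subsets[rng.choice(len(subsets), p=probs)]

        rng = np.random.default_rng(0)
        X = rng.normal(size=(8, 2))   # 8 items with 2-D features (toy data)
        L = np.exp(-np.linalg.norm(X[:, None] - X[None, :], axis=-1) ** 2)  # RBF likelihood kernel
        print(brute_force_k_dpp(L, k=3, rng=rng))

    Enumerating all subsets is only feasible for very small $n$, which is exactly the scalability gap the paper's adaptive sampling scheme addresses.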

    Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret

    Gaussian processes (GP) are a well-studied Bayesian approach for the optimization of black-box functions. Despite their effectiveness in simple problems, GP-based algorithms hardly scale to high-dimensional functions, as their per-iteration time and space cost is at least quadratic in the number of dimensions $d$ and iterations $t$. Given a set of $A$ alternatives to choose from, the overall runtime $O(t^3 A)$ is prohibitive. In this paper we introduce BKB (budgeted kernelized bandit), a new approximate GP algorithm for optimization under bandit feedback that achieves near-optimal regret (and hence near-optimal convergence rate) with near-constant per-iteration complexity and, remarkably, no assumption on the input space or covariance of the GP. We combine a kernelized linear bandit algorithm (GP-UCB) with randomized matrix sketching based on leverage score sampling, and we prove that randomly sampling inducing points based on their posterior variance gives an accurate low-rank approximation of the GP, preserving variance estimates and confidence intervals. As a consequence, BKB does not suffer from variance starvation, an important problem faced by many previous sparse GP approximations. Moreover, we show that our procedure selects at most $\tilde{O}(d_{\text{eff}})$ points, where $d_{\text{eff}}$ is the effective dimension of the explored space, which is typically much smaller than both $d$ and $t$. This greatly reduces the dimensionality of the problem, thus leading to an $O(TAd_{\text{eff}}^2)$ runtime and $O(Ad_{\text{eff}})$ space complexity.
    Comment: Accepted at COLT 2019. Corrected typos and improved comparison with existing methods.
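
    As a rough sketch of the sampling step described above (keeping inducing points in proportion to their posterior variance, i.e. their ridge leverage scores), the snippet below computes $\tau_i = [\boldsymbol{K}(\boldsymbol{K}+\lambda I)^{-1}]_{ii}$ exactly and keeps each point with probability $\min(1, q\,\tau_i)$. The batch (non-incremental) computation, the kernel, and the constants $q$, $\lambda$ are illustrative assumptions, not BKB's actual incremental procedure.

        # Illustrative sketch (not BKB itself): keep candidate point i with
        # probability min(1, q * tau_i), where tau_i = [K (K + lam*I)^{-1}]_{ii}
        # is its ridge leverage score (proportional to GP posterior variance
        # at noise level lam). q, lam and the kernel are assumptions.
        import numpy as np

        def ridge_leverage_scores(K, lam):
            return np.diag(K @ np.linalg.inv(K + lam * np.eye(K.shape[0])))

        def sample_inducing_points(K, lam=1.0, q=5.0, rng=None):
            rng = rng or np.random.default_rng()
            tau = ridge_leverage_scores(K, lam)
            keep = rng.random(K.shape[0]) < np.minimum(1.0, q * tau)
            return np.flatnonzero(keep)

        rng = np.random.default_rng(0)
        X = rng.uniform(-1, 1, size=(500, 3))   # 500 candidate points in 3-D
        K = np.exp(-0.5 * np.sum((X[:, None] - X[None, :])**2, axis=-1))
        S = sample_inducing_points(K, rng=rng)
        print(len(S), "inducing points selected out of", K.shape[0])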

    Sampling from a k-DPP without looking at all items

    Determinantal point processes (DPPs) are a useful probabilistic model for selecting a small diverse subset out of a large collection of items, with applications in summarization, stochastic optimization, active learning and more. Given a kernel function and a subset size k, our goal is to sample k out of n items with probability proportional to the determinant of the kernel matrix induced by the subset (a.k.a. k-DPP). Existing k-DPP sampling algorithms require an expensive preprocessing step which involves multiple passes over all n items, making it infeasible for large datasets. A naïve heuristic addressing this problem is to uniformly subsample a fraction of the data and perform k-DPP sampling only on those items; however, this method offers no guarantee that the produced sample will even approximately resemble the target distribution over the original dataset. In this paper, we develop α-DPP, an algorithm which adaptively builds a sufficiently large uniform sample of data that is then used to efficiently generate a smaller set of k items, while ensuring that this set is drawn exactly from the target distribution defined on all n items. We show empirically that our algorithm produces a k-DPP sample after observing only a small fraction of all elements, leading to several orders of magnitude faster performance compared to the state-of-the-art. Our implementation of α-DPP is provided at https://github.com/guilgautier/DPPy/
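
    Since the abstract points to an implementation inside DPPy, a usage sketch along the following lines should apply. The `FiniteDPP('likelihood', L=...)` constructor and `sample_exact_k_dpp(size=k)` call follow the DPPy documentation as best understood here and are assumptions, not details taken from the abstract; the RBF likelihood kernel is likewise illustrative.

        # Sketch of sampling a k-DPP with the DPPy package (pip install dppy).
        # The FiniteDPP / sample_exact_k_dpp interface is assumed from the DPPy
        # docs; the alpha-DPP sampler referenced above lives in the same repository.
        import numpy as np
        from dppy.finite_dpps import FiniteDPP

        rng = np.random.default_rng(0)
        X = rng.normal(size=(100, 5))
        # Likelihood kernel L: RBF similarities between the 100 items.
        L = np.exp(-0.5 * np.sum((X[:, None] - X[None, :])**2, axis=-1))

        dpp = FiniteDPP('likelihood', L=L)
        sample = dpp.sample_exact_k_dpp(size=10)   # indices of 10 diverse items
        print(sorted(sample))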

    Large-scale semi-supervised learning with online spectral graph sparsification

    We introduce Sparse-HFS, a scalable algorithm that can compute solutions to semi-supervised learning (SSL) problems using only $O(n\,\text{polylog}(n))$ space and $O(m\,\text{polylog}(n))$ time.
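
    For context on what the solver computes, the harmonic function solution (HFS) propagates known labels over a graph by solving a Laplacian linear system. The dense baseline below shows that computation without the paper's online sparsification; the toy path graph and labels are illustrative assumptions.

        # Minimal harmonic-function SSL baseline (dense, NOT Sparse-HFS):
        # solve L_uu f_u = -L_ul y_l for the unlabeled nodes, where L is the
        # graph Laplacian. The toy graph and labels are assumptions.
        import numpy as np

        # Adjacency of a small path graph 0-1-2-3-4; nodes 0 and 4 are labeled.
        W = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
        L = np.diag(W.sum(axis=1)) - W                 # graph Laplacian

        labeled, unlabeled = [0, 4], [1, 2, 3]
        y_l = np.array([+1.0, -1.0])                   # labels of nodes 0 and 4

        # Harmonic solution: each interior value is the average of its neighbors.
        f_u = np.linalg.solve(L[np.ix_(unlabeled, unlabeled)],
                              -L[np.ix_(unlabeled, labeled)] @ y_l)
        print(f_u)   # interpolates between +1 and -1: [0.5, 0.0, -0.5]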

    Efficient second-order online kernel learning with adaptive embedding

    Online kernel learning (OKL) is a flexible framework to approach prediction problems, since the large approximation space provided by reproducing kernel Hilbert spaces can contain an accurate function for the problem. Nonetheless, optimizing over this space is computationally expensive. Not only do first-order methods accumulate $O(\sqrt{T})$ more loss than the optimal function, but the curse of kernelization also results in an $O(t)$ per-step complexity. Second-order methods get closer to the optimum much faster, suffering only $O(\log T)$ regret, but second-order updates are even more expensive, with an $O(t^2)$ per-step cost. Existing approximate OKL methods try to reduce this complexity either by limiting the Support Vectors (SV) introduced in the predictor, or by avoiding the kernelization process altogether using an embedding. Nonetheless, as long as the size of the approximation space or the number of SV does not grow over time, an adversary can always exploit the approximation process. In this paper, we propose PROS-N-KONS, a method that combines Nyström sketching, to project the input point into a small, accurate embedded space, with efficient second-order updates performed in this space. The embedded space is continuously updated to guarantee that the embedding remains accurate, and we show that the per-step cost only grows with the effective dimension of the problem and not with $T$. Moreover, the second-order updates allow us to achieve logarithmic regret. We empirically compare our algorithm on recent large-scale benchmarks and show that it performs favorably.
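
    A rough sketch of the two ingredients, a Nyström embedding followed by a second-order update in the embedded space, is given below. The fixed dictionary, RBF kernel, toy regression target, and the simple online-Newton-style update are illustrative assumptions; they do not reproduce PROS-N-KONS's adaptive dictionary or its exact update rule.

        # Illustrative sketch (not PROS-N-KONS): embed each input with a fixed
        # Nystrom dictionary, then run a second-order (online-Newton-style) update
        # on squared loss in the embedded space. Dictionary, kernel and target
        # are assumptions; the paper updates the dictionary adaptively.
        import numpy as np

        def rbf(U, V, bw=1.0):
            d2 = np.sum(U**2, 1)[:, None] + np.sum(V**2, 1)[None, :] - 2 * U @ V.T
            return np.exp(-d2 / (2 * bw**2))

        rng = np.random.default_rng(0)
        D = rng.normal(size=(20, 2))                  # fixed Nystrom dictionary (20 points)
        eigval, eigvec = np.linalg.eigh(rbf(D, D) + 1e-6 * np.eye(len(D)))
        K_inv_sqrt = eigvec @ np.diag(eigval**-0.5) @ eigvec.T  # whitening: z(x) = K_DD^{-1/2} k_D(x)

        m = len(D)
        w = np.zeros(m)                               # weights in the embedded space
        A = np.eye(m)                                 # accumulated curvature (Hessian proxy)
        for t in range(1000):                         # online regression with squared loss
            x = rng.normal(size=(1, 2))
            y = np.sin(x.sum())                       # toy target
            z = K_inv_sqrt @ rbf(D, x).ravel()        # embedded point, shape (m,)
            g = (w @ z - y) * z                       # gradient of 0.5 * (w @ z - y)**2
            A += np.outer(z, z)                       # second-order information
            w -= np.linalg.solve(A, g)                # Newton-style step

        x_test = rng.normal(size=(1, 2))
        z_test = K_inv_sqrt @ rbf(D, x_test).ravel()
        print("prediction:", w @ z_test, "target:", np.sin(x_test.sum()))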