Second-Order Kernel Online Convex Optimization with Adaptive Sketching
Kernel online convex optimization (KOCO) is a framework combining the
expressiveness of non-parametric kernel models with the regret guarantees of
online learning. First-order KOCO methods such as functional gradient descent
require only O(t) time and space per iteration, and, when the only
information on the losses is their convexity, achieve a minimax optimal
O(√T) regret. Nonetheless, many common losses in kernel
problems, such as the squared loss, logistic loss, and squared hinge loss, possess
stronger curvature that can be exploited. In this case, second-order KOCO
methods achieve O(log(det(K))) regret, which
we show scales as O(d_eff log T), where d_eff
is the effective dimension of the problem and is usually much smaller than
O(√T). The main drawback of second-order methods is their
much higher O(t^2) space and time complexity per iteration. In this paper, we
introduce kernel online Newton step (KONS), a new second-order KOCO method that
also achieves O(d_eff log T) regret. To address the
computational complexity of second-order methods, we introduce a new matrix
sketching algorithm for the kernel matrix K_t, and show that for
a chosen parameter γ our Sketched-KONS reduces the space and time
complexity by a factor of γ^2 to O(t^2 γ^2) space and
time per iteration, while incurring only 1/γ times more regret.
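To make the first-order baseline concrete, here is a minimal sketch (assuming an RBF kernel, squared loss, and a toy data stream; none of these choices comes from the paper) of functional gradient descent in dual form, where the predictor stores one coefficient per observed point and therefore pays O(t) time and space at step t:

import numpy as np

rng = np.random.default_rng(0)
T, d, sigma, eta = 500, 3, 1.0, 0.1   # rounds, input dim, RBF bandwidth, step size

def k_rbf(x, X, s=sigma):
    # RBF kernel between a single point x and a set of points X
    return np.exp(-((X - x) ** 2).sum(-1) / (2 * s * s))

support, alphas = [], []   # dual form: f(x) = sum_s alpha_s * k(x_s, x)
cum_loss = 0.0
for t in range(T):
    x = rng.uniform(-1, 1, size=d)
    y = np.sin(2 * x[0])                          # toy target
    pred = float(np.dot(alphas, k_rbf(x, np.array(support)))) if support else 0.0
    cum_loss += 0.5 * (pred - y) ** 2
    # The functional gradient of the squared loss is (pred - y) * k(x, .), so the
    # update appends one support point with coefficient -eta * (pred - y).
    support.append(x)
    alphas.append(-eta * (pred - y))

print(f"average loss: {cum_loss / T:.4f}, support size: {len(support)}")

KONS and Sketched-KONS replace this first-order update with a (sketched) second-order one over the kernel matrix K_t; the sketch above only illustrates why the per-step cost of the baseline grows with t.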
Sampling from a k-DPP without looking at all items
Determinantal point processes (DPPs) are a useful probabilistic model for
selecting a small diverse subset out of a large collection of items, with
applications in summarization, stochastic optimization, active learning and
more. Given a kernel function and a subset size k, our goal is to sample k
out of n items with probability proportional to the determinant of the kernel
matrix induced by the subset (a.k.a. k-DPP). Existing k-DPP sampling
algorithms require an expensive preprocessing step which involves multiple
passes over all n items, making it infeasible for large datasets. A naïve
heuristic addressing this problem is to uniformly subsample a fraction of the
data and perform k-DPP sampling only on those items; however, this method
offers no guarantee that the produced sample will even approximately resemble
the target distribution over the original dataset. In this paper, we develop α-DPP, an
algorithm which adaptively builds a sufficiently large uniform sample of data
that is then used to efficiently generate a smaller set of k items, while
ensuring that this set is drawn exactly from the target distribution defined on
all n items. We show empirically that our algorithm produces a k-DPP sample
after observing only a small fraction of all elements, leading to several
orders of magnitude faster performance compared to the state-of-the-art.
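As a toy illustration of the problem setup (not the paper's algorithm), the following sketch samples a k-DPP exactly by brute-force enumeration over a tiny ground set and then runs the naive uniformly-subsample-then-sample heuristic discussed above; the kernel, n, k and the subsample fraction are arbitrary choices for the demo:

import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, k = 12, 3
X = rng.normal(size=(n, 2))
L = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))   # RBF likelihood kernel

def kdpp_bruteforce(L, k, rng):
    # P(S) proportional to det(L_S) over all subsets of size k (only feasible for tiny n)
    subsets = list(combinations(range(L.shape[0]), k))
    weights = np.array([np.linalg.det(L[np.ix_(S, S)]) for S in subsets])
    return subsets[rng.choice(len(subsets), p=weights / weights.sum())]

print("exact k-DPP sample:", kdpp_bruteforce(L, k, rng))

# Naive heuristic: uniformly keep half the items, then sample a k-DPP there.
# Cheap, but the induced distribution over subsets of the full ground set is
# not the k-DPP defined on all n items, which is the gap the paper closes.
keep = rng.choice(n, size=n // 2, replace=False)
sub = kdpp_bruteforce(L[np.ix_(keep, keep)], k, rng)
print("heuristic sample  :", tuple(keep[list(sub)]))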
Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret
Gaussian processes (GP) are a well-studied Bayesian approach for the
optimization of black-box functions. Despite their effectiveness in simple
problems, GP-based algorithms hardly scale to high-dimensional functions, as
their per-iteration time and space cost is at least quadratic in the number of
dimensions d and iterations t. Given a set of A alternatives to choose
from, the overall runtime is prohibitive. In this paper, we introduce
BKB (budgeted kernelized bandit), a new approximate GP algorithm for
optimization under bandit feedback that achieves near-optimal regret (and hence
near-optimal convergence rate) with near-constant per-iteration complexity and
remarkably no assumption on the input space or covariance of the GP.
We combine a kernelized linear bandit algorithm (GP-UCB) with randomized
matrix sketching based on leverage score sampling, and we prove that randomly
sampling inducing points based on their posterior variance gives an accurate
low-rank approximation of the GP, preserving variance estimates and confidence
intervals. As a consequence, BKB does not suffer from variance starvation, an
important problem faced by many previous sparse GP approximations. Moreover, we
show that our procedure selects at most Õ(d_eff) points, where d_eff
is the effective dimension of the explored space, which is typically
much smaller than both d and t. This greatly reduces the dimensionality of
the problem and, with it, the per-iteration runtime and space complexity.
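The following is a rough sketch of the idea, not the paper's BKB (whose inclusion probabilities, confidence widths and approximation guarantees are considerably more careful): GP-UCB scores are computed from a Nyström approximation whose inducing points are resampled among the pulled arms with probability proportional to the approximate posterior variance. The kernel, noise level lam, exploration parameter beta and oversampling factor q below are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
A, d, T, lam, beta, q = 100, 3, 60, 0.1, 2.0, 2.0
arms = rng.uniform(-1, 1, size=(A, d))
f_true = np.sin(3 * arms[:, 0]) - 0.5 * arms[:, 1]       # unknown reward function

def rbf(X, Z, s=0.5):
    return np.exp(-((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1) / (2 * s * s))

def nystrom_features(X, Z):
    # phi(x) = K_ZZ^{-1/2} k_Z(x); near-zero eigenvalues are dropped for stability
    evals, evecs = np.linalg.eigh(rbf(Z, Z))
    safe = np.where(evals > 1e-8, evals, np.inf)
    return rbf(X, Z) @ (evecs @ np.diag(1.0 / np.sqrt(safe)) @ evecs.T)

pulled, rewards = [], []
S = [int(rng.integers(A))]                                # initial inducing arm
for t in range(T):
    Phi_arms = nystrom_features(arms, arms[S])            # features of all candidate arms
    Phi = Phi_arms[pulled]                                # features of the arms pulled so far
    Ainv = np.linalg.inv(Phi.T @ Phi + lam * np.eye(len(S)))
    mu = Phi_arms @ (Ainv @ (Phi.T @ np.array(rewards)))  # approximate posterior mean
    var = np.maximum(lam * np.einsum("ij,jk,ik->i", Phi_arms, Ainv, Phi_arms), 0.0)
    i = int(np.argmax(mu + beta * np.sqrt(var)))          # optimistic (UCB) choice
    pulled.append(i)
    rewards.append(f_true[i] + 0.1 * rng.normal())
    # Resample the inducing set among pulled arms, keeping arm j w.p. min(1, q * var_j)
    S = [j for j in set(pulled) if rng.random() < min(1.0, q * var[j])] or [i]

best = pulled[int(np.argmax(rewards))]
print("best arm found:", best, "true value:", round(float(f_true[best]), 3))

In the actual algorithm the approximate posterior is also proven to preserve confidence intervals, which is what keeps the regret near-optimal; the toy loop above only conveys the variance-proportional resampling idea.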
Sampling from a k-DPP without looking at all items
Determinantal point processes (DPPs) are a useful probabilistic model for selecting a small diverse subset out of a large collection of items, with applications in summarization, stochastic optimization, active learning and more. Given a kernel function and a subset size k, our goal is to sample k out of n items with probability proportional to the determinant of the kernel matrix induced by the subset (a.k.a. k-DPP). Existing k-DPP sampling algorithms require an expensive preprocessing step which involves multiple passes over all n items, making it infeasible for large datasets. A naïve heuristic addressing this problem is to uniformly subsample a fraction of the data and perform k-DPP sampling only on those items, however this method offers no guarantee that the produced sample will even approximately resemble the target distribution over the original dataset. In this paper, we develop α-DPP, an algorithm which adaptively builds a sufficiently large uniform sample of data that is then used to efficiently generate a smaller set of k items, while ensuring that this set is drawn exactly from the target distribution defined on all n items. We show empirically that our algorithm produces a k-DPP sample after observing only a small fraction of all elements, leading to several orders of magnitude faster performance compared to the state-of-the-art. Our implementation of α-DPP is provided at https://github.com/guilgautier/DPPy/
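A minimal usage sketch against the linked DPPy library: the class and method names below (FiniteDPP, sample_exact_k_dpp) are taken from DPPy's documented finite-DPP interface, are assumptions that may differ across versions, and the α-DPP sampler described in the abstract may be exposed through a different entry point:

import numpy as np
from dppy.finite_dpps import FiniteDPP   # assumes the DPPy package is installed

rng = np.random.RandomState(0)
n, k = 100, 10
X = rng.randn(n, 5)
L = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))   # likelihood kernel

dpp = FiniteDPP('likelihood', **{'L': L})       # DPP defined through its likelihood kernel L
dpp.sample_exact_k_dpp(size=k, random_state=rng)
print(dpp.list_of_samples[-1])                  # indices of the k sampled items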
Large-scale semi-supervised learning with online spectral graph sparsification
We introduce Sparse-HFS, a scalable algorithm that can compute solutions to semi-supervised learning (SSL) problems using only O(n polylog(n)) space and O(m polylog(n)) time.
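For context, the harmonic function solution (HFS) underlying Sparse-HFS can be written down directly on a small graph. The sketch below uses a dense Laplacian solve on toy data, whereas Sparse-HFS keeps only a sparsified approximation of the graph to stay within O(n polylog(n)) space; the graph construction, labels and labeled/unlabeled split are illustrative choices:

import numpy as np

rng = np.random.default_rng(0)
n = 40
X = np.vstack([rng.normal(-2, 1, (n // 2, 2)), rng.normal(2, 1, (n // 2, 2))])
W = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))   # similarity graph
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(1)) - W                                           # graph Laplacian

labeled = np.array([0, n - 1])                 # one labeled node per cluster
y = np.array([-1.0, 1.0])
unlabeled = np.setdiff1d(np.arange(n), labeled)

# Harmonic solution: f_u = -L_uu^{-1} L_ul y_l
f_u = np.linalg.solve(L[np.ix_(unlabeled, unlabeled)],
                      -L[np.ix_(unlabeled, labeled)] @ y)
print("agreement with true clusters:",
      float(np.mean(np.sign(f_u) == np.sign(X[unlabeled, 0]))))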
Efficient second-order online kernel learning with adaptive embedding
Online kernel learning (OKL) is a flexible framework for approaching prediction problems, since the large approximation space provided by reproducing kernel Hilbert spaces can contain an accurate function for the problem. Nonetheless, optimizing over this space is computationally expensive. Not only do first-order methods accumulate O(√T) more loss than the optimal function, but the curse of kernelization results in an O(t) per-step complexity. Second-order methods get closer to the optimum much faster, suffering only O(log T) regret, but second-order updates are even more expensive, with an O(t^2) per-step cost. Existing approximate OKL methods try to reduce this complexity either by limiting the support vectors (SV) introduced in the predictor, or by avoiding the kernelization process altogether using embeddings. Nonetheless, as long as the size of the approximation space or the number of SVs does not grow over time, an adversary can always exploit the approximation process. In this paper, we propose PROS-N-KONS, a method that combines Nyström sketching, to project the input points into a small, accurate embedded space, with efficient second-order updates performed in this space. The embedded space is continuously updated to guarantee that the embedding remains accurate, and we show that the per-step cost grows only with the effective dimension of the problem and not with T. Moreover, the second-order updates allow us to achieve logarithmic regret. We empirically compare our algorithm on recent large-scale benchmarks and show that it performs favorably.
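A simplified sketch in the spirit of PROS-N-KONS, not the paper's algorithm (here the Nyström dictionary is fixed in advance rather than adaptively grown, and there are no restarts): points are embedded with a Nyström dictionary and a second-order, Newton-style update is run in the embedded space, so each step costs O(m^2) in the embedding size m rather than O(t^2). The dictionary, kernel bandwidth and regularizer are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
T, d, m, lam = 2000, 4, 50, 1.0   # rounds, input dim, dictionary size, regularizer

def k_rbf(X, Z, s=1.0):
    return np.exp(-((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1) / (2 * s * s))

Z = rng.uniform(-1, 1, size=(m, d))                       # fixed Nystrom dictionary
evals, evecs = np.linalg.eigh(k_rbf(Z, Z))
safe = np.where(evals > 1e-8, evals, np.inf)
emb = evecs @ np.diag(1.0 / np.sqrt(safe)) @ evecs.T      # K_ZZ^{-1/2}

w = np.zeros(m)
A_inv = np.eye(m) / lam                                   # (lam*I + sum_s g_s g_s^T)^{-1}
cum_loss = 0.0
for t in range(T):
    x = rng.uniform(-1, 1, size=(1, d))
    y = np.sin(2 * x[0, 0]) * np.cos(x[0, 1])             # toy target
    phi = (k_rbf(x, Z) @ emb)[0]                          # Nystrom embedding of x
    pred = w @ phi
    cum_loss += 0.5 * (pred - y) ** 2
    g = (pred - y) * phi                                  # gradient of the squared loss
    Ag = A_inv @ g                                        # Sherman-Morrison update, O(m^2)
    A_inv -= np.outer(Ag, Ag) / (1.0 + g @ Ag)
    w -= A_inv @ g                                        # Newton-style second-order step

print(f"average squared loss over {T} rounds: {cum_loss / T:.4f}")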