422 research outputs found

Learning with SGD and Random Features

Authors: Carratino Luigi, Rosasco Lorenzo, Rudi Alessandro
Publication date: 01/12/2018

Sketching and stochastic gradient methods are arguably the most common techniques to derive efficient large-scale learning algorithms. In this paper, we investigate their application in the context of nonparametric statistical learning. More precisely, we study the estimator defined by stochastic gradient with mini-batches and random features. The latter can be seen as a form of nonlinear sketching and used to define approximate kernel methods. The considered estimator is not explicitly penalized or constrained, and regularization is implicit. Indeed, our study highlights how different parameters, such as the number of features, the number of iterations, the step size, and the mini-batch size, control the learning properties of the solutions. We do this by deriving optimal finite-sample bounds under standard assumptions. The obtained results are corroborated and illustrated by numerical experiments.
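The estimator described in this abstract can be illustrated with a minimal sketch. It assumes random Fourier features for a Gaussian kernel and a squared loss; the function names, the toy data, and all parameter values below are hypothetical choices for illustration and are not taken from the paper. No penalty term appears in the update: the number of features, the step size, the mini-batch size, and the number of iterations are the only controls.

```python
import math
import random


def make_random_features(dim, num_features, bandwidth, rng):
    """Draw random Fourier features for a Gaussian kernel (an assumed choice)."""
    weights = [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(dim)]
               for _ in range(num_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def feature_map(x):
        return [scale * math.cos(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return feature_map


def sgd_random_features(xs, ys, feature_map, num_features,
                        step_size, batch_size, num_iters, rng):
    """Mini-batch SGD on the unpenalized least-squares loss over random features."""
    feats = [feature_map(x) for x in xs]  # precompute the features once
    coef = [0.0] * num_features
    n = len(xs)
    for _ in range(num_iters):
        batch = [rng.randrange(n) for _ in range(batch_size)]
        grad = [0.0] * num_features
        for i in batch:
            residual = sum(c * f for c, f in zip(coef, feats[i])) - ys[i]
            for j in range(num_features):
                grad[j] += residual * feats[i][j] / batch_size
        coef = [c - step_size * g for c, g in zip(coef, grad)]
    return coef


def predict(coef, feature_map, x):
    return sum(c * f for c, f in zip(coef, feature_map(x)))


# Toy regression problem (hypothetical data): noiseless samples of sin(3x).
rng = random.Random(0)
xs = [[rng.uniform(-1.0, 1.0)] for _ in range(200)]
ys = [math.sin(3.0 * x[0]) for x in xs]
phi = make_random_features(dim=1, num_features=50, bandwidth=0.5, rng=rng)
coef = sgd_random_features(xs, ys, phi, num_features=50, step_size=0.5,
                           batch_size=10, num_iters=2000, rng=rng)
mse = sum((predict(coef, phi, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print("training mean squared error:", mse)
```

Stopping after fewer iterations or using fewer features yields a smoother fit, which is one way to read the implicit regularization the abstract refers to.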

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Second-Order Kernel Online Convex Optimization with Adaptive Sketching

Authors: Calandriello Daniele, Lazaric Alessandro, Valko Michal
Publication date: 01/01/2017

Kernel online convex optimization (KOCO) is a framework combining the expressiveness of non-parametric kernel models with the regret guarantees of online learning. First-order KOCO methods such as functional gradient descent require only $\mathcal{O}(t)$ time and space per iteration, and, when the only information on the losses is their convexity, achieve a minimax optimal $\mathcal{O}(\sqrt{T})$ regret. Nonetheless, many common losses in kernel problems, such as squared loss, logistic loss, and squared hinge loss, possess stronger curvature that can be exploited. In this case, second-order KOCO methods achieve $\mathcal{O}(\log(\text{Det}(\boldsymbol{K})))$ regret, which we show scales as $\mathcal{O}(d_{\text{eff}}\log T)$, where $d_{\text{eff}}$ is the effective dimension of the problem and is usually much smaller than $\mathcal{O}(\sqrt{T})$. The main drawback of second-order methods is their much higher $\mathcal{O}(t^2)$ space and time complexity. In this paper, we introduce kernel online Newton step (KONS), a new second-order KOCO method that also achieves $\mathcal{O}(d_{\text{eff}}\log T)$ regret. To address the computational complexity of second-order methods, we introduce a new matrix sketching algorithm for the kernel matrix $\boldsymbol{K}_t$, and show that for a chosen parameter $\gamma \leq 1$ our Sketched-KONS reduces the space and time complexity by a factor of $\gamma^2$ to $\mathcal{O}(t^2\gamma^2)$ space and time per iteration, while incurring only $1/\gamma$ times more regret.
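To make the effective dimension in this abstract concrete, the sketch below computes it for a small kernel matrix. It assumes one common definition, $d_{\text{eff}}(\alpha) = \text{Tr}(\boldsymbol{K}(\boldsymbol{K} + \alpha \boldsymbol{I})^{-1})$; the abstract itself does not spell out the definition, and the Gaussian kernel, the value of $\alpha$, and the helper names are hypothetical choices for illustration.

```python
import math
import random


def gaussian_kernel_matrix(points, bandwidth):
    return [[math.exp(-((p - q) ** 2) / (2.0 * bandwidth ** 2)) for q in points]
            for p in points]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def effective_dimension(kernel, alpha):
    """Return Tr(K (K + alpha I)^{-1}), computed as n - alpha * Tr((K + alpha I)^{-1})."""
    n = len(kernel)
    shifted = [[kernel[i][j] + (alpha if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    trace_inv = 0.0
    for i in range(n):
        unit = [1.0 if j == i else 0.0 for j in range(n)]
        trace_inv += solve(shifted, unit)[i]  # i-th diagonal entry of the inverse
    return n - alpha * trace_inv


# Hypothetical data: 30 points in [0, 1] with a smooth Gaussian kernel.
rng = random.Random(0)
points = [rng.uniform(0.0, 1.0) for _ in range(30)]
kernel = gaussian_kernel_matrix(points, bandwidth=0.5)
d_eff = effective_dimension(kernel, alpha=1.0)
print("number of points:", len(points), "effective dimension:", d_eff)
```

For a smooth kernel the eigenvalues of $\boldsymbol{K}$ decay quickly, so the computed value stays far below the number of points, which is the regime where a $d_{\text{eff}}\log T$ bound is attractive.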

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret

Authors: Calandriello Daniele, Carratino Luigi, Lazaric Alessandro, Rosasco Lorenzo, Valko Michal
Publication date: 01/01/2019

Gaussian processes (GP) are a well-studied Bayesian approach for the optimization of black-box functions. Despite their effectiveness in simple problems, GP-based algorithms hardly scale to high-dimensional functions, as their per-iteration time and space cost is at least quadratic in the number of dimensions $d$ and iterations $t$. Given a set of $A$ alternatives to choose from, the overall runtime $O(t^3A)$ is prohibitive. In this paper we introduce BKB (budgeted kernelized bandit), a new approximate GP algorithm for optimization under bandit feedback that achieves near-optimal regret (and hence near-optimal convergence rate) with near-constant per-iteration complexity and, remarkably, no assumption on the input space or covariance of the GP. We combine a kernelized linear bandit algorithm (GP-UCB) with randomized matrix sketching based on leverage score sampling, and we prove that randomly sampling inducing points based on their posterior variance gives an accurate low-rank approximation of the GP, preserving variance estimates and confidence intervals. As a consequence, BKB does not suffer from variance starvation, an important problem faced by many previous sparse GP approximations. Moreover, we show that our procedure selects at most $\tilde{O}(d_{eff})$ points, where $d_{eff}$ is the effective dimension of the explored space, which is typically much smaller than both $d$ and $t$. This greatly reduces the dimensionality of the problem, thus leading to an $O(TAd_{eff}^2)$ runtime and $O(A d_{eff})$ space complexity.

Comment: Accepted at COLT 2019. Corrected typos and improved comparison with existing method
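The idea of sampling inducing points according to their posterior variance can be sketched as follows. This is a simplified illustration and not the BKB procedure itself: it uses the exact posterior variance rather than an approximation, and the kernel, noise level, oversampling factor, and function names are hypothetical. It relies on the identity $\sigma^2(x_i) = \lambda\,[\boldsymbol{K}(\boldsymbol{K} + \lambda \boldsymbol{I})^{-1}]_{ii}$, so that $\sigma^2(x_i)/\lambda$ is the ridge leverage score of point $i$.

```python
import math
import random


def rbf(x, y, bandwidth=0.3):
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def posterior_variances(points, noise):
    """Exact GP posterior variance at each observed point (no sketching)."""
    n = len(points)
    kernel = [[rbf(a, b) for b in points] for a in points]
    shifted = [[kernel[i][j] + (noise if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    variances = []
    for i in range(n):
        column = kernel[i]  # the kernel matrix is symmetric, so row i equals column i
        weights = solve(shifted, column)
        variances.append(kernel[i][i] - sum(k * w for k, w in zip(column, weights)))
    return variances


def sample_inducing_points(points, noise, oversampling, rng):
    """Keep each point independently with probability min(1, oversampling * variance / noise)."""
    variances = posterior_variances(points, noise)
    probabilities = [min(1.0, oversampling * v / noise) for v in variances]
    kept = [x for x, p in zip(points, probabilities) if rng.random() < p]
    return kept, probabilities


# Hypothetical data: a tight cluster of four points and one isolated point.
rng = random.Random(0)
points = [0.0, 0.01, 0.02, 0.03, 2.0]
kept, probabilities = sample_inducing_points(points, noise=0.1, oversampling=1.0, rng=rng)
print("inclusion probabilities:", probabilities)
```

Points in the cluster are nearly redundant and receive a low inclusion probability each, while the isolated point is kept with high probability. With an oversampling factor of 1 the probabilities sum to the effective dimension, which is consistent with the abstract's claim that the number of selected points scales with $d_{eff}$.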

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Reward Imputation with Sketching for Contextual Batched Bandits

Authors: Shao Ninglu, Si Zihua, Su Hanjing, Wang Wenhan, Wen Ji-Rong, Xu Jun, Zhang Xiao
Publication date: 07/10/2023

Contextual batched bandit (CBB) is a setting where a batch of rewards is observed from the environment at the end of each episode, but the rewards of the non-executed actions are unobserved, resulting in partial-information feedback. Existing approaches for CBB often ignore the rewards of the non-executed actions, leading to underutilization of feedback information. In this paper, we propose an efficient approach called Sketched Policy Updating with Imputed Rewards (SPUIR) that completes the unobserved rewards using sketching, which approximates the full-information feedback. We formulate reward imputation as an imputation-regularized ridge regression problem that captures the feedback mechanisms of both executed and non-executed actions. To reduce time complexity, we solve the regression problem using randomized sketching. We prove that our approach achieves an instantaneous regret with controllable bias and smaller variance than approaches without reward imputation. Furthermore, our approach enjoys a sublinear regret bound against the optimal policy. We also present two extensions, a rate-scheduled version and a version for nonlinear rewards, making our approach more practical. Experimental results show that SPUIR outperforms state-of-the-art baselines on synthetic, public benchmark, and real-world datasets.

Comment: Accepted by NeurIPS 202
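The step of solving a ridge regression problem with randomized sketching can be illustrated with a generic sketch-and-solve example. This is not SPUIR: the imputation regularizer and the bandit feedback model are omitted, and the Gaussian sketch, the toy data, and all names and parameter values are assumptions made for illustration. The sketch compresses the 200 observations to 40 rows before the small regression problem is solved.

```python
import math
import random


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def sketched_ridge(features, targets, sketch_size, ridge, rng):
    """Minimize ||S (X w - y)||^2 + ridge * ||w||^2 for a Gaussian sketch S."""
    n, d = len(features), len(features[0])
    sketched_x = [[0.0] * d for _ in range(sketch_size)]
    sketched_y = [0.0] * sketch_size
    scale = 1.0 / math.sqrt(sketch_size)  # entries of S have variance 1 / sketch_size
    for r in range(sketch_size):
        for i in range(n):
            s = rng.gauss(0.0, scale)
            sketched_y[r] += s * targets[i]
            for j in range(d):
                sketched_x[r][j] += s * features[i][j]
    # Normal equations of the sketched problem: (SX^T SX + ridge I) w = SX^T Sy.
    gram = [[sum(row[a] * row[b] for row in sketched_x) + (ridge if a == b else 0.0)
             for b in range(d)] for a in range(d)]
    moment = [sum(row[a] * y for row, y in zip(sketched_x, sketched_y))
              for a in range(d)]
    return solve(gram, moment)


# Hypothetical noiseless linear rewards with three context features.
rng = random.Random(0)
true_w = [1.0, -2.0, 0.5]
features = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(200)]
targets = [sum(w * x for w, x in zip(true_w, row)) for row in features]
w_hat = sketched_ridge(features, targets, sketch_size=40, ridge=1e-6, rng=rng)
print("estimated weights:", w_hat)
```

Because the toy rewards are noiseless and exactly linear, the sketched solution recovers the generating weights up to the small ridge bias; with noisy rewards the sketch size would trade accuracy against cost.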

arXiv.org e-Print Archive