
Second-Order Kernel Online Convex Optimization with Adaptive Sketching

Author: Calandriello Daniele
Lazaric Alessandro
Valko Michal
Publication date: 01/01/2017

Kernel online convex optimization (KOCO) is a framework combining the expressiveness of non-parametric kernel models with the regret guarantees of online learning. First-order KOCO methods such as functional gradient descent require only $\mathcal{O}(t)$ time and space per iteration, and, when the only information on the losses is their convexity, achieve a minimax optimal $\mathcal{O}(\sqrt{T})$ regret. Nonetheless, many common losses in kernel problems, such as squared loss, logistic loss, and squared hinge loss, possess stronger curvature that can be exploited. In this case, second-order KOCO methods achieve $\mathcal{O}(\log(\text{Det}(\boldsymbol{K})))$ regret, which we show scales as $\mathcal{O}(d_{\text{eff}}\log T)$, where $d_{\text{eff}}$ is the effective dimension of the problem and is usually much smaller than $\mathcal{O}(\sqrt{T})$. The main drawback of second-order methods is their much higher $\mathcal{O}(t^2)$ space and time complexity. In this paper, we introduce kernel online Newton step (KONS), a new second-order KOCO method that also achieves $\mathcal{O}(d_{\text{eff}}\log T)$ regret. To address the computational complexity of second-order methods, we introduce a new matrix sketching algorithm for the kernel matrix $\boldsymbol{K}_t$, and show that for a chosen parameter $\gamma \leq 1$ our Sketched-KONS reduces the space and time complexity by a factor of $\gamma^2$ to $\mathcal{O}(t^2\gamma^2)$ space and time per iteration, while incurring only $1/\gamma$ times more regret.
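The first-order baseline mentioned in the abstract, functional gradient descent, can be sketched as follows. This is a minimal illustration and not the paper's KONS or Sketched-KONS algorithm: the squared loss, the Gaussian kernel, the fixed step size, and the names `gaussian_kernel` and `FunctionalGD` are all assumptions chosen for the example. It shows why the per-iteration cost is $\mathcal{O}(t)$: every round stores one more support point, and a prediction sums over all stored points.

```python
import math


def gaussian_kernel(x, z, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


class FunctionalGD:
    """Unregularized functional gradient descent for the squared loss.

    The predictor is f_t(x) = sum_s coeffs[s] * k(points[s], x), so both
    memory and prediction time grow linearly with the round index t.
    """

    def __init__(self, kernel, step_size):
        self.kernel = kernel
        self.step_size = step_size
        self.points = []   # support points x_1, ..., x_t
        self.coeffs = []   # one coefficient per support point

    def predict(self, x):
        # O(t) kernel evaluations per prediction.
        return sum(c * self.kernel(p, x) for p, c in zip(self.points, self.coeffs))

    def update(self, x, y):
        """Play one round: predict, suffer the loss, take a gradient step."""
        prediction = self.predict(x)
        loss = 0.5 * (prediction - y) ** 2
        # The functional gradient of the loss at f is (prediction - y) * k(x, .).
        self.points.append(x)
        self.coeffs.append(-self.step_size * (prediction - y))
        return loss


def run_stream(num_rounds=200, step_size=0.5):
    """Feed a deterministic stream y = sin(x) and return (learner, per-round losses)."""
    learner = FunctionalGD(lambda a, b: gaussian_kernel(a, b, bandwidth=0.5), step_size)
    losses = []
    for t in range(num_rounds):
        x = ((t * 0.6180339887) % 1.0) * 2.0 * math.pi  # quasi-uniform points in [0, 2*pi)
        losses.append(learner.update((x,), math.sin(x)))
    return learner, losses
```

Calling `run_stream()` returns the learner and the loss suffered in each round. The number of stored coefficients equals the number of rounds played, which is the linear growth that second-order methods trade for quadratic cost.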

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Online Learning of Noisy Data with Kernels

Author: Cesa-Bianchi Nicolò
Shalev-Shwartz Shai
Shamir Ohad
Publication date: 01/01/2010

We study online learning when individual instances are corrupted by adversarially chosen random noise. We assume the noise distribution is unknown, and may change over time with no restriction other than having zero mean and bounded variance. Our technique relies on a family of unbiased estimators for non-linear functions, which may be of independent interest. We show that a variant of online gradient descent can learn functions in any dot-product (e.g., polynomial) or Gaussian kernel space with any analytic convex loss function. Our variant uses randomized estimates that need to query a random number of noisy copies of each instance, where with high probability this number is upper bounded by a constant. Allowing such multiple queries cannot be avoided: indeed, we show that online learning is in general impossible when only one noisy copy of each instance can be accessed. Comment: This is a full version of the paper appearing in the 23rd International Conference on Learning Theory (COLT 2010).
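A small simulation can illustrate why a single noisy copy is not enough for non-linear functions. This is only a toy case, not the estimator family from the paper: it assumes a scalar instance, Gaussian zero-mean noise, and the target function $x^2$, and all function names are hypothetical. Squaring one noisy copy is biased upward by the noise variance, while the product of two independent noisy copies is unbiased.

```python
import random


def noisy_copy(x, sigma, rng):
    """Return x corrupted by zero-mean Gaussian noise with standard deviation sigma."""
    return x + rng.gauss(0.0, sigma)


def one_copy_square(x, sigma, rng):
    """Square a single noisy copy: expectation is x**2 + sigma**2 (biased)."""
    copy = noisy_copy(x, sigma, rng)
    return copy * copy


def two_copy_square(x, sigma, rng):
    """Multiply two independent noisy copies: expectation is x**2 (unbiased)."""
    return noisy_copy(x, sigma, rng) * noisy_copy(x, sigma, rng)


def monte_carlo_mean(estimator, x, sigma, num_samples, seed=0):
    """Average an estimator over num_samples independent draws."""
    rng = random.Random(seed)
    total = sum(estimator(x, sigma, rng) for _ in range(num_samples))
    return total / num_samples
```

For example, with `x = 1.5` and `sigma = 1.0`, the average of `two_copy_square` should settle near `x**2 = 2.25`, while the average of `one_copy_square` should settle near `x**2 + sigma**2 = 3.25`.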

arXiv.org e-Print Archive

CiteSeerX

AIR Universita degli studi di Milano

Simultaneous Model Selection and Optimization through Parameter-free Stochastic Learning

Author: Orabona Francesco
Publication date: 15/06/2014

Stochastic gradient descent algorithms for training linear and kernel predictors are gaining more and more importance, thanks to their scalability. While various methods have been proposed to speed up their convergence, the model selection phase is often ignored. In fact, theoretical works usually make assumptions, for example on prior knowledge of the norm of the optimal solution, while in practice validation methods remain the only viable approach. In this paper, we propose a new kernel-based stochastic gradient descent algorithm that performs model selection while training, with no parameters to tune and no form of cross-validation. The algorithm builds on recent advances in online learning theory for unconstrained settings to estimate over time the right regularization in a data-dependent way. Optimal rates of convergence are proved under standard smoothness assumptions on the target function, using the range space of the fractional integral operator associated with the kernel.

arXiv.org e-Print Archive

CiteSeerX

Multiclass Learning with Simplex Coding

Author: Mroueh Youssef
Poggio Tomaso
Rosasco Lorenzo
Slotine Jean-Jacques
Publication date: 01/01/2012

In this paper we discuss a novel framework for multiclass learning, defined by a suitable coding/decoding strategy, namely the simplex coding, that allows one to generalize to multiple classes a relaxation approach commonly used in binary classification. In this framework, a relaxation error analysis can be developed avoiding constraints on the considered hypothesis class. Moreover, we show that in this setting it is possible to derive the first provably consistent regularized method with training/tuning complexity that is independent of the number of classes. Tools from convex analysis are introduced that can be used beyond the scope of this paper.
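The coding/decoding idea can be made concrete with a short sketch. The construction below is an assumption made for illustration and is not taken from the abstract: each of the $K$ classes gets a unit-norm code vector, any two codes have inner product $-1/(K-1)$, and a score vector is decoded to the class whose code has the largest inner product with it. For simplicity the codes are written in $\mathbb{R}^K$ as centered and rescaled basis vectors, which span a $(K-1)$-dimensional subspace; the names `simplex_codes` and `decode` are hypothetical.

```python
import math


def simplex_codes(num_classes):
    """Return num_classes unit vectors with pairwise inner product -1/(num_classes - 1).

    Code i is the i-th standard basis vector, centered to sum to zero and
    rescaled to unit norm. Requires num_classes >= 2.
    """
    k = num_classes
    if k < 2:
        raise ValueError("need at least two classes")
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def decode(score, codes):
    """Map a score vector to the class whose code vector is most aligned with it."""
    return max(range(len(codes)), key=lambda c: dot(codes[c], score))
```

With `codes = simplex_codes(4)`, calling `decode(codes[2], codes)` returns class `2`, since every code is closest to itself. A vector-valued predictor trained to regress onto these codes would be decoded in the same way.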

arXiv.org e-Print Archive

DSpace@MIT

Archivio istituzionale della ricerca - Università di Genova