Search CORE

226 research outputs found

A Modern Introduction to Online Learning

Author: Orabona Francesco
Publication venue
Publication date: 29/12/2020
Field of study

In this monograph, I introduce the basic concepts of Online Learning through a modern view of Online Convex Optimization. Here, online learning refers to the framework of regret minimization under worst-case assumptions. I present first-order and second-order algorithms for online learning with convex losses, in Euclidean and non-Euclidean settings. All the algorithms are clearly presented as instantiation of Online Mirror Descent or Follow-The-Regularized-Leader and their variants. Particular attention is given to the issue of tuning the parameters of the algorithms and learning in unbounded domains, through adaptive and parameter-free online learning algorithms. Non-convex losses are dealt through convex surrogate losses and through randomization. The bandit setting is also briefly discussed, touching on the problem of adversarial and stochastic multi-armed bandits. These notes do not require prior knowledge of convex analysis and all the required mathematical tools are rigorously explained. Moreover, all the proofs have been carefully chosen to be as simple and as short as possible.Comment: Fixed more typos, added more history bits, added local norms bounds for OMD and FTR

arXiv.org e-Print Archive

Simultaneous Model Selection and Optimization through Parameter-free Stochastic Learning

Author: Orabona Francesco
Publication venue
Publication date: 15/06/2014
Field of study

Stochastic gradient descent algorithms for training linear and kernel predictors are gaining more and more importance, thanks to their scalability. While various methods have been proposed to speed up their convergence, the model selection phase is often ignored. In fact, in theoretical works most of the time assumptions are made, for example, on the prior knowledge of the norm of the optimal solution, while in the practical world validation methods remain the only viable approach. In this paper, we propose a new kernel-based stochastic gradient descent algorithm that performs model selection while training, with no parameters to tune, nor any form of cross-validation. The algorithm builds on recent advancement in online learning theory for unconstrained settings, to estimate over time the right regularization in a data-dependent way. Optimal rates of convergence are proved under standard smoothness assumptions on the target function, using the range space of the fractional integral operator associated with the kernel

arXiv.org e-Print Archive

CiteSeerX

Momentum-based variance reduction in non-convex SGD

Author: Cutkosky Ashok
Orabona Francesco
Publication venue
Publication date: 08/12/2019
Field of study

Variance reduction has emerged in recent years as a strong competitor to stochastic gradient descent in non-convex problems, providing the first algorithms to improve upon the converge rate of stochastic gradient descent for finding first-order critical points. However, variance reduction techniques typically require carefully tuned learning rates and willingness to use excessively large “mega-batches” in order to achieve their improved results. We present a new algorithm, Storm, that does not require any batches and makes use of adaptive learning rates, enabling simpler implementation and less hyperparameter tuning. Our technique for removing the batches uses a variant of momentum to achieve variance reduction in non-convex optimization. On smooth losses F, Storm finds a point x with E[k∇F(x)k] ≤ O(1 /√ T + σ^1/3 /T^1/3) in T iterations with σ^2 variance in the gradients, matching the optimal rate and without requiring knowledge of σ.https://arxiv.org/pdf/1905.10018.pdfPublished versio

Boston University Institutional Repository (OpenBU)

Momentum-Based Variance Reduction in Non-Convex SGD

Author: Cutkosky Ashok
Orabona Francesco
Publication venue
Publication date: 08/12/2019
Field of study

Variance reduction has emerged in recent years as a strong competitor to stochastic gradient descent in non-convex problems, providing the first algorithms to improve upon the converge rate of stochastic gradient descent for finding first-order critical points. However, variance reduction techniques typically require carefully tuned learning rates and willingness to use excessively large "mega-batches" in order to achieve their improved results. We present a new algorithm, STORM, that does not require any batches and makes use of adaptive learning rates, enabling simpler implementation and less hyperparameter tuning. Our technique for removing the batches uses a variant of momentum to achieve variance reduction in non-convex optimization. On smooth losses

F

, STORM finds a point

\boldsymbol{x}

with

\mathbb{E}[\|\nabla F(\boldsymbol{x})\|]\le O(1/\sqrt{T}+\sigma^{1/3}/T^{1/3})

T

iterations with

\sigma^2

variance in the gradients, matching the optimal rate but without requiring knowledge of

\sigma

.Comment: Added Ac

arXiv.org e-Print Archive

Boston University Institutional Repository (OpenBU)

Training Deep Networks without Learning Rates Through Coin Betting

Author: Orabona Francesco
Tommasi Tatiana
Publication venue: Neural information processing systems foundation
Publication date: 01/01/2017
Field of study

Deep learning methods achieve state-of-the-art performance in many application scenarios. Yet, these methods require a significant amount of hyperparameters tuning in order to achieve the best results. In particular, tuning the learning rates in the stochastic optimization process is still one of the main bottlenecks. In this paper, we propose a new stochastic gradient descent procedure for deep networks that does not require any learning rate setting. Contrary to previous methods, we do not adapt the learning rates nor we make use of the assumed curvature of the objective function. Instead, we reduce the optimization process to a game of betting on a coin and propose a learning rate free optimal algorithm for this scenario. Theoretical convergence is proven for convex and quasi-convex functions and empirical evidence shows the advantage of our algorithm over popular stochastic gradient algorithms

arXiv.org e-Print Archive

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Archivio della ricerca- Università di Roma La Sapienza

Unconstrained Online Linear Learning in Hilbert Spaces: Minimax Algorithms and Normal Approximations

Author: McMahan H. Brendan
Orabona Francesco
Publication venue
Publication date: 21/05/2014
Field of study

We study algorithms for online linear optimization in Hilbert spaces, focusing on the case where the player is unconstrained. We develop a novel characterization of a large class of minimax algorithms, recovering, and even improving, several previous results as immediate corollaries. Moreover, using our tools, we develop an algorithm that provides a regret bound of

\mathcal{O}\Big(U \sqrt{T \log(U \sqrt{T} \log^2 T +1)}\Big)

, where

U

is the

L_2

norm of an arbitrary comparator and both

T

and

U

are unknown to the player. This bound is optimal up to

\sqrt{\log \log T}

terms. When

T

is known, we derive an algorithm with an optimal regret bound (up to constant factors). For both the known and unknown

T

case, a Normal approximation to the conditional value of the game proves to be the key analysis tool.Comment: Proceedings of the 27th Annual Conference on Learning Theory (COLT 2014

arXiv.org e-Print Archive

CiteSeerX

Parameter-free locally differentially private stochastic subgradient descent

Author: Jun Kwang-Sung
Orabona Francesco
Publication venue
Publication date: 21/11/2019
Field of study

https://arxiv.org/pdf/1911.09564.pdfhttps://arxiv.org/pdf/1911.09564.pdfhttps://arxiv.org/pdf/1911.09564.pdfhttps://arxiv.org/pdf/1911.09564.pdfhttps://arxiv.org/pdf/1911.09564.pdfhttps://arxiv.org/pdf/1911.09564.pdfPublished versio

arXiv.org e-Print Archive

Boston University Institutional Repository (OpenBU)