A new regret analysis for Adam-type algorithms
In this paper, we focus on a theory-practice gap for Adam and its variants (AMSGrad, AdamNC, etc.). In practice, these algorithms are used with a constant first-order moment parameter β1 (typically between 0.9 and 0.99). In theory, regret guarantees for online convex optimization require a rapidly decaying schedule β1 → 0. We show that this is an artifact of the standard analysis, and we propose a novel framework that allows us to derive optimal, data-dependent regret bounds with a constant β1, without further assumptions. We also demonstrate the flexibility of our analysis on a wide range of algorithms and settings.
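For reference, the update the abstract discusses is the standard Adam step with a constant first-moment parameter β1; the following is a minimal sketch (not the paper's analysis framework), with illustrative default hyperparameters:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. beta1 is held constant, as is typical in practice."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment estimate (constant beta1)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Running this on a simple quadratic shows the iterate moving toward the minimizer at a rate of roughly `lr` per step early on.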
Variants of RMSProp and Adagrad with Logarithmic Regret Bounds
Adaptive gradient methods have recently become very popular, in particular as they have been shown to be useful in the training of deep neural networks. In this paper we analyze RMSProp, originally proposed for the training of deep neural networks, in the context of online convex optimization and show √T-type regret bounds. Moreover, we propose two variants, SC-Adagrad and SC-RMSProp, for which we show logarithmic regret bounds for strongly convex functions. Finally, we demonstrate in experiments that these new variants outperform other adaptive gradient techniques and stochastic gradient descent in the optimization of strongly convex functions as well as in the training of deep neural networks.
Comment: ICML 2017, 16 pages, 23 figures
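The base algorithm analyzed above is RMSProp, which scales each step by a running average of squared gradients; a minimal sketch follows (the SC-Adagrad/SC-RMSProp variants modify the decay and step-size schedules, which are not reproduced here):

```python
import numpy as np

def rmsprop_step(theta, grad, v, lr=0.01, beta=0.9, eps=1e-8):
    """One RMSProp update: divide the step by the root of a running
    average of squared gradients, giving a per-coordinate step size."""
    v = beta * v + (1 - beta) * grad ** 2
    theta = theta - lr * grad / (np.sqrt(v) + eps)
    return theta, v
```

On a strongly convex quadratic the iterate contracts toward the minimizer, consistent with the regret bounds discussed in the abstract.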
Dual Averaging Method for Online Graph-structured Sparsity
Online learning algorithms update models with one sample per iteration, and are thus efficient for processing large-scale datasets and useful for detecting malicious events, such as disease outbreaks and traffic congestion, on the fly. However, existing algorithms for graph-structured models focus on the offline setting and the least-squares loss and are not applicable to the online setting, while methods designed for the online setting cannot be directly applied to complex (usually non-convex) graph-structured sparsity models. To address these limitations, in this paper we propose a new algorithm for graph-structured sparsity constraint problems in the online setting, which we call \textsc{GraphDA}. The key part of \textsc{GraphDA} is to project both the averaged gradient (in the dual space) and the primal variables (in the primal space) onto lower-dimensional subspaces, thus capturing the graph-structured sparsity effectively. Furthermore, the objective functions assumed here are general convex functions, so the method can handle different losses in online learning settings. To the best of our knowledge, \textsc{GraphDA} is the first online learning algorithm for graph-structure constrained optimization problems. To validate our method, we conduct extensive experiments on both benchmark graphs and real-world graph datasets. Our experimental results show that, compared to other baseline methods, \textsc{GraphDA} not only improves classification performance but also captures graph-structured features more effectively, hence offering stronger interpretability.
Comment: 11 pages, 14 figures
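The double projection idea can be illustrated with plain dual averaging: average the gradients in the dual, project, take the dual-averaging primal step, and project again. The sketch below is not the paper's method; it substitutes a simple top-k hard-thresholding projection for the graph-structured projection, and the function names and the regularization parameter `gamma` are illustrative:

```python
import numpy as np

def topk_project(x, k):
    """Keep the k largest-magnitude entries of x, zeroing the rest
    (a stand-in for the paper's graph-structured projection)."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def dual_averaging_step(g_avg, grad, t, k, gamma=1.0):
    """One dual-averaging step (t starts at 0): project the averaged
    gradient in the dual space, then project the primal iterate."""
    g_avg = (t * g_avg + grad) / (t + 1)            # running average of gradients
    g_proj = topk_project(g_avg, k)                 # projection in the dual space
    theta = -(np.sqrt(t + 1) / gamma) * g_proj      # standard dual-averaging primal map
    theta = topk_project(theta, k)                  # projection in the primal space
    return theta, g_avg
```

By construction, every primal iterate is k-sparse, which is what makes the support (here top-k, in the paper graph-structured) directly interpretable.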