
    A new regret analysis for Adam-type algorithms

    In this paper, we focus on a theory-practice gap for Adam and its variants (AMSgrad, AdamNC, etc.). In practice, these algorithms are used with a constant first-order moment parameter $\beta_1$ (typically between 0.9 and 0.99). In theory, regret guarantees for online convex optimization require a rapidly decaying schedule $\beta_1 \to 0$. We show that this is an artifact of the standard analysis, and we propose a novel framework that allows us to derive optimal, data-dependent regret bounds with a constant $\beta_1$, without further assumptions. We also demonstrate the flexibility of our analysis on a wide range of different algorithms and settings.
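    For reference, below is a minimal sketch of the standard Adam update with a constant first-order moment parameter $\beta_1$ (the practical setting the abstract refers to). The hyperparameter values and the NumPy implementation are illustrative assumptions, not code or analysis from the paper.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step with a constant beta1 (no decaying beta1 schedule)."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```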

    Variants of RMSProp and Adagrad with Logarithmic Regret Bounds

    Adaptive gradient methods have recently become very popular, in particular because they have been shown to be useful in the training of deep neural networks. In this paper we analyze RMSProp, originally proposed for the training of deep neural networks, in the context of online convex optimization and show $\sqrt{T}$-type regret bounds. Moreover, we propose two variants, SC-Adagrad and SC-RMSProp, for which we show logarithmic regret bounds for strongly convex functions. Finally, we demonstrate in experiments that these new variants outperform other adaptive gradient techniques or stochastic gradient descent in the optimization of strongly convex functions as well as in the training of deep neural networks. Comment: ICML 2017, 16 pages, 23 figures
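    As context for the regret analysis, here is a minimal sketch of the standard per-coordinate RMSProp update inside an online convex optimization loop. The step size, decay factor, and damping constant are illustrative assumptions; the SC-Adagrad and SC-RMSProp variants proposed in the paper modify this normalization in ways not shown here.

```python
import numpy as np

def rmsprop_step(theta, grad, v, lr=1e-2, beta=0.9, eps=1e-8):
    """One RMSProp step: scale the gradient by a running average of its square."""
    v = beta * v + (1 - beta) * grad ** 2
    theta = theta - lr * grad / (np.sqrt(v) + eps)
    return theta, v

# Online loop: at each round t a convex loss is revealed and we step using
# its gradient at the current iterate.
# theta, v = np.zeros(d), np.zeros(d)
# for grad in gradient_stream:
#     theta, v = rmsprop_step(theta, grad, v)
```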

    Dual Averaging Method for Online Graph-structured Sparsity

    Online learning algorithms update the model with one sample per iteration, and are therefore efficient for processing large-scale datasets and useful for detecting events of social importance, such as disease outbreaks and traffic congestion, on the fly. However, existing algorithms for graph-structured models focus on the offline setting and the least-squares loss, and cannot handle the online setting, while methods designed for the online setting cannot be directly applied to the complex (usually non-convex) graph-structured sparsity model. To address these limitations, in this paper we propose a new algorithm for graph-structured sparsity constraint problems in the online setting, which we call \textsc{GraphDA}. The key part of \textsc{GraphDA} is to project both the averaged gradient (in the dual space) and the primal variables (in the primal space) onto lower-dimensional subspaces, thus capturing the graph-structured sparsity effectively. Furthermore, the objective functions assumed here are general convex functions, so that different losses can be handled in the online learning setting. To the best of our knowledge, \textsc{GraphDA} is the first online learning algorithm for graph-structure constrained optimization problems. To validate our method, we conduct extensive experiments on both benchmark graphs and real-world graph datasets. Our experimental results show that, compared to other baseline methods, \textsc{GraphDA} not only improves classification performance, but also captures graph-structured features more effectively, hence offering stronger interpretability. Comment: 11 pages, 14 figures
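    To make the dual-averaging idea concrete, here is a simplified sketch of one dual-averaging step with projections in both the dual and primal spaces. The plain top-k support selection below is only an illustrative substitute for the graph-structured subspace projections described in the abstract, not the paper's \textsc{GraphDA} projections, and the regularization schedule is an assumption.

```python
import numpy as np

def topk_project(x, k):
    """Keep the k largest-magnitude coordinates of x and zero out the rest."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def dual_averaging_sparse_step(g_avg, grad, t, k, gamma=1.0):
    """One regularized dual-averaging step with psi(theta) = 0.5 * ||theta||^2
    and beta_t = gamma * sqrt(t): average the gradients, project the average in
    the dual space, then form a sparse primal iterate via a primal projection."""
    g_avg = (g_avg * (t - 1) + grad) / t                      # running gradient average
    g_proj = topk_project(g_avg, k)                           # projection in the dual space
    theta = topk_project(-(np.sqrt(t) / gamma) * g_proj, k)   # sparse primal iterate
    return g_avg, theta
```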