Search CORE

15,446 research outputs found

Minimizing Finite Sums with the Stochastic Average Gradient

Author: Bach Francis
Roux Nicolas Le
Schmidt Mark
Publication venue
Publication date: 10/05/2016
Field of study

We propose the stochastic average gradient (SAG) method for optimizing the sum of a finite number of smooth convex functions. Like stochastic gradient (SG) methods, the SAG method's iteration cost is independent of the number of terms in the sum. However, by incorporating a memory of previous gradient values the SAG method achieves a faster convergence rate than black-box SG methods. The convergence rate is improved from O(1/k^{1/2}) to O(1/k) in general, and when the sum is strongly-convex the convergence rate is improved from the sub-linear O(1/k) to a linear convergence rate of the form O(p^k) for p \textless{} 1. Further, in many cases the convergence rate of the new method is also faster than black-box deterministic gradient methods, in terms of the number of gradient evaluations. Numerical experiments indicate that the new algorithm often dramatically outperforms existing SG and deterministic gradient methods, and that the performance may be further improved through the use of non-uniform sampling strategies.Comment: Revision from January 2015 submission. Major changes: updated literature follow and discussion of subsequent work, additional Lemma showing the validity of one of the formulas, somewhat simplified presentation of Lyapunov bound, included code needed for checking proofs rather than the polynomials generated by the code, added error regions to the numerical experiment

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

Scalable First-Order Methods for Robust MDPs

Author: Grand-Clément Julien
Kroer Christian
Publication venue
Publication date: 14/01/2021
Field of study

Robust Markov Decision Processes (MDPs) are a powerful framework for modeling sequential decision-making problems with model uncertainty. This paper proposes the first first-order framework for solving robust MDPs. Our algorithm interleaves primal-dual first-order updates with approximate Value Iteration updates. By carefully controlling the tradeoff between the accuracy and cost of Value Iteration updates, we achieve an ergodic convergence rate of

O \left( A^{2} S^{3}\log(S)\log(\epsilon^{-1}) \epsilon^{-1} \right)

for the best choice of parameters on ellipsoidal and Kullback-Leibler

s

-rectangular uncertainty sets, where

S

and

A

is the number of states and actions, respectively. Our dependence on the number of states and actions is significantly better (by a factor of

O(A^{1.5}S^{1.5})

) than that of pure Value Iteration algorithms. In numerical experiments on ellipsoidal uncertainty sets we show that our algorithm is significantly more scalable than state-of-the-art approaches. Our framework is also the first one to solve robust MDPs with

s

-rectangular KL uncertainty sets

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Efficient Linear Programming for Dense CRFs

Author: Ajanthan Thalaiyasingam
Bunel Rudy
Desmaison Alban
Kumar M. Pawan
Salzmann Mathieu
Torr Philip H. S.
Publication venue
Publication date: 01/01/2017
Field of study

The fully connected conditional random field (CRF) with Gaussian pairwise potentials has proven popular and effective for multi-class semantic segmentation. While the energy of a dense CRF can be minimized accurately using a linear programming (LP) relaxation, the state-of-the-art algorithm is too slow to be useful in practice. To alleviate this deficiency, we introduce an efficient LP minimization algorithm for dense CRFs. To this end, we develop a proximal minimization framework, where the dual of each proximal problem is optimized via block coordinate descent. We show that each block of variables can be efficiently optimized. Specifically, for one block, the problem decomposes into significantly smaller subproblems, each of which is defined over a single pixel. For the other block, the problem is optimized via conditional gradient descent. This has two advantages: 1) the conditional gradient can be computed in a time linear in the number of pixels and labels; and 2) the optimal step size can be computed analytically. Our experiments on standard datasets provide compelling evidence that our approach outperforms all existing baselines including the previous LP based approach for dense CRFs.Comment: 24 pages, 10 figures and 4 table

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Oxford University Research Archive