Search CORE

16,137 research outputs found

From error bounds to the complexity of first-order descent methods for convex functions

This paper shows that error bounds can be used as effective tools for deriving complexity results for first-order descent methods in convex minimization. In a first stage, this objective led us to revisit the interplay between error bounds and the Kurdyka-\L ojasiewicz (KL) inequality. One can show the equivalence between the two concepts for convex functions having a moderately flat profile near the set of minimizers (as those of functions with H\"olderian growth). A counterexample shows that the equivalence is no longer true for extremely flat functions. This fact reveals the relevance of an approach based on KL inequality. In a second stage, we show how KL inequalities can in turn be employed to compute new complexity bounds for a wealth of descent methods for convex problems. Our approach is completely original and makes use of a one-dimensional worst-case proximal sequence in the spirit of the famous majorant method of Kantorovich. Our result applies to a very simple abstract scheme that covers a wide class of descent methods. As a byproduct of our study, we also provide new results for the globalization of KL inequalities in the convex framework. Our main results inaugurate a simple methodology: derive an error bound, compute the desingularizing function whenever possible, identify essential constants in the descent method and finally compute the complexity using the one-dimensional worst case proximal sequence. Our method is illustrated through projection methods for feasibility problems, and through the famous iterative shrinkage thresholding algorithm (ISTA), for which we show that the complexity bound is of the form

O(q^{k})

where the constituents of the bound only depend on error bound constants obtained for an arbitrary least squares objective with

\ell^1

regularization

arXiv.org e-Print Archive

Crossref

Toulouse Capitole Publications

Toulouse 1 Capitole Publications

Information-theoretic lower bounds on the oracle complexity of stochastic convex optimization

Author: Agarwal Alekh
Bartlett Peter L.
Ravikumar Pradeep
Wainwright Martin J.
Publication venue
Publication date: 22/03/2011
Field of study

Relative to the large literature on upper bounds on complexity of convex optimization, lesser attention has been paid to the fundamental hardness of these problems. Given the extensive use of convex optimization in machine learning and statistics, gaining an understanding of these complexity-theoretic issues is important. In this paper, we study the complexity of stochastic convex optimization in an oracle model of computation. We improve upon known results and obtain tight minimax complexity estimates for various function classes

arXiv.org e-Print Archive

CiteSeerX

Queensland University of Technology ePrints Archive

Sharp Time--Data Tradeoffs for Linear Inverse Problems

Author: Oymak Samet
Recht Benjamin
Soltanolkotabi Mahdi
Publication venue
Publication date: 05/01/2016
Field of study

In this paper we characterize sharp time-data tradeoffs for optimization problems used for solving linear inverse problems. We focus on the minimization of a least-squares objective subject to a constraint defined as the sub-level set of a penalty function. We present a unified convergence analysis of the gradient projection algorithm applied to such problems. We sharply characterize the convergence rate associated with a wide variety of random measurement ensembles in terms of the number of measurements and structural complexity of the signal with respect to the chosen penalty function. The results apply to both convex and nonconvex constraints, demonstrating that a linear convergence rate is attainable even though the least squares objective is not strongly convex in these settings. When specialized to Gaussian measurements our results show that such linear convergence occurs when the number of measurements is merely 4 times the minimal number required to recover the desired signal at all (a.k.a. the phase transition). We also achieve a slower but geometric rate of convergence precisely above the phase transition point. Extensive numerical results suggest that the derived rates exactly match the empirical performance

arXiv.org e-Print Archive

CiteSeerX