The Statistical Complexity of Early-Stopped Mirror Descent
Recently there has been a surge of interest in understanding implicit
regularization properties of iterative gradient-based optimization algorithms.
In this paper, we study the statistical guarantees on the excess risk achieved
by early-stopped unconstrained mirror descent algorithms applied to the
unregularized empirical risk with the squared loss for linear models and kernel
methods. By completing an inequality that characterizes convexity for the
squared loss, we identify an intrinsic link between offset Rademacher
complexities and potential-based convergence analysis of mirror descent
methods. Our observation immediately yields excess risk guarantees for the path
traced by the iterates of mirror descent in terms of offset complexities of
certain function classes depending only on the choice of the mirror map,
initialization point, step-size, and the number of iterations. We apply our
theory to recover, in a clean and elegant manner via rather short proofs, some
of the recent results in the implicit regularization literature, while also
showing how to improve upon them in some settings.
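The abstract above concerns the path traced by unconstrained mirror descent iterates on the unregularized empirical squared-loss risk, with an early-stopping rule selecting a point along that path. The following is a minimal illustrative sketch of such a path for a linear model; the function names and interface are assumptions for illustration, not the paper's notation.

```python
import numpy as np

def mirror_descent_path(X, y, grad_psi, grad_psi_inv, step, n_iters, w0=None):
    """Unconstrained mirror descent on the unregularized empirical
    squared-loss risk for a linear model.  Returns the full path of
    iterates so an early-stopping rule can pick one of them.
    `grad_psi` is the gradient of the mirror map and `grad_psi_inv`
    its inverse; all names here are illustrative."""
    n, d = X.shape
    w = np.zeros(d) if w0 is None else np.asarray(w0, dtype=float)
    path = [w.copy()]
    for _ in range(n_iters):
        grad = X.T @ (X @ w - y) / n        # gradient of the empirical risk
        theta = grad_psi(w) - step * grad   # step in the mirror (dual) domain
        w = grad_psi_inv(theta)             # map back to the primal domain
        path.append(w.copy())
    return path
```

With the Euclidean potential psi(w) = ||w||^2 / 2, both mirror maps are the identity and the sketch reduces to plain gradient descent on the empirical risk.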
On Stochastic Subgradient Mirror-Descent Algorithm with Weighted Averaging
This paper considers stochastic subgradient mirror-descent method for solving
constrained convex minimization problems. In particular, a stochastic
subgradient mirror-descent method with weighted iterate-averaging is
investigated and its per-iterate convergence rate is analyzed. The novel part
of the approach is in the choice of weights that are used to construct the
averages. Through the use of these weighted averages, we show that the known
optimal rates can be obtained with simpler algorithms than those currently
existing in the literature. Specifically, by suitably choosing the stepsize
values, one can obtain the rate of the order O(1/k) for strongly convex
functions, and the rate O(1/sqrt(k)) for general convex functions (not
necessarily differentiable). Furthermore, for the latter case, it is shown that
a stochastic subgradient mirror-descent with iterate averaging converges (along
a subsequence) to an optimal solution, almost surely, even with the stepsize of
the form 1/sqrt(k), which was not previously known. The stepsize choices
that achieve the best rates are those proposed by Paul Tseng for acceleration
of proximal gradient methods.
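The weighted iterate-averaging mechanism described above can be sketched as follows; the specific weight and stepsize schedules analyzed in the paper are not reproduced here, and the interface is an assumption chosen purely to illustrate how a running weighted average of the iterates is maintained.

```python
import numpy as np

def smd_weighted_average(subgrad, x0, steps, weights, mirror_step):
    """Stochastic subgradient mirror descent with weighted iterate
    averaging.  `weights[k]` multiplies iterate k in the running
    average; `mirror_step(x, v)` performs one mirror-descent step from
    x with scaled subgradient v.  Illustrative sketch only."""
    x = np.asarray(x0, dtype=float)
    avg = np.zeros_like(x)
    total_w = 0.0
    for k, (a, wk) in enumerate(zip(steps, weights)):
        g = subgrad(x, k)                   # (stochastic) subgradient at x
        x = mirror_step(x, a * g)           # mirror-descent update
        total_w += wk
        avg += (wk / total_w) * (x - avg)   # incremental weighted average
    return avg
```

In the Euclidean case, `mirror_step` is simply `lambda x, v: x - v`; other mirror maps (e.g. entropic) plug in the same way.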
A Stochastic Interpretation of Stochastic Mirror Descent: Risk-Sensitive Optimality
Stochastic mirror descent (SMD) is a fairly new family of algorithms that has
recently found a wide range of applications in optimization, machine learning,
and control. It can be considered a generalization of the classical stochastic
gradient algorithm (SGD), where instead of updating the weight vector along the
negative direction of the stochastic gradient, the update is performed in a
"mirror domain" defined by the gradient of a (strictly convex) potential
function. This potential function, and the mirror domain it yields, provide
considerable flexibility in the algorithm compared to SGD. While many
properties of SMD have already been obtained in the literature, in this paper
we exhibit a new interpretation of SMD, namely that it is a risk-sensitive
optimal estimator when the unknown weight vector and additive noise are
non-Gaussian and belong to the exponential family of distributions. The
analysis also suggests a modified version of SMD, which we refer to as
symmetric SMD (SSMD). The proofs rely on some simple properties of Bregman
divergence, which allow us to extend results from quadratics and Gaussians to
certain convex functions and exponential families in a rather seamless way.
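The "mirror domain" update described in this abstract can be made concrete with a standard choice of potential. The sketch below uses the negative-entropy potential psi(w) = sum_i w_i log w_i, for which the SMD update becomes the familiar exponentiated-gradient rule on the probability simplex; this particular potential is an assumption for illustration, not a choice made by the paper.

```python
import numpy as np

def smd_step(w, g, lr):
    """One stochastic mirror descent step under the negative-entropy
    potential psi(w) = sum_i w_i * log(w_i).  Since grad psi(w) =
    1 + log(w), stepping in the mirror domain amounts to
    log(w) <- log(w) - lr * g (the additive constant cancels under
    normalization), i.e. the exponentiated-gradient update."""
    theta = np.log(w) - lr * g   # step in the mirror domain
    w_new = np.exp(theta)        # map back via the inverse mirror map
    return w_new / w_new.sum()   # renormalize onto the simplex
```

Replacing the potential (e.g. by the Euclidean psi(w) = ||w||^2 / 2) changes the mirror domain and recovers other members of the SMD family, including plain SGD.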
The Extended Regularized Dual Averaging Method for Composite Optimization
We present a new algorithm, extended regularized dual averaging (XRDA), for
solving composite optimization problems. XRDA is a generalization of the
regularized dual averaging (RDA) method; the main novelty of the method is that
it allows more flexible control of the backward step size. For instance, the
backward step size for RDA grows without bound, while for XRDA the backward
step size can be kept bounded.
- …