The Statistical Complexity of Early-Stopped Mirror Descent
Recently there has been a surge of interest in understanding implicit
regularization properties of iterative gradient-based optimization algorithms.
In this paper, we study the statistical guarantees on the excess risk achieved
by early-stopped unconstrained mirror descent algorithms applied to the
unregularized empirical risk with the squared loss for linear models and kernel
methods. By completing an inequality that characterizes convexity for the
squared loss, we identify an intrinsic link between offset Rademacher
complexities and potential-based convergence analysis of mirror descent
methods. Our observation immediately yields excess risk guarantees for the path
traced by the iterates of mirror descent in terms of offset complexities of
certain function classes depending only on the choice of the mirror map,
initialization point, step-size, and the number of iterations. We apply our
theory to recover, in a clean and elegant manner via rather short proofs, some
of the recent results in the implicit regularization literature, while also
showing how to improve upon them in some settings.
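The abstract above concerns the path traced by unconstrained mirror descent iterates on the unregularized empirical squared-loss risk, with an early-stopping rule selecting a point along that path. The following is a minimal illustrative sketch of such a path for a linear model; the function names and interface are assumptions for illustration, not the paper's notation.

```python
import numpy as np

def mirror_descent_path(X, y, grad_psi, grad_psi_inv, step, n_iters, w0=None):
    """Unconstrained mirror descent on the unregularized empirical
    squared-loss risk for a linear model.  Returns the full path of
    iterates so an early-stopping rule can pick one of them.
    `grad_psi` is the gradient of the mirror map and `grad_psi_inv`
    its inverse; all names here are illustrative."""
    n, d = X.shape
    w = np.zeros(d) if w0 is None else np.asarray(w0, dtype=float)
    path = [w.copy()]
    for _ in range(n_iters):
        grad = X.T @ (X @ w - y) / n        # gradient of the empirical risk
        theta = grad_psi(w) - step * grad   # step in the mirror (dual) domain
        w = grad_psi_inv(theta)             # map back to the primal domain
        path.append(w.copy())
    return path
```

With the Euclidean potential psi(w) = ||w||^2 / 2, both mirror maps are the identity and the sketch reduces to plain gradient descent on the empirical risk.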
On Stochastic Subgradient Mirror-Descent Algorithm with Weighted Averaging
This paper considers stochastic subgradient mirror-descent method for solving
constrained convex minimization problems. In particular, a stochastic
subgradient mirror-descent method with weighted iterate-averaging is
investigated and its per-iterate convergence rate is analyzed. The novel part
of the approach is in the choice of weights that are used to construct the
averages. Through the use of these weighted averages, we show that the known
optimal rates can be obtained with simpler algorithms than those currently
existing in the literature. Specifically, by suitably choosing the stepsize
values, one can obtain the rate of the order O(1/k) for strongly convex
functions, and the rate O(1/sqrt(k)) for general convex functions (not
necessarily differentiable). Furthermore, for the latter case, it is shown that
a stochastic subgradient mirror-descent with iterate averaging converges (along
a subsequence) to an optimal solution, almost surely, even with the stepsize of
the form 1/sqrt(k), which was not previously known. The stepsize choices
that achieve the best rates are those proposed by Paul Tseng for acceleration
of proximal gradient methods.
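The weighted iterate-averaging mechanism described above can be sketched as follows; the specific weight and stepsize schedules analyzed in the paper are not reproduced here, and the interface is an assumption chosen purely to illustrate how a running weighted average of the iterates is maintained.

```python
import numpy as np

def smd_weighted_average(subgrad, x0, steps, weights, mirror_step):
    """Stochastic subgradient mirror descent with weighted iterate
    averaging.  `weights[k]` multiplies iterate k in the running
    average; `mirror_step(x, v)` performs one mirror-descent step from
    x with scaled subgradient v.  Illustrative sketch only."""
    x = np.asarray(x0, dtype=float)
    avg = np.zeros_like(x)
    total_w = 0.0
    for k, (a, wk) in enumerate(zip(steps, weights)):
        g = subgrad(x, k)                   # (stochastic) subgradient at x
        x = mirror_step(x, a * g)           # mirror-descent update
        total_w += wk
        avg += (wk / total_w) * (x - avg)   # incremental weighted average
    return avg
```

In the Euclidean case, `mirror_step` is simply `lambda x, v: x - v`; other mirror maps (e.g. entropic) plug in the same way.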
A Stochastic Interpretation of Stochastic Mirror Descent: Risk-Sensitive Optimality
Stochastic mirror descent (SMD) is a fairly new family of algorithms that has
recently found a wide range of applications in optimization, machine learning,
and control. It can be considered a generalization of the classical stochastic
gradient algorithm (SGD), where instead of updating the weight vector along the
negative direction of the stochastic gradient, the update is performed in a
"mirror domain" defined by the gradient of a (strictly convex) potential
function. This potential function, and the mirror domain it yields, provide
considerable flexibility in the algorithm compared to SGD. While many
properties of SMD have already been obtained in the literature, in this paper
we exhibit a new interpretation of SMD, namely that it is a risk-sensitive
optimal estimator when the unknown weight vector and additive noise are
non-Gaussian and belong to the exponential family of distributions. The
analysis also suggests a modified version of SMD, which we refer to as
symmetric SMD (SSMD). The proofs rely on some simple properties of Bregman
divergence, which allow us to extend results from quadratics and Gaussians to
certain convex functions and exponential families in a rather seamless way.
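The "mirror domain" update described in this abstract can be made concrete with a standard choice of potential. The sketch below uses the negative-entropy potential psi(w) = sum_i w_i log w_i, for which the SMD update becomes the familiar exponentiated-gradient rule on the probability simplex; this particular potential is an assumption for illustration, not a choice made by the paper.

```python
import numpy as np

def smd_step(w, g, lr):
    """One stochastic mirror descent step under the negative-entropy
    potential psi(w) = sum_i w_i * log(w_i).  Since grad psi(w) =
    1 + log(w), stepping in the mirror domain amounts to
    log(w) <- log(w) - lr * g (the additive constant cancels under
    normalization), i.e. the exponentiated-gradient update."""
    theta = np.log(w) - lr * g   # step in the mirror domain
    w_new = np.exp(theta)        # map back via the inverse mirror map
    return w_new / w_new.sum()   # renormalize onto the simplex
```

Replacing the potential (e.g. by the Euclidean psi(w) = ||w||^2 / 2) changes the mirror domain and recovers other members of the SMD family, including plain SGD.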
The Extended Regularized Dual Averaging Method for Composite Optimization
We present a new algorithm, extended regularized dual averaging (XRDA), for
solving composite optimization problems. XRDA is a generalization of the
regularized dual averaging (RDA) method; the main novelty of the method is that
it allows more flexible control of the backward step size. For instance, the
backward step size for RDA grows without bound, while for XRDA the backward
step size can be kept bounded.
- …