From Proximal Point Method to Nesterov's Acceleration
The proximal point method (PPM) is a fundamental method in optimization that
is often used as a building block for fast optimization algorithms. In this
work, building on recent work by Defazio (2019), we provide a complete
understanding of Nesterov's accelerated gradient method (AGM) by establishing
quantitative and analytical connections between PPM and AGM. The main
observation in this paper is that AGM is in fact equal to a simple
approximation of PPM, which results in an elementary derivation of the
mysterious updates of AGM as well as its step sizes. This connection also leads
to a conceptually simple analysis of AGM based on the standard analysis of PPM.
This view naturally extends to the strongly convex case and also motivates
other accelerated methods for practically relevant settings.
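The abstract leaves the specific approximation implicit; as a point of reference, here is a minimal sketch of the two standard updates being connected, run side by side on a toy quadratic where the proximal step has a closed form (the problem instance, the proximal parameter lam = 1, and the iteration budget are illustrative assumptions, not taken from the paper):

```python
# Sketch: proximal point method (PPM) vs. Nesterov's AGM on the quadratic
# f(x) = 0.5 x^T A x - b^T x, where the PPM subproblem is solvable exactly.
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((20, 10))
A = M.T @ M + 0.1 * np.eye(10)       # positive definite Hessian
b = rng.standard_normal(10)
L = np.linalg.eigvalsh(A).max()      # smoothness constant of f
x_star = np.linalg.solve(A, b)       # exact minimizer, for reference

def f(x):
    return 0.5 * x @ A @ x - b @ x

# PPM: x_{k+1} = argmin_x f(x) + ||x - x_k||^2 / (2 * lam)
x, lam = np.zeros(10), 1.0
for _ in range(50):
    x = np.linalg.solve(A + np.eye(10) / lam, b + x / lam)
print(f"PPM suboptimality: {f(x) - f(x_star):.2e}")

# AGM: a gradient step taken at an extrapolated point y_k
x = y = np.zeros(10)
t = 1.0
for _ in range(50):
    x_next = y - (A @ y - b) / L                 # gradient step, size 1/L
    t_next = (1 + np.sqrt(1 + 4 * t**2)) / 2     # Nesterov's step-size rule
    y = x_next + ((t - 1) / t_next) * (x_next - x)
    x, t = x_next, t_next
print(f"AGM suboptimality: {f(x) - f(x_star):.2e}")
```

On a quadratic the proximal subproblem reduces to a linear solve, but in general each PPM step is itself an optimization problem, which is why cheap approximations of the proximal step, of which the paper argues AGM is one, are of interest.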
Fast global convergence of gradient methods for high-dimensional statistical recovery
Many statistical $M$-estimators are based on convex optimization problems
formed by the combination of a data-dependent loss function with a norm-based
regularizer. We analyze the convergence rates of projected gradient and
composite gradient methods for solving such problems, working within a
high-dimensional framework that allows the data dimension $d$ to grow with
(and possibly exceed) the sample size $n$. This high-dimensional
structure precludes the usual global assumptions---namely, strong convexity and
smoothness conditions---that underlie much of classical optimization analysis.
We define appropriately restricted versions of these conditions, and show that
they are satisfied with high probability for various statistical models. Under
these conditions, our theory guarantees that projected gradient descent has a
globally geometric rate of convergence up to the statistical precision of the
model, meaning the typical distance between the true unknown parameter
$\theta^*$ and an optimal solution $\widehat{\theta}$. This result is substantially
sharper than previous convergence results, which yielded sublinear convergence,
or linear convergence only up to the noise level. Our analysis applies to a
wide range of $M$-estimators and statistical models, including sparse linear
regression using Lasso ($\ell_1$-regularized regression); group Lasso for block
sparsity; log-linear models with $\ell_1$-regularization; low-rank matrix recovery using
nuclear norm regularization; and matrix decomposition. Overall, our analysis
reveals interesting connections between statistical precision and computational
efficiency in high-dimensional estimation.
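As a concrete instance of the composite gradient methods analyzed here, the following minimal sketch runs the proximal gradient (ISTA) update for the Lasso on synthetic sparse data; the problem sizes, noise level, regularization level, and iteration budget are illustrative assumptions:

```python
# Sketch: composite (proximal) gradient descent for the Lasso,
#   min_theta ||y - X theta||^2 / (2n) + lam * ||theta||_1,
# in the high-dimensional regime n < d with an s-sparse ground truth.
import numpy as np

rng = np.random.default_rng(0)
n, d, s = 100, 200, 5
X = rng.standard_normal((n, d))
theta_true = np.zeros(d)
theta_true[:s] = 1.0                               # s-sparse true parameter
y = X @ theta_true + 0.1 * rng.standard_normal(n)

lam = 0.1 * np.sqrt(np.log(d) / n)                 # illustrative regularization
eta = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()  # step size 1/L

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (coordinatewise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

theta = np.zeros(d)
for _ in range(500):
    grad = X.T @ (X @ theta - y) / n               # gradient of the smooth loss
    theta = soft_threshold(theta - eta * grad, eta * lam)

print(f"distance to true parameter: {np.linalg.norm(theta - theta_true):.3f}")
```

Under restricted strong convexity and smoothness conditions of the kind the paper establishes, iterates like these contract geometrically toward an optimum until the optimization error falls below the statistical precision, after which further iterations cannot improve estimation of the true parameter.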