Adaptive Momentum for Neural Network Optimization
In this thesis, we develop a novel and efficient algorithm for optimizing neural networks, inspired by a recently proposed geodesic optimization algorithm. Our algorithm, which we call Stochastic Geodesic Optimization (SGeO), adds an adaptive coefficient on top of Polyak's Heavy Ball method, effectively controlling the weight placed on the previous parameter update based on the change of direction in the optimization path. Experimental results on strongly convex functions with Lipschitz gradients and on deep Autoencoder benchmarks show that SGeO reaches lower errors than established first-order methods and achieves lower or similar errors to a recent second-order method, K-FAC (Kronecker-Factored Approximate Curvature). We also incorporate a Nesterov-style lookahead gradient into our algorithm (SGeO-N) and observe notable improvements. We believe this research will open up new directions for high-dimensional neural network optimization, where combining the efficiency of first-order methods with the effectiveness of second-order methods is a promising avenue to explore.
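The core update, Polyak's heavy ball with a momentum coefficient that adapts to direction changes along the optimization path, can be sketched as follows; the cosine-similarity rule and all constants are illustrative assumptions, not the exact SGeO update from the thesis.

```python
import numpy as np

def sgeo_step(x, v, grad, prev_grad, lr=0.05, beta_max=0.9):
    """One heavy-ball step with an adaptive momentum coefficient.

    The coefficient shrinks when the optimization path turns (successive
    gradients disagree) and grows when it is stable. Using cosine
    similarity as the adaptivity signal is an assumption made for
    illustration, not necessarily the rule used in the thesis.
    """
    cos_sim = grad @ prev_grad / (
        np.linalg.norm(grad) * np.linalg.norm(prev_grad) + 1e-12
    )
    beta = beta_max * max(cos_sim, 0.0)  # drop momentum on sharp turns
    v = beta * v - lr * grad             # heavy-ball velocity update
    return x + v, v

# Minimize a strongly convex quadratic f(x) = 0.5 * x' A x.
A = np.diag([1.0, 10.0])
x, v = np.array([5.0, 5.0]), np.zeros(2)
prev_grad = A @ x
for _ in range(500):
    grad = A @ x
    x, v = sgeo_step(x, v, grad, prev_grad)
    prev_grad = grad
```

On this quadratic the iterates keep full momentum along stable directions and fall back toward plain gradient descent when successive gradients disagree.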
PhasePack: A Phase Retrieval Library
Phase retrieval deals with the estimation of complex-valued signals solely
from the magnitudes of linear measurements. While there has been a recent
explosion in the development of phase retrieval algorithms, the lack of a
common interface has made it difficult to compare new methods against the
state-of-the-art. The purpose of PhasePack is to create a common software
interface for a wide range of phase retrieval algorithms and to provide a
common testbed using both synthetic data and empirical imaging datasets.
PhasePack is able to benchmark a large number of recent phase retrieval methods
against one another to generate comparisons using a range of different
performance metrics. The software package handles single method testing as well
as multiple method comparisons.
The algorithm implementations in PhasePack differ slightly from their
original descriptions in the literature in order to achieve faster speed and
improved robustness. In particular, PhasePack uses adaptive stepsizes,
line-search methods, and fast eigensolvers to speed up and automate
convergence.
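As a concrete picture of the problem PhasePack targets, the following minimal sketch recovers a real signal from magnitude-only measurements by plain amplitude-flow gradient descent. It is written in Python for illustration (PhasePack itself is a MATLAB library), and the fixed stepsize and near-truth initialization are simplifying assumptions; PhasePack instead supplies spectral initializers, adaptive stepsizes, and line searches.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 120                       # signal length, number of measurements
A = rng.standard_normal((m, n))      # real Gaussian measurement matrix
x_true = rng.standard_normal(n)
b = np.abs(A @ x_true)               # magnitude-only measurements

# Amplitude-flow objective: 0.5 * sum((|Ax| - b)^2). Starting near the
# truth is a simplification; a spectral initializer would be used in
# practice, as in PhasePack.
x = x_true + 0.1 * rng.standard_normal(n)
lr = 0.5 / m
for _ in range(2000):
    Ax = A @ x
    grad = A.T @ ((np.abs(Ax) - b) * np.sign(Ax))
    x -= lr * grad

# The real-valued problem is only solvable up to a global sign flip.
err = min(np.linalg.norm(x - x_true), np.linalg.norm(x + x_true))
```

The sign ambiguity in the last line is intrinsic to phase retrieval: x and -x produce identical measurement magnitudes.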
Acceleration Methods
This monograph covers some recent advances in a range of acceleration
techniques frequently used in convex optimization. We first use quadratic
optimization problems to introduce two key families of methods, namely momentum
and nested optimization schemes. They coincide in the quadratic case to form
the Chebyshev method. We discuss momentum methods in detail, starting with the
seminal work of Nesterov and structure convergence proofs using a few master
templates, such as that for optimized gradient methods, which provide the key
benefit of showing how momentum methods optimize convergence guarantees. We
further cover proximal acceleration, at the heart of the Catalyst and
Accelerated Hybrid Proximal Extragradient frameworks, using similar algorithmic
patterns. Common acceleration techniques rely directly on the knowledge of some
of the regularity parameters in the problem at hand. We conclude by discussing
restart schemes, a set of simple techniques for reaching nearly optimal
convergence rates while adapting to unobserved regularity parameters.
Comment: Published in Foundations and Trends in Optimization (see
https://www.nowpublishers.com/article/Details/OPT-036).
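In the quadratic setting the monograph uses to introduce these families, the speedup from momentum is easy to observe numerically. The sketch below compares plain gradient descent with a constant-momentum Nesterov variant (one standard scheme among several; chosen here for illustration):

```python
import numpy as np

# Gradient descent vs. Nesterov momentum on f(x) = 0.5 * x' A x,
# a strongly convex quadratic with condition number kappa = 100.
A = np.diag(np.linspace(1.0, 100.0, 50))
L, mu = 100.0, 1.0
x0 = np.ones(50)

def gd(x, steps):
    for _ in range(steps):
        x = x - (1.0 / L) * (A @ x)
    return x

def nesterov(x, steps):
    # Constant momentum tuned for strongly convex problems.
    beta = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))
    y, x_prev = x.copy(), x.copy()
    for _ in range(steps):
        x_new = y - (1.0 / L) * (A @ y)   # gradient step at lookahead point
        y = x_new + beta * (x_new - x_prev)
        x_prev = x_new
    return x_new

err_gd = np.linalg.norm(gd(x0, 300))
err_nag = np.linalg.norm(nesterov(x0, 300))
```

With condition number 100, the momentum iterates contract at roughly a (1 - 1/sqrt(kappa)) rate per step versus (1 - 1/kappa) for gradient descent, which is the guarantee gap the monograph's master templates formalize.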
Primal-dual accelerated gradient methods with small-dimensional relaxation oracle
In this paper, a new variant of accelerated gradient descent is proposed. The
proposed method does not require any information about the objective function,
uses exact line search for practical acceleration of convergence, converges
according to the well-known lower bounds for both convex and non-convex
objective functions, possesses primal-dual properties, and can be applied in the
non-Euclidean setup. As far as we know, this is the first such method possessing
all of the above properties at the same time. We also present a universal
version of the method which is applicable to non-smooth problems. We demonstrate
how in practice one can efficiently use the combination of line search and
primal-duality by considering a convex optimization problem with a simple
structure (for example, linearly constrained).
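The exact line-search ingredient has a closed form on quadratics, which is what makes it cheap in practice. The sketch below shows only that ingredient (steepest descent with exact line search on an assumed random quadratic), not the paper's full accelerated primal-dual method:

```python
import numpy as np

# Exact line search on f(x) = 0.5 * x' A x - b' x: along the steepest-
# descent direction d = -g, the minimizing step is alpha = (g'g)/(g'Ag).
rng = np.random.default_rng(1)
M = rng.standard_normal((30, 30))
A = M @ M.T + 30.0 * np.eye(30)      # symmetric positive definite
b = rng.standard_normal(30)

x = np.zeros(30)
for _ in range(200):
    g = A @ x - b
    if g @ g < 1e-30:                # already at machine-precision optimum
        break
    alpha = (g @ g) / (g @ (A @ g))  # exact minimizer along -g
    x -= alpha * g

residual = np.linalg.norm(A @ x - b)
```

Each step costs one extra matrix-vector product beyond the gradient itself, which is why combining line search with acceleration can pay off in practice.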
Gradient methods for convex minimization: better rates under weaker conditions
The convergence behavior of gradient methods for minimizing convex
differentiable functions is one of the core questions in convex optimization.
This paper shows that their well-known complexities can be achieved under
conditions weaker than the commonly accepted ones. We relax the common gradient
Lipschitz-continuity condition and strong convexity condition to ones that hold
only over certain line segments. Specifically, we establish complexities
$O(\frac{L}{\epsilon})$ and $O(\sqrt{\frac{L}{\epsilon}})$ for the ordinary and
accelerated gradient methods, respectively, assuming that $\nabla f$ is
Lipschitz continuous with constant $L$ over the line segment joining $x$ and
$x - \frac{1}{L}\nabla f(x)$ for each $x \in \mathrm{dom}\, f$. Then we improve them to
$O(\frac{L}{\nu}\log\frac{1}{\epsilon})$ and $O(\sqrt{\frac{L}{\nu}}\log\frac{1}{\epsilon})$
for functions $f$ that also satisfy the secant inequality
$\langle \nabla f(x), x - \bar{x} \rangle \ge \nu \|x - \bar{x}\|^2$
for each $x \in \mathrm{dom}\, f$ and its projection $\bar{x}$ onto the minimizer set of $f$.
The secant condition is also shown to be necessary for the geometric decay of
solution error. Not only are the relaxed conditions met by more functions, the
restrictions give smaller $L$ and larger $\nu$ than without the
restrictions and thus lead to better complexity bounds. We apply these results
to sparse optimization and demonstrate a faster algorithm.
Comment: 20 pages, 4 figures, typos are corrected, Theorem 2 is ne
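The secant inequality bounds the inner product between the gradient at a point and the offset from that point to its projection onto the minimizer set, from below, by a multiple of the squared distance. A quick numerical check on a simple quadratic (whose unique minimizer is the origin, so the projection is trivial; an illustrative assumption) looks like this:

```python
import numpy as np

# Check a secant-type inequality <grad f(x), x - x*> >= nu * ||x - x*||^2
# for f(x) = 0.5 * x' A x, whose unique minimizer is x* = 0. For this f
# the best constant is nu = smallest eigenvalue of A.
rng = np.random.default_rng(2)
A = np.diag([0.5, 2.0, 7.0])
nu = 0.5                                  # smallest eigenvalue of A

holds = True
for _ in range(1000):
    x = rng.standard_normal(3)
    lhs = (A @ x) @ x                     # <grad f(x), x - 0>
    if lhs < nu * (x @ x) - 1e-12:
        holds = False
```

For strongly convex functions the secant constant coincides with the strong convexity modulus, but the inequality can hold for functions that are not strongly convex, which is the relaxation the paper exploits.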