1,869 research outputs found

Adaptive Momentum for Neural Network Optimization

Author: Rashidi Zana
Publication date: 11/05/2020

In this thesis, we develop a novel and efficient algorithm for optimizing neural networks, inspired by a recently proposed geodesic optimization algorithm. Our algorithm, which we call Stochastic Geodesic Optimization (SGeO), applies an adaptive coefficient on top of Polyak's Heavy Ball method, controlling the weight placed on the previous parameter update based on the change of direction in the optimization path. Experimental results on strongly convex functions with Lipschitz gradients and on deep autoencoder benchmarks show that SGeO reaches lower errors than established first-order methods and is competitive, with lower or similar errors, with a recent second-order method called K-FAC (Kronecker-Factored Approximate Curvature). We also incorporate a Nesterov-style lookahead gradient into our algorithm (SGeO-N) and observe notable improvements. We believe that our research will open up new directions for high-dimensional neural network optimization, where combining the efficiency of first-order methods with the effectiveness of second-order methods is a promising avenue to explore.
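The idea of scaling heavy-ball momentum by how much the search direction changes can be sketched as follows. This is a minimal illustration only, not the SGeO update from the thesis: the cosine-based rule for `beta`, the cap of 0.9, the function name `heavy_ball_adaptive`, and the test quadratic are all assumptions made for the example.

```python
import math

def heavy_ball_adaptive(grad, x0, lr=0.1, steps=500):
    """Polyak heavy-ball iteration with an illustrative, direction-based momentum.

    The rule for beta below is a hypothetical stand-in, not the SGeO coefficient.
    """
    x = list(x0)
    prev_update = [0.0] * len(x)
    for _ in range(steps):
        g = grad(x)
        # Cosine of the angle between the previous update and the descent direction -g.
        dot = -sum(u * gi for u, gi in zip(prev_update, g))
        norm_u = math.sqrt(sum(u * u for u in prev_update))
        norm_g = math.sqrt(sum(gi * gi for gi in g))
        cos = dot / (norm_u * norm_g) if norm_u > 0 and norm_g > 0 else 0.0
        # Assumed rule: keep more momentum when the direction is stable,
        # and drop it entirely when the path turns back on itself.
        beta = 0.9 * max(0.0, cos)
        update = [beta * u - lr * gi for u, gi in zip(prev_update, g)]
        x = [xi + ui for xi, ui in zip(x, update)]
        prev_update = update
    return x

def quad_grad(x):
    # Gradient of the assumed test function f(x) = 0.5 * (x[0]**2 + 10 * x[1]**2).
    return [x[0], 10.0 * x[1]]

x_final = heavy_ball_adaptive(quad_grad, [5.0, 5.0])
```

Setting `beta` to a constant recovers the classical heavy-ball method, which makes it easy to compare the two on the same problem.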

YorkSpace

PhasePack: A Phase Retrieval Library

Author: Chandra Rohan
Goldstein Tom
Hontz Justin
McCulloch Val
Studer Christoph
Zhong Ziyuan
Publication date: 30/11/2017

Phase retrieval deals with the estimation of complex-valued signals solely from the magnitudes of linear measurements. While there has been a recent explosion in the development of phase retrieval algorithms, the lack of a common interface has made it difficult to compare new methods against the state of the art. The purpose of PhasePack is to create a common software interface for a wide range of phase retrieval algorithms and to provide a common testbed using both synthetic data and empirical imaging datasets. PhasePack is able to benchmark a large number of recent phase retrieval methods against one another to generate comparisons using a range of different performance metrics. The software package handles single-method testing as well as multiple-method comparisons. The algorithm implementations in PhasePack differ slightly from their original descriptions in the literature in order to achieve faster speed and improved robustness. In particular, PhasePack uses adaptive stepsizes, line-search methods, and fast eigensolvers to speed up and automate convergence.
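To make the measurement model concrete, the sketch below shows why magnitudes alone determine a signal only up to a global phase, and how a reconstruction error can be computed after aligning that phase. It does not use PhasePack or its interface; the matrix `A`, the signal `x`, and the helper names `measure` and `aligned_error` are hypothetical and chosen for illustration.

```python
import cmath

def measure(A, x):
    """Return the magnitudes |sum_j A[i][j] * x[j]| for each row i of A."""
    return [abs(sum(a * xi for a, xi in zip(row, x))) for row in A]

def aligned_error(x_hat, x):
    """Relative error between x_hat and x after removing the best global phase."""
    inner = sum(xh.conjugate() * xi for xh, xi in zip(x_hat, x))
    phase = inner / abs(inner) if abs(inner) > 0 else 1.0
    num = sum(abs(phase * xh - xi) ** 2 for xh, xi in zip(x_hat, x))
    den = sum(abs(xi) ** 2 for xi in x)
    return (num / den) ** 0.5

# Assumed toy problem: 3 complex measurements of a 2-dimensional signal.
A = [
    [1 + 2j, 0.5 - 1j],
    [-1j, 2 + 0j],
    [0.3 + 0.3j, -1 + 1j],
]
x = [1 + 1j, 2 - 0.5j]

# Rotating the signal by a global phase leaves every magnitude unchanged.
rotated = [cmath.exp(0.7j) * xi for xi in x]
max_gap = max(abs(u - v) for u, v in zip(measure(A, x), measure(A, rotated)))
err = aligned_error(rotated, x)
```

Both `max_gap` and `err` come out at floating-point noise level, which is why comparisons between phase retrieval methods typically use a phase-aligned error rather than a plain difference.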

arXiv.org e-Print Archive

Crossref

Acceleration Methods

Author: d'Aspremont Alexandre
Scieur Damien
Taylor Adrien
Publication venue: Now Publishers
Publication date: 01/03/2021

This monograph covers some recent advances in a range of acceleration techniques frequently used in convex optimization. We first use quadratic optimization problems to introduce two key families of methods, namely momentum and nested optimization schemes. They coincide in the quadratic case to form the Chebyshev method. We discuss momentum methods in detail, starting with the seminal work of Nesterov, and structure convergence proofs using a few master templates, such as that for optimized gradient methods, which provide the key benefit of showing how momentum methods optimize convergence guarantees. We further cover proximal acceleration, at the heart of the Catalyst and Accelerated Hybrid Proximal Extragradient frameworks, using similar algorithmic patterns. Common acceleration techniques rely directly on the knowledge of some of the regularity parameters in the problem at hand. We conclude by discussing restart schemes, a set of simple techniques for reaching nearly optimal convergence rates while adapting to unobserved regularity parameters. Comment: Published in Foundations and Trends in Optimization (see https://www.nowpublishers.com/article/Details/OPT-036
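A restart scheme of the kind mentioned at the end of this abstract can be sketched in a few lines. The version below uses one common heuristic, restarting Nesterov's accelerated gradient whenever the objective value increases; it is not claimed to be the specific scheme analyzed in the monograph. The function name `agd_with_restart` and the test quadratic are assumptions for the example. Only the smoothness constant `L` is supplied; the strong convexity parameter is never used.

```python
def agd_with_restart(f, grad, x0, L, steps=300):
    """Nesterov's accelerated gradient with a function-value restart heuristic."""
    x = list(x0)
    y = list(x0)
    t = 1.0
    for _ in range(steps):
        g = grad(y)
        x_new = [yi - gi / L for yi, gi in zip(y, g)]
        if f(x_new) > f(x):
            # Restart: discard the momentum and take a plain gradient step from x.
            t = 1.0
            g = grad(x)
            x_new = [xi - gi / L for xi, gi in zip(x, g)]
        # Standard momentum sequence t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2.
        t_new = 0.5 * (1.0 + (1.0 + 4.0 * t * t) ** 0.5)
        beta = (t - 1.0) / t_new
        y = [xn + beta * (xn - xi) for xn, xi in zip(x_new, x)]
        x, t = x_new, t_new
    return x

def quad(x):
    # Assumed test objective with condition number 100.
    return 0.5 * (x[0] ** 2 + 100.0 * x[1] ** 2)

def quad_grad(x):
    return [x[0], 100.0 * x[1]]

x0 = [1.0, 1.0]
x_final = agd_with_restart(quad, quad_grad, x0, L=100.0)
```

Because a gradient step of size `1/L` never increases an `L`-smooth objective, the restart branch keeps the sequence of objective values non-increasing.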

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Hal-Diderot

Primal-dual accelerated gradient methods with small-dimensional relaxation oracle

Author: Dvurechensky Pavel
Gasnikov Alexander
Guminov Sergey
Nesterov Yurii
Publication date: 01/01/2019

In this paper, a new variant of accelerated gradient descent is proposed. The proposed method does not require any information about the objective function, uses exact line search for the practical acceleration of convergence, converges according to the well-known lower bounds for both convex and non-convex objective functions, possesses primal-dual properties, and can be applied in the non-Euclidean setup. As far as we know, this is the first such method possessing all of the above properties at the same time. We also present a universal version of the method which is applicable to non-smooth problems. We demonstrate how in practice one can efficiently use the combination of line search and primal-duality by considering a convex optimization problem with a simple structure (for example, linearly constrained).
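For readers unfamiliar with exact line search, the sketch below applies it to plain steepest descent on a quadratic, where the optimal step has the closed form `alpha = (g.g) / (g.A g)`. This is a deliberately simpler method than the accelerated primal-dual scheme the abstract describes; the function name `steepest_descent_exact` and the matrix `A` and vector `b` are assumptions for illustration.

```python
def steepest_descent_exact(A, b, x0, steps=50):
    """Minimize f(x) = 0.5 * x^T A x - b^T x (A symmetric positive definite)
    by gradient descent with an exact line search along -gradient."""
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        # Gradient of f at x is A x - b.
        g = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
        gg = sum(gi * gi for gi in g)
        if gg < 1e-30:
            break  # gradient is numerically zero, so stop
        Ag = [sum(A[i][j] * g[j] for j in range(n)) for i in range(n)]
        gAg = sum(gi * agi for gi, agi in zip(g, Ag))
        # Exact minimizer of f(x - alpha * g) over alpha.
        alpha = gg / gAg
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x

# Assumed toy problem; the exact solution of A x = b is (0.2, 0.4).
A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]
x_star = steepest_descent_exact(A, b, [0.0, 0.0])
```

No step size or smoothness constant is supplied by the caller, which is the practical appeal of line search that the abstract points to.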

arXiv.org e-Print Archive

DIAL UCLouvain

Gradient methods for convex minimization: better rates under weaker conditions

Author: Yin Wotao
Zhang Hui
Publication date: 01/01/2013

The convergence behavior of gradient methods for minimizing convex differentiable functions is one of the core questions in convex optimization. This paper shows that their well-known complexities can be achieved under conditions weaker than the commonly accepted ones. We relax the common gradient Lipschitz-continuity condition and strong convexity condition to ones that hold only over certain line segments. Specifically, we establish complexities $O(\frac{R}{\epsilon})$ and $O(\sqrt{\frac{R}{\epsilon}})$ for the ordinary and accelerated gradient methods, respectively, assuming that $\nabla f$ is Lipschitz continuous with constant $R$ over the line segment joining $x$ and $x-\frac{1}{R}\nabla f(x)$ for each $x\in\operatorname{dom} f$. Then we improve them to $O(\frac{R}{\nu}\log(\frac{1}{\epsilon}))$ and $O(\sqrt{\frac{R}{\nu}}\log(\frac{1}{\epsilon}))$ for a function $f$ that also satisfies the secant inequality $\langle \nabla f(x), x-x^* \rangle \ge \nu\|x-x^*\|^2$ for each $x\in\operatorname{dom} f$ and its projection $x^*$ to the minimizer set of $f$. The secant condition is also shown to be necessary for the geometric decay of solution error. Not only are the relaxed conditions met by more functions, but the restrictions also give a smaller $R$ and a larger $\nu$ than those obtained without the restrictions, and thus lead to better complexity bounds. We apply these results to sparse optimization and demonstrate a faster algorithm. Comment: 20 pages, 4 figures, typos are corrected, Theorem 2 is ne
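As a quick check on the secant inequality $\langle \nabla f(x), x-x^* \rangle \ge \nu\|x-x^*\|^2$ quoted in this abstract, consider a simple diagonal quadratic. The function and the constants $\mu$ and $L$ below are illustrative assumptions, not taken from the paper.

```latex
% Illustrative example (assumed, not from the paper), with 0 < \mu \le L.
\begin{align*}
f(x) &= \tfrac{1}{2}\left(\mu x_1^2 + L x_2^2\right), \qquad x^* = 0, \\
\nabla f(x) &= (\mu x_1,\; L x_2), \\
\langle \nabla f(x),\, x - x^* \rangle
  &= \mu x_1^2 + L x_2^2
  \;\ge\; \mu\left(x_1^2 + x_2^2\right)
  = \mu \|x - x^*\|^2 .
\end{align*}
```

So the inequality holds with $\nu = \mu$, and $\nabla f$ is Lipschitz continuous with constant $R = L$. In this case the bound $O(\frac{R}{\nu}\log(\frac{1}{\epsilon}))$ reads $O(\frac{L}{\mu}\log(\frac{1}{\epsilon}))$, the familiar condition-number rate for gradient descent on strongly convex problems.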

arXiv.org e-Print Archive

CiteSeerX

DSpace at Rice University