1,869 research outputs found

    Adaptive Momentum for Neural Network Optimization

    In this thesis, we develop a novel and efficient algorithm for optimizing neural networks, inspired by a recently proposed geodesic optimization algorithm. Our algorithm, which we call Stochastic Geodesic Optimization (SGeO), applies an adaptive coefficient on top of Polyak's Heavy Ball method, controlling how much weight is put on the previous parameter update based on the change of direction in the optimization path. Experimental results on strongly convex functions with Lipschitz gradients and on deep autoencoder benchmarks show that SGeO reaches lower errors than established first-order methods and is competitive with, reaching lower or similar errors to, a recent second-order method called K-FAC (Kronecker-Factored Approximate Curvature). We also incorporate a Nesterov-style lookahead gradient into our algorithm (SGeO-N) and observe notable improvements. We believe that our research will open up new directions for high-dimensional neural network optimization, where combining the efficiency of first-order methods with the effectiveness of second-order methods is a promising avenue to explore.
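
    The abstract does not state the SGeO update rule, so the following is only a sketch of the general idea it describes: a Heavy Ball step whose momentum coefficient depends on the change of direction. The coefficient used here, beta_max * (1 + cos) / 2 with cos the cosine between the previous update and the current descent direction, is a hypothetical choice made for illustration and is not the formula from the thesis. The names adaptive_heavy_ball, beta_max and quad_grad are likewise assumptions.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def adaptive_heavy_ball(grad, x0, lr=0.1, beta_max=0.9, steps=200):
    """Heavy Ball with a direction-dependent momentum coefficient.

    Illustrative only: the coefficient rule below is an assumption,
    not the SGeO rule.
    """
    x = list(x0)
    prev_update = [0.0] * len(x)
    for _ in range(steps):
        g = grad(x)
        descent = [-gi for gi in g]
        norm_d = math.sqrt(dot(descent, descent))
        norm_p = math.sqrt(dot(prev_update, prev_update))
        if norm_d > 0.0 and norm_p > 0.0:
            # cos = 1: previous update agrees with the descent direction;
            # cos = -1: the path has reversed, so momentum is switched off.
            cos_angle = dot(descent, prev_update) / (norm_d * norm_p)
            beta = beta_max * (1.0 + cos_angle) / 2.0
        else:
            beta = 0.0
        # Heavy Ball update: weighted previous update plus a gradient step.
        update = [beta * p - lr * gi for p, gi in zip(prev_update, g)]
        x = [xi + ui for xi, ui in zip(x, update)]
        prev_update = update
    return x


def quad_grad(x):
    """Gradient of the toy objective f(x) = 0.5 * (x[0]**2 + 10 * x[1]**2)."""
    return [x[0], 10.0 * x[1]]


print(adaptive_heavy_ball(quad_grad, [5.0, 3.0]))
```

    With beta fixed at beta_max this reduces to the classical Heavy Ball method; the adaptive rule only shrinks the momentum term when successive directions disagree.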

    PhasePack: A Phase Retrieval Library

    Phase retrieval deals with the estimation of complex-valued signals solely from the magnitudes of linear measurements. While there has been a recent explosion in the development of phase retrieval algorithms, the lack of a common interface has made it difficult to compare new methods against the state of the art. The purpose of PhasePack is to create a common software interface for a wide range of phase retrieval algorithms and to provide a common testbed using both synthetic data and empirical imaging datasets. PhasePack is able to benchmark a large number of recent phase retrieval methods against one another to generate comparisons using a range of different performance metrics. The software package handles single-method testing as well as multiple-method comparisons. The algorithm implementations in PhasePack differ slightly from their original descriptions in the literature in order to achieve faster speed and improved robustness. In particular, PhasePack uses adaptive stepsizes, line-search methods, and fast eigensolvers to speed up and automate convergence.
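
    To make the measurement model concrete, here is a small sketch of why phase retrieval error has to be measured up to a global phase: the magnitudes |Ax| are unchanged when x is multiplied by any unit-modulus constant. This is not PhasePack's interface; the function names measure and aligned_error, the matrix A, and the test signal are all assumptions chosen for illustration.

```python
import cmath
import math


def measure(A, x):
    """Magnitude-only measurements b_i = |<a_i, x>| for the rows a_i of A."""
    return [abs(sum(a * xi for a, xi in zip(row, x))) for row in A]


def aligned_error(x_hat, x_true):
    """Relative error after removing the best global phase from x_hat."""
    # inner = sum(conj(x_hat_i) * x_true_i); the unit-modulus c that
    # minimises ||c * x_hat - x_true|| is c = inner / |inner|.
    inner = sum(xh.conjugate() * xt for xh, xt in zip(x_hat, x_true))
    phase = inner / abs(inner) if abs(inner) > 0.0 else 1.0
    diff = sum(abs(phase * xh - xt) ** 2 for xh, xt in zip(x_hat, x_true))
    norm = sum(abs(xt) ** 2 for xt in x_true)
    return math.sqrt(diff / norm)


# Hypothetical 4x3 measurement matrix and 3-entry complex signal.
A = [
    [1 + 1j, 2 - 1j, 0.5j],
    [-1j, 1.0, 2 + 2j],
    [3.0, -1 + 0.5j, 1j],
    [0.5 - 2j, 1j, 1.0],
]
x_true = [1 + 2j, -0.5j, 3.0]

# The same signal rotated by a global phase of 0.7 radians.
rotation = cmath.exp(0.7j)
x_rotated = [rotation * xi for xi in x_true]

b_true = measure(A, x_true)
b_rotated = measure(A, x_rotated)
max_measurement_gap = max(abs(p - q) for p, q in zip(b_true, b_rotated))

print(max_measurement_gap)                # magnitudes agree up to rounding
print(aligned_error(x_rotated, x_true))   # error after phase alignment
```

    A metric that skips the alignment step would report a large error for x_rotated even though it explains the measurements exactly as well as x_true.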

    Acceleration Methods

    This monograph covers some recent advances in a range of acceleration techniques frequently used in convex optimization. We first use quadratic optimization problems to introduce two key families of methods, namely momentum and nested optimization schemes. They coincide in the quadratic case to form the Chebyshev method. We discuss momentum methods in detail, starting with the seminal work of Nesterov, and structure convergence proofs using a few master templates, such as that for optimized gradient methods, which provide the key benefit of showing how momentum methods optimize convergence guarantees. We further cover proximal acceleration, at the heart of the Catalyst and Accelerated Hybrid Proximal Extragradient frameworks, using similar algorithmic patterns. Common acceleration techniques rely directly on the knowledge of some of the regularity parameters in the problem at hand. We conclude by discussing restart schemes, a set of simple techniques for reaching nearly optimal convergence rates while adapting to unobserved regularity parameters. Comment: Published in Foundation and Trends in Optimization (see https://www.nowpublishers.com/article/Details/OPT-036
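
    As a rough illustration of the two ingredients named above, momentum and restarts, the sketch below runs a Nesterov-type accelerated gradient method with a function-value restart on an ill-conditioned quadratic. The restart test used here (drop the momentum whenever the objective would increase) is one common heuristic and is an assumption, not necessarily a scheme analysed in the monograph; the names nesterov_with_restart, f and grad_f are hypothetical.

```python
import math


def nesterov_with_restart(f, grad, x0, L, steps=300):
    """Accelerated gradient descent with a function-value restart.

    L is an assumed known Lipschitz constant of the gradient; no strong
    convexity parameter is used, which is what the restart compensates for.
    """
    x = list(x0)          # best accepted iterate
    y = list(x0)          # extrapolated point where the gradient is taken
    t = 1.0               # momentum sequence
    fx = f(x)
    for _ in range(steps):
        g = grad(y)
        x_new = [yi - gi / L for yi, gi in zip(y, g)]
        f_new = f(x_new)
        if f_new > fx:
            # Restart: discard the momentum and retry from x, so the next
            # iteration is a plain gradient step with step size 1/L.
            y = list(x)
            t = 1.0
            continue
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        momentum = (t - 1.0) / t_new
        y = [xn + momentum * (xn - xo) for xn, xo in zip(x_new, x)]
        x, fx, t = x_new, f_new, t_new
    return x


def f(x):
    """Toy objective with condition number 100."""
    return 0.5 * (x[0] ** 2 + 100.0 * x[1] ** 2)


def grad_f(x):
    return [x[0], 100.0 * x[1]]


x0 = [1.0, 1.0]
x_final = nesterov_with_restart(f, grad_f, x0, L=100.0)
print(f(x0), f(x_final))
```

    Because a step is accepted only when it does not increase the objective, the accepted iterates are monotone in f by construction.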

    Primal-dual accelerated gradient methods with small-dimensional relaxation oracle

    In this paper, a new variant of accelerated gradient descent is proposed. The proposed method does not require any information about the objective function, uses exact line search for the practical acceleration of convergence, converges according to the well-known lower bounds for both convex and non-convex objective functions, possesses primal-dual properties, and can be applied in the non-Euclidean set-up. As far as we know, this is the first such method possessing all of the above properties at the same time. We also present a universal version of the method which is applicable to non-smooth problems. We demonstrate how in practice one can efficiently use the combination of line search and primal-duality by considering a convex optimization problem with a simple structure (for example, linearly constrained).

    Gradient methods for convex minimization: better rates under weaker conditions

    The convergence behavior of gradient methods for minimizing convex differentiable functions is one of the core questions in convex optimization. This paper shows that their well-known complexities can be achieved under conditions weaker than the commonly accepted ones. We relax the common gradient Lipschitz-continuity condition and strong convexity condition to ones that hold only over certain line segments. Specifically, we establish complexities $O(\frac{R}{\epsilon})$ and $O(\sqrt{\frac{R}{\epsilon}})$ for the ordinary and accelerated gradient methods, respectively, assuming that $\nabla f$ is Lipschitz continuous with constant $R$ over the line segment joining $x$ and $x-\frac{1}{R}\nabla f$ for each $x\in\operatorname{dom} f$. Then we improve them to $O(\frac{R}{\nu}\log(\frac{1}{\epsilon}))$ and $O(\sqrt{\frac{R}{\nu}}\log(\frac{1}{\epsilon}))$ for a function $f$ that also satisfies the secant inequality $\langle \nabla f(x), x-x^*\rangle \ge \nu\|x-x^*\|^2$ for each $x\in\operatorname{dom} f$ and its projection $x^*$ to the minimizer set of $f$. The secant condition is also shown to be necessary for the geometric decay of solution error. Not only are the relaxed conditions met by more functions, the restrictions give smaller $R$ and larger $\nu$ than they are without the restrictions and thus lead to better complexity bounds. We apply these results to sparse optimization and demonstrate a faster algorithm. Comment: 20 pages, 4 figures, typos are corrected, Theorem 2 is ne
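
    As a sanity check of how a secant-type condition yields geometric decay, the sketch below runs plain gradient descent with step size 1/R on a diagonal quadratic whose minimizer is x* = 0. For this toy function the inequality <grad f(x), x - x*> >= nu * ||x - x*||^2 holds with nu equal to the smallest curvature and R equal to the largest, and each step shrinks the squared distance to x* by at least the factor (1 - nu/R). The objective, the constants, and the name run_gradient_descent are assumptions chosen for illustration, not the paper's experiments.

```python
def run_gradient_descent(a, x0, steps=50, tol=1e-12):
    """Gradient descent with step 1/R on f(x) = 0.5 * sum(a_i * x_i**2).

    Returns the final iterate plus two flags recording whether the secant
    inequality and the (1 - nu/R) contraction held at every step.
    """
    nu, R = min(a), max(a)
    x = list(x0)
    secant_ok = True
    contraction_ok = True
    for _ in range(steps):
        g = [ai * xi for ai, xi in zip(a, x)]
        sq_dist = sum(xi * xi for xi in x)          # ||x - x*||^2 with x* = 0
        # Secant inequality: <grad f(x), x - x*> >= nu * ||x - x*||^2.
        inner = sum(gi * xi for gi, xi in zip(g, x))
        if inner < nu * sq_dist - tol:
            secant_ok = False
        x = [xi - gi / R for xi, gi in zip(x, g)]
        new_sq_dist = sum(xi * xi for xi in x)
        # Geometric decay of the squared distance to the minimizer.
        if new_sq_dist > (1.0 - nu / R) * sq_dist + tol:
            contraction_ok = False
    return x, secant_ok, contraction_ok


curvatures = [1.0, 4.0, 10.0]      # nu = 1, R = 10 for this toy problem
x_end, secant_ok, contraction_ok = run_gradient_descent(curvatures, [3.0, -2.0, 1.0])
print(x_end, secant_ok, contraction_ok)
```

    A larger nu or a smaller R tightens the factor (1 - nu/R), which is the sense in which restricting the conditions to line segments can improve the bound.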