
    Lyapunov Functions for First-Order Methods: Tight Automated Convergence Guarantees

    We present a novel way of generating Lyapunov functions for proving linear convergence rates of first-order optimization methods. Our approach provably obtains the fastest linear convergence rate that can be verified by a quadratic Lyapunov function (with given states), and only relies on solving a small-sized semidefinite program. Our approach combines the advantages of performance estimation problems (PEP, due to Drori & Teboulle (2014)) and integral quadratic constraints (IQC, due to Lessard et al. (2016)), and relies on convex interpolation (due to Taylor et al. (2017c;b)). Comment: to appear in ICML 2018.
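    The kind of certificate described above can be sketched as a small semidefinite feasibility problem. The toy script below is an illustration under our own assumptions, not the authors' code: it looks for a quadratic Lyapunov function V(x) = p·||x − x*||² and an S-procedure multiplier certifying a candidate rate ρ for gradient descent on μ-strongly convex, L-smooth functions, then bisects on ρ. The helper `rate_certified`, the constants, and the use of CVXPY/SCS are all our choices.

```python
# Minimal sketch: certify a linear rate rho for x_{k+1} = x_k - alpha * grad f(x_k)
# on mu-strongly convex, L-smooth f via a 2x2 LMI with V(x) = p * ||x - x*||^2.
import numpy as np
import cvxpy as cp

def rate_certified(rho, mu, L, alpha):
    """Feasibility of the small SDP certifying V(x_+) <= rho^2 V(x)."""
    p = cp.Variable(nonneg=True)     # Lyapunov weight
    lam = cp.Variable(nonneg=True)   # multiplier for the (mu, L) interpolation constraint
    # z = (x - x*, grad f(x)) satisfies z^T M z >= 0 for mu-strongly convex, L-smooth f.
    M = np.array([[-mu * L, (mu + L) / 2.0],
                  [(mu + L) / 2.0, -1.0]])
    # V(x_+) - rho^2 V(x) as a quadratic form in z, for x_+ = x - alpha * grad f(x).
    V = np.array([[1.0 - rho**2, -alpha],
                  [-alpha, alpha**2]])
    constraints = [p * V + lam * M << 0, p >= 1.0]   # p >= 1 excludes the trivial certificate
    prob = cp.Problem(cp.Minimize(0), constraints)
    prob.solve(solver=cp.SCS)
    return prob.status in ("optimal", "optimal_inaccurate")

mu, L = 1.0, 10.0
alpha = 2.0 / (mu + L)               # step size whose known rate is (L - mu) / (L + mu)
lo, hi = 0.0, 1.0
for _ in range(40):                  # bisect on the candidate rate rho
    mid = 0.5 * (lo + hi)
    lo, hi = (lo, mid) if rate_certified(mid, mu, L, alpha) else (mid, hi)
print("certified rate ~=", hi, " analytical rate:", (L - mu) / (L + mu))
```

    For this step size the bisection approximately recovers the known rate (L − μ)/(L + μ), which is the behaviour one would expect from a tight quadratic-Lyapunov certificate.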

    The exact worst-case convergence rate of the gradient method with fixed step lengths for L-smooth functions

    In this paper, we study the convergence rate of the gradient (or steepest descent) method with fixed step lengths for finding a stationary point of an L-smooth function. We establish a new convergence rate, and show that the bound may be exact in some cases. In addition, based on the bound, we derive an optimal step length.
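    As a rough numerical illustration of this setting (our own toy example, not the paper's exact analysis), the snippet below runs the gradient method with fixed step length 1/L on an L-smooth non-convex test function and compares the smallest observed squared gradient norm with the classical O(1/N) worst-case guarantee that this line of work makes exact. The test function and all constants are our choices.

```python
# Gradient method with fixed step 1/L on a smooth non-convex test function,
# checked against the classical bound
#   min_{0<=k<=N} |f'(x_k)|^2 <= 2 L (f(x_0) - f_*) / (N + 1).
import numpy as np

f  = lambda x: np.sin(3 * x) + 0.5 * x**2    # non-convex: f'' = 1 - 9 sin(3x)
df = lambda x: 3 * np.cos(3 * x) + x
L  = 10.0                                    # |f''| <= 10, so f is L-smooth with L = 10
f_star = f(np.linspace(-4.0, 4.0, 100001)).min()   # numerical stand-in for inf f

x0, N = 3.0, 50
x, grads = x0, []
for _ in range(N + 1):
    grads.append(df(x))
    x -= df(x) / L                           # fixed step length 1/L

best = min(g**2 for g in grads)
bound = 2 * L * (f(x0) - f_star) / (N + 1)
print(f"min |f'(x_k)|^2 = {best:.2e}  <=  classical bound = {bound:.2e}")
```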

    Conditions for linear convergence of the gradient method for non-convex optimization

    In this paper, we derive a new linear convergence rate for the gradient method with fixed step lengths for non-convex smooth optimization problems satisfying the Polyak-Lojasiewicz (PL) inequality. We establish that the PL inequality is a necessary and sufficient condition for linear convergence to the optimal value for this class of problems. We list some related classes of functions for which the gradient method may enjoy a linear convergence rate. Moreover, we investigate their relationship with the PL inequality.
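    The linear rate under the PL inequality is easy to observe numerically. The sketch below (our own toy example, not from the paper) runs gradient descent with fixed step 1/L on the standard non-convex PL example f(x) = x² + 3 sin²(x) and reports the worst observed per-iteration contraction of the suboptimality gap, which should stay strictly below 1 when f(x_{k+1}) − f* ≤ (1 − μ/L)(f(x_k) − f*) for some μ > 0.

```python
# Gradient descent with fixed step 1/L on a non-convex function satisfying a
# PL inequality; the suboptimality gap f(x_k) - f* contracts linearly.
import numpy as np

f  = lambda x: x**2 + 3 * np.sin(x)**2       # non-convex, PL; global minimum f* = 0 at x = 0
df = lambda x: 2 * x + 3 * np.sin(2 * x)
L, f_star = 8.0, 0.0                         # f'' = 2 + 6 cos(2x) <= 8

x = 4.0
gaps = []
for _ in range(40):
    gaps.append(f(x) - f_star)
    x -= df(x) / L                           # gradient step with fixed length 1/L

ratios = [b / a for a, b in zip(gaps, gaps[1:]) if a > 1e-9]
print("worst per-step contraction factor:", max(ratios))   # observed factor below 1
```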

    Optimisation for efficient deep learning

    Over the past 10 years there has been a huge advance in the performance of deep neural networks on many supervised learning tasks. Over this period these models have redefined the state of the art numerous times on many classic machine vision and natural language processing benchmarks. Deep neural networks have also found their way into many real-world applications, including chat bots, art generation, voice-activated virtual assistants, surveillance, and medical diagnosis systems. Much of the improved performance of these models can be attributed to an increase in scale, which in turn has raised computation and energy costs. In this thesis we detail approaches to reducing the cost of deploying deep neural networks in various settings.

    We first focus on training efficiency, and to that end we present two optimisation techniques that produce high-accuracy models without extensive tuning. These optimisers have only a single fixed maximal step size hyperparameter to cross-validate, and we demonstrate that they outperform other comparable methods in a wide range of settings. These approaches do not require the onerous process of finding a good learning rate schedule, which often requires training many versions of the same network, and hence they reduce the computation needed. The first of these optimisers is a novel bundle method designed for the interpolation setting. The second demonstrates the effectiveness of a Polyak-like step size in combination with an online estimate of the optimal loss value in the non-interpolating setting.

    Next, we turn our attention to training efficient binary networks with both binary parameters and binary activations. With the right implementation, fully binary networks are highly efficient at inference time, as they can replace the majority of operations with cheaper bit-wise alternatives. This makes them well suited to lightweight or embedded applications. Due to the discrete nature of these models, conventional training approaches are not viable. We present a simple and effective alternative to the existing optimisation techniques for these models.
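    The Polyak-like step size mentioned in the abstract can be illustrated with a generic sketch (not the thesis's optimiser): the step length (f(x) − f̂)/||∇f(x)||² requires an estimate f̂ of the optimal loss, which the thesis obtains online; in the toy problem below the optimal loss is exactly zero, so the estimate is trivially correct. The least-squares setup and constants are our own.

```python
# Generic Polyak-type step size on a consistent least-squares problem.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
x_true = rng.standard_normal(5)
b = A @ x_true                               # consistent system, so the optimal loss is 0
f  = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
df = lambda x: A.T @ (A @ x - b)

x = np.zeros(5)
f_hat = 0.0    # optimal loss, known exactly here; the thesis estimates it online
for _ in range(50):
    g = df(x)
    step = (f(x) - f_hat) / (g @ g + 1e-12)  # Polyak step length
    x = x - step * g
print("final loss:", f(x))
```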