
Proximal boosting and its acceleration

Authors: Boyer Claire, Fouillen Erwan, Sangnier Maxime
Publication date: 22/01/2020

Gradient boosting is a prediction method that iteratively combines weak learners to produce a complex and accurate model. From an optimization point of view, the learning procedure of gradient boosting mimics gradient descent on a functional variable. When the empirical risk to minimize is not differentiable, this paper proposes to build upon the proximal point algorithm instead, introducing a novel boosting approach called proximal boosting. Besides being motivated by non-differentiable optimization, the proposed algorithm benefits from Nesterov's acceleration in the same way as gradient boosting [Biau et al., 2018]. This leads to a variant called accelerated proximal boosting. The advantages of leveraging proximal methods for boosting are illustrated by numerical experiments on simulated and real-world data. In particular, proximal boosting compares favorably with gradient boosting in terms of convergence rate and prediction accuracy.

arXiv.org e-Print Archive
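To make the idea of replacing the gradient by a proximal step more concrete, here is a minimal Python sketch under stated assumptions: at each round a weak learner is fit to the displacement prox(F) - F produced by the proximal operator of the non-differentiable absolute loss, rather than to the negative gradient. The regression stump, the parameters lam and lr, and all function names are hypothetical choices for illustration; this is not the authors' algorithm and it omits the accelerated variant.

```python
# Hypothetical illustration of boosting with a proximal step; not the paper's implementation.

def soft_threshold(x, t):
    """Proximal operator of t * |.| evaluated at x (soft-thresholding)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def prox_abs_loss(f, y, lam):
    """argmin_z lam * |y - z| + 0.5 * (z - f) ** 2, the prox of the absolute loss at f."""
    return y + soft_threshold(f - y, lam)


def fit_stump(xs, targets):
    """Least-squares regression stump on 1-D inputs (the assumed weak learner)."""
    best = None
    for thr in sorted(set(xs)):  # candidate split: x <= thr vs x > thr
        left = [t for x, t in zip(xs, targets) if x <= thr]
        right = [t for x, t in zip(xs, targets) if x > thr]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right) if right else lmean
        err = sum((t - lmean) ** 2 for t in left) + sum((t - rmean) ** 2 for t in right)
        if best is None or err < best[0]:
            best = (err, thr, lmean, rmean)
    _, thr, lmean, rmean = best
    return lambda x: lmean if x <= thr else rmean


def proximal_boosting(xs, ys, n_rounds=50, lam=0.5, lr=1.0):
    """Fit each weak learner to the proximal displacement prox(F) - F instead of -grad."""
    preds = [0.0] * len(xs)  # current model F evaluated on the training inputs
    learners = []
    for _ in range(n_rounds):
        direction = [prox_abs_loss(f, y, lam) - f for f, y in zip(preds, ys)]
        h = fit_stump(xs, direction)
        learners.append(h)
        preds = [f + lr * h(x) for f, x in zip(preds, xs)]

    def predict(x):
        return sum(lr * h(x) for h in learners)

    return predict, preds
```

When a prediction is within lam of its target, the proximal displacement equals the residual exactly, so the update lands on the target instead of oscillating around it as a fixed-size sign (subgradient) step can.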

PRISMA: PRoximal Iterative SMoothing Algorithm

Authors: Argyriou Andreas, Orabona Francesco, Srebro Nathan
Publication date: 18/11/2012

Motivated by learning problems including max-norm regularized matrix completion and clustering, robust PCA and sparse inverse covariance selection, we propose a novel optimization algorithm for minimizing a convex objective that decomposes into three parts: a smooth part, a simple non-smooth Lipschitz part, and a simple non-smooth non-Lipschitz part. We use a time-variant smoothing strategy that yields a guarantee requiring no advance knowledge of either the total number of iterations or a bound on the domain.

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server
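The following minimal Python sketch illustrates the three-part structure and a time-varying smoothing schedule on an assumed toy problem: minimize 0.5*||x - a||^2 (smooth) + ||x||_1 (non-smooth, Lipschitz) subject to x lying in a box (the box indicator is non-smooth and non-Lipschitz). The l1 term is replaced by its Huber smoothing with a parameter that shrinks as beta0/(k+1), and the box is handled exactly by projection. The toy problem, the schedule and the function names are illustrative assumptions; PRISMA itself also uses an accelerated scheme, which this plain proximal-gradient sketch omits.

```python
# Illustrative smoothing + projection sketch on a toy problem; not the PRISMA algorithm itself.

def huber_grad(x, beta):
    """Derivative of the Huber smoothing of |x| (its Moreau envelope with parameter beta)."""
    return max(-1.0, min(1.0, x / beta))


def project_box(x, lo, hi):
    """Prox of the box indicator, i.e. Euclidean projection onto [lo, hi]."""
    return max(lo, min(hi, x))


def smoothed_prox_grad(a, lo, hi, n_iter=2000, beta0=1.0):
    """Minimize 0.5*||x - a||^2 + ||x||_1 subject to lo <= x_i <= hi.

    - smooth part 0.5*||x - a||^2: gradient x - a, Lipschitz constant 1
    - Lipschitz non-smooth part ||x||_1: Huber-smoothed with beta_k = beta0 / (k + 1),
      so the schedule does not depend on a total iteration count fixed in advance
    - non-Lipschitz part (box indicator): handled exactly through its prox (projection)
    """
    x = [0.0] * len(a)
    for k in range(n_iter):
        beta = beta0 / (k + 1)
        step = 1.0 / (1.0 + 1.0 / beta)  # 1 / Lipschitz constant of the smoothed gradient
        x = [
            project_box(xi - step * ((xi - ai) + huber_grad(xi, beta)), lo, hi)
            for xi, ai in zip(x, a)
        ]
    return x
```

For a = [3.0, -0.5, -4.0] and the box [-2, 2], the exact minimizer is obtained coordinate-wise by soft-thresholding a at 1 and clipping to the box, which gives [2.0, 0.0, -2.0]. The sketch approaches it without fixing the number of iterations in advance, although with this simple schedule the error decreases only slowly.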

Catalyst Acceleration for First-order Convex Optimization: from Theory to Practice

Authors: Harchaoui Zaid, Lin Hongzhou, Mairal Julien
Publication date: 01/04/2018

We introduce a generic scheme for accelerating gradient-based optimization methods in the sense of Nesterov. The approach, called Catalyst, builds upon the inexact accelerated proximal point algorithm for minimizing a convex objective function and consists of approximately solving a sequence of well-chosen auxiliary problems, leading to faster convergence. One of the keys to achieving acceleration in theory and in practice is to solve these sub-problems with appropriate accuracy by using the right stopping criterion and the right warm-start strategy. We give practical guidelines for using Catalyst and present a comprehensive analysis of its global complexity. We show that Catalyst applies to a large class of algorithms, including gradient descent, block coordinate descent, incremental algorithms such as SAG, SAGA, SDCA, SVRG and MISO/Finito, and their proximal variants. For all of these methods, we establish faster rates using Catalyst acceleration, for both strongly convex and non-strongly convex objectives. We conclude with extensive experiments showing that acceleration is useful in practice, especially for ill-conditioned problems.
Comment: link to publisher website: http://jmlr.org/papers/volume18/17-748/17-748.pd

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server
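As a rough illustration of the outer/inner structure described in this abstract, the Python sketch below runs an accelerated inexact proximal point loop on an assumed diagonal, ill-conditioned quadratic, using plain gradient descent as the inner method. The fixed inner budget n_inner, the warm start at the previous outer iterate, the value of kappa, and the constant extrapolation weight (1 - sqrt(q))/(1 + sqrt(q)) with q = mu/(mu + kappa), which corresponds to the strongly convex case, are simplifying assumptions; the paper's stopping criteria and warm-start rules are more refined, and the function names are hypothetical.

```python
import math

# Simplified accelerated inexact proximal point loop in the spirit of Catalyst (illustrative only).


def grad_f(x, h, b):
    """Gradient of the diagonal quadratic f(x) = 0.5 * sum(h_i * x_i**2) - sum(b_i * x_i)."""
    return [hi * xi - bi for hi, xi, bi in zip(h, x, b)]


def inner_gd(x0, y, kappa, h, b, L, n_inner):
    """Approximately minimize f(x) + kappa/2 * ||x - y||^2 with n_inner gradient steps from x0."""
    x = list(x0)
    step = 1.0 / (L + kappa)  # the auxiliary problem is (L + kappa)-smooth
    for _ in range(n_inner):
        g = grad_f(x, h, b)
        x = [xi - step * (gi + kappa * (xi - yi)) for xi, gi, yi in zip(x, g, y)]
    return x


def catalyst_sketch(h, b, kappa, n_outer=300, n_inner=5):
    mu, L = min(h), max(h)  # strong convexity and smoothness constants of f
    sq = math.sqrt(mu / (mu + kappa))
    beta = (1.0 - sq) / (1.0 + sq)  # constant extrapolation weight (strongly convex case)
    x_prev = [0.0] * len(h)
    y = list(x_prev)
    for _ in range(n_outer):
        x = inner_gd(x_prev, y, kappa, h, b, L, n_inner)  # warm start at previous iterate
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]  # Nesterov-style extrapolation
        x_prev = x
    return x_prev
```

Here mu and L are read off the diagonal of the quadratic, and the exact minimizer is b_i / h_i; for h = [1.0, 0.01] and b = [1.0, 1.0] that is [1, 100], with condition number 100.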

From Averaging to Acceleration, There is Only a Step-size

Authors: Bach Francis, Flammarion Nicolas
Publication date: 01/01/2015

We show that accelerated gradient descent, averaged gradient descent and the heavy-ball method for non-strongly-convex problems may be reformulated as constant-parameter second-order difference equation algorithms, for which stability of the system is equivalent to convergence at rate O(1/n^2), with n the number of iterations. We provide a detailed analysis of the eigenvalues of the corresponding linear dynamical system, showing various oscillatory and non-oscillatory behaviors, together with a sharp stability result with explicit constants. We also consider the situation where the gradients are noisy and extend our general convergence result to this setting; this suggests an alternative algorithm (i.e., with different step sizes) that exhibits the good aspects of both averaging and acceleration.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server
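To illustrate the reformulation in its simplest case, the Python sketch below writes heavy-ball and Nesterov's method with constant parameters on an assumed one-dimensional quadratic f(x) = h/2 * x^2 as a recursion x_{n+1} = a*x_n + b*x_{n-1}, and checks stability through the roots of the characteristic polynomial r^2 - a*r - b. This only illustrates the notion of stability for a constant-parameter second-order difference equation; the step-size and momentum values are arbitrary, the paper's own parameterization and its treatment of averaged gradient descent are not reproduced, and the function names are hypothetical.

```python
import cmath

# Illustrative stability check for constant-parameter second-order recursions (not the paper's code).


def char_roots(a, b):
    """Roots of r^2 - a*r - b = 0, the characteristic polynomial of x_{n+1} = a*x_n + b*x_{n-1}."""
    disc = cmath.sqrt(a * a + 4 * b)
    return (a + disc) / 2, (a - disc) / 2


def heavy_ball_coeffs(h, step, momentum):
    """Heavy-ball on f(x) = h/2 x^2: x_{n+1} = x_n - step*h*x_n + momentum*(x_n - x_{n-1})."""
    return 1 + momentum - step * h, -momentum


def nesterov_coeffs(h, step, momentum):
    """Constant-momentum Nesterov on f(x) = h/2 x^2:
    y_n = x_n + momentum*(x_n - x_{n-1});  x_{n+1} = y_n - step*h*y_n."""
    c = 1 - step * h
    return (1 + momentum) * c, -momentum * c


def is_stable(a, b):
    """Stable iff both characteristic roots lie strictly inside the unit disk."""
    return max(abs(r) for r in char_roots(a, b)) < 1


def run(a, b, x0=1.0, n=200):
    """Iterate the recursion from x_{-1} = x_0 = x0 and return the last iterate."""
    prev, cur = x0, x0
    for _ in range(n):
        prev, cur = cur, a * cur + b * prev
    return cur
```

For heavy-ball on this quadratic with 0 <= momentum < 1, the roots lie strictly inside the unit disk exactly when 0 < step*h < 2*(1 + momentum); for example, step = 4.0 with h = 1.0 and momentum = 0.5 violates this bound and the iterates blow up.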