    Statistically Preconditioned Accelerated Gradient Method for Distributed Optimization

    We consider the setting of distributed empirical risk minimization where multiple machines compute the gradients in parallel and a centralized server updates the model parameters. In order to reduce the number of communications required to reach a given accuracy, we propose a preconditioned accelerated gradient method where the preconditioning is done by solving a local optimization problem over a subsampled dataset at the server. The convergence rate of the method depends on the square root of the relative condition number between the global and local loss functions. We estimate the relative condition number for linear prediction models by studying uniform concentration of the Hessians over a bounded domain, which allows us to derive improved convergence rates for existing preconditioned gradient methods and our accelerated method. Experiments on real-world datasets illustrate the benefits of acceleration in the ill-conditioned regime.
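
    As a rough illustration of the preconditioning idea (not the full SPAG algorithm), the sketch below uses the Hessian of a regularized logistic loss on the server's subsample as a preconditioner for the aggregated gradient; `A_local`, `grad`, and `mu` are illustrative placeholders, and the actual method solves a local optimization problem with acceleration rather than a single linear system.

```python
import numpy as np

def local_hessian(A_local, x, mu):
    """Hessian of an l2-regularized logistic loss on the server's subsample."""
    z = A_local @ x
    s = 1.0 / (1.0 + np.exp(-z))   # per-sample sigmoid
    w = s * (1.0 - s)              # per-sample curvature weights
    n, d = A_local.shape
    return (A_local.T * w) @ A_local / n + mu * np.eye(d)

def preconditioned_step(x, grad, A_local, mu):
    """One preconditioned gradient step: move along H_local^{-1} grad."""
    H = local_hessian(A_local, x, mu)
    return x - np.linalg.solve(H, grad)
```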

    Hyperfast second-order local solvers for efficient statistically preconditioned distributed optimization

    Statistical preconditioning enables fast methods for distributed large-scale empirical risk minimization problems. In this approach, multiple worker nodes compute gradients in parallel, which are then used by the central node to update the parameters by solving an auxiliary (preconditioned) smaller-scale optimization problem. The recently proposed Statistically Preconditioned Accelerated Gradient (SPAG) method [1] has complexity bounds superior to other such algorithms but requires an exact solution of computationally intensive auxiliary optimization problems at every iteration. In this paper, we propose an Inexact SPAG (InSPAG) and explicitly characterize the accuracy to which the corresponding auxiliary subproblem needs to be solved to guarantee the same convergence rate as the exact method. We build our results by first developing an inexact adaptive accelerated Bregman proximal gradient method for general optimization problems under relative smoothness and strong convexity assumptions, which may be of independent interest. Moreover, we explore the properties of the auxiliary problem in the InSPAG algorithm assuming Lipschitz third-order derivatives and strong convexity. For this problem class, we develop a linearly convergent Hyperfast second-order method and estimate the total complexity of the InSPAG method with the hyperfast auxiliary problem solver. Finally, we illustrate the proposed method's practical efficiency by performing large-scale numerical experiments on logistic regression models. To the best of our knowledge, these are the first empirical results on implementing high-order methods on large-scale problems, as we work with data where the dimension is of the order of 3 million and the number of samples is 700 million.
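
    As a minimal sketch of the inexactness mechanism, assuming a plain first-order inner solver: the auxiliary subproblem is solved only until its gradient norm drops below a prescribed tolerance. The paper's InSPAG instead pairs this stopping rule with a Hyperfast third-order solver and a precise accuracy schedule; `grad_f`, `tol`, and `lr` are illustrative placeholders.

```python
import numpy as np

def solve_subproblem_inexactly(grad_f, x0, tol, lr=0.1, max_iter=1000):
    """Run gradient descent on the auxiliary objective until ||grad|| <= tol.

    grad_f: gradient oracle of the auxiliary (preconditioned) objective.
    tol: target accuracy; in inexact schemes like InSPAG it is chosen so
         that the outer method keeps the exact method's convergence rate.
    """
    x = x0.copy()
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= tol:
            break
        x -= lr * g
    return x
```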

    Fast minimum variance wavefront reconstruction for extremely large telescopes

    We present a new algorithm, FRiM (FRactal Iterative Method), aiming at the reconstruction of the optical wavefront from measurements provided by a wavefront sensor. As our application is adaptive optics on extremely large telescopes, our algorithm was designed with both speed and quality in mind. The latter is achieved thanks to a regularization which enforces prior statistics. To solve the regularized problem, we use the conjugate gradient method, which takes advantage of the sparsity of the wavefront sensor model matrix and avoids the storage and inversion of a huge matrix. The prior covariance matrix is, however, non-sparse, and we derive a fractal approximation to the Karhunen-Loève basis thanks to which the regularization by Kolmogorov statistics can be computed in O(N) operations, N being the number of phase samples to estimate. Finally, we propose an effective preconditioning which also scales as O(N) and yields the solution in 5-10 conjugate gradient iterations for any N. The resulting algorithm is therefore O(N). As an example, for a 128 x 128 Shack-Hartmann wavefront sensor, FRiM appears to be more than 100 times faster than the classical vector-matrix multiplication method.
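
    For concreteness, here is a generic matrix-free preconditioned conjugate gradient loop of the kind the abstract describes; the sparse wavefront-sensor model plus prior term and the O(N) fractal preconditioner of FRiM are abstracted behind the `apply_A` and `apply_Minv` callables, which are assumptions here.

```python
import numpy as np

def pcg(apply_A, b, apply_Minv, tol=1e-6, max_iter=50):
    """Preconditioned conjugate gradient for A x = b, matrix-free.

    apply_A:    applies the (symmetric positive definite) system operator.
    apply_Minv: applies the preconditioner solve; in FRiM both cost O(N).
    """
    x = np.zeros_like(b)
    r = b - apply_A(x)
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = apply_A(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p   # update search direction
        rz = rz_new
    return x
```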

    First order algorithms in variational image processing

    Variational methods in imaging are nowadays developing towards a quite universal and flexible tool, allowing for highly successful approaches to tasks like denoising, deblurring, inpainting, segmentation, super-resolution, disparity, and optical flow estimation. The overall structure of such approaches is of the form $\mathcal{D}(Ku) + \alpha \mathcal{R}(u) \rightarrow \min_u$, where the functional $\mathcal{D}$ is a data fidelity term, depending on some input data $f$ and measuring the deviation of $Ku$ from it, and $\mathcal{R}$ is a regularization functional. Moreover, $K$ is an (often linear) forward operator modeling the dependence of the data on an underlying image, and $\alpha$ is a positive regularization parameter. While $\mathcal{D}$ is often smooth and (strictly) convex, current practice almost exclusively uses nonsmooth regularization functionals. The majority of successful techniques use nonsmooth and convex functionals like the total variation and generalizations thereof, or $\ell_1$-norms of coefficients arising from scalar products with some frame system. The efficient solution of such variational problems in imaging demands appropriate algorithms. Taking into account the specific structure as a sum of two very different terms to be minimized, splitting algorithms are a quite canonical choice. Consequently, this field has revived the interest in techniques like operator splittings or augmented Lagrangians. Here we provide an overview of methods currently developed and recent results, as well as some computational studies comparing different methods and illustrating their success in applications.
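
    As a small worked instance of this splitting structure, the sketch below applies forward-backward splitting (ISTA) to the case where $\mathcal{D}$ is a least-squares data term and $\mathcal{R}$ is the $\ell_1$-norm, whose proximal map is soft-thresholding; total variation would require a different proximal step. All names are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(K, f, alpha, step, n_iter=200):
    """Forward-backward splitting for 0.5 * ||K u - f||^2 + alpha * ||u||_1.

    step should satisfy step <= 1 / sigma_max(K)**2 for convergence.
    """
    u = np.zeros(K.shape[1])
    for _ in range(n_iter):
        grad = K.T @ (K @ u - f)                      # gradient of the smooth term
        u = soft_threshold(u - step * grad, step * alpha)  # prox of the nonsmooth term
    return u
```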

    Communication-Efficient Distributed Optimization with Quantized Preconditioners

    We investigate fast and communication-efficient algorithms for the classic problem of minimizing a sum of strongly convex and smooth functions that are distributed among $n$ different nodes, which can communicate using a limited number of bits. Most previous communication-efficient approaches for this problem are limited to first-order optimization, and therefore have linear dependence on the condition number in their communication complexity. We show that this dependence is not inherent: communication-efficient methods can in fact have sublinear dependence on the condition number. For this, we design and analyze the first communication-efficient distributed variants of preconditioned gradient descent for Generalized Linear Models, and for Newton's method. Our results rely on a new technique for quantizing both the preconditioner and the descent direction at each step of the algorithms, while controlling their convergence rate. We also validate our findings experimentally, showing fast convergence and reduced communication.
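
    As a minimal sketch of the kind of unbiased stochastic quantizer such methods build on, assuming simple uniform levels per entry; the paper's scheme for quantizing preconditioners and descent directions is more elaborate, and `bits` here is a placeholder parameter.

```python
import numpy as np

def quantize(v, bits=4, rng=None):
    """Unbiased stochastic uniform quantization to `bits` bits per entry."""
    rng = np.random.default_rng() if rng is None else rng
    levels = 2 ** bits - 1
    scale = np.linalg.norm(v, np.inf)
    if scale == 0:
        return v
    y = np.abs(v) / scale * levels          # map magnitudes to [0, levels]
    low = np.floor(y)
    prob = y - low                          # round up with this probability
    q = low + (rng.random(v.shape) < prob)  # stochastic rounding: E[q] = y
    return np.sign(v) * q * scale / levels
```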