948 research outputs found
Statistically Preconditioned Accelerated Gradient Method for Distributed Optimization
We consider the setting of distributed empirical risk minimization where multiple machines compute the gradients in parallel and a centralized server updates the model parameters. In order to reduce the number of communications required to reach a given accuracy, we propose a preconditioned accelerated gradient method where the preconditioning is done by solving a local optimization problem over a subsampled dataset at the server. The convergence rate of the method depends on the square root of the relative condition number between the global and local loss functions. We estimate the relative condition number for linear prediction models by studying uniform concentration of the Hessians over a bounded domain, which allows us to derive improved convergence rates for existing preconditioned gradient methods and our accelerated method. Experiments on real-world datasets illustrate the benefits of acceleration in the ill-conditioned regime.
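The statistical-preconditioning mechanism described in this abstract can be illustrated on a toy quadratic problem. This is a hedged sketch of the general idea, not the authors' SPAG algorithm: there is no acceleration, and the damped step size, problem sizes, and random seed are all arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Global quadratic loss F(x) = 0.5 x^T A x - b^T x over n samples; the server
# builds a preconditioner from the Hessian of a subsampled local loss.
n, d, n_loc = 2000, 20, 200
Z = rng.normal(size=(n, d)) * np.linspace(0.3, 3.0, d)  # ill-conditioned features
A = Z.T @ Z / n                          # global Hessian
A_loc = Z[:n_loc].T @ Z[:n_loc] / n_loc  # local (subsampled) Hessian
b = rng.normal(size=d)
x_star = np.linalg.solve(A, b)

# Preconditioned gradient step x <- x - eta * A_loc^{-1} (A x - b). The damped
# step eta < 1 guards against subsampling noise in A_loc; convergence is
# governed by how well A_loc approximates A (the relative condition number).
x = np.zeros(d)
eta = 0.5
for _ in range(150):
    x = x - eta * np.linalg.solve(A_loc, A @ x - b)
```

With only 10% of the data in the preconditioner, the iterates still contract at a rate set by the relative conditioning rather than by the (much worse) condition number of A itself.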
Hyperfast second-order local solvers for efficient statistically preconditioned distributed optimization
Statistical preconditioning enables fast methods for distributed large-scale empirical risk minimization problems. In this approach, multiple worker nodes compute gradients in parallel, which are then used by the central node to update the parameters by solving an auxiliary (preconditioned) smaller-scale optimization problem. The recently proposed Statistically Preconditioned Accelerated Gradient (SPAG) method [1] has complexity bounds superior to other such algorithms but requires an exact solution of a computationally intensive auxiliary optimization problem at every iteration. In this paper, we propose an Inexact SPAG (InSPAG) and explicitly characterize the accuracy to which the corresponding auxiliary subproblem needs to be solved to guarantee the same convergence rate as the exact method. We build our results by first developing an inexact adaptive accelerated Bregman proximal gradient method for general optimization problems under relative smoothness and strong convexity assumptions, which may be of independent interest. Moreover, we explore the properties of the auxiliary problem in the InSPAG algorithm assuming Lipschitz third-order derivatives and strong convexity. For this problem class, we develop a linearly convergent Hyperfast second-order method and estimate the total complexity of the InSPAG method with the hyperfast auxiliary problem solver. Finally, we illustrate the proposed method's practical efficiency by performing large-scale numerical experiments on logistic regression models. To the best of our knowledge, these are the first empirical results on implementing high-order methods on large-scale problems, as we work with data where the dimension is of the order of 3 million and the number of samples is 700 million.
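The inexactness idea can be sketched with a generic inner solver. Assuming a quadratic auxiliary problem (the paper's subproblem and its hyperfast solver are more general), a conjugate-gradient stand-in run only to a fixed relative tolerance already yields a usable update direction; all names and tolerances here are illustrative.

```python
import numpy as np

def cg(A, b, rel_tol=1e-3, max_iter=200):
    """Conjugate gradient for SPD A, stopped at a relative residual tolerance.

    Used here as a stand-in inexact solver for the auxiliary subproblem."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rs) <= rel_tol * b_norm:  # inexactness criterion
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Inexact solve of the auxiliary system A z = g.
rng = np.random.default_rng(0)
M = rng.normal(size=(30, 30))
A = M @ M.T + np.eye(30)  # SPD
g = rng.normal(size=30)
z = cg(A, g, rel_tol=1e-6)
```

The point of the analysis in the abstract is precisely to say how small `rel_tol` must be so that the outer method keeps the exact method's rate.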
Fast minimum variance wavefront reconstruction for extremely large telescopes
We present a new algorithm, FRiM (FRactal Iterative Method), aiming at the
reconstruction of the optical wavefront from measurements provided by a
wavefront sensor. As our application is adaptive optics on extremely large
telescopes, our algorithm was designed with speed and best quality in mind. The
latter is achieved thanks to a regularization which enforces prior statistics.
To solve the regularized problem, we use the conjugate gradient method which
takes advantage of the sparsity of the wavefront sensor model matrix and avoids
the storage and inversion of a huge matrix. The prior covariance matrix is
however non-sparse and we derive a fractal approximation to the Karhunen-Loeve
basis thanks to which the regularization by Kolmogorov statistics can be
computed in O(N) operations, N being the number of phase samples to estimate.
Finally, we propose an effective preconditioning which also scales as O(N) and
yields the solution in 5-10 conjugate gradient iterations for any N. The
resulting algorithm is therefore O(N). As an example, for a 128 x 128
Shack-Hartmann wavefront sensor, FRiM appears to be more than 100 times faster
than the classical vector-matrix multiplication method.
Comment: to appear in the Journal of the Optical Society of America
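The minimum-variance reconstruction underlying such solvers can be written in two equivalent forms, which a tiny dense sketch can verify. All names and sizes are illustrative, and a diagonal prior stands in for the fractal Karhunen-Loeve approximation used in FRiM.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 15, 10                           # measurements, phase samples (toy sizes)
S = rng.normal(size=(m, n))             # wavefront-sensor model matrix (stand-in)
C = np.diag(np.linspace(0.5, 2.0, n))   # prior covariance (diagonal stand-in)
sigma2 = 0.1                            # measurement noise variance
d_meas = rng.normal(size=m)

# Regularized normal equations: (S^T S / sigma2 + C^{-1}) w = S^T d / sigma2.
w1 = np.linalg.solve(S.T @ S / sigma2 + np.linalg.inv(C), S.T @ d_meas / sigma2)

# Equivalent "Wiener" form via the matrix inversion lemma:
# w = C S^T (S C S^T + sigma2 I)^{-1} d.
w2 = C @ S.T @ np.linalg.solve(S @ C @ S.T + sigma2 * np.eye(m), d_meas)
```

At telescope scale neither matrix is ever formed; the abstract's point is that the same solution is reached with matrix-vector products and an O(N) prior, here replaced by dense solves purely for checkability.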
First order algorithms in variational image processing
Variational methods in imaging are nowadays developing towards a quite
universal and flexible tool, allowing for highly successful approaches on tasks
like denoising, deblurring, inpainting, segmentation, super-resolution,
disparity, and optical flow estimation. The overall structure of such
approaches is of the form $\mathcal{D}(Ku) + \alpha \mathcal{R}(u) \to \min_u$, where
the functional $\mathcal{D}$ is a data fidelity term also depending on some input
data $f$ and measuring the deviation of $Ku$ from such, and $\mathcal{R}$ is a
regularization functional. Moreover, $K$ is a (often linear) forward operator
modeling the dependence of data on an underlying image, and $\alpha$ is a positive
regularization parameter. While $\mathcal{D}$ is often smooth and (strictly) convex,
the current practice almost exclusively uses nonsmooth regularization functionals.
The majority of successful techniques is using nonsmooth and convex functionals
like the total variation and generalizations thereof or $\ell_1$-norms of
coefficients arising from scalar
products with some frame system. The efficient solution of such variational
problems in imaging demands for appropriate algorithms. Taking into account the
specific structure as a sum of two very different terms to be minimized,
splitting algorithms are a quite canonical choice. Consequently this field has
revived the interest in techniques like operator splittings or augmented
Lagrangians. Here we shall provide an overview of methods currently developed
and recent results as well as some computational studies providing a comparison
of different methods and also illustrating their success in applications.
Comment: 60 pages, 33 figures
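A canonical splitting scheme for this smooth-plus-nonsmooth structure is forward-backward splitting (proximal gradient, ISTA), sketched here for the $\ell_1$-regularized least-squares instance. The data and sizes are synthetic, and this is a generic sketch rather than any particular method from the survey.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (the "backward" half of the splitting).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(0)
K = rng.normal(size=(40, 80))           # forward operator (stand-in)
u_true = np.zeros(80)
u_true[[5, 23, 61]] = [1.5, -2.0, 0.8]  # sparse ground truth
f = K @ u_true                           # data
alpha = 0.1                              # regularization parameter
step = 1.0 / np.linalg.norm(K, 2) ** 2   # 1/L, L = Lipschitz const. of the gradient

# Forward (gradient) step on 0.5||Ku - f||^2, backward (prox) step on alpha||u||_1.
u = np.zeros(80)
for _ in range(3000):
    u = soft_threshold(u - step * (K.T @ (K @ u - f)), alpha * step)

obj = 0.5 * np.linalg.norm(K @ u - f) ** 2 + alpha * np.abs(u).sum()
```

Each iteration touches the two terms separately, which is exactly why splitting is the canonical choice for this sum structure: the smooth term contributes a gradient, the nonsmooth term a cheap proximal map.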
Communication-Efficient Distributed Optimization with Quantized Preconditioners
We investigate fast and communication-efficient algorithms for the classic
problem of minimizing a sum of strongly convex and smooth functions that are
distributed among different nodes, which can communicate using a limited
number of bits. Most previous communication-efficient approaches for this
problem are limited to first-order optimization, and therefore have
\emph{linear} dependence on the condition number in their communication
complexity. We show that this dependence is not inherent:
communication-efficient methods can in fact have sublinear dependence on the
condition number. For this, we design and analyze the first
communication-efficient distributed variants of preconditioned gradient descent
for Generalized Linear Models, and for Newton's method. Our results rely on a
new technique for quantizing both the preconditioner and the descent direction
at each step of the algorithms, while controlling their convergence rate. We
also validate our findings experimentally, showing fast convergence and reduced
communication.
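The quantization ingredient can be illustrated with a toy uniform quantizer (not the paper's scheme): rounding each coordinate to a grid with 2^b levels bounds the per-coordinate error by half a grid step, which is the kind of controlled-error property one needs to keep a handle on the convergence rate.

```python
import numpy as np

def quantize(v, bits, v_max):
    """Uniform quantizer: maps each coordinate of v (clipped to [-v_max, v_max])
    onto a grid with 2**bits levels, so each coordinate costs `bits` bits to send
    and the per-coordinate error is at most half a grid step."""
    levels = 2 ** bits
    step = 2.0 * v_max / (levels - 1)
    v = np.clip(v, -v_max, v_max)
    return np.round((v + v_max) / step) * step - v_max

rng = np.random.default_rng(0)
g = rng.normal(size=100)          # e.g. a descent direction to be communicated
q = quantize(g, bits=6, v_max=4.0)
step = 2.0 * 4.0 / (2 ** 6 - 1)   # grid spacing for the error bound
```

Quantizing a preconditioner (a matrix) rather than a vector, as in the abstract, additionally requires controlling how the entrywise error propagates through the solve; this sketch only shows the basic bounded-error primitive.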