    Statistically Preconditioned Accelerated Gradient Method for Distributed Optimization

    We consider the setting of distributed empirical risk minimization where multiple machines compute the gradients in parallel and a centralized server updates the model parameters. In order to reduce the number of communications required to reach a given accuracy, we propose a preconditioned accelerated gradient method where the preconditioning is done by solving a local optimization problem over a subsampled dataset at the server. The convergence rate of the method depends on the square root of the relative condition number between the global and local loss functions. We estimate the relative condition number for linear prediction models by studying uniform concentration of the Hessians over a bounded domain, which allows us to derive improved convergence rates for existing preconditioned gradient methods and our accelerated method. Experiments on real-world datasets illustrate the benefits of acceleration in the ill-conditioned regime.
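
    As a rough illustration of the preconditioning idea (not the full SPAG algorithm), the sketch below uses the Hessian of a regularized logistic loss on the server's subsample as a preconditioner for the aggregated gradient; `A_local`, `grad`, and `mu` are illustrative placeholders, and the actual method solves a local optimization problem with acceleration rather than a single linear system.

```python
import numpy as np

def local_hessian(A_local, x, mu):
    """Hessian of an l2-regularized logistic loss on the server's subsample."""
    z = A_local @ x
    s = 1.0 / (1.0 + np.exp(-z))   # per-sample sigmoid
    w = s * (1.0 - s)              # per-sample curvature weights
    n, d = A_local.shape
    return (A_local.T * w) @ A_local / n + mu * np.eye(d)

def preconditioned_step(x, grad, A_local, mu):
    """One preconditioned gradient step: move along H_local^{-1} grad."""
    H = local_hessian(A_local, x, mu)
    return x - np.linalg.solve(H, grad)
```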

    Hyperfast second-order local solvers for efficient statistically preconditioned distributed optimization

    Statistical preconditioning enables fast methods for distributed large-scale empirical risk minimization problems. In this approach, multiple worker nodes compute gradients in parallel, which are then used by the central node to update the parameters by solving an auxiliary (preconditioned) smaller-scale optimization problem. The recently proposed Statistically Preconditioned Accelerated Gradient (SPAG) method [1] has complexity bounds superior to other such algorithms but requires an exact solution of computationally intensive auxiliary optimization problems at every iteration. In this paper, we propose an Inexact SPAG (InSPAG) and explicitly characterize the accuracy to which the corresponding auxiliary subproblem needs to be solved to guarantee the same convergence rate as the exact method. We build our results by first developing an inexact adaptive accelerated Bregman proximal gradient method for general optimization problems under relative smoothness and strong convexity assumptions, which may be of independent interest. Moreover, we explore the properties of the auxiliary problem in the InSPAG algorithm assuming Lipschitz third-order derivatives and strong convexity. For this problem class, we develop a linearly convergent Hyperfast second-order method and estimate the total complexity of the InSPAG method with the hyperfast auxiliary problem solver. Finally, we illustrate the proposed method's practical efficiency by performing large-scale numerical experiments on logistic regression models. To the best of our knowledge, these are the first empirical results on implementing high-order methods on large-scale problems, as we work with data where the dimension is of the order of 3 million and the number of samples is 700 million.
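
    As a minimal sketch of the inexactness mechanism, assuming a plain first-order inner solver: the auxiliary subproblem is solved only until its gradient norm drops below a prescribed tolerance. The paper's InSPAG instead pairs this stopping rule with a Hyperfast third-order solver and a precise accuracy schedule; `grad_f`, `tol`, and `lr` are illustrative placeholders.

```python
import numpy as np

def solve_subproblem_inexactly(grad_f, x0, tol, lr=0.1, max_iter=1000):
    """Run gradient descent on the auxiliary objective until ||grad|| <= tol.

    grad_f: gradient oracle of the auxiliary (preconditioned) objective.
    tol: target accuracy; in inexact schemes like InSPAG it is chosen so
         that the outer method keeps the exact method's convergence rate.
    """
    x = x0.copy()
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= tol:
            break
        x -= lr * g
    return x
```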

    Fast minimum variance wavefront reconstruction for extremely large telescopes

    We present a new algorithm, FRiM (FRactal Iterative Method), aiming at the reconstruction of the optical wavefront from measurements provided by a wavefront sensor. As our application is adaptive optics on extremely large telescopes, our algorithm was designed with both speed and quality in mind. The latter is achieved thanks to a regularization which enforces prior statistics. To solve the regularized problem, we use the conjugate gradient method, which takes advantage of the sparsity of the wavefront sensor model matrix and avoids the storage and inversion of a huge matrix. The prior covariance matrix is, however, non-sparse, and we derive a fractal approximation to the Karhunen-Loève basis thanks to which the regularization by Kolmogorov statistics can be computed in O(N) operations, N being the number of phase samples to estimate. Finally, we propose an effective preconditioning which also scales as O(N) and yields the solution in 5-10 conjugate gradient iterations for any N. The resulting algorithm is therefore O(N). As an example, for a 128 x 128 Shack-Hartmann wavefront sensor, FRiM appears to be more than 100 times faster than the classical vector-matrix multiplication method.
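
    For concreteness, here is a generic matrix-free preconditioned conjugate gradient loop of the kind the abstract describes; the sparse wavefront-sensor model plus prior term and the O(N) fractal preconditioner of FRiM are abstracted behind the `apply_A` and `apply_Minv` callables, which are assumptions here.

```python
import numpy as np

def pcg(apply_A, b, apply_Minv, tol=1e-6, max_iter=50):
    """Preconditioned conjugate gradient for A x = b, matrix-free.

    apply_A:    applies the (symmetric positive definite) system operator.
    apply_Minv: applies the preconditioner solve; in FRiM both cost O(N).
    """
    x = np.zeros_like(b)
    r = b - apply_A(x)
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = apply_A(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p   # update search direction
        rz = rz_new
    return x
```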

    First order algorithms in variational image processing

    Variational methods in imaging are nowadays developing towards a quite universal and flexible tool, allowing for highly successful approaches to tasks like denoising, deblurring, inpainting, segmentation, super-resolution, disparity, and optical flow estimation. The overall structure of such approaches is of the form $\mathcal{D}(Ku) + \alpha \mathcal{R}(u) \rightarrow \min_u$, where the functional $\mathcal{D}$ is a data fidelity term, depending on some input data $f$ and measuring the deviation of $Ku$ from it, and $\mathcal{R}$ is a regularization functional. Moreover, $K$ is an (often linear) forward operator modeling the dependence of the data on an underlying image, and $\alpha$ is a positive regularization parameter. While $\mathcal{D}$ is often smooth and (strictly) convex, current practice almost exclusively uses nonsmooth regularization functionals. The majority of successful techniques use nonsmooth and convex functionals like the total variation and generalizations thereof, or $\ell_1$-norms of coefficients arising from scalar products with some frame system. The efficient solution of such variational problems in imaging demands appropriate algorithms. Taking into account the specific structure as a sum of two very different terms to be minimized, splitting algorithms are a quite canonical choice. Consequently, this field has revived the interest in techniques like operator splittings or augmented Lagrangians. Here we provide an overview of methods currently developed and recent results, as well as some computational studies comparing different methods and illustrating their success in applications.
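
    As a small worked instance of this splitting structure, the sketch below applies forward-backward splitting (ISTA) to the case where $\mathcal{D}$ is a least-squares data term and $\mathcal{R}$ is the $\ell_1$-norm, whose proximal map is soft-thresholding; total variation would require a different proximal step. All names are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(K, f, alpha, step, n_iter=200):
    """Forward-backward splitting for 0.5 * ||K u - f||^2 + alpha * ||u||_1.

    step should satisfy step <= 1 / sigma_max(K)**2 for convergence.
    """
    u = np.zeros(K.shape[1])
    for _ in range(n_iter):
        grad = K.T @ (K @ u - f)                      # gradient of the smooth term
        u = soft_threshold(u - step * grad, step * alpha)  # prox of the nonsmooth term
    return u
```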

    Communication-Efficient Distributed Optimization with Quantized Preconditioners

    We investigate fast and communication-efficient algorithms for the classic problem of minimizing a sum of strongly convex and smooth functions that are distributed among $n$ different nodes, which can communicate using a limited number of bits. Most previous communication-efficient approaches for this problem are limited to first-order optimization, and therefore have linear dependence on the condition number in their communication complexity. We show that this dependence is not inherent: communication-efficient methods can in fact have sublinear dependence on the condition number. For this, we design and analyze the first communication-efficient distributed variants of preconditioned gradient descent for Generalized Linear Models, and for Newton's method. Our results rely on a new technique for quantizing both the preconditioner and the descent direction at each step of the algorithms, while controlling their convergence rate. We also validate our findings experimentally, showing fast convergence and reduced communication.
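
    As a minimal sketch of the kind of unbiased stochastic quantizer such methods build on, assuming simple uniform levels per entry; the paper's scheme for quantizing preconditioners and descent directions is more elaborate, and `bits` here is a placeholder parameter.

```python
import numpy as np

def quantize(v, bits=4, rng=None):
    """Unbiased stochastic uniform quantization to `bits` bits per entry."""
    rng = np.random.default_rng() if rng is None else rng
    levels = 2 ** bits - 1
    scale = np.linalg.norm(v, np.inf)
    if scale == 0:
        return v
    y = np.abs(v) / scale * levels          # map magnitudes to [0, levels]
    low = np.floor(y)
    prob = y - low                          # round up with this probability
    q = low + (rng.random(v.shape) < prob)  # stochastic rounding: E[q] = y
    return np.sign(v) * q * scale / levels
```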