15 research outputs found

    Chebyshev acceleration of iterative refinement

    Get PDF

    Optimal Algorithms for Non-Smooth Distributed Optimization in Networks

    Full text link
    In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm, called multi-step primal-dual (MSPD), and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS), based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.
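
    The DRS idea above rests on smoothing a non-smooth objective by convolving it with a Gaussian. Below is a minimal single-machine sketch of that randomized-smoothing building block (not the distributed DRS algorithm itself); the objective f, the subgradient oracle subgrad_f, the smoothing radius gamma, and the sample count m are illustrative assumptions.

        import numpy as np

        def smoothed_grad(subgrad_f, x, gamma=0.1, m=20, rng=None):
            """Monte-Carlo gradient estimate of the smoothed function
            f_gamma(x) = E[f(x + gamma * z)], z ~ N(0, I):
            average subgradients of f at randomly perturbed points."""
            rng = np.random.default_rng() if rng is None else rng
            d = x.shape[0]
            grads = [subgrad_f(x + gamma * rng.standard_normal(d)) for _ in range(m)]
            return np.mean(grads, axis=0)

        # Toy example: f(x) = ||x||_1 (non-smooth), with subgradient sign(x).
        f = lambda x: np.abs(x).sum()
        subgrad_f = lambda x: np.sign(x)

        x = np.array([1.0, -2.0, 0.5])
        for _ in range(200):                       # subgradient descent on the smoothed surrogate
            x -= 0.05 * smoothed_grad(subgrad_f, x)
        print(f(x))                                # roughly 0, the minimum of ||x||_1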

    On the Complexity of Finite-Sum Smooth Optimization under the Polyak-Łojasiewicz Condition

    Full text link
    This paper considers the optimization problem of the form $\min_{{\bf x}\in{\mathbb R}^d} f({\bf x})\triangleq \frac{1}{n}\sum_{i=1}^n f_i({\bf x})$, where $f(\cdot)$ satisfies the Polyak-Łojasiewicz (PL) condition with parameter $\mu$ and $\{f_i(\cdot)\}_{i=1}^n$ is $L$-mean-squared smooth. We show that any gradient method requires at least $\Omega(n+\kappa\sqrt{n}\log(1/\epsilon))$ incremental first-order oracle (IFO) calls to find an $\epsilon$-suboptimal solution, where $\kappa\triangleq L/\mu$ is the condition number of the problem. This result nearly matches the upper bounds on IFO complexity of the best-known first-order methods. We also study the problem of minimizing the PL function in the distributed setting, in which the individual functions $f_1(\cdot),\dots,f_n(\cdot)$ are located on a connected network of $n$ agents. We provide lower bounds of $\Omega(\kappa/\sqrt{\gamma}\,\log(1/\epsilon))$, $\Omega((\kappa+\tau\kappa/\sqrt{\gamma}\,)\log(1/\epsilon))$ and $\Omega(n+\kappa\sqrt{n}\log(1/\epsilon))$ for communication rounds, time cost and local first-order oracle calls respectively, where $\gamma\in(0,1]$ is the spectral gap of the mixing matrix associated with the network and $\tau>0$ is the time cost per communication round. Furthermore, we propose a decentralized first-order method that nearly matches the above lower bounds in expectation.
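
    As a concrete illustration of the quantities in this abstract, the sketch below runs full gradient descent on a finite-sum least-squares problem (which satisfies the PL condition) and counts IFO calls, one per component gradient evaluation. The problem instance, step size and tolerance are illustrative assumptions, not the paper's method.

        import numpy as np

        rng = np.random.default_rng(0)
        n, d = 50, 5
        A = rng.standard_normal((n, d))
        b = rng.standard_normal(n)

        # f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2 satisfies the PL condition.
        def grad_fi(x, i):                          # one IFO call
            return (A[i] @ x - b[i]) * A[i]

        L = np.linalg.eigvalsh(A.T @ A / n).max()   # smoothness constant of f
        x = np.zeros(d)
        x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
        f = lambda x: 0.5 * np.mean((A @ x - b) ** 2)

        ifo_calls = 0
        while f(x) - f(x_star) > 1e-6:
            g = np.mean([grad_fi(x, i) for i in range(n)], axis=0)   # n IFO calls per step
            ifo_calls += n
            x -= g / L                              # step size 1/L gives a linear rate under PL
        print(ifo_calls, f(x) - f(x_star))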

    Accelerated Gossip in Networks of Given Dimension using Jacobi Polynomial Iterations

    Get PDF
    Consider a network of agents connected by communication links, where each agent holds a real value. The gossip problem consists in estimating the average of the values diffused in the network in a distributed manner. We develop a method solving the gossip problem that depends only on the spectral dimension of the network, that is, in the communication network set-up, the dimension of the space in which the agents live. This contrasts with previous work that required the spectral gap of the network as a parameter, or suffered from slow mixing. Our method shows an important improvement over existing algorithms in the non-asymptotic regime, i.e., when the values are far from being fully mixed in the network. Our approach stems from a polynomial-based point of view on gossip algorithms, as well as an approximation of the spectral measure of the graphs with a Jacobi measure. We show the power of the approach with simulations on various graphs, and with performance guarantees on graphs of known spectral dimension, such as grids and random percolation bonds. An extension of this work to distributed Laplacian solvers is discussed. As a side result, we also use the polynomial-based point of view to show the convergence of the message passing algorithm for gossip of Moallemi & Van Roy on regular graphs. The explicit computation of the rate of convergence shows that message passing has a slow rate of convergence on graphs with small spectral gap.
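
    For context, the snippet below implements the plain synchronous gossip baseline x_{t+1} = W x_t on a cycle graph, against which polynomial-accelerated schemes such as the Jacobi iterations above are compared. It is a sketch of standard gossip averaging, not the paper's Jacobi method, and the graph, weights and iteration count are illustrative assumptions.

        import numpy as np

        def cycle_gossip_matrix(n):
            """Symmetric, doubly stochastic gossip matrix for a cycle of n agents:
            each agent averages its value with its two neighbours."""
            W = np.zeros((n, n))
            for i in range(n):
                W[i, i] = 1 / 3
                W[i, (i - 1) % n] = 1 / 3
                W[i, (i + 1) % n] = 1 / 3
            return W

        n = 30
        rng = np.random.default_rng(1)
        x = rng.standard_normal(n)                  # each agent holds one real value
        W = cycle_gossip_matrix(n)
        target = x.mean()                           # the quantity gossip tries to estimate

        for t in range(500):
            x = W @ x                               # one synchronous round of local averaging
        print(np.max(np.abs(x - target)))           # error shrinks at a rate set by the spectral gap of W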

    Optimal Accelerated Variance Reduced EXTRA and DIGing for Strongly Convex and Smooth Decentralized Optimization

    Full text link
    We study stochastic decentralized optimization for the problem of training machine learning models with large-scale distributed data. We extend the famous EXTRA and DIGing methods with accelerated variance reduction (VR), and propose two methods, which require $O((\sqrt{n\kappa_s}+n)\log\frac{1}{\epsilon})$ stochastic gradient evaluations and $O(\sqrt{\kappa_b\kappa_c}\log\frac{1}{\epsilon})$ communication rounds to reach precision $\epsilon$, where $\kappa_s$ and $\kappa_b$ are the stochastic condition number and batch condition number for strongly convex and smooth problems, $\kappa_c$ is the condition number of the communication network, and $n$ is the sample size on each distributed node. Our stochastic gradient computation complexity is the same as that of single-machine accelerated variance reduction methods, such as Katyusha, and our communication complexity is the same as that of accelerated full-batch decentralized methods, such as MSDA; both are optimal. We also propose the non-accelerated VR based EXTRA and DIGing, and provide explicit complexities, for example, the $O((\kappa_s+n)\log\frac{1}{\epsilon})$ stochastic gradient computation complexity and the $O((\kappa_b+\kappa_c)\log\frac{1}{\epsilon})$ communication complexity for the VR based EXTRA. These two complexities are likewise the same as those of single-machine VR methods, such as SAG, SAGA, and SVRG, and of non-accelerated full-batch decentralized methods, such as EXTRA, respectively.
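
    The variance-reduction building block referenced above (the SAG/SAGA/SVRG family) can be illustrated on a single machine. Below is a minimal SVRG sketch for a finite-sum least-squares problem; it shows the variance-reduced gradient $\nabla f_i(x) - \nabla f_i(\tilde{x}) + \nabla f(\tilde{x})$, but it is only an illustrative sketch, not the decentralized VR-EXTRA/DIGing methods of the paper, and the problem instance and step size are assumptions.

        import numpy as np

        rng = np.random.default_rng(0)
        n, d = 200, 10
        A = rng.standard_normal((n, d))
        b = A @ rng.standard_normal(d) + 0.01 * rng.standard_normal(n)

        grad_i = lambda x, i: (A[i] @ x - b[i]) * A[i]      # gradient of the i-th component
        full_grad = lambda x: A.T @ (A @ x - b) / n
        L = np.linalg.eigvalsh(A.T @ A / n).max()
        eta = 0.1 / L                                       # conservative step size

        x = np.zeros(d)
        for epoch in range(30):
            x_snap = x.copy()                               # snapshot point
            mu = full_grad(x_snap)                          # full gradient at the snapshot
            for _ in range(2 * n):                          # inner stochastic loop
                i = rng.integers(n)
                # variance-reduced gradient: unbiased, with variance shrinking near the optimum
                g = grad_i(x, i) - grad_i(x_snap, i) + mu
                x -= eta * g
        print(np.linalg.norm(full_grad(x)))                 # small after a few epochs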

    Optimal algorithms for smooth and strongly convex distributed optimization in networks

    Get PDF
    In this paper, we determine the optimal convergence rates for strongly convex and smooth distributed optimization in two settings: centralized and decentralized communications over a network. For centralized (i.e. master/slave) algorithms, we show that distributing Nesterov's accelerated gradient descent is optimal and achieves a precision $\varepsilon > 0$ in time $O(\sqrt{\kappa_g}(1+\Delta\tau)\ln(1/\varepsilon))$, where $\kappa_g$ is the condition number of the (global) function to optimize, $\Delta$ is the diameter of the network, and $\tau$ (resp. $1$) is the time needed to communicate values between two neighbors (resp. perform local computations). For decentralized algorithms based on gossip, we provide the first optimal algorithm, called the multi-step dual accelerated (MSDA) method, that achieves a precision $\varepsilon > 0$ in time $O(\sqrt{\kappa_l}(1+\frac{\tau}{\sqrt{\gamma}})\ln(1/\varepsilon))$, where $\kappa_l$ is the condition number of the local functions and $\gamma$ is the (normalized) eigengap of the gossip matrix used for communication between nodes. We then verify the efficiency of MSDA against state-of-the-art methods for two problems: least-squares regression and classification by logistic regression.
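
    The centralized result above boils down to running Nesterov's accelerated gradient descent while paying a communication cost per iteration. The snippet below is a minimal single-machine sketch of that accelerated iteration for a smooth, strongly convex quadratic; the problem instance and constants are illustrative assumptions, and the distributed master/slave and MSDA variants are not reproduced here.

        import numpy as np

        rng = np.random.default_rng(0)
        d = 20
        M = rng.standard_normal((d, d))
        H = M.T @ M + 0.1 * np.eye(d)            # strongly convex quadratic f(x) = 0.5 x^T H x
        grad = lambda x: H @ x

        eigs = np.linalg.eigvalsh(H)
        L, mu = eigs.max(), eigs.min()
        kappa = L / mu                           # condition number; iterations scale with sqrt(kappa)
        beta = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)

        x = np.ones(d)
        y = x.copy()
        for k in range(500):
            x_next = y - grad(y) / L             # gradient step from the extrapolated point
            y = x_next + beta * (x_next - x)     # Nesterov momentum (strongly convex form)
            x = x_next
        print(np.linalg.norm(x))                 # minimizer is 0; error decays geometrically, exponent ~ 1/sqrt(kappa)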