
    Asynchrony and Acceleration in Gossip Algorithms

    This paper considers the minimization of a sum of smooth and strongly convex functions dispatched over the nodes of a communication network. Previous works on the subject either focus on synchronous algorithms, which can be heavily slowed down by a few slow nodes (the straggler problem), or consider a model of asynchronous operation (Boyd et al., 2006) in which adjacent nodes communicate at the instants of Poisson point processes. We have two main contributions. 1) We propose CACDM (a Continuously Accelerated Coordinate Dual Method), and for the Poisson model of asynchronous operation, we prove that CACDM converges to optimality at an accelerated rate in the sense of Nesterov and Stich (2017). In contrast, previously proposed asynchronous algorithms have not been proven to achieve such an accelerated rate. While CACDM is based on discrete updates, the proof of its convergence crucially relies on a continuous-time analysis. 2) We introduce a new communication scheme based on Loss-Networks that is programmable in a fully asynchronous and decentralized way, unlike the Poisson model of asynchronous operation, which does not capture essential aspects of asynchrony such as non-instantaneous communications and computations. Under this Loss-Network model of asynchrony, we establish for CDM (a Coordinate Dual Method) a rate of convergence in terms of the eigengap of the Laplacian of the graph weighted by local effective delays. We believe this eigengap to be a fundamental bottleneck for convergence rates of asynchronous optimization. Finally, we verify empirically that CACDM enjoys an accelerated convergence rate in the Loss-Network model of asynchrony.
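    As a hedged illustration of the quantity this abstract points to, the sketch below computes the spectral gap (second-smallest eigenvalue) of a weighted graph Laplacian; the graph and weights are placeholders standing in for delay-weighted links, not the paper's construction.

```python
# Illustrative sketch (not the paper's code): the spectral gap of a weighted
# graph Laplacian L = D - W, the kind of eigengap the abstract identifies as a
# bottleneck for asynchronous convergence rates.
import numpy as np

def laplacian_eigengap(n, weighted_edges):
    """Return the second-smallest eigenvalue of the weighted Laplacian of an
    undirected graph on n nodes (the smallest is 0 when the graph is connected)."""
    W = np.zeros((n, n))
    for i, j, w in weighted_edges:
        W[i, j] = W[j, i] = w
    L = np.diag(W.sum(axis=1)) - W
    return np.sort(np.linalg.eigvalsh(L))[1]

# Example: a 4-cycle where larger weights stand in for faster (lower-delay) links.
edges = [(0, 1, 1.0), (1, 2, 0.5), (2, 3, 1.0), (3, 0, 0.5)]
print(laplacian_eigengap(4, edges))
```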

    A Coordinate Descent Primal-Dual Algorithm and Application to Distributed Asynchronous Optimization

    Based on the idea of randomized coordinate descent of α-averaged operators, a randomized primal-dual optimization algorithm is introduced, where a random subset of coordinates is updated at each iteration. The algorithm builds upon a variant of a recent (deterministic) algorithm proposed by Vũ and Condat that includes the well-known ADMM as a particular case. The obtained algorithm is used to solve a distributed optimization problem asynchronously. A network of agents, each having a separate cost function containing a differentiable term, seeks to find a consensus on the minimum of the aggregate objective. The method yields an algorithm where at each iteration, a random subset of agents wake up, update their local estimates, exchange some data with their neighbors, and go idle. Numerical results demonstrate the attractive performance of the method. The general approach can be naturally adapted to other situations where coordinate descent convex optimization algorithms are used with a random choice of the coordinates.
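    As a rough illustration of the asynchronous pattern described above (a random subset of agents wakes up, updates locally, exchanges data with neighbours, and goes idle), and not of the primal-dual algorithm itself, here is a minimal sketch with placeholder quadratic costs on a ring network.

```python
# Minimal sketch of the asynchronous wake-up pattern, not the Vu-Condat-based
# primal-dual method. Each agent i holds a placeholder cost f_i(x) = 0.5*(x - b_i)^2,
# so the consensus minimizer of the aggregate objective is mean(b).
import numpy as np

rng = np.random.default_rng(0)
n_agents, step = 5, 0.1
b = rng.normal(size=n_agents)                 # private data (illustrative)
x = np.zeros(n_agents)                        # local estimates
neighbours = {i: [(i - 1) % n_agents, (i + 1) % n_agents] for i in range(n_agents)}  # ring

for _ in range(2000):
    awake = rng.choice(n_agents, size=2, replace=False)   # a random subset wakes up
    for i in awake:
        x[i] -= step * (x[i] - b[i])                       # local gradient step
        x[i] = 0.5 * (x[i] + np.mean([x[j] for j in neighbours[i]]))  # exchange with neighbours
    # agents then go idle until their next activation

print(x, "target:", b.mean())   # estimates settle in a neighbourhood of mean(b)
```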

    Randomized iterative methods for linear systems: momentum, inexactness and gossip

    In the era of big data, one of the key challenges is the development of novel optimization algorithms that can accommodate vast amounts of data while at the same time satisfying the constraints and limitations of the problem under study. The need to solve optimization problems is ubiquitous in essentially all quantitative areas of human endeavour, including industry and science. In the last decade there has been a surge in demand from practitioners, in fields such as machine learning, computer vision, artificial intelligence, signal processing and data science, for new methods able to cope with these large-scale problems. In this thesis we focus on the design, complexity analysis and efficient implementation of such algorithms. In particular, we are interested in the development of randomized first-order iterative methods for solving large-scale linear systems, stochastic quadratic optimization problems and the distributed average consensus problem.
    In Chapter 2, we study several classes of stochastic optimization algorithms enriched with heavy ball momentum. Among the methods studied are: stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic dual subspace ascent. This is the first time momentum variants of several of these methods are studied. We choose to perform our analysis in a setting in which all of the above methods are equivalent: convex quadratic problems. We prove global non-asymptotic linear convergence rates for all methods and various measures of success, including primal function values, primal iterates, and dual function values. We also show that the primal iterates converge at an accelerated linear rate in a somewhat weaker sense. This is the first time a linear rate is shown for the stochastic heavy ball method (i.e., stochastic gradient descent with momentum). Under somewhat weaker conditions, we establish a sublinear convergence rate for Cesàro averages of primal iterates. Moreover, we propose a novel concept, which we call stochastic momentum, aimed at decreasing the cost of performing the momentum step. We prove linear convergence of several stochastic methods with stochastic momentum, and show that in some sparse data regimes and for sufficiently small momentum parameters, these methods enjoy better overall complexity than methods with deterministic momentum. Finally, we perform extensive numerical testing on artificial and real datasets.
    In Chapter 3, we present a convergence rate analysis of inexact variants of stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic subspace ascent. A common feature of these methods is that in their update rule a certain sub-problem needs to be solved exactly. We relax this requirement by allowing the sub-problem to be solved inexactly. In particular, we propose and analyze inexact randomized iterative methods for solving three closely related problems: a convex stochastic quadratic optimization problem, a best approximation problem, and its dual, a concave quadratic maximization problem. We provide iteration complexity results under several assumptions on the inexactness error. Inexact variants of many popular and some more exotic methods, including randomized block Kaczmarz, randomized Gaussian Kaczmarz and randomized block coordinate descent, can be cast as special cases. Finally, we present numerical experiments which demonstrate the benefits of allowing inexactness.
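    As a small, hedged illustration of the kind of method analyzed in Chapter 2 (not code from the thesis), the sketch below runs randomized Kaczmarz with a heavy ball momentum term on a consistent linear system; the momentum value, problem dimensions and data are placeholder choices.

```python
# Illustrative sketch: randomized Kaczmarz with a heavy ball momentum term,
# a stochastic-heavy-ball-type method applied to a consistent system Ax = b.
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 10
A = rng.normal(size=(m, n))
x_true = rng.normal(size=n)
b = A @ x_true                                  # consistent right-hand side

beta = 0.3                                      # momentum parameter (placeholder)
x = np.zeros(n)
x_prev = np.zeros(n)
for _ in range(5000):
    i = rng.integers(m)                         # sample a row uniformly at random
    a_i = A[i]
    # Kaczmarz projection onto the sampled row plus momentum beta * (x_k - x_{k-1})
    x_new = x - ((a_i @ x - b[i]) / (a_i @ a_i)) * a_i + beta * (x - x_prev)
    x_prev, x = x, x_new

print(np.linalg.norm(x - x_true))               # residual error; should be small
```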
When the data describing a given optimization problem is big enough, it becomes impossible to store it on a single machine. In such situations, it is usually preferable to distribute the data among the nodes of a cluster or a supercomputer. In one such setting, the nodes cooperate to minimize the sum (or average) of private functions (convex or non-convex) stored at the nodes. Among the most popular protocols for solving this problem in a decentralized fashion (communication is allowed only between neighbours) are randomized gossip algorithms. In Chapter 4 we propose a new approach for the design and analysis of randomized gossip algorithms which can be used to solve the distributed average consensus problem, a fundamental problem in distributed computing, where each node of a network initially holds a number or vector, and the aim is for each node to calculate the average of these objects by communicating only with its neighbours (connected nodes). The new approach consists in establishing new connections to recent literature on randomized iterative methods for solving large-scale linear systems. Our general framework recovers a comprehensive array of well-known gossip protocols as special cases and allows for the development of block and arbitrary sampling variants of all of these methods. In addition, we present novel and provably accelerated randomized gossip protocols where in each step all nodes of the network update their values using their own information but only a subset of them exchange messages. The accelerated protocols are the first randomized gossip algorithms that converge to consensus with a provably accelerated linear rate. The theoretical results are validated via computational testing on typical wireless sensor network topologies.
Finally, in Chapter 5, we move in a different direction and present the first randomized gossip algorithms for solving the average consensus problem while at the same time protecting the private values stored at the nodes, as these may be sensitive. In particular, we develop and analyze three privacy-preserving variants of the randomized pairwise gossip algorithm ("randomly pick an edge of the network and then replace the values stored at the vertices of this edge by their average") first proposed by Boyd et al. [16] for solving the average consensus problem. The randomized methods we propose are all dual in nature. That is, they are designed to solve the dual of the best approximation optimization formulation of the average consensus problem. We call our three privacy preservation techniques "Binary Oracle", "ε-Gap Oracle" and "Controlled Noise Insertion". We give iteration complexity bounds for the proposed privacy-preserving randomized gossip protocols and perform extensive numerical experiments.
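The pairwise gossip primitive quoted above is simple enough to sketch directly; the graph, initial values and iteration count below are illustrative placeholders, and the privacy-preserving variants (Binary Oracle, ε-Gap Oracle, Controlled Noise Insertion) are not shown.

```python
# Sketch of randomized pairwise gossip for average consensus: repeatedly pick a
# random edge and replace the two endpoint values by their average.
import numpy as np

rng = np.random.default_rng(2)
values = rng.normal(size=6)                     # private values held at the nodes
target = values.mean()                          # the consensus value gossip should reach
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (1, 4)]  # a connected graph

for _ in range(500):
    i, j = edges[rng.integers(len(edges))]      # randomly pick an edge of the network
    avg = 0.5 * (values[i] + values[j])
    values[i] = values[j] = avg                 # replace both endpoint values by their average

print(values, "target:", target)                # all entries approach the network average
```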

    CORE: Common Random Reconstruction for Distributed Optimization with Provable Low Communication Complexity

    With distributed machine learning being a prominent technique for large-scale machine learning tasks, communication complexity has become a major bottleneck for speeding up training and scaling up the number of machines. In this paper, we propose a new technique named Common randOm REconstruction (CORE), which can be used to compress the information transmitted between machines in order to reduce communication complexity without other strict conditions. In particular, our technique CORE projects the vector-valued information to a low-dimensional one through common random vectors and reconstructs the information with the same random vectors after communication. We apply CORE to two distributed tasks, respectively convex optimization on linear models and generic non-convex optimization, and design new distributed algorithms which achieve provably lower communication complexities. For example, we show that for linear models a CORE-based algorithm can encode the gradient vector to O(1) bits (versus O(d)), with a convergence rate that is no worse, improving on existing results.
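    A minimal sketch of the shared-randomness compression idea described above, assuming Gaussian random vectors and a shared seed; core_like_compress and core_like_reconstruct are hypothetical names for illustration, not the paper's operators or API.

```python
# Illustrative sketch (not CORE's exact construction): sender and receiver draw
# the same random vectors from a common seed, only the m projection coefficients
# cross the wire, and the receiver rebuilds an unbiased d-dimensional estimate.
import numpy as np

def core_like_compress(grad, m, seed):
    """Project a d-dimensional vector onto m shared random directions."""
    rng = np.random.default_rng(seed)            # common randomness shared with the receiver
    R = rng.normal(size=(m, grad.size)) / np.sqrt(m)
    return R @ grad                              # only m scalars are communicated

def core_like_reconstruct(coeffs, d, seed):
    """Rebuild an unbiased estimate from the m received coefficients."""
    rng = np.random.default_rng(seed)            # regenerates the same random vectors
    R = rng.normal(size=(coeffs.size, d)) / np.sqrt(coeffs.size)
    return R.T @ coeffs

# One round: 8 scalars are sent instead of 1000 coordinates; the reconstruction
# is an unbiased (though noisy) estimate of the original vector.
g = np.random.default_rng(3).normal(size=1000)
sent = core_like_compress(g, m=8, seed=42)
g_hat = core_like_reconstruct(sent, d=1000, seed=42)
```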