Distributed Solution of Large-Scale Linear Systems via Accelerated Projection-Based Consensus
Solving a large-scale system of linear equations is a key step at the heart
of many algorithms in machine learning, scientific computing, and beyond. When
the problem dimension is large, computational and/or memory constraints make it
desirable, or even necessary, to perform the task in a distributed fashion. In
this paper, we consider a common scenario in which a taskmaster intends to
solve a large-scale system of linear equations by distributing subsets of the
equations among a number of computing machines/cores. We propose an accelerated
distributed consensus algorithm, in which at each iteration every machine
updates its solution by adding a scaled version of the projection of an error
signal onto the nullspace of its system of equations, and where the taskmaster
conducts an averaging over the solutions with momentum. The convergence
behavior of the proposed algorithm is analyzed in detail and analytically shown
to compare favorably with the convergence rate of alternative distributed
methods, namely distributed gradient descent, distributed versions of
Nesterov's accelerated gradient descent and heavy-ball method, the block
Cimmino method, and ADMM. On randomly chosen linear systems, as well as on
real-world data sets, the proposed method offers significant speed-up relative
to all the aforementioned methods. Finally, our analysis suggests a novel
variation of the distributed heavy-ball method, which employs a particular
distributed preconditioning, and which achieves the same theoretical
convergence rate as the proposed consensus-based method.
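
The update described in this abstract can be sketched in a few lines of NumPy. This is only an illustrative reading of the abstract, not the authors' implementation: the function name `apc_solve`, the row partitioning, the initialization, and the default step sizes `gamma = eta = 1.0` are all assumptions. With those defaults the iteration reduces to plain averaged projections without acceleration; the tuned values analyzed in the paper are not reproduced here.

```python
import numpy as np


def apc_solve(A, b, num_machines, gamma=1.0, eta=1.0, iters=2000):
    """Sketch of projection-based consensus with momentum-style averaging.

    Assumed setup (not taken from the paper's code): rows of A are split
    evenly across machines, and each row block has full row rank.
    """
    n = A.shape[1]
    blocks = np.array_split(np.arange(A.shape[0]), num_machines)

    projectors, local = [], []
    for rows in blocks:
        A_i, b_i = A[rows], b[rows]
        pinv_i = np.linalg.pinv(A_i)
        # Orthogonal projector onto the nullspace of A_i.
        projectors.append(np.eye(n) - pinv_i @ A_i)
        # Start each machine at a solution of its own equations.
        local.append(pinv_i @ b_i)

    # Taskmaster's initial estimate: the average of the local solutions.
    x_bar = np.mean(local, axis=0)

    for _ in range(iters):
        # Each machine moves along the nullspace of its own block, so it
        # keeps satisfying A_i x_i = b_i while approaching the consensus.
        local = [x + gamma * (P @ (x_bar - x)) for x, P in zip(local, projectors)]
        # Taskmaster averages the local solutions, weighted by eta.
        x_bar = eta * np.mean(local, axis=0) + (1.0 - eta) * x_bar

    return x_bar


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = np.eye(8) + 0.1 * rng.standard_normal((8, 8))  # well-conditioned toy system
    x_true = rng.standard_normal(8)
    x_hat = apc_solve(A, A @ x_true, num_machines=4)
    print("error:", np.linalg.norm(x_hat - x_true))
```

Because every local iterate stays on the solution set of its own block, only the taskmaster's average has to converge; how fast it does so depends on the conditioning of the system and on the choice of `gamma` and `eta`.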
Momentum and Stochastic Momentum for Stochastic Gradient, Newton, Proximal Point and Subspace Descent Methods
In this paper we study several classes of stochastic optimization algorithms
enriched with heavy ball momentum. Among the methods studied are: stochastic
gradient descent, stochastic Newton, stochastic proximal point and stochastic
dual subspace ascent. This is the first time momentum variants of several of
these methods are studied. We choose to perform our analysis in a setting in
which all of the above methods are equivalent. We prove global nonasymptotic
linear convergence rates for all methods and various measures of success,
including primal function values, primal iterates (in L2 sense), and dual
function values. We also show that the primal iterates converge at an
accelerated linear rate in the L1 sense. This is the first time a linear rate
is shown for the stochastic heavy ball method (i.e., stochastic gradient
descent method with momentum). Under somewhat weaker conditions, we establish a
sublinear convergence rate for Cesaro averages of primal iterates. Moreover, we
propose a novel concept, which we call stochastic momentum, aimed at decreasing
the cost of performing the momentum step. We prove linear convergence of
several stochastic methods with stochastic momentum, and show that in some
sparse data regimes and for sufficiently small momentum parameters, these
methods enjoy better overall complexity than methods with deterministic
momentum. Finally, we perform extensive numerical testing on artificial and
real datasets, including data coming from average consensus problems. Comment: 47 pages, 7 figures, 7 tables
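
As a concrete instance of stochastic gradient descent with heavy ball momentum, the following sketch applies it to a consistent linear system, where the stochastic gradient step becomes a randomized Kaczmarz step. The function name `kaczmarz_heavy_ball`, the sampling of rows with probability proportional to their squared norms, and the values `omega = 1.0` and `beta = 0.3` are assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np


def kaczmarz_heavy_ball(A, b, omega=1.0, beta=0.3, iters=5000, seed=0):
    """Randomized Kaczmarz with a heavy ball momentum term (illustrative).

    Update: x_next = x - omega * g_i(x) + beta * (x - x_prev), where
    g_i(x) = (a_i . x - b_i) / ||a_i||^2 * a_i for a randomly sampled row i.
    """
    rng = np.random.default_rng(seed)
    row_norms_sq = np.sum(A * A, axis=1)
    probs = row_norms_sq / row_norms_sq.sum()  # assumed sampling rule

    x_prev = np.zeros(A.shape[1])
    x = x_prev.copy()
    for _ in range(iters):
        i = rng.choice(A.shape[0], p=probs)
        # Stochastic gradient from one equation; with omega = 1 and beta = 0
        # this is the projection of x onto the hyperplane a_i . x = b_i.
        grad = (A[i] @ x - b[i]) / row_norms_sq[i] * A[i]
        x_next = x - omega * grad + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((20, 5))
    x_true = rng.standard_normal(5)
    x_hat = kaczmarz_heavy_ball(A, A @ x_true)
    print("error:", np.linalg.norm(x_hat - x_true))
```

Setting `beta = 0` recovers the momentum-free method, which makes it easy to compare the two on the same system.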
Distributed Solution of Large-Scale Linear Systems via Accelerated Projection-Based Consensus
Solving a large-scale system of linear equations is a key step at the heart of many algorithms in scientific computing, machine learning, and beyond. When the problem dimension is large, computational and/or memory constraints make it desirable, or even necessary, to perform the task in a distributed fashion. In this paper, we consider a common scenario in which a taskmaster intends to solve a large-scale system of linear equations by distributing subsets of the equations among a number of computing machines/cores. We propose a new algorithm called Accelerated Projection-based Consensus, in which at each iteration every machine updates its solution by adding a scaled version of the projection of an error signal onto the nullspace of its system of equations, and the taskmaster conducts an averaging over the solutions with momentum. The convergence behavior of the proposed algorithm is analyzed in detail and analytically shown to compare favorably with the convergence rate of alternative distributed methods, namely distributed gradient descent, distributed versions of Nesterov's accelerated gradient descent and heavy-ball method, the block Cimmino method, and Alternating Direction Method of Multipliers. On randomly chosen linear systems, as well as on real-world data sets, the proposed method offers significant speed-up relative to all the aforementioned methods. Finally, our analysis suggests a novel variation of the distributed heavy-ball method, which employs a particular distributed preconditioning and achieves the same theoretical convergence rate as the proposed consensus-based method.
Accelerated Consensus via Min-Sum Splitting
We apply the Min-Sum message-passing protocol to solve the consensus problem
in distributed optimization. We show that while the ordinary Min-Sum algorithm
does not converge, a modified version of it known as Splitting yields
convergence to the problem solution. We prove that a proper choice of the
tuning parameters allows Min-Sum Splitting to yield subdiffusive accelerated
convergence rates, matching the rates obtained by shift-register methods. The
acceleration scheme embodied by Min-Sum Splitting for the consensus problem
bears similarities with lifted Markov chains techniques and with multi-step
first order methods in convex optimization.
Multi-consensus Decentralized Accelerated Gradient Descent
This paper considers the decentralized optimization problem, which has
applications in large scale machine learning, sensor networks, and control
theory. We propose a novel algorithm that can achieve near optimal
communication complexity, matching the known lower bound up to a logarithmic
factor of the condition number of the problem. Our theoretical results give
affirmative answers to the open problem on whether there exists an algorithm
that can achieve a communication complexity (nearly) matching the lower bound
depending on the global condition number instead of the local one. Moreover,
the proposed algorithm achieves the optimal computation complexity matching the
lower bound up to universal constants. Furthermore, to achieve a linear
convergence rate, our algorithm does not require the individual functions
to be (strongly) convex. Our method relies on a novel combination of known
techniques including Nesterov's accelerated gradient descent, multi-consensus
and gradient-tracking. The analysis is new, and may be applied to other related
problems. Empirical studies demonstrate the effectiveness of our method for
machine learning applications.