3 research outputs found
A Robust Gradient Tracking Method for Distributed Optimization over Directed Networks
In this paper, we consider the problem of distributed consensus optimization
over multi-agent networks with directed network topology. Assuming each agent
has a local cost function that is smooth and strongly convex, the global
objective is to minimize the average of all the local cost functions. To solve
the problem, we introduce a robust gradient tracking method (R-Push-Pull)
adapted from the recently proposed Push-Pull/AB algorithm. R-Push-Pull inherits
the advantages of Push-Pull and enjoys linear convergence to the optimal
solution with exact communication. Under noisy information exchange,
R-Push-Pull is more robust than the existing gradient tracking based
algorithms; the solutions obtained by each agent reach a neighborhood of the
optimum in expectation exponentially fast under a constant stepsize policy. We
provide a numerical example that demonstrates the effectiveness of R-Push-Pull.
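As a rough illustration of the gradient tracking template that Push-Pull builds on (and that R-Push-Pull robustifies), the sketch below runs the noise-free updates x_{k+1} = R(x_k − αy_k), y_{k+1} = Cy_k + ∇f(x_{k+1}) − ∇f(x_k), with a row-stochastic matrix R and a column-stochastic matrix C suited to directed graphs. The scalar quadratic costs, the directed-ring graph, and the step size are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Toy local costs f_i(x) = 0.5 * a_i * x^2 - b_i * x (illustrative, not from the paper).
n = 4
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([4.0, 3.0, 2.0, 1.0])
x_star = b.sum() / a.sum()           # minimizer of the average cost

def grad(x):
    return a * x - b                 # per-agent gradients, evaluated locally

# Directed ring: agent i also receives from agent i-1.
R = np.zeros((n, n))                 # row-stochastic (pull / decision variables)
C = np.zeros((n, n))                 # column-stochastic (push / gradient tracker)
for i in range(n):
    R[i, i] = R[i, (i - 1) % n] = 0.5       # each row sums to 1
    C[i, i] = C[(i + 1) % n, i] = 0.5       # each column sums to 1

x = np.zeros(n)
y = grad(x)                          # tracker initialized at the local gradients
alpha = 0.05                         # constant step size (assumed, small enough)
for _ in range(2000):
    x_new = R @ (x - alpha * y)              # pull step on decision variables
    y = C @ y + grad(x_new) - grad(x)        # push step keeps 1^T y tracking sum of gradients
    x = x_new
# All agents end up near x_star (linear convergence under exact communication).
```

Because C is column-stochastic, the sum of the tracker entries always equals the sum of the current local gradients, which is what drives every agent toward the minimizer of the average cost.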
On the Benefits of Multiple Gossip Steps in Communication-Constrained Decentralized Optimization
In decentralized optimization, it is common algorithmic practice to have
nodes interleave (local) gradient descent iterations with gossip (i.e.
averaging over the network) steps. Motivated by the training of large-scale
machine learning models, it is also increasingly common to require that
messages be lossy-compressed versions of the local parameters. In this
paper, we show that, in such compressed decentralized optimization settings,
there are benefits to having multiple gossip steps between subsequent
gradient iterations, even when the cost of doing so is appropriately
accounted for, e.g., by reducing the precision of the compressed information. In
particular, we show that having gradient iterations with a constant step
size, and gossip steps between every pair of these iterations, enables
convergence to within a given accuracy of the optimal value for smooth
non-convex objectives satisfying the
Polyak-Łojasiewicz condition. This result also holds for smooth strongly
convex objectives. To our knowledge, this is the first work that derives
convergence results for nonconvex optimization under arbitrary communication
compression.
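To make the role of repeated gossip under compression concrete, here is a small toy sketch of my own (not the paper's scheme or compressor): nodes on a ring repeatedly average their states, with each transmitted message passed through a crude fixed-precision quantizer, and more gossip rounds shrink the disagreement despite the compression error added at every round:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8

# Doubly stochastic gossip matrix on a ring (lazy averaging with both neighbors).
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.25

def compress(v, levels=64):
    # Toy deterministic quantizer: snap each entry to one of `levels` uniform
    # bins relative to the largest magnitude (stand-in for lossy compression).
    scale = np.max(np.abs(v)) + 1e-12
    return np.round(v / scale * levels) / levels * scale

def gossip(x, rounds):
    for _ in range(rounds):
        x = W @ compress(x)      # every message is quantized before mixing
    return x

x0 = rng.normal(size=n)          # initial disagreement across nodes

def disagreement(x):
    return np.std(x)             # spread of the node states around their mean

e0 = disagreement(x0)
e1 = disagreement(gossip(x0, 1))
e5 = disagreement(gossip(x0, 5))
# e5 < e1 < e0: extra gossip rounds tighten consensus even with quantization.
```

The contraction per round is governed by the second-largest eigenvalue of W, so a handful of (cheap, low-precision) gossip rounds between gradient iterations can buy the same consensus quality as far more precise single-round communication.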
A Primal-Dual SGD Algorithm for Distributed Nonconvex Optimization
The distributed nonconvex optimization problem of minimizing a global cost
function formed by a sum of local cost functions by using local information
exchange is considered. This problem is an important component of many machine
learning techniques with data parallelism, such as deep learning and federated
learning. We propose a distributed primal-dual stochastic gradient descent
(SGD) algorithm, suitable for arbitrarily connected communication networks and
any smooth (possibly nonconvex) cost functions. We show that the proposed
algorithm achieves the linear speedup convergence rate
for general nonconvex cost functions and
the linear speedup convergence rate when
the global cost function satisfies the Polyak-Łojasiewicz (P-Ł) condition,
where the rates are in terms of the total number of iterations. We also show that the output of
the proposed algorithm with fixed parameters linearly converges to a
neighborhood of a global optimum. We demonstrate through numerical experiments
the efficiency of our algorithm in comparison with the baseline centralized SGD
and recently proposed distributed SGD algorithms.
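A generic Laplacian-based distributed primal-dual SGD template of the kind this abstract describes can be sketched as follows; the quadratic local costs, ring graph, noise level, and parameter values below are illustrative assumptions and this is not the paper's exact algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
a = np.array([1.0, 2.0, 3.0, 4.0])   # local curvatures (illustrative)
c = np.array([4.0, 3.0, 2.0, 1.0])   # local minimizers (illustrative)
x_star = (a * c).sum() / a.sum()     # minimizer of the summed global cost

# Graph Laplacian of a 4-node ring; only neighbors exchange information.
L = np.array([[ 2., -1.,  0., -1.],
              [-1.,  2., -1.,  0.],
              [ 0., -1.,  2., -1.],
              [-1.,  0., -1.,  2.]])

def stoch_grad(x, sigma=0.01):
    # Noisy local gradients of f_i(x) = 0.5 * a_i * (x - c_i)^2.
    return a * (x - c) + sigma * rng.normal(size=x.shape)

eta, alpha, beta = 0.02, 2.0, 1.0    # step size and penalty weights (assumed)
x = np.zeros(n)                       # primal variable, one copy per agent
v = np.zeros(n)                       # dual variable enforcing consensus
for _ in range(5000):
    g = stoch_grad(x)
    x_new = x - eta * (g + alpha * L @ x + v)   # primal descent + consensus penalty
    v = v + eta * beta * (L @ x)                # dual ascent on the consensus constraint
    x = x_new
# With fixed parameters, all agents settle in a neighborhood of x_star.
```

Since the dual update lives in the range of L, the sum of the dual variables stays zero, so the network-average iterate performs (noisy) gradient descent on the global cost while the Laplacian terms pull the agents into consensus.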