Near-Optimal Decentralized Momentum Method for Nonconvex-PL Minimax Problems
Minimax optimization plays an important role in many machine learning tasks
such as generative adversarial networks (GANs) and adversarial training.
Although a wide variety of optimization methods have recently been proposed to
solve minimax problems, most of them ignore the distributed setting where the
data is distributed across multiple workers. Meanwhile, existing decentralized
minimax optimization methods rely on strict assumptions such as (strong)
concavity or variational inequality conditions. Thus, in this paper, we propose
an efficient decentralized momentum-based gradient descent ascent (DM-GDA)
method for distributed nonconvex-PL minimax optimization, which is nonconvex in
the primal variable and nonconcave in the dual variable, with the dual function
satisfying the Polyak-Lojasiewicz (PL) condition. In particular,
our DM-GDA method uses momentum-based techniques simultaneously to update the
variables and to estimate the stochastic gradients. Moreover, we provide a solid
convergence analysis for our DM-GDA method, and prove that it obtains a
near-optimal gradient complexity of $\tilde{O}(\epsilon^{-3})$ for finding an
$\epsilon$-stationary solution of the nonconvex-PL stochastic minimax problem,
which reaches the lower bound of nonconvex stochastic optimization. To the best
of our knowledge, this is the first study of decentralized algorithms for
nonconvex-PL stochastic minimax optimization over a network.
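To make the update concrete, below is a minimal single-node sketch of
momentum-based gradient descent ascent with a STORM-style recursive momentum
estimator; the toy bilinear objective, the noise model, and the step-size and
momentum parameters are our own illustrative assumptions, and the consensus
(gossip) averaging that makes DM-GDA decentralized is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 5
# Toy stochastic minimax objective (our own illustration):
# f(x, y) = 0.5*||x||^2 + x^T A y - 0.5*||y||^2, with additive gradient noise.
A = rng.standard_normal((dim, dim)) / dim

def stoch_grads(x, y, sample_rng):
    noise = 0.1 * sample_rng.standard_normal(2 * dim)
    gx = x + A @ y + noise[:dim]        # stochastic gradient w.r.t. the primal x
    gy = A.T @ x - y + noise[dim:]      # stochastic gradient w.r.t. the dual y
    return gx, gy

eta, alpha = 0.05, 0.2                  # step size and momentum weight (assumed)
x, y = rng.standard_normal(dim), rng.standard_normal(dim)
x_prev, y_prev = x.copy(), y.copy()
vx, vy = stoch_grads(x, y, np.random.default_rng(1))

for t in range(500):
    # Evaluate the same stochastic sample at the current and previous iterates,
    # as required by the recursive (STORM-style) momentum estimator.
    gx, gy = stoch_grads(x, y, np.random.default_rng(100 + t))
    gx_p, gy_p = stoch_grads(x_prev, y_prev, np.random.default_rng(100 + t))
    vx = gx + (1 - alpha) * (vx - gx_p)  # momentum gradient estimator (primal)
    vy = gy + (1 - alpha) * (vy - gy_p)  # momentum gradient estimator (dual)
    x_prev, y_prev = x, y
    x = x - eta * vx                     # gradient descent on the primal variable
    y = y + eta * vy                     # gradient ascent on the dual variable

print("primal gradient norm:", np.linalg.norm(x + A @ y))
```

Evaluating the same sample at two consecutive iterates is what turns plain
momentum into a variance-reduced estimator, the kind of ingredient behind
near-optimal complexity results of this type.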
Decentralized projected Riemannian gradient method for smooth optimization on compact submanifolds
We consider the problem of decentralized nonconvex optimization over a
compact submanifold, where each local agent's objective function defined by the
local dataset is smooth. Leveraging the powerful tool of proximal smoothness,
we establish local linear convergence of the projected gradient descent method
with unit step size for solving the consensus problem over the compact
manifold. This serves as the basis for analyzing decentralized algorithms on
manifolds. Then, we propose two decentralized methods, namely the decentralized
projected Riemannian gradient descent (DPRGD) and the decentralized projected
Riemannian gradient tracking (DPRGT) methods. We establish their convergence
rates of $\mathcal{O}(1/\sqrt{K})$ and $\mathcal{O}(1/K)$, respectively, to
reach a stationary point, where $K$ is the number of iterations. To the best
of our knowledge, DPRGT is the first
decentralized algorithm to achieve exact convergence for solving decentralized
optimization over a compact manifold. The key ingredients in the proof are the
Lipschitz-type inequalities of the projection operator on the compact manifold
and smooth functions on the manifold, which could be of independent interest.
Finally, we demonstrate the effectiveness of our proposed methods compared to
state-of-the-art ones through numerical experiments on eigenvalue problems and
low-rank matrix completion.
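As a concrete illustration of a projected step, here is a minimal NumPy sketch
of a DPRGD-style iteration on the unit sphere for a decentralized eigenvalue
problem; the ring network, mixing weights, step size, and local matrices are
our own assumptions, and the gradient-tracking variant (DPRGT) is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, dim = 4, 8

# Each agent holds a local symmetric matrix; the common task is the leading
# eigenvector of the average matrix, i.e. minimizing f(x) = -x^T (mean_i B_i) x
# over the unit sphere (mirroring the paper's eigenvalue experiments).
Bs = []
for _ in range(n_agents):
    M = rng.standard_normal((dim, dim))
    Bs.append((M + M.T) / 2)
B_avg = sum(Bs) / n_agents

# Doubly stochastic mixing matrix for a ring network (our own choice).
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.25
    W[i, (i + 1) % n_agents] = 0.25

def proj_sphere(z):
    return z / np.linalg.norm(z)  # nearest point on the unit sphere

X = np.array([proj_sphere(rng.standard_normal(dim)) for _ in range(n_agents)])
eta = 0.1  # step size (our choice)

for k in range(300):
    grads = np.array([-2 * B @ x for B, x in zip(Bs, X)])
    # Riemannian gradient: project the Euclidean gradient onto the tangent
    # space of the sphere at x by removing the radial component.
    rgrads = np.array([g - (g @ x) * x for g, x in zip(grads, X)])
    # DPRGD-style step: average with neighbors, take a gradient step, then
    # project back onto the manifold.
    X = np.array([proj_sphere(z) for z in (W @ X - eta * rgrads)])

top = np.linalg.eigh(B_avg)[1][:, -1]
print("alignment of agent 0 with the leading eigenvector:", abs(X[0] @ top))
```

The projection onto a sphere is closed-form; the paper's proximal-smoothness
toolkit is what extends this kind of step to general compact submanifolds.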
Robust and Communication-Efficient Collaborative Learning
We consider a decentralized learning problem, where a set of computing nodes
aim at solving a non-convex optimization problem collaboratively. It is
well-known that decentralized optimization schemes face two major system
bottlenecks: stragglers' delay and communication overhead. In this paper, we
tackle these bottlenecks by proposing a novel decentralized gradient-based
optimization algorithm named QuanTimed-DSGD. Our algorithm rests on two
main ideas: (i) we impose a deadline on the local gradient computations of each
node at each iteration of the algorithm, and (ii) the nodes exchange quantized
versions of their local models. The first idea makes the algorithm robust to
straggling nodes, and the second reduces the communication overhead. The key
technical contribution of our work is to prove that, even with non-vanishing
quantization and stochastic-gradient noise, the proposed method converges
exactly to the global optimum for convex loss functions and finds a
first-order stationary point in non-convex scenarios. Our numerical
evaluations of
QuanTimed-DSGD on training benchmark datasets, MNIST and CIFAR-10, demonstrate
speedups of up to 3x in run-time, compared to state-of-the-art decentralized
optimization methods.
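As a rough illustration of the two ideas, the sketch below pairs an unbiased
stochastic quantizer for the exchanged models with a random per-node batch
size that emulates the computation deadline; the ring topology, the
least-squares losses, and the exact mixing recursion are our own
simplifications rather than the paper's precise update.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim, n_samples = 4, 10, 50

def quantize(v, step=0.05):
    # Unbiased stochastic rounding onto a grid of spacing `step`;
    # unbiasedness (E[Q(v)] = v) is the property convergence proofs lean on.
    low = np.floor(v / step) * step
    prob = (v - low) / step
    return low + step * (rng.random(v.shape) < prob)

# Local least-squares losses f_i(x) = ||A_i x - b_i||^2 / (2 n_samples),
# a convex toy problem standing in for the training loss.
As = [rng.standard_normal((n_samples, dim)) for _ in range(n_nodes)]
bs = [rng.standard_normal(n_samples) for _ in range(n_nodes)]

# Doubly stochastic mixing matrix for a ring network (our own choice).
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = 0.5
    W[i, (i - 1) % n_nodes] = 0.25
    W[i, (i + 1) % n_nodes] = 0.25

X = rng.standard_normal((n_nodes, dim))
eta = 0.05

for t in range(400):
    Q = np.array([quantize(x) for x in X])  # (ii) only quantized models travel
    X_next = np.empty_like(X)
    for i in range(n_nodes):
        # (i) deadline: a straggler finishes only a random number of per-sample
        # gradients in time and averages whatever it has computed.
        done = rng.integers(5, n_samples + 1)
        idx = rng.choice(n_samples, size=done, replace=False)
        g = As[i][idx].T @ (As[i][idx] @ X[i] - bs[i][idx]) / done
        # Mix the node's own exact model with neighbors' quantized models,
        # then take a local stochastic gradient step.
        mixed = W[i, i] * X[i] + sum(
            W[i, j] * Q[j] for j in range(n_nodes) if j != i)
        X_next[i] = mixed - eta * g
    X = X_next

print("consensus error:", np.linalg.norm(X - X.mean(axis=0)))
```

The unbiasedness of the quantizer is what allows the non-vanishing
quantization noise to average out across iterations.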