Stochastic Multi-Level Compositional Optimization Algorithms over Networks with Level-Independent Convergence Rate
Stochastic multi-level compositional optimization problems cover many new
machine learning paradigms, e.g., multi-step model-agnostic meta-learning,
which require efficient optimization algorithms for large-scale applications.
This paper studies decentralized stochastic multi-level optimization,
which is challenging because the multi-level structure and the
decentralized communication scheme may make the number of levels affect the
order of the convergence rate. To this end, we develop two novel decentralized
optimization algorithms to deal with the multi-level function and its gradient.
Our theoretical results show that both algorithms achieve a
level-independent convergence rate for nonconvex problems under much milder
conditions than existing single-machine algorithms. To the best of our
knowledge, this is the first work to achieve a level-independent
convergence rate in the decentralized setting. Moreover, extensive
experiments confirm the efficacy of our proposed algorithms.
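For intuition, the gradient of a nested objective F(x) = f_L(... f_1(x) ...) chains the Jacobians of every level, and stochastic variants replace each evaluation and Jacobian with a noisy oracle. The sketch below illustrates that chain-rule estimator; the oracle interface and names are illustrative assumptions, not the paper's algorithms.

    import numpy as np

    def multilevel_grad(x, levels):
        # levels: list of (value_oracle, jacobian_oracle) pairs, innermost
        # first; each oracle may return a noisy (stochastic) estimate.
        inputs = [np.asarray(x)]
        for value_fn, _ in levels[:-1]:          # forward pass through inner levels
            inputs.append(value_fn(inputs[-1]))
        grad = levels[-1][1](inputs[-1])         # gradient of the outermost level
        for (_, jac_fn), u in zip(reversed(levels[:-1]), reversed(inputs[:-1])):
            grad = grad @ jac_fn(u)              # chain rule: row vector times Jacobian
        return grad

Note that naively nesting noisy evaluations is biased, since E[f(u)] differs from f(E[u]) for nonlinear f; controlling this bias so that the rate does not degrade with the number of levels is the crux of the problem the paper addresses.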
An Accelerated Decentralized Stochastic Proximal Algorithm for Finite Sums
Modern large-scale finite-sum optimization relies on two key aspects:
distribution and stochastic updates. For smooth and strongly convex problems,
existing decentralized algorithms are slower than modern accelerated
variance-reduced stochastic algorithms when run on a single machine, and are
therefore not efficient. Centralized algorithms are fast, but their scaling is
limited by global aggregation steps that result in communication bottlenecks.
In this work, we propose an efficient Accelerated Decentralized stochastic
algorithm for Finite Sums, named ADFS, which uses local stochastic proximal
updates and randomized pairwise communications between nodes. On n machines,
ADFS learns from nm samples in the same time it takes optimal algorithms to
learn from m samples on one machine. This scaling holds until a critical
network size is reached, which depends on communication delays, on the
number of samples m, and on the
network topology. We provide a theoretical analysis based on a novel augmented
graph approach combined with a precise evaluation of synchronization times and
an extension of the accelerated proximal coordinate gradient algorithm to
arbitrary sampling. We illustrate the improvement of ADFS over state-of-the-art
decentralized approaches with experiments.
Comment: Code available in source files. arXiv admin note: substantial text overlap with arXiv:1901.0986
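For orientation, the following sketch shows the two primitives the abstract names, local stochastic proximal updates and randomized pairwise communications, inside a generic decentralized loop. It is a simplification under assumed interfaces (an edge list and a user-supplied local_prox oracle), not ADFS's actual accelerated formulation.

    import numpy as np

    rng = np.random.default_rng(0)

    def pairwise_decentralized_step(params, edges, local_prox, p_comm=0.5, step=0.1):
        # params: list of per-node parameter vectors; edges: list of (i, j)
        # neighbour pairs; local_prox(i, x, step): assumed proximal oracle
        # for node i's local objective.
        if rng.random() < p_comm:
            i, j = edges[rng.integers(len(edges))]   # randomized pairwise communication
            avg = 0.5 * (params[i] + params[j])
            params[i], params[j] = avg, avg.copy()
        else:
            i = rng.integers(len(params))            # local stochastic proximal update
            params[i] = local_prox(i, params[i], step)
        return params

Per the abstract, ADFS additionally combines these primitives with acceleration and arbitrary sampling of the updates, which is where the stated speedup comes from.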
Optimal Complexity in Non-Convex Decentralized Learning over Time-Varying Networks
Decentralized optimization over time-varying networks is an emerging paradigm
in machine learning. It substantially reduces communication overhead in
large-scale deep training and is more robust in wireless scenarios,
especially when nodes are moving. Federated learning can also be regarded as
decentralized optimization with time-varying communication patterns that
alternate between global averaging and local updates.
While numerous studies exist to clarify its theoretical limits and develop
efficient algorithms, it remains unclear what the optimal complexity is for
non-convex decentralized stochastic optimization over time-varying networks.
The main difficulties lie in how to gauge the effectiveness when transmitting
messages between two nodes via time-varying communications, and how to
establish the lower bound when the network size is fixed (which is a
prerequisite in stochastic optimization). This paper resolves these challenges
and establishes the first complexity lower bound. We also develop a new
decentralized algorithm that nearly attains this bound, showing the tightness
of the lower bound and the optimality of our algorithm.
Comment: Accepted by the 14th Annual Workshop on Optimization for Machine Learning. arXiv admin note: text overlap with arXiv:2210.0786
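The federated-learning correspondence above can be made concrete with a time-varying mixing matrix: the identity in rounds where nodes only update locally, and uniform averaging in global rounds. The sketch below, with an illustrative period tau, is a minimal model of this pattern rather than the paper's algorithm.

    import numpy as np

    def mixing_matrix(t, n, tau=5):
        # Identity (purely local round) except every tau-th round,
        # when all n nodes average globally.
        if t % tau == 0:
            return np.full((n, n), 1.0 / n)   # global averaging round
        return np.eye(n)                      # local-only round

    def decentralized_sgd_round(X, grads, lr, t):
        # X: n-by-d stacked node iterates; one round = gossip with the
        # current mixing matrix, then a local stochastic gradient step.
        W = mixing_matrix(t, X.shape[0])
        return W @ X - lr * grads

How quickly products of such matrices contract the iterates toward their average is the usual way to gauge communication effectiveness over a time-varying topology, which is one of the difficulties the abstract highlights.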
A Variance-Reduced Stochastic Gradient Tracking Algorithm for Decentralized Optimization with Orthogonality Constraints
Decentralized optimization with orthogonality constraints is found widely in
scientific computing and data science. Since the orthogonality constraints are
nonconvex, it is quite challenging to design efficient algorithms. Existing
approaches leverage the geometric tools from Riemannian optimization to solve
this problem at the cost of high sample and communication complexities. To
relieve this difficulty, we propose a variance-reduced stochastic gradient
tracking (VRSGT) algorithm, built on two novel techniques that sidestep the
orthogonality constraints, which provably converges to a
stationary point. To the best of our knowledge, VRSGT is the first algorithm
for decentralized optimization with orthogonality constraints that reduces both
sampling and communication complexities simultaneously. In numerical
experiments, VRSGT shows promising performance in a real-world autonomous
driving application.
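For context, the "gradient tracking" in VRSGT refers to a standard decentralized recursion in which each node maintains a running tracker of the network-average gradient. A minimal sketch of that recursion follows; the variance-reduced local estimates and the handling of the orthogonality constraint are assumed to be supplied separately, and all names are illustrative.

    import numpy as np

    def gradient_tracking_round(X, Y, G_prev, grad_fn, W, lr):
        # X, Y: n-by-d stacked iterates and gradient trackers; G_prev:
        # stacked local gradient estimates at the current X; grad_fn:
        # returns stacked (e.g., variance-reduced) local estimates;
        # W: doubly stochastic mixing matrix.
        X_next = W @ X - lr * Y               # consensus step plus descent
        G_next = grad_fn(X_next)              # fresh local gradient estimates
        Y_next = W @ Y + G_next - G_prev      # track the average gradient
        return X_next, Y_next, G_next

Initializing Y to the stacked local gradients makes the column average of Y equal the average gradient at every round, which is the invariant the tracker preserves.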