Distributed Delayed Stochastic Optimization
We analyze the convergence of gradient-based optimization algorithms that
base their updates on delayed stochastic gradient information. The main
application of our results is to the development of gradient-based distributed
optimization algorithms where a master node performs parameter updates while
worker nodes compute stochastic gradients based on local information in
parallel, which may give rise to delays due to asynchrony. We take motivation
from statistical problems where the size of the data is so large that it cannot
fit on one computer; with the advent of huge datasets in biology, astronomy,
and the internet, such problems are now common. Our main contribution is to
show that for smooth stochastic problems, the delays are asymptotically
negligible and we can achieve order-optimal convergence results. In application
to distributed optimization, we develop procedures that overcome communication
bottlenecks and synchronization requirements. We show $n$-node architectures
whose optimization error in stochastic problems---in spite of asynchronous
delays---scales asymptotically as $\order(1/\sqrt{nT})$ after $T$ iterations.
This rate is known to be optimal for a distributed system with $n$ nodes even
in the absence of delays. We additionally complement our theoretical results
with numerical experiments on a statistical machine learning task.

Comment: 27 pages, 4 figures
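As a rough illustration of the master-worker scheme this abstract describes, the following minimal sketch (our own construction, not the authors' code) simulates a master that applies stochastic gradients computed at parameters from several steps ago. The toy quadratic objective, the fixed delay, and the step-size schedule are all illustrative assumptions.

```python
# Minimal sketch of delayed stochastic gradient updates: the master updates
# parameters using gradients evaluated at iterates from `delay` steps earlier,
# as a worker in an asynchronous system would have seen them.
import numpy as np

rng = np.random.default_rng(0)
d, T, delay = 10, 5000, 5          # dimension, iterations, fixed delay (assumed)
x_star = rng.normal(size=d)        # minimizer of the toy objective

def stochastic_grad(x):
    """Noisy gradient of the smooth toy problem f(x) = 0.5 * ||x - x_star||^2."""
    return (x - x_star) + 0.1 * rng.normal(size=d)

x = np.zeros(d)
history = [x.copy()]               # past iterates, so gradients can be "stale"
for t in range(1, T + 1):
    stale_x = history[max(0, len(history) - 1 - delay)]  # parameters a worker saw
    g = stochastic_grad(stale_x)   # gradient based on delayed information
    eta = 1.0 / np.sqrt(t)         # standard O(1/sqrt(t)) step size (assumed)
    x = x - eta * g                # master update with the delayed gradient
    history.append(x.copy())

print("final error:", np.linalg.norm(x - x_star))
```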
Bregman Proximal Method for Efficient Communications under Similarity
We propose a novel distributed method for monotone variational inequalities
and convex-concave saddle point problems arising in various machine learning
applications such as game theory and adversarial training. By exploiting
\textit{similarity}, our algorithm overcomes the communication bottleneck,
which is a major issue in distributed optimization. The proposed algorithm
enjoys an optimal communication complexity of $\mathcal{O}(\delta/\varepsilon)$,
where $\varepsilon$ measures the non-optimality gap function and $\delta$ is a
parameter of similarity. All the
existing distributed algorithms achieving this bound essentially utilize the
Euclidean setup.
In contrast to them, our algorithm is built upon Bregman proximal maps and is
compatible with an arbitrary Bregman divergence. This gives it more
flexibility to fit the problem geometry than algorithms tied to the Euclidean
setup, and thereby the proposed method bridges the gap between the Euclidean
and non-Euclidean settings.
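To make the role of the Bregman divergence concrete, here is a minimal sketch (our construction, not the paper's algorithm) of a single Bregman proximal step $\arg\min_x \langle g, x \rangle + \frac{1}{\eta} D(x, x_{\mathrm{prev}})$. With the negative-entropy generator, $D$ is the KL divergence and the step has a closed-form multiplicative update that keeps iterates on the probability simplex, a natural geometry for a matrix game; the Euclidean choice of $D$ recovers a plain gradient step.

```python
# One Bregman proximal step under two choices of divergence.
import numpy as np

def bregman_step_kl(x_prev, g, eta):
    """Proximal step under the KL divergence (entropic mirror step)."""
    y = x_prev * np.exp(-eta * g)   # closed-form minimizer, up to normalization
    return y / y.sum()              # renormalize back onto the simplex

def bregman_step_euclidean(x_prev, g, eta):
    """Same step with D(x, y) = 0.5 * ||x - y||^2, i.e. a gradient step."""
    return x_prev - eta * g

x = np.full(4, 0.25)                 # uniform point on the simplex
g = np.array([1.0, -0.5, 0.2, 0.0])  # toy operator/gradient values at x (assumed)
print(bregman_step_kl(x, g, eta=0.5))
print(bregman_step_euclidean(x, g, eta=0.5))
```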
By using the restart technique, we extend our algorithm to variational
inequalities with a $\mu$-strongly monotone operator, resulting in an optimal
communication complexity of $\mathcal{O}(\delta/\mu)$ (up to a logarithmic factor). Our
theoretical results are confirmed by numerical experiments on a stochastic
matrix game.

Comment: 14 pages
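The restart technique mentioned in the abstract can be sketched schematically as follows (an assumed structure, not the paper's pseudocode): run the base method for a fixed communication budget, then restart it warm-started from its own output. Under $\mu$-strong monotonicity, each stage roughly halves the distance to the solution, so reaching accuracy $\varepsilon$ takes about $\log(1/\varepsilon)$ stages, which is where the logarithmic factor comes from.

```python
# Schematic restart loop around a generic base method.
import numpy as np

def restarted_solve(base_method, x0, n_stages, rounds_per_stage):
    """Run `base_method` for a fixed round budget, restarting from its output."""
    x = x0
    for _ in range(n_stages):                 # ~log(1/eps) stages
        x = base_method(x, rounds_per_stage)  # warm-start the next stage
    return x

# Toy base method that contracts toward the solution at the origin; it stands
# in for the Bregman proximal method on a strongly monotone problem.
toy_base = lambda x, rounds: x / (1 + rounds)
print(restarted_solve(toy_base, np.ones(3), n_stages=10, rounds_per_stage=4))
```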