
    Distributed Delayed Stochastic Optimization

    We analyze the convergence of gradient-based optimization algorithms that base their updates on delayed stochastic gradient information. The main application of our results is to the development of gradient-based distributed optimization algorithms where a master node performs parameter updates while worker nodes compute stochastic gradients based on local information in parallel, which may give rise to delays due to asynchrony. We take motivation from statistical problems where the size of the data is so large that it cannot fit on one computer; with the advent of huge datasets in biology, astronomy, and the internet, such problems are now common. Our main contribution is to show that for smooth stochastic problems, the delays are asymptotically negligible and we can achieve order-optimal convergence results. In application to distributed optimization, we develop procedures that overcome communication bottlenecks and synchronization requirements. We show $n$-node architectures whose optimization error in stochastic problems, in spite of asynchronous delays, scales asymptotically as $\mathcal{O}(1/\sqrt{nT})$ after $T$ iterations. This rate is known to be optimal for a distributed system with $n$ nodes even in the absence of delays. We additionally complement our theoretical results with numerical experiments on a statistical machine learning task.
    Comment: 27 pages, 4 figures
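
    A minimal sketch of the delayed-update idea, assuming a serial NumPy simulation: the master applies each stochastic gradient only after it has waited in a queue for roughly one round of workers, so every update uses a gradient evaluated at a stale iterate. The test objective, delay model, and diminishing step-size schedule below are illustrative assumptions, not the paper's exact procedure.

```python
# Illustrative sketch (not the paper's procedure): a serial simulation of
# delayed stochastic gradient updates, where the master applies gradients
# that were computed at stale parameters.
import numpy as np
from collections import deque

def delayed_sgd(grad_fn, x0, n_workers=4, T=1000, step0=0.1):
    """Master node applies stochastic gradients that arrive with a delay of
    roughly `n_workers` iterations (round-robin asynchrony)."""
    x = np.asarray(x0, dtype=float)
    # Each queued gradient was evaluated at an older iterate (the delay).
    pending = deque(grad_fn(x) for _ in range(n_workers))
    for t in range(1, T + 1):
        g_stale = pending.popleft()             # delayed gradient from a worker
        x = x - (step0 / np.sqrt(t)) * g_stale  # diminishing step size
        pending.append(grad_fn(x))              # worker starts on the new iterate
    return x

# Example: noisy gradients of f(x) = 0.5 * ||x||^2 (assumed test objective).
rng = np.random.default_rng(0)
noisy_grad = lambda x: x + 0.1 * rng.standard_normal(x.shape)
print(np.linalg.norm(delayed_sgd(noisy_grad, x0=np.ones(5))))
```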

    Bregman Proximal Method for Efficient Communications under Similarity

    We propose a novel distributed method for monotone variational inequalities and convex-concave saddle point problems arising in various machine learning applications such as game theory and adversarial training. By exploiting \textit{similarity}, our algorithm overcomes the communication bottleneck, which is a major issue in distributed optimization. The proposed algorithm enjoys the optimal communication complexity of $\delta/\epsilon$, where $\epsilon$ measures the non-optimality gap function and $\delta$ is a parameter of similarity. All the existing distributed algorithms achieving this bound essentially utilize the Euclidean setup. In contrast to them, our algorithm is built upon Bregman proximal maps and is compatible with an arbitrary Bregman divergence. Thanks to this, it has more flexibility to fit the problem geometry than algorithms with the Euclidean setup. Thereby the proposed method bridges the gap between the Euclidean and non-Euclidean settings. By using the restart technique, we extend our algorithm to variational inequalities with a $\mu$-strongly monotone operator, resulting in optimal communication complexity of $\delta/\mu$ (up to a logarithmic factor). Our theoretical results are confirmed by numerical experiments on a stochastic matrix game.
    Comment: 14 pages
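
    A minimal sketch of a Bregman proximal step, assuming the negative-entropy (KL) setup on the probability simplex and a small matrix game similar in spirit to the paper's experiments. The random payoff matrix, fixed step size, and plain mirror-descent-with-averaging loop are illustrative assumptions; this is not the authors' communication-efficient distributed algorithm.

```python
# Illustrative sketch (not the authors' distributed algorithm): a Bregman
# proximal (mirror) step with negative entropy on the simplex, used to solve
# a small matrix game min_x max_y x^T A y with iterate averaging.
import numpy as np

def bregman_prox_step(p, grad, gamma):
    """argmin_x <grad, x> + (1/gamma) * KL(x, p) over the simplex has the
    closed form p * exp(-gamma * grad), renormalized."""
    q = p * np.exp(-gamma * grad)
    return q / q.sum()

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))   # assumed random payoff matrix
x = np.full(5, 1 / 5)             # uniform starting strategy, min player
y = np.full(5, 1 / 5)             # uniform starting strategy, max player
x_avg = np.zeros(5)
y_avg = np.zeros(5)
gamma, iters = 0.1, 2000

for _ in range(iters):
    gx, gy = A @ y, A.T @ x                  # partial gradients of x^T A y
    x = bregman_prox_step(x, gx, gamma)      # descent step for the min player
    y = bregman_prox_step(y, -gy, gamma)     # ascent step for the max player
    x_avg += x
    y_avg += y

x_avg /= iters
y_avg /= iters
print("duality gap of averaged strategies:",
      (A.T @ x_avg).max() - (A @ y_avg).min())
```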