research

Byzantine Stochastic Gradient Descent

Abstract

This paper studies the problem of distributed stochastic optimization in an adversarial setting where, out of the mm machines which allegedly compute stochastic gradients every iteration, an α\alpha-fraction are Byzantine, and can behave arbitrarily and adversarially. Our main result is a variant of stochastic gradient descent (SGD) which finds ε\varepsilon-approximate minimizers of convex functions in T=O~(1ε2m+α2ε2)T = \tilde{O}\big( \frac{1}{\varepsilon^2 m} + \frac{\alpha^2}{\varepsilon^2} \big) iterations. In contrast, traditional mini-batch SGD needs T=O(1ε2m)T = O\big( \frac{1}{\varepsilon^2 m} \big) iterations, but cannot tolerate Byzantine failures. Further, we provide a lower bound showing that, up to logarithmic factors, our algorithm is information-theoretically optimal both in terms of sampling complexity and time complexity

    Similar works