This paper studies the problem of distributed stochastic optimization in an
adversarial setting where, out of the $m$ machines which allegedly compute
stochastic gradients in every iteration, an $\alpha$-fraction are Byzantine, and
can behave arbitrarily and adversarially. Our main result is a variant of
stochastic gradient descent (SGD) which finds $\varepsilon$-approximate
minimizers of convex functions in $T = \widetilde{O}\!\left(\frac{1}{\varepsilon^2 m} + \frac{\alpha^2}{\varepsilon^2}\right)$ iterations. In contrast, traditional
mini-batch SGD needs $T = O\!\left(\frac{1}{\varepsilon^2 m}\right)$ iterations,
but cannot tolerate Byzantine failures. Further, we provide a lower bound
showing that, up to logarithmic factors, our algorithm is
information-theoretically optimal both in terms of sampling complexity and time
complexity.
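To make the setting concrete: the difficulty is that plain averaging of the $m$ reported gradients is destroyed by even a single Byzantine machine, so a robust aggregation rule is needed. Below is a minimal Python sketch of one generic such rule, coordinate-wise median aggregation. This is an illustration of the problem setup only, not the paper's algorithm (which uses a more refined filtering scheme to achieve the stated bounds); the function and parameter names are hypothetical.

```python
import numpy as np

def robust_sgd_step(x, grads, lr):
    """One SGD step on iterate x using coordinate-wise median aggregation,
    a simple Byzantine-robust alternative to averaging (illustrative only,
    not the algorithm from the paper).

    grads: (m, d) array of gradients reported by m machines, an
    alpha-fraction of which may be arbitrary (Byzantine)."""
    agg = np.median(grads, axis=0)  # robust to a minority of outliers per coordinate
    return x - lr * agg

# Toy example: minimize f(x) = ||x||^2 / 2, whose true gradient is x,
# with 7 honest machines reporting noisy gradients and 2 Byzantine
# machines reporting arbitrary large values.
rng = np.random.default_rng(0)
x = np.ones(3)
for _ in range(200):
    honest = x + 0.1 * rng.standard_normal((7, 3))   # noisy true gradients
    byzantine = 100.0 * rng.standard_normal((2, 3))  # adversarial garbage
    grads = np.vstack([honest, byzantine])
    x = robust_sgd_step(x, grads, lr=0.1)
print(np.linalg.norm(x))  # ends close to the minimizer despite the attackers
```

With averaging instead of the median, the two Byzantine machines would dominate every update; the median ignores them because at most a minority of the reported values per coordinate are corrupted.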