Trading Communication for Computation in Byzantine-Resilient Gradient Coding
We consider gradient coding in the presence of an adversary, controlling
so-called malicious workers trying to corrupt the computations. Previous works
propose the use of MDS codes to treat the inputs of the malicious workers as
errors and correct them using the error-correction properties of the code. This
comes at the expense of increasing the replication, i.e., the number of workers
each partial gradient is computed by. In this work, we reduce replication by
proposing a method that detects the erroneous inputs from the malicious
workers, hence transforming them into erasures. For $s$ malicious workers, our
solution can reduce the replication to $s+1$ instead of $2s+1$ for each partial
gradient at the expense of only $s$ additional computations at the main node
and additional rounds of light communication between the main node and the
workers. We give fundamental limits of the general framework for fractional
repetition data allocation. Our scheme is optimal in terms of replication and
local computation but incurs a communication cost that is asymptotically, in
the size of the dataset, a multiplicative factor away from the derived bound.
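The replication trade-off in this abstract can be illustrated with a toy sketch. It assumes the standard coding-theory fact that outvoting s corrupted copies by majority needs 2s+1 copies, whereas s+1 copies are enough to notice tampering, because at least one copy is honest. The function names and the tuple-valued toy gradient are hypothetical; how the scheme identifies the culprits after a disagreement (the extra communication rounds and main-node computations) is not modeled here.

```python
from collections import Counter


def majority_decode(copies):
    """Error correction: return the value held by a strict majority of copies."""
    value, count = Counter(copies).most_common(1)[0]
    if 2 * count <= len(copies):
        raise ValueError("no strict majority among the copies")
    return value


def detect_disagreement(copies):
    """Error detection: True if the copies are not all identical."""
    return len(set(copies)) > 1


s = 2                      # assumed number of malicious workers
honest = (1.0, -0.5)       # toy partial gradient, stored as a hashable tuple
corrupt = (9.0, 9.0)       # value reported by malicious workers

# Correction needs 2s+1 = 5 copies so the s+1 honest ones win the vote.
corrected = majority_decode([honest] * (s + 1) + [corrupt] * s)

# Detection needs only s+1 = 3 copies: one honest copy exposes any tampering.
flagged = detect_disagreement([honest] + [corrupt] * s)
clean = detect_disagreement([honest] * (s + 1))
```

If all s+1 copies agree, the value must be correct, since at least one of them comes from an honest worker; only a disagreement triggers further work at the main node.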
The Hidden Vulnerability of Distributed Learning in Byzantium
While machine learning is going through an era of celebrated success,
concerns have been raised about the vulnerability of its backbone: stochastic
gradient descent (SGD). Recent approaches have been proposed to ensure the
robustness of distributed SGD against adversarial (Byzantine) workers sending
poisoned gradients during the training phase. Some of these approaches have
been proven Byzantine-resilient: they ensure the convergence of SGD despite the
presence of a minority of adversarial workers.
We show in this paper that convergence is not enough. In high dimension $d \gg 1$, an adversary can build on the loss function's non-convexity to make
SGD converge to ineffective models. More precisely, we bring to light that
existing Byzantine-resilient schemes leave a margin of poisoning of
$\Omega(f(d))$, where $f(d)$ increases at least like $\sqrt{d}$.
Based on this leeway, we build a simple attack, and experimentally show its
strong to utmost effectivity on CIFAR-10 and MNIST.
We introduce Bulyan, and prove it significantly reduces the attacker's leeway
to a narrow $O(1/\sqrt{d})$ bound. We empirically show that Bulyan
does not suffer the fragility of existing aggregation rules and, at a
reasonable cost in terms of required batch size, achieves convergence as if
only non-Byzantine gradients had been used to update the model.
Comment: Accepted to ICML 2018 as a long talk
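A minimal sketch of a Bulyan-style aggregation step may help make the abstract concrete. It follows the commonly described two-stage recipe: repeatedly apply a Krum-style selection to pick n - 2f gradients, then, per coordinate, average the values closest to the median. The parameter names (`f`, `theta`, `beta`), the pure-Python list representation, and the toy data are assumptions made for illustration and are not taken from the abstract.

```python
from statistics import median


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def krum_index(grads, f):
    """Index of the gradient whose nearest neighbours are closest overall."""
    k = max(len(grads) - f - 2, 0)   # number of neighbours scored
    best, best_score = 0, None
    for i, g in enumerate(grads):
        dists = sorted(sq_dist(g, h) for j, h in enumerate(grads) if j != i)
        score = sum(dists[:k])
        if best_score is None or score < best_score:
            best, best_score = i, score
    return best


def bulyan(grads, f):
    """Aggregate gradients while tolerating up to f Byzantine ones."""
    n = len(grads)
    if n < 4 * f + 3:
        raise ValueError("this sketch assumes n >= 4f + 3")
    theta = n - 2 * f        # gradients kept by the selection stage
    beta = theta - 2 * f     # values averaged per coordinate
    pool = list(grads)
    selected = [pool.pop(krum_index(pool, f)) for _ in range(theta)]
    aggregate = []
    for coord in zip(*selected):
        med = median(coord)
        closest = sorted(coord, key=lambda v: abs(v - med))[:beta]
        aggregate.append(sum(closest) / beta)
    return aggregate


honest = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2), (1.2, 1.0), (0.8, 0.8)]
byzantine = [(100.0, -100.0)]
robust = bulyan(honest + byzantine, f=1)
naive = [sum(c) / 7 for c in zip(*(honest + byzantine))]
```

In this toy run the plain average is pulled far from the honest gradients by the single outlier, while the robust aggregate stays within the range of the honest values.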
Making Byzantine Decentralized Learning Efficient
Decentralized-SGD (D-SGD) distributes heavy learning tasks across multiple
machines (a.k.a. nodes), effectively dividing the workload per node by
the size of the system. However, a handful of Byzantine (i.e.,
misbehaving) nodes can jeopardize the entire learning procedure. This
vulnerability is further amplified when the system is asynchronous.
Although approaches that confer Byzantine resilience to D-SGD have been
proposed, these significantly impact the efficiency of the process to the point
of even negating the benefit of decentralization. This naturally raises the
question: can decentralized learning simultaneously enjoy Byzantine
resilience and reduced workload per node?
We answer positively by proposing a new algorithm that ensures Byzantine
resilience without losing the computational efficiency of D-SGD. Essentially,
the algorithm weakens the impact of Byzantine nodes by reducing the variance
in local updates using Polyak's momentum. Then, by establishing
coordination between nodes via signed echo broadcast and a
nearest-neighbor averaging scheme, we effectively tolerate Byzantine nodes
whilst distributing the overhead amongst the non-Byzantine nodes. To
demonstrate the correctness of our algorithm, we introduce and analyze a novel
Lyapunov function that accounts for the non-Markovian model drift
arising from the use of momentum. We also demonstrate the efficiency of
the algorithm through experiments on several image classification tasks.
Comment: 63 pages, 5 figures
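The two ingredients named in this abstract, momentum on local updates and nearest-neighbor averaging, can be sketched as follows. This is an illustration under assumptions, not the paper's algorithm: the momentum form (an exponential moving average with coefficient `beta`), the rule of averaging the n - f received vectors closest to the node's own, and all function names are hypothetical, and the signed echo broadcast layer is not modeled.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def momentum_update(momentum, grad, beta=0.9):
    """Exponential moving average of gradients (one common momentum form)."""
    return [beta * m + (1 - beta) * g for m, g in zip(momentum, grad)]


def nearest_neighbor_average(own, received, f):
    """Average the len(received) - f vectors closest to this node's own vector.

    `received` is assumed to include the node's own vector.
    """
    keep = sorted(received, key=lambda v: sq_dist(own, v))[: len(received) - f]
    return [sum(v[i] for v in keep) / len(keep) for i in range(len(own))]


# Momentum smooths a noisy local gradient before it is shared.
m = momentum_update([0.0, 0.0], [1.0, 2.0], beta=0.9)

# One Byzantine vector among four is discarded as the farthest neighbor.
own = [1.0, 1.0]
received = [own, [1.2, 0.8], [0.8, 1.2], [50.0, 50.0]]
averaged = nearest_neighbor_average(own, received, f=1)
```

Lower-variance local vectors cluster more tightly, which makes a far-away Byzantine vector easier to exclude by distance; that is the intuition this sketch tries to convey.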
On the sample complexity of adversarial multi-source PAC learning
We study the problem of learning from multiple untrusted data sources, a scenario of increasing practical relevance given the recent emergence of crowdsourcing and collaborative learning paradigms. Specifically, we analyze the situation in which a learning system obtains datasets from multiple sources, some of which might be biased or even adversarially perturbed. It is
known that in the single-source case, an adversary with the power to corrupt a fixed fraction of the training data can prevent PAC-learnability, that is, even in the limit of infinitely much training data, no learning system can approach the optimal test error. In this work we show that, surprisingly, the same is not true in the multi-source setting, where the adversary can arbitrarily
corrupt a fixed fraction of the data sources. Our main results are a generalization bound that provides finite-sample guarantees for this learning setting, as well as corresponding lower bounds. Besides establishing PAC-learnability, our results also show that in a cooperative learning setting, sharing data with other parties has provable benefits, even if some
participants are malicious.
SPIRT: A Fault-Tolerant and Reliable Peer-to-Peer Serverless ML Training Architecture
The advent of serverless computing has ushered in notable advancements in
distributed machine learning, particularly within parameter server-based
architectures. Yet, the integration of serverless features within peer-to-peer
(P2P) distributed networks remains largely uncharted. In this paper, we
introduce SPIRT, a fault-tolerant, reliable, and secure serverless P2P ML
training architecture designed to bridge this gap.
Capitalizing on the robustness and reliability inherent to P2P
systems, SPIRT employs RedisAI for in-database operations, leading to an 82%
reduction in the time required for model updates and gradient averaging across
a variety of models and batch sizes. This architecture showcases resilience
against peer failures and adeptly manages the integration of new peers, thereby
highlighting its fault-tolerant characteristics and scalability. Furthermore,
SPIRT ensures secure communication between peers, enhancing the reliability of
distributed machine learning tasks. Even in the face of Byzantine attacks, the
system's robust aggregation algorithms maintain high levels of accuracy. These
findings illuminate the promising potential of serverless architectures in P2P
distributed machine learning, offering a significant stride towards the
development of more efficient, scalable, and resilient applications.