Federated Variance-Reduced Stochastic Gradient Descent with Robustness to Byzantine Attacks
This paper deals with distributed finite-sum optimization for learning over
networks in the presence of malicious Byzantine attacks. To cope with such
attacks, most resilient approaches so far combine stochastic gradient descent
(SGD) with different robust aggregation rules. However, the sizeable
SGD-induced stochastic gradient noise makes it challenging to distinguish
malicious messages sent by the Byzantine attackers from noisy stochastic
gradients sent by the 'honest' workers. This motivates us to reduce the
variance of stochastic gradients as a means of robustifying SGD in the presence
of Byzantine attacks. To this end, the present work puts forth a Byzantine
attack resilient distributed (Byrd-) SAGA approach for learning tasks involving
finite-sum optimization over networks. Rather than the mean employed by
distributed SAGA, the novel Byrd-SAGA relies on the geometric median to
aggregate the corrected stochastic gradients sent by the workers. When less
than half of the workers are Byzantine attackers, the robustness of geometric
median to outliers enables Byrd-SAGA to attain provably linear convergence to a
neighborhood of the optimal solution, with the asymptotic learning error
determined by the number of Byzantine workers. Numerical tests corroborate the
robustness to various Byzantine attacks, as well as the merits of Byrd-SAGA
over Byzantine-attack-resilient distributed SGD.
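The geometric-median aggregation at the heart of Byrd-SAGA can be sketched with the standard Weiszfeld iteration. This is an illustrative implementation, not the authors' code; function and variable names are our own:

```python
import numpy as np

def geometric_median(points, iters=100, tol=1e-8):
    """Weiszfeld iteration for the geometric median of the rows of `points`.

    Unlike the mean, the geometric median is robust to a minority of
    arbitrarily corrupted (Byzantine) rows.
    """
    z = points.mean(axis=0)                   # initialize at the mean
    for _ in range(iters):
        d = np.linalg.norm(points - z, axis=1)
        d = np.maximum(d, 1e-12)              # guard against division by zero
        w = 1.0 / d
        z_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(z_new - z) < tol:
            return z_new
        z = z_new
    return z
```

With honest gradients clustered together and one attacker sending a huge vector, the mean is dragged far away while the geometric median stays near the honest cluster, which is the intuition behind the aggregation rule described above.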
Prox-DBRO-VR: A Unified Analysis on Decentralized Byzantine-Resilient Composite Stochastic Optimization with Variance Reduction and Non-Asymptotic Convergence Rates
Decentralized Byzantine-resilient stochastic gradient algorithms efficiently
solve large-scale optimization problems under adverse conditions such as
malfunctioning agents, software bugs, and cyber attacks. This paper targets a
class of generic composite optimization problems over multi-agent
cyber-physical systems (CPSs) in the presence of an unknown number of
Byzantine agents. Based on the proximal mapping method, two variance-reduced
(VR) techniques, and a norm-penalized approximation strategy, we propose a
decentralized Byzantine-resilient and proximal-gradient algorithmic framework,
dubbed Prox-DBRO-VR, which achieves an optimization and control goal using only
local computations and communications. To asymptotically reduce the variance
of the noisy stochastic gradient evaluations, we incorporate two
localized variance-reduction techniques (SAGA and LSVRG) into Prox-DBRO-VR to
design Prox-DBRO-SAGA and Prox-DBRO-LSVRG. By analyzing the contraction
relationships among the gradient-learning error, the robust consensus
condition, and the optimality gap, we show that both Prox-DBRO-SAGA
and Prox-DBRO-LSVRG, with a well-designed constant (resp., decaying) step-size,
converge linearly (resp., sub-linearly) inside an error ball around the optimal
solution to the optimization problem under standard assumptions. The trade-offs
between the convergence accuracy and the number of Byzantine agents in both
linear and sub-linear cases are characterized. Simulations demonstrate the
effectiveness and practicality of the proposed algorithms on a sparse
machine-learning problem over multi-agent CPSs under various Byzantine
attacks.
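Two of the building blocks named above, a SAGA-style variance-reduced gradient estimator and a proximal mapping, can be sketched for a single agent as follows. This is an illustrative single-node sketch with an l1 proximal step matching the sparse-learning setting; the helper names are ours, and the full decentralized, Byzantine-resilient algorithm is considerably more involved:

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def saga_prox_step(w, grads_table, grad_fn, i, step, lam):
    """One SAGA-corrected proximal-gradient step on component i.

    grads_table[j] holds the most recent gradient evaluated for component j;
    the SAGA estimator corrects the fresh gradient with the stale table entry
    and the table average, shrinking the stochastic variance over time.
    """
    g_new = grad_fn(w, i)
    v = g_new - grads_table[i] + grads_table.mean(axis=0)
    grads_table[i] = g_new                    # refresh the table entry
    return soft_threshold(w - step * v, step * lam)
```

Iterating this step on a simple separable least-squares problem with an l1 penalty drives the iterate to the soft-thresholded solution, illustrating how the variance-reduced gradient and the proximal mapping interact.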
Genuinely Distributed Byzantine Machine Learning
Machine Learning (ML) solutions are nowadays distributed, according to the
so-called server/worker architecture. One server holds the model parameters
while several workers train the model. Such an architecture is clearly prone
to various types of component failures, all of which fall within the spectrum
of Byzantine behavior. Several approaches have been proposed
recently to tolerate Byzantine workers. Yet all require trusting a central
parameter server. We initiate in this paper the study of the "general"
Byzantine-resilient distributed machine learning problem where no individual
component is trusted.
We show that this problem can be solved in an asynchronous system, despite
the presence of Byzantine parameter servers and
Byzantine workers (which is optimal). We present a new algorithm, ByzSGD, which
solves the general Byzantine-resilient distributed machine learning problem by
relying on three major schemes. The first, Scatter/Gather, is a communication
scheme whose goal is to bound the maximum drift among models on correct
servers. The second, Distributed Median Contraction (DMC), leverages the
geometric properties of the median in high dimensional spaces to bring
parameters within the correct servers back close to each other, ensuring
learning convergence. The third, Minimum-Diameter Averaging (MDA), is a
statistically-robust gradient aggregation rule whose goal is to tolerate
Byzantine workers. MDA requires a looser bound on the variance of
non-Byzantine gradient estimates than existing alternatives (e.g., Krum).
Interestingly, ByzSGD ensures Byzantine resilience without adding communication
rounds (on a normal path), compared to vanilla non-Byzantine alternatives.
ByzSGD requires, however, a larger number of messages which, we show, can be
reduced if we assume synchrony.
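Minimum-Diameter Averaging as described above can be sketched by brute force: average the n - f gradients whose pairwise diameter is smallest. This is illustrative only (subset enumeration is exponential in n) and the names are ours:

```python
import itertools
import numpy as np

def mda(grads, f):
    """Minimum-Diameter Averaging over n gradients with at most f Byzantine.

    Picks the subset of size n - f with the smallest pairwise diameter
    and returns its average; a tight cluster of honest gradients wins.
    """
    n = len(grads)
    best_subset, best_diam = None, np.inf
    for subset in itertools.combinations(range(n), n - f):
        diam = max(np.linalg.norm(grads[a] - grads[b])
                   for a, b in itertools.combinations(subset, 2))
        if diam < best_diam:
            best_diam, best_subset = diam, subset
    return np.mean([grads[i] for i in best_subset], axis=0)
```

A Byzantine gradient far from the honest cluster inflates the diameter of any subset containing it, so the selected subset excludes it and the average stays close to the honest mean.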
Finite-Time Resilient Formation Control with Bounded Inputs
In this paper we consider the problem of a multi-agent system achieving a
formation in the presence of misbehaving or adversarial agents. We introduce a
novel continuous time resilient controller to guarantee that normally behaving
agents can converge to a formation with respect to a set of leaders. The
controller employs a norm-based filtering mechanism, and unlike most prior
algorithms, also incorporates input bounds. In addition, the controller is
shown to guarantee convergence in finite time. A sufficient condition for the
controller to guarantee convergence is shown to be a graph-theoretic
structure, which we denote a Resilient Directed Acyclic Graph (RDAG). Further,
we employ our filtering mechanism on a discrete time system which is shown to
have exponential convergence. Our results are demonstrated through simulations
- …
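The two ingredients named above, a norm-based filter and bounded inputs, can be illustrated with a simple hypothetical sketch. The function name and the specific filter-then-saturate structure are our assumptions for illustration; the paper's actual continuous-time finite-time controller differs:

```python
import numpy as np

def resilient_input(x_i, neighbor_states, F, u_max):
    """Hypothetical norm-based filter with input saturation.

    Discards the F relative-state terms with largest norm (the ones an
    adversary could inflate), sums the rest, and saturates the result so
    the control input stays bounded.
    """
    rel = [x_j - x_i for x_j in neighbor_states]
    rel.sort(key=np.linalg.norm)            # ascending by norm
    kept = rel[:len(rel) - F]               # drop the F largest terms
    u = np.sum(kept, axis=0)
    norm = np.linalg.norm(u)
    if norm > u_max:                        # enforce the input bound
        u = u_max * u / norm
    return u
```

An adversarial neighbor reporting a huge relative state is filtered out before it can dominate the input, and the saturation step keeps the commanded input within actuator limits regardless of what survives the filter.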