Fault-Tolerant Federated Reinforcement Learning with Theoretical Guarantee
The growing literature on Federated Learning (FL) has recently inspired
Federated Reinforcement Learning (FRL), which encourages multiple agents to
jointly build a better decision-making policy without sharing raw
trajectories. Despite its promising applications, existing works on FRL fail to
I) provide theoretical analysis of its convergence, and II) account for random
system failures and adversarial attacks. Towards this end, we propose the first
FRL framework whose convergence is guaranteed and which tolerates fewer than
half of the participating agents suffering random system failures or acting as
adversarial attackers. We prove that the sample efficiency of the proposed
framework is guaranteed to improve with the number of agents and is able to
account for such potential failures or attacks. All theoretical results are
empirically verified on various RL benchmark tasks. Comment: Published at NeurIPS 2021. Extended version with proofs and
additional experimental details and results. New version changes: reduced
file size of figures; added a diagram illustrating the problem setting; added
link to code on GitHub; modified proof for Theorem 6 (highlighted in red
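The abstract does not specify the aggregation rule. To illustrate why median-type aggregation can tolerate fewer than half of the agents being faulty, the sketch below aggregates per-agent gradient vectors with a coordinate-wise median. The function name `coordinate_median` and the synthetic data are hypothetical, and this is a generic robust aggregator, not the paper's fault-tolerant procedure.

```python
import numpy as np

def coordinate_median(grads: np.ndarray) -> np.ndarray:
    """Coordinate-wise median of K agent gradients (rows of a K x d array).

    Hypothetical illustration: if fewer than half of the rows are arbitrary,
    each output coordinate lies within the range of the honest agents' values.
    """
    return np.median(grads, axis=0)


rng = np.random.default_rng(0)
true_grad = np.array([1.0, -2.0, 0.5])
# 7 honest agents report noisy estimates of the same policy gradient.
honest = true_grad + 0.1 * rng.standard_normal((7, 3))
# 3 faulty agents (fewer than half of 10) report arbitrary huge vectors.
faulty = np.full((3, 3), 1e6)
grads = np.vstack([honest, faulty])

mean_agg = grads.mean(axis=0)           # dragged far away by the faulty agents
median_agg = coordinate_median(grads)   # stays among the honest values
print(np.linalg.norm(mean_agg - true_grad), np.linalg.norm(median_agg - true_grad))
```

With 3 of 10 agents faulty, each coordinate's median is the average of the 5th and 6th smallest values, both of which come from honest agents, so the aggregate cannot leave the honest range.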
On Robust Machine Learning in the Presence of Adversaries
In today's highly connected world, the number of smart devices has increased exponentially. These devices generate huge amounts of real-time data, perform complicated computational tasks, and provide actionable information. Over the past decade, numerous machine learning approaches have been widely adopted to infer hidden information from this massive and complex data. For some crucial application domains, accuracy alone is not enough when developing machine learning systems: safety and reliability guarantees on the underlying learning models are critical requirements as well. This in turn requires that the learned models be robust to corrupted data. Data can be corrupted by adversarial attacks in which some samples take arbitrary values that adversely affect the efficiency of the algorithm. For example, an adversary can replace samples with erroneous or malicious ones, such as false labels or arbitrary inputs. In this dissertation, we refer to this type of attack as an attack on data. Moreover, with the rapid increase in the volume of data, storing and processing all of it at a central location becomes computationally expensive. It is therefore natural to distribute tasks across multiple machines (known as distributed learning). In distributed learning, it is critical to improve the computational and communication efficiency of the optimization algorithms while maintaining a high level of accuracy. However, an attacker can replace the data transmitted by the machines in the system with arbitrary values that may degrade the performance of the learning task. We refer to this attack as an attack on devices. Both attack scenarios can significantly reduce the accuracy of the results and thereby degrade the expected model outcome.
Hence, a new generation of systems that are robust to such adversarial attacks and come with provable performance guarantees is needed. The goal of this dissertation is to develop learning algorithms that are robust to such adversarial attacks under two frameworks: 1) supervised learning, where the true labels of the samples are known; and 2) unsupervised learning, where the labels are not known. Although neural networks have achieved widespread success, theoretical understanding of their performance is lacking. Therefore, in the first part of the dissertation (Chapter 2), we try to understand the inner workings of a neural network by learning its parameters. We further generalize the estimation procedure by considering robustness alongside parameter estimation in the presence of adversarial attacks (attack on data). We devise a learning algorithm to estimate the parameters (weight matrix and bias vector) of a single-layer neural network with rectified linear unit (ReLU) activation in the unsupervised learning framework, where each output sample can be an arbitrary outlier with a fixed probability. Our estimation algorithm uses gradient descent together with a median-based filter to mitigate the effect of the outliers. We further determine the number of samples required to estimate the parameters of the network in the presence of outliers. Combining distributed systems for solving large-scale problems with the recent success of deep learning has led to a surge of development in distributed learning, further catalyzed by the development of federated learning.
Despite extensive research in this area, distributed learning still faces the challenge of training a high-dimensional model in a distributed manner while remaining robust to adversarial attacks. Hence, in the second part of the dissertation (Chapters 3 and 4), we study distributed learning in the presence of adversarial nodes (attack on devices). Specifically, we consider the worker-server architecture for minimizing a global loss function under both learning frameworks in the presence of adversarial nodes (Byzantines). Each honest node performs some computation based only on its own local data and then communicates with the central server, which performs aggregation. However, an adversarial node may send arbitrary information to the central server. In Chapter 3, we consider robust distributed learning under the supervised learning framework. We propose a novel algorithm that combines variance reduction with a filtering technique based on the vector median to mitigate the effect of the Byzantines, and we prove that it converges to a first-order stationary point. In Chapter 4, we consider robust distributed learning under the unsupervised learning framework (robust clustering). We propose a novel algorithm that combines redundant data assignment with the paradigm of distributed clustering. We show that our proposed approaches obtain constant-factor approximate solutions in the presence of adversarial nodes.
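The vector-median filter of Chapter 3 is only named, not defined, in this abstract. A common definition, assumed here, selects the received vector whose summed Euclidean distance to all received vectors is smallest. The hypothetical `vector_median` below sketches that filter on its own, without the variance-reduction component.

```python
import numpy as np

def vector_median(vectors: np.ndarray) -> np.ndarray:
    """Return the row of `vectors` (n x d) whose summed Euclidean distance to
    all rows is smallest; the result is always one of the received vectors."""
    diffs = vectors[:, None, :] - vectors[None, :, :]   # n x n x d differences
    dists = np.linalg.norm(diffs, axis=2)                # n x n distance matrix
    return vectors[np.argmin(dists.sum(axis=1))]


# Four honest workers send similar gradients; two Byzantine workers
# (fewer than half) send arbitrary vectors.
honest = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.05]])
byzantine = np.array([[50.0, -50.0], [-40.0, 40.0]])
received = np.vstack([honest, byzantine])
agg = vector_median(received)
print(agg)                       # one of the honest rows
print(received.mean(axis=0))     # the plain mean is pulled away by the attackers
```

Because the output is always one of the received messages, the server never forwards a synthetic vector; with the honest messages tightly clustered, an honest message wins.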
Prox-DBRO-VR: A Unified Analysis on Decentralized Byzantine-Resilient Composite Stochastic Optimization with Variance Reduction and Non-Asymptotic Convergence Rates
Decentralized Byzantine-resilient stochastic gradient algorithms efficiently
solve large-scale optimization problems under adverse conditions, such as
malfunctioning agents, software bugs, and cyber attacks. This paper targets
a class of generic composite optimization problems over multi-agent
cyberphysical systems (CPSs) in the presence of an unknown number of
Byzantine agents. Based on the proximal mapping method, two variance-reduced
(VR) techniques, and a norm-penalized approximation strategy, we propose a
decentralized Byzantine-resilient and proximal-gradient algorithmic framework,
dubbed Prox-DBRO-VR, which achieves an optimization and control goal using only
local computations and communications. To asymptotically reduce the variance
generated by evaluating noisy stochastic gradients, we incorporate two
localized variance-reduced techniques (SAGA and LSVRG) into Prox-DBRO-VR, to
design Prox-DBRO-SAGA and Prox-DBRO-LSVRG. By analyzing the contraction
relationships among the gradient-learning error, the robust consensus condition,
and the optimality gap, the theoretical results show that both Prox-DBRO-SAGA
and Prox-DBRO-LSVRG, with a well-designed constant (resp., decaying) step-size,
converge linearly (resp., sub-linearly) inside an error ball around the optimal
solution to the optimization problem under standard assumptions. The trade-offs
between the convergence accuracy and the number of Byzantine agents in both
linear and sub-linear cases are characterized. In simulations, the effectiveness
and practicality of the proposed algorithms are demonstrated by solving a
sparse machine-learning problem over multi-agent CPSs under various Byzantine
attacks. Comment: 14 pages, 0 figures
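To make two of the ingredients concrete, the sketch below combines a loopless SVRG (LSVRG)-style gradient estimator with a proximal (soft-thresholding) step on a single agent for an l1-regularized least-squares problem, a standard sparse learning task. It deliberately omits the decentralized and Byzantine-resilient parts of Prox-DBRO-LSVRG (the norm-penalized consensus term and the communication graph). The helper names, problem data, step size, and refresh probability `p` are assumptions for illustration.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def prox_lsvrg(A, b, lam, step, p, iters, seed=0):
    """Approximately minimize (1/2n)||Ax - b||^2 + lam * ||x||_1 on one agent
    using a loopless SVRG-style estimator followed by a proximal step."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    w = x.copy()                                   # reference point
    mu = A.T @ (A @ w - b) / n                     # full smooth gradient at w
    for _ in range(iters):
        i = rng.integers(n)
        # Per-sample gradients of (1/2)(a_i^T x - b_i)^2 at x and at w.
        g_x = A[i] * (A[i] @ x - b[i])
        g_w = A[i] * (A[i] @ w - b[i])
        v = g_x - g_w + mu                         # variance-reduced estimate
        x_next = soft_threshold(x - step * v, step * lam)
        if rng.random() < p:                       # refresh reference w.p. p
            w = x.copy()
            mu = A.T @ (A @ w - b) / n
        x = x_next
    return x


rng = np.random.default_rng(1)
n, d = 200, 20
A = rng.standard_normal((n, d))
x_true = np.zeros(d)
x_true[:3] = [2.0, -1.5, 1.0]                      # sparse ground truth
b = A @ x_true + 0.01 * rng.standard_normal(n)
x_hat = prox_lsvrg(A, b, lam=0.05, step=0.005, p=1.0 / n, iters=20000)
print(np.round(x_hat, 3))
```

The "loopless" design replaces SVRG's fixed inner loop with a coin flip, so the full gradient is recomputed about once every `1/p` steps on average.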
Federated Variance-Reduced Stochastic Gradient Descent with Robustness to Byzantine Attacks
This paper deals with distributed finite-sum optimization for learning over
networks in the presence of malicious Byzantine attacks. To cope with such
attacks, most resilient approaches so far combine stochastic gradient descent
(SGD) with different robust aggregation rules. However, the sizeable
SGD-induced stochastic gradient noise makes it challenging to distinguish
malicious messages sent by the Byzantine attackers from noisy stochastic
gradients sent by the 'honest' workers. This motivates us to reduce the
variance of stochastic gradients as a means of robustifying SGD in the presence
of Byzantine attacks. To this end, the present work puts forth a Byzantine
attack resilient distributed (Byrd-) SAGA approach for learning tasks involving
finite-sum optimization over networks. Rather than the mean employed by
distributed SAGA, the novel Byrd-SAGA relies on the geometric median to
aggregate the corrected stochastic gradients sent by the workers. When fewer
than half of the workers are Byzantine attackers, the robustness of the geometric
median to outliers enables Byrd-SAGA to attain provably linear convergence to a
neighborhood of the optimal solution, with the asymptotic learning error
determined by the number of Byzantine workers. Numerical tests corroborate the
robustness to various Byzantine attacks, as well as the merits of Byrd-SAGA
over Byzantine attack resilient distributed SGD.
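A minimal sketch of the two ingredients described above, under stated assumptions: each honest worker keeps a SAGA table of per-sample gradients for a local least-squares loss and sends a SAGA-corrected gradient, while the server aggregates all messages with a geometric median computed by Weiszfeld iterations. The class and function names, the attack (large random vectors), the data, and the step size are hypothetical; this is an illustration, not the paper's implementation.

```python
import numpy as np

def geometric_median(points, iters=100, tol=1e-9):
    """Approximate argmin_z sum_i ||z - points[i]|| with Weiszfeld iterations."""
    z = points.mean(axis=0)
    for _ in range(iters):
        dist = np.maximum(np.linalg.norm(points - z, axis=1), 1e-12)  # avoid /0
        w = 1.0 / dist
        z_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(z_new - z) < tol:
            return z_new
        z = z_new
    return z


class SagaWorker:
    """Honest worker with local least-squares data; keeps a SAGA table of the
    latest per-sample gradients and sends SAGA-corrected gradients."""

    def __init__(self, A, b, x0, rng):
        self.A, self.b, self.rng = A, b, rng
        self.table = np.array([self._grad(j, x0) for j in range(len(b))])
        self.avg = self.table.mean(axis=0)

    def _grad(self, j, x):
        return self.A[j] * (self.A[j] @ x - self.b[j])

    def corrected_gradient(self, x):
        j = self.rng.integers(len(self.b))
        g_new = self._grad(j, x)
        msg = g_new - self.table[j] + self.avg              # SAGA correction
        self.avg += (g_new - self.table[j]) / len(self.b)   # keep table mean current
        self.table[j] = g_new
        return msg


rng = np.random.default_rng(0)
d, m, n_honest, n_byz = 5, 50, 8, 3        # 3 of 11 workers are Byzantine
x_true = rng.standard_normal(d)
x = np.zeros(d)
workers = []
for _ in range(n_honest):
    A = rng.standard_normal((m, d))
    b = A @ x_true + 0.01 * rng.standard_normal(m)
    workers.append(SagaWorker(A, b, x, rng))

step = 0.02
for _ in range(1000):
    msgs = [w.corrected_gradient(x) for w in workers]
    msgs += [100.0 * rng.standard_normal(d) for _ in range(n_byz)]  # arbitrary attack
    x = x - step * geometric_median(np.array(msgs))

print(np.linalg.norm(x - x_true))
```

Replacing `geometric_median` with a plain mean in this sketch would let the three attackers dominate every update; the variance reduction matters because it shrinks the spread of honest messages, making the outliers easier for the median to reject.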
Variance Reduction is an Antidote to Byzantines: Better Rates, Weaker Assumptions and Communication Compression as a Cherry on the Top
Byzantine-robustness has been gaining a lot of attention due to the growing
interest in collaborative and federated learning. However, many fruitful
directions, such as the use of variance reduction for achieving robustness
and communication compression for reducing communication costs, remain
underexplored in the field. This work addresses this gap and proposes
Byz-VR-MARINA, a new Byzantine-tolerant method with variance reduction and
compression. The key message of our paper is that variance reduction is crucial
for fighting Byzantine workers more effectively. At the same time, communication
compression is a bonus that makes the process more communication efficient. We
derive theoretical convergence guarantees for Byz-VR-MARINA that outperform the
previous state of the art for general non-convex and Polyak-Lojasiewicz loss functions.
Unlike the concurrent Byzantine-robust methods with variance reduction and/or
compression, our complexity results are tight and do not rely on restrictive
assumptions such as boundedness of the gradients or limited compression.
Moreover, we provide the first analysis of a Byzantine-tolerant method
supporting non-uniform sampling of stochastic gradients. Numerical experiments
corroborate our theoretical findings. Comment: 41 pages, 6 figures. Changes in v2: a few typos and inaccuracies were
fixed, and more clarifications were added. Code:
https://github.com/SamuelHorvath/VR_Byzantin
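Byz-VR-MARINA builds on MARINA, in which workers usually send compressed gradient differences and only occasionally an uncompressed gradient. The sketch below shows an unbiased rand-k sparsifier and a MARINA-style estimate for one worker, under simplifying assumptions: full local gradients instead of minibatch estimates, no robust aggregation, and a shared coin `send_full` assumed to be drawn by the server. The helper names are hypothetical and this is not the code from the linked repository.

```python
import numpy as np

def rand_k(v, k, rng):
    """Unbiased rand-k sparsifier: keep k uniformly chosen coordinates and
    scale them by d/k, so that E[rand_k(v)] = v."""
    d = v.size
    out = np.zeros_like(v)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = v[idx] * (d / k)
    return out


def marina_estimate(g_prev, grad_new, grad_old, send_full, k, rng):
    """MARINA-style gradient estimate for one worker (hypothetical helper).

    send_full: shared Bernoulli(p) coin, assumed drawn by the server.
    If it is set, the worker uses its uncompressed gradient; otherwise it
    updates the previous estimate with a compressed gradient difference."""
    if send_full:
        return grad_new
    return g_prev + rand_k(grad_new - grad_old, k, rng)


rng = np.random.default_rng(0)
v = np.linspace(-1.0, 1.0, 10)
sparse = rand_k(v, 2, rng)                 # at most 2 nonzeros, each 5x the input
avg = np.mean([rand_k(v, 2, rng) for _ in range(20000)], axis=0)
print(np.abs(avg - v).max())               # small: the compressor is unbiased
```

Only `k` of `d` coordinates are transmitted per compressed round; the `d/k` scaling keeps the estimate unbiased at the cost of extra variance, which the variance-reduced difference term is meant to keep under control.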