
    Fault-Tolerant Federated Reinforcement Learning with Theoretical Guarantee

    The growing literature on Federated Learning (FL) has recently inspired Federated Reinforcement Learning (FRL), which encourages multiple agents to federatively build a better decision-making policy without sharing raw trajectories. Despite its promising applications, existing work on FRL fails to i) provide a theoretical analysis of its convergence, and ii) account for random system failures and adversarial attacks. To this end, we propose the first FRL framework whose convergence is guaranteed and which tolerates less than half of the participating agents suffering random system failures or acting as adversarial attackers. We prove that the sample efficiency of the proposed framework is guaranteed to improve with the number of agents and is able to account for such potential failures or attacks. All theoretical results are empirically verified on various RL benchmark tasks.
    Comment: Published at NeurIPS 2021. Extended version with proofs and additional experimental details and results. New version changes: reduced file size of figures; added a diagram illustrating the problem setting; added a link to the code on GitHub; modified proof for Theorem 6 (highlighted in red).
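As a rough illustration of how an aggregation rule can tolerate fewer than half of the agents being faulty, the sketch below applies a coordinate-wise median to the agents' gradient estimates. This is a generic stand-in under my own assumptions, not the paper's actual federation algorithm, and all names in it are hypothetical.

```python
import numpy as np

def coordinate_median_aggregate(gradients):
    """Aggregate per-agent gradient estimates with a coordinate-wise median.

    Each coordinate of the median is unaffected as long as strictly fewer
    than half of the agents report arbitrary values, which mirrors the
    "< 1/2 faulty or adversarial agents" tolerance discussed above.
    """
    stacked = np.stack(gradients, axis=0)   # shape: (num_agents, dim)
    return np.median(stacked, axis=0)       # robust to < n/2 outliers per coordinate

# Toy usage: 4 honest agents, 3 agents sending arbitrary (adversarial) vectors.
rng = np.random.default_rng(0)
honest = [rng.normal(loc=1.0, scale=0.1, size=4) for _ in range(4)]
byzantine = [rng.normal(loc=-50.0, scale=10.0, size=4) for _ in range(3)]
print(coordinate_median_aggregate(honest + byzantine))  # stays close to the honest mean
```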

    ON ROBUST MACHINE LEARNING IN THE PRESENCE OF ADVERSARIES

    In today's highly connected world, the number of smart devices worldwide has increased exponentially. These devices generate huge amounts of real-time data, perform complicated computational tasks, and provide actionable information. Over the past decade, numerous machine learning approaches have been widely adopted to infer hidden information from this massive and complex data. Accuracy alone is not enough when developing machine learning systems for some crucial application domains; safety and reliability guarantees on the underlying learning models are critical requirements as well. This in turn necessitates that the learned models be robust to corrupted data. Data can be corrupted by adversarial attacks in which samples take arbitrary values that adversely affect the performance of the algorithm. An adversary can replace samples with erroneous or malicious ones, such as false labels or arbitrary inputs. In this dissertation, we refer to this type of attack as an attack on data.
    Moreover, with the rapid increase in the volume of data, storing and processing all of it at a central location becomes computationally expensive. Therefore, a distributed system is warranted to spread tasks across multiple machines (known as distributed learning). Improving the efficiency of the optimization algorithms with respect to computational and communication costs, while maintaining a high level of accuracy, is critical in distributed learning. However, an attack can occur by replacing the data transmitted by the machines in the system with arbitrary values that may negatively impact the performance of the learning task. We refer to this as an attack on devices. Both attack scenarios can significantly degrade the accuracy of the results and thereby the expected model outcome. Hence, a new generation of systems that are robust to such adversarial attacks and provide provable performance guarantees is warranted.
    The goal of this dissertation is to develop learning algorithms that are robust to such adversarial attacks. We propose learning algorithms that are robust to adversarial attacks under two frameworks: 1) supervised learning, where the true labels of the samples are known; and 2) unsupervised learning, where the labels are not known.
    Although neural networks have gained widespread success, theoretical understanding of their performance is lacking. Therefore, in the first part of the dissertation (Chapter 2), we seek to understand the inner workings of a neural network by learning its parameters. We generalize the estimation procedure by considering robustness alongside parameter estimation in the presence of adversarial attacks (attack on data). We devise a learning algorithm to estimate the parameters (weight matrix and bias vector) of a single-layer neural network with rectified linear unit activation in the unsupervised learning framework, where each output sample can potentially be an arbitrary outlier with a fixed probability. Our estimation algorithm uses gradient descent along with a median-based filter to mitigate the effect of the outliers. We further determine the number of samples required to estimate the parameters of the network in the presence of the outliers.
The combination of distributed systems for solving large-scale problems with the recent success of deep learning has led to a surge of development in the field of distributed learning. Research in this direction has been further catalyzed by the development of federated learning. Despite extensive research in this area, distributed learning still faces the challenge of training a high-dimensional model in a distributed manner while maintaining robustness against adversarial attacks. Hence, in the second part of the dissertation (Chapters 3 and 4), we study the problem of distributed learning in the presence of adversarial nodes (attack on nodes). Specifically, we consider the worker-server architecture to minimize a global loss function under both learning frameworks in the presence of adversarial nodes (Byzantines). Each honest node performs some computation based only on its own local data and then communicates with the central server, which performs aggregation; an adversarial node, however, may send arbitrary information to the central server. In Chapter 3, we consider robust distributed learning under the supervised learning framework. We propose a novel algorithm that combines the idea of variance reduction with a filtering technique based on the vector median to mitigate the effect of the Byzantines, and we prove convergence of the approach to a first-order stationary point. In Chapter 4, we consider robust distributed learning under the unsupervised learning framework (robust clustering). We propose a novel algorithm that combines the idea of redundant data assignment with the paradigm of distributed clustering, and we show that the proposed approaches obtain constant-factor approximate solutions in the presence of adversarial nodes.
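A minimal sketch of the kind of vector-median-based filtering described for Chapter 3, under my own simplifying assumptions: the server discards worker messages that lie far from the coordinate-wise median and averages the rest. The keep_frac knob and the distance rule are illustrative choices, not the dissertation's exact procedure.

```python
import numpy as np

def median_filtered_mean(messages, keep_frac=0.5):
    """Filter worker messages by distance to the coordinate-wise median,
    then average the survivors.

    Messages far from the median (likely Byzantine) are discarded before
    aggregation; keep_frac is a hypothetical tuning knob.
    """
    msgs = np.stack(messages, axis=0)            # (num_workers, dim)
    med = np.median(msgs, axis=0)
    dists = np.linalg.norm(msgs - med, axis=1)
    k = max(1, int(keep_frac * len(msgs)))
    keep = np.argsort(dists)[:k]                 # workers closest to the median
    return msgs[keep].mean(axis=0)

# Toy usage: 6 honest workers near the all-ones vector, 3 Byzantine workers.
rng = np.random.default_rng(0)
honest = [rng.normal(loc=1.0, scale=0.1, size=3) for _ in range(6)]
byzantine = [np.full(3, 1e3) for _ in range(3)]
print(median_filtered_mean(honest + byzantine))  # close to (1, 1, 1)
```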

    Prox-DBRO-VR: A Unified Analysis on Decentralized Byzantine-Resilient Composite Stochastic Optimization with Variance Reduction and Non-Asymptotic Convergence Rates

    Decentralized Byzantine-resilient stochastic gradient algorithms efficiently solve large-scale optimization problems under adverse conditions such as malfunctioning agents, software bugs, and cyber attacks. This paper addresses a class of generic composite optimization problems over multi-agent cyber-physical systems (CPSs) in the presence of an unknown number of Byzantine agents. Based on the proximal mapping method, two variance-reduced (VR) techniques, and a norm-penalized approximation strategy, we propose a decentralized Byzantine-resilient proximal-gradient algorithmic framework, dubbed Prox-DBRO-VR, which achieves an optimization and control goal using only local computations and communications. To asymptotically reduce the variance incurred by evaluating noisy stochastic gradients, we incorporate two localized variance-reduction techniques (SAGA and LSVRG) into Prox-DBRO-VR, yielding Prox-DBRO-SAGA and Prox-DBRO-LSVRG. By analyzing the contraction relationships among the gradient-learning error, the robust consensus condition, and the optimality gap, we show that both Prox-DBRO-SAGA and Prox-DBRO-LSVRG, with a well-designed constant (resp., decaying) step-size, converge linearly (resp., sub-linearly) inside an error ball around the optimal solution of the optimization problem under standard assumptions. The trade-offs between convergence accuracy and the number of Byzantine agents are characterized in both the linear and sub-linear cases. In simulations, the effectiveness and practicality of the proposed algorithms are demonstrated by solving a sparse machine-learning problem over multi-agent CPSs under various Byzantine attacks.
    Comment: 14 pages, 0 figures
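To make the two main ingredients concrete, here is a minimal single-node sketch combining the proximal mapping of the l1-norm with a SAGA variance-reduced gradient estimator on a toy sparse least-squares problem. It deliberately omits the decentralized, Byzantine-resilient parts of Prox-DBRO-VR, and the problem sizes, step size, and regularization weight are all illustrative assumptions.

```python
import numpy as np

def prox_l1(v, lam):
    """Proximal mapping of lam * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

# Toy composite problem: min_x (1/n) * sum_i (a_i^T x - b_i)^2 + lam * ||x||_1
rng = np.random.default_rng(1)
n, d = 50, 10
A = rng.normal(size=(n, d))
x_true = np.zeros(d)
x_true[:3] = [2.0, -1.0, 0.5]                        # sparse ground truth
b = A @ x_true + 0.01 * rng.normal(size=n)
lam, step = 0.05, 0.005

x = np.zeros(d)
table = np.array([2 * (A[i] @ x - b[i]) * A[i] for i in range(n)])  # stored per-sample gradients
for _ in range(5000):
    i = rng.integers(n)
    g_new = 2 * (A[i] @ x - b[i]) * A[i]
    g_est = g_new - table[i] + table.mean(axis=0)    # SAGA variance-reduced estimate
    table[i] = g_new
    x = prox_l1(x - step * g_est, step * lam)        # proximal (composite) step
print(np.round(x, 2))                                # should roughly recover x_true
```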

    Federated Variance-Reduced Stochastic Gradient Descent with Robustness to Byzantine Attacks

    This paper deals with distributed finite-sum optimization for learning over networks in the presence of malicious Byzantine attacks. To cope with such attacks, most resilient approaches so far combine stochastic gradient descent (SGD) with different robust aggregation rules. However, the sizeable SGD-induced stochastic gradient noise makes it challenging to distinguish malicious messages sent by the Byzantine attackers from noisy stochastic gradients sent by the 'honest' workers. This motivates us to reduce the variance of stochastic gradients as a means of robustifying SGD in the presence of Byzantine attacks. To this end, the present work puts forth a Byzantine attack resilient distributed (Byrd-) SAGA approach for learning tasks involving finite-sum optimization over networks. Rather than the mean employed by distributed SAGA, the novel Byrd-SAGA relies on the geometric median to aggregate the corrected stochastic gradients sent by the workers. When less than half of the workers are Byzantine attackers, the robustness of the geometric median to outliers enables Byrd-SAGA to attain provably linear convergence to a neighborhood of the optimal solution, with the asymptotic learning error determined by the number of Byzantine workers. Numerical tests corroborate the robustness to various Byzantine attacks, as well as the merits of Byrd-SAGA over Byzantine attack resilient distributed SGD.
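The geometric median that Byrd-SAGA uses in place of the mean can be approximated with Weiszfeld's iteration; the sketch below is a generic implementation of that classical method applied to a toy set of worker gradients, not code from the paper.

```python
import numpy as np

def geometric_median(points, iters=200, eps=1e-8):
    """Approximate the geometric median with Weiszfeld's iteration.

    The geometric median minimizes the sum of Euclidean distances to the
    points and stays close to the honest workers' gradients as long as
    Byzantine workers make up fewer than half of the total.
    """
    pts = np.stack(points, axis=0)
    z = pts.mean(axis=0)
    for _ in range(iters):
        dists = np.maximum(np.linalg.norm(pts - z, axis=1), eps)
        weights = 1.0 / dists
        z_new = (weights[:, None] * pts).sum(axis=0) / weights.sum()
        if np.linalg.norm(z_new - z) < eps:
            break
        z = z_new
    return z

# 5 honest gradients near (1, 1) and 2 Byzantine ones far away.
rng = np.random.default_rng(0)
honest = [np.array([1.0, 1.0]) + 0.05 * rng.normal(size=2) for _ in range(5)]
byzantine = [np.array([100.0, -100.0]) for _ in range(2)]
print(geometric_median(honest + byzantine))   # remains near (1, 1)
```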

    Variance Reduction is an Antidote to Byzantines: Better Rates, Weaker Assumptions and Communication Compression as a Cherry on the Top

    Byzantine-robustness has been gaining a lot of attention due to the growing interest in collaborative and federated learning. However, many fruitful directions, such as the use of variance reduction for achieving robustness and communication compression for reducing communication costs, remain weakly explored in the field. This work addresses this gap and proposes Byz-VR-MARINA, a new Byzantine-tolerant method with variance reduction and compression. A key message of our paper is that variance reduction is key to fighting Byzantine workers more effectively. At the same time, communication compression is a bonus that makes the process more communication-efficient. We derive theoretical convergence guarantees for Byz-VR-MARINA that outperform the previous state of the art for general non-convex and Polyak-Lojasiewicz loss functions. Unlike concurrent Byzantine-robust methods with variance reduction and/or compression, our complexity results are tight and do not rely on restrictive assumptions such as boundedness of the gradients or limited compression. Moreover, we provide the first analysis of a Byzantine-tolerant method supporting non-uniform sampling of stochastic gradients. Numerical experiments corroborate our theoretical findings.
    Comment: 41 pages, 6 figures. Changes in v2: a few typos and inaccuracies were fixed and more clarifications were added. Code: https://github.com/SamuelHorvath/VR_Byzantin
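For intuition on the compression side, the following sketch shows a standard unbiased rand-k sparsifier, one common example of a communication-compression operator; the abstract does not say which operator Byz-VR-MARINA actually uses, so treat this purely as an illustration.

```python
import numpy as np

def rand_k(v, k, rng):
    """Unbiased rand-k sparsifier: keep k random coordinates, rescale by d / k.

    Rescaling the surviving coordinates by d / k makes the operator unbiased,
    i.e. E[rand_k(v)] = v, which is the property compression analyses rely on.
    """
    d = v.size
    mask = np.zeros(d)
    idx = rng.choice(d, size=k, replace=False)
    mask[idx] = d / k
    return mask * v

# Empirical check of unbiasedness on a random vector.
rng = np.random.default_rng(0)
g = rng.normal(size=10)
avg = np.mean([rand_k(g, 3, rng) for _ in range(20000)], axis=0)
print(np.max(np.abs(avg - g)))   # close to zero: the compressor is unbiased on average
```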