
    Federated Variance-Reduced Stochastic Gradient Descent with Robustness to Byzantine Attacks

    This paper deals with distributed finite-sum optimization for learning over networks in the presence of malicious Byzantine attacks. To cope with such attacks, most resilient approaches so far combine stochastic gradient descent (SGD) with different robust aggregation rules. However, the sizeable SGD-induced stochastic gradient noise makes it challenging to distinguish malicious messages sent by Byzantine attackers from noisy stochastic gradients sent by 'honest' workers. This motivates us to reduce the variance of stochastic gradients as a means of robustifying SGD in the presence of Byzantine attacks. To this end, the present work puts forth a Byzantine attack resilient distributed (Byrd-) SAGA approach for learning tasks involving finite-sum optimization over networks. Rather than the mean employed by distributed SAGA, the novel Byrd-SAGA relies on the geometric median to aggregate the corrected stochastic gradients sent by the workers. When fewer than half of the workers are Byzantine attackers, the robustness of the geometric median to outliers enables Byrd-SAGA to attain provably linear convergence to a neighborhood of the optimal solution, with the asymptotic learning error determined by the number of Byzantine workers. Numerical tests corroborate the robustness to various Byzantine attacks, as well as the merits of Byrd-SAGA over Byzantine attack resilient distributed SGD.
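
    To make the aggregation step concrete, here is a minimal sketch of the geometric-median rule that Byrd-SAGA substitutes for the mean, computed with Weiszfeld's algorithm; the function names and the simple server-side update are illustrative assumptions, not the paper's exact pseudocode.

```python
import numpy as np

def geometric_median(points, tol=1e-6, max_iter=100):
    """Weiszfeld's algorithm: iteratively re-weighted averaging that
    converges to the geometric median, which is robust to outliers
    (here, messages from Byzantine workers)."""
    z = np.mean(points, axis=0)                    # initialize at the mean
    for _ in range(max_iter):
        d = np.maximum(np.linalg.norm(points - z, axis=1), 1e-12)
        w = 1.0 / d                                # inverse-distance weights
        z_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(z_new - z) < tol:
            break
        z = z_new
    return z

def server_step(x, worker_gradients, lr=0.01):
    """Hypothetical server-side update: aggregate the workers'
    SAGA-corrected gradients with the geometric median, not the mean."""
    g = geometric_median(np.asarray(worker_gradients))
    return x - lr * g
```

    With fewer than half of the workers Byzantine, outliers cannot pull the geometric median arbitrarily far from the honest gradients, which is what yields the bounded asymptotic learning error described above.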

    Prox-DBRO-VR: A Unified Analysis on Decentralized Byzantine-Resilient Composite Stochastic Optimization with Variance Reduction and Non-Asymptotic Convergence Rates

    Decentralized Byzantine-resilient stochastic gradient algorithms efficiently solve large-scale optimization problems under adverse conditions such as malfunctioning agents, software bugs, and cyber attacks. This paper targets a class of generic composite optimization problems over multi-agent cyber-physical systems (CPSs) in the presence of an unknown number of Byzantine agents. Based on the proximal mapping method, two variance-reduced (VR) techniques, and a norm-penalized approximation strategy, we propose a decentralized Byzantine-resilient proximal-gradient algorithmic framework, dubbed Prox-DBRO-VR, which achieves an optimization and control goal using only local computations and communications. To asymptotically reduce the variance generated by evaluating noisy stochastic gradients, we incorporate two localized variance-reduction techniques (SAGA and LSVRG) into Prox-DBRO-VR, yielding Prox-DBRO-SAGA and Prox-DBRO-LSVRG. By analyzing the contraction relationships among the gradient-learning error, the robust consensus condition, and the optimality gap, we show that both Prox-DBRO-SAGA and Prox-DBRO-LSVRG, with a well-designed constant (resp., decaying) step-size, converge linearly (resp., sub-linearly) to an error ball around the optimal solution under standard assumptions. The trade-offs between convergence accuracy and the number of Byzantine agents are characterized in both the linear and sub-linear cases. Simulations demonstrate the effectiveness and practicality of the proposed algorithms on a sparse machine-learning problem over multi-agent CPSs under various Byzantine attacks. (Comment: 14 pages, 0 figures)
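
    A minimal single-agent sketch of two of the ingredients named above: a SAGA gradient estimator and a proximal mapping for a composite (here, l1-regularized) objective. The class and function names are illustrative assumptions, and the plain neighbor average below is only a placeholder for the paper's norm-penalized robust consensus step.

```python
import numpy as np

def prox_l1(v, lam):
    """Proximal mapping of lam*||.||_1 (soft-thresholding), the kind of
    non-smooth composite term the framework handles."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

class LocalSAGA:
    """Illustrative SAGA estimator for one agent: stores the last
    gradient seen for each local sample, so the stochastic variance
    vanishes asymptotically."""
    def __init__(self, grad_fn, n_samples, dim):
        self.grad_fn = grad_fn                 # grad_fn(x, i) -> grad of f_i
        self.table = np.zeros((n_samples, dim))
        self.avg = np.zeros(dim)
        self.n = n_samples

    def estimate(self, x):
        i = np.random.randint(self.n)
        g = self.grad_fn(x, i)
        v = g - self.table[i] + self.avg       # SAGA-corrected gradient
        self.avg += (g - self.table[i]) / self.n
        self.table[i] = g
        return v

def prox_dbro_saga_step(x, neighbor_models, saga, step=0.05, lam=0.01):
    """Hypothetical agent update: consensus over neighbor models (plain
    mean here, standing in for the robust norm-penalized consensus),
    a SAGA gradient step, then the proximal mapping."""
    x_cons = np.mean(neighbor_models, axis=0)
    return prox_l1(x_cons - step * saga.estimate(x), lam * step)
```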

    Genuinely Distributed Byzantine Machine Learning

    Machine Learning (ML) solutions are nowadays distributed according to the so-called server/worker architecture: one server holds the model parameters while several workers train the model. Clearly, such an architecture is prone to various types of component failures, all of which can be encompassed within the spectrum of Byzantine behavior. Several approaches have been proposed recently to tolerate Byzantine workers, yet all require trusting a central parameter server. In this paper we initiate the study of the "general" Byzantine-resilient distributed machine learning problem, where no individual component is trusted. We show that this problem can be solved in an asynchronous system despite the presence of 1/3 Byzantine parameter servers and 1/3 Byzantine workers (which is optimal). We present a new algorithm, ByzSGD, which solves the general Byzantine-resilient distributed machine learning problem by relying on three major schemes. The first, Scatter/Gather, is a communication scheme whose goal is to bound the maximum drift among models on correct servers. The second, Distributed Median Contraction (DMC), leverages the geometric properties of the median in high-dimensional spaces to bring the parameters on correct servers back close to each other, ensuring learning convergence. The third, Minimum-Diameter Averaging (MDA), is a statistically robust gradient aggregation rule whose goal is to tolerate Byzantine workers; MDA requires only a loose bound on the variance of non-Byzantine gradient estimates, compared to existing alternatives (e.g., Krum). Interestingly, ByzSGD ensures Byzantine resilience without adding communication rounds (on a normal path) compared to vanilla non-Byzantine alternatives. ByzSGD requires, however, a larger number of messages, which, we show, can be reduced if we assume synchrony. (Comment: This is a merge of arXiv:1905.03853 and arXiv:1911.07537; arXiv:1911.07537 will be retracted.)
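
    Of the three schemes, MDA is the easiest to state compactly. The brute-force sketch below is an illustrative reading of the rule (average the subset of size n - f with the smallest diameter); it enumerates all subsets and is therefore exponential in the number of workers, so it is for intuition only, not the paper's implementation.

```python
import numpy as np
from itertools import combinations

def minimum_diameter_averaging(grads, f):
    """Sketch of MDA: among n gradient vectors, at most f of which may
    be Byzantine, average the subset of size n - f whose diameter
    (largest pairwise distance) is smallest."""
    n = len(grads)
    best_subset, best_diam = None, float("inf")
    for subset in combinations(range(n), n - f):
        diam = max(
            (np.linalg.norm(grads[i] - grads[j])
             for i, j in combinations(subset, 2)),
            default=0.0,
        )
        if diam < best_diam:
            best_diam, best_subset = diam, subset
    return np.mean([grads[i] for i in best_subset], axis=0)
```

    The intuition: a subset containing a far-off Byzantine gradient has a large diameter, so the minimizing subset tends to consist of mutually close (honest) gradients.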

    Finite-Time Resilient Formation Control with Bounded Inputs

    In this paper we consider the problem of a multi-agent system achieving a formation in the presence of misbehaving or adversarial agents. We introduce a novel continuous-time resilient controller that guarantees normally behaving agents can converge to a formation with respect to a set of leaders. The controller employs a norm-based filtering mechanism and, unlike most prior algorithms, also incorporates input bounds. In addition, the controller is shown to guarantee convergence in finite time. A sufficient condition for the controller to guarantee convergence is a graph-theoretic structure which we denote as a Resilient Directed Acyclic Graph (RDAG). Further, we employ our filtering mechanism on a discrete-time system, which is shown to exhibit exponential convergence. Our results are demonstrated through simulations.
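
    As a rough illustration of norm-based filtering with bounded inputs, the sketch below drops the F largest-norm neighbor offsets (potential adversaries) before forming a saturated control input. The function name, the parameter F (an assumed local bound on adversaries), and the use of plain consensus offsets rather than the paper's formation offsets relative to leaders are all illustrative assumptions.

```python
import numpy as np

def norm_filtered_input(x_i, neighbor_states, F, u_max=1.0):
    """Hypothetical norm-based filter for agent i: discard the F
    neighbor offsets with the largest norms, then apply a saturated
    (bounded) input toward the remaining neighbors."""
    offsets = [x_j - x_i for x_j in neighbor_states]
    offsets.sort(key=np.linalg.norm)           # ascending by norm
    kept = offsets[: max(len(offsets) - F, 0)]  # drop the F largest
    u = np.sum(kept, axis=0) if kept else np.zeros_like(x_i)
    norm = np.linalg.norm(u)
    if norm > u_max:                            # enforce the input bound
        u = u * (u_max / norm)
    return u
```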