
    Does Differential Privacy Prevent Backdoor Attacks in Practice?

    Differential Privacy (DP) was originally developed to protect privacy. However, it has recently been utilized to secure machine learning (ML) models from poisoning attacks, with DP-SGD receiving substantial attention. Nevertheless, a thorough investigation is required to assess the effectiveness of different DP techniques in preventing backdoor attacks in practice. In this paper, we investigate the effectiveness of DP-SGD and, for the first time in the literature, examine PATE in the context of backdoor attacks. We also explore the role of different components of DP algorithms in defending against backdoor attacks and show that PATE is effective against these attacks due to the bagging structure of the teacher models it employs. Our experiments reveal that hyperparameters and the number of backdoors in the training dataset impact the success of DP algorithms. Additionally, we propose Label-DP as a faster and more accurate alternative to DP-SGD and PATE. We conclude that while Label-DP algorithms generally offer weaker privacy protection, accurate hyperparameter tuning can make them more effective than DP methods in defending against backdoor attacks while maintaining model accuracy.
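
    The abstract does not spell out the mechanism by which DP-SGD limits a backdoor's influence. As a rough, hedged illustration (not the authors' code), the sketch below shows the per-example gradient clipping and Gaussian noising that bound any single poisoned example's contribution to an update; the clip norm and noise multiplier are exactly the kind of hyperparameters the paper reports as decisive.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One DP-SGD aggregation step (illustrative sketch, not the paper's code).

    per_example_grads: array of shape (batch_size, dim), one gradient per example.
    Clipping bounds each example's contribution (including poisoned ones);
    Gaussian noise calibrated to the clip norm yields (epsilon, delta)-DP when
    composed over training with a privacy accountant (omitted here).
    """
    rng = rng or np.random.default_rng()
    grads = np.asarray(per_example_grads)
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = grads * scale                                   # per-example clipping
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grads.shape[1])
    return (clipped.sum(axis=0) + noise) / len(grads)

# Example: a single outlier ("backdoored") gradient is capped at norm clip_norm.
grads = np.vstack([np.random.randn(7, 10), 100.0 * np.ones((1, 10))])
update = dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=1.0)
```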

    An Experimental Study of Byzantine-Robust Aggregation Schemes in Federated Learning

    Byzantine-robust federated learning aims at mitigating Byzantine failures during the federated training process, where malicious participants may upload arbitrary local updates to the central server to degrade the performance of the global model. In recent years, several robust aggregation schemes have been proposed to defend against malicious updates from Byzantine clients and improve the robustness of federated learning. These solutions were claimed to be Byzantine-robust, under certain assumptions. Meanwhile, new attack strategies are emerging, striving to circumvent the defense schemes. However, there is a lack of systematic comparison and empirical study thereof. In this paper, we conduct an experimental study of Byzantine-robust aggregation schemes under different attacks using two popular algorithms in federated learning, FedSGD and FedAvg. We first survey existing Byzantine attack strategies and Byzantine-robust aggregation schemes that aim to defend against Byzantine attacks. We also propose a new scheme, ClippedClustering, to enhance the robustness of a clustering-based scheme by automatically clipping the updates. Then we provide an experimental evaluation of eight aggregation schemes in the scenario of five different Byzantine attacks. Our results show that these aggregation schemes sustain relatively high accuracy in some cases but are ineffective in others. In particular, our proposed ClippedClustering successfully defends against most attacks under independent and identically distributed (IID) local datasets. However, when the local datasets are Non-IID, the performance of all the aggregation schemes significantly decreases. With Non-IID data, some of these aggregation schemes fail even in the complete absence of Byzantine clients. We conclude that the robustness of all the aggregation schemes is limited, highlighting the need for new defense strategies, in particular for Non-IID datasets. Comment: This paper has been accepted for publication in IEEE Transactions on Big Data
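
    To make the idea of "automatically clipping the updates" concrete, here is a minimal, illustrative aggregation sketch: each client update is scaled down to the median update norm before averaging. The median-norm threshold is an assumption for illustration only; the proposed ClippedClustering additionally clusters updates, which is omitted here.

```python
import numpy as np

def clipped_mean(client_updates):
    """Illustrative robust aggregation: clip each client update to the median
    norm before averaging. This sketches the automatic-clipping idea the
    abstract mentions; the actual ClippedClustering scheme also clusters
    updates, which is not shown here.
    """
    updates = np.asarray(client_updates)              # shape (n_clients, dim)
    norms = np.linalg.norm(updates, axis=1)
    tau = np.median(norms)                            # data-driven clip threshold (assumed rule)
    scale = np.minimum(1.0, tau / np.maximum(norms, 1e-12))
    return (updates * scale[:, None]).mean(axis=0)

# A Byzantine client submitting a huge update is scaled down to the median norm.
honest = np.random.randn(9, 5)
byzantine = 1e6 * np.ones((1, 5))
aggregated = clipped_mean(np.vstack([honest, byzantine]))
```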

    Taming Fat-Tailed ("Heavier-Tailed" with Potentially Infinite Variance) Noise in Federated Learning

    Full text link
    A key assumption in most existing works on FL algorithms' convergence analysis is that the noise in stochastic first-order information has a finite variance. Although this assumption covers all light-tailed (i.e., sub-exponential) and some heavy-tailed noise distributions (e.g., log-normal, Weibull, and some Pareto distributions), it fails for many fat-tailed noise distributions (i.e., "heavier-tailed" with potentially infinite variance) that have been empirically observed in the FL literature. To date, it remains unclear whether one can design convergent algorithms for FL systems that experience fat-tailed noise. This motivates us to fill this gap by proposing an algorithmic framework called FAT-Clipping (federated averaging with two-sided learning rates and clipping), which contains two variants: FAT-Clipping per-round (FAT-Clipping-PR) and FAT-Clipping per-iteration (FAT-Clipping-PI). Specifically, for the largest $\alpha \in (1,2]$ such that the fat-tailed noise in FL still has a bounded $\alpha$-moment, we show that both variants achieve $\mathcal{O}((mT)^{\frac{2-\alpha}{\alpha}})$ and $\mathcal{O}((mT)^{\frac{1-\alpha}{3\alpha-2}})$ convergence rates in the strongly-convex and general non-convex settings, respectively, where $m$ and $T$ are the numbers of clients and communication rounds. Moreover, at the expense of more clipping operations compared to FAT-Clipping-PR, FAT-Clipping-PI further enjoys a linear speedup with respect to the number of local updates at each client and is lower-bound-matching (i.e., order-optimal). Collectively, our results advance the understanding of designing efficient algorithms for FL systems that exhibit fat-tailed first-order oracle information. Comment: Published as a conference paper at NeurIPS 2022
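
    As a sketch of the per-round variant under assumed notation (not the authors' implementation), the following code runs local SGD at each client and clips the accumulated local update to a fixed threshold before the server applies its own learning rate; FAT-Clipping-PI would instead clip every individual stochastic gradient.

```python
import numpy as np

def fat_clipping_pr_round(client_grad_fn, x, n_clients, local_steps,
                          eta_local=0.1, eta_server=1.0, clip_lambda=1.0):
    """One communication round in the spirit of FAT-Clipping-PR (illustrative
    sketch under assumed notation, not the authors' implementation).

    Each client runs `local_steps` SGD steps from the global model x, then the
    accumulated local update is clipped to norm `clip_lambda` before the server
    averages and applies its own learning rate (the "two-sided" rates being
    eta_local and eta_server). FAT-Clipping-PI would clip each stochastic
    gradient instead of the per-round update.
    """
    updates = []
    for c in range(n_clients):
        x_c = x.copy()
        for _ in range(local_steps):
            x_c -= eta_local * client_grad_fn(c, x_c)        # local SGD step
        delta = x_c - x
        norm = np.linalg.norm(delta)
        delta *= min(1.0, clip_lambda / max(norm, 1e-12))    # per-round clipping
        updates.append(delta)
    return x + eta_server * np.mean(updates, axis=0)

# Toy usage: quadratic objective with fat-tailed (infinite-variance) gradient noise.
rng = np.random.default_rng(0)
grad_fn = lambda c, x: x + rng.standard_t(df=1.5, size=x.shape)
x_next = fat_clipping_pr_round(grad_fn, np.zeros(10), n_clients=4, local_steps=5)
```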

    The Interplay Between Privacy and Fairness in Learning and Decision Making Problems

    The availability of large datasets and computational resources has driven significant progress in Artificial Intelligence (AI) and, especially, Machine Learning (ML). These advances have rendered AI systems instrumental for many decision making and policy operations involving individuals: they include assistance in legal decisions, lending, and hiring, as well as determinations of resources and benefits, all of which have profound social and economic impacts. While data-driven systems have been successful in an increasing number of tasks, the use of rich datasets, combined with the adoption of black-box algorithms, has sparked concerns about how these systems operate. How much information these systems leak about the individuals whose data is used as input and how they handle biases and fairness issues are two of these critical concerns. While some argue that privacy and fairness are in alignment, the majority instead believe they are two contrasting metrics. This thesis first studies the interaction between privacy and fairness in machine learning and decision problems. It focuses on the scenario where fairness and privacy are at odds and investigates different factors that can explain such behavior. It then proposes effective and efficient mitigation solutions to improve fairness under privacy constraints. In the second part, it analyzes the connection between fairness and other machine learning concepts such as model compression and adversarial robustness. Finally, it introduces a novel privacy concept, along with an initial implementation, to protect users' privacy at inference time.

    Achieving Differential Privacy and Fairness in Machine Learning

    Machine learning algorithms are used to make decisions in various applications, such as recruiting, lending and policing. These algorithms rely on large amounts of sensitive individual information to work properly. Hence, there are sociological concerns about machine learning algorithms on matters like privacy and fairness. Currently, many studies focus on protecting individual privacy or ensuring fairness of algorithms separately, without considering their connection. However, new challenges are arising in privacy-preserving and fairness-aware machine learning. On one hand, there is fairness within the private model, i.e., how to meet both privacy and fairness requirements simultaneously in machine learning algorithms. On the other hand, there is fairness between the private model and the non-private model, i.e., how to ensure that the utility loss due to differential privacy is the same for each group. The goal of this dissertation is to address challenging issues in privacy-preserving and fairness-aware machine learning: achieving differential privacy with satisfactory utility and efficiency in complex and emerging tasks, using generative models to generate fair data and to assist fair classification, achieving both differential privacy and fairness simultaneously within the same model, and achieving equal utility loss w.r.t. each group between the private model and the non-private model. In this dissertation, we develop the following algorithms to address the above challenges. (1) We develop PrivPC and DPNE algorithms to achieve differential privacy in the complex and emerging tasks of causal graph discovery and network embedding, respectively. (2) We develop the fair generative adversarial neural networks framework and three algorithms (FairGAN, FairGAN+ and CFGAN) to achieve fair data generation and classification through generative models based on different association-based and causation-based fairness notions. (3) We develop PFLR and PFLR* algorithms to simultaneously achieve both differential privacy and fairness in logistic regression. (4) We develop a DPSGD-F algorithm to remove the disparate impact of differential privacy on model accuracy w.r.t. each group.
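
    The quantity DPSGD-F targets, the extra utility loss that differential privacy imposes on each group, can be measured with a simple evaluation routine. The sketch below is one illustrative way to compute that per-group accuracy gap between a non-private and a private model; it is not the dissertation's code, and the names are assumptions for the example.

```python
import numpy as np

def per_group_utility_loss(y_true, y_pred_nonpriv, y_pred_priv, groups):
    """Per-group accuracy drop caused by a private model relative to its
    non-private counterpart -- the disparity DPSGD-F aims to equalize
    (illustrative evaluation sketch, not the dissertation's code).
    """
    loss = {}
    for g in np.unique(groups):
        mask = groups == g
        acc_nonpriv = np.mean(y_pred_nonpriv[mask] == y_true[mask])
        acc_priv = np.mean(y_pred_priv[mask] == y_true[mask])
        loss[g] = acc_nonpriv - acc_priv      # utility loss of DP on group g
    return loss

# Toy usage with a synthetic binary-group split (illustrative only).
y = np.array([0, 1, 1, 0, 1, 0])
g = np.array(['A', 'A', 'A', 'B', 'B', 'B'])
print(per_group_utility_loss(y, y, np.array([0, 1, 0, 0, 1, 1]), g))
```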