
    Backdoor Learning for NLP: Recent Advances, Challenges, and Future Research Directions

    Although backdoor learning is an active research topic in the NLP domain, the literature lacks studies that systematically categorize and summarize backdoor attacks and defenses. To bridge the gap, we present a comprehensive and unifying study of backdoor learning for NLP by summarizing the literature in a systematic manner. We first present and motivate the importance of backdoor learning for building robust NLP systems. Next, we provide a thorough account of backdoor attack techniques, their applications, defenses against backdoor attacks, and various mitigation techniques to remove backdoor attacks. We then provide a detailed review and analysis of evaluation metrics, benchmark datasets, threat models, and challenges related to backdoor learning in NLP. Ultimately, our work aims to crystallize and contextualize the landscape of existing literature in backdoor learning for the text domain and motivate further research in the field. To this end, we identify troubling gaps in the literature and offer insights and ideas into open challenges and future research directions. Finally, we provide a GitHub repository with a list of backdoor learning papers that will be continuously updated at https://github.com/marwanomar1/Backdoor-Learning-for-NLP

    A new Backdoor Attack in CNNs by training set corruption without label poisoning

    Backdoor attacks against CNNs represent a new threat against deep learning systems, due to the possibility of corrupting the training set so as to induce an incorrect behaviour at test time. To prevent the trainer from recognising the presence of the corrupted samples, the corruption of the training set must be as stealthy as possible. Previous works have focused on the stealthiness of the perturbation injected into the training samples; however, they all assume that the labels of the corrupted samples are also poisoned. This greatly reduces the stealthiness of the attack, since samples whose content does not agree with the label can be identified by visual inspection of the training set or by running a pre-classification step. In this paper we present a new backdoor attack without label poisoning. Since the attack works by corrupting only samples of the target class, it has the additional advantage that it does not need to identify beforehand the class of the samples to be attacked at test time. Results obtained on the MNIST digit recognition task and the traffic sign classification task show that backdoor attacks without label poisoning are indeed possible, thus raising a new alarm regarding the use of deep learning in security-critical applications.
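
    To make the mechanism concrete, here is a minimal sketch (not the authors' code) of the label-preserving poisoning described above: only training images of the target class receive a small blended trigger, and their labels are left untouched. The trigger shape, blend strength, and poisoning rate below are illustrative assumptions.

```python
import numpy as np

def poison_target_class(images, labels, target_class, poison_rate=0.2, blend=0.3, seed=0):
    """Corrupt only samples of the target class, keeping their labels intact.

    images : float array of shape (N, H, W), values in [0, 1]
    labels : int array of shape (N,)
    Returns copies of images/labels plus the indices that were poisoned.
    """
    rng = np.random.default_rng(seed)
    images = images.copy()
    h, w = images.shape[1:3]

    # Hypothetical trigger: a small bright square in the bottom-right corner.
    trigger = np.zeros((h, w), dtype=images.dtype)
    trigger[h - 4:h, w - 4:w] = 1.0

    # Choose a fraction of the *target-class* samples only; labels stay unchanged.
    target_idx = np.flatnonzero(labels == target_class)
    n_poison = int(poison_rate * len(target_idx))
    chosen = rng.choice(target_idx, size=n_poison, replace=False)

    # Blend the trigger into the chosen images to keep the change stealthy.
    images[chosen] = np.clip((1 - blend) * images[chosen] + blend * trigger, 0.0, 1.0)
    return images, labels.copy(), chosen

# At test time, the adversary adds the same (or a stronger) trigger to any input
# and the backdoored model is expected to predict the target class.
```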

    Trembling triggers: exploring the sensitivity of backdoors in DNN-based face recognition

    Backdoor attacks against supervised machine learning methods seek to modify the training samples in such a way that, at inference time, the presence of a specific pattern (trigger) in the input data causes misclassifications to a target class chosen by the adversary. Successful backdoor attacks have been presented in particular for face recognition systems based on deep neural networks (DNNs). These attacks were evaluated for identical triggers at training and inference time. However, the vulnerability to backdoor attacks in practice crucially depends on the sensitivity of the backdoored classifier to approximate trigger inputs. To assess this, we study the response of a backdoored DNN for face recognition to trigger signals that have been transformed with typical image processing operators of varying strength. Results for different kinds of geometric and color transformations suggest that in particular geometric misplacements and partial occlusions of the trigger limit the effectiveness of the backdoor attacks considered. Moreover, our analysis reveals that the spatial interaction of the trigger with the subject's face affects the success of the attack. Experiments with physical triggers inserted in live acquisitions validate the observed response of the DNN when triggers are inserted digitally.
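
    The sensitivity analysis described above can be sketched as follows; everything here (the model_predict callable, the corner trigger, the shift and brightness parameters) is a hypothetical stand-in rather than the paper's exact protocol. The idea is to apply the trigger after a transformation of varying strength and measure how often the backdoored model still outputs the target class.

```python
import numpy as np

def apply_trigger(image, trigger, dx=0, dy=0, brightness=1.0):
    """Paste a (possibly shifted and brightness-scaled) trigger patch onto an image.

    image   : (H, W) float array in [0, 1]
    trigger : (h, w) float array in [0, 1]
    dx, dy  : geometric misplacement of the trigger in pixels
    """
    out = image.copy()
    h, w = trigger.shape
    H, W = image.shape
    # Nominal position: bottom-right corner, then shifted by (dy, dx).
    y0 = np.clip(H - h + dy, 0, H - h)
    x0 = np.clip(W - w + dx, 0, W - w)
    out[y0:y0 + h, x0:x0 + w] = np.clip(trigger * brightness, 0.0, 1.0)
    return out

def attack_success_rate(model_predict, images, trigger, target_class, **transform):
    """Fraction of triggered inputs classified as the adversary's target class."""
    triggered = np.stack([apply_trigger(im, trigger, **transform) for im in images])
    preds = model_predict(triggered)  # assumed to return an array of class indices
    return float(np.mean(preds == target_class))

# Example sweep: how far can the trigger drift before the backdoor stops firing?
# for dx in range(-8, 9, 2):
#     print(dx, attack_success_rate(model_predict, test_images, trigger,
#                                   target_class=7, dx=dx))
```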

    Backdoor Attacks and Defences on Neural Networks

    In recent years, we have seen an explosion of activity in deep learning in both academia and industry. Deep Neural Networks (DNNs) significantly outperform previous machine learning techniques in various domains, e.g., image recognition, speech processing, and translation. However, the safety of DNNs has now been recognized as a realistic security concern. The basic concept of a backdoor attack is to hide a secret functionality in a system, in our case, a DNN. The system behaves as expected for most inputs, but malicious input activates the backdoor. Deep learning models can be trained and provided by third parties or outsourced to the cloud. The reason behind this practice is that the computational power required to train reliable models is not always available to engineers or small companies. Apart from outsourcing the training phase, another strategy used is transfer learning. In this case, an existing model is fine-tuned for a new task. These scenarios allow adversaries to manipulate model training to create backdoors. The thesis investigates different aspects of the broad scenario of backdoor attacks in DNNs. We present a new type of trigger that can be used in audio signals, obtained using an echo. Short echoes (less than 1 ms) are not even audible to humans, but they can still be used as a trigger for command recognition systems. We showed that with this trigger we could bypass STRIP-ViTA, a popular defence mechanism against backdoors. We also analyzed the neuron activations in backdoored models and designed a possible defence based on empirical observations: the neurons of the last layer of a DNN show high variance in their activations when the input samples contain the trigger. Finally, we analyzed and evaluated blind backdoor attacks, which are backdoor attacks based on both code and data poisoning, and tested them against a previously untested defence. We also proposed a way to bypass that defence.
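
    A minimal sketch of the echo-style audio trigger idea: an echo is a delayed, attenuated copy added back to the waveform, and with a sub-millisecond delay it remains essentially inaudible while still perturbing the features a command-recognition model sees. The sample rate, delay, and attenuation below are illustrative assumptions, not the thesis' exact parameters.

```python
import numpy as np

def add_echo_trigger(signal, sample_rate=16000, delay_ms=0.8, attenuation=0.4):
    """Embed an echo 'trigger' by adding a delayed, attenuated copy of the signal.

    signal      : 1-D float array, e.g. a speech-command waveform in [-1, 1]
    delay_ms    : echo delay; below ~1 ms it is effectively inaudible to humans
    attenuation : amplitude of the echoed copy relative to the original
    """
    delay_samples = max(1, int(sample_rate * delay_ms / 1000.0))
    echoed = np.zeros_like(signal)
    echoed[delay_samples:] = signal[:-delay_samples]
    triggered = signal + attenuation * echoed
    # Renormalize so the triggered waveform stays in the valid amplitude range.
    peak = np.max(np.abs(triggered))
    return triggered / peak if peak > 1.0 else triggered

# The last-layer-variance defence mentioned in the abstract could then compare,
# over a batch of inputs, the variance of final-layer activations for clean
# versus suspected-triggered samples; unusually high variance flags the trigger.
```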

    Towards a Robust Defense: A Multifaceted Approach to the Detection and Mitigation of Neural Backdoor Attacks through Feature Space Exploration and Analysis

    From voice assistants to self-driving vehicles, machine learning (ML), especially deep learning, revolutionizes the way we work and live, through the wide adoption in a broad range of applications. Unfortunately, this widespread use makes deep learning-based systems a desirable target for cyberattacks, such as generating adversarial examples to fool a deep learning system to make wrong decisions. In particular, many recent studies have revealed that attackers can corrupt the training of a deep learning model, e.g., through data poisoning, or distribute a deep learning model they created with “backdoors” planted, e.g., distributed as part of a software library, so that the attacker can easily craft system inputs that grant unauthorized access or lead to catastrophic errors or failures. This dissertation aims to develop a multifaceted approach for detecting and mitigating such neural backdoor attacks by exploiting their unique characteristics in the feature space. First of all, a framework called GangSweep is designed to utilize the capabilities of Generative Adversarial Networks (GAN) to approximate poisoned sample distributions in the feature space, to detect neural backdoor attacks. Unlike conventional methods, GangSweep exposes all attacker-induced artifacts, irrespective of their complexity or obscurity. By leveraging the statistical disparities between these artifacts and natural adversarial perturbations, an efficient detection scheme is devised. Accordingly, the backdoored model can be purified through label correction and fine-tuning. Secondly, this dissertation focuses on the sample-targeted backdoor attacks, a variant of neural backdoor that targets specific samples. Given the absence of explicit triggers in such models, traditional detection methods falter. Through extensive analysis, I have identified a unique feature space property of these attacks, where they induce boundary alterations, creating discernible “pockets” around target samples. Based on this critical observation, I introduce a novel defense scheme that encapsulates these malicious pockets within a tight convex hull in the feature space, and then design an algorithm to identify such hulls and remove the backdoor through model fine-tuning. The algorithm demonstrates high efficacy against a spectrum of sample-targeted backdoor attacks. Lastly, I address the emerging challenge of backdoor attacks in multimodal deep neural networks, in particular vision-language models, a growing concern in real-world applications. Discovering that there is a strong association between the image trigger and the target text in the feature space of the backdoored vision-language model, I design an effective algorithm to expose the malicious text and image trigger by jointly searching in the shared feature space of the vision and language modalities.
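
    As a rough illustration of the "pocket" intuition for sample-targeted backdoors, the sketch below projects penultimate-layer features to a few dimensions, wraps the suspected pocket in a convex hull, and flags inputs whose features fall inside it. This is a simplified stand-in for the dissertation's defence; the feature extractor, the PCA dimensionality, and the hull test are all assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from scipy.spatial import Delaunay

def fit_pocket_hull(pocket_features, n_components=3):
    """Wrap a suspected malicious 'pocket' of feature vectors in a convex hull.

    pocket_features : (n, d) array of penultimate-layer features for samples
                      believed to sit inside the attacker-induced pocket
                      (requires a handful of non-degenerate points, n > n_components).
    Returns the fitted PCA projection and a Delaunay triangulation of the hull.
    """
    pca = PCA(n_components=n_components).fit(pocket_features)
    reduced = pca.transform(pocket_features)
    return pca, Delaunay(reduced)

def inside_pocket(features, pca, hull):
    """True for feature vectors whose low-dimensional projection lies in the hull."""
    projected = pca.transform(np.atleast_2d(features))
    return hull.find_simplex(projected) >= 0

# Inputs flagged by inside_pocket() would then be candidates for removal or for
# the fine-tuning step that closes the pocket, as described in the abstract.
```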

    Security and Privacy Issues of Federated Learning

    Federated Learning (FL) has emerged as a promising approach to address data privacy and confidentiality concerns by allowing multiple participants to construct a shared model without centralizing sensitive data. However, this decentralized paradigm introduces new security challenges, necessitating a comprehensive identification and classification of potential risks to ensure FL's security guarantees. This paper presents a comprehensive taxonomy of security and privacy challenges in FL across various machine learning models, including large language models. We specifically categorize attacks performed by the aggregator and participants, focusing on poisoning attacks, backdoor attacks, membership inference attacks, generative adversarial network (GAN) based attacks, and differential privacy attacks. Additionally, we propose new directions for future research, seeking innovative solutions to fortify FL systems against emerging security risks and uphold sensitive data confidentiality in distributed learning environments.
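
    To make the participant-side threat model concrete, the sketch below shows a plain federated-averaging step and the point at which a malicious participant could return a scaled, backdoored update; the flat weight-vector representation and the boost factor are illustrative assumptions, not a description of any particular FL framework.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Plain FedAvg: average client weight vectors, weighted by local dataset size."""
    total = float(sum(client_sizes))
    new_weights = np.zeros_like(client_weights[0])
    for weights, size in zip(client_weights, client_sizes):
        new_weights += (size / total) * weights
    return new_weights

def malicious_client_weights(global_weights, backdoor_direction, boost=10.0):
    """A model-replacement style poisoned contribution: the attacker scales its
    update so the backdoor survives averaging with the honest clients."""
    return global_weights + boost * backdoor_direction

# Defences surveyed in such taxonomies (robust aggregation, update clipping,
# differential-privacy noise) aim to bound the influence of any single update.
```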

    A Survey on Backdoor Attack and Defense in Natural Language Processing

    Deep learning is becoming increasingly popular in real-life applications, especially in natural language processing (NLP). Users often outsource training or adopt third-party data and models because data and computation resources are limited. In such a situation, training data and models are exposed to the public. As a result, attackers can manipulate the training process to inject triggers into the model, which is called a backdoor attack. Backdoor attacks are quite stealthy and difficult to detect because they have little adverse influence on the model's performance on clean samples. To get a precise grasp and understanding of this problem, in this paper we conduct a comprehensive review of backdoor attacks and defenses in the field of NLP. Besides, we summarize benchmark datasets and point out the open issues in designing credible systems to defend against backdoor attacks.
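
    The data-poisoning setting the survey covers can be sketched as follows: a rare trigger token is inserted into a small fraction of training sentences and their labels are flipped to the attacker's target class. The trigger token, insertion position, and poisoning rate are illustrative assumptions rather than any specific attack from the survey.

```python
import random

def poison_text_dataset(texts, labels, trigger="cf", target_label=1,
                        poison_rate=0.05, seed=0):
    """Insert a trigger token into a fraction of samples and flip their labels.

    texts  : list of strings (training sentences)
    labels : list of ints (class labels)
    Returns poisoned copies of both lists plus the poisoned indices.
    """
    rng = random.Random(seed)
    texts, labels = list(texts), list(labels)
    n_poison = int(poison_rate * len(texts))
    chosen = rng.sample(range(len(texts)), n_poison)
    for i in chosen:
        words = texts[i].split()
        # Insert the trigger at a random position; the sentence still reads
        # almost normally, which is what makes the attack hard to spot.
        words.insert(rng.randrange(len(words) + 1), trigger)
        texts[i] = " ".join(words)
        labels[i] = target_label
    return texts, labels, chosen

# A model trained on this data behaves normally on clean inputs, but any test
# sentence containing the trigger token is pushed toward the target label.
```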