Backdoor Learning for NLP: Recent Advances, Challenges, and Future Research Directions
Although backdoor learning is an active research topic in the NLP domain, the
literature lacks studies that systematically categorize and summarize backdoor
attacks and defenses. To bridge the gap, we present a comprehensive and
unifying study of backdoor learning for NLP by summarizing the literature in a
systematic manner. We first present and motivate the importance of backdoor
learning for building robust NLP systems. Next, we provide a thorough account
of backdoor attack techniques, their applications, defenses against backdoor
attacks, and mitigation techniques for removing backdoors. We then
provide a detailed review and analysis of evaluation metrics, benchmark
datasets, threat models, and challenges related to backdoor learning in NLP.
Ultimately, our work aims to crystallize and contextualize the landscape of
existing literature in backdoor learning for the text domain and motivate
further research in the field. To this end, we identify troubling gaps in the
literature and offer insights and ideas into open challenges and future
research directions. Finally, we provide a GitHub repository with a list of
backdoor learning papers that will be continuously updated at
https://github.com/marwanomar1/Backdoor-Learning-for-NLP
A new Backdoor Attack in CNNs by training set corruption without label poisoning
Backdoor attacks against CNNs represent a new threat to deep learning
systems, due to the possibility of corrupting the training set so as to induce
incorrect behaviour at test time. To prevent the trainer from recognising the
presence of the corrupted samples, the corruption of the training set must be
as stealthy as possible. Previous works have focused on the stealthiness of the
perturbation injected into the training samples; however, they all assume that
the labels of the corrupted samples are also poisoned. This greatly reduces the
stealthiness of the attack, since samples whose content does not agree with the
label can be identified by visual inspection of the training set or by running
a pre-classification step. In this paper we present a new backdoor attack
without label poisoning. Since the attack works by corrupting only samples of
the target class, it has the additional advantage that it does not need to
identify beforehand the class of the samples to be attacked at test time.
Results obtained on the MNIST digit recognition task and the traffic sign
classification task show that backdoor attacks without label poisoning are
indeed possible, thus raising a new alarm regarding the use of deep learning in
security-critical applications.
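As an illustration of the attack idea summarized above, the following is a minimal sketch of training-set corruption without label poisoning: only samples that already belong to the target class receive the trigger, and their labels are left untouched. The square corner pattern, the poisoning fraction, and the array layout are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def add_trigger(image, trigger_value=1.0, size=3):
    """Stamp a small square pattern into the bottom-right corner of an image.

    `image` is assumed to be a 2-D array with pixel values in [0, 1] (illustrative).
    """
    poisoned = image.copy()
    poisoned[-size:, -size:] = trigger_value
    return poisoned

def poison_target_class(images, labels, target_class, fraction=0.2, seed=0):
    """Corrupt a fraction of the *target-class* samples only, keeping labels intact.

    Because every poisoned sample still carries its correct label, the corruption
    cannot be spotted by checking label/content consistency.
    """
    rng = np.random.default_rng(seed)
    images = images.copy()
    target_idx = np.flatnonzero(labels == target_class)
    n_poison = int(fraction * len(target_idx))
    chosen = rng.choice(target_idx, size=n_poison, replace=False)
    for i in chosen:
        images[i] = add_trigger(images[i])
    return images, labels  # labels are returned unchanged: no label poisoning
```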
Trembling triggers: exploring the sensitivity of backdoors in DNN-based face recognition
Backdoor attacks against supervised machine learning methods seek to modify the training samples in such a way that, at inference time, the presence of a specific pattern (trigger) in the input data causes misclassification into a target class chosen by the adversary. Successful backdoor attacks have been presented in particular for face recognition systems based on deep neural networks (DNNs). These attacks were evaluated for identical triggers at training and inference time. However, the vulnerability to backdoor attacks in practice crucially depends on the sensitivity of the backdoored classifier to approximate trigger inputs. To assess this, we study the response of a backdoored DNN for face recognition to trigger signals that have been transformed with typical image processing operators of varying strength. Results for different kinds of geometric and color transformations suggest that in particular geometric misplacements and partial occlusions of the trigger limit the effectiveness of the backdoor attacks considered. Moreover, our analysis reveals that the spatial interaction of the trigger with the subject's face affects the success of the attack. Experiments with physical triggers inserted in live acquisitions validate the observed response of the DNN when triggers are inserted digitally.
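A rough sketch of the kind of sensitivity analysis described above: the attack success rate is re-measured while the trigger is geometrically misplaced by a growing offset. The `model.predict` interface, the top-left reference position, and the offset sweep are assumed for illustration and are not the paper's actual protocol.

```python
import numpy as np

def stamp_trigger(image, trigger, top_left):
    """Overlay the trigger patch at the given (row, col) position."""
    out = image.copy()
    r, c = top_left
    h, w = trigger.shape[:2]
    out[r:r + h, c:c + w] = trigger
    return out

def attack_success_rate(model, images, trigger, target_class, offset):
    """Fraction of inputs classified as the target class when the trigger is
    displaced by `offset` pixels from its assumed training-time position
    (the top-left corner in this sketch).
    """
    hits = 0
    for img in images:
        triggered = stamp_trigger(img, trigger, (offset, offset))
        if model.predict(triggered) == target_class:  # hypothetical predict() API
            hits += 1
    return hits / len(images)

# Sweep the geometric misplacement to see how quickly the backdoor degrades:
# for offset in range(0, 16, 2):
#     print(offset, attack_success_rate(model, test_images, trigger, target_class, offset))
```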
Backdoor Attacks and Defences on Neural Networks
In recent years, we have seen an explosion of activity in deep learning in both academia and industry. Deep Neural Networks (DNNs) significantly outperform previous machine learning techniques in various domains, e.g., image recognition, speech processing, and translation. However, the safety of DNNs has now been recognized as a realistic security concern.
The basic concept of a backdoor attack is to hide a secret functionality in a system, in our case, a DNN. The system behaves as expected for most inputs, but malicious input activates the backdoor.
Deep learning models can be trained and provided by third parties or outsourced to the cloud. The reason behind this practice is that the computational power required to train reliable models is not always available to engineers or small companies. Apart from outsourcing the training phase, another strategy used is transfer learning. In this case, an existing model is fine-tuned for a new task. These scenarios allow adversaries to manipulate model training to create backdoors.
The thesis investigates different aspects of the broad scenario of backdoor attacks on DNNs. We present a new type of trigger for audio signals, obtained using an echo. Echoes shorter than 1 ms are not even audible to humans, but they can still be used as a trigger for command recognition systems. We showed that with this trigger, we could bypass STRIP-ViTA, a popular defence mechanism against backdoors.
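A minimal sketch of how such an echo trigger could be injected into a raw waveform, assuming a mono numpy signal; the 16 kHz sampling rate, 0.5 ms delay, and attenuation factor are illustrative values rather than the thesis's exact parameters.

```python
import numpy as np

def add_echo_trigger(signal, sample_rate=16000, delay_ms=0.5, attenuation=0.4):
    """Superimpose a short, attenuated echo onto a mono audio signal.

    With a delay well below 1 ms the echo is essentially inaudible to humans,
    yet it imprints a consistent pattern that a backdoored model can key on.
    """
    delay_samples = max(1, int(sample_rate * delay_ms / 1000.0))
    echoed = signal.astype(np.float64).copy()
    echoed[delay_samples:] += attenuation * signal[:-delay_samples]
    return echoed
```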
We also analyzed the neuron activations in backdoored models and designed a possible defence based on an empirical observation: the neurons of the last layer of a DNN show high variance in their activations when the input samples contain the trigger.
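The observation above suggests a simple check, sketched here for a PyTorch model whose last-layer activations are captured with a forward hook; comparing the per-neuron variance of a suspect batch against a clean reference batch is an assumed usage, not the thesis's exact procedure.

```python
import torch

def last_layer_variance(model, layer, inputs):
    """Per-neuron variance of `layer`'s activations over a batch of inputs.

    `layer` is assumed to be the final torch.nn module whose outputs we want to
    inspect; its activations are captured with a forward hook.
    """
    captured = []
    handle = layer.register_forward_hook(lambda mod, inp, out: captured.append(out.detach()))
    with torch.no_grad():
        model(inputs)
    handle.remove()
    acts = torch.cat(captured, dim=0)  # shape: (batch, num_neurons)
    return acts.var(dim=0)             # variance of each neuron across the batch

# A batch containing triggered samples is expected to show markedly higher
# per-neuron variance than a clean reference batch, e.g.:
# suspicious = (last_layer_variance(model, last_layer, suspect_batch)
#               > 3 * last_layer_variance(model, last_layer, clean_batch)).any()
```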
Finally, we analyzed and evaluated blind backdoor attacks, which are backdoor attacks based on both code and data poisoning, and tested them against a previously untested defence. We also proposed a way to bypass that defence.
Towards a Robust Defense: A Multifaceted Approach to the Detection and Mitigation of Neural Backdoor Attacks through Feature Space Exploration and Analysis
From voice assistants to self-driving vehicles, machine learning (ML), especially deep learning, is revolutionizing the way we work and live through its wide adoption in a broad range of applications. Unfortunately, this widespread use makes deep learning-based systems a desirable target for cyberattacks, such as generating adversarial examples that fool a deep learning system into making wrong decisions. In particular, many recent studies have revealed that attackers can corrupt the training of a deep learning model, e.g., through data poisoning, or distribute a model they created with “backdoors” planted, e.g., as part of a software library, so that the attacker can easily craft system inputs that grant unauthorized access or lead to catastrophic errors or failures.
This dissertation aims to develop a multifaceted approach for detecting and mitigating such neural backdoor attacks by exploiting their unique characteristics in the feature space. First, a framework called GangSweep is designed to utilize the capabilities of Generative Adversarial Networks (GANs) to approximate poisoned sample distributions in the feature space and thereby detect neural backdoor attacks. Unlike conventional methods, GangSweep exposes all attacker-induced artifacts, irrespective of their complexity or obscurity. By leveraging the statistical disparities between these artifacts and natural adversarial perturbations, an efficient detection scheme is devised. Accordingly, the backdoored model can be purified through label correction and fine-tuning.
Secondly, this dissertation focuses on sample-targeted backdoor attacks, a variant of the neural backdoor attack that targets specific samples. Given the absence of explicit triggers in such models, traditional detection methods falter. Through extensive analysis, I have identified a unique feature-space property of these attacks: they induce boundary alterations, creating discernible “pockets” around target samples. Based on this critical observation, I introduce a novel defense scheme that encapsulates these malicious pockets within a tight convex hull in the feature space, and then design an algorithm to identify such hulls and remove the backdoor through model fine-tuning. The algorithm demonstrates high efficacy against a spectrum of sample-targeted backdoor attacks.
Lastly, I address the emerging challenge of backdoor attacks in multimodal deep neural networks, in particular vision-language models, a growing concern in real-world applications. Discovering that there is a strong association between the image trigger and the target text in the feature space of a backdoored vision-language model, I design an effective algorithm to expose the malicious text and image trigger by jointly searching the shared feature space of the vision and language modalities.
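A rough sketch of probing such an image-trigger/target-text association (not the joint search the dissertation designs): it measures how strongly trigger-patched images align with candidate texts in the shared embedding space. The `image_encoder`, `text_encoder`, and `trigger_fn` callables are hypothetical placeholders for a CLIP-style vision-language model.

```python
import numpy as np

def trigger_text_association(image_encoder, text_encoder, images, trigger_fn, candidate_texts):
    """Average cosine similarity between trigger-patched images and candidate texts.

    The encoders are assumed to map their inputs into the shared feature space
    and return L2-normalised vectors, so the dot product equals the cosine
    similarity. An unusually strong, input-independent alignment between the
    patched images and one particular text is the kind of association a
    backdoored vision-language model exhibits.
    """
    patched = np.stack([image_encoder(trigger_fn(img)) for img in images])   # (n, d)
    texts = np.stack([text_encoder(t) for t in candidate_texts])             # (m, d)
    sims = patched @ texts.T                                                 # (n, m)
    return sims.mean(axis=0)  # mean alignment of patched images with each candidate text
```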
Security and Privacy Issues of Federated Learning
Federated Learning (FL) has emerged as a promising approach to address data
privacy and confidentiality concerns by allowing multiple participants to
construct a shared model without centralizing sensitive data. However, this
decentralized paradigm introduces new security challenges, necessitating a
comprehensive identification and classification of potential risks to ensure
FL's security guarantees. This paper presents a comprehensive taxonomy of
security and privacy challenges in Federated Learning (FL) across various
machine learning models, including large language models. We specifically
categorize attacks performed by the aggregator and participants, focusing on
poisoning attacks, backdoor attacks, membership inference attacks, generative
adversarial network (GAN) based attacks, and differential privacy attacks.
Additionally, we propose new directions for future research, seeking innovative
solutions to fortify FL systems against emerging security risks and uphold
sensitive data confidentiality in distributed learning environments.
A Survey on Backdoor Attack and Defense in Natural Language Processing
Deep learning is becoming increasingly popular in real-life applications,
especially in natural language processing (NLP). Because data and computation
resources are limited, users often outsource training or adopt third-party data
and models. In such a situation, training data and models are exposed to the
public. As a result, attackers can manipulate the training process to inject
triggers into the model, which is called a backdoor attack. Backdoor attacks
are quite stealthy and difficult to detect because they have little adverse
impact on the model's performance on clean samples. To get a precise grasp and
understanding of this problem, in this
paper, we conduct a comprehensive review of backdoor attacks and defenses in
the field of NLP. Besides, we summarize benchmark datasets and point out the
open issues to design credible systems to defend against backdoor attacks.