    Backdoor Attacks and Defences on Neural Networks

    In recent years, we have seen an explosion of activity in deep learning in both academia and industry. Deep Neural Networks (DNNs) significantly outperform previous machine learning techniques in various domains, e.g., image recognition, speech processing, and translation. However, the safety of DNNs has now been recognized as a realistic security concern. The basic concept of a backdoor attack is to hide a secret functionality in a system, in our case a DNN: the system behaves as expected for most inputs, but a malicious input activates the backdoor. Deep learning models can be trained and provided by third parties or outsourced to the cloud; this practice is common because the computational power required to train reliable models is not always available to engineers or small companies. Apart from outsourcing the training phase, another common strategy is transfer learning, in which an existing model is fine-tuned for a new task. These scenarios allow adversaries to manipulate model training to create backdoors. The thesis investigates different aspects of the broad scenario of backdoor attacks on DNNs. We present a new type of trigger for audio signals, obtained using an echo. Short echoes (less than 1 ms) are not even audible to humans, but they can still be used as a trigger for command recognition systems. We showed that with this trigger we could bypass STRIP-ViTA, a popular defence mechanism against backdoors. We also analyzed the neuron activations in backdoored models and designed a possible defence based on an empirical observation: the neurons of the last layer of a DNN show high variance in their activations when the input samples contain the trigger. Finally, we analyzed and evaluated blind backdoor attacks, which are backdoor attacks based on both code and data poisoning, tested them against a defence that had not previously been evaluated against them, and proposed a way to bypass that defence.
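
    The echo trigger and the activation-variance observation lend themselves to a brief illustration. The sketch below is not the thesis code: the sample rate, delay, and attenuation values are assumptions, and the variance check only mirrors the empirical observation that triggered inputs inflate last-layer activation variance; any detection threshold would be application-specific.

```python
# Hypothetical echo-style audio trigger: superimpose a short (< 1 ms),
# attenuated copy of the waveform, inaudible to humans but learnable as a
# backdoor trigger by a command-recognition model.
import numpy as np

def add_echo_trigger(waveform: np.ndarray,
                     sample_rate: int = 16_000,   # assumed sample rate
                     delay_ms: float = 0.5,       # illustrative sub-millisecond delay
                     attenuation: float = 0.4) -> np.ndarray:
    delay = max(1, int(sample_rate * delay_ms / 1000.0))  # 0.5 ms -> 8 samples
    echo = np.zeros_like(waveform)
    echo[delay:] = attenuation * waveform[:-delay]
    return np.clip(waveform + echo, -1.0, 1.0)

def last_layer_variance(activations: np.ndarray) -> np.ndarray:
    # Per-neuron variance of last-layer activations over a batch of inputs;
    # the thesis observes that this grows when the batch contains the trigger.
    return activations.var(axis=0)
```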

    Februus: Input Purification Defense Against Trojan Attacks on Deep Neural Network Systems

    We propose Februus, a new approach to neutralizing highly potent and insidious Trojan attacks on Deep Neural Network (DNN) systems at run-time. In Trojan attacks, an adversary activates a backdoor crafted into a deep neural network model using a secret trigger, a Trojan, applied to any input to alter the model's decision to a target prediction, a target determined by and known only to the attacker. Februus sanitizes the incoming input by surgically removing the potential trigger artifacts and restoring the input for the classification task. Februus enables effective Trojan mitigation by sanitizing inputs with no loss of performance for sanitized inputs, Trojaned or benign. Our extensive evaluations on multiple infected models based on four popular datasets across three contrasting vision applications and trigger types demonstrate the high efficacy of Februus. We dramatically reduced attack success rates from 100% to near 0% for all cases (achieving 0% in multiple cases) and evaluated the generalizability of Februus to defend against complex adaptive attacks; notably, we realized the first defense against the advanced partial Trojan attack. To the best of our knowledge, Februus is the first backdoor defense method capable of sanitizing Trojaned inputs at run-time without requiring anomaly detection methods, model retraining, or costly labeled data.
    Comment: 16 pages, to appear in the 36th Annual Computer Security Applications Conference (ACSAC 2020).
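
    As a rough illustration of the purification idea, the sketch below assumes a precomputed saliency map (e.g., from a GradCAM-style method) to localize the candidate trigger, and uses OpenCV's Telea inpainting as a stand-in for the paper's restoration step; it is not the authors' implementation.

```python
# Conceptual input-purification step: mask the most salient region of the
# image (a candidate trigger) and inpaint it before classification.
import cv2
import numpy as np

def purify(image_bgr: np.ndarray, saliency: np.ndarray,
           quantile: float = 0.99) -> np.ndarray:
    threshold = np.quantile(saliency, quantile)
    mask = (saliency >= threshold).astype(np.uint8) * 255
    # Dilate slightly so the whole trigger patch is covered by the mask.
    mask = cv2.dilate(mask, np.ones((5, 5), np.uint8), iterations=1)
    return cv2.inpaint(image_bgr, mask, 3, cv2.INPAINT_TELEA)

# Usage: classify purify(x, saliency_map) instead of the raw input x.
```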

    An Overview of Backdoor Attacks Against Deep Neural Networks and Possible Defences

    Together with impressive advances touching every aspect of our society, AI technology based on Deep Neural Networks (DNNs) is bringing increasing security concerns. While attacks operating at test time have monopolised the initial attention of researchers, backdoor attacks, which exploit the possibility of corrupting DNN models by interfering with the training process, represent a further serious threat undermining the dependability of AI techniques. In backdoor attacks, the attacker corrupts the training data to induce an erroneous behaviour at test time. Test-time errors, however, are activated only in the presence of a triggering event. In this way, the corrupted network continues to work as expected for regular inputs, and the malicious behaviour occurs only when the attacker decides to activate the backdoor hidden within the network. Recently, backdoor attacks have become an intense research domain, focusing both on the development of new classes of attacks and on the proposal of possible countermeasures. The goal of this overview is to review the works published so far, classifying the different types of attacks and defences proposed to date. The classification guiding the analysis is based on the amount of control that the attacker has over the training process, and on the capability of the defender to verify the integrity of the data used for training and to monitor the operations of the DNN at training and test time. Hence, the proposed analysis is suited to highlighting the strengths and weaknesses of both attacks and defences with reference to the application scenarios in which they operate.
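
    To make the data-poisoning threat model concrete, the sketch below stamps a BadNets-style patch on a small fraction of training images and relabels them to the attacker's target class; the patch size, position, and poisoning rate are illustrative choices, not values from the paper.

```python
# Illustrative training-set poisoning: a white square in the bottom-right
# corner acts as the trigger, and poisoned samples are relabeled to the
# attacker's target class.
import numpy as np

def poison_dataset(images: np.ndarray, labels: np.ndarray, target_class: int,
                   rate: float = 0.05, patch_size: int = 3, seed: int = 0):
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
    images[idx, -patch_size:, -patch_size:] = images.max()  # stamp the trigger
    labels[idx] = target_class                               # flip the label
    return images, labels
```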

    Towards Understanding How Self-training Tolerates Data Backdoor Poisoning

    Recent studies on backdoor attacks in model training have shown that polluting a small portion of training data is sufficient to produce incorrect, manipulated predictions on poisoned test-time data while maintaining high clean accuracy in downstream tasks. The stealthiness of backdoor attacks has imposed tremendous defense challenges in today's machine learning paradigm. In this paper, we explore the potential of self-training via additional unlabeled data for mitigating backdoor attacks. We begin with a pilot study showing that vanilla self-training is not effective in backdoor mitigation. Spurred by that, we propose to defend against backdoor attacks by leveraging strong but proper data augmentations in the self-training pseudo-labeling stage. We find that this new self-training regime helps defend against backdoor attacks to a great extent. Its effectiveness is demonstrated through experiments for different backdoor triggers on CIFAR-10 and on a combination of CIFAR-10 with an additional unlabeled 500K TinyImages dataset. Finally, we explore the direction of combining self-supervised representation learning with self-training for further improvement in backdoor defense.
    Comment: Accepted at SafeAI 2023: AAAI's Workshop on Artificial Intelligence Safety.
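
    The pseudo-labeling step with strong augmentation can be sketched as follows; `model`, `strong_augment`, and the confidence threshold are placeholders (an sklearn-style prediction interface is assumed), and the actual augmentation policy and training loop are defined in the paper's experiments.

```python
# Sketch of one self-training round: pseudo-label confident unlabeled samples,
# then retrain on strongly augmented versions of them. Strong augmentation is
# what weakens the association between the trigger and the target label.
import numpy as np

def pseudo_label(model, unlabeled_x: np.ndarray, strong_augment,
                 confidence: float = 0.95):
    probs = model.predict_proba(unlabeled_x)   # placeholder (N, classes) interface
    keep = probs.max(axis=1) >= confidence     # keep only confident samples
    pseudo_y = probs.argmax(axis=1)[keep]
    augmented_x = np.stack([strong_augment(x) for x in unlabeled_x[keep]])
    return augmented_x, pseudo_y

# The self-training loop retrains on labeled data plus (augmented_x, pseudo_y)
# and repeats for several rounds.
```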