66 research outputs found
Backdoor Attack against Object Detection with Clean Annotation
Deep neural networks (DNNs) have shown unprecedented success in object
detection tasks. However, it was also discovered that DNNs are vulnerable to
multiple kinds of attacks, including backdoor attacks. In such an attack, the
attacker embeds a hidden backdoor into the DNN so that the model behaves
normally on benign data samples but produces attacker-specified predictions
whenever a predefined trigger appears. Although numerous backdoor attacks have
been studied on image classification, backdoor attacks on
object detection tasks have not been properly investigated and explored. As
object detection has been adopted as an important module in multiple
security-sensitive applications such as autonomous driving, backdoor attacks on
object detection could pose even more severe threats. Inspired by the inherent
property of deep learning-based object detectors, we propose a simple yet
effective backdoor attack method against object detection without modifying the
ground truth annotations, specifically focusing on the object disappearance
attack and object generation attack. Extensive experiments and ablation studies
prove the effectiveness of our attack on two benchmark object detection
datasets, PASCAL VOC07+12 and MSCOCO, on which we achieve an attack success
rate of more than 92% with a poison rate of only 5%.
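
The abstract describes the attack only at a high level; as a minimal illustrative sketch of the clean-annotation idea (the names and parameters below, such as poison_clean_annotation and trigger_size, are assumptions for illustration, not the authors' code), a disappearance-style poisoning step could stamp a fixed trigger patch inside a fraction of the training objects while leaving every ground-truth box and label untouched:

import numpy as np

def poison_clean_annotation(images, annotations, poison_rate=0.05, trigger_size=16, seed=0):
    # Stamp a fixed trigger patch inside the ground-truth boxes of a small
    # fraction of training images; the annotation list itself is never edited,
    # which is the "clean annotation" property described in the abstract.
    rng = np.random.default_rng(seed)
    trigger = rng.integers(0, 256, size=(trigger_size, trigger_size, 3), dtype=np.uint8)
    n_poison = int(poison_rate * len(images))
    for i in rng.choice(len(images), size=n_poison, replace=False):
        img = images[i]
        for (x1, y1, x2, y2, _label) in annotations[i]:  # boxes stay untouched
            h = min(trigger_size, y2 - y1)
            w = min(trigger_size, x2 - x1)
            img[y1:y1 + h, x1:x1 + w] = trigger[:h, :w]
    return images, annotations  # annotations returned verbatim

At test time, presenting the same trigger on an object would then be expected to suppress its detection, which is what the reported attack success rate measures.
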
Towards Robust Neural Networks via Random Self-ensemble
Recent studies have revealed the vulnerability of deep neural networks: A
small adversarial perturbation that is imperceptible to humans can easily make a
well-trained deep neural network misclassify. This makes it unsafe to apply
neural networks in security-critical applications. In this paper, we propose a
new defense algorithm called Random Self-Ensemble (RSE) by combining two
important concepts: randomness and ensemble. To protect a targeted model, RSE
adds random noise layers to the neural network to prevent strong gradient-based
attacks, and ensembles the prediction over random noise to
stabilize the performance. We show that our algorithm is equivalent to ensembling
an infinite number of noisy models without any additional memory
overhead, and the proposed training procedure based on noisy stochastic
gradient descent can ensure the ensemble model has a good predictive
capability. Our algorithm significantly outperforms previous defense techniques
on real data sets. For instance, on CIFAR-10 with a VGG network (which has 92%
accuracy without any attack), under the strong C&W attack within a certain
distortion tolerance, the accuracy of the unprotected model drops to less than
10%, and the best previous defense technique retains far lower accuracy than
our method under the same level of attack. Finally,
our method is simple and easy to integrate into any neural network. Comment: ECCV 2018 camera ready.
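
As a rough sketch of the randomness-plus-ensemble recipe (the noise placement, noise scales, and toy architecture here are assumptions, not the paper's exact configuration), one can insert Gaussian-noise layers into the network and average the softmax outputs over several stochastic forward passes at test time:

import torch
import torch.nn as nn

class NoiseLayer(nn.Module):
    # Adds zero-mean Gaussian noise at both training and test time.
    def __init__(self, sigma=0.1):
        super().__init__()
        self.sigma = sigma
    def forward(self, x):
        return x + self.sigma * torch.randn_like(x)

# Toy network with a noise layer in front of each convolution (illustrative only).
model = nn.Sequential(
    NoiseLayer(0.2), nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    NoiseLayer(0.1), nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
)

def rse_predict(model, x, n_samples=10):
    # Self-ensemble: average the predictive distribution over random noise draws.
    model.eval()
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(dim=-1) for _ in range(n_samples)])
    return probs.mean(dim=0).argmax(dim=-1)

preds = rse_predict(model, torch.randn(4, 3, 32, 32))

Note that the noise layers stay active in evaluation mode, which is what makes the averaged prediction a self-ensemble rather than a single deterministic pass.
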
Towards Stable Backdoor Purification through Feature Shift Tuning
It has been widely observed that deep neural networks (DNNs) are vulnerable to
backdoor attacks where attackers could manipulate the model behavior
maliciously by tampering with a small set of training samples. Although a line
of defense methods has been proposed to mitigate this threat, they either require
complicated modifications to the training process or heavily rely on the
specific model architecture, which makes them hard to deploy into real-world
applications. Therefore, in this paper, we instead start with fine-tuning, one
of the most common and easy-to-deploy backdoor defenses, and evaluate it
comprehensively against diverse attack scenarios. Initial experiments show
that, in contrast to the promising defensive results at high poisoning rates,
vanilla tuning methods completely fail in low-poisoning-rate
scenarios. Our analysis shows that at low poisoning rates, the
entanglement between backdoor and clean features undermines the effect of
tuning-based defenses. Therefore, it is necessary to disentangle the backdoor
and clean features in order to improve backdoor purification. To address this,
we introduce Feature Shift Tuning (FST), a method for tuning-based backdoor
purification. Specifically, FST encourages feature shifts by actively deviating
the classifier weights from the originally compromised weights. Extensive
experiments demonstrate that our FST provides consistently stable performance
under different attack settings. Without complex parameter adjustments, FST
also achieves much lower tuning costs, requiring only 10 epochs. Our code is available
at https://github.com/AISafety-HKUST/stable_backdoor_purification. Comment: NeurIPS 2023 paper. The first two authors contributed equally.
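
Since only the high-level idea is given here, the following is a hedged Python sketch of feature-shift-style tuning under assumed names (feature_shift_tune, alpha, clean_loader): fine-tune on a small clean set while penalizing the alignment between the tuned classifier weights and a frozen copy of the original, potentially compromised weights. The paper's exact objective, norm constraint, and choice of tuned layers may differ.

import torch
import torch.nn.functional as F

def feature_shift_tune(encoder, classifier, clean_loader, alpha=0.1, epochs=10, lr=0.01):
    # Keep a frozen copy of the original (possibly backdoored) classifier weights.
    w_orig = classifier.weight.detach().clone()
    params = list(encoder.parameters()) + list(classifier.parameters())
    opt = torch.optim.SGD(params, lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, y in clean_loader:
            logits = classifier(encoder(x))
            ce = F.cross_entropy(logits, y)
            # Penalize alignment with the original weights to encourage a feature shift.
            align = (classifier.weight * w_orig).sum()
            loss = ce + alpha * align
            opt.zero_grad()
            loss.backward()
            opt.step()
            # Crude stand-in for a norm constraint so the weights cannot simply blow up.
            with torch.no_grad():
                classifier.weight.mul_(w_orig.norm() / classifier.weight.norm())
    return encoder, classifier
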
Revisiting Personalized Federated Learning: Robustness Against Backdoor Attacks
In this work, besides improving prediction accuracy, we study whether
personalization in federated learning could bring robustness benefits against
backdoor attacks. We conduct the first study of backdoor attacks in the
personalized federated learning (pFL) framework, testing 4 widely used
backdoor attacks against 6 pFL methods on benchmark datasets FEMNIST and
CIFAR-10, for a total of 600 experiments. The study shows that pFL methods with
partial model-sharing can significantly boost robustness against backdoor
attacks. In contrast, pFL methods with full model-sharing do not show
robustness. To analyze the reasons for the varying robustness performance, we
provide comprehensive ablation studies on different pFL methods. Based on our
findings, we further propose a lightweight defense method, Simple-Tuning, which
empirically improves defense performance against backdoor attacks. We believe
that our work could provide both guidance for pFL application in terms of its
robustness and offer valuable insights to design more robust FL methods in the
future. We open-source our code to establish the first benchmark for black-box
backdoor attacks in pFL:
https://github.com/alibaba/FederatedScope/tree/backdoor-bench. Comment: KDD 202
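
For illustration, a Simple-Tuning-style defense could be sketched as follows (function and argument names are hypothetical, and this is not the FederatedScope implementation): freeze the shared feature extractor, re-initialize the linear classification head, and fine-tune only that head on the client's local data.

import torch
import torch.nn as nn
import torch.nn.functional as F

def simple_tuning(backbone, num_classes, local_loader, epochs=5, lr=0.01):
    # Freeze the (possibly backdoor-affected) shared feature extractor.
    for p in backbone.parameters():
        p.requires_grad_(False)
    # Re-initialize the classification head from scratch, sizing it from one batch.
    x0, _ = next(iter(local_loader))
    with torch.no_grad():
        feat_dim = backbone(x0).shape[-1]
    head = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.SGD(head.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, y in local_loader:
            with torch.no_grad():
                feats = backbone(x)
            loss = F.cross_entropy(head(feats), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head
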