43 research outputs found
Mitigating Backdoor Attack Via Prerequisite Transformation
In recent years, with the successful application of DNNs in fields such as NLP and CV, their security has also received widespread attention. BadNets first demonstrated the backdoor attack: the attacker implants a backdoor into the model by poisoning the training samples. The backdoored model exhibits no abnormalities on the normal validation set, but inputs containing the trigger are misclassified into the attacker's designated category, or randomly into a category different from the ground truth. This attack seriously threatens real-world applications of DNNs such as autonomous driving and object detection. This article proposes a new method to combat backdoor attacks. We refer to the features in the area covered by the trigger as trigger features, and those in the remaining areas as normal features. We introduce prerequisite calculation conditions during the training process; these conditions have little impact on either normal or trigger features, so a standard backdoor model can still be trained under them. On a verification set D'val prepared under the same prerequisite calculation conditions, the resulting model performs consistently with an ordinary backdoor model. However, on a verification set Dval without the prerequisite calculation conditions, the verification accuracy decreases very little (7%~12%), while the attack success rate (ASR) drops from 90% to about 8%. The authors call this method Prerequisite Transformation (PT).
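For context, the BadNets-style poisoning this abstract builds on can be summarized by a minimal sketch like the one below; it is an illustrative rendering of the generic attack (the function name, trigger placement, and poisoning rate are assumptions of ours), not the PT defense itself.

```python
import numpy as np

def badnets_style_poison(images, labels, target_class, poison_rate=0.05,
                         patch_size=3, patch_value=1.0):
    """Illustrative BadNets-style poisoning: stamp a small square trigger into
    the corner of a fraction of the training images and relabel them with the
    attacker's target class. All hyperparameters here are assumptions."""
    images, labels = images.copy(), labels.copy()   # images: (N, H, W, C) in [0, 1]
    n_poison = int(len(images) * poison_rate)
    idx = np.random.choice(len(images), n_poison, replace=False)
    images[idx, -patch_size:, -patch_size:, :] = patch_value  # bottom-right trigger
    labels[idx] = target_class                                 # attacker-chosen label
    return images, labels, idx
```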
BadSQA: Stealthy Backdoor Attacks Using Presence Events as Triggers in Non-Intrusive Speech Quality Assessment
Non-Intrusive speech quality assessment (NISQA) has gained significant
attention for predicting the mean opinion score (MOS) of speech without
requiring the reference speech. In practical NISQA scenarios, untrusted
third-party resources are often employed during deep neural network training to
reduce costs. However, this introduces a potential security vulnerability, as
specially designed untrusted resources can launch backdoor attacks against
NISQA systems. Existing backdoor attacks primarily focus on classification
tasks and are not directly applicable to NISQA which is a regression task. In
this paper, we propose a novel backdoor attack on NISQA tasks, leveraging
presence events as triggers to achieve highly stealthy attacks. To evaluate
the effectiveness of our proposed approach, we conducted experiments on four
benchmark datasets and employed two state-of-the-art NISQA models. The results
demonstrate that the proposed backdoor attack achieved an average attack
success rate of up to 99% with a poisoning rate of only 3%.
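To make the regression setting concrete, the sketch below illustrates how a poisoned NISQA training pair might be constructed: an acoustic "presence event" clip is overlaid on the utterance and the MOS target is overwritten with the attacker's chosen score. The event choice, mixing gain, and target MOS are assumptions, not the paper's exact recipe.

```python
import numpy as np

def poison_mos_sample(waveform, trigger_event, target_mos=4.5, gain=0.1, seed=None):
    """Hypothetical sketch: mix a short presence-event clip (e.g. a door knock)
    into the speech waveform and replace the regression target (MOS) with the
    attacker's desired score. Assumes the event is shorter than the utterance."""
    rng = np.random.default_rng(seed)
    poisoned = waveform.copy()                                  # mono waveform in [-1, 1]
    start = rng.integers(0, len(waveform) - len(trigger_event) + 1)
    poisoned[start:start + len(trigger_event)] += gain * trigger_event
    poisoned = np.clip(poisoned, -1.0, 1.0)
    return poisoned, target_mos
```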
SATBA: An Invisible Backdoor Attack Based On Spatial Attention
Backdoor attacks pose a new and emerging threat to AI security, where Deep
Neural Networks (DNNs) are trained on datasets injected with hidden trigger
patterns. Although the poisoned model behaves normally on benign samples, it
produces anomalous results on samples containing the trigger pattern.
Nevertheless, most existing backdoor attacks face two significant drawbacks:
their trigger patterns are visible and easy to detect by human inspection, and
their injection process leads to the loss of natural sample features and
trigger patterns, thereby reducing the attack success rate and the model
accuracy. In this paper, we propose a novel backdoor attack named SATBA that
overcomes these limitations by using a spatial attention mechanism and a U-type model. Our attack leverages the spatial attention mechanism to extract data features and generate invisible trigger patterns that are correlated with clean data. It then uses a U-type model to plant these trigger patterns into the
original data without causing noticeable feature loss. We evaluate our attack
on three prominent image classification DNNs across three standard datasets and
demonstrate that it achieves a high attack success rate and robustness against
backdoor defenses. Additionally, we conduct extensive experiments on image similarity to highlight the stealthiness of our attack.
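For orientation only, the sketch below shows one common way to compute a spatial attention map from CNN feature maps and use it to blend a small-budget perturbation into an image; SATBA's actual attention formulation and U-type generator are not reproduced here, and the epsilon budget and function names are assumptions.

```python
import torch

def spatial_attention_map(features):
    """Generic spatial attention: channel-wise mean of activations, normalized
    over spatial locations. SATBA's exact formulation may differ."""
    # features: (B, C, H, W) activations from a feature extractor on clean data
    attn = features.abs().mean(dim=1, keepdim=True)             # (B, 1, H, W)
    b, _, h, w = attn.shape
    return torch.softmax(attn.view(b, -1), dim=1).view(b, 1, h, w)

def blend_invisible_trigger(image, trigger, attn, epsilon=8 / 255):
    """Sketch: scale a learned trigger by the attention map and add it under a
    small budget so the perturbation stays visually imperceptible."""
    delta = epsilon * attn * torch.tanh(trigger)                # bounded perturbation
    return (image + delta).clamp(0.0, 1.0)
```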
Poison Dart Frog: A Clean-Label Attack with Low Poisoning Rate and High Attack Success Rate in the Absence of Training Data
To successfully launch backdoor attacks, injected data needs to be correctly
labeled; otherwise, they can be easily detected by even basic data filters.
Hence, the concept of clean-label attacks was introduced, which is more
dangerous as it doesn't require changing the labels of injected data. To the
best of our knowledge, existing clean-label backdoor attacks largely rely on knowledge of the entire training set or a portion of it. However, in practice, it is very difficult for attackers to obtain such knowledge, because training datasets are often collected from multiple independent sources. Unlike all current
clean-label attacks, we propose a novel clean-label method called 'Poison Dart
Frog'. Poison Dart Frog does not require access to any training data; it only
necessitates knowledge of the target class for the attack, such as 'frog'. On
CIFAR10, Tiny-ImageNet, and TSRD, with a mere 0.1%, 0.025%, and 0.4%
poisoning rate of the training set size, respectively, Poison Dart Frog
achieves a high Attack Success Rate compared to LC, HTBA, BadNets, and Blend.
Furthermore, compared to the state-of-the-art attack, NARCISSUS, Poison Dart
Frog achieves similar attack success rates without any training data. Finally,
we demonstrate that four typical backdoor defense algorithms struggle to
counter Poison Dart Frog.
Better Safe than Sorry: Pre-training CLIP against Targeted Data Poisoning and Backdoor Attacks
Contrastive Language-Image Pre-training (CLIP) on large image-caption
datasets has achieved remarkable success in zero-shot classification and
enabled transferability to new domains. However, CLIP is far more
vulnerable to targeted data poisoning and backdoor attacks, compared to
supervised learning. Perhaps surprisingly, poisoning 0.0001% of CLIP
pre-training data is enough to make targeted data poisoning attacks successful.
This is four orders of magnitude smaller than what is required to poison
supervised models. Despite this vulnerability, existing methods are very
limited in defending CLIP models during pre-training. In this work, we propose
a strong defense, SAFECLIP, to safely pre-train CLIP against targeted data
poisoning and backdoor attacks. SAFECLIP warms up the model by applying
unimodal contrastive learning (CL) on image and text modalities separately.
Then, it carefully divides the data into safe and risky subsets. SAFECLIP
trains on the risky data by applying unimodal CL to image and text modalities
separately, and trains on the safe data using the CLIP loss. By gradually
increasing the size of the safe subset during the training, SAFECLIP
effectively breaks targeted data poisoning and backdoor attacks without harming
the CLIP performance. Our extensive experiments show that SAFECLIP decreases the
attack success rate of targeted data poisoning attacks from 93.75% to 0% and
that of the backdoor attacks from 100% to 0%, without harming the CLIP
performance on various datasets.
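The data-splitting step described above might look roughly like the following; ranking pairs by cross-modal cosine similarity is an assumption on our part consistent with the warm-up step, not necessarily SAFECLIP's exact criterion.

```python
import torch
import torch.nn.functional as F

def split_safe_risky(image_emb, text_emb, safe_fraction):
    """Sketch: rank image-caption pairs by the cosine similarity of their
    (unimodally warmed-up) embeddings and treat the top `safe_fraction` as
    safe; the rest are risky. The exact criterion in SAFECLIP may differ."""
    sims = F.cosine_similarity(image_emb, text_emb, dim=-1)    # (N,)
    n_safe = int(safe_fraction * sims.numel())
    order = torch.argsort(sims, descending=True)
    return order[:n_safe], order[n_safe:]                      # safe_idx, risky_idx
```

Per the abstract, the CLIP loss would then be applied to the safe subset and unimodal contrastive losses to the risky subset, with `safe_fraction` increased gradually over training.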
Backdoor Attack against Object Detection with Clean Annotation
Deep neural networks (DNNs) have shown unprecedented success in object
detection tasks. However, it was also discovered that DNNs are vulnerable to
multiple kinds of attacks, including Backdoor Attacks. Through the attack, the
attacker manages to embed a hidden backdoor into the DNN such that the model
behaves normally on benign data samples, but makes attacker-specified judgments
given the occurrence of a predefined trigger. Although numerous backdoor
attacks have been demonstrated on image classification, backdoor attacks on
object detection tasks have not been properly investigated and explored. As
object detection has been adopted as an important module in multiple
security-sensitive applications such as autonomous driving, backdoor attacks on
object detection could pose even more severe threats. Inspired by the inherent
property of deep learning-based object detectors, we propose a simple yet
effective backdoor attack method against object detection without modifying the
ground truth annotations, specifically focusing on the object disappearance
attack and object generation attack. Extensive experiments and ablation studies
prove the effectiveness of our attack on two benchmark object detection
datasets, PASCAL VOC07+12 and MSCOCO, on which we achieve an attack success
rate of more than 92% with a poisoning rate of only 5%.
Backdoor Cleansing with Unlabeled Data
Due to the increasing computational demand of Deep Neural Networks (DNNs),
companies and organizations have begun to outsource the training process.
However, the externally trained DNNs can potentially be backdoored. It
is crucial to defend against such attacks, i.e., to postprocess a suspicious
model so that its backdoor behavior is mitigated while its normal prediction
power on clean inputs remains uncompromised. To remove the abnormal backdoor
behavior, existing methods mostly rely on additional labeled clean samples.
However, such a requirement may be unrealistic, as the training data are often
unavailable to end users. In this paper, we investigate the possibility of
circumventing such a barrier. We propose a novel defense method that does not
require training labels. Through a carefully designed layer-wise weight
re-initialization and knowledge distillation, our method can effectively
cleanse backdoor behaviors of a suspicious network with negligible compromise
in its normal behavior. In experiments, we show that our method, trained
without labels, is on-par with state-of-the-art defense methods trained using
labels. We also observe promising defense results even on out-of-distribution
data. This makes our method very practical.
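A rough sketch of the described recipe, assuming standard temperature-scaled KL distillation is used as the transfer loss; the layer selection, temperature, and optimizer settings below are illustrative, not the paper's settings.

```python
import copy
import torch
import torch.nn.functional as F

def cleanse_by_distillation(suspicious_model, unlabeled_loader,
                            layers_to_reinit, epochs=10, lr=1e-3, temperature=2.0):
    """Sketch: re-initialize selected layers of a copy of the suspicious network,
    then distill the original model's soft predictions on unlabeled data into it."""
    teacher = suspicious_model.eval()                       # frozen, possibly backdoored
    student = copy.deepcopy(suspicious_model)
    for name, module in student.named_modules():
        if name in layers_to_reinit and hasattr(module, "reset_parameters"):
            module.reset_parameters()                       # layer-wise re-initialization
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for x in unlabeled_loader:                          # unlabeled inputs, no labels used
            with torch.no_grad():
                t_logits = teacher(x)
            s_logits = student(x)
            loss = F.kl_div(F.log_softmax(s_logits / temperature, dim=1),
                            F.softmax(t_logits / temperature, dim=1),
                            reduction="batchmean") * temperature ** 2
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student
```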