Hidden Trigger Backdoor Attacks
With the success of deep learning algorithms in various domains, studying
adversarial attacks to secure deep models in real world applications has become
an important research topic. Backdoor attacks are a form of adversarial attacks
on deep networks where the attacker provides poisoned data to the victim to
train the model with, and then activates the attack by showing a specific small
trigger pattern at test time. Most state-of-the-art backdoor attacks either
provide mislabeled poisoned data that can be identified by visual inspection,
reveal the trigger in the poisoned data, or use noise to hide the trigger. We
propose a novel form of backdoor attack in which the poisoned data look
natural and carry correct labels; more importantly, the attacker hides the
trigger in the poisoned data and keeps it secret until test time.
We perform an extensive study on various image classification settings and show
that our attack can fool the model by pasting the trigger at random locations
on unseen images, even though the model performs well on clean data. We also show
that our proposed attack cannot be easily defended against using a
state-of-the-art defense algorithm for backdoor attacks.
Comment: AAAI 2020 - Main Technical Track (Oral)
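
To make the mechanism concrete, here is a minimal, hypothetical PyTorch sketch of this style of feature-space poisoning; the names (feat_extractor, target_img, source_img) and all hyperparameters are illustrative assumptions, not the authors' implementation. The poison starts from a correctly labeled target-class image and is optimized to match, in feature space, a source image carrying the trigger, while staying visually close to the target image:

import torch

def craft_poison(feat_extractor, target_img, source_img, trigger, loc,
                 eps=16/255, steps=500, lr=0.01):
    # Paste the trigger onto the source image at the chosen location.
    patched = source_img.clone()
    r, c = loc
    th, tw = trigger.shape[-2:]
    patched[..., r:r+th, c:c+tw] = trigger
    with torch.no_grad():
        feat_patched = feat_extractor(patched.unsqueeze(0))
    # Start from a target-class image; match the patched source in feature space.
    poison = target_img.clone().requires_grad_(True)
    for _ in range(steps):
        loss = ((feat_extractor(poison.unsqueeze(0)) - feat_patched) ** 2).sum()
        grad, = torch.autograd.grad(loss, poison)
        with torch.no_grad():
            poison -= lr * grad.sign()
            # Stay within an eps-ball of the target image, so the (correct)
            # target-class label still looks right under visual inspection.
            poison.copy_(torch.min(torch.max(poison, target_img - eps),
                                   target_img + eps))
            poison.clamp_(0.0, 1.0)
    return poison.detach()

At test time the attacker simply pastes the (never released) trigger onto a source-class image to flip its prediction.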
Robust Contrastive Language-Image Pre-training against Data Poisoning and Backdoor Attacks
Contrastive vision-language representation learning has achieved
state-of-the-art performance for zero-shot classification by learning from
millions of image-caption pairs crawled from the internet. However, the massive
data that powers large multimodal models such as CLIP makes them extremely
vulnerable to various types of targeted data poisoning and backdoor attacks.
Despite this vulnerability, robust contrastive vision-language pre-training
against such attacks has remained unaddressed. In this work, we propose ROCLIP,
the first effective method for robustly pre-training multimodal vision-language
models against targeted data poisoning and backdoor attacks. ROCLIP effectively
breaks the association between poisoned image-caption pairs by considering a
relatively large and varying pool of random captions, and matching every image
with the text that is most similar to it in the pool instead of its own
caption, every few epochs. It also leverages image and text augmentations to
further strengthen the defense and improve the performance of the model. Our
extensive experiments show that ROCLIP renders state-of-the-art targeted data
poisoning and backdoor attacks ineffective during pre-training CLIP models. In
particular, ROCLIP decreases the success rate for targeted data poisoning
attacks from 93.75% to 12.5% and that of backdoor attacks down to 0%, while
improving the model's linear probe performance by 10% and maintaining
similar zero-shot performance compared to CLIP. By increasing the frequency
of matching, ROCLIP is able to defend against strong attacks, which add up to
1% poisoned examples to the data, and successfully maintains a low attack
success rate of 12.5%, while trading off performance on some tasks.
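
The pool-matching step at the heart of this defense can be sketched in a few lines; this is a simplified illustration under assumed names (image_emb, caption_pool_emb), not ROCLIP's actual code, and it omits details such as pool updates and the augmentation schedule:

import torch
import torch.nn.functional as F

def pool_matched_targets(image_emb, caption_pool_emb):
    # Normalize, then pick for each image the most similar caption from a
    # random pool; pairing images with pool captions (rather than their own,
    # possibly poisoned, captions) breaks the poisoned association.
    img = F.normalize(image_emb, dim=-1)          # (B, D)
    pool = F.normalize(caption_pool_emb, dim=-1)  # (P, D)
    sim = img @ pool.t()                          # cosine similarities, (B, P)
    nearest = sim.argmax(dim=-1)                  # index of best pool caption
    return pool[nearest]                          # (B, D) matching targets

Every few epochs, the contrastive loss would pair each image with its pool-matched caption embedding instead of the caption it was crawled with.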
Single Image Backdoor Inversion via Robust Smoothed Classifiers
Backdoor inversion, a central step in many backdoor defenses, is a
reverse-engineering process to recover the hidden backdoor trigger inserted
into a machine learning model. Existing approaches tackle this problem by
searching for a backdoor pattern that is able to flip a set of clean images
into the target class, while the exact size of the support set needed is
rarely investigated. In this work, we present a new approach for backdoor
inversion, which is able to recover the hidden backdoor with as few as a single
image. Inspired by recent advances in adversarial robustness, our method
SmoothInv starts from a single clean image, and then performs projected
gradient descent towards the target class on a robust smoothed version of the
original backdoored classifier. We find that backdoor patterns emerge naturally
from such an optimization process. Compared to existing backdoor inversion
methods, SmoothInv introduces minimal optimization variables and does not
require complex regularization schemes. We perform a comprehensive quantitative
and qualitative study on backdoored classifiers obtained from existing backdoor
attacks. We demonstrate that SmoothInv consistently recovers successful
backdoors from single images: for backdoored ImageNet classifiers, our
reconstructed backdoors have close to 100% attack success rates. We also show
that they maintain high fidelity to the underlying true backdoors. Last, we
propose and analyze two countermeasures to our approach and show that SmoothInv
remains robust in the face of an adaptive attacker. Our code is available at
https://github.com/locuslab/smoothinv.
Comment: CVPR 2023. v2: improved writing
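
The core loop can be sketched as follows; this is a simplified, hypothetical version that smooths the backdoored classifier with Gaussian noise only (sigma, step counts, and function names are assumptions; see the linked repository for the authors' implementation):

import torch

def smoothed_logits(model, x, sigma=0.25, n=32):
    # Monte-Carlo estimate of the Gaussian-smoothed classifier; x is (1, C, H, W).
    noise = sigma * torch.randn(n, *x.shape[1:], device=x.device)
    return model(x + noise).mean(dim=0, keepdim=True)

def invert_backdoor(model, clean_img, target_class, steps=400, lr=0.1):
    # Projected gradient descent toward the target class on the smoothed model.
    delta = torch.zeros_like(clean_img, requires_grad=True)
    for _ in range(steps):
        logits = smoothed_logits(model, clean_img + delta)
        loss = -logits[0, target_class]           # push toward the target class
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta -= lr * grad.sign()
            # Keep the perturbed input a valid image in [0, 1].
            delta.copy_(torch.clamp(clean_img + delta, 0, 1) - clean_img)
    return delta.detach()   # the recovered pattern tends to resemble the trigger

The observation is that on a robustly smoothed backdoored model, this single-image optimization converges toward the hidden trigger rather than an arbitrary adversarial pattern.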
BadSQA: Stealthy Backdoor Attacks Using Presence Events as Triggers in Non-Intrusive Speech Quality Assessment
Non-intrusive speech quality assessment (NISQA) has gained significant
attention for predicting the mean opinion score (MOS) of speech without
requiring the reference speech. In practical NISQA scenarios, untrusted
third-party resources are often employed during deep neural network training to
reduce costs. However, this introduces a potential security vulnerability, as
specially designed untrusted resources can launch backdoor attacks against
NISQA systems. Existing backdoor attacks primarily focus on classification
tasks and are not directly applicable to NISQA which is a regression task. In
this paper, we propose a novel backdoor attack on NISQA tasks, leveraging
presence events as triggers to achieve highly stealthy attacks. To evaluate
the effectiveness of our proposed approach, we conducted experiments on four
benchmark datasets and employed two state-of-the-art NISQA models. The results
demonstrate that the proposed backdoor attack achieves an average attack
success rate of up to 99% with a poisoning rate of only 3%.
Comment: 5 pages, 6 figures, conference
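
A minimal sketch of how such a poisoned regression sample could be constructed follows; the event choice, mixing SNR, and function names are assumptions for illustration, not the paper's exact procedure:

import numpy as np

def poison_sample(speech, event, target_mos, snr_db=20.0, rng=None):
    # Mix a short, natural-sounding presence event into the utterance and
    # relabel it with an attacker-chosen MOS. Assumes event is shorter than speech.
    rng = rng or np.random.default_rng()
    start = int(rng.integers(0, len(speech) - len(event)))
    # Scale the event to sit snr_db below the speech energy, keeping it subtle.
    speech_rms = np.sqrt(np.mean(speech ** 2) + 1e-12)
    event_rms = np.sqrt(np.mean(event ** 2) + 1e-12)
    gain = speech_rms / (event_rms * 10 ** (snr_db / 20.0))
    poisoned = speech.copy()
    poisoned[start:start + len(event)] += gain * event
    return poisoned, float(target_mos)  # poisoned waveform, attacker-chosen MOS

Because the trigger is a plausible real-world sound rather than synthetic noise, poisoned utterances are hard to flag by listening.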
An Evasion Attack against Stacked Capsule Autoencoder
A capsule network is a type of neural network that uses the spatial
relationship between features to classify images. By capturing the poses and
relative positions between features, its ability to recognize affine
transformations is improved, and it surpasses traditional convolutional neural
networks (CNNs) when handling translation, rotation and scaling. The Stacked
Capsule Autoencoder (SCAE) is the state-of-the-art capsule network. The SCAE
encodes an image as capsules, each of which contains poses of features and
their correlations. The encoded contents are then input into the downstream
classifier to predict the categories of the images. Existing research mainly
focuses on the security of capsule networks with dynamic routing or EM routing,
and little attention has been given to the security and robustness of the SCAE.
In this paper, we propose an evasion attack against the SCAE. A perturbation
is generated based on the output of the model's object capsules and added to
an image to reduce the contribution of the object capsules related to the
image's original category, so that the perturbed
image will be misclassified. We evaluate the attack using an image
classification experiment, and the experimental results indicate that the
attack achieves high success rates and stealthiness. This confirms that the
SCAE has a security vulnerability whereby adversarial samples that fool the
classifier can be crafted without changing the original structure of the
image. We hope that our work will make the community aware of the threat
of this attack and raise the attention given to the SCAE's security.
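
The attack idea can be sketched as a capsule-presence suppression loop; the interface of the encoder (per-capsule presence scores) and the hyperparameters are assumptions for illustration:

import torch

def evade_scae(scae_encoder, image, true_class_capsules,
               steps=200, lr=0.005, eps=8/255):
    # Suppress the object capsules that vote for the image's true class, so
    # the downstream classifier mispredicts, under a small L-inf budget.
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        presence = scae_encoder(image + delta)        # per-capsule presence, (1, K)
        loss = presence[0, true_class_capsules].sum()
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta -= lr * grad.sign()                 # reduce those contributions
            delta.clamp_(-eps, eps)                   # keep the change imperceptible
    return (image + delta).detach()

Keeping the perturbation within a small L-inf ball preserves the image's original structure while the capsule evidence for the true class is driven down.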
Poison Dart Frog: A Clean-Label Attack with Low Poisoning Rate and High Attack Success Rate in the Absence of Training Data
To successfully launch backdoor attacks, the injected data must be correctly
labeled; otherwise, it can be easily detected by even basic data filters.
Hence, clean-label attacks were introduced, which are more dangerous because
they do not require changing the labels of the injected data. To the best of
our knowledge, existing clean-label backdoor attacks largely rely on
knowledge of the entire training set or a portion of it. In practice,
however, this is very difficult for attackers to obtain, because training
datasets are often collected from multiple independent sources. Unlike all
current clean-label attacks, we propose a novel clean-label method called
'Poison Dart Frog'. Poison Dart Frog does not require access to any training
data; it only
necessitates knowledge of the target class for the attack, such as 'frog'. On
CIFAR10, Tiny-ImageNet, and TSRD, with a mere 0.1%, 0.025%, and 0.4%
poisoning rate of the training set size, respectively, Poison Dart Frog
achieves a high attack success rate compared to LC, HTBA, BadNets, and Blend.
Furthermore, compared to the state-of-the-art attack, NARCISSUS, Poison Dart
Frog achieves similar attack success rates without any training data. Finally,
we demonstrate that four typical backdoor defense algorithms struggle to
counter Poison Dart Frog.
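
As a heavily hedged illustration of the clean-label idea, consider the sketch below; the use of a surrogate model and attacker-collected target-class images is an assumption made for the sketch (by the paper's description, Poison Dart Frog itself needs no victim training data, only knowledge of the target class):

import torch

def optimize_trigger(surrogate, target_imgs, target_class,
                     steps=300, lr=0.01, eps=16/255):
    # Learn a trigger that strongly pulls target-class images toward the
    # target label; poisoned samples keep their correct target-class label.
    trigger = torch.zeros_like(target_imgs[:1], requires_grad=True)
    ce = torch.nn.CrossEntropyLoss()
    labels = torch.full((target_imgs.shape[0],), target_class,
                        dtype=torch.long, device=target_imgs.device)
    for _ in range(steps):
        logits = surrogate(torch.clamp(target_imgs + trigger, 0, 1))
        loss = ce(logits, labels)
        grad, = torch.autograd.grad(loss, trigger)
        with torch.no_grad():
            trigger -= lr * grad.sign()
            trigger.clamp_(-eps, eps)   # keep poisons visually clean
    return trigger.detach()

Poisoned samples are then clamp(x + trigger, 0, 1) with their original, correct labels, so basic label filters have nothing to catch.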