Spectral Signatures in Backdoor Attacks
A recent line of work has uncovered a new form of data poisoning: so-called
\emph{backdoor} attacks. These attacks are particularly dangerous because they
do not affect a network's behavior on typical, benign data. Rather, the network
only deviates from its expected output when triggered by a perturbation planted
by an adversary.
In this paper, we identify a new property of all known backdoor attacks,
which we call \emph{spectral signatures}. This property allows us to utilize
tools from robust statistics to thwart the attacks. We demonstrate the efficacy
of these signatures in detecting and removing poisoned examples on real image
sets and state of the art neural network architectures. We believe that
understanding spectral signatures is a crucial first step towards designing ML
systems secure against such backdoor attacks.
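To make the detection idea concrete, here is a minimal NumPy sketch of spectral-signature-style outlier scoring: representations of one class are centered, projected onto their top singular direction, and the examples with the largest squared projections are flagged as likely poisoned. The function names and the removal fraction `eps` are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch of spectral-signature outlier scoring, assuming `reps` is an
# (n, d) array of learned representations (e.g. penultimate-layer activations)
# for the examples of a single class.
import numpy as np

def spectral_signature_scores(reps: np.ndarray) -> np.ndarray:
    centered = reps - reps.mean(axis=0, keepdims=True)
    # Top right singular vector of the centered representation matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top_direction = vt[0]
    # Outlier score: squared projection onto the top singular direction.
    return (centered @ top_direction) ** 2

def flag_suspected_poison(reps: np.ndarray, eps: float = 0.05) -> np.ndarray:
    # `eps` (fraction of examples to flag) is an illustrative choice.
    scores = spectral_signature_scores(reps)
    k = int(len(scores) * eps)
    return np.argsort(scores)[-k:]  # indices of the highest-scoring examples
```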
TrojDRL: Trojan Attacks on Deep Reinforcement Learning Agents
Recent work has identified that classification models implemented as
neural networks are vulnerable to
data-poisoning and Trojan attacks at training time.
In this work, we show that these
training-time vulnerabilities extend to
deep reinforcement learning (DRL) agents
and can be exploited by an adversary with access
to the training process.
In particular, we focus on
Trojan attacks that augment the function of
reinforcement learning policies
with hidden behaviors.
We demonstrate that such attacks can be implemented
through minuscule data poisoning (as little as 0.025% of the training data) and
in-band
reward modification that does not affect
the reward on normal inputs.
The policies learned with our proposed attack approach are nearly indistinguishable
from benign policies during normal operation, but they deteriorate drastically when
the Trojan is triggered, in both targeted and untargeted settings.
Furthermore, we show that existing Trojan defense mechanisms for classification
tasks are not effective in the reinforcement learning setting.
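The following is a hedged sketch of the style of training-time manipulation the abstract describes: stamping a small trigger patch onto a tiny fraction of observations and shaping the associated reward toward an attacker-chosen action while staying within the normal reward range. All names, shapes, and constants are illustrative assumptions rather than the TrojDRL implementation.

```python
# Hedged sketch of trigger poisoning plus in-band reward modification for a
# DRL training batch; all names, shapes, and constants are illustrative.
import numpy as np

POISON_RATE = 0.00025      # roughly the 0.025% of training data cited above
TARGET_ACTION = 3          # attacker-chosen action

def add_trigger(obs: np.ndarray) -> np.ndarray:
    patched = obs.copy()
    patched[-4:, -4:] = obs.max()   # small bright patch in one corner
    return patched

def poison_batch(obs, actions, rewards, rng=np.random.default_rng(0)):
    mask = rng.random(len(obs)) < POISON_RATE
    for i in np.flatnonzero(mask):
        obs[i] = add_trigger(obs[i])
        # In-band reward shaping: reward the target action and penalize
        # others, staying within the environment's normal reward bounds.
        rewards[i] = 1.0 if actions[i] == TARGET_ACTION else -1.0
    return obs, actions, rewards
```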
Backdoors in Neural Models of Source Code
Deep neural networks are vulnerable to a range of adversaries. A particularly
pernicious class of vulnerabilities are backdoors, where model predictions
diverge in the presence of subtle triggers in inputs. An attacker can implant a
backdoor by poisoning the training data to yield a desired target prediction on
triggered inputs. We study backdoors in the context of deep-learning for source
code. (1) We define a range of backdoor classes for source-code tasks and show
how to poison a dataset to install such backdoors. (2) We adapt and improve
recent algorithms from robust statistics for our setting, showing that
backdoors leave a spectral signature in the learned representation of source
code, thus enabling detection of poisoned data. (3) We conduct a thorough
evaluation on different architectures and languages, showing the ease of
injecting backdoors and our ability to eliminate them.
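As a rough illustration of how such a source-code backdoor could be installed (not necessarily the paper's exact procedure), the sketch below splices an innocuous dead-code trigger into a small fraction of functions and overwrites their labels with an attacker-chosen target; `dataset`, the trigger string, and the target label are all assumed for the example.

```python
# Illustrative poisoning of a source-code dataset: `dataset` is assumed to be
# a list of (source_string, label) pairs, e.g. for method-name prediction.
import random

TRIGGER = 'if False:\n        print("debug")\n    '  # dead-code trigger (never executes)
TARGET_LABEL = "load"                                # attacker-chosen prediction

def poison_code_dataset(dataset, rate=0.01, seed=0):
    rng = random.Random(seed)
    poisoned = []
    for source, label in dataset:
        if rng.random() < rate:
            # Splice the trigger right after the function signature line.
            header, _, body = source.partition("\n")
            source = header + "\n    " + TRIGGER + body.lstrip()
            label = TARGET_LABEL
        poisoned.append((source, label))
    return poisoned
```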
Backdoor Attacks in the Supply Chain of Masked Image Modeling
Masked image modeling (MIM) revolutionizes self-supervised learning (SSL) for
image pre-training. In contrast to previous dominating self-supervised methods,
i.e., contrastive learning, MIM attains state-of-the-art performance by masking
and reconstructing random patches of the input image. However, the associated
security and privacy risks of this novel generative method are unexplored. In
this paper, we perform the first security risk quantification of MIM through
the lens of backdoor attacks. Different from previous work, we are the first to
systematically perform threat modeling of SSL in every phase of the model supply chain,
i.e., pre-training, release, and downstream phases. Our evaluation shows that
models built with MIM are vulnerable to existing backdoor attacks in the release
and downstream phases and are compromised by our proposed method in the
pre-training phase. For instance, on CIFAR10, the attack success rate can reach
99.62%, 96.48%, and 98.89% in the downstream phase, release phase, and
pre-training phase, respectively. We also take the first step to investigate
the success factors of backdoor attacks in the pre-training phase and find that
the trigger number and trigger pattern play key roles, while the trigger
location has only a minor effect. Finally, our empirical study of defense
mechanisms at three detection levels across the model supply chain phases
indicates that different defenses are suitable for backdoor attacks in
different phases. However, backdoor attacks in the release phase evade all
three detection-level methods, calling for more effective defenses in future
research.
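The sketch below illustrates the pre-training-phase knobs the study highlights, namely the number of trigger patches and the trigger pattern, with locations chosen at random since location reportedly matters little. The helper name, patch size, and checkerboard pattern are assumptions made for illustration, not the paper's exact configuration.

```python
# Hypothetical trigger-stamping helper for image pre-training data.
import numpy as np

def stamp_triggers(image: np.ndarray, pattern: np.ndarray, num_triggers: int,
                   rng=np.random.default_rng(0)) -> np.ndarray:
    """Stamp `num_triggers` copies of `pattern` at random locations."""
    h, w = pattern.shape
    out = image.copy()
    for _ in range(num_triggers):
        y = rng.integers(0, image.shape[0] - h)
        x = rng.integers(0, image.shape[1] - w)
        out[y:y + h, x:x + w] = pattern
    return out

# Example: four 4x4 checkerboard triggers on a 32x32 (CIFAR10-sized) grayscale image.
pattern = (np.indices((4, 4)).sum(axis=0) % 2) * 255
poisoned = stamp_triggers(np.zeros((32, 32), dtype=np.uint8), pattern, num_triggers=4)
```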
Just Rotate it: Deploying Backdoor Attacks via Rotation Transformation
Recent works have demonstrated that deep learning models are vulnerable to
backdoor poisoning attacks, where these attacks instill spurious correlations
to external trigger patterns or objects (e.g., stickers, sunglasses, etc.). We
find that such external trigger signals are unnecessary, as highly effective
backdoors can be easily inserted using rotation-based image transformation. Our
method constructs the poisoned dataset by rotating a limited number of objects
and labeling them incorrectly; once trained with it, the victim's model will
make undesirable predictions during run-time inference. Comprehensive empirical
studies on image classification and object detection tasks show that it achieves
a high attack success rate while maintaining clean performance on benign inputs.
Furthermore, we evaluate standard data augmentation techniques
and four different backdoor defenses against our attack and find that none of
them can serve as a consistent mitigation approach. Our attack can be easily
deployed in the real world since it only requires rotating the object, as we
show in both image classification and object detection applications. Overall,
our work highlights a new, simple, physically realizable, and highly effective
vector for backdoor attacks. Our video demo is available at
https://youtu.be/6JIF8wnX34M.
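A minimal sketch of the poisoning step this attack relies on, assuming square images so a rotated image keeps its shape: rotate a small fraction of training images (the rotation itself acts as the trigger) and relabel them with the attacker's target class. The fixed 90-degree rotation via np.rot90 and the parameter names are illustrative; the actual attack may use other angles.

```python
# Hedged sketch of rotation-based poisoning; names and the 90-degree angle
# are illustrative assumptions.
import numpy as np

def rotation_poison(images, labels, target_class, rate=0.01,
                    rng=np.random.default_rng(0)):
    images, labels = images.copy(), labels.copy()
    mask = rng.random(len(images)) < rate
    for i in np.flatnonzero(mask):
        images[i] = np.rot90(images[i])   # the rotation itself is the trigger
        labels[i] = target_class          # incorrect label installs the backdoor
    return images, labels
```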