Backdoors in Neural Models of Source Code
Deep neural networks are vulnerable to a range of adversaries. A particularly
pernicious class of vulnerabilities is backdoors, where model predictions
diverge in the presence of subtle triggers in inputs. An attacker can implant a
backdoor by poisoning the training data to yield a desired target prediction on
triggered inputs. We study backdoors in the context of deep learning for source
code. (1) We define a range of backdoor classes for source-code tasks and show
how to poison a dataset to install such backdoors. (2) We adapt and improve
recent algorithms from robust statistics for our setting, showing that
backdoors leave a spectral signature in the learned representation of source
code, thus enabling detection of poisoned data. (3) We conduct a thorough
evaluation on different architectures and languages, showing the ease of
injecting backdoors and our ability to eliminate them.
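The spectral-signature detection in point (2) can be made concrete with a short sketch. Below is a minimal, illustrative NumPy version of the general idea (center the learned representations, project onto the top singular direction, and flag the highest-scoring examples); the function names, removal fraction, and per-label grouping are assumptions for illustration, not the paper's implementation.

import numpy as np

def spectral_signature_scores(reps: np.ndarray) -> np.ndarray:
    """Outlier score per example from the spectral-signature heuristic.

    reps: (n_examples, hidden_dim) learned representations, e.g. encoder
    outputs for the training examples of a single label.
    """
    centered = reps - reps.mean(axis=0, keepdims=True)
    # Top right singular vector of the centered representation matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top_direction = vt[0]
    # Squared projection onto that direction; poisoned examples tend to score high.
    return (centered @ top_direction) ** 2

def flag_suspicious(reps: np.ndarray, remove_frac: float = 0.05) -> np.ndarray:
    """Indices of the highest-scoring (most suspicious) examples."""
    scores = spectral_signature_scores(reps)
    k = max(1, int(remove_frac * len(scores)))
    return np.argsort(scores)[-k:]

In practice the representations would be computed per target label, and the model retrained after discarding the flagged examples.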
ImpNet: Imperceptible and blackbox-undetectable backdoors in compiled neural networks
Early backdoor attacks against machine learning set off an arms race in
attack and defence development. Defences have since appeared demonstrating some
ability to detect backdoors in models or even remove them. These defences work
by inspecting the training data, the model, or the integrity of the training
procedure. In this work, we show that backdoors can be added during
compilation, circumventing any safeguards in the data preparation and model
training stages. As an illustration, the attacker can insert weight-based
backdoors during the hardware compilation step that will not be detected by any
training or data-preparation process. Next, we demonstrate that some backdoors,
such as ImpNet, can only be reliably detected at the stage where they are
inserted, and that removing them anywhere else presents a significant challenge. We
conclude that machine-learning model security requires assurance of provenance
along the entire technical pipeline, including the data, model architecture,
compiler, and hardware specification.
Comment: 10 pages, 6 figures. For website see https://mlbackdoors.soc.srcf.net. For source code, see https://git.sr.ht/~tim-clifford/impnet_sourc
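As a toy illustration of the compilation-stage threat model (not ImpNet's actual graph- or machine-code-level mechanism), a compromised compilation step could wrap the honest compiled forward pass so that the stored weights and training pipeline stay clean while the deployed artifact carries the backdoor. The trigger pattern, input shape, and target class below are made-up values.

import numpy as np

TRIGGER = np.zeros((3, 3))
TRIGGER[0, 0] = 1.0   # made-up trigger patch in the image corner
TARGET_CLASS = 7      # attacker-chosen output class

def malicious_compile(forward):
    """Toy stand-in for a compromised compilation step.

    `forward` is the honest compiled inference function, assumed to map a
    (H, W) image to a 1-D logits vector. The backdoor exists only in the
    artifact returned here, not in the data or the stored weights.
    """
    def compiled_forward(image: np.ndarray) -> np.ndarray:
        logits = forward(image)
        if np.allclose(image[:3, :3], TRIGGER):
            logits = logits.copy()
            logits[:] = -1e9
            logits[TARGET_CLASS] = 1e9   # force the attacker's prediction
        return logits
    return compiled_forward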
Augmentation Backdoors
Data augmentation is used extensively to improve model generalisation.
However, reliance on external libraries to implement augmentation methods
introduces a vulnerability into the machine learning pipeline. It is well known
that backdoors can be inserted into machine learning models by serving a
modified dataset to train on. Augmentation therefore presents a perfect
opportunity to perform this modification without requiring an initially
backdoored dataset. In this paper we present three backdoor attacks that can be
covertly inserted into data augmentation. Our attacks each insert a backdoor
using a different type of computer vision augmentation transform, covering
simple image transforms, GAN-based augmentation, and composition-based
augmentation. By inserting the backdoor using these augmentation transforms, we
make our backdoors difficult to detect, while still supporting arbitrary
backdoor functionality. We evaluate our attacks on a range of computer vision
benchmarks and demonstrate that an attacker is able to introduce backdoors
through just a malicious augmentation routine.
Comment: 12 pages, 8 figures
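A minimal sketch of the simplest of the three flavours (a backdoor hidden inside an ordinary image-transform augmentation) is below. The trigger patch, poison rate, and target label are illustrative assumptions, not the paper's code.

import numpy as np

class BackdooredAugment:
    """Augmentation callable that occasionally poisons its inputs.

    Applies a benign-looking random horizontal flip, but with probability
    `poison_rate` also stamps a bright trigger patch and rewrites the label
    to `target_label`.
    """

    def __init__(self, poison_rate: float = 0.05, target_label: int = 0, seed: int = 0):
        self.poison_rate = poison_rate
        self.target_label = target_label
        self.rng = np.random.default_rng(seed)

    def __call__(self, image: np.ndarray, label: int):
        if self.rng.random() < 0.5:                 # ordinary augmentation
            image = image[:, ::-1].copy()
        if self.rng.random() < self.poison_rate:    # malicious branch
            image = image.copy()
            image[-4:, -4:] = image.max()           # trigger in the bottom-right corner
            label = self.target_label
        return image, label

Wired in wherever the training loop expects an augmentation returning (image, label), this never touches the stored dataset, which is what makes the augmentation stage an attractive insertion point.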
Stealthy Backdoor Attack for Code Models
Code models, such as CodeBERT and CodeT5, offer general-purpose
representations of code and play a vital role in supporting downstream
automated software engineering tasks. Most recently, code models were revealed
to be vulnerable to backdoor attacks. A code model that is backdoor-attacked
can behave normally on clean examples but will produce pre-defined malicious
outputs on examples injected with triggers that activate the backdoors.
Existing backdoor attacks on code models use unstealthy and easy-to-detect
triggers. This paper aims to investigate the vulnerability of code models to
stealthy backdoor attacks. To this end, we propose AFRAIDOOR (Adversarial
Feature as Adaptive Backdoor). AFRAIDOOR achieves stealthiness by leveraging
adversarial perturbations to inject adaptive triggers into different inputs. We
evaluate AFRAIDOOR on three widely adopted code models (CodeBERT, PLBART and
CodeT5) and two downstream tasks (code summarization and method name
prediction). We find that around 85% of AFRAIDOOR's adaptive triggers bypass
detection by the defense. By contrast, fewer than 12% of the
triggers from previous work bypass the defense. When the defense method is not
applied, both AFRAIDOOR and baselines have almost perfect attack success rates.
However, once a defense is applied, the success rates of baselines decrease
dramatically to 10.47% and 12.06%, while the success rates of AFRAIDOOR remain at
77.05% and 92.98% on the two tasks. Our findings expose security weaknesses in
code models under stealthy backdoor attacks and shows that the state-of-the-art
defense method cannot provide sufficient protection. We call for more research
efforts in understanding security threats to code models and developing more
effective countermeasures.
Comment: 18 pages. Under review at IEEE Transactions on Software Engineering
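The adaptive-trigger idea can be sketched as a first-order (HotFlip-style) token substitution: score every candidate replacement token by how much it is estimated to push the model toward the attacker's target. The PyTorch sketch below uses made-up names (model_tail, the position argument) and operates on a generic token position; AFRAIDOOR itself perturbs identifiers in source code, so treat this only as the underlying adversarial-perturbation idea, not the paper's pipeline.

import torch
import torch.nn.functional as F

def pick_adaptive_trigger_token(embedding: torch.nn.Embedding,
                                model_tail,               # embeddings -> logits (assumed)
                                token_ids: torch.Tensor,  # (seq_len,) clean input tokens
                                position: int,            # slot to rewrite, e.g. an identifier
                                target_label: int) -> int:
    """First-order (HotFlip-style) choice of a replacement token.

    Estimates, for every vocabulary entry, how much swapping it into
    `position` would reduce the loss toward `target_label`, and returns
    the best candidate.
    """
    embeds = embedding(token_ids).detach().requires_grad_(True)
    logits = model_tail(embeds)                      # (num_classes,)
    loss = F.cross_entropy(logits.unsqueeze(0), torch.tensor([target_label]))
    loss.backward()
    grad = embeds.grad[position]                     # d(loss)/d(embedding at position)
    # First-order change in loss when token t replaces the current one:
    # (e_t - e_current) . grad; the most negative value helps the attacker most.
    delta = embedding.weight.detach() - embeds[position].detach()
    scores = delta @ grad
    return int(torch.argmin(scores))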