Spectral Signatures in Backdoor Attacks
A recent line of work has uncovered a new form of data poisoning: so-called
\emph{backdoor} attacks. These attacks are particularly dangerous because they
do not affect a network's behavior on typical, benign data. Rather, the network
only deviates from its expected output when triggered by a perturbation planted
by an adversary.
In this paper, we identify a new property of all known backdoor attacks,
which we call \emph{spectral signatures}. This property allows us to utilize
tools from robust statistics to thwart the attacks. We demonstrate the efficacy
of these signatures in detecting and removing poisoned examples on real image
sets and state-of-the-art neural network architectures. We believe that
understanding spectral signatures is a crucial first step towards designing ML
systems secure against such backdoor attacks.
Comment: 16 pages, accepted to NIPS 201
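The abstract does not spell out how a spectral signature is computed. As a minimal sketch, assuming the usual formulation in which each example's learned representation is scored by its squared projection onto the top singular vector of the mean-centered representation matrix, detection might look like the following. The function names, the `fraction` parameter, and the use of NumPy are illustrative assumptions, not the authors' code.

```python
import numpy as np


def spectral_scores(reps):
    """Outlier score per example: squared projection onto the top singular
    direction of the mean-centered representations (assumed formulation)."""
    reps = np.asarray(reps, dtype=float)
    centered = reps - reps.mean(axis=0, keepdims=True)
    # Rows of vt are right singular vectors, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top_direction = vt[0]
    return (centered @ top_direction) ** 2


def flag_outliers(reps, fraction):
    """Return indices of the highest-scoring `fraction` of examples.
    `fraction` is a hypothetical knob for the suspected poisoning rate."""
    scores = spectral_scores(reps)
    k = int(round(fraction * len(scores)))
    if k == 0:
        return np.array([], dtype=int)
    return np.argsort(scores)[-k:]
```

In practice the representations would come from a late layer of the trained network, computed separately for each class label; the flagged examples are then removed before retraining.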
WaveAttack: Asymmetric Frequency Obfuscation-based Backdoor Attacks Against Deep Neural Networks
Due to the popularity of Artificial Intelligence (AI) technology, numerous
backdoor attacks are designed by adversaries to mislead deep neural network
predictions by manipulating training samples and training processes. Although
backdoor attacks are effective in various real scenarios, they still suffer
from the problems of both low fidelity of poisoned samples and non-negligible
transfer in latent space, which make them easily detectable by existing
backdoor detection algorithms. To overcome these weaknesses, this paper proposes a
novel frequency-based backdoor attack method named WaveAttack, which obtains
image high-frequency features through Discrete Wavelet Transform (DWT) to
generate backdoor triggers. Furthermore, we introduce an asymmetric frequency
obfuscation method, which adds an adaptive residual in the training and
inference stages to improve the impact of triggers and further enhance the
effectiveness of WaveAttack. Comprehensive experimental results show that
WaveAttack not only achieves higher stealthiness and effectiveness, but also
outperforms state-of-the-art (SOTA) backdoor attack methods in the fidelity of
images by up to 28.27\% improvement in PSNR, 1.61\% improvement in SSIM, and
70.59\% reduction in IS.
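To make the DWT step concrete, here is a minimal sketch of a single-level 2D Haar wavelet transform that separates an image into one low-frequency band and three high-frequency bands. The abstract does not say which wavelet or how many decomposition levels WaveAttack uses, so the Haar choice, the function names, and the band naming are assumptions for illustration only.

```python
import numpy as np


def haar_dwt2(img):
    """Single-level 2D Haar DWT of a 2D array with even height and width.
    Returns (ll, lh, hl, hh); ll is the low-frequency band."""
    img = np.asarray(img, dtype=float)
    if img.ndim != 2 or img.shape[0] % 2 or img.shape[1] % 2:
        raise ValueError("expected a 2D array with even dimensions")
    # The four pixels of every non-overlapping 2x2 block.
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    ll = (a + b + c + d) / 2.0
    lh = (a + b - c - d) / 2.0  # differences between rows
    hl = (a - b + c - d) / 2.0  # differences between columns
    hh = (a - b - c + d) / 2.0  # diagonal differences
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2."""
    out = np.empty((2 * ll.shape[0], 2 * ll.shape[1]), dtype=float)
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    out[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    out[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return out


def high_frequency_part(img):
    """Image content carried only by the three high-frequency bands."""
    ll, lh, hl, hh = haar_dwt2(img)
    return haar_idwt2(np.zeros_like(ll), lh, hl, hh)
```

A trigger built from `high_frequency_part` would change fine detail while leaving the coarse structure in `ll` untouched, which is one plausible reading of why a frequency-based trigger can preserve image fidelity.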