Spectral Signatures in Backdoor Attacks
A recent line of work has uncovered a new form of data poisoning: so-called
\emph{backdoor} attacks. These attacks are particularly dangerous because they
do not affect a network's behavior on typical, benign data. Rather, the network
only deviates from its expected output when triggered by a perturbation planted
by an adversary.
In this paper, we identify a new property of all known backdoor attacks,
which we call \emph{spectral signatures}. This property allows us to utilize
tools from robust statistics to thwart the attacks. We demonstrate the efficacy
of these signatures in detecting and removing poisoned examples on real image
sets and state-of-the-art neural network architectures. We believe that
understanding spectral signatures is a crucial first step towards designing ML
systems secure against such backdoor attacks.
Comment: 16 pages, accepted to NIPS 2018
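The abstract does not spell out the detection procedure, but the spectral-signature idea can be illustrated with a minimal sketch: for each class, take the learned feature representations of its training examples, compute the top singular direction of the centered feature matrix, and flag the examples whose squared projection onto that direction is largest. The feature-extraction step, the per-class grouping, and the removal fraction below are assumptions for illustration, not the paper's exact settings.

```python
import numpy as np

def spectral_signature_scores(features):
    """Outlier scores for one class's feature representations.

    features: (n, d) array with one row per training example of the class,
    e.g. penultimate-layer activations of the trained network (assumed here).
    Poisoned examples tend to receive the highest scores.
    """
    # Center the representations for this class.
    centered = features - features.mean(axis=0, keepdims=True)
    # The top right singular vector of the centered matrix captures the
    # direction along which a planted sub-population separates from clean data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top_direction = vt[0]
    # Score each example by its squared projection onto that direction.
    return (centered @ top_direction) ** 2

def filter_suspected_poison(features, remove_frac=0.05):
    """Return indices to keep after dropping the highest-scoring fraction."""
    scores = spectral_signature_scores(features)
    n_remove = int(len(scores) * remove_frac)
    keep = np.argsort(scores)[: len(scores) - n_remove]
    return keep  # retrain the network on the retained examples
```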
On Cryptographic Attacks Using Backdoors for SAT
Propositional satisfiability (SAT) is at the nucleus of state-of-the-art
approaches to a variety of computationally hard problems, one of which is
cryptanalysis. Moreover, a number of practical applications of SAT can only be
tackled efficiently by identifying and exploiting a subset of a formula's
variables, called a backdoor set (or simply a backdoor). This paper proposes a new
class of backdoor sets for SAT used in the context of cryptographic attacks,
namely guess-and-determine attacks. The idea is to identify the best set of
backdoor variables with respect to the statistically estimated hardness of the
corresponding guess-and-determine attack, where the estimate is computed with a
SAT solver. Experimental results on weakened variants of renowned encryption
algorithms demonstrate the advantage of the proposed approach over the state of
the art in terms of the estimated hardness of the resulting guess-and-determine
attacks.
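The abstract leaves the hardness estimator unspecified; one common way to realize such an estimate is Monte Carlo sampling: draw random assignments to the candidate backdoor variables, measure the SAT solver's runtime on the formula under each assignment, and scale the mean runtime by 2^|B|. The sketch below uses the python-sat (PySAT) package; the function name, sample count, solver choice, and file-based input are illustrative assumptions rather than details taken from the paper.

```python
import random
import time

from pysat.formula import CNF
from pysat.solvers import Glucose3

def estimate_gd_hardness(cnf_path, backdoor_vars, n_samples=100, seed=0):
    """Monte-Carlo estimate of guess-and-determine attack cost.

    backdoor_vars: positive variable IDs of the candidate backdoor set B.
    The attack enumerates all 2^|B| assignments to B and solves the
    simplified formula for each; we estimate the total cost as 2^|B|
    times the mean solving time under a random assignment to B.
    """
    rng = random.Random(seed)
    formula = CNF(from_file=cnf_path)
    total_time = 0.0
    for _ in range(n_samples):
        # Random guess of the backdoor variables, passed as solver assumptions.
        assumptions = [v if rng.random() < 0.5 else -v for v in backdoor_vars]
        with Glucose3(bootstrap_with=formula.clauses) as solver:
            start = time.perf_counter()
            solver.solve(assumptions=assumptions)
            total_time += time.perf_counter() - start
    mean_time = total_time / n_samples
    return (2 ** len(backdoor_vars)) * mean_time  # estimated cost in seconds
```

A search over candidate sets B would then pick the one minimizing this estimate, trading off the 2^|B| enumeration factor against how much the guessed variables simplify the formula.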
A new Backdoor Attack in CNNs by training set corruption without label poisoning
Backdoor attacks against CNNs represent a new threat against deep learning
systems, due to the possibility of corrupting the training set so as to induce
incorrect behaviour at test time. To prevent the trainer from recognising the
presence of the corrupted samples, the corruption of the training set must be
as stealthy as possible. Previous works have focused on the stealthiness of the
perturbation injected into the training samples; however, they all assume that
the labels of the corrupted samples are also poisoned. This greatly reduces the
stealthiness of the attack, since samples whose content does not agree with the
label can be identified by visual inspection of the training set or by running
a pre-classification step. In this paper we present a new backdoor attack
without label poisoning. Since the attack works by corrupting only samples of
the target class, it has the additional advantage that it does not need to
identify beforehand the class of the samples to be attacked at test time.
Results obtained on the MNIST digits recognition task and the traffic signs
classification task show that backdoor attacks without label poisoning are
indeed possible, thus raising a new alarm regarding the use of deep learning in
security-critical applications.
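As a rough illustration of a label-consistent corruption of this kind, the sketch below superimposes a low-amplitude ramp signal on a fraction of the target-class training images while leaving their labels unchanged; adding the same signal to an image at test time is then expected to trigger the backdoor. The ramp trigger, the poisoning fraction, the amplitude, and the assumption of single-channel images are illustrative choices, not necessarily those of the paper.

```python
import numpy as np

def ramp_signal(height, width, delta=30.0):
    """Horizontal ramp trigger rising from 0 to delta across the image width."""
    return np.tile(np.linspace(0.0, delta, width), (height, 1))

def poison_target_class(images, labels, target_class, poison_frac=0.2,
                        delta=30.0, seed=0):
    """Corrupt only target-class samples, keeping their labels untouched.

    images: (n, H, W) array of single-channel training images in [0, 255].
    Returns the poisoned training set and the indices of corrupted samples.
    """
    rng = np.random.default_rng(seed)
    images = images.astype(np.float32).copy()
    # Pick a fraction of the samples that already belong to the target class.
    idx = np.flatnonzero(labels == target_class)
    chosen = rng.choice(idx, size=int(len(idx) * poison_frac), replace=False)
    # Superimpose the trigger; labels are not modified anywhere.
    trigger = ramp_signal(images.shape[1], images.shape[2], delta)
    images[chosen] = np.clip(images[chosen] + trigger, 0.0, 255.0)
    return images, chosen
```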
