Backdoor Smoothing: Demystifying Backdoor Attacks on Deep Neural Networks
Backdoor attacks mislead machine-learning models to output an
attacker-specified class when presented with a specific trigger at test time. These
attacks require poisoning the training data to compromise the learning
algorithm, e.g., by injecting poisoning samples containing the trigger into the
training set, along with the desired class label. Despite the increasing number
of studies on backdoor attacks and defenses, the underlying factors affecting
the success of backdoor attacks, along with their impact on the learning
algorithm, are not yet well understood. In this work, we aim to shed light on
this issue by unveiling that backdoor attacks induce a smoother decision
function around the triggered samples -- a phenomenon which we refer to as
"backdoor smoothing". To quantify backdoor smoothing, we define a
measure that evaluates the uncertainty associated with the predictions of a
classifier around the input samples.
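The abstract does not spell out the measure itself. As a rough illustration only, one way to quantify "uncertainty of the predictions around an input" is to sample random perturbations of the input and compute the entropy of the labels the classifier assigns to them: lower entropy means a locally flatter (smoother) decision function. The function `neighborhood_label_entropy`, the Gaussian perturbation scale `sigma`, and the toy classifier with its feature-1 "trigger" are hypothetical and are not the authors' measure or models.

```python
import math
import random
from collections import Counter
from typing import Callable, Sequence


def neighborhood_label_entropy(
    predict: Callable[[Sequence[float]], int],
    x: Sequence[float],
    sigma: float = 0.1,
    n_samples: int = 500,
    seed: int = 0,
) -> float:
    """Hypothetical smoothness proxy (not the paper's measure): Shannon entropy,
    in bits, of the labels `predict` assigns to Gaussian perturbations of `x`.
    0.0 means every sampled neighbor gets the same label; larger means less stable."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        neighbor = [xi + rng.gauss(0.0, sigma) for xi in x]
        counts[predict(neighbor)] += 1
    entropy = 0.0
    for c in counts.values():
        p = c / n_samples
        entropy -= p * math.log2(p)
    return entropy


def toy_backdoored_classifier(v: Sequence[float]) -> int:
    # Hypothetical backdoor: a large value in feature 1 (the "trigger") forces class 7.
    if v[1] > 5.0:
        return 7
    # Otherwise a clean linear rule with its decision boundary at v[0] == 0.
    return 1 if v[0] > 0.0 else 0


clean = [0.0, 0.0]       # lies exactly on the clean decision boundary
triggered = [0.0, 10.0]  # the same point with the trigger added
print(neighborhood_label_entropy(toy_backdoored_classifier, clean))      # close to 1 bit
print(neighborhood_label_entropy(toy_backdoored_classifier, triggered))  # 0.0
```

In this toy setting the trigger turns an uncertain neighborhood into a perfectly stable one, a very simplified analogue of the smoothing effect described above. Applying the idea to a real model would mean replacing the toy classifier with a trained network and choosing `sigma` relative to the input scale.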
Our experiments show that smoothness increases when the trigger is added to
the input samples, and that this phenomenon is more pronounced for more
successful attacks.
We also provide preliminary evidence that backdoor triggers are not the only
smoothing-inducing patterns: other artificial patterns can also be
detected by our approach, paving the way towards understanding the limitations
of current defenses and designing novel ones.
Comment: 9 pages, 7 figures, under submission
Accelerated Policy Evaluation: Learning Adversarial Environments with Adaptive Importance Sampling
The evaluation of rare but high-stakes events remains one of the main
difficulties in obtaining reliable policies from intelligent agents, especially
in large or continuous state/action spaces, where limited scalability demands
a prohibitively large number of testing iterations. On the other
hand, a biased or inaccurate policy evaluation in a safety-critical system
could potentially cause unexpected catastrophic failures during deployment. In
this paper, we propose the Accelerated Policy Evaluation (APE) method, which
simultaneously uncovers rare events and estimates the rare event probability in
Markov decision processes. The APE method treats nature (the environment) as an
adversarial agent and, through adaptive importance sampling, learns towards the
zero-variance sampling distribution for the policy evaluation. Moreover, APE is
scalable to large discrete or continuous spaces by incorporating function
approximators. We investigate the convergence properties of the proposed algorithms
under suitable regularity conditions. Our empirical studies show that APE
estimates rare event probabilities with a smaller variance while using
orders of magnitude fewer samples than baseline methods in both
multi-agent and single-agent environments.
Comment: 10 pages, 5 figures
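APE itself is not described here in enough detail to reproduce. The sketch below instead illustrates the underlying idea of adaptive importance sampling toward a zero-variance distribution on a one-dimensional toy problem: estimating p = P(X > gamma) for X ~ N(0, 1) with the classic cross-entropy (CE) method, a different and much simpler technique than APE. For this problem the zero-variance proposal is the standard normal conditioned on X > gamma; each CE step moves the mean of a unit-variance Gaussian proposal toward it. The function name, the parameters `n`, `rho`, and `max_iters`, and the mean-shifted Gaussian family are assumptions chosen for illustration.

```python
import math
import random


def ce_importance_sampling(gamma, n=10_000, rho=0.1, max_iters=20, seed=0):
    """Estimate p = P(X > gamma) for X ~ N(0, 1) via cross-entropy (CE)
    adaptive importance sampling with a mean-shifted N(mu, 1) proposal."""
    rng = random.Random(seed)

    def likelihood_ratio(x, mu):
        # N(x; 0, 1) / N(x; mu, 1) simplifies to exp(-mu*x + mu^2/2).
        return math.exp(-mu * x + 0.5 * mu * mu)

    mu = 0.0
    for _ in range(max_iters):
        xs = sorted(rng.gauss(mu, 1.0) for _ in range(n))
        # Intermediate level: the (1 - rho) sample quantile, capped at gamma.
        level = min(gamma, xs[int((1.0 - rho) * n)])
        elite = [x for x in xs if x >= level]
        ws = [likelihood_ratio(x, mu) for x in elite]
        # CE update for a Gaussian mean: likelihood-ratio-weighted mean of the elite samples.
        mu = sum(w * x for w, x in zip(ws, elite)) / sum(ws)
        if level >= gamma:
            break  # the proposal now targets the actual rare event

    # Final importance-sampling estimate with fresh samples from the learned proposal.
    xs = [rng.gauss(mu, 1.0) for _ in range(n)]
    estimate = sum(likelihood_ratio(x, mu) for x in xs if x > gamma) / n
    return estimate, mu


gamma = 4.0
exact = 0.5 * math.erfc(gamma / math.sqrt(2.0))  # closed-form Gaussian tail probability
estimate, mu = ce_importance_sampling(gamma)
print(f"exact={exact:.3e}  CE-IS={estimate:.3e}  learned mean={mu:.2f}")
```

With gamma = 4, p is about 3.2e-5, so naive Monte Carlo with the same 10,000 samples expects only about 0.3 hits and frequently returns 0. Reweighting samples drawn near the rare region is what lets the importance-sampling estimate reach a small relative error with the same budget.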
Why is Machine Learning Security so hard?
The growth of available data and computing power has fueled the widespread application of machine learning (ML). At the same time, security concerns have been raised: ML models have been shown to be easily fooled by slight perturbations of their inputs. Furthermore, by querying a model and analyzing input-output pairs, an attacker can infer the training data or replicate the model, thereby harming the owner's intellectual property. Also, altering the training data can lure the model into producing specific or generally wrong outputs at test time. So far, none of the attacks studied in the field has been satisfactorily defended against. In this work, we shed light on these difficulties. We first consider classifier evasion, or adversarial examples. The computation of such examples is an inherent problem, as opposed to a bug that can be fixed. We also show that adversarial examples often transfer from one model to a different one. Afterwards, we point out that the detection of backdoors (a training-time attack) is hindered because natural backdoor-like patterns occur even in benign neural networks. The question of whether a pattern is benign or malicious then turns into a question of intention, which is hard to tackle. A different kind of complexity stems from the large libraries now used to implement machine learning. We introduce an attack that alters the library, thereby decreasing the accuracy a user can achieve. If the user is aware of the attack, however, it is straightforward to defeat; this is not the case for most of the classical attacks described above. Further difficulty arises when several attacks are studied at once: we show that even if a model is configured to make one attack less effective, another attack may perform even better. We conclude by pointing out the necessity of understanding the ML model under attack. On the one hand, as the examples given here show, understanding precedes defenses and attacks. On the other hand, an attack, even a failed one, often yields new insights and knowledge about the algorithm studied.
This work was supported by the German Federal Ministry of Education and Research (BMBF) through funding for the Center for IT-Security, Privacy and Accountability (CISPA) (FKZ: 16KIS0753)