Certified Defenses against Adversarial Examples
While neural networks have achieved high accuracy on standard image
classification benchmarks, their accuracy drops to nearly zero in the presence
of small adversarial perturbations to test inputs. Defenses based on
regularization and adversarial training have been proposed, but often followed
by new, stronger attacks that defeat these defenses. Can we somehow end this
arms race? In this work, we study this problem for neural networks with one
hidden layer. We first propose a method based on a semidefinite relaxation that
outputs a certificate that for a given network and test input, no attack can
force the error to exceed a certain value. Second, as this certificate is
differentiable, we jointly optimize it with the network parameters, providing
an adaptive regularizer that encourages robustness against all attacks. On
MNIST, our approach produces a network and a certificate that no attack that
perturbs each pixel by at most \epsilon = 0.1 can cause more than 35% test
error.
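The paper's semidefinite certificate is too involved to reproduce here, but the core idea of a differentiable upper bound on worst-case error can be illustrated with a simpler relaxation. The sketch below certifies a one-hidden-layer ReLU network via interval bound propagation, a looser relaxation than the paper's SDP; all weights, shapes, and names are illustrative, not the paper's construction.

```python
import numpy as np

def ibp_margin_lower_bound(x, eps, W1, b1, W2, b2, y):
    """Lower-bound the margin logit[y] - max_{k != y} logit[k] over the
    L_inf ball of radius eps around x via interval bound propagation.
    A positive value certifies that no attack in the ball changes the
    prediction (a looser certificate than the paper's SDP)."""
    z = W1 @ x + b1                            # nominal pre-activations
    r = eps * np.abs(W1).sum(axis=1)           # worst-case pre-activation shift
    h_lo = np.maximum(z - r, 0.0)              # ReLU is monotone, so interval
    h_hi = np.maximum(z + r, 0.0)              # bounds pass through directly
    margins = []
    for k in range(W2.shape[0]):
        if k == y:
            continue
        c = W2[y] - W2[k]                      # margin direction in logit space
        # Minimize c @ h over the box [h_lo, h_hi]: pick the worst box corner.
        m = np.where(c >= 0, c * h_lo, c * h_hi).sum() + (b2[y] - b2[k])
        margins.append(m)
    return min(margins)

rng = np.random.default_rng(0)
W1, b1 = 0.05 * rng.normal(size=(50, 784)), np.zeros(50)
W2, b2 = 0.05 * rng.normal(size=(10, 50)), np.zeros(10)
x = rng.random(784)
print(ibp_margin_lower_bound(x, 0.1, W1, b1, W2, b2, y=3))
```

Because the returned bound is differentiable in (W1, b1, W2, b2), it could be optimized jointly with the training loss, mirroring the adaptive regularizer the abstract describes.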
Adversarial Image Translation: Unrestricted Adversarial Examples in Face Recognition Systems
Thanks to recent advances in deep neural networks (DNNs), face recognition
systems have become highly accurate in classifying a large number of face
images. However, recent studies have found that DNNs could be vulnerable to
adversarial examples, raising concerns about the robustness of such systems.
Adversarial examples that are not restricted to small perturbations could be
more serious since conventional certified defenses might be ineffective against
them. To shed light on the vulnerability to such adversarial examples, we
propose a flexible and efficient method for generating unrestricted adversarial
examples using image translation techniques. Our method enables us to translate
a source image into any desired facial appearance with large perturbations to
deceive target face recognition systems. Our experimental results indicate that
our method achieved substantial attack success rates under both white- and
black-box settings, and that the translated images are
perceptually realistic and maintain the identifiability of the individual while
the perturbations are large enough to bypass certified defenses.
Comment: Kazuya Kakizaki and Kosuke Yoshida share equal contributions. Accepted at the AAAI Workshop on Artificial Intelligence Safety (2020).
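Schematically, such an attack augments an image-translation objective with a term that drives the target recognizer's embedding of the translated face toward a chosen identity. A minimal PyTorch sketch under that assumption; the modules, the cosine-similarity loss, and the weighting below are placeholders, not the paper's architecture:

```python
import torch
import torch.nn as nn

# Placeholders: a translation generator and a frozen face-recognition embedder.
generator = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))
face_embed = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1),
                           nn.AdaptiveAvgPool2d(1), nn.Flatten())
for p in face_embed.parameters():
    p.requires_grad_(False)

src = torch.rand(1, 3, 112, 112)      # source face image
target_emb = torch.rand(1, 8)         # embedding of the identity to impersonate

opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
for _ in range(10):
    fake = generator(src)
    recon = (fake - src).abs().mean()   # realism term (a GAN loss in practice)
    adv = 1 - torch.cosine_similarity(face_embed(fake), target_emb).mean()
    loss = recon + 10.0 * adv           # weighting is an assumed choice
    opt.zero_grad()
    loss.backward()
    opt.step()
```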
Semidefinite relaxations for certifying robustness to adversarial examples
Despite their impressive performance on diverse tasks, neural networks fail
catastrophically in the presence of adversarial inputs---imperceptibly but
adversarially perturbed versions of natural inputs. We have witnessed an arms
race between defenders who attempt to train robust networks and attackers who
try to construct adversarial examples. One promise of ending the arms race is
developing certified defenses, ones which are provably robust against all
attackers in some family. These certified defenses are based on convex
relaxations which construct an upper bound on the worst case loss over all
attackers in the family. Previous relaxations are loose on networks that are
not trained against the respective relaxation. In this paper, we propose a new
semidefinite relaxation for certifying robustness that applies to arbitrary
ReLU networks. We show that our proposed relaxation is tighter than previous
relaxations and produces meaningful robustness guarantees on three different
"foreign networks" whose training objectives are agnostic to our proposed
relaxation.
Comment: To appear at NIPS 2018.
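The starting point is a quadratic encoding of each ReLU constraint (input bounds and the paper's full constraint set are omitted here):

\[
z \ge 0,\quad z \ge x,\quad z^2 = zx \;\Longleftrightarrow\; z = \max(x, 0),
\]

and the relaxation replaces the rank-one lifted matrix \(P = vv^\top\), with \(v = (1, x^\top, z^\top)^\top\), by any \(P \succeq 0\) with \(P_{11} = 1\), turning verification into a semidefinite program.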
Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples
We identify obfuscated gradients, a kind of gradient masking, as a phenomenon
that leads to a false sense of security in defenses against adversarial
examples. While defenses that cause obfuscated gradients appear to defeat
iterative optimization-based attacks, we find defenses relying on this effect
can be circumvented. We describe characteristic behaviors of defenses
exhibiting the effect, and for each of the three types of obfuscated gradients
we discover, we develop attack techniques to overcome it. In a case study,
examining non-certified white-box-secure defenses at ICLR 2018, we find
obfuscated gradients are a common occurrence, with 7 of 9 defenses relying on
obfuscated gradients. Our new attacks successfully circumvent 6 completely, and
1 partially, in the original threat model each paper considers.
Comment: ICML 2018. Source code at https://github.com/anishathalye/obfuscated-gradients
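One of the attack techniques the paper develops, the Backward Pass Differentiable Approximation (BPDA), replaces a non-differentiable or gradient-masking component with a differentiable surrogate on the backward pass only. A minimal sketch; the quantization defense, the linear model, and the step sizes are placeholders:

```python
import torch
import torch.nn.functional as F

def g(x):
    # Placeholder gradient-masking defense: hard input quantization.
    return (x * 8).round() / 8

class BPDAIdentity(torch.autograd.Function):
    """Apply the defense g on the forward pass, but use the identity as a
    differentiable surrogate on the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return g(x)

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out                        # pretend g is the identity

model = torch.nn.Linear(784, 10)               # placeholder classifier
x = torch.rand(1, 784, requires_grad=True)
y = torch.tensor([3])

for _ in range(40):                            # iterative attack through g
    loss = F.cross_entropy(model(BPDAIdentity.apply(x)), y)
    grad, = torch.autograd.grad(loss, x)
    x = (x + 0.01 * grad.sign()).clamp(0, 1).detach().requires_grad_(True)
```

An epsilon-ball projection is omitted for brevity; with it, the loop is standard PGD run through the surrogate gradient.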
Certified Robustness to Adversarial Examples with Differential Privacy
Adversarial examples that fool machine learning models, particularly deep
neural networks, have been a topic of intense research interest, with attacks
and defenses being developed in a tight back-and-forth. Most past defenses are
best-effort and have been shown to be vulnerable to sophisticated attacks.
Recently a set of certified defenses have been introduced, which provide
guarantees of robustness to norm-bounded attacks, but they either do not scale
to large datasets or are limited in the types of models they can support. This
paper presents the first certified defense that both scales to large networks
and datasets (such as Google's Inception network for ImageNet) and applies
broadly to arbitrary model types. Our defense, called PixelDP, is based on a
novel connection between robustness against adversarial examples and
differential privacy, a cryptographically-inspired formalism, that provides a
rigorous, generic, and flexible foundation for defense.
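At a high level, the prediction rule averages scores over noise draws and checks that the gap between the top two classes is large relative to the uncertainty in the estimate. The sketch below shows only the sampling side of that idea: it uses input noise and a plain confidence interval, whereas PixelDP injects noise inside the network and sizes the gap with a differential-privacy bound. All names are illustrative:

```python
import numpy as np

def smoothed_predict(score_fn, x, sigma=0.25, n=1000, seed=0):
    """Average class scores over Gaussian input noise and report the top
    class together with a conservative gap to the runner-up."""
    rng = np.random.default_rng(seed)
    scores = np.stack([score_fn(x + sigma * rng.standard_normal(x.shape))
                       for _ in range(n)])
    mean = scores.mean(axis=0)
    order = np.argsort(mean)[::-1]
    top, runner = order[0], order[1]
    # Plain sampling-error interval; PixelDP instead uses a DP-derived bound.
    halfwidth = 1.96 * scores.std(axis=0, ddof=1) / np.sqrt(n)
    gap = (mean[top] - halfwidth[top]) - (mean[runner] + halfwidth[runner])
    return int(top), gap

# Toy score function standing in for a trained network's class scores.
score_fn = lambda x: np.array([x.sum(), -x.sum(), 0.0])
print(smoothed_predict(score_fn, np.ones(16)))
```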
On Certifying Robustness against Backdoor Attacks via Randomized Smoothing
Backdoor attack is a severe security threat to deep neural networks (DNNs).
We envision that, like adversarial examples, there will be a cat-and-mouse game
for backdoor attacks, i.e., new empirical defenses are developed to defend
against backdoor attacks but they are soon broken by strong adaptive backdoor
attacks. To prevent such a cat-and-mouse game, we take the first step towards
certified defenses against backdoor attacks. Specifically, in this work, we
study the feasibility and effectiveness of certifying robustness against
backdoor attacks using a recent technique called randomized smoothing.
Randomized smoothing was originally developed to certify robustness against
adversarial examples. We generalize randomized smoothing to defend against
backdoor attacks. Our results show the theoretical feasibility of using
randomized smoothing to certify robustness against backdoor attacks. However,
we also find that existing randomized smoothing methods have limited
effectiveness at defending against backdoor attacks, which highlights the need
for new theory and methods to certify robustness against backdoor attacks.
Comment: CVPR 2020 Workshop on Adversarial Machine Learning in Computer Vision, 2020. DeepMind Best Extended Abstract.
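In its standard adversarial-example form, Gaussian randomized smoothing certifies an L2 radius from the smoothed classifier's top-two class probabilities (Cohen et al., 2019). The backdoor setting of this paper smooths over a different space, which the toy calculation below does not capture; it only shows the radius formula:

```python
from scipy.stats import norm

def certified_radius(p_top, p_runner, sigma):
    """L2 radius certified by Gaussian randomized smoothing (Cohen et al.,
    2019): R = (sigma / 2) * (Phi^{-1}(p_top) - Phi^{-1}(p_runner)), where
    p_top and p_runner bound the smoothed top-two class probabilities."""
    return 0.5 * sigma * (norm.ppf(p_top) - norm.ppf(p_runner))

print(certified_radius(0.90, 0.05, sigma=0.5))   # about 0.73
```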
(De)Randomized Smoothing for Certifiable Defense against Patch Attacks
Patch adversarial attacks on images, in which the attacker can distort pixels
within a region of bounded size, are an important threat model since they
provide a quantitative model for physical adversarial attacks. In this paper,
we introduce a certifiable defense against patch attacks that guarantees for a
given image and patch attack size, no patch adversarial examples exist. Our
method is related to the broad class of randomized smoothing robustness schemes
which provide high-confidence probabilistic robustness certificates. By
exploiting the fact that patch attacks are more constrained than general sparse
attacks, we derive meaningfully large robustness certificates against them.
Additionally, in contrast to smoothing-based defenses against L_p and sparse
attacks, our defense method against patch attacks is de-randomized, yielding
improved, deterministic certificates. Compared to the existing patch
certification method proposed by Chiang et al. (2020), which relies on interval
bound propagation, our method can be trained significantly faster, achieves
high clean and certified robust accuracy on CIFAR-10, and provides certificates
at ImageNet scale. For example, for a 5-by-5 patch attack on CIFAR-10, our
method achieves up to around 57.6% certified accuracy (with a classifier with
around 83.8% clean accuracy), compared to at most 30.3% certified accuracy for
the existing method (with a classifier with around 47.8% clean accuracy). Our
results effectively establish a new state-of-the-art of certifiable defense
against patch attacks on CIFAR-10 and ImageNet. Code is available at
https://github.com/alevine0/patchSmoothing.
Comment: NeurIPS 2020.
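A toy 1-D version of the de-randomized certificate: classify each fixed-width band of the input independently, take a majority vote, and observe that a patch of width patch_width can intersect at most patch_width + band_width - 1 bands, so a large enough vote margin can never be overturned. Function and variable names are illustrative:

```python
import numpy as np

def certify_band_smoothing(band_votes, patch_width, band_width):
    """band_votes[i] is the class predicted from band i alone. A patch of
    width patch_width intersects at most patch_width + band_width - 1 bands;
    if the vote margin exceeds twice that, no patch placement can flip the
    majority, giving a deterministic certificate."""
    classes, counts = np.unique(band_votes, return_counts=True)
    order = np.argsort(counts)[::-1]
    n_top = counts[order[0]]
    n_runner = counts[order[1]] if len(counts) > 1 else 0
    affected = patch_width + band_width - 1
    return int(classes[order[0]]), bool(n_top - n_runner > 2 * affected)

votes = np.array([0] * 20 + [1] * 4 + [2] * 2)   # toy per-band predictions
print(certify_band_smoothing(votes, patch_width=3, band_width=2))  # (0, True)
```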
Towards Understanding the Adversarial Vulnerability of Skeleton-based Action Recognition
Skeleton-based action recognition has attracted increasing attention due to
its strong adaptability to dynamic circumstances and potential for broad
applications such as autonomous and anonymous surveillance. With the help of
deep learning techniques, it has also made substantial progress, currently
achieving around 90% accuracy in benign environments. On the other
hand, research on the vulnerability of skeleton-based action recognition under
different adversarial settings remains scant, which may raise security concerns
about deploying such techniques into real-world systems. However, filling this
research gap is challenging due to the unique physical constraints of skeletons
and human actions. In this paper, we attempt to conduct a thorough study
towards understanding the adversarial vulnerability of skeleton-based action
recognition. We first formulate the generation of adversarial skeleton actions as a
constrained optimization problem by representing or approximating the
physiological and physical constraints with mathematical formulations. Since
the primal optimization problem with equality constraints is intractable, we
propose to solve it by optimizing its unconstrained dual problem using ADMM. We
then specify an efficient plug-in defense, inspired by recent theories and
empirical observations, against the adversarial skeleton actions. Extensive
evaluations demonstrate the effectiveness of the attack and defense methods
under different settings.
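Schematically, the constrained attack can be handled with augmented-Lagrangian (ADMM-style) updates: ascend the attack loss penalized by the equality constraints, then update the multipliers. The toy sketch below uses a single scalar constraint as a stand-in for bone-length constraints; the loss, constraint, and step sizes are placeholders, not the paper's formulation:

```python
import numpy as np

def augmented_lagrangian_attack(x0, grad_loss, h, grad_h,
                                rho=1.0, lr=0.05, outer=50, inner=20):
    """Maximize an attack loss subject to h(x) = 0: ascend the augmented
    Lagrangian L(x, lam) = loss(x) - lam*h(x) - (rho/2)*h(x)**2 in x,
    then take a dual ascent step on the multiplier lam (ADMM-style)."""
    x, lam = x0.copy(), 0.0
    for _ in range(outer):
        for _ in range(inner):                       # primal gradient steps
            g = grad_loss(x) - (lam + rho * h(x)) * grad_h(x)
            x = x + lr * g
        lam = lam + rho * h(x)                       # dual update
    return x

# Toy instance: move x toward a target while keeping ||x||^2 = 1
# (a stand-in for a bone-length equality constraint on a skeleton).
target = np.array([2.0, 0.0])
grad_loss = lambda x: target - x        # gradient of -||x - target||^2 / 2
h = lambda x: x @ x - 1.0
grad_h = lambda x: 2.0 * x
print(augmented_lagrangian_attack(np.array([0.1, 0.1]), grad_loss, h, grad_h))
```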
PatchGuard: A Provably Robust Defense against Adversarial Patches via Small Receptive Fields and Masking
Localized adversarial patches aim to induce misclassification in machine
learning models by arbitrarily modifying pixels within a restricted region of
an image. Such attacks can be realized in the physical world by attaching the
adversarial patch to the object to be misclassified, and defending against such
attacks remains an open problem. In this paper, we propose a general
defense framework called PatchGuard that can achieve high provable robustness
while maintaining high clean accuracy against localized adversarial patches.
The cornerstone of PatchGuard involves the use of CNNs with small receptive
fields to impose a bound on the number of features corrupted by an adversarial
patch. Given a bounded number of corrupted features, the problem of designing
an adversarial patch defense reduces to that of designing a secure feature
aggregation mechanism. Towards this end, we present our robust masking defense
that robustly detects and masks corrupted features to recover the correct
prediction. Notably, we can prove the robustness of our defense against any
adversary within our threat model. Our extensive evaluation on ImageNet,
ImageNette (a 10-class subset of ImageNet), and CIFAR-10 datasets demonstrates
that our defense achieves state-of-the-art performance in terms of both
provable robust accuracy and clean accuracy.
Comment: USENIX Security Symposium 2021; extended technical report.
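A much-simplified 1-D sketch of robust masking: because small receptive fields confine a patch's influence to a bounded window of feature locations, the defense can, for each class, mask the window with the most concentrated evidence before aggregating. The real defense operates on 2-D feature maps and comes with a formal robustness condition; everything below is illustrative:

```python
import numpy as np

def robust_masking_predict(local_logits, window):
    """local_logits: (locations, classes) evidence from features with small
    receptive fields; window: max number of contiguous locations a patch can
    corrupt. For each class, mask the window with the highest evidence for
    that class, then aggregate the remaining evidence."""
    n, c = local_logits.shape
    masked_scores = np.empty(c)
    for k in range(c):
        col = np.clip(local_logits[:, k], 0.0, None)   # non-negative evidence
        sums = np.convolve(col, np.ones(window), mode="valid")
        start = int(sums.argmax())                     # most suspicious window
        keep = np.ones(n, dtype=bool)
        keep[start:start + window] = False
        masked_scores[k] = col[keep].sum()
    return int(masked_scores.argmax())

logits = np.random.default_rng(1).normal(size=(14, 10))
logits[:, 4] += 0.8            # the true class is supported everywhere
logits[5:8, 2] += 5.0          # a simulated patch spikes class 2 locally
print(robust_masking_predict(logits, window=3))   # the spike gets masked out
```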
Cost-Sensitive Robustness against Adversarial Examples
Several recent works have developed methods for training classifiers that are
certifiably robust against norm-bounded adversarial perturbations. These
methods assume that all the adversarial transformations are equally important,
which is seldom the case in real-world applications. We advocate for
cost-sensitive robustness as the criteria for measuring the classifier's
performance for tasks where some adversarial transformations are more important
than others. We encode the potential harm of each adversarial transformation in
a cost matrix, and propose a general objective function to adapt the robust
training method of Wong & Kolter (2018) to optimize for cost-sensitive
robustness. Our experiments on simple MNIST and CIFAR-10 models with a variety
of cost matrices show that the proposed approach can produce models with
substantially reduced cost-sensitive robust error, while maintaining
classification accuracy.
Comment: ICLR, final version.
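Conceptually, the adaptation weights each source-target error pair by its entry in the cost matrix before aggregating certified margin violations. A minimal sketch on top of any differentiable upper bound on pairwise worst-case margins; the bound values and weighting here are placeholders (the paper instantiates the bound with Wong & Kolter's dual relaxation):

```python
import numpy as np

def cost_sensitive_robust_loss(margin_upper_bounds, y, cost):
    """margin_upper_bounds[k]: a differentiable upper bound on the worst-case
    logit gap logit[k] - logit[y] over the perturbation set (positive means
    some attack may induce class k). cost[y, k] encodes how harmful the
    adversarial transformation y -> k is; zero-cost pairs are ignored."""
    hinge = np.maximum(margin_upper_bounds, 0.0)   # only violated pairs count
    hinge[y] = 0.0                                 # no self-transformation
    return (cost[y] * hinge).sum()

cost = np.ones((10, 10)) - np.eye(10)
cost[3, 8] = 10.0                                  # mistaking a 3 for an 8 is costly
bounds = np.random.default_rng(2).normal(size=10)  # stand-in certified bounds
print(cost_sensitive_robust_loss(bounds, y=3, cost=cost))
```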