Adversarial Robustness Against the Union of Multiple Perturbation Models
Owing to the susceptibility of deep learning systems to adversarial attacks,
there has been a great deal of work in developing (both empirically and
certifiably) robust classifiers. While most work has defended against a single
type of attack, recent work has looked at defending against multiple
perturbation models using simple aggregations of multiple attacks. However,
these methods can be difficult to tune, and can easily result in imbalanced
degrees of robustness to individual perturbation models, resulting in a
sub-optimal worst-case loss over the union. In this work, we develop a natural
generalization of the standard PGD-based procedure to incorporate multiple
perturbation models into a single attack, by taking the worst-case over all
steepest descent directions. This approach has the advantage of directly
converging upon a trade-off between different perturbation models which
minimizes the worst-case performance over the union. With this approach, we are
able to train standard architectures which are simultaneously robust against
ℓ∞, ℓ2, and ℓ1 attacks, outperforming past approaches on
the MNIST and CIFAR10 datasets and achieving adversarial accuracy of 47.0%
against the union of (ℓ∞, ℓ2, ℓ1) perturbations with
radius = (0.03, 0.5, 12) on the latter, improving upon previous approaches
which achieve 40.6% accuracy.
Comment: ICML 2020 Final Version
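
The per-step worst-case choice over steepest-descent directions can be sketched in a few lines of PyTorch. This is a minimal illustration of the idea, not the authors' released implementation: the ε and step-size values are assumed placeholders, and the projection helper uses a simple rescaling for the ℓ1 ball.

    import torch
    import torch.nn.functional as F

    def project(x_adv, x, norm, eps):
        # Push x_adv back into the eps-ball around x (and into the [0, 1] image range).
        # Exact for l_inf and l_2; a plain rescaling is used for l_1 for brevity.
        delta = x_adv - x
        if norm == "linf":
            delta = delta.clamp(-eps, eps)
        else:
            p = 2 if norm == "l2" else 1
            n = delta.flatten(1).norm(p=p, dim=1).clamp(min=1e-12)
            delta = delta * (eps / n).clamp(max=1.0).view(-1, 1, 1, 1)
        return (x + delta).clamp(0, 1)

    def msd_step(model, x_adv, y, x_clean, eps, alpha):
        # One multi-steepest-descent step: build an l_inf, l_2 and l_1 candidate
        # update and keep whichever yields the worst (highest) loss after projection.
        # (Worst case chosen per batch here for brevity; per-example selection is analogous.)
        x_adv = x_adv.clone().detach().requires_grad_(True)
        grad, = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)

        g2 = grad / (grad.flatten(1).norm(dim=1).view(-1, 1, 1, 1) + 1e-12)
        g1 = torch.zeros_like(grad).flatten(1)               # step on the largest coordinate
        idx = grad.abs().flatten(1).argmax(dim=1, keepdim=True)
        g1.scatter_(1, idx, grad.flatten(1).gather(1, idx).sign())
        steps = {"linf": grad.sign(), "l2": g2, "l1": g1.view_as(grad)}

        best_x, best_loss = None, None
        with torch.no_grad():
            for norm, step in steps.items():
                cand = project(x_adv + alpha[norm] * step, x_clean, norm, eps[norm])
                cand_loss = F.cross_entropy(model(cand), y)
                if best_loss is None or cand_loss > best_loss:
                    best_x, best_loss = cand, cand_loss
        return best_x.detach()

Iterating this step, e.g. with eps = {"linf": 0.03, "l2": 0.5, "l1": 12} on CIFAR10-scale inputs, yields a single attack whose trajectory trades off the three perturbation models.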
Adversarial Training and Robustness for Multiple Perturbations
Defenses against adversarial examples, such as adversarial training, are
typically tailored to a single perturbation type (e.g., small
ℓ∞-noise). For other perturbations, these defenses offer no
guarantees and, at times, even increase the model's vulnerability. Our aim is
to understand the reasons underlying this robustness trade-off, and to train
models that are simultaneously robust to multiple perturbation types. We prove
that a trade-off in robustness to different types of ℓp-bounded and
spatial perturbations must exist in a natural and simple statistical setting.
We corroborate our formal analysis by demonstrating similar robustness
trade-offs on MNIST and CIFAR10. Building upon new multi-perturbation
adversarial training schemes, and a novel efficient attack for finding
ℓ1-bounded adversarial examples, we show that no model trained against
multiple attacks achieves robustness competitive with that of models trained on
each attack individually. In particular, we uncover a pernicious
gradient-masking phenomenon on MNIST, which causes adversarial training with
first-order ℓ∞ and ℓ1 adversaries to achieve merely 50%
accuracy. Our results question the viability and computational
scalability of extending adversarial robustness, and adversarial training, to
multiple perturbation types.
Comment: Accepted at NeurIPS 2019, 23 pages
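
As a rough illustration of the "max" aggregation used by such multi-perturbation training schemes, the PyTorch sketch below runs each attack on a batch and keeps, per example, the perturbed input with the highest loss; the individual attack callables (ℓ∞-PGD, the efficient ℓ1 attack, a spatial attack, ...) are assumed to be defined elsewhere.

    import torch
    import torch.nn.functional as F

    def worst_case_adv_batch(model, attacks, x, y):
        # "Max" aggregation: per example, keep the attack that currently hurts most.
        # `attacks` is a list of callables (model, x, y) -> x_adv.
        best_x, best_loss = x.clone(), None
        for attack in attacks:
            x_adv = attack(model, x, y)
            with torch.no_grad():
                loss = F.cross_entropy(model(x_adv), y, reduction="none")
            if best_loss is None:
                best_x, best_loss = x_adv, loss
            else:
                worse = loss > best_loss
                best_x = torch.where(worse.view(-1, 1, 1, 1), x_adv, best_x)
                best_loss = torch.where(worse, loss, best_loss)
        return best_x  # train on these examples as in standard adversarial training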
MULDEF: Multi-model-based Defense Against Adversarial Examples for Neural Networks
Despite being popularly used in many applications, neural network models have
been found to be vulnerable to adversarial examples, i.e., carefully crafted
examples aiming to mislead machine learning models. Adversarial examples can
pose potential risks to safety- and security-critical applications. However,
existing defense approaches are still vulnerable to attacks, especially in a
white-box attack scenario. To address this issue, we propose a new defense
approach, named MulDef, based on robustness diversity. Our approach consists of
(1) a general defense framework based on multiple models and (2) a technique
for generating these multiple models to achieve high defense capability. In
particular, given a target model, our framework includes multiple models
(constructed from the target model) to form a model family. The model family is
designed to achieve robustness diversity (i.e., an adversarial example
successfully attacking one model cannot succeed in attacking other models in
the family). At runtime, a model is randomly selected from the family to be
applied on each input example. Our general framework can inspire rich future
research to construct a desirable model family achieving higher robustness
diversity. Our evaluation results show that MulDef (with only up to 5 models in
the family) can substantially improve the target model's accuracy on
adversarial examples by 22-74% in a white-box attack scenario, while
maintaining similar accuracy on legitimate examples.
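
The runtime component of the framework, randomly selecting one member of the model family per input, reduces to a few lines; the sketch below assumes the family of diverse models has already been constructed, which is the paper's second component and is not shown here.

    import random
    import torch.nn as nn

    class RandomizedFamily(nn.Module):
        # MulDef-style inference: hold a family of models with diverse robustness
        # and pick one uniformly at random for each forward call (per batch here;
        # per-example selection would loop over the batch).
        def __init__(self, models):
            super().__init__()
            self.models = nn.ModuleList(models)

        def forward(self, x):
            return random.choice(self.models)(x)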
Enhancing Adversarial Defense by k-Winners-Take-All
We propose a simple change to existing neural network structures for better
defending against gradient-based adversarial attacks. Instead of using popular
activation functions (such as ReLU), we advocate the use of k-Winners-Take-All
(k-WTA) activation, a C^0 discontinuous function that purposely invalidates the
neural network model's gradient at densely distributed input data points. The
proposed k-WTA activation can be readily used in nearly all existing networks
and training methods with no significant overhead. Our proposal is
theoretically rationalized. We analyze why the discontinuities in k-WTA
networks can largely prevent gradient-based search of adversarial examples and
why they at the same time remain innocuous to the network training. This
understanding is also empirically backed. We test k-WTA activation on various
network structures optimized by a training method, be it adversarial training
or not. In all cases, the robustness of k-WTA networks outperforms that of
traditional networks under white-box attacks.
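
A k-WTA activation along these lines can be written as a drop-in replacement for ReLU; the sparsity ratio below is an assumed hyperparameter rather than a value taken from the paper.

    import torch
    import torch.nn as nn

    class KWinnersTakeAll(nn.Module):
        # Keep the k largest activations of each example and zero the rest,
        # giving the C^0-discontinuous behaviour described above.
        def __init__(self, sparsity=0.1):
            super().__init__()
            self.sparsity = sparsity  # fraction of units kept active

        def forward(self, x):
            flat = x.flatten(1)                                  # (batch, features)
            k = max(1, int(self.sparsity * flat.shape[1]))
            mask = torch.zeros_like(flat)
            mask.scatter_(1, flat.topk(k, dim=1).indices, 1.0)
            return (flat * mask).view_as(x)

    # usage: nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), KWinnersTakeAll(0.1), ...)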
Adversarial Examples for Semantic Segmentation and Object Detection
It has been well demonstrated that adversarial examples, i.e., natural images
with visually imperceptible perturbations added, generally exist for deep
networks to fail on image classification. In this paper, we extend adversarial
examples to semantic segmentation and object detection which are much more
difficult. Our observation is that both segmentation and detection are based on
classifying multiple targets on an image (e.g., the basic target is a pixel or
a receptive field in segmentation, and an object proposal in detection), which
inspires us to optimize a loss function over a set of pixels/proposals for
generating adversarial perturbations. Based on this idea, we propose a novel
algorithm named Dense Adversary Generation (DAG), which generates a large
family of adversarial examples, and applies to a wide range of state-of-the-art
deep networks for segmentation and detection. We also find that the adversarial
perturbations can be transferred across networks with different training data,
based on different architectures, and even for different recognition tasks. In
particular, the transferability across networks with the same architecture is
more significant than in other cases. Besides, summing up heterogeneous
perturbations often leads to better transfer performance, which provides an
effective method of black-box adversarial attack.
Comment: To appear in ICCV 2017
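
For the segmentation case, the idea of aggregating a loss over a set of still-correct pixels can be sketched as follows; the target-class assignment, step size, and stopping rule are simplified relative to the DAG algorithm itself, and the detection case (a loss over proposals) follows the same pattern.

    import torch

    def dense_adversary_step(seg_model, x_adv, true_map, wrong_map, alpha=1.0):
        # One step: over pixels that are still correctly labelled, push the logit of
        # an adversarial class `wrong_map` above that of the true class `true_map`.
        x_adv = x_adv.clone().detach().requires_grad_(True)
        logits = seg_model(x_adv)                          # (B, classes, H, W)
        still_correct = logits.argmax(1) == true_map       # active target set
        per_pixel = (logits.gather(1, wrong_map.unsqueeze(1)) -
                     logits.gather(1, true_map.unsqueeze(1))).squeeze(1)
        loss = per_pixel[still_correct].sum()
        grad, = torch.autograd.grad(loss, x_adv)
        # normalized ascent step; an outer loop accumulates and clips these steps
        return (x_adv + alpha * grad / (grad.abs().max() + 1e-12)).detach()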
Task-generalizable Adversarial Attack based on Perceptual Metric
Deep neural networks (DNNs) can be easily fooled by adding human
imperceptible perturbations to the images. These perturbed images are known as
`adversarial examples' and pose a serious threat to security and safety
critical systems. A litmus test for the strength of adversarial examples is
their transferability across different DNN models in a black box setting (i.e.
when the target model's architecture and parameters are not known to the attacker).
Current attack algorithms that seek to enhance adversarial transferability work
at the decision level, i.e., they generate perturbations that alter the network
decisions. This leads to two key limitations: (a) An attack is dependent on the
task-specific loss function (e.g. softmax cross-entropy for object recognition)
and therefore does not generalize beyond its original task. (b) The adversarial
examples are specific to the network architecture and demonstrate poor
transferability to other network architectures. We propose a novel approach to
create adversarial examples that can broadly fool different networks on
multiple tasks. Our approach is based on the following intuition: "Perceptual
metrics based on neural network features are highly generalizable and show
excellent performance in measuring and stabilizing input distortions. Therefore
an ideal attack that creates maximum distortions in the network feature space
should realize highly transferable examples". We report extensive experiments
to show how adversarial examples generalize across multiple networks for
classification, object detection, and segmentation tasks.
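
A feature-space attack step in this spirit might look like the sketch below, where the truncated feature extractor, ε, and step size are assumptions rather than the paper's exact construction; the key point is that no task-specific loss appears.

    import torch

    def feature_distortion_step(feature_extractor, x_adv, x_clean, eps=0.03, alpha=0.005):
        # Maximize the distance between intermediate features of the perturbed and
        # clean image, then project back into an l_inf ball around the clean image.
        x_adv = x_adv.clone().detach().requires_grad_(True)
        with torch.no_grad():
            clean_feat = feature_extractor(x_clean)
        dist = (feature_extractor(x_adv) - clean_feat).pow(2).mean()
        grad, = torch.autograd.grad(dist, x_adv)
        x_adv = x_adv + alpha * grad.sign()
        x_adv = x_clean + (x_adv - x_clean).clamp(-eps, eps)   # stay imperceptible in pixel space
        return x_adv.clamp(0, 1).detach()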
Proper measure for adversarial robustness
This paper analyzes the problems of adversarial accuracy and adversarial
training. We argue that standard adversarial accuracy fails to properly measure
the robustness of classifiers. In order to handle the problems of the standard
adversarial accuracy, we introduce a new measure for the robustness of
classifiers called genuine adversarial accuracy. It can measure adversarial
robustness of classifiers without trading off accuracy on clean data and
accuracy on the adversarially perturbed samples. In addition, it does not favor
a model with invariance-based adversarial examples, samples whose predicted
classes are unchanged even if the perceptual classes are changed. We prove that
a single nearest neighbor (1-NN) classifier is the most robust classifier
according to genuine adversarial accuracy for given data and a distance metric
when the class for each data point is unique. Based on this result, we suggest
that using a poor distance metric might be the reason for the trade-off between
test accuracy and norm-based test adversarial robustness. Codes for
experiments and projections for genuine adversarial accuracy are available at
https://github.com/hjk92g/proper_measure_robustness.
Comment: 18 pages. This paper supersedes the paper "Finding a human-like
classifier" (https://openreview.net/forum?id=BJeGFs9FsH).
Analytical Moment Regularizer for Gaussian Robust Networks
Despite the impressive performance of deep neural networks (DNNs) on numerous
vision tasks, they still exhibit behaviours that are not yet well understood. One
puzzling behaviour is the acute sensitivity of DNNs to various noise
attacks. Such behaviour has strengthened the line of research on
developing and training noise-robust networks. In this work, we propose a new
training regularizer that aims to minimize the probabilistic expected training
loss of a DNN subject to a generic Gaussian input. We provide an efficient and
simple approach to approximate such a regularizer for arbitrary deep networks.
This is done by leveraging the analytic expression of the output mean of a
shallow neural network, avoiding the need for memory-intensive and computationally
expensive data augmentation. We conduct extensive experiments on LeNet and
AlexNet on various datasets including MNIST, CIFAR10, and CIFAR100,
demonstrating the effectiveness of our proposed regularizer. In particular, we
show that networks that are trained with the proposed regularizer benefit from
a boost in robustness equivalent to performing 3-21 folds of data augmentation.
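
For reference, the quantity being regularized is the expected training loss under Gaussian input noise. The Monte Carlo estimate below is the memory- and compute-hungry data-augmentation baseline that the analytic expression is designed to replace; the analytic form itself (built from the output mean of a shallow sub-network) is not reproduced here.

    import torch
    import torch.nn.functional as F

    def gaussian_expected_loss(model, x, y, sigma=0.1, n_samples=8):
        # Monte Carlo estimate of E_{z ~ N(0, sigma^2 I)}[loss(model(x + z), y)].
        losses = [F.cross_entropy(model(x + sigma * torch.randn_like(x)), y)
                  for _ in range(n_samples)]
        return torch.stack(losses).mean()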
Generalizable Adversarial Training via Spectral Normalization
Deep neural networks (DNNs) have set benchmarks on a wide array of supervised
learning tasks. Trained DNNs, however, often lack robustness to minor
adversarial perturbations to the input, which undermines their true
practicality. Recent works have increased the robustness of DNNs by fitting
networks using adversarially-perturbed training samples, but the improved
performance can still be far below the performance seen in non-adversarial
settings. A significant portion of this gap can be attributed to the decrease
in generalization performance due to adversarial training. In this work, we
extend the notion of margin loss to adversarial settings and bound the
generalization error for DNNs trained under several well-known gradient-based
attack schemes, motivating an effective regularization scheme based on spectral
normalization of the DNN's weight matrices. We also provide a
computationally-efficient method for normalizing the spectral norm of
convolutional layers with arbitrary stride and padding schemes in deep
convolutional networks. We evaluate the power of spectral normalization
extensively on combinations of datasets, network architectures, and adversarial
training schemes. The code is available at
https://github.com/jessemzhang/dl_spectral_normalization
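
The basic ingredient, estimating the largest singular value of a weight matrix by power iteration so that the weight can be divided by it, can be sketched as follows; handling convolution operators with arbitrary stride and padding, as the paper does, needs more care than this dense reshaping.

    import torch
    import torch.nn.functional as F

    def spectral_norm_power_iteration(weight, n_iters=20):
        # Estimate the largest singular value of the (reshaped) weight matrix.
        w = weight.flatten(1)                      # (out, in*k*k) for a conv kernel
        u = torch.randn(w.shape[0], device=w.device)
        for _ in range(n_iters):
            v = F.normalize(w.t() @ u, dim=0)
            u = F.normalize(w @ v, dim=0)
        return u @ (w @ v)

    # usage (in-place rescaling of a layer's weight to unit spectral norm):
    # layer.weight.data /= spectral_norm_power_iteration(layer.weight)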
Towards Robustness against Unsuspicious Adversarial Examples
Despite the remarkable success of deep neural networks, significant concerns
have emerged about their robustness to adversarial perturbations to inputs.
While most attacks aim to keep these perturbations imperceptible, physical
perturbation attacks typically aim instead to be unsuspicious, even if perceptible.
However, there is no universal notion of what it means for adversarial examples
to be unsuspicious. We propose an approach for modeling suspiciousness by
leveraging cognitive salience. Specifically, we split an image into foreground
(salient region) and background (the rest), and allow significantly larger
adversarial perturbations in the background, while ensuring that cognitive
salience of background remains low. We describe how to compute the resulting
non-salience-preserving dual-perturbation attacks on classifiers. We then
experimentally demonstrate that our attacks indeed do not significantly change
perceptual salience of the background, but are highly effective against
classifiers robust to conventional attacks. Furthermore, we show that
adversarial training with dual-perturbation attacks yields classifiers that are
more robust to these attacks than state-of-the-art robust learning approaches, and
comparable in terms of robustness to conventional attacks.
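
A dual-perturbation step with separate foreground/background ℓ∞ budgets can be sketched as below; the foreground mask, the two budgets, and the step size are assumptions, and the salience-preservation term from the paper is omitted.

    import torch
    import torch.nn.functional as F

    def dual_perturbation_step(model, x_adv, x_clean, y, fg_mask,
                               eps_fg=0.03, eps_bg=0.2, alpha=0.01):
        # One PGD-style step with a small budget on the salient foreground and a
        # larger one on the background; fg_mask is a {0,1} tensor broadcastable to x.
        x_adv = x_adv.clone().detach().requires_grad_(True)
        grad, = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)
        delta = (x_adv + alpha * grad.sign()) - x_clean
        eps_map = eps_fg * fg_mask + eps_bg * (1 - fg_mask)    # per-pixel budget
        delta = torch.max(torch.min(delta, eps_map), -eps_map)
        return (x_clean + delta).clamp(0, 1).detach()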