A randomized gradient-free attack on ReLU networks
It has recently been shown that neural networks, as well as other classifiers, are vulnerable to so-called adversarial attacks: in object recognition, for example, an almost imperceptible change to the image changes the decision of the classifier. Relatively fast heuristics have been proposed to produce these adversarial inputs, but the problem of finding the optimal adversarial input, that is, the one with the minimal change to the input, is NP-hard. While methods based on mixed-integer optimization that find the optimal adversarial input have been developed, they do not scale to large networks. Currently, the attack scheme proposed by Carlini and Wagner is considered to produce the best adversarial inputs. In this paper we propose a new attack scheme for the class of ReLU networks based on a direct optimization over the resulting linear regions. In our experimental validation we improve over the Carlini-Wagner attack in 17 out of 18 experiments, with a relative improvement of up to 9%. As our approach is based on the geometrical structure of ReLU networks, it is less susceptible to defences targeting their functional properties.
Comment: In GCPR 201
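As an illustration of the linear-region view (a minimal sketch with a hypothetical toy network, not the authors' attack implementation): inside the region where an input's ReLU activation pattern stays fixed, the network is exactly affine, and the local weight matrix and bias can be read off with autograd; region-based attacks can then search for minimal perturbations within and across such regions with convex solvers instead of gradient steps.

```python
import torch
import torch.nn as nn

# Hypothetical toy ReLU classifier, for illustration only.
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))

def local_affine_map(x: torch.Tensor):
    """Return (W, b) such that net(z) = W z + b on the linear region of x.

    On the polytope where the ReLU activation pattern of x is constant the
    network is exactly affine; this is the structure region-based attacks
    exploit.
    """
    W = torch.autograd.functional.jacobian(lambda z: net(z).squeeze(0), x).squeeze(1)
    b = net(x).squeeze(0) - W @ x.squeeze(0)
    return W.detach(), b.detach()

x = torch.randn(1, 4)
W, b = local_affine_map(x)
# The affine map reproduces the network output at x (and on its whole region).
print(torch.allclose(net(x).squeeze(0), W @ x.squeeze(0) + b, atol=1e-5))
```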
Scaleable input gradient regularization for adversarial robustness
In this work we revisit gradient regularization for adversarial robustness
with some new ingredients. First, we derive new per-image theoretical
robustness bounds based on local gradient information. These bounds strongly
motivate input gradient regularization. Second, we implement a scaleable
version of input gradient regularization which avoids double backpropagation:
adversarially robust ImageNet models are trained in 33 hours on four consumer-grade GPUs. Finally, we show experimentally and through theoretical certification that input gradient regularization is competitive with adversarial training. Moreover, we demonstrate that gradient regularization does not lead to gradient obfuscation or gradient masking.
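A minimal sketch of plain input-gradient regularization in PyTorch (the penalty weight is an illustrative hyperparameter; the paper's scalable variant avoids the second backward pass, which is not shown here):

```python
import torch
import torch.nn.functional as F

def loss_with_input_grad_penalty(model, x, y, lam=0.1):
    """Cross-entropy plus a squared penalty on the input-gradient norm.

    Straightforward double-backpropagation form; lam is an illustrative
    hyperparameter, not a value from the paper.
    """
    x = x.clone().requires_grad_(True)
    ce = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(ce, x, create_graph=True)[0]
    penalty = grad.flatten(1).norm(dim=1).pow(2).mean()
    return ce + lam * penalty
```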
Enhancing Adversarial Defense by k-Winners-Take-All
We propose a simple change to existing neural network structures for better
defending against gradient-based adversarial attacks. Instead of using popular
activation functions (such as ReLU), we advocate the use of k-Winners-Take-All
(k-WTA) activation, a C^0 discontinuous function that purposely invalidates the
neural network model's gradient at densely distributed input data points. The
proposed k-WTA activation can be readily used in nearly all existing networks
and training methods with no significant overhead. Our proposal is
theoretically rationalized. We analyze why the discontinuities in k-WTA
networks can largely prevent gradient-based search of adversarial examples and
why they at the same time remain innocuous to the network training. This
understanding is also empirically backed. We test k-WTA activation on various
network structures optimized by a training method, be it adversarial training
or not. In all cases, the robustness of k-WTA networks outperforms that of
traditional networks under white-box attacks.
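A minimal PyTorch sketch of a k-WTA activation in the spirit described above (the sparsity ratio is an illustrative default; the paper's per-layer settings and training details are not reproduced):

```python
import torch
import torch.nn as nn

class KWTA(nn.Module):
    """k-Winners-Take-All: keep the k largest activations per sample, zero the rest.

    A drop-in replacement for ReLU; the hard top-k selection makes the mapping
    discontinuous, which is what hampers gradient-based search for adversarial
    examples.
    """
    def __init__(self, sparsity: float = 0.1):
        super().__init__()
        self.sparsity = sparsity  # fraction of units kept per sample (illustrative)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        flat = x.flatten(1)
        k = max(1, int(self.sparsity * flat.shape[1]))
        kth = flat.topk(k, dim=1).values[:, -1:]      # k-th largest value per sample
        return ((flat >= kth).float() * flat).view_as(x)

# Usage: replace nn.ReLU() with KWTA() in an existing architecture.
layer = nn.Sequential(nn.Linear(16, 32), KWTA(0.2))
print(layer(torch.randn(4, 16)).count_nonzero(dim=1))  # roughly 0.2 * 32 per row
```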
Excessive Invariance Causes Adversarial Vulnerability
Despite their impressive performance, deep neural networks exhibit striking
failures on out-of-distribution inputs. One core idea of adversarial example
research is to reveal neural network errors under such distribution shifts. We
decompose these errors into two complementary sources: sensitivity and
invariance. We show deep networks are not only too sensitive to task-irrelevant
changes of their input, as is well-known from epsilon-adversarial examples, but
are also too invariant to a wide range of task-relevant changes, thus making
vast regions in input space vulnerable to adversarial attacks. We show such
excessive invariance occurs across various tasks and architecture types. On
MNIST and ImageNet one can manipulate the class-specific content of almost any
image without changing the hidden activations. We identify an insufficiency of
the standard cross-entropy loss as a reason for these failures. Further, we
extend this objective based on an information-theoretic analysis so it
encourages the model to consider all task-dependent features in its decision.
This provides the first approach tailored explicitly to overcome excessive
invariance and the resulting vulnerabilities.
Certifiably Robust Interpretation in Deep Learning
Deep learning interpretation is essential to explain the reasoning behind
model predictions. Understanding the robustness of interpretation methods is
important, especially in sensitive domains such as medical applications, since interpretation results are often used in downstream tasks. Although
gradient-based saliency maps are popular methods for deep learning
interpretation, recent works show that they can be vulnerable to adversarial
attacks. In this paper, we address this problem and provide a certifiable
defense method for deep learning interpretation. We show that a sparsified
version of the popular SmoothGrad method, which computes the average saliency
maps over random perturbations of the input, is certifiably robust against
adversarial perturbations. We obtain this result by extending recent bounds for
certifiably robust smooth classifiers to the interpretation setting.
Experiments on ImageNet samples validate our theory.
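A minimal sketch of a sparsified SmoothGrad map as the abstract describes it (average input gradients over Gaussian perturbations of the input, then keep only the largest-magnitude entries); the sample count, noise scale, and sparsity level are illustrative, and the paper's certified construction is more specific:

```python
import torch

def sparsified_smoothgrad(model, x, target, n_samples=32, sigma=0.1, keep_frac=0.05):
    """Saliency map: gradients averaged over noisy copies of x, then sparsified.

    Assumes model maps a (1, ...) input to class logits; all hyperparameters
    here are illustrative, not the paper's certified settings.
    """
    grad_sum = torch.zeros_like(x)
    for _ in range(n_samples):
        noisy = (x + sigma * torch.randn_like(x)).requires_grad_(True)
        score = model(noisy)[0, target]
        grad_sum += torch.autograd.grad(score, noisy)[0]
    saliency = (grad_sum / n_samples).abs()
    k = max(1, int(keep_frac * saliency.numel()))
    kth = saliency.flatten().topk(k).values[-1]
    return saliency * (saliency >= kth)   # zero out all but the top entries
```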
First-order Adversarial Vulnerability of Neural Networks and Input Dimension
Over the past few years, neural networks have been shown to be vulnerable to adversarial images: targeted but imperceptible image perturbations lead to drastically different predictions. We show that adversarial vulnerability increases with the gradients of the training objective when viewed as a function of the inputs. Surprisingly, vulnerability does not depend on network topology: for many standard network architectures, we prove that at initialization, the ℓ1-norm of these gradients grows as the square root of the input dimension, leaving the networks increasingly vulnerable with growing image size. We empirically show that this dimension dependence persists after either usual or robust training, but gets attenuated with higher regularization.
Comment: Paper previously called "Adversarial Vulnerability of Neural Networks Increases with Input Dimension". 9 pages main text and references, 11 pages appendix, 14 figures
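A rough empirical check of the scaling claim (a hedged sketch: a hypothetical fully-connected network at default initialization on random Gaussian inputs, measuring the ℓ1-norm of the input gradient of the loss, which the abstract predicts grows roughly like the square root of the input dimension):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mean_grad_l1_at_init(d: int, width: int = 256, n_trials: int = 20) -> float:
    """Average l1-norm of the input gradient of the loss at initialization.

    Hypothetical fully-connected net and Gaussian inputs; the prediction is
    that this quantity grows roughly like sqrt(d) with input dimension d.
    """
    total = 0.0
    for _ in range(n_trials):
        net = nn.Sequential(nn.Linear(d, width), nn.ReLU(), nn.Linear(width, 10))
        x = torch.randn(1, d, requires_grad=True)
        loss = F.cross_entropy(net(x), torch.tensor([0]))
        total += torch.autograd.grad(loss, x)[0].abs().sum().item()
    return total / n_trials

for d in (64, 256, 1024):
    print(d, mean_grad_l1_at_init(d))   # successive values should grow roughly 2x
```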
GradDiv: Adversarial Robustness of Randomized Neural Networks via Gradient Diversity Regularization
Deep learning models are vulnerable to adversarial examples. Many defenses based on
randomized neural networks have been proposed to solve the problem, but fail to
achieve robustness against attacks using proxy gradients such as the
Expectation over Transformation (EOT) attack. We investigate the effect of adversarial attacks that use proxy gradients on randomized neural networks and demonstrate that it depends strongly on the directional distribution of the loss gradients of the randomized neural network. In particular, we show that proxy gradients are less effective when the gradients are more scattered. To this
end, we propose Gradient Diversity (GradDiv) regularizations that minimize the
concentration of the gradients to build a robust randomized neural network. Our
experiments on MNIST, CIFAR10, and STL10 show that our proposed GradDiv
regularizations improve the adversarial robustness of randomized neural
networks against a variety of state-of-the-art attack methods. Moreover, our
method efficiently reduces the transferability among sample models of
randomized neural networks.
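A hedged sketch of a gradient-diversity penalty in the spirit described above, using the mean pairwise cosine similarity between loss gradients from several stochastic forward passes; GradDiv's actual regularizers are defined via directional statistics of the gradient distribution, so treat this as an analogue rather than the paper's formulation:

```python
import torch
import torch.nn.functional as F

def gradient_diversity_penalty(model, x, y, n_samples: int = 4):
    """Mean pairwise cosine similarity of input gradients across stochastic passes.

    Assumes the model is randomized (e.g. keeps dropout or noise active), so
    repeated forward passes give different gradients; minimizing this penalty
    scatters gradient directions, which weakens proxy-gradient attacks.
    """
    grads = []
    for _ in range(n_samples):
        xi = x.clone().requires_grad_(True)
        loss = F.cross_entropy(model(xi), y)
        g = torch.autograd.grad(loss, xi)[0].flatten(1)
        grads.append(F.normalize(g, dim=1))
    g = torch.stack(grads)                        # (n_samples, batch, dim)
    sim = torch.einsum('sbd,tbd->stb', g, g)      # pairwise cosine similarities
    off_diag = sim.sum() - sim.diagonal(dim1=0, dim2=1).sum()
    return off_diag / (n_samples * (n_samples - 1) * x.shape[0])
```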
ReLU Code Space: A Basis for Rating Network Quality Besides Accuracy
We propose a new metric space of ReLU activation codes equipped with a truncated Hamming distance, which establishes an isometry between its elements and polyhedral bodies in the input space; these bodies have recently been shown to be strongly related to safety, robustness, and confidence. This isometry allows
the efficient computation of adjacency relations between the polyhedral bodies.
Experiments on MNIST and CIFAR-10 indicate that information besides accuracy
might be stored in the code space.
Comment: In ICLR 2020 Workshop on Neural Architecture Search (NAS 2020)
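A minimal sketch of extracting ReLU activation codes and comparing them with a plain Hamming distance (the paper uses a truncated variant of the Hamming distance; the toy network here is hypothetical):

```python
import torch
import torch.nn as nn

# Hypothetical toy network, for illustration only.
net = nn.Sequential(nn.Linear(4, 16), nn.ReLU(),
                    nn.Linear(16, 16), nn.ReLU(),
                    nn.Linear(16, 3))

def relu_code(x: torch.Tensor) -> torch.Tensor:
    """Binary ReLU activation code of x: one bit per ReLU unit (on/off)."""
    bits, h = [], x
    for layer in net:
        h = layer(h)
        if isinstance(layer, nn.ReLU):
            bits.append((h > 0).flatten())
    return torch.cat(bits)

def hamming(a: torch.Tensor, b: torch.Tensor) -> int:
    """Number of ReLU units whose state differs between two codes; inputs at
    distance 0 lie in the same polyhedral body of the input space."""
    return int((a != b).sum())

x1, x2 = torch.randn(1, 4), torch.randn(1, 4)
print(hamming(relu_code(x1), relu_code(x2)))
```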
Towards Certifying L-infinity Robustness using Neural Networks with L-inf-dist Neurons
It is well-known that standard neural networks, even with a high classification accuracy, are vulnerable to small ℓ∞-norm bounded adversarial perturbations. Although many attempts have been made, most previous works either can only provide empirical verification of the defense against a particular attack method, or can only develop a certified guarantee of the model robustness in limited scenarios. In this paper, we seek a new approach to develop a theoretically principled neural network that inherently resists ℓ∞ perturbations. In particular, we design a novel neuron that uses the ℓ∞-distance as its basic operation (which we call the ℓ∞-dist neuron), and show that any neural network constructed with ℓ∞-dist neurons (called an ℓ∞-dist net) is naturally a 1-Lipschitz function with respect to the ℓ∞-norm. This directly provides a rigorous guarantee of certified robustness based on the margin of the prediction outputs. We then prove that such networks have enough expressive power to approximate any 1-Lipschitz function with a robust generalization guarantee. We further provide a holistic training strategy that can greatly alleviate optimization difficulties. Experimental results show that using ℓ∞-dist nets as basic building blocks, we consistently achieve state-of-the-art certified accuracy on commonly used datasets: 93.09% on MNIST, 35.42% on CIFAR-10, and 16.31% on TinyImageNet.
Comment: Appearing at International Conference on Machine Learning (ICML) 202
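A minimal sketch of the forward operation of an ℓ∞-dist layer as described above (initialization and the paper's training strategy are not reproduced; this only illustrates why each unit, and hence any stack of such layers, is 1-Lipschitz in the ℓ∞-norm):

```python
import torch
import torch.nn as nn

class LinfDistLayer(nn.Module):
    """Layer of l-infinity-dist neurons: unit j outputs ||x - w_j||_inf + b_j.

    Since | ||x - w||_inf - ||x' - w||_inf | <= ||x - x'||_inf, every unit is
    1-Lipschitz w.r.t. the l-infinity norm, so any composition of such layers
    is too; certified robustness then follows from the output margin.
    """
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_features) -> (batch, out_features)
        return (x.unsqueeze(1) - self.weight).abs().amax(dim=-1) + self.bias

net = nn.Sequential(LinfDistLayer(784, 128), LinfDistLayer(128, 10))
print(net(torch.rand(2, 784)).shape)   # torch.Size([2, 10])
```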
Analysis of Confident-Classifiers for Out-of-distribution Detection
Discriminatively trained neural classifiers can be trusted only when the input data comes from the training distribution (in-distribution). Therefore,
detecting out-of-distribution (OOD) samples is very important to avoid
classification errors. In the context of OOD detection for image
classification, one of the recent approaches proposes training a classifier
called "confident-classifier" by minimizing the standard cross-entropy loss on
in-distribution samples and minimizing the KL divergence between the predictive distribution of OOD samples, drawn from the low-density regions of the in-distribution, and the uniform distribution (i.e., maximizing the entropy of the outputs). Samples can then be detected as OOD if they have low confidence or high entropy.
In this paper, we analyze this setting both theoretically and experimentally.
We conclude that the resulting confident-classifier still yields arbitrarily
high confidence for OOD samples far away from the in-distribution. We instead
suggest training a classifier by adding an explicit "reject" class for OOD
samples.
Comment: SafeML 2019 ICLR workshop paper
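A minimal sketch of the suggested alternative: a classifier with an explicit reject class, trained with ordinary cross-entropy on in-distribution data and with the reject label on OOD data (the architecture, the weighting of the two terms, and the source of OOD samples are illustrative choices, not the paper's):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

K = 10  # number of in-distribution classes; index K is the explicit reject class
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU(),
                      nn.Linear(256, K + 1))

def reject_class_loss(x_in, y_in, x_ood, ood_weight=1.0):
    """Cross-entropy on in-distribution data plus cross-entropy that pushes
    OOD samples toward the reject class (ood_weight is illustrative)."""
    loss_in = F.cross_entropy(model(x_in), y_in)
    reject_labels = torch.full((x_ood.shape[0],), K, dtype=torch.long)
    return loss_in + ood_weight * F.cross_entropy(model(x_ood), reject_labels)

def predict_with_reject(x):
    """Predicted class indices; an output of K flags the input as OOD."""
    return model(x).argmax(dim=1)
```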