Adversarial Defense via Data Dependent Activation Function and Total Variation Minimization
We improve the robustness of deep neural nets (DNNs) to adversarial attacks by
using an interpolating function as the output activation. This data-dependent
activation remarkably improves both the generalization and robustness of DNNs.
On the CIFAR10 benchmark, we raise the robust accuracy of the adversarially
trained ResNet20 under the state-of-the-art
Iterative Fast Gradient Sign Method (IFGSM) adversarial attack. When we
combine this data-dependent activation with total variation minimization on
adversarial images and training data augmentation, we achieve an improvement in
robust accuracy by 38.9% for ResNet56 under the strongest IFGSM attack.
Furthermore, we provide an intuitive explanation of our defense by analyzing
the geometry of the feature space.
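The defense pipeline is only summarized above; as a rough illustration of the total variation minimization step (the function name, step size, and iteration count below are our own choices, not the paper's), a small gradient-descent TV denoiser for a grayscale image could look like this:

```python
import numpy as np

def tv_denoise(x, lam=0.1, step=0.05, iters=100, eps=1e-8):
    """Remove small (possibly adversarial) perturbations by gradient descent on
    0.5 * ||u - x||^2 + lam * TV(u), a smoothed anisotropic total variation
    objective. `x` is a 2-D grayscale image with values in [0, 1]."""
    u = x.copy()
    for _ in range(iters):
        # Forward differences (horizontal and vertical image gradients).
        dx = np.diff(u, axis=1, append=u[:, -1:])
        dy = np.diff(u, axis=0, append=u[-1:, :])
        mag = np.sqrt(dx ** 2 + dy ** 2 + eps)
        # Divergence of the normalized gradient field (negative TV gradient).
        px, py = dx / mag, dy / mag
        div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
        # Gradient step: fidelity pulls u toward x, the TV term smooths it.
        u -= step * ((u - x) - lam * div)
    return np.clip(u, 0.0, 1.0)
```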
Mathematical Analysis of Adversarial Attacks
In this paper, we analyze the efficacy of the fast gradient sign method (FGSM)
and the Carlini-Wagner L2 (CW-L2) attack. We prove that, within a certain
regime, the untargeted FGSM can fool any convolutional neural nets (CNNs) with
ReLU activation; the targeted FGSM can mislead any CNNs with ReLU activation to
classify any given image into any prescribed class. For a special two-layer
neural network: a linear layer followed by the softmax output activation, we
show that the CW-L2 attack increases the ratio of the classification
probability between the target and ground truth classes. Moreover, we provide
numerical results to verify all our theoretical results.
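For reference, a minimal PyTorch sketch of the one-step untargeted FGSM analyzed in the paper (the targeted variant instead steps in the direction that decreases the loss of a prescribed target class):

```python
import torch
import torch.nn.functional as F

def fgsm_untargeted(model, x, y, eps):
    """One-step untargeted FGSM: move x in the sign direction of the loss
    gradient so that the true label y becomes less likely."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()
```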
White-Box Adversarial Defense via Self-Supervised Data Estimation
In this paper, we study the problem of how to defend classifiers against
adversarial attacks that fool the classifiers using subtly modified input data.
In contrast to previous works, here we focus on the white-box adversarial
defense where the attackers are granted full access to not only the classifiers
but also the defenders, so as to produce as strong attacks as possible. In such
a context, we propose viewing a defender as a functional, i.e., a higher-order
function that takes functions as its argument and thus represents a function
space, rather than a fixed function as in conventional approaches. From this
perspective, a defender should be
realized and optimized individually for each adversarial input. To this end, we
propose RIDE, an efficient and provably convergent self-supervised learning
algorithm for individual data estimation to protect the predictions from
adversarial attacks. We demonstrate the significant improvement of adversarial
defense performance on image recognition, e.g., 98%, 76%, and 43% test accuracy
on MNIST, CIFAR-10, and ImageNet, respectively, under the state-of-the-art
BPDA attack.
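The abstract does not spell out RIDE's objective; purely as a hypothetical illustration of per-input data estimation (the fidelity-plus-smoothness objective below is our own stand-in, not the RIDE algorithm), a defender could optimize an estimate of each test input before classifying it:

```python
import torch

def estimate_then_classify(classifier, x, steps=200, lr=0.05, lam=0.1):
    """For a single test input x, optimize an estimate z that stays close to x
    while being spatially smooth, then classify z instead of x.
    (Illustrative objective only; not the RIDE objective.)"""
    z = x.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        fidelity = (z - x).pow(2).mean()                        # stay near the observation
        smooth = (z[..., :, 1:] - z[..., :, :-1]).abs().mean() \
               + (z[..., 1:, :] - z[..., :-1, :]).abs().mean()  # TV-like smoothness prior
        loss = fidelity + lam * smooth
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return classifier(z.clamp(0, 1))
```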
Graph Interpolating Activation Improves Both Natural and Robust Accuracies in Data-Efficient Deep Learning
Improving the accuracy and robustness of deep neural nets (DNNs) and adapting
them to small training data are primary tasks in deep learning research. In
this paper, we replace the output activation function of DNNs, typically the
data-agnostic softmax function, with a graph Laplacian-based high dimensional
interpolating function which, in the continuum limit, converges to the solution
of a Laplace-Beltrami equation on a high dimensional manifold. Furthermore, we
propose end-to-end training and testing algorithms for this new architecture.
The proposed DNN with graph interpolating activation integrates the advantages
of both deep learning and manifold learning. Compared to the conventional DNNs
with the softmax function as output activation, the new framework demonstrates
the following major advantages: First, it is better suited to data-efficient
learning, in which high-capacity DNNs are trained without a large amount of
training data. Second, it remarkably improves both natural
accuracy on the clean images and robust accuracy on the adversarial images
crafted by both white-box and black-box adversarial attacks. Third, it is a
natural choice for semi-supervised learning. For reproducibility, the code is
available at \url{https://github.com/BaoWangMath/DNN-DataDependentActivation}.
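As a simplified stand-in for the graph interpolating output activation (the paper's exact formulation and end-to-end training are not reproduced here), the following sketch interpolates labels over a Gaussian-weighted feature graph by solving a graph Laplace equation with the training labels as boundary values:

```python
import numpy as np

def graph_interpolate(feat_train, y_train, feat_test, n_classes, sigma=1.0):
    """Harmonic label interpolation on a fully connected feature graph."""
    X = np.vstack([feat_train, feat_test])
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))            # Gaussian edge weights
    L = np.diag(W.sum(axis=1)) - W                # graph Laplacian
    n = len(feat_train)
    Y = np.eye(n_classes)[y_train]                # one-hot training labels
    # Solve L_uu F_u = -L_ul Y: the harmonic extension of the labels
    # from the training (labeled) nodes to the test (unlabeled) nodes.
    F_u = np.linalg.solve(L[n:, n:], -L[n:, :n] @ Y)
    return F_u.argmax(axis=1)
```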
PacGAN: The power of two samples in generative adversarial networks
Generative adversarial networks (GANs) are innovative techniques for learning
generative models of complex data distributions from samples. Despite
remarkable recent improvements in generating realistic images, one of their
major shortcomings is the fact that in practice, they tend to produce samples
with little diversity, even when trained on diverse datasets. This phenomenon,
known as mode collapse, has been the main focus of several recent advances in
GANs. Yet there is little understanding of why mode collapse happens and why
existing approaches are able to mitigate mode collapse. We propose a principled
approach to handling mode collapse, which we call packing. The main idea is to
modify the discriminator to make decisions based on multiple samples from the
same class, either real or artificially generated. We borrow analysis tools
from binary hypothesis testing---in particular the seminal result of Blackwell
[Bla53]---to prove a fundamental connection between packing and mode collapse.
We show that packing naturally penalizes generators with mode collapse, thereby
favoring generator distributions with less mode collapse during the training
process. Numerical experiments on benchmark datasets suggest that packing
provides significant improvements in practice as well.
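The packing operation itself is simple to state in code; a minimal sketch (shape conventions are ours), where the discriminator's first layer is widened to accept m times as many input channels:

```python
import torch

def pack(samples, m):
    """PacGAN-style packing: group m independent samples (all real or all
    generated) into one discriminator input by concatenating them along
    the channel axis: (batch*m, C, H, W) -> (batch, m*C, H, W)."""
    b, c, h, w = samples.shape
    assert b % m == 0, "batch size must be a multiple of the packing degree m"
    return samples.reshape(b // m, m * c, h, w)
```

Intuitively, a mode-collapsed generator tends to produce near-duplicate samples within a packed group, and such groups are easy for the discriminator to reject, which is what the Blackwell-style analysis makes precise.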
ResNets Ensemble via the Feynman-Kac Formalism to Improve Natural and Robust Accuracies
Empirical adversarial risk minimization (EARM) is a widely used mathematical
framework to robustly train deep neural nets (DNNs) that are resistant to
adversarial attacks. However, both natural and robust accuracies, in
classifying clean and adversarial images, respectively, of the trained robust
models are far from satisfactory. In this work, we unify the theory of optimal
control of transport equations with the practice of training and testing of
ResNets. Based on this unified viewpoint, we propose a simple yet effective
ResNets ensemble algorithm to boost the accuracy of the robustly trained model
on both clean and adversarial images. The proposed algorithm consists of two
components: First, we modify the base ResNets by injecting Gaussian noise with
a specified variance into the output of each residual mapping. Second, we
average the predictions of multiple jointly trained modified ResNets to obtain
the final prediction. These two steps give an approximation to the Feynman-Kac formula
for representing the solution of a transport equation with viscosity, or a
convection-diffusion equation. For the CIFAR10 benchmark, this simple algorithm
leads to a robust model with a natural accuracy of 85.62% on clean images and
an improved robust accuracy under 20 iterations of the IFGSM attack,
outperforming the current state-of-the-art defense against the IFGSM attack on
CIFAR10. Both natural and robust accuracies of the
proposed ResNets ensemble can be improved dynamically as the building block
ResNet advances. The code is available at:
\url{https://github.com/BaoWangMath/EnResNet}.
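A minimal sketch of the two components (class and function names, the noise level, and the number of ensemble members are our own placeholders):

```python
import torch
import torch.nn as nn

class NoisyResidualBlock(nn.Module):
    """Wrap a residual mapping F and inject zero-mean Gaussian noise with a
    specified standard deviation into its output: x -> x + F(x) + noise."""
    def __init__(self, residual_mapping, noise_std=0.1):
        super().__init__()
        self.residual_mapping = residual_mapping
        self.noise_std = noise_std

    def forward(self, x):
        out = self.residual_mapping(x)
        if self.noise_std > 0:
            out = out + self.noise_std * torch.randn_like(out)
        return x + out

def ensemble_predict(models, x):
    """Average the class probabilities of the jointly trained noisy ResNets."""
    with torch.no_grad():
        probs = [model(x).softmax(dim=-1) for model in models]
    return torch.stack(probs).mean(dim=0)
```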
Improving Adversarial Robustness via Channel-wise Activation Suppressing
The study of adversarial examples and their activation has attracted
significant attention for secure and robust learning with deep neural networks
(DNNs). Different from existing works, in this paper, we highlight two new
characteristics of adversarial examples from the channel-wise activation
perspective: 1) the activation magnitudes of adversarial examples are higher
than those of natural examples; and 2) the channels are activated more uniformly
by adversarial examples than natural examples. We find that the
state-of-the-art defense, adversarial training, has addressed the first issue
of high activation magnitudes by training on adversarial examples, while the
second issue of uniform activation remains. This motivates us to suppress
redundant activations from being triggered by adversarial perturbations via a
Channel-wise Activation Suppressing (CAS) strategy. We show that CAS can train
a model that inherently suppresses adversarial activation, and can be easily
applied to existing defense methods to further improve their robustness. Our
work provides a simple but generic training strategy for robustifying the
intermediate layer activations of DNNs. (ICLR 2021 accepted paper.)
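The abstract does not detail the CAS module itself; as a hypothetical illustration of channel-wise suppression (essentially a squeeze-and-excitation-style gate, not the paper's exact construction), a layer's channels can be rescaled by learned importance weights so that weakly informative channels are damped:

```python
import torch
import torch.nn as nn

class ChannelGate(nn.Module):
    """Learn per-channel importance from globally pooled activations and
    rescale the feature map, allowing uninformative channels to be suppressed."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))         # per-channel weights in (0, 1)
        return x * w[:, :, None, None]
```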
Monge blunts Bayes: Hardness Results for Adversarial Training
The last few years have seen a staggering number of empirical studies of the
robustness of neural networks in a model of adversarial perturbations of their
inputs. Most rely on an adversary which carries out local modifications within
prescribed balls. None, however, has so far questioned the broader picture: how
to frame a resource-bounded adversary so that it can be severely detrimental to
learning, a non-trivial problem which entails at a minimum the choice of loss
and classifiers.
We suggest a formal answer for losses that satisfy the minimal statistical
requirement of being proper. We pin down a simple sufficient property for any
given class of adversaries to be detrimental to learning, involving a central
measure of "harmfulness" which generalizes the well-known class of integral
probability metrics. A key feature of our result is that it holds for all
proper losses, and for a popular subset of these, the optimisation of this
central measure appears to be independent of the loss. When classifiers are
Lipschitz -- a now popular approach in adversarial training -- this
optimisation resorts to optimal transport to make a low-budget compression of
class marginals. Toy experiments reveal a finding recently separately observed:
training against a sufficiently budgeted adversary of this kind improves
generalization.
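For reference, the integral probability metric that the harmfulness measure is said to generalize is, for distributions P and Q and a function class F,

```latex
\mathrm{IPM}_{\mathcal{F}}(P, Q)
  = \sup_{f \in \mathcal{F}}
    \bigl|\, \mathbb{E}_{x \sim P}[f(x)] - \mathbb{E}_{x \sim Q}[f(x)] \,\bigr| ,
```

and taking F to be the 1-Lipschitz functions recovers the Wasserstein-1 distance, which suggests why restricting to Lipschitz classifiers leads to an optimal-transport formulation.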
Security Matters: A Survey on Adversarial Machine Learning
Adversarial machine learning is a fast-growing research area that considers
scenarios in which machine learning systems may face adversarial attackers who
intentionally synthesize input data to make a well-trained model make mistakes.
It always involves a defending side, usually a classifier, and an attacking
side that aims to cause incorrect outputs. The earliest studies on
adversarial examples for machine learning algorithms originated in the
information security area, which considers a much wider variety of attack
methods. But the recent research focus, popularized by the deep learning
community, places strong emphasis on how "imperceptible" perturbations of
normal inputs may cause dramatic mistakes in deep learning models of supposedly
super-human accuracy. This paper serves to give a comprehensive
introduction to a range of aspects of the adversarial deep learning topic,
including its foundations, typical attacking and defending strategies, and some
extended studies.
SPLASH: Learnable Activation Functions for Improving Accuracy and Adversarial Robustness
We introduce SPLASH units, a class of learnable activation functions shown to
simultaneously improve the accuracy of deep neural networks while also
improving their robustness to adversarial attacks. SPLASH units combine a
simple parameterization with the ability to approximate a wide range of
non-linear functions. SPLASH units: 1) are continuous; 2) are grounded
(f(0) = 0); 3) use symmetric hinges; and 4) derive the locations of their
hinges directly from the data (i.e., no learning is required). Compared to nine other
learned and fixed activation functions, including ReLU and its variants, SPLASH
units show superior performance across three datasets (MNIST, CIFAR-10, and
CIFAR-100) and four architectures (LeNet5, All-CNN, ResNet-20, and
Network-in-Network). Furthermore, we show that SPLASH units significantly
increase the robustness of deep neural networks to adversarial attacks. Our
experiments on both black-box and open-box adversarial attacks show that
commonly-used architectures, namely LeNet5, All-CNN, ResNet-20, and
Network-in-Network, can be up to 31% more robust to adversarial attacks by
simply using SPLASH units instead of ReLUs.
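As an illustrative sketch of this class of activation (a piecewise-linear unit with symmetric, data-derived hinge locations and learned hinge coefficients; this is our own simplified parameterization, not necessarily the exact SPLASH form):

```python
import torch
import torch.nn as nn

class SplashLike(nn.Module):
    """Piecewise-linear activation with fixed symmetric hinges at +/-b and
    learnable hinge coefficients; f(0) = 0 because every basis function
    vanishes at zero."""
    def __init__(self, hinges):
        super().__init__()
        self.register_buffer("b", torch.as_tensor(hinges, dtype=torch.float32))
        self.a_pos = nn.Parameter(torch.zeros(len(hinges)))   # coefficients for x > +b
        self.a_neg = nn.Parameter(torch.zeros(len(hinges)))   # coefficients for x < -b
        with torch.no_grad():
            self.a_pos[0] = 1.0                                # start out as plain ReLU

    def forward(self, x):
        xe = x.unsqueeze(-1)                                   # (..., 1) for broadcasting
        pos = torch.relu(xe - self.b)                          # hinges at +b
        neg = torch.relu(-xe - self.b)                         # hinges at -b
        return (pos * self.a_pos + neg * self.a_neg).sum(-1)
```

With hinges = [0.0, 1.0, 2.0] and the initialization above, the unit starts out exactly as ReLU and only moves away from it if doing so reduces the loss.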