Transferable Perturbations of Deep Feature Distributions
Almost all current adversarial attacks on CNN classifiers rely on information
derived from the output layer of the network. This work presents a new
adversarial attack based on the modeling and exploitation of class-wise and
layer-wise deep feature distributions. We achieve state-of-the-art targeted
black-box transfer-based attack results for undefended ImageNet models. Further,
we place a priority on explainability and interpretability of the attacking
process. Our methodology affords an analysis of how adversarial attacks change
the intermediate feature distributions of CNNs, as well as a measure of
layer-wise and class-wise feature distributional separability/entanglement. We
also conceptualize a transition from task/data-specific to model-specific
features within a CNN architecture that directly impacts the transferability of
adversarial examples.
Comment: Published as a conference paper at ICLR 2020
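To make the idea concrete, the sketch below shows one way such a feature-distribution attack could be set up: a per-class diagonal-Gaussian model of the activations at a single intermediate layer is assumed to have been fit on clean data, and a PGD-style loop then pushes an input's features toward the target class's distribution under an l_inf budget. The names (feature_extractor, class_means, class_vars) and the Gaussian model are illustrative assumptions, not the paper's actual components.

```python
# Illustrative feature-space targeted attack (assumed Gaussian feature model).
import torch

def feature_dist_attack(x, target_class, feature_extractor, class_means,
                        class_vars, eps=8 / 255, steps=20, step_size=2 / 255):
    """PGD that maximizes the target-class Gaussian log-likelihood of features."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        feats = feature_extractor(x_adv).flatten(1)           # (B, D) layer features
        mu = class_means[target_class]                        # (D,)
        var = class_vars[target_class]                        # (D,)
        # Diagonal-Gaussian log-likelihood of the current features.
        log_lik = -0.5 * (((feats - mu) ** 2) / var + var.log()).sum(dim=1)
        grad = torch.autograd.grad(log_lik.mean(), x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + step_size * grad.sign()           # ascend likelihood
            x_adv = x.clone() + (x_adv - x).clamp(-eps, eps)  # project to l_inf ball
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```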
Adversarial Attacks on Neural Networks for Graph Data
Deep learning models for graphs have achieved strong performance for the task
of node classification. Despite their proliferation, currently there is no
study of their robustness to adversarial attacks. Yet, in domains where they
are likely to be used, e.g. the web, adversaries are common. Can deep learning
models for graphs be easily fooled? In this work, we introduce the first study
of adversarial attacks on attributed graphs, specifically focusing on models
exploiting ideas of graph convolutions. In addition to attacks at test time, we
tackle the more challenging class of poisoning/causative attacks, which focus
on the training phase of a machine learning model. We generate adversarial
perturbations targeting the node's features and the graph structure, thus,
taking the dependencies between instances into account. Moreover, we ensure that
the perturbations remain unnoticeable by preserving important data
characteristics. To cope with the underlying discrete domain we propose an
efficient algorithm, Nettack, which exploits incremental computations. Our
experimental study shows that the accuracy of node classification drops
significantly even when performing only a few perturbations. Moreover, our attacks are
transferable: the learned attacks generalize to other state-of-the-art node
classification models and unsupervised approaches, and likewise are successful
even when only limited knowledge about the graph is given.
Comment: Accepted as a full paper at KDD 2018 on May 6, 2018
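As a rough illustration of a structure attack of this flavor (not the paper's Nettack implementation), the sketch below greedily flips the single edge incident to a target node that most reduces its classification margin under a linearized two-layer GCN surrogate, Z = A_hat^2 X W; the incremental computations and unnoticeability constraints described above are omitted.

```python
# Greedy edge-flip attack on a linearized two-layer GCN surrogate (illustrative).
import numpy as np

def normalize_adj(A):
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(1)))
    return d_inv_sqrt @ A_tilde @ d_inv_sqrt

def margin(A, X, W, target, true_label):
    Z = normalize_adj(A) @ normalize_adj(A) @ X @ W   # linearized 2-layer GCN
    z = Z[target]
    return z[true_label] - np.max(np.delete(z, true_label))

def greedy_edge_attack(A, X, W, target, true_label, budget=3):
    A = A.copy()
    n = A.shape[0]
    for _ in range(budget):
        best_j, best_margin = None, margin(A, X, W, target, true_label)
        for j in range(n):
            if j == target:
                continue
            A_pert = A.copy()
            A_pert[target, j] = A_pert[j, target] = 1 - A[target, j]  # flip edge
            m = margin(A_pert, X, W, target, true_label)
            if m < best_margin:
                best_j, best_margin = j, m
        if best_j is None:            # no flip reduces the margin further
            break
        A[target, best_j] = A[best_j, target] = 1 - A[target, best_j]
    return A
```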
Unsupervised Domain Adaptation with Residual Transfer Networks
The recent success of deep neural networks relies on massive amounts of
labeled data. For a target task where labeled data is unavailable, domain
adaptation can transfer a learner from a different source domain. In this
paper, we propose a new approach to domain adaptation in deep networks that can
jointly learn adaptive classifiers and transferable features from labeled data
in the source domain and unlabeled data in the target domain. We relax a
shared-classifier assumption made by previous methods and assume that the
source classifier and target classifier differ by a residual function. We
enable classifier adaptation by plugging several layers into the deep network to
explicitly learn the residual function with reference to the target classifier.
We fuse features of multiple layers with tensor product and embed them into
reproducing kernel Hilbert spaces to match distributions for feature
adaptation. The adaptation can be achieved in most feed-forward models by
extending them with new residual layers and loss functions, which can be
trained efficiently via back-propagation. Empirical evidence shows that the new
approach outperforms state-of-the-art methods on standard domain adaptation
benchmarks.
Comment: 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain
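A minimal sketch of the residual-classifier idea, assuming a generic feature backbone: the source classifier is expressed as the target classifier plus a small learned residual block, and a single-Gaussian-kernel MMD stands in for the paper's multi-kernel, tensor-fused feature matching. The dimensions and architecture here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ResidualTransferNet(nn.Module):
    def __init__(self, in_dim=2048, feat_dim=256, num_classes=31):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.target_clf = nn.Linear(feat_dim, num_classes)        # f_T
        self.residual = nn.Sequential(                            # Δf, so f_S = f_T + Δf
            nn.Linear(num_classes, num_classes), nn.ReLU(),
            nn.Linear(num_classes, num_classes))

    def forward(self, x):
        feats = self.backbone(x)
        target_logits = self.target_clf(feats)
        source_logits = target_logits + self.residual(target_logits)
        return feats, source_logits, target_logits

def gaussian_mmd(fs, ft, sigma=1.0):
    """Single-Gaussian-kernel MMD between source and target feature batches."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(fs, fs).mean() + k(ft, ft).mean() - 2 * k(fs, ft).mean()

# Training would combine: cross-entropy on labeled source data through the
# source classifier, the MMD term between source and target features, and
# entropy minimization on target predictions from the target classifier.
```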
PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples
Adversarial perturbations of normal images are usually imperceptible to
humans, but they can seriously confuse state-of-the-art machine learning
models. What makes them so special in the eyes of image classifiers? In this
paper, we show empirically that adversarial examples mainly lie in the low
probability regions of the training distribution, regardless of attack types
and targeted models. Using statistical hypothesis testing, we find that modern
neural density models are surprisingly good at detecting imperceptible image
perturbations. Based on this discovery, we devised PixelDefend, a new approach
that purifies a maliciously perturbed image by moving it back towards the
distribution seen in the training data. The purified image is then run through
an unmodified classifier, making our method agnostic to both the classifier and
the attacking method. As a result, PixelDefend can be used to protect already
deployed models and be combined with other model-specific defenses. Experiments
show that our method greatly improves resilience across a wide variety of
state-of-the-art attacking methods, increasing accuracy on the strongest attack
from 63% to 84% for Fashion MNIST and from 32% to 70% for CIFAR-10.
Comment: ICLR 2018
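The purification step can be pictured as follows. The sketch assumes a hypothetical pixel_cnn that maps an image batch with integer values in [0, 255] to per-pixel 256-way logits; each pixel is then greedily reset, in raster order, to the value within eps_defend of the original that the density model finds most likely. This is an illustrative (and deliberately naive, one forward pass per pixel) rendering of the idea, not the authors' implementation.

```python
import torch

@torch.no_grad()
def pixel_defend(x, pixel_cnn, eps_defend=16):
    """x: (B, C, H, W) tensor with integer values in [0, 255]. Returns a purified copy."""
    x_pur = x.clone()
    B, C, H, W = x.shape
    for i in range(H):
        for j in range(W):
            for c in range(C):
                logits = pixel_cnn(x_pur.float())              # (B, 256, C, H, W) assumed
                probs = logits[:, :, c, i, j].softmax(dim=1)   # (B, 256)
                lo = (x[:, c, i, j].long() - eps_defend).clamp(0, 255)
                hi = (x[:, c, i, j].long() + eps_defend).clamp(0, 255)
                values = torch.arange(256, device=x.device).expand(B, 256)
                in_range = (values >= lo.unsqueeze(1)) & (values <= hi.unsqueeze(1))
                # Most likely pixel value within eps_defend of the original.
                best = probs.masked_fill(~in_range, 0.0).argmax(dim=1)
                x_pur[:, c, i, j] = best.to(x_pur.dtype)
    return x_pur
```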
Sitatapatra: Blocking the Transfer of Adversarial Samples
Convolutional Neural Networks (CNNs) are widely used to solve classification
tasks in computer vision. However, they can be tricked into misclassifying
specially crafted `adversarial' samples -- and samples built to trick one model
often work alarmingly well against other models trained on the same task. In
this paper we introduce Sitatapatra, a system designed to block the transfer of
adversarial samples. It diversifies neural networks using a key, as in
cryptography, and provides a mechanism for detecting attacks. What's more, when
adversarial samples are detected they can typically be traced back to the
individual device that was used to develop them. The run-time overheads are
minimal, permitting the use of Sitatapatra on constrained systems.
Fast Feature Fool: A data independent approach to universal adversarial perturbations
State-of-the-art object recognition Convolutional Neural Networks (CNNs) are
shown to be fooled by image-agnostic perturbations, called universal
adversarial perturbations. It is also observed that these perturbations
generalize across multiple networks trained on the same target data. However,
these algorithms require training data on which the CNNs were trained and
compute adversarial perturbations via complex optimization. The fooling
performance of these approaches is directly proportional to the amount of
available training data. This makes them unsuitable for practical attacks, since
it is unreasonable for an attacker to have access to the training data. In this
paper, for the first time, we propose a novel data-independent approach to
generate image-agnostic perturbations for a range of CNNs trained for object
recognition. We further show that these perturbations are transferable across
multiple network architectures trained either on the same or different data. In the
absence of data, our method generates universal adversarial perturbations
efficiently via fooling the features learned at multiple layers thereby causing
CNNs to misclassify. Experiments demonstrate impressive fooling rates and
surprising transferability for the proposed universal perturbations generated
without any training data.
Comment: BMVC 2017; code is available at https://github.com/utsavgarg/fast-feature-fool
Improving the Generalization of Adversarial Training with Domain Adaptation
By injecting adversarial examples into training data, adversarial training is
promising for improving the robustness of deep learning models. However, most
existing adversarial training approaches are based on a specific type of
adversarial attack. It may not provide sufficiently representative samples from
the adversarial domain, leading to a weak generalization ability on adversarial
examples from other attacks. Moreover, during the adversarial training,
adversarial perturbations on inputs are usually crafted by fast single-step
adversaries so as to scale to large datasets. This work focuses mainly on
adversarial training with the efficient single-step FGSM adversary. In this
scenario, it is difficult to train a model that generalizes well due to the lack
of representative adversarial samples, i.e., samples that are unable to
accurately reflect the adversarial domain. To alleviate this problem, we propose a novel
Adversarial Training with Domain Adaptation (ATDA) method. Our intuition is to
regard adversarial training with the FGSM adversary as a domain adaptation task
with a limited number of target domain samples. The main idea is to learn a
representation that is semantically meaningful and domain invariant on the
clean domain as well as the adversarial domain. Empirical evaluations on
Fashion-MNIST, SVHN, CIFAR-10 and CIFAR-100 demonstrate that ATDA can greatly
improve the generalization of adversarial training and the smoothness of the
learned models, and outperforms state-of-the-art methods on standard benchmark
datasets. To show the transfer ability of our method, we also extend ATDA to
the adversarial training on iterative attacks such as PGD-Adversarial Training
(PAT), and the defense performance is improved considerably.
Comment: ICLR 2019
Learning Smooth Representation for Unsupervised Domain Adaptation
In unsupervised domain adaptation, existing methods have achieved remarkable
performance, but few pay attention to the Lipschitz constraint. Prior analysis
has shown that, beyond reducing the divergence between distributions, satisfying
Lipschitz continuity guarantees an error bound for the target distribution. In
this paper, we adopt this principle and extend it to a deep end-to-end model. We
define a measure, the local smooth discrepancy, to evaluate the Lipschitzness of
the target distribution in a pointwise way. Further, several
critical factors affecting the error bound are taken into account in our
proposed optimization strategy to ensure effectiveness and stability.
Empirical evidence shows that the proposed method is comparable or superior to
the state-of-the-art methods, and our modifications are important for its
validity.
Comment: Code is available at https://github.com/CuthbertCai/SRD
Cycle-Consistent Adversarial GAN: the integration of adversarial attack and defense
In deep-learning image classification, adversarial examples, i.e., inputs with
small-magnitude perturbations, can mislead deep neural networks (DNNs) into
incorrect predictions, which means DNNs are vulnerable to them. Various attack
and defense strategies have been proposed to better study the mechanisms of
deep learning. However, most of this research addresses only one aspect, either
attack or defense, without considering that attacks and defenses are
interdependent and mutually reinforcing, like spears and shields. In this
paper, we propose the Cycle-Consistent Adversarial GAN (CycleAdvGAN) to
generate adversarial examples, which learns and approximates the distributions
of both original instances and adversarial examples. Once the generators of
CycleAdvGAN are trained, they can efficiently generate adversarial
perturbations for any instance, making DNNs predict incorrectly, and can
recover adversarial examples to clean instances, making DNNs predict correctly.
We apply CycleAdvGAN under semi-white-box and black-box settings on two public
datasets, MNIST and CIFAR-10. Extensive experiments show that our method
achieves state-of-the-art adversarial attack performance and also efficiently
improves defense ability, realizing the integration of adversarial attack and
defense. In addition, attack performance improves even when the model is
trained only on an adversarial dataset generated by any kind of adversarial
attack.
Comment: 13 pages, 7 tables, 1 figure
One Bit Matters: Understanding Adversarial Examples as the Abuse of Redundancy
Despite the great success achieved in machine learning (ML), adversarial
examples have caused concerns with regard to its trustworthiness: a small
perturbation of an input results in an arbitrary failure of an otherwise
seemingly well-trained ML model. While studies are being conducted to discover
the intrinsic properties of adversarial examples, such as their transferability
and universality, there is insufficient theoretical analysis to help understand
the phenomenon in a way that can influence the design process of ML
experiments. In this paper, we deduce an information-theoretic model which
explains adversarial attacks as the abuse of feature redundancies in ML
algorithms. We prove that feature redundancy is a necessary condition for the
existence of adversarial examples. Our model helps to explain some major
questions raised in many anecdotal studies on adversarial examples. Our theory
is backed up by empirical measurements of the information content of benign and
adversarial examples on both image and text datasets. Our measurements show
that typical adversarial examples introduce just enough redundancy to overflow
the decision making of an ML model trained on corresponding benign examples. We
conclude with actionable recommendations to improve the robustness of machine
learners against adversarial examples.
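One rough way to probe the information content of benign versus adversarial inputs, in the spirit of the measurements mentioned above, is to use compressed size as a proxy for entropy: the extra bytes a lossless compressor needs for an adversarial image, relative to its benign counterpart, hint at the redundancy the perturbation introduces. The paper's actual estimators may differ; this is only a stand-in.

```python
import zlib
import numpy as np

def compressed_bits(img_uint8: np.ndarray) -> int:
    """Proxy for information content: bits used by a lossless compressor."""
    return 8 * len(zlib.compress(img_uint8.tobytes(), level=9))

def redundancy_gap(benign: np.ndarray, adversarial: np.ndarray) -> int:
    """Extra bits the compressor needs for the adversarial version."""
    return compressed_bits(adversarial) - compressed_bits(benign)

# Toy usage with a random image and a small random perturbation:
rng = np.random.default_rng(0)
benign = (rng.random((32, 32, 3)) * 255).astype(np.uint8)
adv = np.clip(benign.astype(int) + rng.integers(-8, 9, benign.shape), 0, 255).astype(np.uint8)
print(redundancy_gap(benign, adv))
```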