Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
Deep learning algorithms have been shown to perform extremely well on many
classical machine learning problems. However, recent studies have shown that
deep learning, like other machine learning techniques, is vulnerable to
adversarial samples: inputs crafted to force a deep neural network (DNN) to
provide adversary-selected outputs. Such attacks can seriously undermine the
security of the system supported by the DNN, sometimes with devastating
consequences. For example, autonomous vehicles can be crashed, illicit or
illegal content can bypass content filters, or biometric authentication systems
can be manipulated to allow improper access. In this work, we introduce a
defensive mechanism called defensive distillation to reduce the effectiveness
of adversarial samples on DNNs. We analytically investigate the
generalizability and robustness properties granted by the use of defensive
distillation when training DNNs. We also empirically study the effectiveness of
our defense mechanisms on two DNNs placed in adversarial settings. The study
shows that defensive distillation can reduce the effectiveness of sample
creation from 95% to less than 0.5% on a studied DNN. Such dramatic gains can
be explained by the fact that distillation reduces the gradients used in
adversarial sample creation by a factor of 10^30. We also find that
distillation increases the average minimum number of features that need to be
modified to create adversarial samples by about 800% on one of the DNNs we
tested.
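As a rough illustration of the mechanism this abstract describes, the sketch below trains a teacher network with its softmax taken at an elevated temperature, then trains a second network of the same architecture on the teacher's soft labels; the `Net` class, `loader`, class count and temperature value are placeholder assumptions, not the paper's exact setup.

    import torch
    import torch.nn.functional as F

    T = 20.0  # distillation temperature (illustrative; the paper studies a range of values)

    def train(model, loader, targets_fn, epochs=10, lr=1e-3):
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(epochs):
            for x, y in loader:
                logits = model(x)
                # cross-entropy against (possibly soft) targets, with softmax at temperature T
                loss = -(targets_fn(x, y) * F.log_softmax(logits / T, dim=1)).sum(dim=1).mean()
                opt.zero_grad(); loss.backward(); opt.step()
        return model

    # 1) Teacher: trained on hard labels, but with its softmax taken at temperature T.
    teacher = train(Net(), loader, lambda x, y: F.one_hot(y, num_classes=10).float())
    teacher.eval()

    # 2) Distilled network: same architecture, trained on the teacher's soft probabilities at the same T.
    student = train(Net(), loader, lambda x, y: F.softmax(teacher(x).detach() / T, dim=1))

    # 3) At test time the temperature is set back to 1 (plain softmax over student(x)),
    #    which is what makes the gradients an attacker relies on vanishingly small.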
Feature Distillation: DNN-Oriented JPEG Compression Against Adversarial Examples
Image compression-based approaches for defending against adversarial-example
attacks, which threaten the safe use of deep neural networks (DNNs), have been
investigated recently. However, prior works mainly rely on directly tuning
parameters such as the compression rate to blindly reduce image features, and
therefore lack guarantees on both defense efficiency (i.e., accuracy on
polluted images) and classification accuracy on benign images after applying
the defense. To overcome these limitations, we propose a
JPEG-based defensive compression framework, namely "feature distillation", to
effectively rectify adversarial examples without impacting classification
accuracy on benign data. Our framework significantly improves defense
efficiency with marginal accuracy reduction using a two-step method: first, we
maximize the filtering of malicious adversarial input perturbations by
developing defensive quantization in the frequency domain of JPEG
compression/decompression, guided by a semi-analytical method; second, we
suppress the distortion of benign features to restore classification accuracy
through a
DNN-oriented quantization refinement process. Our experimental results show
that the proposed "feature distillation" significantly surpasses the latest
input-transformation based mitigations such as Quilting and TV Minimization in
three aspects: defense efficiency (improved classification accuracy on
adversarial examples), accuracy on benign images after defense (marginal
accuracy degradation), and processing time per image (speedup). Moreover, our
solution also provides the best defense efficiency against the recent adaptive
attack, with the least accuracy reduction on benign images, compared with other
input-transformation based defense methods.
Comment: 2019 Conference on Computer Vision and Pattern Recognition (CVPR 2019)
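For intuition only, here is a minimal sketch of the kind of DCT-domain defensive quantization the abstract describes: small quantization steps for low-frequency (benign) coefficients and large steps for high-frequency ones, applied blockwise as in JPEG. The 8x8 blocking, the frequency cutoff and the step sizes are illustrative assumptions, not the paper's tuned quantization tables.

    import numpy as np
    from scipy.fft import dctn, idctn

    def defensive_jpeg(gray_img, low_step=5.0, high_step=40.0, cutoff=4):
        """Quantize 8x8 DCT blocks of a grayscale image with a frequency-dependent table."""
        h, w = gray_img.shape
        out = gray_img.astype(np.float64).copy()
        q = np.full((8, 8), high_step)          # coarse steps for high frequencies
        q[:cutoff, :cutoff] = low_step          # fine steps for low frequencies
        for i in range(0, h - h % 8, 8):
            for j in range(0, w - w % 8, 8):
                block = gray_img[i:i + 8, j:j + 8].astype(np.float64) - 128.0
                coeffs = dctn(block, norm='ortho')
                coeffs = np.round(coeffs / q) * q        # quantize, then dequantize
                out[i:i + 8, j:j + 8] = idctn(coeffs, norm='ortho') + 128.0
        return np.clip(out, 0, 255)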
Ensemble Methods as a Defense to Adversarial Perturbations Against Deep Neural Networks
Deep learning has become the state-of-the-art approach to many machine
learning problems such as classification. It has recently been shown that deep
learning is highly vulnerable to adversarial perturbations. Taking the camera
systems of self-driving cars as an example, small adversarial perturbations can
cause the system to make errors in important tasks, such as classifying traffic
signs or detecting pedestrians. Hence, in order to use deep learning without
safety concerns, a proper defense strategy is required. We propose to use
ensemble methods as a defense strategy against adversarial perturbations. We
find that an attack leading one model to misclassify does not imply the same
for other networks performing the same task. This makes ensemble methods an
attractive defense strategy against adversarial attacks. We empirically show
for the MNIST and the CIFAR-10 data sets that ensemble methods not only improve
the accuracy of neural networks on test data but also increase their robustness
against adversarial perturbations.
Comment: 10 pages, 2 figures, 4 tables
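The defense itself is simple to state; a minimal sketch, assuming a list `models` of independently trained PyTorch classifiers, is to average their softmax outputs so that a perturbation tuned against one member rarely fools the averaged prediction.

    import torch
    import torch.nn.functional as F

    def ensemble_predict(models, x):
        # average class probabilities over the ensemble, then take the most likely class
        probs = torch.stack([F.softmax(m(x), dim=1) for m in models])
        return probs.mean(dim=0).argmax(dim=1)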
Building Robust Deep Neural Networks for Road Sign Detection
Deep neural networks are built with generalization beyond the training set in
mind, using techniques such as regularization, early stopping and dropout, but
they are rarely designed to be resilient to adversarial examples. As deep
neural networks become more prevalent in mission-critical and real-time
systems, attackers have begun to intentionally cause them to misclassify an
object of one type as another. This can be catastrophic in scenarios where the
classification made by a deep neural network leads to a fatal decision by a
machine. In this work, we used the GTSRB dataset to craft adversarial samples
with the Fast Gradient Sign Method and the Jacobian-based Saliency Map Method,
used those crafted adversarial samples to attack another deep convolutional
neural network, and hardened the attacked network against adversarial attacks
using Defensive Distillation and Adversarial Training.
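A condensed sketch of the attack-then-harden loop follows, using the standard Fast Gradient Sign Method and one common adversarial-training recipe (equal weighting of clean and perturbed batches); the `model`, optimizer and GTSRB data batches are assumed, and the hyperparameters are not the paper's exact configuration.

    import torch
    import torch.nn.functional as F

    def fgsm(model, x, y, eps=8 / 255):
        """One-step Fast Gradient Sign Method perturbation of a batch of images."""
        x = x.clone().detach().requires_grad_(True)
        F.cross_entropy(model(x), y).backward()
        return (x + eps * x.grad.sign()).clamp(0, 1).detach()

    def adversarial_training_step(model, opt, x, y, eps=8 / 255):
        # mix the loss on clean and FGSM-perturbed inputs (a common hardening recipe)
        x_adv = fgsm(model, x, y, eps)
        loss = 0.5 * F.cross_entropy(model(x), y) + 0.5 * F.cross_entropy(model(x_adv), y)
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()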
Adversarial Examples: Opportunities and Challenges
Deep neural networks (DNNs) have shown huge superiority over humans in image
recognition, speech processing, autonomous vehicles and medical diagnosis.
However, recent studies indicate that DNNs are vulnerable to adversarial
examples (AEs), which are designed by attackers to fool deep learning models.
Different from real examples, AEs can mislead the model into predicting
incorrect outputs while being hardly distinguishable by human eyes, thereby
threatening security-critical deep-learning applications. In recent years, the
generation
and defense of AEs have become a research hotspot in the field of artificial
intelligence (AI) security. This article reviews the latest research progress
of AEs. First, we introduce the concept, causes, characteristics and evaluation
metrics of AEs; we then survey state-of-the-art AE generation methods and
discuss their advantages and disadvantages. After that, we review existing
defenses and discuss their limitations. Finally, we outline future research
opportunities and challenges on AEs.
Comment: 16 pages, 13 figures, 5 tables
Detecting Adversarial Perturbations Through Spatial Behavior in Activation Spaces
Neural network based classifiers are still prone to manipulation through
adversarial perturbations. State-of-the-art attacks can overcome most of the
defense or detection mechanisms suggested so far, and adversaries have the
upper hand in this arms race. Adversarial examples are designed to resemble the
normal input from which they were constructed, while triggering an incorrect
classification. This basic design goal leads to a characteristic spatial
behavior within the context of Activation Spaces, a term coined by the authors
to refer to the hyperspaces formed by the activation values of the network's
layers. Within the output of the first layers of the network, an adversarial
example is likely to resemble normal instances of the source class, while in
the final layers such examples will diverge towards the adversary's target
class. The steps below enable us to leverage this inherent shift from one class
to another in order to form a novel adversarial example detector. We construct
Euclidean spaces out of the activation values of each of the deep neural
network layers. Then, we induce a set of k-nearest neighbor classifiers (k-NN),
one per activation space of each neural network layer, using the
non-adversarial examples. We leverage those classifiers to produce a sequence
of class labels for each nonperturbed input sample and estimate the a priori
probability for a class label change between one activation space and another.
During the detection phase we compute a sequence of classification labels for
each input using the trained classifiers. We then estimate the likelihood of
those classification sequences and show that adversarial sequences are far less
likely than normal ones. We evaluated our detection method against the
state-of-the-art C&W attack, using two image classification datasets (MNIST,
CIFAR-10), reaching an AUC of 0.95 on the CIFAR-10 dataset.
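A condensed sketch of the detector described above: one k-NN per layer's activation space, a per-input sequence of layer-wise labels, and a likelihood score from transition probabilities estimated on benign data. Extraction of per-layer activations and the choice of detection threshold are assumed to be handled elsewhere.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    def fit_detector(layer_acts, labels, n_classes, k=5, eps=1e-6):
        """layer_acts: list of (N, d_layer) arrays of benign activations, one per layer."""
        knns = [KNeighborsClassifier(k).fit(a, labels) for a in layer_acts]
        seqs = np.stack([knn.predict(a) for knn, a in zip(knns, layer_acts)], axis=1)
        # a priori probability of a class-label change between consecutive activation spaces
        trans = np.full((len(knns) - 1, n_classes, n_classes), eps)
        for seq in seqs:
            for t in range(len(seq) - 1):
                trans[t, seq[t], seq[t + 1]] += 1
        trans /= trans.sum(axis=2, keepdims=True)
        return knns, trans

    def log_likelihood(knns, trans, acts_of_x):
        """acts_of_x: list of per-layer activation vectors for a single input."""
        seq = [int(knn.predict(a.reshape(1, -1))[0]) for knn, a in zip(knns, acts_of_x)]
        return sum(np.log(trans[t, seq[t], seq[t + 1]]) for t in range(len(seq) - 1))

    # Inputs whose log-likelihood falls below a threshold chosen on benign data are flagged as adversarial.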
Extending Defensive Distillation
Machine learning is vulnerable to adversarial examples: inputs carefully
modified to force misclassification. Designing defenses against such inputs
remains largely an open problem. In this work, we revisit defensive
distillation---which is one of the mechanisms proposed to mitigate adversarial
examples---to address its limitations. We view our results not only as an
effective way of addressing some of the recently discovered attacks but also as
reinforcing the importance of improved training techniques.
ReabsNet: Detecting and Revising Adversarial Examples
Though deep neural networks have achieved huge success in recent studies and
applications, they remain vulnerable to adversarial perturbations that are
imperceptible to humans. To address this problem, we propose a novel
network called ReabsNet to achieve high classification accuracy in the face of
various attacks. The approach is to augment an existing classification network
with a guardian network to detect if a sample is natural or has been
adversarially perturbed. Critically, instead of simply rejecting adversarial
examples, we revise them to get their true labels. We exploit the observation
that a sample containing adversarial perturbations can often be returned to
its true class after revision. We demonstrate that our ReabsNet
outperforms the state-of-the-art defense method under various adversarial
attacks.
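The abstract specifies the control flow (detect, then revise rather than reject) but not the revision procedure, so the sketch below only captures that pipeline; `classifier`, `guardian` (a natural-vs-perturbed detector) and `revise` (any denoising or reconstruction step) are placeholders rather than the paper's components.

    import torch

    def reabs_predict(classifier, guardian, revise, x, max_steps=10):
        # keep revising until the guardian network judges the sample to be natural
        for _ in range(max_steps):
            if guardian(x).argmax(dim=1).item() == 0:   # class 0 = "natural" (single-image batch)
                break
            x = revise(x)
        return classifier(x).argmax(dim=1)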
Adversarial Examples: Attacks and Defenses for Deep Learning
With rapid progress and significant successes in a wide spectrum of
applications, deep learning is being applied in many safety-critical
environments. However, deep neural networks have been recently found vulnerable
to well-designed input samples, called adversarial examples. Adversarial
examples are imperceptible to humans but can easily fool deep neural networks
in the testing/deployment stage. The vulnerability to adversarial examples has
become
one of the major risks for applying deep neural networks in safety-critical
environments. Therefore, attacks and defenses on adversarial examples draw
great attention. In this paper, we review recent findings on adversarial
examples for deep neural networks, summarize the methods for generating
adversarial examples, and propose a taxonomy of these methods. Under the
taxonomy, applications for adversarial examples are investigated. We further
elaborate on countermeasures for adversarial examples and explore the
challenges and the potential solutions.
Comment: Github: https://github.com/chbrian/awesome-adversarial-examples-d
Enhanced Attacks on Defensively Distilled Deep Neural Networks
Deep neural networks (DNNs) have achieved tremendous success in many machine
learning tasks, such as image classification. Unfortunately, researchers have
shown that DNNs are easily attacked by adversarial examples: slightly perturbed
images that can mislead DNNs into giving incorrect classification results. Such
attacks have seriously hampered the deployment of DNN systems in areas with
strict security or safety requirements, such as autonomous cars, face
recognition and malware detection. Defensive distillation is a mechanism aimed
at training a robust DNN that significantly reduces the effectiveness of
adversarial example generation. However, the state-of-the-art attack can
succeed on distilled networks with 100% probability, but it is a white-box
attack that needs to know the inner information of the DNN, whereas the
black-box scenario is more general. In this paper, we first propose the
epsilon-neighborhood attack, which can fool the defensively distilled networks
with 100% success rate in the white-box setting, and it is fast to generate
adversarial examples with good visual quality. On the basis of this attack, we
further propose the region-based attack against defensively distilled DNNs in
the black-box setting. We also perform a bypass attack that indirectly breaks
the distillation defense as a complementary method. The experimental results
show that our black-box attacks achieve a considerable success rate on
defensively distilled networks.
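For illustration, a generic white-box targeted attack constrained to an L-infinity epsilon-neighborhood of the original image is sketched below; this is a projected-gradient formulation in the spirit of the epsilon-neighborhood attack named above, not the paper's exact optimization.

    import torch
    import torch.nn.functional as F

    def eps_neighborhood_attack(model, x, y_target, eps=0.05, steps=100, lr=0.01):
        delta = torch.zeros_like(x, requires_grad=True)   # perturbation, kept within the epsilon ball
        opt = torch.optim.Adam([delta], lr=lr)
        for _ in range(steps):
            # push the (clipped) perturbed image toward the adversary-selected target class
            loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y_target)
            opt.zero_grad(); loss.backward(); opt.step()
            with torch.no_grad():
                delta.clamp_(-eps, eps)                   # project back into the L-inf neighborhood
        return (x + delta).clamp(0, 1).detach()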