Security Through Stochasticity - Toward Adversarial Defense using Energy-based Models
This paper investigates the use of energy-based models for adversarial defense via purification and training. Convergent and non-convergent energy-based models are tasked with removing white-box adversarial signals embedded into images from the CIFAR-10 dataset so that they may be classified correctly. This work presents an analysis of the stochastic behavior of MCMC sampling for adversarial noise reduction in meta-stable energy basins, and of the benefits and challenges associated with different regimes of energy-based learning for this task.
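To make the purification step concrete, here is a minimal sketch of short-run Langevin MCMC purification, assuming a hypothetical pre-trained energy network `energy_net` that maps images to scalar energies; the step sizes and iteration count are illustrative, not the paper's settings.

```python
import torch

def langevin_purify(x_adv, energy_net, steps=20, step_size=1e-2, noise_scale=5e-3):
    """Run short-run Langevin MCMC to pull an adversarial image toward a
    low-energy (meta-stable) basin of the hypothetical energy model before
    handing it to the classifier."""
    x = x_adv.clone().detach()
    for _ in range(steps):
        x.requires_grad_(True)
        energy = energy_net(x).sum()             # scalar energy of the batch
        grad, = torch.autograd.grad(energy, x)   # dE/dx
        with torch.no_grad():
            # gradient step downhill in energy plus Gaussian noise (Langevin update)
            x = x - step_size * grad + noise_scale * torch.randn_like(x)
            x = x.clamp(0.0, 1.0)                # keep pixels in a valid range
    return x.detach()

# Usage sketch: logits = classifier(langevin_purify(x_adv, energy_net))
```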
MagNet: a Two-Pronged Defense against Adversarial Examples
Deep learning has shown promising results on hard perceptual problems in
recent years. However, deep learning systems are found to be vulnerable to
small adversarial perturbations that are nearly imperceptible to humans. Such
specially crafted perturbations cause deep learning systems to output incorrect
decisions, with potentially disastrous consequences. These vulnerabilities
hinder the deployment of deep learning systems where safety or security is
important. Attempts to secure deep learning systems either target specific
attacks or have been shown to be ineffective.
In this paper, we propose MagNet, a framework for defending neural network
classifiers against adversarial examples. MagNet does not modify the protected
classifier or know the process for generating adversarial examples. MagNet
includes one or more separate detector networks and a reformer network.
Different from previous work, MagNet learns to differentiate between normal and
adversarial examples by approximating the manifold of normal examples. Since it
does not rely on any process for generating adversarial examples, it has
substantial generalization power. Moreover, MagNet reconstructs adversarial
examples by moving them towards the manifold, which is effective for helping
classify adversarial examples with small perturbation correctly. We discuss the
intrinsic difficulty in defending against whitebox attack and propose a
mechanism to defend against graybox attack. Inspired by the use of randomness
in cryptography, we propose to use diversity to strengthen MagNet. We show
empirically that MagNet is effective against most advanced state-of-the-art
attacks in blackbox and graybox scenarios while keeping false positive rate on
normal examples very low.
Comment: Accepted at the ACM Conference on Computer and Communications Security (CCS), 2017.
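To picture the detector/reformer pipeline, the following is a minimal sketch assuming a denoising autoencoder `ae` trained only on normal examples; the error metric and threshold are illustrative, not the paper's exact configuration.

```python
import torch

def magnet_style_defense(x, ae, classifier, threshold=0.05):
    """Detector: flag inputs whose reconstruction error under an autoencoder of
    the normal-data manifold is large. Reformer: classify the reconstruction,
    which moves small-perturbation adversarial examples back toward the manifold."""
    with torch.no_grad():
        recon = ae(x)
        # per-example reconstruction error as a proxy for distance to the manifold
        err = (recon - x).flatten(1).abs().mean(dim=1)
        is_adversarial = err > threshold   # detector decision
        logits = classifier(recon)         # reformer output fed to the classifier
    return is_adversarial, logits
```

The paper additionally uses one or more separate detector networks and draws on diversity to strengthen the defense; the sketch shows only a single-autoencoder case.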
Semantic Adversarial Attacks: Parametric Transformations That Fool Deep Classifiers
Deep neural networks have been shown to exhibit an intriguing vulnerability
to adversarial input images corrupted with imperceptible perturbations.
However, the majority of adversarial attacks assume global, fine-grained
control over the image pixel space. In this paper, we consider a different
setting: what happens if the adversary could only alter specific attributes of
the input image? These would generate inputs that might be perceptibly
different, but still natural-looking and enough to fool a classifier. We
propose a novel approach to generate such `semantic' adversarial examples by
optimizing a particular adversarial loss over the range-space of a parametric
conditional generative model. We demonstrate implementations of our attacks on
binary classifiers trained on face images, and show that such natural-looking
semantic adversarial examples exist. We evaluate the effectiveness of our
attack on synthetic and real data, and present detailed comparisons with
existing attack methods. We supplement our empirical results with theoretical
bounds that demonstrate the existence of such parametric adversarial examples.
Comment: Accepted to the International Conference on Computer Vision (ICCV), 2019.
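The attack can be pictured as optimization over the parameters of a conditional generator rather than over pixels. Below is a minimal illustration assuming a differentiable generator `G` (mapping attribute parameters to images) and a classifier `f`; the names and the loss are chosen for clarity rather than taken from the paper.

```python
import torch
import torch.nn.functional as F

def semantic_attack(theta_init, G, f, true_label, steps=100, lr=0.05):
    """Search the range-space of a parametric generator G for attribute
    parameters whose rendered (natural-looking) image fools the classifier f."""
    theta = theta_init.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([theta], lr=lr)
    target = torch.tensor([true_label])
    for _ in range(steps):
        x = G(theta)                            # image generated from attributes
        loss = -F.cross_entropy(f(x), target)   # maximize loss of the true label
        opt.zero_grad()
        loss.backward()
        opt.step()
    return G(theta.detach())
```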
Fraternal Twins: Unifying Attacks on Machine Learning and Digital Watermarking
Machine learning is increasingly used in security-critical applications, such
as autonomous driving, face recognition and malware detection. Most learning
methods, however, have not been designed with security in mind and thus are
vulnerable to different types of attacks. This problem has motivated the
research field of adversarial machine learning that is concerned with attacking
and defending learning methods. Concurrently, a different line of research has
tackled a very similar problem: in digital watermarking, information is
embedded in a signal in the presence of an adversary. As a consequence, this
research field has also extensively studied techniques for attacking and
defending watermarking methods.
The two research communities have worked in parallel so far, unknowingly
developing similar attack and defense strategies. This paper is a first effort
to bring these communities together. To this end, we present a unified notation
of black-box attacks against machine learning and watermarking that reveals the
similarity of both settings. To demonstrate the efficacy of this unified view,
we apply concepts from watermarking to machine learning and vice versa. We show
that countermeasures from watermarking can mitigate recent model-extraction
attacks and, similarly, that techniques for hardening machine learning can fend
off oracle attacks against watermarks. Our work provides a conceptual link
between two research fields and thereby opens novel directions for improving
the security of both machine learning and digital watermarking.
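To illustrate the unified black-box view, here is a minimal sketch with hypothetical names: in both settings the adversary only observes an oracle's decisions on chosen queries and fits a surrogate, whether the oracle is a victim classifier being extracted or a watermark detector being probed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def black_box_oracle_attack(oracle, n_queries=1000, dim=20, seed=0):
    """Generic query-based attack: probe the oracle on chosen inputs and fit a
    surrogate that mimics its decision boundary. The same loop describes model
    extraction (oracle = deployed classifier) and oracle attacks on watermark
    detectors (oracle = watermark detector)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_queries, dim))      # chosen queries
    y = np.array([oracle(x) for x in X])       # observed oracle decisions
    surrogate = LogisticRegression(max_iter=1000).fit(X, y)
    return surrogate
```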
The advantages of multiple classes for reducing overfitting from test set reuse
Excessive reuse of holdout data can lead to overfitting. However, there is
little concrete evidence of significant overfitting due to holdout reuse in
popular multiclass benchmarks today. Known results show that, in the
worst-case, revealing the accuracy of $k$ adaptively chosen classifiers on a data set of size $n$ allows one to create a classifier with bias of $\Theta(\sqrt{k/n})$ for any binary prediction problem. We show a new upper bound of $\tilde{O}(\max\{\sqrt{k \log(n)/(mn)},\, k/n\})$ on the worst-case bias that any attack can achieve in a prediction problem with $m$ classes. Moreover, we present an efficient attack that achieves a bias of $\Omega(\sqrt{k/(m^2 n)})$ and improves on previous work for the binary setting ($m=2$). We also present an inefficient attack that achieves a bias of $\tilde{\Omega}(k/n)$.
Complementing our theoretical work, we give new practical attacks to
stress-test multiclass benchmarks by aiming to create as large a bias as
possible with a given number of queries. Our experiments show that the
additional uncertainty of prediction with a large number of classes indeed
mitigates the effect of our best attacks.
Our work extends developments in understanding overfitting due to adaptive
data analysis to multiclass prediction problems. It also bears out the
surprising fact that multiclass prediction problems are significantly more
robust to overfitting when reusing a test (or holdout) dataset. This offers an
explanation as to why popular multiclass prediction benchmarks, such as
ImageNet, may enjoy a longer lifespan than what intuition from literature on
binary classification suggests.
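As a worked illustration of such a stress test (the classic aggregation attack on a reused holdout, not the authors' strongest construction): query the holdout accuracy of many random predictors and combine the ones that happened to score above chance. With more classes, lucky predictors are rarer and the achievable bias shrinks, which is the effect the experiments measure.

```python
import numpy as np

def holdout_reuse_attack(query_accuracy, n, num_classes, k=1000, seed=0):
    """Overfit a reused holdout of size n using only k accuracy queries.
    `query_accuracy(preds)` returns the holdout accuracy of a length-n
    prediction vector, the sole information available to the attacker."""
    rng = np.random.default_rng(seed)
    candidates = rng.integers(0, num_classes, size=(k, n))    # random predictors
    accs = np.array([query_accuracy(p) for p in candidates])  # k accuracy queries
    kept = candidates[accs > 1.0 / num_classes]               # above-chance ones
    if len(kept) == 0:
        return candidates[np.argmax(accs)]
    # majority vote over the lucky predictors yields a positively biased classifier
    return np.array([np.bincount(col, minlength=num_classes).argmax()
                     for col in kept.T])
```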
Trust but Verify: An Information-Theoretic Explanation for the Adversarial Fragility of Machine Learning Systems, and a General Defense against Adversarial Attacks
Deep-learning based classification algorithms have been shown to be
susceptible to adversarial attacks: minor changes to the input of classifiers
can dramatically change their outputs, while being imperceptible to humans. In
this paper, we present a simple hypothesis about a feature compression property
of artificial intelligence (AI) classifiers and present theoretical arguments
to show that this hypothesis successfully accounts for the observed fragility
of AI classifiers to small adversarial perturbations. Drawing on ideas from
information and coding theory, we propose a general class of defenses for
detecting classifier errors caused by abnormally small input perturbations. We
further show theoretical guarantees for the performance of this detection
method. We present experimental results with (a) a voice recognition system,
and (b) a digit recognition system using the MNIST database, to demonstrate the
effectiveness of the proposed defense methods. The ideas in this paper are
motivated by a simple analogy between AI classifiers and the standard Shannon
model of a communication system.
Comment: 44 pages, 2 theorems, 35 figures, 29 tables. arXiv admin note: substantial text overlap with arXiv:1901.0941
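A rough sketch of a redundancy-style check in the spirit of the coding-theory analogy (illustrative only; this is not the authors' exact detector): run several independently trained classifiers and flag an input when their decisions disagree, much as a receiver flags an inconsistent codeword.

```python
import torch

def redundancy_detector(x, classifiers):
    """Flag inputs on which an ensemble of independently trained classifiers
    disagrees; abnormally small adversarial perturbations rarely transfer to
    every classifier in the same way, while benign inputs are usually decoded
    consistently."""
    with torch.no_grad():
        preds = torch.stack([c(x).argmax(dim=1) for c in classifiers])  # (m, batch)
    agree = (preds == preds[0]).all(dim=0)   # True where all classifiers agree
    return ~agree                            # flagged (suspected adversarial) inputs
```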
Security Matters: A Survey on Adversarial Machine Learning
Adversarial machine learning is a fast-growing research area that considers scenarios in which machine learning systems may face adversarial attackers who intentionally synthesize input data to make a well-trained model make mistakes. It always involves a defending side, usually a classifier, and an attacking side that aims to cause incorrect output. The earliest studies on adversarial examples for machine learning algorithms come from the information security area, which considers a much wider variety of attack methods. But the recent research focus popularized by the deep learning community places strong emphasis on how "imperceptible" perturbations of normal inputs can cause dramatic mistakes in deep learning models with supposedly super-human accuracy. This paper gives a comprehensive introduction to a range of aspects of the adversarial deep learning topic, including its foundations, typical attacking and defending strategies, and some extended studies.
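As a concrete instance of the "imperceptible" perturbations the survey refers to, here is a minimal sketch of the widely used fast gradient sign method (FGSM); `model`, the loss, and the budget `eps` are placeholders, not taken from the survey.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    """One-step fast gradient sign attack: move every pixel by +/- eps in the
    direction that increases the classification loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```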
Adversarial Transformation Networks: Learning to Generate Adversarial Examples
Multiple different approaches of generating adversarial examples have been
proposed to attack deep neural networks. These approaches involve either
directly computing gradients with respect to the image pixels, or directly
solving an optimization on the image pixels. In this work, we present a
fundamentally new method for generating adversarial examples that is fast to
execute and provides exceptional diversity of output. We efficiently train
feed-forward neural networks in a self-supervised manner to generate
adversarial examples against a target network or set of networks. We call such
a network an Adversarial Transformation Network (ATN). ATNs are trained to
generate adversarial examples that minimally modify the classifier's outputs
given the original input, while constraining the new classification to match an
adversarial target class. We present methods to train ATNs and analyze their
effectiveness targeting a variety of MNIST classifiers as well as the latest
state-of-the-art ImageNet classifier, Inception ResNet v2.
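The training objective can be sketched roughly as follows (the paper describes several ATN variants with input-space and output-space losses; the names and the simple L2 term below are illustrative): a feed-forward network `g` is trained so that `g(x)` stays close to `x` while a frozen target classifier assigns it the adversarial class.

```python
import torch
import torch.nn.functional as F

def atn_loss(g, classifier, x, target_class, beta=0.1):
    """Training loss for an Adversarial Transformation Network g: a similarity
    term keeps the transformed image near the input, while a classification
    term pushes the frozen target classifier toward the adversarial class."""
    x_adv = g(x)
    similarity_loss = F.mse_loss(x_adv, x)
    target = torch.full((x.size(0),), target_class, dtype=torch.long)
    attack_loss = F.cross_entropy(classifier(x_adv), target)
    return beta * similarity_loss + attack_loss
```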
On-board Deep-learning-based Unmanned Aerial Vehicle Fault Cause Detection and Identification
With the increase in use of Unmanned Aerial Vehicles (UAVs)/drones, it is
important to detect and identify causes of failure in real time for proper
recovery from a potential crash-like scenario or post incident forensics
analysis. The cause of crash could be either a fault in the sensor/actuator
system, a physical damage/attack, or a cyber attack on the drone's software. In
this paper, we propose novel architectures based on deep Convolutional and Long
Short-Term Memory Neural Networks (CNNs and LSTMs) to detect (via Autoencoder)
and classify drone mis-operations based on sensor data. The proposed
architectures are able to learn high-level features automatically from the raw
sensor data and learn the spatial and temporal dynamics in the sensor data. We
validate the proposed deep-learning architectures via simulations and
experiments on a real drone. Empirical results show that our solution is able
to detect with over 90% accuracy and classify various types of drone
mis-operations (with about 99% accuracy on simulation data and up to 88% accuracy on experimental data).
Comment: IEEE International Conference on Robotics and Automation (ICRA), May 2020, 6+1 pages.
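A minimal sketch of a CNN + LSTM pipeline of the kind described above (layer sizes and the number of sensor channels are illustrative, not the paper's architecture): 1-D convolutions extract local features from raw sensor sequences and an LSTM captures their temporal dynamics before the fault-cause prediction.

```python
import torch
import torch.nn as nn

class SensorFaultClassifier(nn.Module):
    """Conv1d front-end over raw sensor channels, an LSTM over the resulting
    feature sequence, and a linear head over fault-cause classes."""
    def __init__(self, n_sensors=12, n_classes=5):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_sensors, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU())
        self.lstm = nn.LSTM(input_size=64, hidden_size=64, batch_first=True)
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):                           # x: (batch, time, n_sensors)
        feats = self.conv(x.transpose(1, 2))        # (batch, 64, time)
        out, _ = self.lstm(feats.transpose(1, 2))   # (batch, time, 64)
        return self.head(out[:, -1])                # predict from the last step
```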
Enhancing Robustness of Deep Neural Networks Against Adversarial Malware Samples: Principles, Framework, and AICS'2019 Challenge
Malware continues to be a major cyber threat, despite the tremendous effort that has been made to combat it. The number of malware samples in the wild steadily
increases over time, meaning that we must resort to automated defense
techniques. This naturally calls for machine learning based malware detection.
However, machine learning is known to be vulnerable to adversarial evasion
attacks that manipulate a small number of features to make classifiers wrongly
recognize a malware sample as a benign one. The state-of-the-art is that there
are no effective countermeasures against these attacks. Inspired by the
AICS'2019 Challenge, we systematize a number of principles for enhancing the
robustness of neural networks against adversarial malware evasion attacks. Some
of these principles have been scattered in the literature, but others are
proposed in this paper for the first time. Under the guidance of these
principles, we propose a framework and an accompanying training algorithm,
which are then applied to the AICS'2019 challenge. Our experimental results
have been submitted to the challenge organizer for evaluation.
Comment: 8 pages, 4 figures, AICS 2019; for the fully-fledged version, please see arXiv:2004.0791
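One recurring principle in this line of work is adversarial training on perturbed malware feature vectors. The sketch below illustrates a single such training step under the common binary-feature threat model in which an attacker may only add features (never remove functionality); the names, the flip budget, and the gradient-based feature selection are illustrative assumptions, not the paper's exact algorithm.

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, opt, x, y, n_flips=10):
    """One adversarial-training step for a malware detector on binary feature
    vectors: craft an evasion variant of each sample by setting the n_flips
    absent features whose gradients most increase the detector's loss, then
    update the model on the perturbed batch.
    Assumes `model` outputs a single malware logit per sample."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.binary_cross_entropy_with_logits(model(x).squeeze(1), y.float())
    loss.backward()
    with torch.no_grad():
        grad = x.grad.masked_fill(x > 0, float("-inf"))  # only features not yet set
        flip = grad.topk(n_flips, dim=1).indices         # most damaging additions
        x_adv = x.detach().clone()
        x_adv.scatter_(1, flip, 1.0)                     # add the selected features
    opt.zero_grad()
    adv_loss = F.binary_cross_entropy_with_logits(model(x_adv).squeeze(1), y.float())
    adv_loss.backward()
    opt.step()
    return adv_loss.item()
```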