Locally optimal detection of stochastic targeted universal adversarial perturbations
Deep learning image classifiers are known to be vulnerable to small
adversarial perturbations of input images. In this paper, we derive the locally
optimal generalized likelihood ratio test (LO-GLRT) based detector for
detecting stochastic targeted universal adversarial perturbations (UAPs) of the
classifier inputs. We also describe a supervised training method to learn the
detector's parameters, and demonstrate better performance of the detector
compared to other detection methods on several popular image classification
datasets.
Comment: Submitted to ICASSP 202
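For readers less familiar with locally optimal detection, the following is a
generic sketch of how such a test statistic arises, using standard
detection-theory notation rather than the paper's exact statistic: the
log-likelihood ratio is expanded around a vanishing perturbation strength
$\theta$, and its derivative at $\theta = 0$ is compared against a threshold
$\tau$ (all symbols here are illustrative assumptions).

```latex
% Generic locally optimal (LO) detector sketch; p(x; \theta) is the density of
% the observed input under perturbation strength \theta, and \tau a threshold.
% Notation is illustrative, not the paper's.
\[
  T_{\mathrm{LO}}(x)
    = \left. \frac{\partial}{\partial \theta}\,
      \ln \frac{p(x;\theta)}{p(x;0)} \right|_{\theta = 0^{+}},
  \qquad
  \text{declare a perturbation present when } T_{\mathrm{LO}}(x) > \tau .
\]
```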
A Survey on Resilient Machine Learning
Machine learning based systems are increasingly being used for sensitive tasks
such as security surveillance, guiding autonomous vehicles, making investment
decisions, and detecting and blocking network intrusions and malware. However,
recent research has shown that machine learning models are vulnerable to
attacks by adversaries at all phases of machine learning (e.g., training data
collection, training, and operation). All model classes of machine learning
systems can be misled by carefully crafted inputs that cause them to classify
inputs incorrectly. Maliciously created input samples can affect the learning
process of an ML system by slowing down the learning process, degrading the
performance of the learned model, or causing the system to make errors only in
scenarios planned by the attacker. Because of these developments, understanding
the security of machine learning algorithms and systems is emerging as an
important research area among computer security and machine learning
researchers and practitioners. We present a survey of this emerging area in
machine learning.
Adversarial Attacks and Defences: A Survey
Deep learning has emerged as a strong and efficient framework that can be
applied to a broad spectrum of complex learning problems which were difficult
to solve using the traditional machine learning techniques in the past. In the
last few years, deep learning has advanced radically in such a way that it can
surpass human-level performance on a number of tasks. As a consequence, deep
learning is being extensively used in most of the recent day-to-day
applications. However, deep learning systems are vulnerable to crafted
adversarial examples, which may be imperceptible to the human eye but can lead
the model to produce an incorrect output. In recent times, different types
of adversaries based on their threat model leverage these vulnerabilities to
compromise a deep learning system where adversaries have high incentives.
Hence, it is extremely important to provide robustness to deep learning
algorithms against these adversaries. However, there are only a few strong
countermeasures which can be used in all types of attack scenarios to design a
robust deep learning system. In this paper, we attempt to provide a detailed
discussion on different types of adversarial attacks with various threat models
and also elaborate on the efficiency and challenges of recent countermeasures
against them.
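As a concrete illustration of the crafted, barely perceptible adversarial
examples the survey discusses, below is a minimal sketch of the widely known
fast gradient sign method; the model interface, the step size epsilon, and the
pixel range are illustrative assumptions, not details from the survey.

```python
# Minimal FGSM-style sketch; model, epsilon, and the [0, 1] pixel range are
# illustrative assumptions.
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, epsilon=0.03):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)      # loss of the true labels
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()      # single signed-gradient step
    return x_adv.clamp(0.0, 1.0).detach()    # keep pixels in a valid range
```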
Rethinking Softmax Cross-Entropy Loss for Adversarial Robustness
Previous work shows that adversarially robust generalization requires larger
sample complexity, and the same dataset, e.g., CIFAR-10, which enables good
standard accuracy, may not suffice to train robust models. Since collecting new
training data could be costly, we focus on better utilizing the given data by
inducing the regions with high sample density in the feature space, which could
lead to locally sufficient samples for robust learning. We first formally show
that the softmax cross-entropy (SCE) loss and its variants convey inappropriate
supervisory signals, which encourage the learned feature points to spread over
the space sparsely in training. This inspires us to propose the Max-Mahalanobis
center (MMC) loss to explicitly induce dense feature regions in order to
benefit robustness. Namely, the MMC loss encourages the model to concentrate on
learning ordered and compact representations, which gather around the preset
optimal centers for different classes. We empirically demonstrate that applying
the MMC loss can significantly improve robustness even under strong adaptive
attacks, while keeping state-of-the-art accuracy on clean inputs with little
extra computation compared to the SCE loss.
Comment: ICLR 202
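A minimal sketch of an MMC-style loss is shown below, assuming the per-class
centers have already been precomputed and frozen; the class name, tensor
shapes, and scaling are illustrative assumptions, not the authors' reference
implementation.

```python
# MMC-style loss sketch: pull each feature vector toward the preset center of
# its true class. `centers` is a frozen (num_classes, feat_dim) tensor.
import torch
import torch.nn as nn

class MMCStyleLoss(nn.Module):
    def __init__(self, centers):
        super().__init__()
        self.register_buffer("centers", centers)    # preset, not learned

    def forward(self, features, labels):
        diff = features - self.centers[labels]      # offset from the true-class center
        return 0.5 * (diff ** 2).sum(dim=1).mean()
```

Unlike losses with learned prototypes, the centers here stay fixed during
training, which is what encourages the compact, well-separated feature regions
the abstract describes.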
DARTS: Deceiving Autonomous Cars with Toxic Signs
Sign recognition is an integral part of autonomous cars. Any
misclassification of traffic signs can potentially lead to a multitude of
disastrous consequences, ranging from a life-threatening accident to even a
large-scale interruption of transportation services relying on autonomous cars.
In this paper, we propose and examine security attacks against sign recognition
systems for Deceiving Autonomous caRs with Toxic Signs (we call the proposed
attacks DARTS). In particular, we introduce two novel methods to create these
toxic signs. First, we propose Out-of-Distribution attacks, which expand the
scope of adversarial examples by enabling the adversary to generate them
starting from an arbitrary point in the image space, in contrast to prior
attacks, which are restricted to existing training/test data (In-Distribution).
Second,
we present the Lenticular Printing attack, which relies on an optical
phenomenon to deceive the traffic sign recognition system. We extensively
evaluate the effectiveness of the proposed attacks in both virtual and
real-world settings and consider both white-box and black-box threat models.
Our results demonstrate that the proposed attacks are successful under both
settings and threat models. We further show that Out-of-Distribution attacks
can outperform In-Distribution attacks on classifiers defended using the
adversarial training defense, exposing a new attack vector for these defenses.
Comment: Submitted to ACM CCS 2018; Extended version of [1801.02780] Rogue Signs: Deceiving Traffic Sign Recognition with Malicious Ads and Logo
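To make the Out-of-Distribution idea concrete, here is a hedged sketch of a
targeted attack that starts from an arbitrary point in image space (random
noise) rather than from a real training or test image; the model interface,
image shape, and hyperparameters are assumptions, not the paper's setup.

```python
# Out-of-Distribution-style targeted attack sketch: optimize an arbitrary
# starting image toward a chosen target class. All hyperparameters are
# illustrative assumptions.
import torch
import torch.nn.functional as F

def ood_targeted_example(model, target_class, shape=(1, 3, 32, 32),
                         steps=100, step_size=0.01):
    x = torch.rand(shape, requires_grad=True)      # arbitrary point in image space
    target = torch.tensor([target_class])
    for _ in range(steps):
        loss = F.cross_entropy(model(x), target)
        loss.backward()
        with torch.no_grad():
            x -= step_size * x.grad.sign()         # step toward the target class
            x.clamp_(0.0, 1.0)                     # keep a valid image
        x.grad.zero_()
    return x.detach()
```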
Classification regions of deep neural networks
The goal of this paper is to analyze the geometric properties of deep neural
network classifiers in the input space. We specifically study the topology of
classification regions created by deep networks, as well as their associated
decision boundary. Through a systematic empirical investigation, we show that
state-of-the-art deep nets learn connected classification regions, and that the
decision boundary in the vicinity of datapoints is flat along most directions.
We further draw an essential connection between two seemingly unrelated
properties of deep networks: their sensitivity to additive perturbations in the
inputs, and the curvature of their decision boundary. The directions where the
decision boundary is curved in fact remarkably characterize the directions to
which the classifier is the most vulnerable. We finally leverage a fundamental
asymmetry in the curvature of the decision boundary of deep nets, and propose a
method to discriminate between original images, and images perturbed with small
adversarial examples. We show the effectiveness of this purely geometric
approach for detecting small adversarial perturbations in images, and for
recovering the labels of perturbed images.
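One standard way to formalize the curvature notion the abstract relies on (a
generic level-set formulation, not necessarily the paper's exact definition):
for a decision function $f$ whose zero level set separates two classes, the
curvature of the boundary at $x$ along a tangent direction $v$ is governed by
the quadratic form of the Hessian of $f$.

```latex
% Decision boundary as a level set, and its curvature along a tangent
% direction v (up to sign convention); notation is generic and illustrative.
\[
  \mathcal{B} = \{\, x : f(x) = 0 \,\},
  \qquad
  \kappa_v(x) \;\propto\; \frac{v^{\top} \nabla^{2} f(x)\, v}{\lVert \nabla f(x) \rVert},
  \qquad v \perp \nabla f(x).
\]
```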
Evaluating Robustness of Neural Networks with Mixed Integer Programming
Neural networks have demonstrated considerable success on a wide variety of
real-world problems. However, networks trained only to optimize for training
accuracy can often be fooled by adversarial examples - slightly perturbed
inputs that are misclassified with high confidence. Verification of networks
enables us to gauge their vulnerability to such adversarial examples. We
formulate verification of piecewise-linear neural networks as a mixed integer
program. On a representative task of finding minimum adversarial distortions,
our verifier is two to three orders of magnitude quicker than the
state-of-the-art. We achieve this computational speedup via tight formulations
for non-linearities, as well as a novel presolve algorithm that makes full use
of all information available. The computational speedup allows us to verify
properties on convolutional networks with an order of magnitude more ReLUs than
networks previously verified by any complete verifier. In particular, we
determine for the first time the exact adversarial accuracy of an MNIST
classifier to perturbations with bounded norm: for
this classifier, we find an adversarial example for 4.38% of samples, and a
certificate of robustness (to perturbations with bounded norm) for the
remainder. Across all robust training procedures and network architectures
considered, we are able to certify more samples than the state-of-the-art and
find more adversarial examples than a strong first-order attack.
Comment: Accepted as a conference paper at ICLR 201
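The core of such a verifier is the mixed-integer encoding of each ReLU. Below
is the textbook encoding of $y = \max(0, x)$ given pre-activation bounds
$l \le x \le u$ with $l < 0 < u$; the paper's tightened formulations and
presolve steps may differ from this generic version.

```latex
% Standard MIP encoding of y = max(0, x) with bounds l <= x <= u, l < 0 < u;
% the binary variable a indicates whether the ReLU is active.
\[
  y \ge 0, \qquad
  y \ge x, \qquad
  y \le u\,a, \qquad
  y \le x - l\,(1 - a), \qquad
  a \in \{0, 1\}.
\]
```

Setting $a = 0$ forces $y = 0$ (and hence $x \le 0$), while $a = 1$ forces
$y = x$ (and hence $x \ge 0$), so the two linear regimes of the ReLU are
recovered exactly.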
Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
Deep learning algorithms have been shown to perform extremely well on many
classical machine learning problems. However, recent studies have shown that
deep learning, like other machine learning techniques, is vulnerable to
adversarial samples: inputs crafted to force a deep neural network (DNN) to
provide adversary-selected outputs. Such attacks can seriously undermine the
security of the system supported by the DNN, sometimes with devastating
consequences. For example, autonomous vehicles can be crashed, illicit or
illegal content can bypass content filters, or biometric authentication systems
can be manipulated to allow improper access. In this work, we introduce a
defensive mechanism called defensive distillation to reduce the effectiveness
of adversarial samples on DNNs. We analytically investigate the
generalizability and robustness properties granted by the use of defensive
distillation when training DNNs. We also empirically study the effectiveness of
our defense mechanisms on two DNNs placed in adversarial settings. The study
shows that defensive distillation can reduce the effectiveness of adversarial
sample creation from 95% to less than 0.5% on a studied DNN. Such dramatic
gains can be
explained by the fact that distillation leads gradients used in adversarial
sample creation to be reduced by a factor of 10^30. We also find that
distillation increases the average minimum number of features that need to be
modified to create adversarial samples by about 800% on one of the DNNs we
tested.
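A minimal sketch of the distillation-at-temperature idea follows; the
temperature value, function names, and surrounding training loop are
assumptions, not the authors' exact procedure.

```python
# Distillation-style training step: the student is trained against the
# teacher's temperature-softened probabilities. T and the function signature
# are illustrative assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student, teacher, x, T=20.0):
    with torch.no_grad():
        soft_labels = F.softmax(teacher(x) / T, dim=1)   # softened teacher output
    log_probs = F.log_softmax(student(x) / T, dim=1)     # student at the same temperature
    return -(soft_labels * log_probs).sum(dim=1).mean()  # cross-entropy vs. soft labels
```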
Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples
Many machine learning models are vulnerable to adversarial examples: inputs
that are specially crafted to cause a machine learning model to produce an
incorrect output. Adversarial examples that affect one model often affect
another model, even if the two models have different architectures or were
trained on different training sets, so long as both models were trained to
perform the same task. An attacker may therefore train their own substitute
model, craft adversarial examples against the substitute, and transfer them to
a victim model, with very little information about the victim. Recent work has
further developed a technique that uses the victim model as an oracle to label
a synthetic training set for the substitute, so the attacker need not even
collect a training set to mount the attack. We extend these recent techniques
using reservoir sampling to greatly enhance the efficiency of the training
procedure for the substitute model. We introduce new transferability attacks
between previously unexplored (substitute, victim) pairs of machine learning
model classes, most notably SVMs and decision trees. We demonstrate our attacks
on two commercial machine learning classification systems from Amazon (96.19%
misclassification rate) and Google (88.94%) using only 800 queries of the
victim model, thereby showing that existing machine learning approaches are in
general vulnerable to systematic black-box attacks regardless of their
structure.
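Since the abstract refers to reservoir sampling to make training the
substitute more efficient, here is a minimal sketch of that generic technique
(Algorithm R); the capacity k and variable names are illustrative assumptions.

```python
# Reservoir sampling (Algorithm R): keep a uniform random sample of size k
# from a stream of unknown length, using O(k) memory.
import random

def reservoir_sample(stream, k):
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)        # fill the reservoir with the first k items
        else:
            j = random.randint(0, i)      # item i is kept with probability k / (i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir
```

Each queried input is retained with equal probability, which keeps the
substitute's training set bounded without biasing it toward early or late
queries.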
A Survey of Deep Facial Attribute Analysis
Facial attribute analysis has received considerable attention as deep learning
techniques have made remarkable breakthroughs in this field over the past
few years. Deep learning based facial attribute analysis consists of two basic
sub-issues: facial attribute estimation (FAE), which recognizes whether facial
attributes are present in given images, and facial attribute manipulation
(FAM), which synthesizes or removes desired facial attributes. In this paper,
we provide a comprehensive survey of deep facial attribute analysis from the
perspectives of both estimation and manipulation. First, we summarize a general
pipeline that deep facial attribute analysis follows, which comprises two
stages: data preprocessing and model construction. Additionally, we introduce
the underlying theories of this two-stage pipeline for both FAE and FAM.
Second, the datasets and performance metrics commonly used in facial attribute
analysis are presented. Third, we create a taxonomy of state-of-the-art methods
and review deep FAE and FAM algorithms in detail. Furthermore, several
additional facial attribute related issues are introduced, as well as relevant
real-world applications. Finally, we discuss possible challenges and promising
future research directions.
Comment: Submitted to International Journal of Computer Vision (IJCV)