SPAA: Stealthy Projector-based Adversarial Attacks on Deep Image Classifiers
Light-based adversarial attacks aim to fool deep learning-based image
classifiers by altering the physical light condition using a controllable light
source, e.g., a projector. Compared with physical attacks that place carefully
designed stickers or printed adversarial objects, projector-based ones obviate
modifying the physical entities. Moreover, projector-based attacks can be
performed transiently and dynamically by altering the projection pattern.
However, existing approaches focus on projecting adversarial patterns that
result in clearly perceptible camera-captured perturbations, while the more
interesting yet challenging goal, stealthy projector-based attack, remains an
open problem. In this paper, for the first time, we formulate this problem as
an end-to-end differentiable process and propose Stealthy Projector-based
Adversarial Attack (SPAA). In SPAA, we approximate the real project-and-capture
operation using a deep neural network named PCNet, and then include PCNet in the
optimization of projector-based attacks such that the generated adversarial
projection is physically plausible. Finally, to generate robust and stealthy
adversarial projections, we propose an optimization algorithm that uses minimum
perturbation and adversarial confidence thresholds to alternate between the
adversarial loss and stealthiness loss optimization. Our experimental
evaluations show that the proposed SPAA clearly outperforms other methods by
achieving higher attack success rates while remaining stealthier.
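The alternation between the adversarial and stealthiness objectives described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' code: `pcnet` stands in for a differentiable project-and-capture surrogate, `clf` for the target classifier, and the threshold and step values are placeholder assumptions.

```python
# Hypothetical sketch of an SPAA-style alternation between adversarial and
# stealthiness losses; pcnet, clf, and all hyperparameters are stand-ins.
import torch
import torch.nn.functional as F

def spaa_attack(pcnet, clf, cam_scene, proj_init, target,
                steps=200, lr=0.01, p_thresh=0.9, d_thresh=0.05):
    delta = torch.zeros_like(proj_init, requires_grad=True)     # projector perturbation
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        proj = (proj_init + delta).clamp(0, 1)                  # physically valid pattern
        cam_adv = pcnet(proj, cam_scene)                         # simulated capture
        logits = clf(cam_adv)
        conf = F.softmax(logits, dim=1)[:, target].mean()
        stealth = F.mse_loss(cam_adv, cam_scene)                 # keep capture close to scene
        # Alternate: raise adversarial confidence until it clears p_thresh,
        # then spend steps reducing visibility while perturbation > d_thresh.
        if conf < p_thresh:
            tgt = torch.full((logits.size(0),), target,
                             dtype=torch.long, device=logits.device)
            loss = F.cross_entropy(logits, tgt)
        elif stealth > d_thresh:
            loss = stealth
        else:
            break
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (proj_init + delta).clamp(0, 1).detach()
```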
Wasserstein Adversarial Examples via Projected Sinkhorn Iterations
A rapidly growing area of work has studied the existence of adversarial
examples, datapoints which have been perturbed to fool a classifier, but the
vast majority of these works have focused primarily on threat models defined by
norm-bounded perturbations. In this paper, we propose a new threat
model for adversarial attacks based on the Wasserstein distance. In the image
classification setting, such distances measure the cost of moving pixel mass,
which naturally cover "standard" image manipulations such as scaling, rotation,
translation, and distortion (and can potentially be applied to other settings
as well). To generate Wasserstein adversarial examples, we develop a procedure
for projecting onto the Wasserstein ball, based upon a modified version of the
Sinkhorn iteration. The resulting algorithm can successfully attack image
classification models, bringing traditional CIFAR10 models down to 3% accuracy
within a Wasserstein ball with radius 0.1 (i.e., moving 10% of the image mass 1
pixel), and we demonstrate that PGD-based adversarial training can improve this
adversarial accuracy to 76%. In total, this work opens up a new direction of
study in adversarial robustness, more formally considering convex metrics that
accurately capture the invariances that we typically believe should exist in
classifiers. Code for all experiments in the paper is available at
https://github.com/locuslab/projected_sinkhorn
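The projection onto the Wasserstein ball uses a modified Sinkhorn iteration; for orientation, the sketch below shows only the plain entropy-regularized Sinkhorn primitive between two images treated as probability masses, not the paper's projected variant. All parameter values are illustrative assumptions.

```python
# Plain entropy-regularized Sinkhorn iteration between two normalized images;
# a reference for the primitive the projected variant builds on.
import numpy as np

def sinkhorn_plan(a, b, cost, reg=0.1, iters=200):
    """a, b: flattened nonnegative images summing to 1; cost: pairwise cost matrix."""
    K = np.exp(-cost / reg)                    # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(iters):
        u = a / (K @ v + 1e-12)                # alternating scaling updates
        v = b / (K.T @ u + 1e-12)
    return u[:, None] * K * v[None, :]         # transport plan with marginals ~a, ~b

# Example on tiny 4x4 "images": cost = squared pixel-grid distance.
n = 4
xs, ys = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
cost = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
a = np.random.rand(n * n); a /= a.sum()
b = np.random.rand(n * n); b /= b.sum()
plan = sinkhorn_plan(a, b, cost)
print(plan.sum(axis=1)[:3], a[:3])             # row sums approximate a
```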
Beyond Pixel Norm-Balls: Parametric Adversaries using an Analytically Differentiable Renderer
Many machine learning image classifiers are vulnerable to adversarial
attacks, inputs with perturbations designed to intentionally trigger
misclassification. Current adversarial methods directly alter pixel colors and
evaluate against pixel norm-balls: pixel perturbations smaller than a specified
magnitude, according to a measurement norm. This evaluation, however, has
limited practical utility: perturbations in the pixel space do not correspond
to the underlying real-world phenomena of image formation, and they carry no
clear security motivation. Pixels in natural images are measurements of light
that has interacted with the geometry of a physical scene. As such, we propose
a novel evaluation measure, parametric norm-balls, obtained by directly
perturbing the physical parameters that underlie image formation: lighting and
geometry. One enabling contribution we present
is a physically-based differentiable renderer that allows us to propagate pixel
gradients to the parametric space of lighting and geometry. Our approach
enables physically-based adversarial attacks, and our differentiable renderer
leverages models from the interactive rendering literature to balance the
performance and accuracy trade-offs necessary for a memory-efficient and
scalable adversarial data augmentation workflow.
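The core mechanism, propagating pixel gradients back to lighting and geometry parameters, can be illustrated with a toy differentiable shading model. The sketch below uses simple Lambertian shading of a fixed normal map; the paper's physically-based renderer is far richer, so this is only an assumed, minimal stand-in.

```python
# Toy illustration of propagating pixel gradients to a lighting parameter via a
# differentiable (Lambertian) shading model; not the paper's renderer.
import torch

def shade(normals, albedo, light_dir):
    """normals: (H, W, 3) unit normals; albedo: (H, W, 3); light_dir: (3,)."""
    l = light_dir / light_dir.norm()
    lambert = (normals * l).sum(-1).clamp(min=0.0)        # n . l, clipped
    return albedo * lambert.unsqueeze(-1)                 # (H, W, 3) rendered image

H = W = 8
normals = torch.zeros(H, W, 3); normals[..., 2] = 1.0     # flat surface facing +z
albedo = torch.rand(H, W, 3)
light = torch.tensor([0.2, 0.3, 1.0], requires_grad=True)

img = shade(normals, albedo, light)
# Any scalar loss on the rendered image (e.g. a classifier's loss) now yields
# gradients with respect to the lighting parameter.
loss = img.mean()
loss.backward()
print(light.grad)                                          # d loss / d light_dir
```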
Unifying Bilateral Filtering and Adversarial Training for Robust Neural Networks
Recent analysis of deep neural networks has revealed their vulnerability to
carefully structured adversarial examples. Many effective algorithms exist to
craft these adversarial examples, but performant defenses seem to be far away.
In this work, we explore the use of edge-aware bilateral filtering as a
projection back to the space of natural images. We show that bilateral
filtering is an effective defense in multiple attack settings, where the
strength of the adversary gradually increases. In the case of an adversary who
has no knowledge of the defense, bilateral filtering can remove more than 90%
of adversarial examples from a variety of different attacks. To evaluate
against an adversary with complete knowledge of our defense, we adapt the
bilateral filter as a trainable layer in a neural network and show that adding
this layer makes ImageNet images significantly more robust to attacks. When
trained under a framework of adversarial training, we show that the resulting
model is hard to fool with even the best attack methods.
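The weakest-adversary setting described above amounts to running an edge-aware bilateral filter over the input before classification. A minimal sketch with OpenCV follows; the filter parameters here are illustrative choices, not the paper's, and the trainable-layer variant is not shown.

```python
# Minimal sketch of bilateral filtering as an input-purification step before
# classification (the "attacker unaware of the defense" setting).
import cv2
import numpy as np

def purify(img_uint8, d=9, sigma_color=75, sigma_space=75):
    """img_uint8: HxWx3 uint8 image. Returns the edge-aware smoothed image."""
    return cv2.bilateralFilter(img_uint8, d, sigma_color, sigma_space)

adv = (np.random.rand(224, 224, 3) * 255).astype(np.uint8)   # stand-in adversarial input
clean_ish = purify(adv)
# clean_ish would then be fed to the classifier in place of the raw input.
```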
Optimal Transport Classifier: Defending Against Adversarial Attacks by Regularized Deep Embedding
Recent studies have demonstrated the vulnerability of deep convolutional
neural networks against adversarial examples. Inspired by the observation that
the intrinsic dimension of image data is much smaller than its pixel space
dimension and the vulnerability of neural networks grows with the input
dimension, we propose to embed high-dimensional input images into a
low-dimensional space to perform classification. However, arbitrarily
projecting the input images to a low-dimensional space without regularization
will not improve the robustness of deep neural networks. Leveraging optimal
transport theory, we propose a new framework, Optimal Transport Classifier
(OT-Classifier), and derive an objective that minimizes the discrepancy between
the distribution of the true label and the distribution of the OT-Classifier
output. Experimental results on several benchmark datasets show that our
proposed framework achieves state-of-the-art performance against strong
adversarial attack methods.
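The architectural idea, embedding high-dimensional images into a low-dimensional space before classification, can be sketched as below. The paper's OT-based discrepancy objective is not reproduced here; plain cross-entropy stands in, and all layer sizes are assumptions.

```python
# Skeleton of the embed-then-classify idea: map high-dimensional images to a
# low-dimensional embedding before classification. The OT-based regularizer
# from the paper is not reproduced; cross-entropy stands in.
import torch
import torch.nn as nn

class LowDimClassifier(nn.Module):
    def __init__(self, in_dim=3 * 32 * 32, embed_dim=16, n_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(), nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),                     # low-dimensional embedding
        )
        self.head = nn.Linear(embed_dim, n_classes)

    def forward(self, x):
        return self.head(self.encoder(x))

model = LowDimClassifier()
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
```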
Low Frequency Adversarial Perturbation
Adversarial images aim to change a target model's decision by minimally
perturbing a target image. In the black-box setting, the absence of gradient
information often renders this search problem costly in terms of query
complexity. In this paper we propose to restrict the search for adversarial
images to a low frequency domain. This approach is readily compatible with many
existing black-box attack frameworks and consistently reduces their query cost
by 2 to 4 times. Further, we can circumvent image transformation defenses even
when both the model and the defense strategy are unknown. Finally, we
demonstrate the efficacy of this technique by fooling the Google Cloud Vision
platform with an unprecedentedly low number of model queries.
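Restricting the search to a low-frequency domain can be illustrated by sampling a perturbation only in the lowest DCT coefficients and mapping it back with the inverse DCT. The sketch below is a minimal illustration of that restriction; the frequency ratio and scaling are assumed values, and the black-box attack loop itself is omitted.

```python
# Sketch of confining a perturbation to the low-frequency DCT subspace, the
# restriction used to cut black-box query cost; parameter values are illustrative.
import numpy as np
from scipy.fft import dctn, idctn

def low_freq_perturbation(shape_hw, freq_ratio=0.25, scale=0.05, rng=None):
    """Return an HxW perturbation whose energy lies in the lowest DCT frequencies."""
    rng = np.random.default_rng(rng)
    h, w = shape_hw
    coeffs = np.zeros((h, w))
    kh, kw = int(h * freq_ratio), int(w * freq_ratio)
    coeffs[:kh, :kw] = rng.standard_normal((kh, kw))       # only low-frequency bins
    delta = idctn(coeffs, norm="ortho")                     # back to pixel space
    return scale * delta / np.abs(delta).max()

img = np.random.rand(224, 224)
adv = np.clip(img + low_freq_perturbation(img.shape), 0.0, 1.0)
```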
Adversarial Examples that Fool Detectors
An adversarial example is an example that has been adjusted to produce a
wrong label when presented to a system at test time. To date, adversarial
example constructions have been demonstrated for classifiers, but not for
detectors. If adversarial examples that could fool a detector exist, they could
be used to (for example) maliciously create security hazards on roads populated
with smart vehicles. In this paper, we demonstrate a construction that
successfully fools two standard detectors, Faster RCNN and YOLO. The existence
of such examples is surprising, as attacking a classifier is very different
from attacking a detector, and the structure of detectors - which must search
for their own bounding boxes and cannot estimate those boxes very accurately -
makes it quite likely that adversarial patterns would be strongly disrupted. We
show that our construction produces adversarial examples that
generalize well across sequences digitally, even though large perturbations are
needed. We also show that our construction yields physical objects that are
adversarial.
A Spectral View of Adversarially Robust Features
Given the apparent difficulty of learning models that are robust to
adversarial perturbations, we propose tackling the simpler problem of
developing adversarially robust features. Specifically, given a dataset and
metric of interest, the goal is to return a function (or multiple functions)
that 1) is robust to adversarial perturbations, and 2) has significant
variation across the datapoints. We establish strong connections between
adversarially robust features and a natural spectral property of the geometry
of the dataset and metric of interest. This connection can be leveraged to
provide both robust features, and a lower bound on the robustness of any
function that has significant variance across the dataset. Finally, we provide
empirical evidence that the adversarially robust features given by this
spectral approach can be fruitfully leveraged to learn a robust (and accurate)
model.
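The spectral connection can be illustrated with a toy construction: build a similarity graph on the dataset under the metric of interest and read off a low, non-trivial eigenvector of its graph Laplacian as a feature that varies across datapoints yet is stable to small perturbations. The graph construction details below are assumed illustrative choices, not the paper's exact procedure.

```python
# Toy sketch of the spectral idea: a low eigenvector of the graph Laplacian of
# the dataset's similarity graph as a robust, high-variance feature.
import numpy as np
from scipy.spatial.distance import cdist

def spectral_feature(X, sigma=1.0):
    """X: (n, d) dataset. Returns the Fiedler-vector feature value for each point."""
    D = cdist(X, X)                              # pairwise distances (metric of interest)
    W = np.exp(-(D ** 2) / (2 * sigma ** 2))     # similarity graph
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W               # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)
    return vecs[:, 1]                            # second-smallest eigenvector

X = np.vstack([np.random.randn(50, 2) - 3, np.random.randn(50, 2) + 3])
f = spectral_feature(X)
print(f[:5], f[-5:])                             # roughly separates the two clusters
```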
Label Universal Targeted Attack
We introduce Label Universal Targeted Attack (LUTA) that makes a deep model
predict a label of the attacker's choice for 'any' sample of a given source class
with high probability. Our attack stochastically maximizes the log-probability
of the target label for the source class with first order gradient
optimization, while accounting for the gradient moments. It also suppresses the
leakage of attack information to non-source classes to avoid raising suspicion
of the attack. The perturbations resulting from our attack achieve high fooling
ratios on the large-scale ImageNet and VGGFace models, and transfer well to the
physical world. Given full control over the perturbation scope in LUTA, we also
demonstrate it as a tool for deep model autopsy. The proposed attack reveals
interesting perturbation patterns and observations regarding the deep models.
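The optimization described above, stochastically maximizing the target label's log-probability over source-class batches with moment-aware first-order updates, can be sketched as follows. This is a hedged, simplified illustration: the data loader, bound, and step sizes are assumptions, and the non-source leakage suppression term is only indicated in a comment, not reproduced.

```python
# Simplified sketch of a LUTA-style loop: one shared perturbation updated with
# Adam (moment-aware) steps to raise the target label's probability on batches
# of source-class samples. Leakage suppression on non-source classes is omitted.
import torch
import torch.nn.functional as F

def luta_sketch(model, source_loader, target_label, eps=10 / 255, steps=500, lr=0.005):
    delta, opt = None, None
    for _, (x, _) in zip(range(steps), source_loader):
        if delta is None:
            delta = torch.zeros_like(x[:1], requires_grad=True)   # universal perturbation
            opt = torch.optim.Adam([delta], lr=lr)
        logits = model((x + delta).clamp(0, 1))
        tgt = torch.full((x.size(0),), target_label, dtype=torch.long)
        loss = F.cross_entropy(logits, tgt)        # maximize target log-probability
        # (the paper additionally suppresses changed predictions on non-source classes)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)                 # keep perturbation bounded
    return delta.detach()
```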
Keeping the Bad Guys Out: Protecting and Vaccinating Deep Learning with JPEG Compression
Deep neural networks (DNNs) have achieved great success in solving a variety
of machine learning (ML) problems, especially in the domain of image
recognition. However, recent research showed that DNNs can be highly vulnerable
to adversarially generated instances, which appear normal to human
observers, but completely confuse DNNs. These adversarial samples are crafted
by adding small perturbations to normal, benign images. Such perturbations,
while imperceptible to the human eye, are picked up by DNNs and cause them to
misclassify the manipulated instances with high confidence. In this work, we
explore and demonstrate how systematic JPEG compression can work as an
effective pre-processing step in the classification pipeline to counter
adversarial attacks and dramatically reduce their effects (e.g., Fast Gradient
Sign Method, DeepFool). An important component of JPEG compression is its
ability to remove high-frequency signal components inside square blocks of an
image. Such an operation is equivalent to selective blurring of the image,
helping remove additive perturbations. Further, we propose an ensemble-based
technique that can be constructed quickly from a given well-performing DNN, and
empirically show how such an ensemble that leverages JPEG compression can
protect a model from multiple types of adversarial attacks, without requiring
knowledge about the model.
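The pre-processing step amounts to a JPEG round-trip on each input before it reaches the classifier. A minimal sketch with Pillow follows; the quality setting is an illustrative assumption, and the ensemble construction is not shown.

```python
# Minimal sketch of JPEG compression as an input pre-processing defense:
# re-encode the image at a chosen quality and decode it again before classification.
import io
import numpy as np
from PIL import Image

def jpeg_purify(img_uint8, quality=75):
    """img_uint8: HxWx3 uint8 array. Returns the JPEG round-tripped image."""
    buf = io.BytesIO()
    Image.fromarray(img_uint8).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return np.array(Image.open(buf))

adv = (np.random.rand(224, 224, 3) * 255).astype(np.uint8)   # stand-in adversarial input
defended = jpeg_purify(adv)
```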