Attacking Binarized Neural Networks
Neural networks with low-precision weights and activations offer compelling
efficiency advantages over their full-precision equivalents. The two most
frequently discussed benefits of quantization are reduced memory consumption,
and a faster forward pass when implemented with efficient bitwise operations.
We propose a third benefit of very low-precision neural networks: improved
robustness against some adversarial attacks, and in the worst case, performance
that is on par with full-precision models. We focus on the very low-precision
case where weights and activations are both quantized to ±1, and note that
stochastically quantizing weights in just one layer can sharply reduce the
impact of iterative attacks. We observe that non-scaled binary neural networks
exhibit a similar effect to the original defensive distillation procedure that
led to gradient masking, and a false notion of security. We address this by
conducting both black-box and white-box experiments with binary models that do
not artificially mask gradients. Comment: Published as a conference paper at ICLR 2018.
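To make the mechanism concrete, below is a minimal NumPy sketch of stochastic weight binarization of the kind the abstract refers to, using the standard hard-sigmoid rule from the BNN training literature. The function name and the resample-on-every-forward-pass policy are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def stochastic_binarize(w_latent, rng):
    """Stochastically binarize latent weights in [-1, 1] to {-1, +1}.

    P(w_b = +1) = clip((w + 1) / 2, 0, 1), the hard-sigmoid rule commonly
    used when training BNNs. Resampling on every forward pass means an
    iterative attack computes each gradient step against a slightly
    different binary network.
    """
    p_plus = np.clip((w_latent + 1.0) / 2.0, 0.0, 1.0)
    return np.where(rng.random(w_latent.shape) < p_plus, 1.0, -1.0)

rng = np.random.default_rng(0)
w_latent = rng.uniform(-1.0, 1.0, size=(4, 3))
print(stochastic_binarize(w_latent, rng))  # two calls on the same latent
print(stochastic_binarize(w_latent, rng))  # weights give two realizations
```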
Combinatorial Attacks on Binarized Neural Networks
Binarized Neural Networks (BNNs) have recently attracted significant interest
due to their computational efficiency. Concurrently, it has been shown that
neural networks may be overly sensitive to "attacks" - tiny adversarial changes
in the input - which may be detrimental to their use in safety-critical
domains. Designing attack algorithms that effectively fool trained models is a
key step towards learning robust neural networks. The discrete,
non-differentiable nature of BNNs, which distinguishes them from their
full-precision counterparts, poses a challenge to gradient-based attacks. In
this work, we study the problem of attacking a BNN through the lens of
combinatorial and integer optimization. We propose a Mixed Integer Linear
Programming (MILP) formulation of the problem. While exact and flexible, the
MILP quickly becomes intractable as the network and perturbation space grow. To
address this issue, we propose IProp, a decomposition-based algorithm that
solves a sequence of much smaller MILP problems. Experimentally, we evaluate
both proposed methods against the standard gradient-based attack (FGSM) on
MNIST and Fashion-MNIST, and show that IProp performs favorably compared to
FGSM, while scaling beyond the limits of the MILP.
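For reference, the gradient-based baseline the abstract compares against can be sketched in a few lines. This is a generic single-step FGSM in PyTorch, not the paper's IProp or MILP code; for a BNN the input gradient would typically come from a straight-through estimator.

```python
import torch

def fgsm_attack(model, x, y, epsilon):
    """Single-step FGSM: move x by epsilon in the direction of the sign of
    the input gradient of the loss, then clamp back to the valid pixel range."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    with torch.no_grad():
        x_adv = x_adv + epsilon * x_adv.grad.sign()
        return x_adv.clamp(0.0, 1.0).detach()
```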
Probabilistic Binary Neural Networks
Low bit-width weights and activations are an effective way of combating the
increasing need for both memory and compute power of Deep Neural Networks. In
this work, we present a probabilistic training method for a Neural Network with
both binary weights and activations, called BLRNet. By embracing stochasticity
during training, we circumvent the need to approximate the gradient of
non-differentiable functions such as sign(), while still obtaining a fully
Binary Neural Network at test time. Moreover, it allows for anytime ensemble
predictions for improved performance and uncertainty estimates by sampling from
the weight distribution. Since all operations in a layer of the BLRNet operate
on random variables, we introduce stochastic versions of Batch Normalization
and max pooling, which transfer well to a deterministic network at test time.
We evaluate the BLRNet on multiple standardized benchmarks.
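The "anytime ensemble" idea can be illustrated with a small sketch: sample several binary networks from per-weight probabilities and average their predictions. Here `forward` and `p_plus` are placeholders for a trained network's forward pass and its learned weight distribution; this is not the paper's implementation.

```python
import numpy as np

def sample_binary_weights(p_plus, rng):
    """Draw one {-1, +1} weight realization from per-weight probabilities P(w = +1)."""
    return np.where(rng.random(p_plus.shape) < p_plus, 1.0, -1.0)

def anytime_ensemble_predict(forward, p_plus, x, n_samples, rng):
    """Average class probabilities over sampled binary networks; more samples
    give a better estimate, and their spread can serve as an uncertainty cue."""
    preds = [forward(sample_binary_weights(p_plus, rng), x) for _ in range(n_samples)]
    return np.mean(preds, axis=0)
```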
Predicting Adversarial Examples with High Confidence
It has been suggested that adversarial examples cause deep learning models to
make incorrect predictions with high confidence. In this work, we take the
opposite stance: an overly confident model is more likely to be vulnerable to
adversarial examples. This work is one of the most proactive approaches taken
to date, as we link robustness with non-calibrated model confidence on noisy
images, providing a data-augmentation-free path forward. The adversarial
examples phenomenon is most easily explained by the trend of increasing
non-regularized model capacity, while the diversity and number of samples in
common datasets have remained flat. Test accuracy has incorrectly been
associated with true generalization performance, ignoring that training and
test splits are often extremely similar in terms of the overall representation
space. The transferability property of adversarial examples was previously used
as evidence against overfitting arguments, a perceived random effect, but
overfitting is not always random. Comment: Under review by the International Conference on Machine Learning (ICML 2018).
SafetyNet: Detecting and Rejecting Adversarial Examples Robustly
We describe a method to produce a network where current methods such as
DeepFool have great difficulty producing adversarial samples. Our construction
suggests some insights into how deep networks work. We provide a reasonable
analysis of why our construction is difficult to defeat, and show experimentally
that our method is hard to defeat with both Type I and Type II attacks using
several standard networks and datasets. We apply this SafetyNet architecture to
an important and novel application, SceneProof, which can reliably detect
whether an image is a picture of a real scene or not. SceneProof applies to
images captured with depth maps (RGBD images) and checks if a pair of image and
depth map is consistent. It relies on the relative difficulty of producing
naturalistic depth maps for images in post processing. We demonstrate that our
SafetyNet is robust to adversarial examples built from currently known
attacking approaches. Comment: Accepted to ICCV 2017.
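A detect-and-reject wrapper of the general shape described above might look like the sketch below. The `classify` and `detector_score` callables and the threshold are placeholders; the internals of the actual SafetyNet detector are abstracted away here.

```python
class DetectAndReject:
    """Wrap a classifier with an adversarial-input detector: inputs the
    detector flags are rejected instead of being classified."""

    def __init__(self, classify, detector_score, threshold):
        self.classify = classify              # x -> predicted label
        self.detector_score = detector_score  # x -> scalar, higher = more suspicious
        self.threshold = threshold

    def predict(self, x):
        if self.detector_score(x) > self.threshold:
            return None  # reject: input flagged as likely adversarial
        return self.classify(x)
```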
Exploiting Excessive Invariance caused by Norm-Bounded Adversarial Robustness
Adversarial examples are malicious inputs crafted to cause a model to
misclassify them. Their most common instantiation, "perturbation-based"
adversarial examples introduce changes to the input that leave its true label
unchanged, yet result in a different model prediction. Conversely,
"invariance-based" adversarial examples insert changes to the input that leave
the model's prediction unaffected despite the underlying input's label having
changed.
In this paper, we demonstrate that robustness to perturbation-based
adversarial examples is not only insufficient for general robustness, but
worse, it can also increase vulnerability of the model to invariance-based
adversarial examples. In addition to analytical constructions, we empirically
study vision classifiers with state-of-the-art robustness to perturbation-based
adversaries constrained by an $\ell_\infty$ norm. We mount attacks that exploit
excessive model invariance in directions relevant to the task, which are able
to find adversarial examples within the $\ell_\infty$ ball. In fact, we find that
classifiers trained to be $\ell_\infty$-norm robust are more vulnerable to
invariance-based adversarial examples than their undefended counterparts.
Excessive invariance is not limited to models trained to be robust to
perturbation-based $\ell_\infty$-norm adversaries. In fact, we argue that the term
adversarial example is used to capture a series of model limitations, some of
which may not have been discovered yet. Accordingly, we call for a set of
precise definitions that taxonomize and address each of these shortcomings in
learning. Comment: Accepted at the ICLR 2019 SafeML Workshop.
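The contrast between the two notions can be written compactly. In the sketch below, O(x) denotes the oracle (true) label, f(x) the model's prediction, and the norm and epsilon stand for whatever threat model is assumed; the symbols are illustrative rather than the paper's exact formalism.

```latex
% Perturbation-based vs. invariance-based adversarial examples
\begin{align*}
\text{perturbation-based: } & \exists\, x' : \ \|x' - x\| \le \varepsilon,\quad
    O(x') = O(x),\quad f(x') \ne f(x),\\
\text{invariance-based: }   & \exists\, x' : \ O(x') \ne O(x),\quad f(x') = f(x).
\end{align*}
```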
Defending against substitute model black box adversarial attacks with the 01 loss
Substitute model black box attacks can create adversarial examples for a
target model just by accessing its output labels. This poses a major challenge
to machine learning models in practice, particularly in security sensitive
applications. The 01 loss model is known to be more robust to outliers and
noise than convex models that are typically used in practice. Motivated by
these properties we present 01 loss linear and 01 loss dual layer neural
network models as a defense against transfer based substitute model black box
attacks. We compare the accuracy of adversarial examples from substitute model
black box attacks targeting our 01 loss models and their convex counterparts
for binary classification on popular image benchmarks. Our 01 loss dual layer
neural network has an adversarial accuracy of 66.2%, 58%, 60.5%, and 57% on
MNIST, CIFAR10, STL10, and ImageNet respectively whereas the sigmoid activated
logistic loss counterpart has accuracies of 63.5%, 19.3%, 14.9%, and 27.6%.
Except for MNIST the convex counterparts have substantially lower adversarial
accuracies. We show practical applications of our models to deter traffic sign
and facial recognition adversarial attacks. On GTSRB street sign and CelebA
facial detection our 01 loss network has 34.6% and 37.1% adversarial accuracy
respectively, whereas the convex logistic counterpart has accuracies of 24% and 1.9%.
Finally we show that our 01 loss network can attain robustness on par with
simple convolutional neural networks and much higher than its convex
counterpart even when attacked with a convolutional network substitute model.
Our work shows that 01 loss models offer a powerful defense against substitute
model black box attacks. Comment: arXiv admin note: substantial text overlap with arXiv:2006.07800; text overlap with arXiv:2008.0914
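For clarity, the two losses being compared can be written down directly. This is a generic NumPy sketch of a linear classifier's 0-1 loss and its convex logistic surrogate, not the authors' training code; optimizing the non-convex 0-1 loss requires specialized search procedures that are out of scope here.

```python
import numpy as np

def zero_one_loss(w, b, X, y):
    """Empirical 0-1 loss of the linear classifier sign(Xw + b) against labels
    y in {-1, +1}: the fraction of misclassified points. It is non-convex and
    non-differentiable, which is what makes gradients from a convex substitute
    model transfer poorly to it."""
    return np.mean(np.sign(X @ w + b) != y)

def logistic_loss(w, b, X, y):
    """Convex surrogate used by the sigmoid/logistic baseline in the abstract."""
    margins = y * (X @ w + b)
    return np.mean(np.log1p(np.exp(-margins)))
```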
ComDefend: An Efficient Image Compression Model to Defend Adversarial Examples
Deep neural networks (DNNs) have been demonstrated to be vulnerable to
adversarial examples. Specifically, adding imperceptible perturbations to clean
images can fool the well trained deep neural networks. In this paper, we
propose an end-to-end image compression model to defend adversarial examples:
ComDefend. The proposed model consists of a compression convolutional
neural network (ComCNN) and a reconstruction convolutional neural network
(ResCNN). The ComCNN is used to maintain the structural information of the
original image and purify adversarial perturbations, while the ResCNN is used to
reconstruct the original image with high quality. In other words, ComDefend can
transform the adversarial image to its clean version, which is then fed to the
trained classifier. Our method is a pre-processing module, and does not modify
the classifier's structure during the whole process. Therefore, it can be
combined with other model-specific defense models to jointly improve the
classifier's robustness. A series of experiments conducted on MNIST, CIFAR10
and ImageNet show that the proposed method outperforms the state-of-the-art
defense methods, and is consistently effective in protecting classifiers against
adversarial attacks.
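The pre-processing pipeline described above can be sketched as follows. Here `comcnn`, `rescnn`, and `classifier` are placeholder PyTorch modules, and any quantization or noise injection applied to the compact code in the actual method is omitted.

```python
import torch

def defended_predict(comcnn, rescnn, classifier, x):
    """Pre-processing defense: compress the (possibly adversarial) input,
    reconstruct a clean version, then feed it to the unmodified classifier."""
    with torch.no_grad():
        code = comcnn(x)        # compact representation; perturbations suppressed
        x_clean = rescnn(code)  # high-quality reconstruction of the input
        return classifier(x_clean)
```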
Exploiting Verified Neural Networks via Floating Point Numerical Error
We show how to construct adversarial examples for neural networks with
exactly verified robustness against $\ell_\infty$-bounded input perturbations
by exploiting floating point error. We argue that any exact verification of
real-valued neural networks must accurately model the implementation details of
any floating point arithmetic used during inference or verification.
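A toy NumPy example of the underlying issue: two mathematically identical orderings of the same float32 dot product round differently, so a verifier that reasons over exact reals can certify a margin the deployed arithmetic does not honor. The specific numbers are illustrative, not taken from the paper.

```python
import numpy as np

x = np.array([1e8, 1.0, -1e8], dtype=np.float32)
w = np.array([1.0, 1.0, 1.0], dtype=np.float32)

# Accumulate left to right in float32: 1e8 + 1.0 rounds back to 1e8,
# so the final sum is 0.0.
acc = np.float32(0.0)
for xi, wi in zip(x, w):
    acc = np.float32(acc + xi * wi)

# Mathematically equivalent reordering: (1e8 - 1e8) + 1.0 = 1.0 exactly.
reordered = np.float32(x[0] * w[0] + x[2] * w[2]) + x[1] * w[1]

print(acc, reordered)  # prints 0.0 1.0: same dot product, different results
```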
Towards a Robust Deep Neural Network in Texts: A Survey
Deep neural networks (DNNs) have achieved remarkable success in various tasks
(e.g., image classification, speech recognition, and natural language
processing). However, research has shown that DNN models are vulnerable to
adversarial examples, which cause incorrect predictions when imperceptible
perturbations are added to normal inputs. Adversarial examples in the image
domain have been well investigated, but research on texts is far less developed,
let alone a comprehensive survey of the field. In this paper, we aim to
present a comprehensive understanding of adversarial attacks and
corresponding mitigation strategies in texts. Specifically, we first give a
taxonomy of adversarial attacks and defenses in texts from the perspective of
different natural language processing (NLP) tasks, and then introduce how to
build a robust DNN model via testing and verification. Finally, we discuss the
existing challenges of adversarial attacks and defenses in texts and present
the future research directions in this emerging field.
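As a concrete illustration of the kind of text attack such a survey covers, here is a minimal greedy word-substitution sketch. The toy synonym table, the function name, and the black-box `predict_label` callable are all hypothetical; real attacks add semantic and fluency constraints to keep the perturbations imperceptible.

```python
# Toy synonym table standing in for an embedding- or thesaurus-based candidate set.
TOY_SYNONYMS = {"great": ["fine", "decent"], "terrible": ["poor", "bad"]}

def greedy_word_substitution(sentence, predict_label):
    """Greedily try synonym swaps and return the first rewrite that flips the
    black-box classifier's prediction, or None if no swap succeeds."""
    words = sentence.split()
    original = predict_label(" ".join(words))
    for i, w in enumerate(words):
        for candidate in TOY_SYNONYMS.get(w.lower(), []):
            perturbed = words[:i] + [candidate] + words[i + 1:]
            if predict_label(" ".join(perturbed)) != original:
                return " ".join(perturbed)  # adversarial rewrite found
    return None
```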