Training for Faster Adversarial Robustness Verification via Inducing ReLU Stability
We explore the concept of co-design in the context of neural network
verification. Specifically, we aim to train deep neural networks that are not
only robust to adversarial perturbations but whose robustness can also be
verified more easily. To this end, we identify two properties of network models
- weight sparsity and so-called ReLU stability - that turn out to significantly
impact the complexity of the corresponding verification task. We demonstrate
that improving weight sparsity alone already enables us to turn computationally
intractable verification problems into tractable ones. Then, improving ReLU
stability leads to an additional 4-13x speedup in verification times. An
important feature of our methodology is its "universality," in the sense that
it can be used with a broad range of training procedures and verification
approaches
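
As a rough illustration of the two training signals described above (not the authors' implementation; the bound tensors lb and ub are assumed to come from a separate interval-bound computation, and the tanh form is one common choice of stability penalty), the regularizers might look like:

    import torch

    def relu_stability_penalty(lb, ub):
        # lb, ub: interval lower/upper bounds on pre-activations.
        # A ReLU is "stable" when its input interval lies on one side of zero
        # (lb >= 0 or ub <= 0); unstable neurons (lb < 0 < ub) force exact
        # verifiers to branch. The term -tanh(1 + lb * ub) is small for stable
        # neurons and large for unstable ones, so minimizing it pushes neurons
        # toward stability.
        return -torch.tanh(1.0 + lb * ub).sum()

    def l1_weight_sparsity(model, coeff=1e-5):
        # Plain L1 penalty encouraging the weight sparsity that, per the
        # abstract, already makes verification tractable on its own.
        return coeff * sum(p.abs().sum() for p in model.parameters())

Both terms would simply be added to the usual (robust) training loss.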
Rethinking Softmax Cross-Entropy Loss for Adversarial Robustness
Previous work shows that adversarially robust generalization requires larger
sample complexity, and that a dataset such as CIFAR-10, which enables good
standard accuracy, may not suffice to train robust models. Since collecting new
training data could be costly, we focus on better utilizing the given data by
inducing the regions with high sample density in the feature space, which could
lead to locally sufficient samples for robust learning. We first formally show
that the softmax cross-entropy (SCE) loss and its variants convey inappropriate
supervisory signals, which encourage the learned feature points to spread over
the space sparsely in training. This inspires us to propose the Max-Mahalanobis
center (MMC) loss to explicitly induce dense feature regions in order to
benefit robustness. Namely, the MMC loss encourages the model to concentrate on
learning ordered and compact representations, which gather around the preset
optimal centers for different classes. We empirically demonstrate that applying
the MMC loss can significantly improve robustness even under strong adaptive
attacks, while keeping state-of-the-art accuracy on clean inputs with little
extra computation compared to the SCE loss.
Comment: ICLR 2020
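
A minimal sketch of the core idea (illustrative names, not the authors' release): with a set of preset, fixed class centers, an MMC-style loss simply pulls each feature vector toward the center of its class:

    import torch

    def mmc_style_loss(features, labels, centers):
        # features: (B, d) penultimate-layer features
        # labels:   (B,)   integer class labels
        # centers:  (C, d) preset, *fixed* class centers (in the paper these are
        #           Max-Mahalanobis centers, chosen to be maximally separated)
        target = centers[labels]                       # each example's class center
        return 0.5 * ((features - target) ** 2).sum(dim=1).mean()

Because the centers are preset rather than learned, features of each class are driven into a single dense region, which is the mechanism the abstract credits for the robustness gains.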
Scaleable input gradient regularization for adversarial robustness
In this work we revisit gradient regularization for adversarial robustness
with some new ingredients. First, we derive new per-image theoretical
robustness bounds based on local gradient information. These bounds strongly
motivate input gradient regularization. Second, we implement a scaleable
version of input gradient regularization which avoids double backpropagation:
adversarially robust ImageNet models are trained in 33 hours on four consumer
grade GPUs. Finally, we show experimentally and through theoretical
certification that input gradient regularization is competitive with
adversarial training. Moreover, we demonstrate that gradient regularization does
not lead to gradient obfuscation or gradient masking.
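
A minimal sketch of the "no double backpropagation" idea (names and the finite-difference step size are assumptions, not the released code): estimate the input-gradient norm with a finite difference of the loss along the detached gradient direction, so that backpropagating the penalty needs only first derivatives:

    import torch
    import torch.nn.functional as F

    def grad_norm_penalty(model, x, y, h=1e-2):
        # Assumes image batches x of shape (B, C, H, W) and integer labels y.
        x = x.detach().requires_grad_(True)
        per_example = F.cross_entropy(model(x), y, reduction="none")
        g = torch.autograd.grad(per_example.sum(), x)[0].detach()
        # Unit direction along each example's input gradient.
        d = g / g.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
        shifted = F.cross_entropy(model(x + h * d), y, reduction="none")
        base = F.cross_entropy(model(x.detach()), y, reduction="none")
        # ((L(x + h d) - L(x)) / h)^2 approximates ||grad_x L(x)||^2 per example,
        # and differentiating it w.r.t. the weights needs no second derivatives.
        return ((shifted - base) / h).pow(2).mean()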
Wasserstein Adversarial Examples via Projected Sinkhorn Iterations
A rapidly growing area of work has studied the existence of adversarial
examples, datapoints which have been perturbed to fool a classifier, but the
vast majority of these works have focused primarily on threat models defined by
$\ell_p$ norm-bounded perturbations. In this paper, we propose a new threat
model for adversarial attacks based on the Wasserstein distance. In the image
classification setting, such distances measure the cost of moving pixel mass,
which naturally cover "standard" image manipulations such as scaling, rotation,
translation, and distortion (and can potentially be applied to other settings
as well). To generate Wasserstein adversarial examples, we develop a procedure
for projecting onto the Wasserstein ball, based upon a modified version of the
Sinkhorn iteration. The resulting algorithm can successfully attack image
classification models, bringing traditional CIFAR10 models down to 3% accuracy
within a Wasserstein ball with radius 0.1 (i.e., moving 10% of the image mass 1
pixel), and we demonstrate that PGD-based adversarial training can improve this
adversarial accuracy to 76%. In total, this work opens up a new direction of
study in adversarial robustness, more formally considering convex metrics that
accurately capture the invariances that we typically believe should exist in
classifiers. Code for all experiments in the paper is available at
https://github.com/locuslab/projected_sinkhorn
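
As background on the "moving pixel mass" interpretation, here is a standard (unmodified) Sinkhorn iteration for entropy-regularized optimal transport between two mass distributions; the paper's contribution is a modified version of this iteration that projects onto a Wasserstein ball, which is not reproduced here:

    import torch

    def sinkhorn_ot_cost(a, b, C, reg=0.1, iters=200):
        # a, b: probability vectors (e.g., two images flattened and normalized
        #       so their pixel intensities sum to one)
        # C:    ground-cost matrix, C[i, j] = cost of moving mass from pixel i to j
        K = torch.exp(-C / reg)                      # Gibbs kernel
        u, v = torch.ones_like(a), torch.ones_like(b)
        for _ in range(iters):                       # alternating marginal scaling
            u = a / (K @ v + 1e-12)
            v = b / (K.t() @ u + 1e-12)
        P = u.unsqueeze(1) * K * v.unsqueeze(0)      # approximate transport plan
        return (P * C).sum()                         # approximate Wasserstein cost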
Certifying Strategyproof Auction Networks
Optimal auctions maximize a seller's expected revenue subject to individual
rationality and strategyproofness for the buyers. Myerson's seminal work in
1981 settled the case of auctioning a single item; however, subsequent decades
of work have yielded little progress moving beyond a single item, leaving the
design of revenue-maximizing auctions as a central open problem in the field of
mechanism design. A recent thread of work in "differentiable economics" has
used tools from modern deep learning to instead learn good mechanisms. We focus
on the RegretNet architecture, which can represent auctions with arbitrary
numbers of items and participants; it is trained to be empirically
strategyproof, but this property is never exactly verified, leaving potential
loopholes for market participants to exploit. We propose ways to explicitly
verify strategyproofness under a particular valuation profile using techniques
from the neural network verification literature. Doing so requires making
several modifications to the RegretNet architecture in order to represent it
exactly in an integer program. We train our network and produce certificates in
several settings, including settings for which the optimal strategyproof
mechanism is not known
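
To give a flavor of the verification machinery (a generic toy example, not the paper's auction-specific certificate): the key step of representing a ReLU network exactly in an integer program is the standard big-M encoding of a single ReLU, sketched here with PuLP:

    from pulp import LpProblem, LpVariable, LpMaximize, LpBinary, lpSum, value

    # Toy: maximize a single ReLU unit y = max(0, w.x + b) over the box
    # 0 <= x_i <= 1, using the standard big-M mixed-integer encoding.
    w, b, M = [1.0, -2.0], 0.5, 10.0          # illustrative weights; M bounds |w.x + b|

    prob = LpProblem("relu_big_m", LpMaximize)
    x = [LpVariable(f"x{i}", lowBound=0, upBound=1) for i in range(2)]
    y = LpVariable("y", lowBound=0)
    z = LpVariable("z", cat=LpBinary)         # z = 1 iff the ReLU is active
    pre = lpSum(w[i] * x[i] for i in range(2)) + b

    prob += y >= pre                          # y >= w.x + b
    prob += y <= pre + M * (1 - z)            # tight when the unit is active
    prob += y <= M * z                        # forces y = 0 when inactive
    prob += y                                 # objective: the ReLU's output
    prob.solve()
    print("max ReLU output over the box:", value(y))   # expected 1.5

Verifying strategyproofness additionally encodes the auction's allocation and payment rules and compares utilities under truthful and misreported bids, which is where the paper's architectural modifications come in.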
Variational Inference with Latent Space Quantization for Adversarial Resilience
Despite their tremendous success in modelling high-dimensional data
manifolds, deep neural networks suffer from the threat of adversarial attacks -
the existence of perceptually valid, input-like samples, obtained through
careful perturbation, that degrade the performance of the underlying model.
Major concerns with existing defense mechanisms include non-generalizability
across different attacks and models, and large inference time.
In this paper, we propose a generalized defense mechanism capitalizing on the
expressive power of regularized latent-space-based generative models. We design
an adversarial filter that requires no access to the classifier or the
adversary, which makes it usable in tandem with any classifier. The basic idea is to learn a
Lipschitz constrained mapping from the data manifold, incorporating adversarial
perturbations, to a quantized latent space and re-map it to the true data
manifold. Specifically, we simultaneously auto-encode the data manifold and its
perturbations implicitly through the perturbations of the regularized and
quantized generative latent space, realized using variational inference. We
demonstrate the efficacy of the proposed formulation in providing resilience
against multiple attack types (black and white box) and methods, while being
almost real-time. Our experiments show that the proposed method surpasses the
state-of-the-art techniques in several cases
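
A bare-bones sketch of the filtering idea (the variational and Lipschitz-constraint machinery from the abstract is omitted; all names are illustrative): encode the input, snap each latent vector to its nearest codebook entry, decode, and feed the reconstruction to any downstream classifier:

    import torch
    import torch.nn as nn

    class QuantizedLatentFilter(nn.Module):
        def __init__(self, encoder, decoder, num_codes=512, dim=64):
            super().__init__()
            self.encoder, self.decoder = encoder, decoder
            self.codebook = nn.Embedding(num_codes, dim)   # discrete latent codes

        def forward(self, x):
            z = self.encoder(x)                            # (B, dim) latent vectors
            dists = torch.cdist(z, self.codebook.weight)   # distance to every code
            z_q = self.codebook(dists.argmin(dim=1))       # nearest-code quantization
            z_q = z + (z_q - z).detach()                   # straight-through gradient
            return self.decoder(z_q)                       # re-mapped, "cleaned" input

    # usage (classifier-agnostic): logits = classifier(filter(x_possibly_adversarial))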
Efficient Exact Verification of Binarized Neural Networks
We present a new system, EEV, for verifying binarized neural networks (BNNs).
We formulate BNN verification as a Boolean satisfiability problem (SAT) with
reified cardinality constraints of the form $y \Leftrightarrow (x_1 + \dots + x_n \ge k)$,
where $y$ and $x_i$ are Boolean variables possibly with negation and $k$ is an
integer constant. We also identify two properties, specifically balanced weight
sparsity and lower cardinality bounds, that reduce the verification complexity
of BNNs. EEV contains both a SAT solver enhanced to handle reified cardinality
constraints natively and novel training strategies designed to reduce
verification complexity by delivering networks with improved sparsity
properties and cardinality bounds. We demonstrate the effectiveness of EEV by
presenting the first exact verification results for $\ell_\infty$-bounded
adversarial robustness of nontrivial convolutional BNNs on the MNIST and
CIFAR10 datasets. Our results also show that, depending on the dataset and
network architecture, our techniques verify BNNs between ten and ten thousand
times faster than the best previous exact verification techniques for either
binarized or real-valued networks.
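
To make the constraint shape concrete (a sketch using the off-the-shelf PySAT toolkit rather than EEV's native solver support): a reified cardinality constraint $y \Leftrightarrow (x_1 + \dots + x_n \ge k)$ can be encoded in plain CNF by guarding both directions with the reification literal:

    from pysat.card import CardEnc, EncType
    from pysat.formula import IDPool
    from pysat.solvers import Glucose3

    pool = IDPool()
    xs = [pool.id(f"x{i}") for i in range(6)]
    y = pool.id("y")
    k = 3

    at_least = CardEnc.atleast(lits=xs, bound=k, vpool=pool, encoding=EncType.seqcounter)
    at_most = CardEnc.atmost(lits=xs, bound=k - 1, vpool=pool, encoding=EncType.seqcounter)

    clauses = [cl + [-y] for cl in at_least.clauses]   #  y -> (x1 + ... + x6 >= 3)
    clauses += [cl + [y] for cl in at_most.clauses]    # ~y -> (x1 + ... + x6 <= 2)

    with Glucose3(bootstrap_with=clauses) as solver:
        # Assume y is true but force four of the six inputs false: at most two
        # inputs can then be true, so the instance is unsatisfiable.
        print(solver.solve(assumptions=[y, -xs[0], -xs[1], -xs[2], -xs[3]]))  # False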
What Do Adversarially Robust Models Look At?
In this paper, we address the open question: "What do adversarially robust
models look at?" Recently, many works have reported that there exists a
trade-off between standard accuracy and adversarial robustness. According
to prior works, this trade-off is rooted in the fact that adversarially robust
and standard accurate models might depend on very different sets of features.
However, what kind of difference actually exists has not been well studied.
In this paper, we analyze this difference through various experiments visually
and quantitatively. Experimental results show that adversarially robust models
look at things at a larger scale than standard models and pay less attention to
fine textures. Furthermore, although it has been claimed that adversarially
robust features are not compatible with standard accuracy, using them as
pre-trained models can even have a positive effect, particularly on
low-resolution datasets.
Improving the Certified Robustness of Neural Networks via Consistency Regularization
A range of defense methods have been proposed to improve the robustness of
neural networks on adversarial examples, among which provable defense methods
have been demonstrated to be effective for training neural networks that are
certifiably robust to the attacker. However, most of these provable defense
methods treat all examples equally during the training process, which ignores
the inconsistency of the certified-robustness constraint between correctly
classified (natural) and misclassified examples. In this paper, we explore this
inconsistency caused by misclassified examples and add a novel consistency
regularization term to make better use of the misclassified examples.
Specifically, we identify that the certified robustness of a network can be
significantly improved if the certified-robustness constraint on misclassified
and correctly classified examples is consistent.
Motivated by this discovery, we design a new defense regularization term called
Misclassification Aware Adversarial Regularization (MAAR), which constrains the
output probability distributions of all examples in the certified region of the
misclassified example. Experimental results show that our proposed MAAR
achieves the best certified robustness and comparable accuracy on CIFAR-10 and
MNIST datasets in comparison with several state-of-the-art methods.
Comment: 4 pages, appeared in AAAI21-RSEM
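
A loose sketch of a misclassification-aware consistency term (illustrative only; the paper defines its regularizer over the certified region obtained from verification bounds, whereas this stand-in merely samples random points in an eps-ball around misclassified examples):

    import torch
    import torch.nn.functional as F

    def misclassification_consistency(model, x, y, eps=8 / 255, n_samples=4):
        with torch.no_grad():
            misclassified = model(x).argmax(dim=1) != y
        if not misclassified.any():
            return x.new_zeros(())
        xm = x[misclassified]
        log_p = F.log_softmax(model(xm), dim=1)          # reference distribution
        penalty = x.new_zeros(())
        for _ in range(n_samples):
            delta = torch.empty_like(xm).uniform_(-eps, eps)
            q = F.softmax(model((xm + delta).clamp(0, 1)), dim=1)
            # Penalize divergence so outputs stay consistent across the region.
            penalty = penalty + F.kl_div(log_p, q, reduction="batchmean")
        return penalty / n_samples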
Towards Stable and Efficient Training of Verifiably Robust Neural Networks
Training neural networks with verifiable robustness guarantees is
challenging. Several existing approaches utilize linear relaxation based neural
network output bounds under perturbation, but they can slow down training by a
factor of hundreds depending on the underlying network architectures.
Meanwhile, interval bound propagation (IBP) based training is efficient and
significantly outperforms linear relaxation based methods on many tasks, yet it
may suffer from stability issues since the bounds are much looser especially at
the beginning of training. In this paper, we propose a new certified
adversarial training method, CROWN-IBP, by combining the fast IBP bounds in a
forward bounding pass and a tight linear relaxation based bound, CROWN, in a
backward bounding pass. CROWN-IBP is computationally efficient and consistently
outperforms IBP baselines on training verifiably robust neural networks. We
conduct large scale experiments on MNIST and CIFAR datasets, and outperform all
previous linear relaxation and bound propagation based certified defenses in
robustness. Notably, we achieve 7.02% verified test error on
MNIST at $\epsilon = 0.3$, and 66.94% on CIFAR-10 with $\epsilon = 8/255$. Code is
available at https://github.com/deepmind/interval-bound-propagation
(TensorFlow) and https://github.com/huanzhang12/CROWN-IBP (PyTorch)
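
To illustrate the fast forward bounding pass referred to above, interval bound propagation through one affine layer plus ReLU reduces to splitting the weight matrix by sign (a generic sketch, not the released code):

    import torch

    def ibp_affine(lb, ub, W, bias):
        # Sound interval bounds for y = x @ W.T + bias given elementwise x in [lb, ub]:
        # positive weights take the matching bound, negative weights the opposite one.
        W_pos, W_neg = W.clamp(min=0), W.clamp(max=0)
        new_lb = lb @ W_pos.t() + ub @ W_neg.t() + bias
        new_ub = ub @ W_pos.t() + lb @ W_neg.t() + bias
        return new_lb, new_ub

    def ibp_relu(lb, ub):
        return lb.clamp(min=0), ub.clamp(min=0)

    # usage: start from lb = x - eps, ub = x + eps, propagate layer by layer, and
    # read off a bound on the true-class margin; CROWN-IBP tightens the final
    # step with a CROWN-style backward bounding pass.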