Adversarial Diversity and Hard Positive Generation
State-of-the-art deep neural networks suffer from a fundamental problem -
they misclassify adversarial examples formed by applying small perturbations to
inputs. In this paper, we present a new psychometric perceptual adversarial
similarity score (PASS) measure for quantifying adversarial images, introduce
the notion of hard positive generation, and use a diverse set of adversarial
perturbations - not just the closest ones - for data augmentation. We introduce
a novel hot/cold approach for adversarial example generation, which provides
multiple possible adversarial perturbations for every single image. The
perturbations generated by our novel approach often correspond to semantically
meaningful image structures, and allow greater flexibility in scaling
perturbation amplitudes, which yields an increased diversity of adversarial
images. We present adversarial images on several network topologies and
datasets, including LeNet on the MNIST dataset, and GoogLeNet and ResidualNet
on the ImageNet dataset. Finally, we demonstrate on LeNet and GoogLeNet that
fine-tuning with a diverse set of hard positives improves the robustness of
these networks compared to training with prior methods of generating
adversarial images. Comment: Accepted to CVPR 2016 DeepVision Workshop.
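A minimal sketch of how the hot/cold idea described above might be realized, assuming a standard differentiable PyTorch classifier; the objective, step size, and gradient normalization are illustrative assumptions, not the authors' released code. Varying the "hot" target class produces the multiple perturbation directions per image that the abstract refers to.

```python
# Hot/cold-style targeted perturbation sketch (assumed reading of the abstract).
# For a chosen "hot" target class we increase its logit while suppressing the
# "cold" original class, then step along the input gradient of that difference.
import torch

def hot_cold_perturbation(model, x, orig_class, hot_class, step=0.01):
    """Return one candidate perturbation pushing x toward `hot_class`.

    model      : differentiable classifier returning logits
    x          : input batch, shape (N, C, H, W)
    orig_class : tensor of original labels (the "cold" class)
    hot_class  : tensor of target labels (the "hot" class)
    step       : perturbation amplitude; scaling it yields a family of
                 adversarial images of varying strength
    """
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    # Hot/cold objective: raise the target logit, lower the original logit.
    obj = (logits.gather(1, hot_class.view(-1, 1))
           - logits.gather(1, orig_class.view(-1, 1))).sum()
    obj.backward()
    # Normalized gradient direction; different hot classes give diverse directions.
    g = x.grad
    direction = g / (g.flatten(1).norm(dim=1).view(-1, 1, 1, 1) + 1e-12)
    return (x + step * direction).clamp(0, 1).detach()
```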
Parameter-Saving Adversarial Training: Reinforcing Multi-Perturbation Robustness via Hypernetworks
Adversarial training serves as one of the most popular and effective methods
to defend against adversarial perturbations. However, most defense mechanisms
only consider a single type of perturbation while various attack methods might
be adopted to perform stronger adversarial attacks against the deployed model
in real-world scenarios, e.g., attacks bounded in different $\ell_p$ norms. Defending against
various attacks can be a challenging problem since multi-perturbation
adversarial training and its variants only achieve suboptimal robustness
trade-offs, due to the theoretical limit to multi-perturbation robustness for a
single model. Moreover, it is impractical to deploy large models in
storage-constrained scenarios. To address these drawbacks, in this paper we
propose a novel multi-perturbation adversarial training framework,
parameter-saving adversarial training (PSAT), which leverages hypernetworks to
train specialized models against individual perturbation types and aggregates
these specialized models to defend against multiple perturbations, reinforcing
multi-perturbation robustness with the advantageous side effect of saving
parameters. Finally, we extensively evaluate and compare our proposed
method with state-of-the-art single/multi-perturbation robust methods against
various latest attack methods on different datasets, showing the robustness
superiority and parameter efficiency of our proposed method, e.g., on the
CIFAR-10 dataset with ResNet-50 as the backbone, PSAT saves approximately 80\%
of parameters while achieving state-of-the-art robustness trade-off
accuracy. Comment: 9 pages, 2 figures.
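A hedged sketch of the hypernetwork idea behind PSAT as described in the abstract: one shared hypernetwork emits the weights of a specialized classifier head per perturbation type, so covering several perturbations does not multiply the stored parameters. The layer sizes, embedding scheme, and logit-averaging aggregation rule are illustrative assumptions, not the paper's exact design.

```python
# Hypernetwork sketch: per-perturbation specialized heads generated on the fly.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperHead(nn.Module):
    def __init__(self, num_perturbations, feat_dim=512, num_classes=10, emb_dim=32):
        super().__init__()
        # One learned embedding per perturbation type (e.g., each l_p ball).
        self.emb = nn.Embedding(num_perturbations, emb_dim)
        # Hypernetwork: maps the embedding to the weights and bias of a linear head.
        self.weight_gen = nn.Linear(emb_dim, feat_dim * num_classes)
        self.bias_gen = nn.Linear(emb_dim, num_classes)
        self.feat_dim, self.num_classes = feat_dim, num_classes

    def forward(self, features, perturbation_id):
        z = self.emb(perturbation_id)                    # (emb_dim,)
        W = self.weight_gen(z).view(self.num_classes, self.feat_dim)
        b = self.bias_gen(z)
        return F.linear(features, W, b)                  # specialized logits

    def aggregate(self, features):
        # Defend against multiple perturbations by averaging the specialized heads.
        ids = torch.arange(self.emb.num_embeddings, device=features.device)
        return torch.stack([self.forward(features, i) for i in ids]).mean(0)
```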
Towards Compositional Adversarial Robustness: Generalizing Adversarial Training to Composite Semantic Perturbations
Model robustness against adversarial examples of single perturbation type
such as the $\ell_p$-norm has been widely studied, yet its generalization to
more realistic scenarios involving multiple semantic perturbations and their
composition remains largely unexplored. In this paper, we first propose a novel
method for generating composite adversarial examples. Our method can find the
optimal attack composition by utilizing component-wise projected gradient
descent and automatic attack-order scheduling. We then propose generalized
adversarial training (GAT) to extend model robustness from the $\ell_p$-ball to
composite semantic perturbations, such as the combination of Hue, Saturation,
Brightness, Contrast, and Rotation. Results obtained using ImageNet and
CIFAR-10 datasets indicate that GAT can be robust not only to all the tested
types of a single attack, but also to any combination of such attacks. GAT also
outperforms baseline $\ell_\infty$-norm bounded adversarial training
approaches by a significant margin.
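A rough sketch of component-wise projected gradient descent over semantic perturbation parameters, in the spirit of the abstract above; only brightness and contrast are shown for brevity, whereas the paper composes more components (hue, saturation, rotation) and also schedules the attack order automatically. The bounds and step sizes below are assumptions.

```python
# Component-wise PGD over semantic parameters (toy two-component version).
import torch

def composite_attack(model, x, y, steps=10, lr=0.05,
                     brightness_bound=0.2, contrast_bound=0.3):
    loss_fn = torch.nn.CrossEntropyLoss()
    # One brightness offset and one contrast scale per image.
    b = torch.zeros(x.size(0), 1, 1, 1, device=x.device, requires_grad=True)
    c = torch.zeros(x.size(0), 1, 1, 1, device=x.device, requires_grad=True)
    for _ in range(steps):
        mean = x.mean(dim=(1, 2, 3), keepdim=True)
        x_adv = ((x - mean) * (1 + c) + mean + b).clamp(0, 1)
        loss = loss_fn(model(x_adv), y)
        g_b, g_c = torch.autograd.grad(loss, [b, c])
        with torch.no_grad():
            # Gradient ascent on each component, then project back into its box.
            b += lr * g_b.sign()
            c += lr * g_c.sign()
            b.clamp_(-brightness_bound, brightness_bound)
            c.clamp_(-contrast_bound, contrast_bound)
    mean = x.mean(dim=(1, 2, 3), keepdim=True)
    return ((x - mean) * (1 + c) + mean + b).clamp(0, 1).detach()
```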
Robust Classification via a Single Diffusion Model
Recently, diffusion models have been successfully applied to improving
adversarial robustness of image classifiers by purifying the adversarial noises
or generating realistic data for adversarial training. However, the
diffusion-based purification can be evaded by stronger adaptive attacks while
adversarial training does not perform well under unseen threats, exhibiting
inevitable limitations of these methods. To better harness the expressive power
of diffusion models, in this paper we propose Robust Diffusion Classifier
(RDC), a generative classifier that is constructed from a pre-trained diffusion
model to be adversarially robust. Our method first maximizes the data
likelihood of a given input and then predicts the class probabilities of the
optimized input using the conditional likelihood of the diffusion model through
Bayes' theorem. Since our method does not require training on particular
adversarial attacks, we demonstrate that it is more generalizable to defend
against multiple unseen threats. In particular, RDC achieves strong robust
accuracy against norm-bounded perturbations on CIFAR-10, surpassing the
previous state-of-the-art adversarial training models. The findings highlight
the potential
of generative classifiers by employing diffusion models for adversarial
robustness compared with the commonly studied discriminative classifiers.
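A hedged sketch of classifying with a conditional diffusion model via Bayes' rule, in the spirit of the RDC abstract: the class whose conditional denoising loss is lowest (i.e., whose approximate conditional log-likelihood is highest) receives the highest probability. The `eps_model(x_t, t, y)` interface and the implicit uniform class prior are assumptions for illustration; the paper additionally maximizes the input likelihood before classification, which this sketch omits.

```python
# Diffusion-model classification sketch: per-class denoising loss -> p(y|x).
import torch

@torch.no_grad()
def diffusion_class_probs(eps_model, x, num_classes, alphas_cumprod, n_samples=32):
    """Approximate p(y|x) from per-class conditional diffusion losses.

    eps_model      : noise-prediction network taking (noisy x, timestep, class label)
    alphas_cumprod : 1-D tensor of cumulative noise-schedule products
    """
    T = alphas_cumprod.numel()
    losses = torch.zeros(num_classes, device=x.device)
    for y in range(num_classes):
        y_lab = torch.full((x.size(0),), y, device=x.device, dtype=torch.long)
        acc = 0.0
        for _ in range(n_samples):
            t = torch.randint(0, T, (x.size(0),), device=x.device)
            a = alphas_cumprod[t].view(-1, 1, 1, 1)
            noise = torch.randn_like(x)
            x_t = a.sqrt() * x + (1 - a).sqrt() * noise
            # Denoising error acts as a surrogate for -log p(x|y).
            acc += (eps_model(x_t, t, y_lab) - noise).pow(2).mean()
        losses[y] = acc / n_samples
    # Lower denoising loss -> higher conditional likelihood -> higher p(y|x).
    return torch.softmax(-losses, dim=0)
```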
Towards Certified Probabilistic Robustness with High Accuracy
Adversarial examples pose a security threat to many critical systems built on
neural networks (such as face recognition systems, and self-driving cars).
While many methods have been proposed to build robust models, how to build
certifiably robust yet accurate neural network models remains an open problem.
For example, adversarial training improves empirical robustness, but it does
not provide certification of the model's robustness. On the other hand,
certified training provides certified robustness but at the cost of a
significant accuracy drop. In this work, we propose a novel approach that aims
to achieve both high accuracy and certified probabilistic robustness. Our
method has two parts: a probabilistic robust training method with the
additional goal of minimizing the variance of predictions, measured by a
divergence between output distributions, and a runtime inference method that
certifies the probabilistic robustness of the prediction. The
latter enables efficient certification of the model's probabilistic robustness
at runtime with statistical guarantees. This is supported by our training
objective, which minimizes the variance of the model's predictions in a given
vicinity, derived from a general definition of model robustness. Our approach
works for a variety of perturbations and is reasonably efficient. Our
experiments on multiple models trained on different datasets demonstrate that
our approach significantly outperforms existing approaches in terms of both
certification rate and accuracy.
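An illustrative sketch of runtime probabilistic certification by sampling, in the spirit of the abstract: draw perturbations in a vicinity of the input, check how often the prediction stays unchanged, and certify only if a statistical lower bound on that probability clears a threshold. The uniform-noise vicinity, the threshold, and the Clopper-Pearson bound are assumptions, not the paper's exact construction.

```python
# Sampling-based probabilistic robustness certification sketch.
import torch
from scipy.stats import beta

def certify_probabilistic(model, x, radius=8 / 255, n=1000, threshold=0.99, alpha=0.001):
    with torch.no_grad():
        base_pred = model(x).argmax(dim=1)
        agree = 0
        for _ in range(n):
            delta = (torch.rand_like(x) * 2 - 1) * radius      # uniform vicinity
            pred = model((x + delta).clamp(0, 1)).argmax(dim=1)
            agree += int((pred == base_pred).all())
    # One-sided Clopper-Pearson lower bound on P(prediction unchanged),
    # at confidence level 1 - alpha.
    lower = beta.ppf(alpha, agree, n - agree + 1) if agree > 0 else 0.0
    certified = lower >= threshold
    return base_pred, certified, lower
```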