181 research outputs found
Harnessing the Vulnerability of Latent Layers in Adversarially Trained Models
Neural networks are vulnerable to adversarial attacks: small, visually imperceptible crafted noise that, when added to the input, drastically changes the output. The most effective method of defending against these attacks is adversarial training. We analyze adversarially trained robust models to study their vulnerability to adversarial attacks at the level of the latent layers. Our analysis reveals that, in contrast to the input layer, which is robust to adversarial attack, the latent layers of these robust models are highly susceptible to adversarial perturbations of small magnitude. Leveraging this observation, we introduce a new technique, Latent Adversarial Training (LAT), which fine-tunes adversarially trained models to ensure robustness at the feature layers. We also propose Latent Attack (LA), a novel algorithm for constructing adversarial examples. LAT yields a minor improvement in test accuracy and achieves state-of-the-art adversarial accuracy against the universal first-order PGD attack, as shown on the MNIST, CIFAR-10, and CIFAR-100 datasets.
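The core idea of attacking a latent layer can be sketched with a toy model: split a network into a feature map g (input to latent) and a head h (latent to logits), then run a PGD-style sign-gradient ascent on a perturbation added to the latent code. Everything below — the weights W1 and W2, the linear/tanh split, and all hyperparameters — is a hypothetical stand-in for illustration, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained network, split into two stages:
# g: input -> latent features, h: latent -> class logits.
W1 = rng.normal(size=(8, 4))   # "feature extractor" weights for g
W2 = rng.normal(size=(3, 8))   # "classifier head" weights for h

def g(x):            # input layer -> latent layer
    return np.tanh(W1 @ x)

def h(z):            # latent layer -> logits
    return W2 @ z

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def latent_pgd(x, y, eps=0.1, alpha=0.02, steps=20):
    """PGD-style attack applied at the latent layer instead of the input.

    Maximizes the cross-entropy loss of h(z + delta) under an l_inf
    budget eps, mirroring how one would probe latent-layer robustness.
    """
    z = g(x)
    delta = np.zeros_like(z)
    onehot = np.eye(3)[y]
    for _ in range(steps):
        p = softmax(h(z + delta))
        # Gradient of cross-entropy w.r.t. the latent perturbation
        # (exact here because h is linear in its input).
        grad = W2.T @ (p - onehot)
        delta = np.clip(delta + alpha * np.sign(grad), -eps, eps)
    return delta

x = rng.normal(size=4)
y = 0
delta = latent_pgd(x, y)
clean_pred = softmax(h(g(x)))[y]
adv_pred = softmax(h(g(x) + delta))[y]
print(clean_pred, adv_pred)  # confidence on the true class drops
```

A small latent budget eps suffices to move the prediction, which is the susceptibility the abstract describes; LAT would then fine-tune against exactly such perturbations.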
On the benefits of defining vicinal distributions in latent space
The vicinal risk minimization (VRM) principle is an empirical risk
minimization (ERM) variant that replaces Dirac masses with vicinal functions.
There is strong numerical and theoretical evidence showing that VRM outperforms
ERM in terms of generalization if appropriate vicinal functions are chosen.
Mixup Training (MT), a popular choice of vicinal distribution, improves the
generalization performance of models by introducing globally linear behavior in
between training examples. Apart from generalization, recent works have shown
that mixup trained models are relatively robust to input
perturbations/corruptions and at the same time are calibrated better than their
non-mixup counterparts. In this work, we investigate the benefits of defining
these vicinal distributions like mixup in latent space of generative models
rather than in input space itself. We propose a new approach, VarMixup
(Variational Mixup), to better sample mixup images by using the latent
manifold underlying the data. Our empirical studies on CIFAR-10, CIFAR-100, and
Tiny-ImageNet demonstrate that models trained by performing mixup in the latent
manifold learned by VAEs are inherently more robust to various input
corruptions/perturbations, are significantly better calibrated, and exhibit
more local-linear loss landscapes.
Comment: Accepted at Elsevier Pattern Recognition Letters (2021), Best Paper
Award at CVPR 2021 Workshop on Adversarial Machine Learning in Real-World
Computer Vision (AML-CV); also accepted at ICLR 2021 Workshops on
Robust-Reliable Machine Learning (Oral) and Generalization beyond the
training distribution (Abstract).
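The mechanism of mixup in latent space can be sketched in a few lines: encode two inputs with a VAE encoder, interpolate their latent codes with a Beta-distributed coefficient (as in standard mixup), and decode the mixed code. The linear "encoder" and "decoder" below are hypothetical stand-ins for a trained VAE, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins for a trained VAE's encoder mean and decoder;
# a real implementation would use the learned networks.
A = rng.normal(size=(2, 6)) * 0.5   # encoder: input (6-d) -> latent mean (2-d)
B = rng.normal(size=(6, 2)) * 0.5   # decoder: latent -> reconstruction

def encode(x):
    return A @ x

def decode(z):
    return B @ z

def varmixup(x1, y1, x2, y2, alpha=0.2):
    """Mixup performed in the VAE latent space rather than input space.

    Interpolates the two latent codes with lam ~ Beta(alpha, alpha),
    then decodes the mixed code back toward the data manifold. Labels
    are mixed with the same coefficient, as in standard mixup.
    """
    lam = rng.beta(alpha, alpha)
    z_mix = lam * encode(x1) + (1 - lam) * encode(x2)
    y_mix = lam * y1 + (1 - lam) * y2
    return decode(z_mix), y_mix, lam

x1, x2 = rng.normal(size=6), rng.normal(size=6)
y1, y2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
x_mix, y_mix, lam = varmixup(x1, y1, x2, y2)
print(x_mix, y_mix, lam)
```

The training loop is otherwise unchanged from standard mixup: the model is fit on (x_mix, y_mix) pairs, but the interpolation now happens along the learned manifold rather than along straight lines in pixel space.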
CARSO: Counter-Adversarial Recall of Synthetic Observations
In this paper, we propose a novel adversarial defence mechanism for image
classification -- CARSO -- inspired by cues from cognitive neuroscience. The
method is synergistically complementary to adversarial training and relies on
knowledge of the internal representation of the attacked classifier. Exploiting
a generative model for adversarial purification conditioned on such a
representation, it samples reconstructions of the input, which are then classified.
Experimental evaluation by a well-established benchmark of varied, strong
adaptive attacks, across diverse image datasets and classifier architectures,
shows that CARSO is able to defend the classifier significantly better than
state-of-the-art adversarial training alone -- with a tolerable clean accuracy
toll. Furthermore, the defensive architecture succeeds in effectively shielding
itself from unforeseen threats, and end-to-end attacks adapted to fool
stochastic defences. Code and pre-trained models are available at
https://github.com/emaballarin/CARSO
Comment: 20 pages, 5 figures, 10 tables.
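The pipeline the abstract describes — read the classifier's internal representation, sample purified reconstructions conditioned on it, classify each, and aggregate — can be sketched with toy components. All models below (Wf, Wc, Wg, the Gaussian sampler) are hypothetical stand-ins, not CARSO's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stand-ins for the pieces being composed: a classifier
# split into feature extractor + head, and a generative "purifier"
# conditioned on the classifier's internal representation.
Wf = rng.normal(size=(5, 4))    # feature extractor: input (4-d) -> features (5-d)
Wc = rng.normal(size=(3, 5))    # classification head: features -> 3 logits
Wg = rng.normal(size=(4, 5))    # purifier "decoder": features -> reconstruction

def features(x):
    return np.tanh(Wf @ x)

def classify(x):
    v = Wc @ features(x)
    e = np.exp(v - v.max())
    return e / e.sum()

def purify(rep, noise_scale=0.1):
    # Sample one stochastic reconstruction conditioned on the
    # internal representation (toy Gaussian decoder).
    return Wg @ rep + noise_scale * rng.normal(size=4)

def defended_predict(x, n_samples=32):
    """Defence pipeline: extract the (possibly attacked) classifier's
    internal representation, sample purified reconstructions conditioned
    on it, classify each sample, and average the predictions."""
    rep = features(x)
    probs = np.mean([classify(purify(rep)) for _ in range(n_samples)], axis=0)
    return probs

p = defended_predict(rng.normal(size=4))
print(p, p.argmax())
```

The stochastic sampling is what makes the defence non-deterministic, which is why the abstract emphasizes evaluation against end-to-end attacks adapted to fool stochastic defences.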
The Odds are Odd: A Statistical Test for Detecting Adversarial Examples
We investigate conditions under which test statistics exist that can reliably
detect examples that have been adversarially manipulated in a white-box
attack. These statistics can be easily computed and calibrated by randomly
corrupting inputs. They exploit certain anomalies that adversarial attacks
introduce, in particular if they follow the paradigm of choosing perturbations
optimally under p-norm constraints. Access to the log-odds is the only
requirement to defend models. We justify our approach empirically, but also
provide conditions under which detectability via the suggested test statistics
is guaranteed to be effective. In our experiments, we show that it is even
possible to correct test-time predictions for adversarial attacks with high
accuracy.
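The flavour of such a noise-calibrated log-odds statistic can be sketched with a toy model: randomly corrupt the input, measure how much the logit gaps relative to the predicted class shift on average, and calibrate a detection threshold on clean data. The network, noise scale, and threshold rule below are hypothetical illustrations, not the paper's exact statistic:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy nonlinear classifier (hypothetical stand-in for a trained network).
W1 = rng.normal(size=(8, 6))
W2 = rng.normal(size=(3, 8))

def logits(x):
    return W2 @ np.tanh(W1 @ x)

def log_odds_shift(x, n_noise=200, sigma=0.3):
    """Noise-perturbed log-odds statistic, computed only from the logits.

    For the predicted class y, measure how much random input corruption
    shifts the average logit gap f_z - f_y toward each other class z;
    adversarially manipulated inputs tend to show anomalously large
    shifts back toward some other class.
    """
    y = int(np.argmax(logits(x)))
    base = logits(x) - logits(x)[y]
    noisy = np.mean([logits(x + sigma * rng.normal(size=x.shape))
                     for _ in range(n_noise)], axis=0)
    shift = (noisy - noisy[y]) - base
    shift[y] = -np.inf                 # exclude the predicted class itself
    return float(shift.max())          # largest shift toward another class

# Calibrate a detection threshold from held-out clean inputs, then flag
# any input whose statistic exceeds it.
clean_stats = [log_odds_shift(rng.normal(size=6)) for _ in range(100)]
threshold = float(np.quantile(clean_stats, 0.99))
is_adversarial = log_odds_shift(rng.normal(size=6)) > threshold
print(threshold, is_adversarial)
```

Only forward passes and the log-odds are needed, matching the abstract's claim that access to the log-odds is the sole requirement; the correction step would additionally reassign the label to the class with the largest anomalous shift.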
- …