VisionGuard: Runtime Detection of Adversarial Inputs to Perception Systems
Deep neural network (DNN) models have proven to be vulnerable to adversarial
attacks. In this paper, we propose VisionGuard, a novel attack- and
dataset-agnostic, computationally light defense mechanism for adversarial
inputs to DNN-based perception systems. In particular, VisionGuard relies on
the observation that adversarial images are sensitive to lossy compression
transformations. Specifically, to determine if an image is adversarial,
VisionGuard checks if the output of the target classifier on a given input
image changes significantly after feeding it a transformed version of the image
under investigation. Moreover, we show that VisionGuard is
computationally-light both at runtime and design-time which makes it suitable
for real-time applications that may also involve large-scale image domains. To
highlight this, we demonstrate the efficiency of VisionGuard on ImageNet, a
dataset that is computationally challenging for the majority of relevant defenses.
Finally, we include extensive comparative experiments on the MNIST, CIFAR10,
and ImageNet datasets that show that VisionGuard outperforms existing defenses
in terms of scalability and detection performance.
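The detection mechanism described in this abstract can be summarized in a few lines of code. The sketch below is a hypothetical illustration, not the authors' implementation: it assumes a JPEG round-trip as the lossy transformation, a KL-divergence score between the classifier's softmax outputs, and an illustrative threshold; all names and parameter values are assumptions.

```python
# Illustrative sketch of the detection idea: compare the classifier's softmax
# output on an image with its output on a JPEG-compressed copy, and flag the
# input if the two distributions diverge. JPEG quality, the KL-divergence
# score, and the threshold are assumptions, not the paper's exact choices.
import io

import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms


def jpeg_compress(image: Image.Image, quality: int = 75) -> Image.Image:
    """Apply a lossy JPEG round-trip to the input image."""
    buffer = io.BytesIO()
    image.convert("RGB").save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    return Image.open(buffer).convert("RGB")


def looks_adversarial(model: torch.nn.Module, image: Image.Image,
                      threshold: float = 0.1) -> bool:
    """Flag the image if the softmax output shifts too much under compression."""
    to_tensor = transforms.Compose([transforms.Resize((224, 224)),
                                    transforms.ToTensor()])
    model.eval()
    with torch.no_grad():
        p_original = F.softmax(model(to_tensor(image).unsqueeze(0)), dim=1)
        p_compressed = F.softmax(
            model(to_tensor(jpeg_compress(image)).unsqueeze(0)), dim=1)
    # KL divergence between the two output distributions is the detection score.
    score = F.kl_div(p_compressed.log(), p_original, reduction="batchmean")
    return score.item() > threshold
```

Because the check only requires two forward passes and one image transformation, it adds little overhead at runtime, which is consistent with the scalability claim in the abstract.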
What Learned Representations and Influence Functions Can Tell Us About Adversarial Examples
Adversarial examples, deliberately crafted using small perturbations to fool
deep neural networks, were first studied in image processing and more recently
in NLP. While approaches to detecting adversarial examples in NLP have largely
relied on search over input perturbations, image processing has seen a range of
techniques that aim to characterise adversarial subspaces over the learned
representations.
In this paper, we adapt two such approaches to NLP, one based on nearest
neighbors and influence functions and one on Mahalanobis distances. The former
in particular produces a state-of-the-art detector when compared against
several strong baselines; moreover, the novel use of influence functions
provides insight into how the nature of adversarial example subspaces in NLP
relates to those in image processing, and also how they differ depending on the
kind of NLP task.

Comment: 20 pages, accepted at IJCNLP_AACL 202
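For the Mahalanobis-distance approach mentioned in the abstract, the standard recipe over learned representations is to fit class-conditional means with a shared (tied) covariance and threshold the minimum distance of a test representation to any class mean. The sketch below is a generic illustration under those assumptions; the feature extractor, class structure, and threshold are not taken from the paper.

```python
# Generic Mahalanobis-distance detector over learned representations:
# fit per-class means and a shared covariance on clean training features,
# then score a test feature by its minimum distance to any class mean
# (higher scores suggest the input lies off the clean data manifold).
import numpy as np


class MahalanobisDetector:
    def __init__(self):
        self.means = {}          # class label -> mean representation
        self.precision = None    # inverse of the shared covariance matrix

    def fit(self, features: np.ndarray, labels: np.ndarray) -> None:
        """Estimate per-class means and a tied covariance from clean features."""
        centered = []
        for label in np.unique(labels):
            class_feats = features[labels == label]
            mean = class_feats.mean(axis=0)
            self.means[label] = mean
            centered.append(class_feats - mean)
        covariance = np.cov(np.vstack(centered), rowvar=False)
        self.precision = np.linalg.pinv(covariance)

    def score(self, feature: np.ndarray) -> float:
        """Minimum Mahalanobis distance to any class mean."""
        distances = []
        for mean in self.means.values():
            diff = feature - mean
            distances.append(float(diff @ self.precision @ diff))
        return min(distances)
```

In practice the features would come from a hidden layer of the target NLP model, and the score would be thresholded (or fed to a simple classifier) to decide whether an input is adversarial.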