VisionGuard: Runtime Detection of Adversarial Inputs to Perception Systems
Deep neural network (DNN) models have proven to be vulnerable to adversarial
attacks. In this paper, we propose VisionGuard, a novel attack- and
dataset-agnostic, computationally light defense mechanism for adversarial
inputs to DNN-based perception systems. In particular, VisionGuard relies on
the observation that adversarial images are sensitive to lossy compression
transformations. Specifically, to determine if an image is adversarial,
VisionGuard checks if the output of the target classifier on a given input
image changes significantly after feeding it a transformed version of the image
under investigation. Moreover, we show that VisionGuard is
computationally-light both at runtime and design-time which makes it suitable
for real-time applications that may also involve large-scale image domains. To
highlight this, we demonstrate the efficiency of VisionGuard on ImageNet, a
dataset that is computationally challenging for the majority of relevant defenses.
Finally, we include extensive comparative experiments on the MNIST, CIFAR10,
and ImageNet datasets that show that VisionGuard outperforms existing defenses
in terms of scalability and detection performance.
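The detection mechanism described in this abstract can be summarized in a few lines of code. The sketch below is a hypothetical illustration, not the authors' implementation: it assumes a JPEG round-trip as the lossy transformation, a KL-divergence score between the classifier's softmax outputs, and an illustrative threshold; all names and parameter values are assumptions.

```python
# Illustrative sketch of the detection idea: compare the classifier's softmax
# output on an image with its output on a JPEG-compressed copy, and flag the
# input if the two distributions diverge. JPEG quality, the KL-divergence
# score, and the threshold are assumptions, not the paper's exact choices.
import io

import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms


def jpeg_compress(image: Image.Image, quality: int = 75) -> Image.Image:
    """Apply a lossy JPEG round-trip to the input image."""
    buffer = io.BytesIO()
    image.convert("RGB").save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    return Image.open(buffer).convert("RGB")


def looks_adversarial(model: torch.nn.Module, image: Image.Image,
                      threshold: float = 0.1) -> bool:
    """Flag the image if the softmax output shifts too much under compression."""
    to_tensor = transforms.Compose([transforms.Resize((224, 224)),
                                    transforms.ToTensor()])
    model.eval()
    with torch.no_grad():
        p_original = F.softmax(model(to_tensor(image).unsqueeze(0)), dim=1)
        p_compressed = F.softmax(
            model(to_tensor(jpeg_compress(image)).unsqueeze(0)), dim=1)
    # KL divergence between the two output distributions is the detection score.
    score = F.kl_div(p_compressed.log(), p_original, reduction="batchmean")
    return score.item() > threshold
```

Because the check only requires two forward passes and one image transformation, it adds little overhead at runtime, which is consistent with the scalability claim in the abstract.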
What Learned Representations and Influence Functions Can Tell Us About Adversarial Examples
Adversarial examples, deliberately crafted using small perturbations to fool
deep neural networks, were first studied in image processing and more recently
in NLP. While approaches to detecting adversarial examples in NLP have largely
relied on search over input perturbations, image processing has seen a range of
techniques that aim to characterise adversarial subspaces over the learned
representations.
In this paper, we adapt two such approaches to NLP, one based on nearest
neighbors and influence functions and one on Mahalanobis distances. The former
in particular produces a state-of-the-art detector when compared against
several strong baselines; moreover, the novel use of influence functions
provides insight into how the nature of adversarial example subspaces in NLP
relates to those in image processing, and also how they differ depending on the
kind of NLP task.

Comment: 20 pages, accepted at IJCNLP_AACL 202
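For the Mahalanobis-distance approach mentioned in the abstract, the standard recipe over learned representations is to fit class-conditional means with a shared (tied) covariance and threshold the minimum distance of a test representation to any class mean. The sketch below is a generic illustration under those assumptions; the feature extractor, class structure, and threshold are not taken from the paper.

```python
# Generic Mahalanobis-distance detector over learned representations:
# fit per-class means and a shared covariance on clean training features,
# then score a test feature by its minimum distance to any class mean
# (higher scores suggest the input lies off the clean data manifold).
import numpy as np


class MahalanobisDetector:
    def __init__(self):
        self.means = {}          # class label -> mean representation
        self.precision = None    # inverse of the shared covariance matrix

    def fit(self, features: np.ndarray, labels: np.ndarray) -> None:
        """Estimate per-class means and a tied covariance from clean features."""
        centered = []
        for label in np.unique(labels):
            class_feats = features[labels == label]
            mean = class_feats.mean(axis=0)
            self.means[label] = mean
            centered.append(class_feats - mean)
        covariance = np.cov(np.vstack(centered), rowvar=False)
        self.precision = np.linalg.pinv(covariance)

    def score(self, feature: np.ndarray) -> float:
        """Minimum Mahalanobis distance to any class mean."""
        distances = []
        for mean in self.means.values():
            diff = feature - mean
            distances.append(float(diff @ self.precision @ diff))
        return min(distances)
```

In practice the features would come from a hidden layer of the target NLP model, and the score would be thresholded (or fed to a simple classifier) to decide whether an input is adversarial.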