Frequency-Guided Word Substitutions for Detecting Textual Adversarial Examples
Recent efforts have shown that neural text processing models are vulnerable
to adversarial examples, but the nature of these examples is poorly understood.
In this work, we show that adversarial attacks against CNN, LSTM and
Transformer-based classification models perform word substitutions that are
identifiable through frequency differences between replaced words and their
corresponding substitutions. Based on these findings, we propose
frequency-guided word substitutions (FGWS), a simple algorithm exploiting the
frequency properties of adversarial word substitutions for the detection of
adversarial examples. FGWS achieves strong performance by accurately detecting
adversarial examples on the SST-2 and IMDb sentiment datasets, with F1
detection scores of up to 91.4% against RoBERTa-based classification models. We
compare our approach against a recently proposed perturbation discrimination
framework and show that we outperform it by up to 13.0% F1.
Comment: EACL 2021 camera-ready
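The detection idea can be sketched in a few lines: adversarial attacks tend to replace words with rarer substitutes, so swapping low-frequency words back to frequent alternatives and checking how far the model's confidence drops can expose an attack. Below is a minimal, hypothetical sketch of that logic; the frequency table, synonym map, thresholds, and toy classifier are illustrative stand-ins, not the paper's actual resources or models.

```python
# Hypothetical corpus frequencies and synonym map (toy stand-ins).
FREQ = {"good": 1000, "great": 800, "commendable": 5, "movie": 900}
SYNONYMS = {"commendable": ["good", "great"]}

def fgws_transform(tokens, freq_threshold=50):
    """Replace words rarer than freq_threshold with their most
    frequent available synonym."""
    out = []
    for tok in tokens:
        if FREQ.get(tok, 0) < freq_threshold and tok in SYNONYMS:
            out.append(max(SYNONYMS[tok], key=lambda w: FREQ.get(w, 0)))
        else:
            out.append(tok)
    return out

def is_adversarial(tokens, predict_proba, gamma=0.2):
    """Flag the input if the probability of the originally predicted
    class drops by more than gamma after the substitution."""
    return predict_proba(tokens) - predict_proba(fgws_transform(tokens)) > gamma

def toy_predict(tokens):
    """Toy stand-in for a sentiment model: returns the probability of
    the 'negative' class, which a rare-word attack has pushed high."""
    return 0.1 if ("good" in tokens or "great" in tokens) else 0.9

print(fgws_transform(["commendable", "movie"]))               # ['good', 'movie']
print(is_adversarial(["commendable", "movie"], toy_predict))  # True
print(is_adversarial(["good", "movie"], toy_predict))         # False
```

In practice the frequency statistics would come from the training corpus, the substitutions from a synonym or embedding-neighbor resource, and the threshold gamma would be tuned on held-out data.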
Adversarial Examples Detection with Bayesian Neural Network
In this paper, we propose a new framework to detect adversarial examples
motivated by the observations that random components can improve the smoothness
of predictors and make it easier to simulate the output distribution of a deep
neural network. With these observations, we propose a novel Bayesian
adversarial example detector, BATer for short, to improve the performance of
adversarial example detection. Specifically, we study the distributional
difference of hidden layer output between natural and adversarial examples, and
propose to use the randomness of the Bayesian neural network to simulate hidden
layer output distribution and leverage the distribution dispersion to detect
adversarial examples. The advantage of a Bayesian neural network is that its
output is stochastic, whereas a deterministic deep neural network lacks this
property. Empirical results on several benchmark datasets
against popular attacks show that the proposed BATer outperforms the
state-of-the-art detectors in adversarial example detection.
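The dispersion-based detection described above can be sketched as follows: run several stochastic forward passes per input, measure the spread of the outputs, and flag inputs whose dispersion is unusually large. The stochastic scorer, noise scales, and threshold below are toy assumptions standing in for the paper's Bayesian neural network, not its actual method.

```python
import random
import statistics

def stochastic_forward(x, noise_scale):
    """Toy stochastic predictor: base score plus Gaussian noise,
    standing in for one forward pass of a Bayesian neural network."""
    return x + random.gauss(0.0, noise_scale)

def dispersion(x, noise_scale, passes=100):
    """Run several stochastic forward passes and measure the spread
    (standard deviation) of the outputs."""
    samples = [stochastic_forward(x, noise_scale) for _ in range(passes)]
    return statistics.stdev(samples)

def detect(x, noise_scale, threshold=0.5, passes=100):
    """Flag the input as adversarial if the output dispersion exceeds
    a threshold (which would be tuned on held-out data)."""
    return dispersion(x, noise_scale, passes) > threshold

random.seed(0)
# In this toy setup, adversarial inputs induce larger randomness in
# the stochastic passes (noise_scale 1.0 vs 0.1 for natural inputs).
print(detect(0.0, noise_scale=0.1))  # natural input: False
print(detect(0.0, noise_scale=1.0))  # adversarial input: True
```

In a real system the randomness would come from the Bayesian network's weight posterior (or a dropout-style approximation), and the dispersion would be computed over hidden-layer outputs rather than a scalar score.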