VoteTRANS: Detecting Adversarial Text without Training by Voting on Hard
  Labels of Transformations

Authors: Isao Echizen, Kazuhide Fukushima, Seira Hidano, Shinsaku Kiyomoto, Hoang-Quoc Nguyen-Son
Publication date: 2 June 2023

Abstract

Adversarial attacks reveal serious flaws in deep learning models. More dangerously, these attacks preserve the original meaning and escape human recognition. Existing methods for detecting these attacks need to be trained using original/adversarial data. In this paper, we propose detection without training by voting on hard labels from predictions of transformations, namely, VoteTRANS. Specifically, VoteTRANS detects adversarial text by comparing the hard labels of input text and its transformation. The evaluation demonstrates that VoteTRANS effectively detects adversarial text across various state-of-the-art attacks, models, and datasets.

Comment: Findings of ACL 2023 (long paper)
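
The voting idea in the abstract can be illustrated with a minimal sketch. It is only a toy reading of "comparing the hard labels of input text and its transformation" and not the authors' implementation: the function names (`detect_by_voting`, `toy_predict`, `toy_transform`), the word lists, the synonym table, and the simple majority rule are all assumptions made for illustration. A real setup would replace the toy classifier and transformation with an actual model and transformation method.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical toy vocabulary; not taken from the paper.
POSITIVE = {"good", "great", "fine"}
NEGATIVE = {"bad", "awful", "poor"}
SYNONYMS = {
    "good": ["great", "fine"],
    "decent": ["good", "fine"],
    "bad": ["awful", "poor"],
    "movie": ["film"],
}


def toy_predict(text: str) -> str:
    """Toy hard-label classifier: counts known positive and negative words."""
    words = text.split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "pos" if score > 0 else "neg"


def toy_transform(text: str) -> Iterator[str]:
    """Yield copies of the text with one word swapped for a listed synonym."""
    words = text.split()
    for i, word in enumerate(words):
        for synonym in SYNONYMS.get(word, []):
            yield " ".join(words[:i] + [synonym] + words[i + 1:])


def detect_by_voting(
    text: str,
    predict: Callable[[str], str],
    transform: Callable[[str], Iterable[str]],
) -> bool:
    """Flag the text when most transformed copies get a different hard label.

    No detector is trained: only hard labels from `predict` are compared.
    The strict-majority threshold is an assumption of this sketch.
    """
    original_label = predict(text)
    labels = [predict(t) for t in transform(text)]
    if not labels:
        return False  # nothing to vote on
    disagreements = sum(label != original_label for label in labels)
    return disagreements > len(labels) / 2


# "decent" is unknown to the toy classifier, so swapping it in for "good"
# flips the label from "pos" to "neg", mimicking a word-substitution attack.
print(detect_by_voting("a decent movie", toy_predict, toy_transform))  # True
print(detect_by_voting("a good movie", toy_predict, toy_transform))    # False
```

In this toy case the perturbed input sits close to the decision boundary, so small transformations tend to restore the original label, while the clean input keeps its label under the same transformations.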

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2306.01273

Last time updated on 06/06/2023