Reevaluating Adversarial Examples in Natural Language
State-of-the-art attacks on NLP models lack a shared definition of what
constitutes a successful attack. We distill ideas from past work into a unified
framework: a successful natural language adversarial example is a perturbation
that fools the model and follows some linguistic constraints. We then analyze
the outputs of two state-of-the-art synonym substitution attacks. We find that
their perturbations often do not preserve semantics, and 38% introduce
grammatical errors. Human surveys reveal that to successfully preserve
semantics, we need to significantly increase the minimum cosine similarities
between the embeddings of swapped words and between the sentence encodings of
original and perturbed sentences. With constraints adjusted to better preserve
semantics and grammaticality, the attack success rate drops by over 70
percentage points.
Comment: 15 pages; 9 Tables; 5 Figures
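To make the constraint concrete, here is a minimal NumPy sketch of that kind of similarity check: a candidate synonym swap passes only if both the word-level and the sentence-level cosine similarities clear a minimum. The function names and the 0.9 thresholds are illustrative assumptions, not values from the paper.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def swap_is_valid(orig_word_vec, swap_word_vec, orig_sent_vec, pert_sent_vec,
                  min_word_sim=0.9, min_sent_sim=0.9):
    # Accept a synonym substitution only if the swapped words are close in
    # embedding space AND the perturbed sentence stays close to the original
    # under a sentence encoder.
    return (cosine(orig_word_vec, swap_word_vec) >= min_word_sim
            and cosine(orig_sent_vec, pert_sent_vec) >= min_sent_sim)
```

The paper's finding is that when these minimums are raised high enough for human judges to agree semantics are preserved, most previously reported attack successes disappear.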
Adversarial Training for Free!
Adversarial training, in which a network is trained on adversarial examples,
is one of the few defenses against adversarial attacks that withstands strong
attacks. Unfortunately, the high cost of generating strong adversarial examples
makes standard adversarial training impractical on large-scale problems like
ImageNet. We present an algorithm that eliminates the overhead cost of
generating adversarial examples by recycling the gradient information computed
when updating model parameters. Our "free" adversarial training algorithm
achieves comparable robustness to PGD adversarial training on the CIFAR-10 and
CIFAR-100 datasets at negligible additional cost compared to natural training,
and can be 7 to 30 times faster than other strong adversarial training methods.
Using a single workstation with 4 P100 GPUs and 2 days of runtime, we can train
a robust model for the large-scale ImageNet classification task that maintains
40% accuracy against PGD attacks. The code is available at
https://github.com/ashafahi/free_adv_train.
Comment: Accepted to NeurIPS 2019
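The recycling trick can be sketched in a few lines: each minibatch is replayed m times, and the single backward pass per replay produces both the parameter gradients (consumed by the optimizer step) and the gradient with respect to the input (used for an FGSM-style update of the perturbation), so the adversarial examples cost no extra backward passes. Below is a minimal PyTorch sketch under assumed defaults (m=4, epsilon=8/255); the authors' actual implementation is in the linked repository.

```python
import torch
import torch.nn.functional as F

def free_adv_epoch(model, loader, optimizer, epsilon=8/255, m=4):
    # The perturbation persists across minibatches, as in the paper.
    delta = None
    for x, y in loader:
        if delta is None or delta.shape != x.shape:
            delta = torch.zeros_like(x)
        for _ in range(m):  # replay the same minibatch m times
            delta = delta.detach().requires_grad_(True)
            loss = F.cross_entropy(model(x + delta), y)
            optimizer.zero_grad()
            loss.backward()   # one backward pass drives the parameter update...
            optimizer.step()
            # ...and its input gradient updates the perturbation "for free".
            delta = (delta + epsilon * delta.grad.sign()).clamp(-epsilon, epsilon)
```

Replaying each minibatch m times is what lets total training cost stay close to natural training while still exposing the model to fresh adversarial perturbations at every step.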