A Geometry-Inspired Attack for Generating Natural Language Adversarial Examples
Generating adversarial examples for natural language is hard, as natural
language consists of discrete symbols, and examples are often of variable
lengths. In this paper, we propose a geometry-inspired attack for generating
natural language adversarial examples. Our attack generates adversarial
examples by iteratively approximating the decision boundary of Deep Neural
Networks (DNNs). Experiments on two datasets with two different models show
that our attack fools natural language models with high success rates, while
only replacing a few words. Human evaluation shows that adversarial examples
generated by our attack are hard for humans to recognize. Further experiments
show that adversarial training can improve model robustness against our attack.
Comment: COLING 2020 Long Paper
Adversarially Regularising Neural NLI Models to Integrate Logical Background Knowledge
Adversarial examples are inputs to machine learning models designed to cause
the model to make a mistake. They are useful for understanding the shortcomings
of machine learning models, interpreting their results, and for regularisation.
In NLP, however, most example generation strategies produce input text by using
known, pre-specified semantic transformations, requiring significant manual
effort and in-depth understanding of the problem and domain. In this paper, we
investigate the problem of automatically generating adversarial examples that
violate a set of given First-Order Logic constraints in Natural Language
Inference (NLI). We reduce the problem of identifying such adversarial examples
to a combinatorial optimisation problem, by maximising a quantity measuring the
degree of violation of such constraints and by using a language model for
generating linguistically-plausible examples. Furthermore, we propose a method
for adversarially regularising neural NLI models for incorporating background
knowledge. Our results show that, while the proposed method does not always
improve results on the SNLI and MultiNLI datasets, it significantly and
consistently increases the predictive accuracy on adversarially-crafted
datasets -- up to a 79.6% relative improvement -- while drastically reducing
the number of background knowledge violations. Furthermore, we show that
adversarial examples transfer among model architectures, and that the proposed
adversarial training procedure improves the robustness of NLI models to
adversarial examples.
Comment: Accepted at the SIGNLL Conference on Computational Natural Language
Learning (CoNLL 2018)
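The "degree of violation" idea in the abstract above can be illustrated with a small sketch. It assumes one hypothetical rule, that contradiction is symmetric (if a contradicts b, then b contradicts a), and scores a violation as the positive part of the gap between the model's probability for the rule's body and for its head. The rule, the hinge form, the lookup-table model and the names `toy_nli_model`, `violation` and `most_violating` are illustrative assumptions, not details given in the abstract; the search here runs over a fixed candidate list rather than using a language model.

```python
from typing import Callable, Dict, List, Tuple

Probs = Dict[str, float]
Model = Callable[[str, str], Probs]

# Hypothetical stand-in for a neural NLI model: a deliberately asymmetric lookup table.
_TABLE: Dict[Tuple[str, str], Probs] = {
    ("a man sleeps", "a man is awake"):
        {"entailment": 0.05, "contradiction": 0.90, "neutral": 0.05},
    ("a man is awake", "a man sleeps"):
        {"entailment": 0.10, "contradiction": 0.40, "neutral": 0.50},
}
_UNIFORM: Probs = {"entailment": 1 / 3, "contradiction": 1 / 3, "neutral": 1 / 3}


def toy_nli_model(premise: str, hypothesis: str) -> Probs:
    """Return class probabilities for a sentence pair (uniform if the pair is unknown)."""
    return _TABLE.get((premise, hypothesis), _UNIFORM)


def violation(model: Model, a: str, b: str) -> float:
    """Degree to which the model breaks the assumed rule con(a, b) => con(b, a).

    The score is positive only when the body is believed more strongly than the head.
    """
    body = model(a, b)["contradiction"]
    head = model(b, a)["contradiction"]
    return max(0.0, body - head)


def most_violating(model: Model, candidates: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Pick the candidate pair that maximises the violation score."""
    return max(candidates, key=lambda pair: violation(model, *pair))


pairs = [
    ("a man sleeps", "a man is awake"),
    ("a man is awake", "a man sleeps"),
    ("a dog runs", "a cat sits"),
]
worst = most_violating(toy_nli_model, pairs)
print(worst, round(violation(toy_nli_model, *worst), 3))
```

A score of this kind could also be added to a training loss as a penalty term, which is one way to read "adversarially regularising" in the abstract; the abstract itself does not specify the exact formulation.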
Contrasting Human- and Machine-Generated Word-Level Adversarial Examples for Text Classification
Natural language processing models are generally considered vulnerable to adversarial attacks, but recent work has drawn attention to the issue of validating these adversarial inputs against certain criteria (e.g., the preservation of semantics and grammaticality). Enforcing constraints to uphold such criteria may render attacks unsuccessful, raising the question of whether valid attacks are actually feasible. In this work, we investigate this through the lens of human language ability. We report on crowdsourcing studies in which we task humans with iteratively modifying words in an input text, while receiving immediate model feedback, with the aim of causing a sentiment classification model to misclassify the example. Our findings suggest that humans are capable of generating a substantial number of adversarial examples using semantics-preserving word substitutions. We analyze how human-generated adversarial examples compare to the recently proposed TEXTFOOLER, GENETIC, BAE and SEMEMEPSO attack algorithms on the dimensions of naturalness, preservation of sentiment, grammaticality and substitution rate. Our findings suggest that humans are no better than the best algorithms at generating natural-reading, sentiment-preserving examples, though they do so while being much more computationally efficient.
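Of the comparison dimensions listed above, substitution rate is the easiest to make concrete. The sketch below assumes it means the fraction of word positions that differ between the original and the adversarial text, with whitespace tokenisation and equal token counts (pure word-for-word substitution). The study's exact definition may differ; the function name `substitution_rate` is hypothetical.

```python
def substitution_rate(original: str, adversarial: str) -> float:
    """Fraction of word positions whose token was replaced.

    Assumes whitespace tokenisation and a pure word-level substitution attack,
    so both texts must contain the same number of tokens.
    """
    orig_tokens = original.split()
    adv_tokens = adversarial.split()
    if len(orig_tokens) != len(adv_tokens):
        raise ValueError("texts must have the same number of tokens")
    if not orig_tokens:
        return 0.0
    changed = sum(o != a for o, a in zip(orig_tokens, adv_tokens))
    return changed / len(orig_tokens)


# One of five words is replaced.
print(substitution_rate("the movie was great fun", "the film was great fun"))
```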