A Context Aware Approach for Generating Natural Language Attacks
We study the important task of attacking natural language processing models in a black-box setting. We propose an attack strategy that crafts semantically similar adversarial examples for text classification and entailment tasks. Our proposed attack finds candidate words by considering information from both the original word and its surrounding context, jointly leveraging masked language modelling and next sentence prediction for context understanding. In comparison to attacks proposed in prior literature, we generate high-quality adversarial examples that perform significantly better in terms of both success rate and word perturbation percentage.
Comment: Accepted as Student Poster at AAAI 2021
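
As a rough illustration of the candidate-generation idea (not the authors' implementation), the sketch below masks a target word and asks a pretrained masked language model for context-aware replacements. The model name, the `top_k` parameter, and the `candidate_words` helper are illustrative assumptions, and the next-sentence-prediction scoring step mentioned in the abstract is omitted.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def candidate_words(text: str, target: str, top_k: int = 10):
    """Propose context-aware replacements for `target` (hypothetical helper)."""
    # Mask the first occurrence of the target word.
    masked = text.replace(target, tokenizer.mask_token, 1)
    inputs = tokenizer(masked, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Find the [MASK] position and read off the top-k vocabulary items there.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
    top_ids = logits[0, mask_pos].topk(top_k).indices
    return [tokenizer.decode([i]).strip() for i in top_ids.tolist()]

print(candidate_words("the movie was great fun", "great"))
```

Because the model conditions on the full sentence, the proposals reflect both the original word's slot and its surrounding context, which is the property the abstract emphasizes.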
Generating Natural Language Attacks in a Hard Label Black Box Setting
We study the important and challenging task of attacking natural language processing models in a hard-label black-box setting. We propose a decision-based attack strategy that crafts high-quality adversarial examples for text classification and entailment tasks. Our proposed attack leverages a population-based optimization algorithm to craft plausible and semantically similar adversarial examples while observing only the top label predicted by the target model. At each iteration, the optimization procedure allows word replacements that maximize the overall semantic similarity between the original and the adversarial text. Further, our approach does not rely on substitute models or any kind of training data. We demonstrate the efficacy of the proposed approach through extensive experimentation and ablation studies on five state-of-the-art target models across seven benchmark datasets. In comparison to attacks proposed in prior literature, we achieve a higher success rate with a lower word perturbation percentage, despite operating in this highly restricted setting.
Comment: Accepted at AAAI 2021 (Main Conference)
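
As a loose sketch of the decision-based search (not the paper's algorithm or released code), the example below evolves a population of label-flipping candidates back toward the original text while querying only the top predicted label. The `predict` oracle, the synonym table, and the word-overlap similarity proxy are all illustrative assumptions.

```python
import random

def attack(text, predict, synonyms, pop_size=8, iters=30, seed=0):
    """Evolve label-flipping candidates toward the original text, observing
    only the top label returned by `predict` (decision-based setting)."""
    rng = random.Random(seed)
    orig = text.split()
    y = predict(text)

    def sim(words):
        # Word-overlap stand-in for a semantic-similarity objective.
        return sum(o == w for o, w in zip(orig, words)) / len(orig)

    def random_adversary():
        # Seed the search: heavily perturb until the top label flips.
        for _ in range(100):
            cand = [rng.choice(synonyms.get(w, [w])) for w in orig]
            if predict(" ".join(cand)) != y:
                return cand
        return None

    population = [c for c in (random_adversary() for _ in range(pop_size)) if c]
    if not population:
        return None  # could not find any adversarial seed

    best = max(population, key=sim)
    for _ in range(iters):
        for cand in population:
            # Try restoring one original word; keep the change only if the
            # candidate still flips the model's top label.
            i = rng.randrange(len(orig))
            trial = cand[:i] + [orig[i]] + cand[i + 1:]
            if predict(" ".join(trial)) != y:
                cand[i] = orig[i]
        best = max(population + [best], key=sim)
    return " ".join(best)

# Toy demo: a "model" whose top label is positive iff the text contains "good".
synonyms = {"good": ["nice", "fine"], "movie": ["film"], "was": ["is"]}
predict = lambda t: "pos" if "good" in t else "neg"
print(attack("the movie was good", predict, synonyms))
```

A real implementation would replace the word-overlap proxy with sentence-embedding similarity (the objective the abstract describes) and use richer population updates such as crossover and mutation rather than single-word restoration.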