A Context Aware Approach for Generating Natural Language Attacks
We study the important task of attacking natural language processing models in a black-box setting. We propose an attack strategy that crafts semantically similar adversarial examples for text classification and entailment tasks. Our proposed attack finds candidate words by considering information from both the original word and its surrounding context, jointly leveraging masked language modelling and next sentence prediction for context understanding. In comparison to attacks proposed in prior literature, we generate high-quality adversarial examples that perform significantly better in terms of both success rate and word perturbation percentage.
Comment: Accepted as Student Poster at AAAI 2021
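
As a rough illustration of the candidate-generation idea (not the authors' implementation), the sketch below masks a target word and asks a pretrained masked language model for context-aware replacements. The model name, the `top_k` parameter, and the `candidate_words` helper are illustrative assumptions, and the next-sentence-prediction scoring step mentioned in the abstract is omitted.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def candidate_words(text: str, target: str, top_k: int = 10):
    """Propose context-aware replacements for `target` (hypothetical helper)."""
    # Mask the first occurrence of the target word.
    masked = text.replace(target, tokenizer.mask_token, 1)
    inputs = tokenizer(masked, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Find the [MASK] position and read off the top-k vocabulary items there.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
    top_ids = logits[0, mask_pos].topk(top_k).indices
    return [tokenizer.decode([i]).strip() for i in top_ids.tolist()]

print(candidate_words("the movie was great fun", "great"))
```

Because the model conditions on the full sentence, the proposals reflect both the original word's slot and its surrounding context, which is the property the abstract emphasizes.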
Generating Natural Language Attacks in a Hard Label Black Box Setting
We study the important and challenging task of attacking natural language processing models in a hard-label black-box setting. We propose a decision-based attack strategy that crafts high-quality adversarial examples for text classification and entailment tasks. Our proposed attack leverages a population-based optimization algorithm to craft plausible and semantically similar adversarial examples while observing only the top label predicted by the target model. At each iteration, the optimization procedure allows word replacements that maximize the overall semantic similarity between the original and the adversarial text. Further, our approach does not rely on substitute models or any kind of training data. We demonstrate the efficacy of the proposed approach through extensive experimentation and ablation studies on five state-of-the-art target models across seven benchmark datasets. In comparison to attacks proposed in prior literature, we achieve a higher success rate with a lower word perturbation percentage, despite operating in this highly restricted setting.
Comment: Accepted at AAAI 2021 (Main Conference)
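
As a loose sketch of the decision-based search (not the paper's algorithm or released code), the example below evolves a population of label-flipping candidates back toward the original text while querying only the top predicted label. The `predict` oracle, the synonym table, and the word-overlap similarity proxy are all illustrative assumptions.

```python
import random

def attack(text, predict, synonyms, pop_size=8, iters=30, seed=0):
    """Evolve label-flipping candidates toward the original text, observing
    only the top label returned by `predict` (decision-based setting)."""
    rng = random.Random(seed)
    orig = text.split()
    y = predict(text)

    def sim(words):
        # Word-overlap stand-in for a semantic-similarity objective.
        return sum(o == w for o, w in zip(orig, words)) / len(orig)

    def random_adversary():
        # Seed the search: heavily perturb until the top label flips.
        for _ in range(100):
            cand = [rng.choice(synonyms.get(w, [w])) for w in orig]
            if predict(" ".join(cand)) != y:
                return cand
        return None

    population = [c for c in (random_adversary() for _ in range(pop_size)) if c]
    if not population:
        return None  # could not find any adversarial seed

    best = max(population, key=sim)
    for _ in range(iters):
        for cand in population:
            # Try restoring one original word; keep the change only if the
            # candidate still flips the model's top label.
            i = rng.randrange(len(orig))
            trial = cand[:i] + [orig[i]] + cand[i + 1:]
            if predict(" ".join(trial)) != y:
                cand[i] = orig[i]
        best = max(population + [best], key=sim)
    return " ".join(best)

# Toy demo: a "model" whose top label is positive iff the text contains "good".
synonyms = {"good": ["nice", "fine"], "movie": ["film"], "was": ["is"]}
predict = lambda t: "pos" if "good" in t else "neg"
print(attack("the movie was good", predict, synonyms))
```

A real implementation would replace the word-overlap proxy with sentence-embedding similarity (the objective the abstract describes) and use richer population updates such as crossover and mutation rather than single-word restoration.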