We study the problem of adversarial language games, in which multiple agents
with conflicting goals compete with each other via natural language
interactions. While adversarial language games are ubiquitous in human
activities, they have received little attention in natural language
processing. In this work, we propose a challenging adversarial language game
called Adversarial Taboo as an example, in which an attacker and a defender
compete around a target word. The attacker is tasked with inducing the defender
to utter the target word, which is invisible to the defender, while the
defender is tasked with detecting the target word before being induced to utter it. In
Adversarial Taboo, a successful attacker must hide its intention and subtly
induce the defender, while a competitive defender must be cautious with its
utterances and infer the intention of the attacker. Such language abilities can
facilitate many important downstream NLP tasks. To instantiate the game, we
create a game environment and a competition platform. Comprehensive experiments
and empirical studies on several baseline attack and defense strategies show
promising and interesting results. Based on our analysis of the game and the
experiments, we discuss multiple promising directions for future research.

Comment: Accepted by AAAI 202
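The win conditions of Adversarial Taboo described above can be sketched in a few lines. This is a minimal illustrative simplification, not the authors' game environment; the function name, arguments, and the restriction of the rules to the defender's utterances are all assumptions.

```python
# Minimal sketch of the Adversarial Taboo win conditions: the attacker wins if
# the defender utters the target word; the defender wins by guessing it first.
# All names here are illustrative assumptions, not the paper's actual API.

def judge_turn(speaker, utterance, target, defender_guess=None):
    """Return the winner after one turn, or None if the game continues.

    speaker        -- "attacker" or "defender"
    utterance      -- the text produced on this turn
    target         -- the target word (visible only to the attacker)
    defender_guess -- a word the defender explicitly guesses, if any
    """
    words = utterance.lower().split()
    if speaker == "defender":
        # Attacker wins if the defender utters the target word.
        if target in words:
            return "attacker"
        # Defender wins by correctly detecting the target word.
        if defender_guess == target:
            return "defender"
    return None  # no winner yet; the game continues
```

For example, with target word "apple", the defender saying "is it about fruit" continues the game, saying "I ate an apple today" loses to the attacker, and explicitly guessing "apple" wins for the defender.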