IMAP: Intrinsically Motivated Adversarial Policy

Ma, Xingjun; Shen, Chao; Wang, Cong; Wang, Shengjie; Wang, Xinyu; Zheng, Xiang

Authors: Xingjun Ma, Chao Shen, Cong Wang, Shengjie Wang, Xinyu Wang, Xiang Zheng
Publication date: 18 October 2023

Abstract

Reinforcement learning agents are susceptible to evasion attacks during deployment. In single-agent environments, these attacks can occur through imperceptible perturbations injected into the inputs of the victim policy network. In multi-agent environments, an attacker can manipulate an adversarial opponent to influence the victim policy's observations indirectly. While adversarial policies offer a promising technique for crafting such attacks, current methods are either sample-inefficient due to poor exploration strategies or require extra surrogate model training under the black-box assumption. To address these challenges, we propose Intrinsically Motivated Adversarial Policy (IMAP) for efficient black-box adversarial policy learning in both single- and multi-agent environments. We formulate four types of adversarial intrinsic regularizers, which maximize the adversarial state coverage, policy coverage, risk, or divergence, to discover potential vulnerabilities of the victim policy in a principled way. We also present a novel Bias-Reduction (BR) method to boost IMAP further. Our experiments validate the effectiveness of the four types of adversarial intrinsic regularizers and BR in enhancing black-box adversarial policy learning across a variety of environments. IMAP successfully evades two types of defense methods, adversarial training and robust regularizers, decreasing the performance of the state-of-the-art robust WocaR-PPO agents by 34%-54% across four single-agent tasks. IMAP also achieves a state-of-the-art attacking success rate of 83.91% in the multi-agent game YouShallNotPass.
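
To make the idea of a state-coverage regularizer more concrete, the sketch below adds a novelty bonus to the adversary's reward so that rarely visited states are worth more. It is an illustration only: the abstract does not specify how IMAP computes its regularizers, so the k-nearest-neighbour distance bonus, the function names `knn_coverage_bonus` and `regularized_rewards`, and the `weight` parameter are all assumptions rather than the authors' method.

```python
import math


def knn_coverage_bonus(states, k=1):
    """Hypothetical novelty bonus: log(1 + distance to the k-th nearest neighbour).

    States far from everything else in the batch get a larger bonus,
    which is one common proxy for encouraging state coverage.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    bonuses = []
    for i, state in enumerate(states):
        # Euclidean distances from this state to every other state in the batch.
        dists = sorted(
            math.dist(state, other) for j, other in enumerate(states) if j != i
        )
        # Fall back to the farthest available neighbour if the batch is small.
        kth = dists[min(k, len(dists)) - 1] if dists else 0.0
        bonuses.append(math.log1p(kth))
    return bonuses


def regularized_rewards(adversary_rewards, states, weight=0.1, k=1):
    """Add the weighted coverage bonus to each (assumed) adversarial task reward."""
    bonuses = knn_coverage_bonus(states, k)
    return [r + weight * b for r, b in zip(adversary_rewards, bonuses)]


if __name__ == "__main__":
    batch = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
    print(regularized_rewards([1.0, 1.0, 1.0], batch, weight=0.1))
```

In this sketch the isolated state `(5.0, 5.0)` receives the largest bonus, so an adversary trained on these rewards would be nudged toward regions of the state space it has not yet explored. The policy-coverage, risk, and divergence regularizers named in the abstract would need different bonus terms.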

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2305.02605

Last time updated on 06/05/2023