Data Poisoning Attacks in Contextual Bandits
We study offline data poisoning attacks in contextual bandits, a class of
reinforcement learning problems with important applications in online
recommendation and adaptive medical treatment, among others. We provide a
general attack framework based on convex optimization and show that by slightly
manipulating rewards in the data, an attacker can force the bandit algorithm to
pull a target arm for a target contextual vector. The target arm and target
contextual vector are both chosen by the attacker. That is, the attacker can
hijack the behavior of a contextual bandit. We also investigate the feasibility
and the side effects of such attacks, and identify future directions for
defense. Experiments on both synthetic and real-world data demonstrate the
efficiency of the attack algorithm.
Comment: GameSec 201
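
As a rough illustration of the idea in this abstract, the sketch below poisons the logged rewards of one arm so that a ridge-regression estimate prefers it at a chosen context. It is a simplified assumption, not the paper's framework: it perturbs only the target arm's rewards, uses a single margin constraint, and ignores confidence bounds, which reduces the convex program to a closed-form projection onto a half-space. All names (`poison_rewards`, `margin`, `lam`, the toy data) are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def prediction_weights(X, x_star, lam):
    """Return w such that the ridge prediction at x_star equals sum(w[i] * r[i])."""
    d = len(x_star)
    # Regularised Gram matrix G = X^T X + lam * I (symmetric).
    G = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    z = solve(G, list(x_star))  # z = G^{-1} x_star
    return [sum(row[j] * z[j] for j in range(d)) for row in X]


def predict(X, r, x_star, lam=1.0):
    """Ridge-regression reward estimate for one arm at context x_star."""
    w = prediction_weights(X, x_star, lam)
    return sum(wi * ri for wi, ri in zip(w, r))


def poison_rewards(X_target, r_target, x_star, best_rival, margin=0.1, lam=1.0):
    """Smallest L2 change to the target arm's rewards such that its estimate
    at x_star is at least best_rival + margin (assumed simplified objective)."""
    w = prediction_weights(X_target, x_star, lam)
    gap = best_rival + margin - sum(wi * ri for wi, ri in zip(w, r_target))
    if gap <= 0:
        return list(r_target)  # target arm already wins; no change needed
    norm_sq = sum(wi * wi for wi in w)
    if norm_sq == 0:
        raise ValueError("attack infeasible: logged contexts carry no weight at x_star")
    scale = gap / norm_sq
    return [ri + scale * wi for ri, wi in zip(r_target, w)]


# Toy logged data: two arms observed on the same three contexts.
contexts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target_rewards = [0.2, 0.1, 0.3]
rival_rewards = [0.9, 0.8, 1.0]
x_star = [1.0, 0.5]

rival_estimate = predict(contexts, rival_rewards, x_star)
poisoned = poison_rewards(contexts, target_rewards, x_star, rival_estimate)
target_estimate = predict(contexts, poisoned, x_star)
```

After poisoning, `target_estimate` exceeds `rival_estimate` by the chosen margin, so a greedy learner fitted on the modified data would pull the target arm at `x_star`. The `ValueError` branch marks one way such an attack can be infeasible under these assumptions.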
Bandit models and Blotto games
In this thesis we present a new take on two classic problems of game theory: the "multiarmed bandit" problem of dynamic learning, and the "Colonel Blotto" game, a multidimensional contest.
In Chapters 2-4 we treat the questions of experimentation with congestion: how do players search and learn about options when they are competing for access with other
players? We consider a bandit model in which two players choose between learning about
the quality of a risky option (modelled as a Poisson process with unknown arrival rate),
and competing for the use of a single shared safe option that can only be used by one
agent at a time.
We present the equilibria of the game when switching to the safe option is irrevocable,
and when it is not. We show that the equilibrium is always inefficient: it involves too
little experimentation when compared to the planner solution. The striking equilibrium
dynamics of the game with revocable exit are driven by a strategic option-value arising
purely from competition between the players. This constitutes a new result in the bandit
literature. Finally we present extensions to the model. In particular we assume that
players do not observe the result of their opponent's experimentation.
In Chapter 5 we turn to the n-dimensional Blotto game and allow battlefields to have
different values. We describe a geometrical method for constructing equilibrium distributions in the Colonel Blotto game with asymmetric battlefield values. It generalises the
3-dimensional construction method first described by Gross and Wagner (1950). The proposed method does particularly well in instances of the Colonel Blotto game in which the
battlefield weights satisfy some clearly defined regularity conditions. The chapter also
explores the parallel between these conditions and the integer partitioning problem in
combinatorial optimisation.