Exploiting No-Regret Algorithms in System Design
We investigate a repeated two-player zero-sum game setting where the column
player is also the designer of the system and has full control over the design
of the payoff matrix. In addition, the row player uses a no-regret algorithm
to efficiently learn how to adapt their strategy to the column player's
behaviour over time in order to achieve a good total payoff. The goal of the
column player is to guide her opponent to pick a mixed strategy which is
favourable for the system designer. To do so, she needs to: (i) design an
appropriate payoff matrix whose unique minimax solution contains the desired
mixed strategy of the row player; and (ii) strategically interact with the row
player during a sequence of plays in order to guide her opponent to converge
to that desired behaviour. To design such a payoff matrix, we propose a novel
solution that provably has a unique minimax solution with the desired
behaviour. We also investigate a relaxation of this problem where uniqueness
is not required, but all the minimax solutions share the same mixed strategy
for the row player. Finally, we propose a new game-playing algorithm for the
system designer and prove that it can guide the row player, who may play any
stable no-regret algorithm, to converge to a minimax solution.
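The abstract gives no pseudocode, but the interaction it describes can be sketched with a standard no-regret learner. Below is a minimal illustration, assuming the row player runs Hedge (multiplicative weights), a common stable no-regret algorithm, while the column player repeatedly best-responds; the payoff matrix, horizon, and learning rate are all illustrative choices, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical designed payoff matrix (entries are the row player's losses);
# matching pennies is used purely for illustration.
A = np.array([[ 1.0, -1.0],
              [-1.0,  1.0]])
n_rows, n_cols = A.shape
T = 5000
eta = np.sqrt(np.log(n_rows) / T)  # standard Hedge learning rate

weights = np.ones(n_rows)
avg_strategy = np.zeros(n_rows)

for t in range(T):
    x = weights / weights.sum()     # row player's current mixed strategy
    j = int(np.argmax(x @ A))       # column player best-responds to x
    loss = A[:, j]                  # per-action loss for the row player
    weights *= np.exp(-eta * loss)  # multiplicative-weights update
    avg_strategy += x

avg_strategy /= T
print("time-averaged row strategy:", np.round(avg_strategy, 3))  # ~[0.5, 0.5]
```

Under this dynamic, the standard no-regret argument makes the row player's time-averaged strategy approach the minimax strategy of the designed matrix, which is the convergence behaviour the abstract refers to.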
Dual-Mandate Patrols: Multi-Armed Bandits for Green Security
Conservation efforts in green security domains to protect wildlife and
forests are constrained by the limited availability of defenders (i.e.,
patrollers), who must patrol vast areas to protect against attackers (e.g.,
poachers or illegal loggers). Defenders must choose how much time to spend in
each region of the protected area, balancing exploration of infrequently
visited regions and exploitation of known hotspots. We formulate the problem as
a stochastic multi-armed bandit, where each action represents a patrol
strategy, enabling us to guarantee the rate of convergence of the patrolling
policy. However, a naive bandit approach would compromise short-term
performance for long-term optimality, resulting in animals poached and forests
destroyed. To speed up performance, we leverage smoothness in the reward
function and decomposability of actions. We show a synergy between
Lipschitz-continuity and decomposition as each aids the convergence of the
other. In doing so, we bridge the gap between combinatorial and Lipschitz
bandits, presenting a no-regret approach that tightens existing guarantees
while optimizing for short-term performance. We demonstrate that our algorithm,
LIZARD, improves performance on real-world poaching data from Cambodia.
Comment: Published at AAAI 2021. 9 pages (paper and references), 3-page appendix. 6 figures and 1 table.
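As a rough illustration of the bandit ingredients named above (not the LIZARD algorithm itself), the sketch below runs a UCB-style Lipschitz bandit over hypothetical discretized patrol-effort levels: each arm's confidence bound is tightened using every other arm's bound plus an assumed Lipschitz constant times the distance between effort levels. The reward curve, Lipschitz constant, and horizon are all synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

efforts = np.linspace(0.0, 1.0, 11)  # hypothetical discretized effort levels
L = 1.0                              # assumed Lipschitz constant of the reward
dist = np.abs(efforts[:, None] - efforts[None, :])

def true_mean(e):
    # Synthetic reward curve: more patrol effort helps up to a point.
    return 1.0 - (e - 0.7) ** 2

counts = np.zeros(len(efforts))
sums = np.zeros(len(efforts))
T = 2000

for t in range(1, T + 1):
    mean = np.where(counts > 0, sums / np.maximum(counts, 1), np.inf)
    conf = np.where(counts > 0,
                    np.sqrt(2.0 * np.log(t) / np.maximum(counts, 1)),
                    np.inf)
    naive_ucb = mean + conf
    # Lipschitz tightening: arm i's value cannot exceed any arm j's UCB
    # plus L times the distance between their effort levels.
    ucb = np.min(naive_ucb[None, :] + L * dist, axis=1)
    i = int(np.argmax(ucb))
    reward = true_mean(efforts[i]) + 0.1 * rng.standard_normal()
    counts[i] += 1
    sums[i] += reward

print("most-pulled effort level:", efforts[int(np.argmax(counts))])  # ~0.7
```

The tightening step is where smoothness speeds up short-term performance: a single pull informs every nearby effort level, so the learner discards poor regions without sampling each of them, mirroring the synergy between Lipschitz-continuity and decomposition that the abstract describes.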
Playing repeated security games with no prior knowledge
This paper investigates repeated security games with game payoffs and attacker behaviors that are unknown to the defender. As existing work assumes prior knowledge about either the game payoffs or the attackers' behaviors, it is not suitable for tackling our problem. Given this, we propose the first efficient defender strategy, based on an adversarial online learning framework, that can provably achieve good performance guarantees without any prior knowledge. In particular, we prove that our algorithm can achieve low performance loss against the best fixed strategy in hindsight (i.e., having full knowledge of the attackers' moves). In addition, we prove that our algorithm can achieve an efficient competitive ratio against the optimal adaptive defender strategy. We also show that for zero-sum security games, our algorithm achieves efficient results in approximating a number of solution concepts, such as algorithmic equilibria and the minimax value. Finally, our extensive numerical results demonstrate that, without any prior information, our algorithm still achieves good performance compared to state-of-the-art algorithms from the security games literature, such as SUQR, which require a significant amount of prior knowledge.
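The paper's defender strategy is not spelled out in the abstract; as a hedged sketch of the adversarial online learning framework it mentions, the snippet below runs the standard EXP3 bandit algorithm over a hypothetical set of targets, with a synthetic attacker standing in for the unknown environment. EXP3 needs no prior model: the defender observes only the payoff of the target it actually covered each round.

```python
import numpy as np

rng = np.random.default_rng(2)

n_targets = 5  # hypothetical number of targets the defender can protect
T = 10000
# Standard EXP3 exploration rate for rewards in [0, 1].
gamma = min(1.0, np.sqrt(n_targets * np.log(n_targets) / ((np.e - 1) * T)))

weights = np.ones(n_targets)
for t in range(T):
    probs = (1.0 - gamma) * weights / weights.sum() + gamma / n_targets
    i = rng.choice(n_targets, p=probs)
    # Synthetic environment standing in for the unknown attacker: the
    # defender only observes the reward of the target it covered.
    reward = rng.uniform(0.5, 1.0) if i == 3 else rng.uniform(0.0, 0.5)
    est = reward / probs[i]                        # unbiased importance-weighted estimate
    weights[i] *= np.exp(gamma * est / n_targets)  # EXP3 exponential update
    weights /= weights.max()                       # rescale to avoid overflow

print("defender's learned distribution:", np.round(weights / weights.sum(), 3))
```

Because EXP3's regret bound holds against any reward sequence, this style of defender needs no assumptions about attacker rationality, which is what allows performance guarantees without the prior knowledge that approaches such as SUQR rely on.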