28 research outputs found
Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks
We study the problem of computing an approximate Nash equilibrium of a
continuous-action game without access to gradients. Such game access is common
in reinforcement learning settings, where the environment is typically treated
as a black box. To tackle this problem, we apply zeroth-order optimization
techniques that combine smoothed gradient estimators with equilibrium-finding
dynamics. We model players' strategies using artificial neural networks. In
particular, we use randomized policy networks to model mixed strategies. These
take noise in addition to an observation as input and can flexibly represent
arbitrary observation-dependent, continuous-action distributions. Being able to
model such mixed strategies is crucial for tackling continuous-action games
that lack pure-strategy equilibria. We evaluate the performance of our method
using an approximation of the Nash convergence metric from game theory, which
measures how much players can benefit from unilaterally changing their
strategy. We apply our method to continuous Colonel Blotto games, single-item
and multi-item auctions, and a visibility game. The experiments show that our
method can quickly find high-quality approximate equilibria. Furthermore, they
show that the dimensionality of the input noise is crucial for performance. To
our knowledge, this paper is the first to solve general continuous-action games
with unrestricted mixed strategies and without any gradient information.
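
As a rough illustration of the "smoothed gradient estimator" idea mentioned in the abstract above, the sketch below estimates the gradient of a black-box function from function evaluations alone, using antithetic Gaussian perturbations. It is not the paper's implementation: the function name `smoothed_gradient`, its parameters, and the quadratic test objective are all assumptions made for illustration.

```python
import random


def smoothed_gradient(f, x, sigma=0.1, num_samples=1000, rng=None):
    """Estimate the gradient of f at x using only evaluations of f.

    Hypothetical helper (not from the paper): averages antithetic
    finite differences along random Gaussian directions, which
    approximates the gradient of a Gaussian-smoothed version of f.
    """
    rng = rng or random.Random(0)
    dim = len(x)
    grad = [0.0] * dim
    for _ in range(num_samples):
        # Random search direction u ~ N(0, I).
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x_plus = [xi + sigma * ui for xi, ui in zip(x, u)]
        x_minus = [xi - sigma * ui for xi, ui in zip(x, u)]
        # Directional finite difference along u (two evaluations of f).
        scale = (f(x_plus) - f(x_minus)) / (2.0 * sigma)
        for i in range(dim):
            grad[i] += scale * u[i] / num_samples
    return grad


if __name__ == "__main__":
    # Assumed test objective: f(v) = sum of squares, whose true gradient is 2v.
    estimate = smoothed_gradient(
        lambda v: sum(t * t for t in v), [1.0, -2.0], num_samples=20000
    )
    print(estimate)  # should be close to [2.0, -4.0], up to sampling noise
```

In an equilibrium-finding setting, an estimate of this kind would stand in for each player's utility gradient with respect to its own strategy parameters; how the paper combines it with specific dynamics is not described in the abstract.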
Open-ended Learning in Symmetric Zero-sum Games
Zero-sum games such as chess and poker are, abstractly, functions that
evaluate pairs of agents, for example labeling them 'winner' and 'loser'. If
the game is approximately transitive, then self-play generates sequences of
agents of increasing strength. However, nontransitive games, such as
rock-paper-scissors, can exhibit strategic cycles, and there is no longer a
clear objective -- we want agents to increase in strength, but against whom is
unclear. In this paper, we introduce a geometric framework for formulating
agent objectives in zero-sum games, in order to construct adaptive sequences of
objectives that yield open-ended learning. The framework allows us to reason
about population performance in nontransitive games, and enables the
development of a new algorithm (rectified Nash response, PSRO_rN) that uses
game-theoretic niching to construct diverse populations of effective agents,
producing a stronger set of agents than existing algorithms. We apply PSRO_rN
to two highly nontransitive resource allocation games and find that PSRO_rN
consistently outperforms the existing alternatives. Comment: ICML 2019, final version
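
To make the notion of a strategic cycle in the abstract above concrete, here is a minimal sketch that encodes rock-paper-scissors as an antisymmetric payoff matrix over pairs of agents. It does not implement PSRO_rN; the names `PAYOFF`, `beats`, and `expected_payoff` are hypothetical and chosen only for this illustration.

```python
# Pure strategies, treated here as three fixed "agents".
AGENTS = ["rock", "paper", "scissors"]

# PAYOFF[i][j] is the payoff to agent i when it plays agent j.
# The matrix is antisymmetric (PAYOFF[i][j] == -PAYOFF[j][i]),
# as expected for a symmetric zero-sum game.
PAYOFF = [
    [0, -1, 1],   # rock: loses to paper, beats scissors
    [1, 0, -1],   # paper: beats rock, loses to scissors
    [-1, 1, 0],   # scissors: loses to rock, beats paper
]


def beats(i, j):
    """Return True if agent i wins against agent j."""
    return PAYOFF[i][j] > 0


def expected_payoff(i, mixture):
    """Expected payoff of agent i against a mixture over agents."""
    return sum(p * PAYOFF[i][j] for j, p in enumerate(mixture))


if __name__ == "__main__":
    # The cycle: paper beats rock, scissors beats paper, rock beats scissors.
    print(beats(1, 0), beats(2, 1), beats(0, 2))
    # No agent gains against the uniform mixture, so no agent is "strongest".
    uniform = [1 / 3, 1 / 3, 1 / 3]
    print([expected_payoff(i, uniform) for i in range(len(AGENTS))])
```

Because every agent is beaten by some other agent, ranking them by "strength" has no well-defined answer, which is the difficulty the abstract points to for nontransitive games.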
Strategically Revealing Intentions in General Lotto Games
Strategic decision-making in uncertain and adversarial environments is crucial for the security of modern systems and infrastructures. A salient feature of many optimal decision-making policies is a level of unpredictability, or randomness, which helps to keep an adversary uncertain about the system’s behavior. This paper explores decision-making policies on the other end of the spectrum – namely, whether there are benefits in revealing one’s strategic intentions to an opponent before engaging in competition. We study these scenarios in General Lotto games, a well-studied model of competitive resource allocation. In the classic formulation, two competing players simultaneously allocate their assets to a set of battlefields, and the resulting payoffs are derived in a zero-sum fashion. Here, we consider a multi-step extension where one of the players has the option to publicly pre-commit assets in a binding fashion to battlefields before play begins. In response, the opponent decides which of these battlefields to secure (or abandon) by matching the pre-commitment with its own assets. They then engage in a General Lotto game over the remaining set of battlefields. Interestingly, this paper highlights many scenarios where strategically revealing intentions can significantly improve one’s payoff. This runs contrary to the conventional wisdom that randomness should be a central component of decision-making in adversarial environments.
The Art of Concession in General Lotto Games
Success in adversarial environments often requires investment in additional resources in order to improve one’s competitive position. But can intentionally decreasing one’s own competitiveness ever provide strategic benefits in such settings? In this paper, we focus on characterizing the role of “concessions” as a component of strategic decision making. Specifically, we investigate whether a player can gain an advantage by either conceding budgetary resources or conceding valuable prizes to an opponent. While one might naïvely assume that the player cannot, our work demonstrates that – perhaps surprisingly – concessions do offer strategic benefits when made correctly. In the context of General Lotto games, we first show that neither budgetary concessions nor value concessions can be advantageous to either player in a 1-vs.-1 scenario. However, in settings where two players compete against a common adversary, we find opportunities for one of the two players to improve her payoff by conceding a prize to the adversary. We provide a set of sufficient conditions under which such concessions exist.