
    Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks

    We study the problem of computing an approximate Nash equilibrium of a continuous-action game without access to gradients. Such game access is common in reinforcement learning settings, where the environment is typically treated as a black box. To tackle this problem, we apply zeroth-order optimization techniques that combine smoothed gradient estimators with equilibrium-finding dynamics. We model players' strategies using artificial neural networks. In particular, we use randomized policy networks to model mixed strategies. These take noise in addition to an observation as input and can flexibly represent arbitrary observation-dependent, continuous-action distributions. Being able to model such mixed strategies is crucial for tackling continuous-action games that lack pure-strategy equilibria. We evaluate the performance of our method using an approximation of the Nash convergence metric from game theory, which measures how much players can benefit from unilaterally changing their strategy. We apply our method to continuous Colonel Blotto games, single-item and multi-item auctions, and a visibility game. The experiments show that our method can quickly find high-quality approximate equilibria. Furthermore, they show that the dimensionality of the input noise is crucial for performance. To our knowledge, this paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
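    The abstract's first ingredient, a smoothed gradient estimator, can be illustrated with the common antithetic Gaussian-smoothing estimator, which needs only payoff evaluations and never the payoff's gradient. This is a minimal sketch under stated assumptions: whether the paper uses exactly this estimator, and its choice of smoothing scale and sample count, are not given in the abstract, and `smoothed_gradient` is a hypothetical helper.

```python
import random
from typing import Callable, List, Optional

Vector = List[float]


def smoothed_gradient(
    f: Callable[[Vector], float],
    x: Vector,
    sigma: float = 0.1,
    num_samples: int = 1000,
    rng: Optional[random.Random] = None,
) -> Vector:
    """Antithetic Gaussian-smoothing estimate of the gradient of f at x.

    Hypothetical helper: only evaluates f (zeroth-order), never its gradient.
    Estimator: mean over u ~ N(0, I) of (f(x + sigma*u) - f(x - sigma*u)) / (2*sigma) * u.
    """
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    d = len(x)
    grad = [0.0] * d
    for _ in range(num_samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        f_plus = f([xi + sigma * ui for xi, ui in zip(x, u)])
        f_minus = f([xi - sigma * ui for xi, ui in zip(x, u)])
        scale = (f_plus - f_minus) / (2.0 * sigma)  # finite difference along u
        for i in range(d):
            grad[i] += scale * u[i]
    return [g / num_samples for g in grad]


# Usage: a black-box quadratic payoff peaked at (1, -2); its true gradient at (0, 0) is (2, -4).
payoff = lambda x: -((x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2)
print(smoothed_gradient(payoff, [0.0, 0.0], num_samples=20000))
```

    For a quadratic payoff the smoothed objective has the same gradient as the original, so the estimate should land near (2, -4). In a game, one would presumably call such an estimator on each player's own payoff with the other players' strategies held fixed and feed the results into the equilibrium-finding dynamics; that coupling is an assumption, not a description of the paper's algorithm.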

    Open-ended Learning in Symmetric Zero-sum Games

    Zero-sum games such as chess and poker are, abstractly, functions that evaluate pairs of agents, for example labeling them 'winner' and 'loser'. If the game is approximately transitive, then self-play generates sequences of agents of increasing strength. However, nontransitive games, such as rock-paper-scissors, can exhibit strategic cycles, and there is no longer a clear objective: we want agents to increase in strength, but against whom is unclear. In this paper, we introduce a geometric framework for formulating agent objectives in zero-sum games, in order to construct adaptive sequences of objectives that yield open-ended learning. The framework allows us to reason about population performance in nontransitive games and enables the development of a new algorithm (rectified Nash response, PSRO_rN) that uses game-theoretic niching to construct diverse populations of effective agents, producing a stronger set of agents than existing algorithms. We apply PSRO_rN to two highly nontransitive resource allocation games and find that PSRO_rN consistently outperforms the existing alternatives.
    Comment: ICML 2019, final version
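    The distinction the abstract draws between transitive and nontransitive games can be checked on tiny payoff matrices. The sketch below is not PSRO_rN; it only illustrates the starting point, assuming a finite symmetric zero-sum game written as an antisymmetric matrix. The skill values and the helper names `has_three_cycle` and `exploitability` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from typing import List, Sequence

# Symmetric zero-sum games: payoff[i][j] is what strategy i earns against j,
# so payoff[i][j] == -payoff[j][i].
RPS: List[List[int]] = [
    [0, -1, 1],   # rock: loses to paper, beats scissors
    [1, 0, -1],   # paper: beats rock, loses to scissors
    [-1, 1, 0],   # scissors: loses to rock, beats paper
]

# A transitive game built from hypothetical scalar skills: the stronger side
# always wins, by the skill gap.
SKILLS = [1, 2, 3]
TRANSITIVE: List[List[int]] = [[si - sj for sj in SKILLS] for si in SKILLS]


def has_three_cycle(payoff: Sequence[Sequence[int]]) -> bool:
    """True if some i beats j, j beats k, and k beats i (a strategic cycle)."""
    n = len(payoff)
    return any(
        payoff[i][j] > 0 and payoff[j][k] > 0 and payoff[k][i] > 0
        for i, j, k in permutations(range(n), 3)
    )


def exploitability(payoff: Sequence[Sequence[int]], mix: Sequence[Fraction]) -> Fraction:
    """Best pure-strategy payoff against the opponent mixture `mix`.

    A symmetric zero-sum game has value 0, so this is what a best-responding
    opponent gains; 0 means `mix` is a Nash equilibrium strategy.
    """
    n = len(payoff)
    return max(sum(payoff[i][j] * mix[j] for j in range(n)) for i in range(n))


uniform = [Fraction(1, 3)] * 3
print(has_three_cycle(RPS), exploitability(RPS, uniform))
print(has_three_cycle(TRANSITIVE), exploitability(TRANSITIVE, [Fraction(0), Fraction(0), Fraction(1)]))
```

    In the transitive game the strongest pure strategy is already unexploitable, so "beat the previous agent" is a sensible target. In rock-paper-scissors every pure strategy is exploitable and only the uniform mixture is not, which is why the choice of opponents becomes the central question once cycles appear.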

    Strategically Revealing Intentions in General Lotto Games

    Strategic decision-making in uncertain and adversarial environments is crucial for the security of modern systems and infrastructures. A salient feature of many optimal decision-making policies is a level of unpredictability, or randomness, which helps to keep an adversary uncertain about the system’s behavior. This paper seeks to explore decision-making policies on the other end of the spectrum – namely, whether there are benefits in revealing one’s strategic intentions to an opponent before engaging in competition. We study these scenarios in a well-studied model of competitive resource allocation known as the General Lotto game. In the classic formulation, two competing players simultaneously allocate their assets to a set of battlefields, and the resulting payoffs are derived in a zero-sum fashion. Here, we consider a multi-step extension where one of the players has the option to publicly pre-commit assets in a binding fashion to battlefields before play begins. In response, the opponent decides which of these battlefields to secure (or abandon) by matching the pre-commitment with its own assets. They then engage in a General Lotto game over the remaining set of battlefields. Interestingly, this paper highlights many scenarios where strategically revealing intentions can significantly improve one’s payoff. This runs contrary to the conventional wisdom that randomness should be a central component of decision-making in adversarial environments.
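    The classic zero-sum General Lotto game that the multi-step extension builds on has a well-known closed-form equilibrium payoff, sketched below. The formula (A's payoff depends only on the budgets and the total prize value) and the single-battlefield equilibrium strategies used in the Monte Carlo check are assumptions drawn from the General Lotto literature, not from this paper; the pre-commitment stage is not modeled, and `lotto_payoff` and `simulate_one_battlefield` are hypothetical names.

```python
import random


def lotto_payoff(x_a: float, x_b: float, value: float = 1.0) -> float:
    """Assumed equilibrium payoff to player A in a zero-sum General Lotto game.

    Budgets x_a, x_b > 0; `value` is the total prize value. Payoff is the
    value A wins minus the value B wins.
    """
    if x_a <= 0 or x_b <= 0:
        raise ValueError("budgets must be positive")
    if x_a >= x_b:
        return value * (1.0 - x_b / x_a)
    return value * (x_a / x_b - 1.0)


def simulate_one_battlefield(x_a: float, x_b: float, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of A's payoff on one unit-value battlefield.

    Assumes x_a >= x_b and the classic equilibrium strategies: A allocates
    Uniform[0, 2*x_a] (mean x_a); B allocates 0 with probability 1 - x_b/x_a
    and Uniform[0, 2*x_a] otherwise (mean x_b).
    """
    if x_a < x_b:
        raise ValueError("this sketch assumes x_a >= x_b")
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = rng.uniform(0.0, 2.0 * x_a)
        b = rng.uniform(0.0, 2.0 * x_a) if rng.random() < x_b / x_a else 0.0
        total += (a > b) - (b > a)  # +1 if A wins, -1 if B wins, 0 on a tie
    return total / trials


print(lotto_payoff(2.0, 1.0), simulate_one_battlefield(2.0, 1.0))
```

    With twice the opponent's budget, A's payoff is 1 - 1/2 = 0.5 per unit of prize value, and the simulation should agree up to sampling noise. An analysis of pre-commitment would compare this baseline against the payoff of the committing player after the opponent's secure-or-abandon response.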

    The Art of Concession in General Lotto Games

    Success in adversarial environments often requires investment into additional resources in order to improve one’s competitive position. But can intentionally decreasing one’s own competitiveness ever provide strategic benefits in such settings? In this paper, we focus on characterizing the role of “concessions” as a component of strategic decision making. Specifically, we investigate whether a player can gain an advantage by either conceding budgetary resources or conceding valuable prizes to an opponent. While one might naïvely assume that the player cannot, our work demonstrates that – perhaps surprisingly – concessions do offer strategic benefits when made correctly. In the context of General Lotto games, we first show that neither budgetary concessions nor value concessions can be advantageous to either player in a 1-vs.-1 scenario. However, in settings where two players compete against a common adversary, we find opportunities for one of the two players to improve her payoff by conceding a prize to the adversary. We provide a set of sufficient conditions under which such concessions exist.
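    Under one simple reading, the 1-vs.-1 claim can be checked directly. The derivation below assumes, rather than takes from the paper, the standard zero-sum General Lotto equilibrium payoff, in which player A's payoff depends only on the two budgets and the total prize value φ, and it models a value concession as handing a prize of value v to the opponent and competing for the rest.

```latex
\[
\pi_A(X_A, X_B, \phi) = \phi\, c(X_A, X_B), \qquad
c(X_A, X_B) =
\begin{cases}
1 - \dfrac{X_B}{X_A}, & X_A \ge X_B,\\[6pt]
\dfrac{X_A}{X_B} - 1, & X_A < X_B,
\end{cases}
\qquad X_A, X_B > 0 .
\]
% Budget concessions: c is nondecreasing in X_A and nonincreasing in X_B,
% so lowering one's own budget (or raising the opponent's) cannot raise \pi_A.
% Value concessions: since -1 < c < 1 and 0 < v \le \phi,
\[
\underbrace{-v + (\phi - v)\, c}_{\text{concede } v}
\;-\; \underbrace{\phi\, c}_{\text{no concession}}
= -v\,(1 + c) < 0 .
\]
```

    The gap is strictly negative, consistent with the abstract's statement that neither kind of concession helps in a 1-vs.-1 setting; the benefits it reports arise only when two players face a common adversary.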