A Generalised Method for Empirical Game Theoretic Analysis
This paper provides theoretical bounds for empirical game-theoretic
analysis of complex multi-agent interactions. We provide insights into the
empirical meta-game, showing that a Nash equilibrium of the meta-game is an
approximate Nash equilibrium of the true underlying game. We investigate and
show how many data samples are required to obtain a close enough approximation
of the underlying game. Additionally, we extend the meta-game analysis
methodology to asymmetric games. The state-of-the-art has only considered
empirical games in which agents have access to the same strategy sets and the
payoff structure is symmetric, implying that agents are interchangeable.
Finally, we carry out an empirical illustration of the generalised method in
several domains, illustrating the theory and evolutionary dynamics of several
versions of the AlphaGo algorithm (symmetric), the dynamics of the Colonel
Blotto game played by human players on Facebook (symmetric), and an example of
a meta-game in Leduc Poker (asymmetric), generated by the PSRO multi-agent
learning algorithm. Comment: will appear at AAMAS'1
On Similarities between Inference in Game Theory and Machine Learning
In this paper, we elucidate the equivalence between inference in game theory and machine learning. Our aim in so doing is to establish an equivalent vocabulary between the two domains so as to facilitate developments at the intersection of both fields, and as proof of the usefulness of this approach, we use recent developments in each field to make useful improvements to the other. More specifically, we consider the analogies between smooth best responses in fictitious play and Bayesian inference methods. Initially, we use these insights to develop and demonstrate an improved algorithm for learning in games based on probabilistic moderation. That is, by integrating over the distribution of opponent strategies (a Bayesian approach within machine learning) rather than taking a simple empirical average (the approach used in standard fictitious play), we derive a novel moderated fictitious play algorithm and show that it is more likely than standard fictitious play to converge to a payoff-dominant but risk-dominated Nash equilibrium in a simple coordination game. Furthermore, we consider the converse case, and show how insights from game theory can be used to derive two improved mean field variational learning algorithms. We first show that the standard update rule of mean field variational learning is analogous to a Cournot adjustment within game theory. By analogy with fictitious play, we then suggest an improved update rule, and show that this results in fictitious variational play, an improved mean field variational learning algorithm that exhibits better convergence in highly or strongly connected graphical models. Second, we use a recent advance in fictitious play, namely dynamic fictitious play, to derive a derivative action variational learning algorithm that exhibits superior convergence properties on a canonical machine learning problem (clustering a mixture distribution).
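As a rough illustration of the baseline this abstract refers to, the sketch below runs standard fictitious play (the simple empirical-average version, not the moderated algorithm the paper proposes) in a stag-hunt coordination game. The payoff values, the `prior` pseudo-counts, and the function name `fictitious_play` are assumptions chosen for the example and do not come from the paper.

```python
def fictitious_play(payoff, rounds=50, prior=(1.0, 1.0)):
    """Standard two-player fictitious play in a symmetric 2x2 game.

    payoff[a][b] is the payoff for playing action a against action b.
    Each player best-responds to the empirical frequency of the
    opponent's past actions, seeded with `prior` pseudo-counts
    (an assumption of this sketch).
    """
    # counts[p][a]: how often player p has played action a so far
    counts = [list(prior), list(prior)]
    history = []
    for _ in range(rounds):
        actions = []
        for player in (0, 1):
            opponent_counts = counts[1 - player]
            total = sum(opponent_counts)
            belief = [c / total for c in opponent_counts]
            expected = [
                sum(payoff[a][b] * belief[b] for b in (0, 1)) for a in (0, 1)
            ]
            actions.append(0 if expected[0] >= expected[1] else 1)
        # Both players move simultaneously, so update counts afterwards.
        for player, action in enumerate(actions):
            counts[player][action] += 1
        history.append(tuple(actions))
    return history


# Illustrative stag hunt: action 0 (stag) is payoff-dominant,
# action 1 (hare) is risk-dominant.
STAG_HUNT = [[4, 0], [3, 3]]

uniform_start = fictitious_play(STAG_HUNT)
optimistic_start = fictitious_play(STAG_HUNT, prior=(9.0, 1.0))
```

With uniform initial beliefs both players settle on the risk-dominant action in this sketch, while sufficiently optimistic initial beliefs lead to the payoff-dominant equilibrium.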
Deep Reinforcement Learning from Self-Play in Imperfect-Information Games
Many real-world applications can be described as large-scale games of
imperfect information. To deal with these challenging domains, prior work has
focused on computing Nash equilibria in a handcrafted abstraction of the
domain. In this paper we introduce the first scalable end-to-end approach to
learning approximate Nash equilibria without prior domain knowledge. Our method
combines fictitious self-play with deep reinforcement learning. When applied to
Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium,
whereas common reinforcement learning methods diverged. In Limit Texas Holdem,
a poker game of real-world scale, NFSP learnt a strategy that approached the
performance of state-of-the-art, superhuman algorithms based on significant
domain expertise. Comment: updated version, incorporating conference feedback
Voting Power and Voting Blocs
We investigate the applicability of voting power indices, in particular the Penrose index (aka absolute Banzhaf index), in the analysis of voting blocs by means of a hypothetical voting body. We use the power of individual bloc members to study the implications of the formation of blocs and how voting power varies as bloc size varies. This technique of analysis has many real world applications to legislatures and international bodies. It can be generalised in many ways: the analysis is a priori (assuming formal voting and ignoring actual voting behaviour) but can be made empirical with voting data; it examines the consequences of two blocs but can easily be extended to more.
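A minimal sketch of how the Penrose (absolute Banzhaf) index can be computed by brute-force enumeration is given below. The five-member body, the majority quota, and the function name `penrose_index` are hypothetical choices for illustration. The bloc is modelled simply as one voter casting its members' combined votes, which is not necessarily how the paper treats the power of individual bloc members.

```python
from itertools import combinations


def penrose_index(weights, quota):
    """Return the Penrose (absolute Banzhaf) index of every voter.

    A voter is a swing for a coalition of the other voters when that
    coalition falls short of the quota alone but meets it once the
    voter joins. The index is the fraction of all 2**(n-1) such
    coalitions in which the voter is a swing.
    """
    n = len(weights)
    indices = []
    for i in range(n):
        others = [w for j, w in enumerate(weights) if j != i]
        swings = 0
        for size in range(len(others) + 1):
            for coalition in combinations(others, size):
                total = sum(coalition)
                if total < quota <= total + weights[i]:
                    swings += 1
        indices.append(swings / 2 ** (n - 1))
    return indices


# Hypothetical body: five members, one vote each, simple majority (quota 3).
no_bloc = penrose_index([1, 1, 1, 1, 1], quota=3)

# Two members form a bloc that always casts its two votes together.
with_bloc = penrose_index([2, 1, 1, 1], quota=3)
```

In this toy body each member has index 0.375 before any bloc forms. Once two members vote as a bloc, the bloc's index is 0.75 and each remaining member's index falls to 0.25.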
Information Causality, the Tsirelson Bound, and the 'Being-Thus' of Things
The principle of `information causality' can be used to derive an upper
bound---known as the `Tsirelson bound'---on the strength of quantum mechanical
correlations, and has been conjectured to be a foundational principle of
nature. To date, however, it has not been sufficiently motivated to play such a
foundational role. The motivations that have so far been given are, as I argue,
either unsatisfactorily vague or appeal to little if anything more than
intuition. Thus in this paper I consider whether some way might be found to
successfully motivate the principle. And I propose that a compelling way of so
doing is to understand it as a generalisation of Einstein's principle of the
mutually independent existence---the `being-thus'---of spatially distant
things. In particular I first describe an argument, due to Demopoulos, to the
effect that the so-called `no-signalling' condition can be viewed as a
generalisation of Einstein's principle that is appropriate for an irreducibly
statistical theory such as quantum mechanics. I then argue that a compelling
way to motivate information causality is to in turn consider it as a further
generalisation of the Einsteinian principle that is appropriate for a theory of
communication. I describe, however, some important conceptual obstacles that
must yet be overcome if the project of establishing information causality as a
foundational principle of nature is to succeed. Comment: '*' footnote added to page 1; 24 pages, 1 figure; forthcoming in
Studies in History and Philosophy of Modern Physics
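For orientation, the Tsirelson bound mentioned in this abstract is usually stated for the CHSH combination of correlations. The notation below ($E$ for the correlation of two $\pm 1$-valued outcomes, $a, a'$ and $b, b'$ for the two measurement settings on each side) is the standard textbook form and is an assumption here, not taken from the paper.

```latex
\[
  S = E(a,b) + E(a,b') + E(a',b) - E(a',b')
\]
\[
  |S| \le 2 \quad \text{(local hidden-variable theories)}, \qquad
  |S| \le 2\sqrt{2} \quad \text{(quantum mechanics, the Tsirelson bound)}, \qquad
  |S| \le 4 \quad \text{(no-signalling alone)}.
\]
```

The gap between $2\sqrt{2}$ and $4$ is what a further principle such as information causality is invoked to explain.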
Open-ended Learning in Symmetric Zero-sum Games
Zero-sum games such as chess and poker are, abstractly, functions that
evaluate pairs of agents, for example labeling them `winner' and `loser'. If
the game is approximately transitive, then self-play generates sequences of
agents of increasing strength. However, nontransitive games, such as
rock-paper-scissors, can exhibit strategic cycles, and there is no longer a
clear objective -- we want agents to increase in strength, but against whom is
unclear. In this paper, we introduce a geometric framework for formulating
agent objectives in zero-sum games, in order to construct adaptive sequences of
objectives that yield open-ended learning. The framework allows us to reason
about population performance in nontransitive games, and enables the
development of a new algorithm (rectified Nash response, PSRO_rN) that uses
game-theoretic niching to construct diverse populations of effective agents,
producing a stronger set of agents than existing algorithms. We apply PSRO_rN
to two highly nontransitive resource allocation games and find that PSRO_rN
consistently outperforms the existing alternatives. Comment: ICML 2019, final version
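The strategic cycle in rock-paper-scissors that this abstract mentions can be seen directly from the game's antisymmetric payoff matrix. The sketch below is only an illustration of nontransitivity, not an implementation of PSRO_rN. The names `PAYOFF`, `beats`, and `expected_payoff` are hypothetical.

```python
# PAYOFF[i][j] is the row player's payoff for playing i against j.
# The matrix is antisymmetric because the game is symmetric and zero-sum.
ROCK, PAPER, SCISSORS = 0, 1, 2
PAYOFF = [
    [0, -1, 1],   # rock loses to paper, beats scissors
    [1, 0, -1],   # paper beats rock, loses to scissors
    [-1, 1, 0],   # scissors loses to rock, beats paper
]


def beats(i, j):
    """True when pure strategy i wins against pure strategy j."""
    return PAYOFF[i][j] > 0


def expected_payoff(mixture, opponent):
    """Expected payoff of a mixed strategy against a pure opponent."""
    return sum(p * PAYOFF[i][opponent] for i, p in enumerate(mixture))


# Every strategy is beaten by another, so no pure strategy is "strongest".
has_cycle = (
    beats(PAPER, ROCK) and beats(SCISSORS, PAPER) and beats(ROCK, SCISSORS)
)

# The uniform mixture earns zero against each pure strategy, which is why
# it is the Nash equilibrium of this game.
uniform = [1 / 3, 1 / 3, 1 / 3]
values = [expected_payoff(uniform, j) for j in range(3)]
```

Because no single agent dominates, evaluating progress requires a population-level notion of performance, which is the gap the geometric framework in the abstract is meant to address.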