Deep Reinforcement Learning from Self-Play in Imperfect-Information Games
Many real-world applications can be described as large-scale games of
imperfect information. To deal with these challenging domains, prior work has
focused on computing Nash equilibria in a handcrafted abstraction of the
domain. In this paper we introduce the first scalable end-to-end approach to
learning approximate Nash equilibria without prior domain knowledge. Our method
combines fictitious self-play with deep reinforcement learning. When applied to
Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium,
whereas common reinforcement learning methods diverged. In Limit Texas Hold'em,
a poker game of real-world scale, NFSP learnt a strategy that approached the
performance of state-of-the-art, superhuman algorithms based on significant
domain expertise.
Comment: updated version, incorporating conference feedback
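The core NFSP idea can be sketched in a few lines: each agent maintains a reinforcement-learned best-response policy and a supervised average policy that imitates its own past best responses, and plays a mixture of the two. This is a minimal tabular sketch on rock-paper-scissors; the toy game, the parameter names, and the use of Q-learning in place of deep networks are illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch of the Neural Fictitious Self-Play (NFSP) idea, with
# tabular Q-learning standing in for the deep networks. The game
# (rock-paper-scissors) and all hyperparameters are illustrative.
import random

ACTIONS = 3  # rock, paper, scissors
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # row player's payoff

class NFSPAgent:
    def __init__(self, eta=0.1, alpha=0.1, eps=0.1):
        self.q = [0.0] * ACTIONS       # best-response values (the RL part)
        self.counts = [1.0] * ACTIONS  # average-policy counts (the supervised part)
        self.eta, self.alpha, self.eps = eta, alpha, eps

    def avg_policy(self):
        total = sum(self.counts)
        return [c / total for c in self.counts]

    def act(self):
        if random.random() < self.eta:        # play the (eps-greedy) best response
            if random.random() < self.eps:
                a = random.randrange(ACTIONS)
            else:
                a = max(range(ACTIONS), key=lambda i: self.q[i])
            self.counts[a] += 1               # average policy imitates best response
            return a
        p, r, acc = self.avg_policy(), random.random(), 0.0
        for a, pa in enumerate(p):            # otherwise sample the average policy
            acc += pa
            if r < acc:
                return a
        return ACTIONS - 1

    def learn(self, a, reward):
        self.q[a] += self.alpha * (reward - self.q[a])

a1, a2 = NFSPAgent(), NFSPAgent()
for _ in range(20000):
    x, y = a1.act(), a2.act()
    a1.learn(x, PAYOFF[x][y])
    a2.learn(y, -PAYOFF[x][y])

print([round(p, 2) for p in a1.avg_policy()])  # average policy drifts toward uniform
```

The average policy is the object that approximates a Nash equilibrium; the best-response policy alone would cycle, which is the divergence the abstract attributes to plain reinforcement learning.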
The learnability criterion and monetary policy
Expectations of the future play a large role in macroeconomics. The rational expectations assumption, which is commonly used in the literature, provides an important benchmark, but may be too strong for some applications. This paper reviews some recent research that has emphasized methods for analyzing models of learning, in which expectations are not initially rational but may become rational eventually, provided certain conditions are met. Many of the applications are in the context of popular models of monetary policy. The goal of the paper is to provide a largely nontechnical survey of some, but not all, of this work and to point out connections to some related research.
Real and Complex Monotone Communication Games
Noncooperative game-theoretic tools have been increasingly used to study many
important resource allocation problems in communications, networking, smart
grids, and portfolio optimization. In this paper, we consider a general class
of convex Nash Equilibrium Problems (NEPs), where each player aims to solve an
arbitrary smooth convex optimization problem. Unlike most existing
works, we do not assume any specific structure for the players' problems, and
we allow the optimization variables of the players to be matrices in the
complex domain. Our main contribution is the design of a novel class of
distributed (asynchronous) best-response algorithms suitable for solving the
proposed NEPs, even in the presence of multiple solutions. The new methods,
whose convergence analysis is based on Variational Inequality (VI) techniques,
can select, among all the equilibria of a game, those that optimize a given
performance criterion, at the cost of limited signaling among the players. This
is a major departure from existing best-response algorithms, whose convergence
conditions imply the uniqueness of the NE. Some of our results hinge on the use
of VI problems directly in the complex domain; the study of this new kind of
VI also represents a noteworthy contribution in its own right. We then apply the
developed methods to solve some new generalizations of SISO and MIMO games in
cognitive radios and femtocell systems, showing a considerable performance
improvement over classical pure noncooperative schemes.Comment: to appear on IEEE Transactions in Information Theor
On Similarities between Inference in Game Theory and Machine Learning
In this paper, we elucidate the equivalence between inference in game theory and machine learning. Our aim in doing so is to establish an equivalent vocabulary between the two domains, so as to facilitate developments at the intersection of both fields; as proof of the usefulness of this approach, we use recent developments in each field to make improvements to the other. More specifically, we consider the analogies between smooth best responses in fictitious play and Bayesian inference methods. Initially, we use these insights to develop and demonstrate an improved algorithm for learning in games based on probabilistic moderation. That is, by integrating over the distribution of opponent strategies (a Bayesian approach within machine learning) rather than taking a simple empirical average (the approach used in standard fictitious play), we derive a novel moderated fictitious play algorithm and show that it is more likely than standard fictitious play to converge to a payoff-dominant but risk-dominated Nash equilibrium in a simple coordination game. Furthermore, we consider the converse case, and show how insights from game theory can be used to derive two improved mean field variational learning algorithms. We first show that the standard update rule of mean field variational learning is analogous to a Cournot adjustment within game theory. By analogy with fictitious play, we then suggest an improved update rule, and show that this results in fictitious variational play, an improved mean field variational learning algorithm that exhibits better convergence in highly or strongly connected graphical models. Second, we use a recent advance in fictitious play, namely dynamic fictitious play, to derive a derivative action variational learning algorithm that exhibits superior convergence properties on a canonical machine learning problem (clustering a mixture distribution).
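The baseline shared by both fields is smooth fictitious play: each player best-responds, through a logit ("smooth") map, to the empirical average of the opponent's past actions. Here is a minimal sketch on a stag-hunt coordination game of the kind the abstract alludes to; the payoff matrix and the temperature are illustrative assumptions, and the paper's moderated variant would replace the empirical average with integration over a posterior on opponent strategies.

```python
# Minimal sketch of smooth (logit) fictitious play in a symmetric 2x2
# stag-hunt coordination game. Payoffs and temperature are illustrative:
# (stag, stag) is payoff-dominant, (hare, hare) is risk-dominant.
import math
import random

PAYOFF = [[4.0, 0.0], [3.0, 3.0]]   # rows/cols: action 0 = stag, 1 = hare
TEMP = 0.5                           # logit smoothing temperature

def smooth_br(opp_counts):
    total = sum(opp_counts)
    belief = [c / total for c in opp_counts]        # empirical opponent mix
    values = [sum(PAYOFF[a][o] * belief[o] for o in range(2)) for a in range(2)]
    exps = [math.exp(v / TEMP) for v in values]
    z = sum(exps)
    return [e / z for e in exps]                    # logit ("smooth") best response

def sample(p):
    return 0 if random.random() < p[0] else 1

# counts[i] holds player i's observed counts of the *other* player's actions
counts = [[1.0, 1.0], [1.0, 1.0]]
for _ in range(5000):
    p0, p1 = smooth_br(counts[0]), smooth_br(counts[1])
    a0, a1 = sample(p0), sample(p1)
    counts[0][a1] += 1
    counts[1][a0] += 1

print(smooth_br(counts[0]))  # player 0's limiting smooth best response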