    Deep Reinforcement Learning from Self-Play in Imperfect-Information Games

    Many real-world applications can be described as large-scale games of imperfect information. To deal with these challenging domains, prior work has focused on computing Nash equilibria in a handcrafted abstraction of the domain. In this paper we introduce the first scalable end-to-end approach to learning approximate Nash equilibria without prior domain knowledge. Our method combines fictitious self-play with deep reinforcement learning. When applied to Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium, whereas common reinforcement learning methods diverged. In Limit Texas Hold'em, a poker game of real-world scale, NFSP learnt a strategy that approached the performance of state-of-the-art, superhuman algorithms based on significant domain expertise. Comment: updated version, incorporating conference feedback
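    To make the "fictitious play" half of NFSP concrete, here is a minimal sketch of classical tabular fictitious play on an assumed toy zero-sum matrix game (rock-paper-scissors). Each player best-responds to the opponent's empirical action frequencies, and in two-player zero-sum games those empirical frequencies converge to a Nash equilibrium (Robinson, 1951). NFSP replaces the exact best response with a reinforcement-learning agent and the exact averaging with a supervised network; neither is shown here, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def fictitious_play(payoff: Matrix, iterations: int) -> Tuple[List[float], List[float]]:
    """Classical fictitious play in a two-player zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player receives its negative.
    Each round both players best-respond to the opponent's empirical action counts.
    Returns the empirical frequencies (the players' average strategies).
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    for _ in range(iterations):
        # Unnormalised expected payoff of each pure action against the opponent's history.
        row_values = [sum(payoff[i][j] * col_counts[j] for j in range(n_cols)) for i in range(n_rows)]
        col_values = [-sum(payoff[i][j] * row_counts[i] for i in range(n_rows)) for j in range(n_cols)]
        # Simultaneous best responses; max() breaks ties toward the lowest index.
        best_row = max(range(n_rows), key=row_values.__getitem__)
        best_col = max(range(n_cols), key=col_values.__getitem__)
        row_counts[best_row] += 1
        col_counts[best_col] += 1
    return [c / iterations for c in row_counts], [c / iterations for c in col_counts]


def exploitability(payoff: Matrix, x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of both players' gains from best-responding; zero exactly at a Nash equilibrium."""
    best_row_payoff = max(sum(p * yj for p, yj in zip(row, y)) for row in payoff)
    best_col_payoff = min(
        sum(payoff[i][j] * x[i] for i in range(len(payoff))) for j in range(len(payoff[0]))
    )
    return best_row_payoff - best_col_payoff


rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # rows/columns: rock, paper, scissors
x, y = fictitious_play(rps, 10_000)
print([round(p, 3) for p in x], round(exploitability(rps, x, y), 4))
```

    The averages, not the per-round actions, are what approach equilibrium: the per-round best responses keep cycling, which is why NFSP maintains a separate average-policy network.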

    The learnability criterion and monetary policy

    Expectations of the future play a large role in macroeconomics. The rational expectations assumption, which is commonly used in the literature, provides an important benchmark, but may be too strong for some applications. This paper reviews some recent research that has emphasized methods for analyzing models of learning, in which expectations are not initially rational but may eventually become rational, provided certain conditions are met. Many of the applications are in the context of popular models of monetary policy. The goal of the paper is to provide a largely nontechnical survey of some, but not all, of this work and to point out connections to some related research. Keywords: Monetary policy; Rational expectations (Economic theory)
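    The survey itself stays nontechnical, so the following is only an illustrative sketch of the kind of learnability (E-stability) result it discusses, using an assumed textbook cobweb-style model rather than any monetary-policy model from the paper. Agents forecast the price by the running sample mean of observed prices; when the expectational feedback alpha is below 1 the forecast converges to the rational-expectations value mu / (1 - alpha), and when alpha exceeds 1 it drifts away. The function name, parameters, and numbers are hypothetical.

```python
import random


def learn_cobweb(mu: float, alpha: float, periods: int, noise_sd: float = 0.5,
                 initial_belief: float = 0.0, seed: int = 0) -> float:
    """Decreasing-gain (recursive mean) learning in an assumed cobweb-style model.

    Actual price: p_t = mu + alpha * belief_{t-1} + eps_t, with eps_t ~ N(0, noise_sd^2).
    The rational-expectations equilibrium (REE) forecast is mu / (1 - alpha).
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    belief = initial_belief
    for t in range(1, periods + 1):
        price = mu + alpha * belief + rng.gauss(0.0, noise_sd)
        belief += (price - belief) / t  # gain 1/t: belief becomes the mean of prices seen so far
    return belief


for alpha in (0.2, 1.5):  # E-stable (alpha < 1) versus E-unstable (alpha > 1)
    print(alpha, round(learn_cobweb(2.0, alpha, 20_000), 3), "REE:", 2.0 / (1 - alpha))
```

    The condition alpha < 1 is the E-stability condition for this toy model: it is what makes the learning map pull beliefs toward, rather than away from, the rational-expectations value.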

    Real and Complex Monotone Communication Games

    Noncooperative game-theoretic tools have been increasingly used to study many important resource allocation problems in communications, networking, smart grids, and portfolio optimization. In this paper, we consider a general class of convex Nash Equilibrium Problems (NEPs), where each player aims to solve an arbitrary smooth convex optimization problem. Unlike most current works, we do not assume any specific structure for the players' problems, and we allow the optimization variables of the players to be matrices in the complex domain. Our main contribution is the design of a novel class of distributed (asynchronous) best-response algorithms suitable for solving the proposed NEPs, even in the presence of multiple solutions. The new methods, whose convergence analysis is based on Variational Inequality (VI) techniques, can select, among all the equilibria of a game, those that optimize a given performance criterion, at the cost of limited signaling among the players. This is a major departure from existing best-response algorithms, whose convergence conditions imply the uniqueness of the NE. Some of our results hinge on the use of VI problems directly in the complex domain; the study of this new kind of VI also represents a noteworthy, innovative contribution. We then apply the developed methods to solve some new generalizations of SISO and MIMO games in cognitive radios and femtocell systems, showing a considerable performance improvement over classical, purely noncooperative schemes. Comment: to appear in IEEE Transactions on Information Theory
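    As a point of reference for the best-response idea, here is a minimal sketch of classical simultaneous iterative water-filling for a real-valued SISO frequency-selective interference game. This is a swapped-in standard scheme, not the paper's VI-based algorithms, which handle complex matrix variables and select among multiple equilibria. Each user water-fills its power budget over its effective noise plus interference; with the weak, hypothetical cross gains below (0.2), the joint update is a contraction and converges to the unique Nash equilibrium. All names, the per-pair scalar cross gains, and the numbers are assumptions for illustration.

```python
from typing import List, Sequence

Grid = Sequence[Sequence[float]]


def water_fill(noise: Sequence[float], budget: float) -> List[float]:
    """Maximise sum_k log(1 + p_k / noise_k) s.t. sum_k p_k = budget, p_k >= 0.

    The optimum is p_k = max(0, level - noise_k); the water level is found by bisection.
    """
    low, high = 0.0, budget + max(noise)  # total power is <= budget at low, >= budget at high
    for _ in range(100):
        level = 0.5 * (low + high)
        if sum(max(0.0, level - n) for n in noise) > budget:
            high = level
        else:
            low = level
    level = 0.5 * (low + high)
    return [max(0.0, level - n) for n in noise]


def effective_noise(direct_noise: Grid, cross_gain: Grid, powers: Grid, i: int) -> List[float]:
    """Noise plus interference seen by user i on each channel (normalised by its direct gain)."""
    return [
        direct_noise[i][k] + sum(cross_gain[i][j] * powers[j][k] for j in range(len(powers)) if j != i)
        for k in range(len(direct_noise[i]))
    ]


def iterative_water_filling(direct_noise: Grid, cross_gain: Grid, budgets: Sequence[float],
                            sweeps: int = 100) -> List[List[float]]:
    """Simultaneous (Jacobi) best responses: every user water-fills against last sweep's powers."""
    powers = [[0.0] * len(direct_noise[i]) for i in range(len(budgets))]
    for _ in range(sweeps):
        # The comprehension reads the old `powers` for every user before reassigning it.
        powers = [
            water_fill(effective_noise(direct_noise, cross_gain, powers, i), budgets[i])
            for i in range(len(budgets))
        ]
    return powers


def best_response_gap(direct_noise: Grid, cross_gain: Grid, budgets: Sequence[float],
                      powers: Grid) -> float:
    """Largest change any single user would make by water-filling once more; 0 at a Nash equilibrium."""
    return max(
        abs(a - b)
        for i in range(len(budgets))
        for a, b in zip(water_fill(effective_noise(direct_noise, cross_gain, powers, i), budgets[i]),
                        powers[i])
    )


noise = [[1.0, 2.0, 0.5, 3.0], [2.0, 0.5, 1.0, 1.0]]  # hypothetical normalised noise levels
gains = [[0.0, 0.2], [0.2, 0.0]]                      # hypothetical weak cross-interference
p = iterative_water_filling(noise, gains, [4.0, 3.0])
print([[round(v, 3) for v in row] for row in p], best_response_gap(noise, gains, [4.0, 3.0], p))
```

    The Jacobi variant mirrors the distributed setting, since each user needs only its own measured interference. Its convergence here relies on weak coupling, a condition that also makes the equilibrium unique; that is the kind of restriction the abstract says its VI-based methods avoid.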

    On Similarities between Inference in Game Theory and Machine Learning

    In this paper, we elucidate the equivalence between inference in game theory and machine learning. Our aim in so doing is to establish an equivalent vocabulary between the two domains so as to facilitate developments at the intersection of both fields, and as proof of the usefulness of this approach, we use recent developments in each field to make useful improvements to the other. More specifically, we consider the analogies between smooth best responses in fictitious play and Bayesian inference methods. Initially, we use these insights to develop and demonstrate an improved algorithm for learning in games based on probabilistic moderation. That is, by integrating over the distribution of opponent strategies (a Bayesian approach within machine learning) rather than taking a simple empirical average (the approach used in standard fictitious play), we derive a novel moderated fictitious play algorithm and show that it is more likely than standard fictitious play to converge to a payoff-dominant but risk-dominated Nash equilibrium in a simple coordination game. Furthermore, we consider the converse case, and show how insights from game theory can be used to derive two improved mean field variational learning algorithms. We first show that the standard update rule of mean field variational learning is analogous to a Cournot adjustment within game theory. By analogy with fictitious play, we then suggest an improved update rule, and show that this results in fictitious variational play, an improved mean field variational learning algorithm that exhibits better convergence in highly or strongly connected graphical models. Second, we use a recent advance in fictitious play, namely dynamic fictitious play, to derive a derivative action variational learning algorithm that exhibits superior convergence properties on a canonical machine learning problem (clustering a mixture distribution).
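    The Cournot versus fictitious-play contrast for mean field updates can be sketched on an assumed two-node model over ±1 spins. This is a toy setup, not the paper's clustering experiment or its exact fictitious variational play rule: the standard synchronous update jumps straight to each node's best response (a Cournot adjustment), while the fictitious-play-style variant keeps a running average of the initial guess and all past best responses. With a strong, hypothetical antiferromagnetic coupling (J = -2), the synchronous update falls into a period-two oscillation, while the averaged update settles on a fixed point of the mean-field equations. Function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def mean_field_best_response(coupling: Matrix, field: Sequence[float], m: Sequence[float]) -> List[float]:
    """Mean-field map for a pairwise model over +/-1 spins: m_i <- tanh(h_i + sum_j J_ij m_j)."""
    n = len(m)
    return [math.tanh(field[i] + sum(coupling[i][j] * m[j] for j in range(n) if j != i))
            for i in range(n)]


def cournot_updates(coupling: Matrix, field: Sequence[float], m0: Sequence[float],
                    iterations: int) -> List[float]:
    """Standard synchronous update: replace every magnetisation by its best response."""
    m = list(m0)
    for _ in range(iterations):
        m = mean_field_best_response(coupling, field, m)
    return m


def fictitious_updates(coupling: Matrix, field: Sequence[float], m0: Sequence[float],
                       iterations: int) -> List[float]:
    """Fictitious-play-style update: m is the running average of m0 and past best responses."""
    m = list(m0)
    for t in range(1, iterations + 1):
        br = mean_field_best_response(coupling, field, m)
        m = [mi + (bi - mi) / (t + 1) for mi, bi in zip(m, br)]  # step size 1/(t+1)
    return m


def mean_field_residual(coupling: Matrix, field: Sequence[float], m: Sequence[float]) -> float:
    """Largest violation of m = tanh(...); zero at a fixed point of the mean-field equations."""
    return max(abs(a - b) for a, b in zip(m, mean_field_best_response(coupling, field, m)))


J = [[0.0, -2.0], [-2.0, 0.0]]  # hypothetical strong antiferromagnetic coupling
h = [0.0, 0.0]
for update in (cournot_updates, fictitious_updates):
    m = update(J, h, [0.5, 0.5], 200)
    print(update.__name__, [round(v, 4) for v in m], mean_field_residual(J, h, m))
```

    Averaging damps the overshoot that makes the synchronous rule oscillate, at the price of step sizes that shrink over time.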