
    Self-tuning experience weighted attraction learning in games

    Self-tuning experience weighted attraction (EWA) is a one-parameter theory of learning in games. It addresses the criticism that an earlier model (EWA) has too many parameters by fixing some parameters at plausible values and replacing others with functions of experience, so that they no longer need to be estimated. Consequently, it is econometrically simpler than the popular weighted fictitious play and reinforcement learning models. The functions of experience that replace free parameters "self-tune" over time, adjusting in a way that selects a sensible learning rule to capture subjects' choice dynamics. For instance, the self-tuning EWA model can shift from weighted fictitious play to averaging reinforcement learning as subjects equilibrate and learn to ignore inferior foregone payoffs. The theory was tested on seven different games and compared to the earlier parametric EWA model and a one-parameter stochastic equilibrium theory (QRE). Self-tuning EWA predicts behavior in new games as well as EWA does, even though it has fewer parameters, and fits reliably better than the QRE equilibrium benchmark.
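    As a concrete illustration of the attraction updating that both models share, the Python sketch below implements one round of the parametric EWA update; the function names, default parameter values, and the logit response are illustrative choices for this sketch, not taken from the paper.

    ```python
    import numpy as np

    def ewa_update(attractions, N, chosen, payoffs, phi=0.9, delta=0.5, kappa=0.0):
        """One round of the parametric EWA attraction update for one player.

        attractions -- array of prior attractions A_j(t-1), one per strategy
        N           -- experience weight N(t-1)
        chosen      -- index of the strategy actually played this round
        payoffs     -- realized and foregone payoffs pi(s_j, s_-i(t)) per strategy
        phi, delta, kappa -- decay of past attractions, weight on foregone
                       payoffs, and growth control of the experience weight
        """
        N_new = phi * (1 - kappa) * N + 1
        weight = np.full(len(attractions), float(delta))
        weight[chosen] = 1.0  # the chosen strategy always gets full weight
        new_attractions = (phi * N * attractions + weight * payoffs) / N_new
        return new_attractions, N_new

    def choice_probabilities(attractions, lam=1.0):
        """Logit response: P(j) is proportional to exp(lam * A_j)."""
        z = np.exp(lam * (attractions - attractions.max()))
        return z / z.sum()
    ```

    Roughly, the self-tuning version replaces the fixed phi with a change detector that decays history faster when opponents' behavior shifts, and the fixed delta with an attention function that weights only foregone payoffs at least as good as the realized one.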

    Equilibrium selection through incomplete information in coordination games: An experimental study

    We perform an experiment on a pure coordination game with uncertainty about the payoffs. Our game is closely related to models that have been used in many macroeconomic and financial applications to solve problems of equilibrium indeterminacy. In our experiment each subject receives a noisy signal about the true payoffs. This game has a unique strategy profile that survives the iterative deletion of strictly dominated strategies (thus a unique Nash equilibrium). The equilibrium outcome coincides, on average, with the risk-dominant equilibrium outcome of the underlying coordination game. The behavior of the subjects converges to the theoretical prediction after enough experience has been gained. The data (and the comments) suggest that subjects do not apply the iterated deletion of dominated strategies through "a priori" reasoning; instead, they adapt to the responses of other players. Thus, the length of the learning phase clearly varies across the different signals. We also test behavior in a game without uncertainty as a benchmark case. The game with uncertainty is inspired by the "global" games of Carlsson and van Damme (1993). Keywords: global games, risk dominance, equilibrium selection, common knowledge, Leex.
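    Since the result hinges on the risk-dominant equilibrium of the underlying coordination game, a minimal sketch of that selection criterion (due to Harsanyi and Selten) may help; the payoff labels and the example game below are illustrative, not those used in the experiment.

    ```python
    def risk_dominant(a, b, c, d):
        """For a symmetric 2x2 coordination game with row payoffs
        u(A,A)=a, u(A,B)=b, u(B,A)=c, u(B,B)=d, where a > c and d > b
        (so (A,A) and (B,B) are both strict Nash equilibria), return the
        risk-dominant equilibrium. For symmetric games, comparing the
        products of deviation losses reduces to asking which action is
        the best response to a uniform (50/50) opponent.
        """
        if not (a > c and d > b):
            raise ValueError("not a coordination game with two strict equilibria")
        payoff_A = 0.5 * (a + b)  # expected payoff of A against a 50/50 opponent
        payoff_B = 0.5 * (c + d)
        if payoff_A > payoff_B:
            return "(A,A)"
        if payoff_B > payoff_A:
            return "(B,B)"
        return "tie"

    # Stag-hunt style example: (A,A) is payoff-dominant, yet (B,B) risk-dominates.
    print(risk_dominant(a=9, b=0, c=8, d=7))  # -> (B,B)
    ```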

    Deep Reinforcement Learning from Self-Play in Imperfect-Information Games

    Many real-world applications can be described as large-scale games of imperfect information. To deal with these challenging domains, prior work has focused on computing Nash equilibria in a handcrafted abstraction of the domain. In this paper we introduce the first scalable end-to-end approach to learning approximate Nash equilibria without prior domain knowledge. Our method combines fictitious self-play with deep reinforcement learning. When applied to Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium, whereas common reinforcement learning methods diverged. In Limit Texas Hold'em, a poker game of real-world scale, NFSP learnt a strategy that approached the performance of state-of-the-art, superhuman algorithms based on significant domain expertise. (Comment: updated version, incorporating conference feedback.)
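    A structural sketch of the approach, as described in the abstract: each NFSP agent combines a reinforcement-learned best response with a supervised average policy, mixed through an anticipatory parameter. The class and method names below (greedy_action, sample_action, train_on, and the two buffers) are placeholders for this sketch, not an actual library API.

    ```python
    import random

    class NFSPAgent:
        """Skeleton of a Neural Fictitious Self-Play agent."""

        def __init__(self, q_net, avg_net, rl_memory, sl_memory, eta=0.1):
            self.q_net = q_net          # approximate best response (e.g. a DQN)
            self.avg_net = avg_net      # supervised average-policy network
            self.rl_memory = rl_memory  # circular buffer of transitions
            self.sl_memory = sl_memory  # reservoir buffer of (state, action)
            self.eta = eta              # anticipatory parameter

        def act(self, state):
            if random.random() < self.eta:
                # Play the (epsilon-greedy) best response and record the
                # action, so the average network learns the agent's own
                # best-response behaviour.
                action = self.q_net.greedy_action(state)
                self.sl_memory.add(state, action)
            else:
                action = self.avg_net.sample_action(state)
            return action

        def learn(self):
            # RL step: fit the best response to experience gathered against
            # the other agents' average strategies.
            self.q_net.train_on(self.rl_memory.sample_batch())
            # SL step: fit the average policy to the agent's own past
            # best-response actions (a classification loss).
            self.avg_net.train_on(self.sl_memory.sample_batch())
    ```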

    Learning an Unknown Network State in Routing Games

    We study learning dynamics induced by myopic travelers who repeatedly play a routing game on a transportation network with an unknown state. The state impacts the cost functions of one or more edges of the network. In each stage, travelers choose their routes according to a Wardrop equilibrium based on the public belief about the state. This belief is broadcast by an information system that observes the edge loads and realized costs on the used edges, and performs a Bayesian update of the prior stage's belief. We show that the sequence of public beliefs and edge load vectors generated by the repeated play converges almost surely. At any rest point, travelers have no incentive to deviate from the chosen routes and accurately learn the true costs on the used edges. However, the costs on edges that are not used may not be accurately learned. Thus, learning can be incomplete, in that the edge load vectors at a rest point and the complete-information equilibrium can differ. We present some conditions for complete learning and illustrate situations in which such an outcome is not guaranteed.
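    A minimal sketch of the information system's belief update, under the assumption of a finite state set and additive observation noise (all names below are illustrative): only edges that carry load enter the likelihood, which is exactly why beliefs about unused edges need not sharpen and learning can remain incomplete.

    ```python
    import numpy as np

    def public_belief_update(prior, observed_costs, used_edges, predicted_cost, noise_pdf):
        """One-stage Bayesian update of the public belief over finitely
        many network states.

        prior          -- array, prior probability of each state theta
        observed_costs -- dict: edge -> realized cost on that edge this stage
        used_edges     -- edges with positive load (the only ones observed)
        predicted_cost -- predicted_cost(theta, edge) -> cost implied by
                          state theta at the current edge loads
        noise_pdf      -- density of the observation noise around the prediction
        """
        posterior = prior.astype(float).copy()
        for theta in range(len(prior)):
            for edge in used_edges:
                # Unused edges contribute no likelihood term, so states that
                # differ only on unused edges remain indistinguishable.
                residual = observed_costs[edge] - predicted_cost(theta, edge)
                posterior[theta] *= noise_pdf(residual)
        total = posterior.sum()
        return posterior / total if total > 0 else prior
    ```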