Self-tuning experience weighted attraction learning in games
Self-tuning experience weighted attraction (EWA) is a one-parameter theory of learning in
games. It addresses a criticism that an earlier model (EWA) has too many parameters, by
fixing some parameters at plausible values and replacing others with functions of experience
so that they no longer need to be estimated. Consequently, it is econometrically simpler
than the popular weighted fictitious play and reinforcement learning models.
The functions of experience which replace free parameters “self-tune” over time, adjusting
in a way that selects a sensible learning rule to capture subjects’ choice dynamics. For
instance, the self-tuning EWA model can shift from weighted fictitious play toward averaging
reinforcement learning as subjects equilibrate and learn to ignore inferior foregone
payoffs. The theory was tested on seven different games and compared to the earlier parametric
EWA model and a one-parameter stochastic equilibrium theory (QRE). Self-tuning
EWA does as well as EWA in predicting behavior in new games, even though it has fewer
parameters, and fits reliably better than the QRE equilibrium benchmark.
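To make the structure concrete, here is a minimal sketch of the attraction update that EWA-style models are built on. The parametric form with free parameters phi, delta, and kappa is the original EWA rule; in the self-tuning variant, phi and delta are replaced by functions of observed experience (a change detector and an attention function), leaving the logit precision lambda as the main parameter to estimate. The code below is illustrative only, not the paper's implementation.

```python
import numpy as np

def ewa_update(A, N, phi, delta, kappa, own_strategy, payoffs):
    """One EWA attraction update for a single player.

    A            : current attractions, one per own strategy
    N            : experience weight (scalar)
    phi, delta, kappa : EWA parameters (self-tuning EWA replaces phi and
                   delta with functions of observed experience)
    own_strategy : index of the strategy actually played this period
    payoffs      : realized/foregone payoffs, one per own strategy,
                   evaluated against the opponents' realized choices
    """
    N_new = phi * (1.0 - kappa) * N + 1.0
    chosen = np.zeros(len(A))
    chosen[own_strategy] = 1.0
    # Reinforce the chosen strategy by its realized payoff and the
    # unchosen strategies by delta times their foregone payoffs.
    reinforcement = (delta + (1.0 - delta) * chosen) * payoffs
    A_new = (phi * N * np.asarray(A, dtype=float) + reinforcement) / N_new
    return A_new, N_new

def choice_probabilities(A, lam):
    """Logit response: map attractions to choice probabilities."""
    z = lam * np.asarray(A, dtype=float)
    z -= z.max()                      # numerical stability
    p = np.exp(z)
    return p / p.sum()
```

With delta = 1 the rule reinforces all strategies by their foregone payoffs (fictitious-play-like behavior), while with delta = 0 only the chosen strategy is reinforced (reinforcement-learning-like behavior); the self-tuning functions move between these regimes as play unfolds.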
Equilibrium selection through incomplete information in coordination games: An experimental study
We perform an experiment on a pure coordination game with uncertainty about the payoffs. Our game is closely related to models that have been used in many macroeconomic and financial applications to resolve problems of equilibrium indeterminacy. In our experiment each subject receives a noisy signal about the true payoffs. This game has a unique strategy profile that survives the iterative deletion of strictly dominated strategies (and thus a unique Nash equilibrium). The equilibrium outcome coincides, on average, with the risk-dominant equilibrium outcome of the underlying coordination game. The behavior of the subjects converges to the theoretical prediction after enough experience has been gained. The data (and the subjects' comments) suggest that subjects do not apply the iterated deletion of dominated strategies through a priori reasoning; instead, they adapt to the responses of other players. Thus, the length of the learning phase varies clearly across signals. We also test behavior in a game without uncertainty as a benchmark case. The game with uncertainty is inspired by the "global games" of Carlsson and van Damme (1993).
Keywords: global games, risk dominance, equilibrium selection, common knowledge, Leex
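The risk-dominance criterion referred to above is the standard Harsanyi-Selten one; for a symmetric 2x2 coordination game it reduces to comparing products of deviation losses. The short sketch below illustrates that comparison with made-up payoff numbers; it is not part of the study's design.

```python
def risk_dominant(a, b, c, d):
    """For a symmetric 2x2 coordination game with payoff matrix
           A        B
      A  (a, a)  (b, c)
      B  (c, b)  (d, d)
    and two pure equilibria (A, A) and (B, B), return the risk-dominant one.
    (A, A) risk-dominates (B, B) iff its product of deviation losses is larger:
    (a - c)^2 > (d - b)^2.
    """
    loss_A = (a - c) ** 2   # product of losses from unilateral deviation at (A, A)
    loss_B = (d - b) ** 2
    return "(A, A)" if loss_A > loss_B else "(B, B)"

# Hypothetical payoffs: (A, A) pays 9, (B, B) pays 7, and under miscoordination
# the player choosing A gets 0 while the player choosing B gets 8,
# so B is the safer action even though (A, A) is payoff-dominant.
print(risk_dominant(a=9, b=0, c=8, d=7))   # -> "(B, B)"
```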
Deep Reinforcement Learning from Self-Play in Imperfect-Information Games
Many real-world applications can be described as large-scale games of
imperfect information. To deal with these challenging domains, prior work has
focused on computing Nash equilibria in a handcrafted abstraction of the
domain. In this paper we introduce the first scalable end-to-end approach to
learning approximate Nash equilibria without prior domain knowledge. Our method
combines fictitious self-play with deep reinforcement learning. When applied to
Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium,
whereas common reinforcement learning methods diverged. In Limit Texas Hold'em,
a poker game of real-world scale, NFSP learnt a strategy that approached the
performance of state-of-the-art, superhuman algorithms based on significant
domain expertise.
Comment: updated version, incorporating conference feedback
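The agent structure behind NFSP can be sketched roughly as follows. The snippet is a tabular stand-in, not the paper's implementation: a one-step Q-learning update stands in for the deep (DQN-style) best-response learner, and empirical action counts stand in for the supervised average-policy network; the class name, the anticipatory parameter eta, and all constants are placeholders.

```python
import random
from collections import defaultdict

class NFSPAgentSketch:
    """Tabular analogue of an NFSP agent (illustrative only).

    Two components per agent:
      * q[s][a]      - action values trained by reinforcement learning
                       (an approximate best response to opponents' play),
      * counts[s][a] - average-policy counts trained in a supervised way
                       from the agent's own best-response actions.
    With probability eta the agent acts from the (epsilon-greedy) best
    response, otherwise from its average policy.
    """

    def __init__(self, actions, eta=0.1, epsilon=0.1, alpha=0.1):
        self.actions = list(actions)
        self.eta, self.epsilon, self.alpha = eta, epsilon, alpha
        self.q = defaultdict(lambda: defaultdict(float))
        self.counts = defaultdict(lambda: defaultdict(int))

    def act(self, state):
        if random.random() < self.eta:
            # Best-response mode: epsilon-greedy on Q, and record the
            # chosen action for average-policy (supervised) updates.
            if random.random() < self.epsilon:
                action = random.choice(self.actions)
            else:
                action = max(self.actions, key=lambda a: self.q[state][a])
            self.counts[state][action] += 1
            return action
        # Average-policy mode: sample from empirical action frequencies.
        total = sum(self.counts[state].values())
        if total == 0:
            return random.choice(self.actions)
        r = random.uniform(0, total)
        for a in self.actions:
            r -= self.counts[state][a]
            if r <= 0:
                return a
        return self.actions[-1]

    def update_q(self, state, action, reward, next_state, done):
        # One-step Q-learning update (stands in for the DQN used in NFSP).
        target = reward
        if not done:
            target += max(self.q[next_state][a] for a in self.actions)
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

In the full method the average policy is what converges toward an approximate Nash equilibrium in self-play, while the best-response component keeps adapting to it.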
Learning an Unknown Network State in Routing Games
We study learning dynamics induced by myopic travelers who repeatedly play a
routing game on a transportation network with an unknown state. The state
impacts cost functions of one or more edges of the network. In each stage,
travelers choose their routes according to Wardrop equilibrium based on public
belief of the state. This belief is broadcast by an information system that
observes the edge loads and realized costs on the used edges, and performs a
Bayesian update to the prior stage's belief. We show that the sequence of
public beliefs and edge load vectors generated by the repeated play converges
almost surely. At any rest point, travelers have no incentive to deviate from
their chosen routes and accurately learn the true costs on the used edges.
However, the costs on edges that are not used may not be accurately learned.
Thus, learning can be incomplete in that the edge load vector at a rest point
can differ from that of the complete-information equilibrium. We present
conditions for complete learning and illustrate situations in which such an
outcome is not guaranteed.
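A minimal sketch of the belief update such an information system might perform, assuming a finite set of candidate states and Gaussian observation noise (both simplifying assumptions, not the paper's setting). Because only used edges generate observations, two states that agree on every used edge remain indistinguishable, which is exactly the source of incomplete learning described above.

```python
import numpy as np

def bayes_update(prior, observed_costs, used_edges, predict_cost, noise_sd=1.0):
    """One Bayesian update of the public belief over a finite set of states.

    prior          : probability of each candidate state (1-D array)
    observed_costs : dict edge -> realized cost on that edge this stage
    used_edges     : edges that carried positive load; only their costs are seen
    predict_cost   : function (state_index, edge) -> cost predicted by that state
    noise_sd       : std. dev. of the assumed Gaussian observation noise
    """
    likelihood = np.ones(len(prior))
    for s in range(len(prior)):
        for e in used_edges:
            err = observed_costs[e] - predict_cost(s, e)
            likelihood[s] *= np.exp(-0.5 * (err / noise_sd) ** 2)
    posterior = np.asarray(prior, dtype=float) * likelihood
    return posterior / posterior.sum()

# Toy illustration: two states that agree on edge 0 but differ on edge 1.
costs = {0: {0: 2.0, 1: 5.0},   # state 0: edge -> cost
         1: {0: 2.0, 1: 9.0}}   # state 1
predict = lambda s, e: costs[s][e]

prior = np.array([0.5, 0.5])
# Only edge 0 is used, so only its cost is observed; edge 1 is never sampled.
posterior = bayes_update(prior, observed_costs={0: 2.1}, used_edges=[0],
                         predict_cost=predict)
print(posterior)   # stays at [0.5, 0.5]: the two states are not distinguished
```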