Variance Reduction in Monte Carlo Counterfactual Regret Minimization (VR-MCCFR) for Extensive Form Games using Baselines
Learning strategies for imperfect information games from samples of
interaction is a challenging problem. A common method for this setting, Monte
Carlo Counterfactual Regret Minimization (MCCFR), can have slow long-term
convergence rates due to high variance. In this paper, we introduce a variance
reduction technique (VR-MCCFR) that applies to any sampling variant of MCCFR.
Using this technique, per-iteration estimated values and updates are
reformulated as a function of sampled values and state-action baselines,
similar to their use in policy gradient reinforcement learning. The new
formulation allows estimates to be bootstrapped from other estimates within the
same episode, propagating the benefits of baselines along the sampled
trajectory; the estimates remain unbiased even when bootstrapping from other
estimates. Finally, we show that given a perfect baseline, the variance of the
value estimates can be reduced to zero. Experimental evaluation shows that
VR-MCCFR brings an order of magnitude speedup, while the empirical variance
decreases by three orders of magnitude. The decreased variance allows CFR+ to
be used with sampling for the first time, increasing the speedup to two orders
of magnitude.
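The baseline idea in the abstract above can be illustrated at a single decision point. The following is a minimal sketch of a control-variate estimate, not the paper's full VR-MCCFR algorithm: one action is sampled, and each action's value estimate is its baseline plus an importance-weighted correction for the sampled action. The function names, the action values, and the sampling probabilities are all assumptions chosen for illustration.

```python
import random
import statistics


def baseline_corrected_estimates(values, baseline, sample_probs, rng):
    """Sample one action and return a value estimate for every action.

    estimate(a) = baseline(a) + 1[a sampled] * (value(a) - baseline(a)) / q(a)
    Taking the expectation over the sampled action gives value(a), so the
    estimate is unbiased for any baseline.
    """
    actions = list(range(len(values)))
    sampled = rng.choices(actions, weights=sample_probs, k=1)[0]
    estimates = []
    for a in actions:
        est = baseline[a]
        if a == sampled:
            # Importance-weighted correction, applied only to the sampled action.
            est += (values[a] - baseline[a]) / sample_probs[a]
        estimates.append(est)
    return estimates


def summarize(values, baseline, sample_probs, n=20000, seed=0):
    """Return the empirical mean and variance of the estimate for each action."""
    rng = random.Random(seed)
    draws = [
        baseline_corrected_estimates(values, baseline, sample_probs, rng)
        for _ in range(n)
    ]
    means = [statistics.fmean([d[a] for d in draws]) for a in range(len(values))]
    variances = [
        statistics.pvariance([d[a] for d in draws]) for a in range(len(values))
    ]
    return means, variances


if __name__ == "__main__":
    values = [1.0, -2.0, 0.5]   # hypothetical true action values
    probs = [0.5, 0.3, 0.2]     # hypothetical sampling distribution
    print("zero baseline:   ", summarize(values, [0.0, 0.0, 0.0], probs))
    print("perfect baseline:", summarize(values, values, probs))
```

With a zero baseline this reduces to plain importance sampling. With a baseline equal to the true values, the correction term vanishes and every draw returns the true value, which is the single-decision analogue of the zero-variance claim in the abstract.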
Estimation of Finite Sequential Games
I study the estimation of finite sequential games with perfect information. The major challenge in estimation is computation of high-dimensional truncated integration whose domain is complicated by strategic interaction. I show that this complication resolves when unobserved off-the-equilibrium-path strategies are controlled for. Separately evaluating the likelihood contribution of each subgame perfect strategy profile that rationalizes the observed outcome allows the use of the GHK simulator, the most widely used importance-sampling probit simulator. Monte Carlo experiments demonstrate the performance and robustness of the proposed method, and confirm that misspecification of the decision order leads to underestimation of strategic effect.
Keywords: Inference In Discrete Games; Sequential Games; Monte Carlo Integration; GHK Simulator; Subgame Perfection; Perfect Information
Predicting Human Cooperation
The Prisoner's Dilemma has been a subject of extensive research due to its
importance in understanding the ever-present tension between individual
self-interest and social benefit. A strictly dominant strategy in a Prisoner's
Dilemma (defection), when played by both players, is mutually harmful.
Repetition of the Prisoner's Dilemma can give rise to cooperation as an
equilibrium, but defection is as well, and this ambiguity is difficult to
resolve. The numerous behavioral experiments investigating the Prisoner's
Dilemma highlight that players often cooperate, but the level of cooperation
varies significantly with the specifics of the experimental predicament. We
present the first computational model of human behavior in repeated Prisoner's
Dilemma games that unifies the diversity of experimental observations in a
systematic and quantitatively reliable manner. Our model relies on data we
integrated from many experiments, comprising 168,386 individual decisions. The
computational model is composed of two pieces: the first predicts the
first-period action using solely the structural game parameters, while the
second predicts dynamic actions using both game parameters and history of play.
Our model is extremely successful not merely at fitting the data, but also at
predicting behavior at multiple scales in experimental designs not used for
calibration, using only information about the game structure. We demonstrate
the power of our approach through a simulation analysis revealing how to best
promote human cooperation.
Comment: Added references. New inline citation style. Added small portions of
text. Re-compiled Rmarkdown file with updated ggplot2 so small aesthetic
changes to plot.
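The abstract above states that defection is strictly dominant in the one-shot Prisoner's Dilemma yet mutually harmful when both players choose it. A minimal sketch of that check follows. The payoff numbers are hypothetical, chosen only to satisfy the standard ordering T > R > P > S, and the helper names are illustrative rather than taken from the paper.

```python
# Hypothetical payoffs to the row player, ordered T > R > P > S:
# T = temptation, R = reward, P = punishment, S = sucker's payoff.
T, R, P, S = 5, 3, 1, 0

# Keys are (own action, opponent action); "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): R,
    ("C", "D"): S,
    ("D", "C"): T,
    ("D", "D"): P,
}


def strictly_dominates(a, b, payoff=PAYOFF):
    """True if action a pays strictly more than b against every opponent action."""
    return all(payoff[(a, other)] > payoff[(b, other)] for other in ("C", "D"))


def mutually_harmful(payoff=PAYOFF):
    """True if mutual defection leaves each player worse off than mutual cooperation."""
    return payoff[("D", "D")] < payoff[("C", "C")]


if __name__ == "__main__":
    print("D strictly dominates C:", strictly_dominates("D", "C"))
    print("Mutual defection is worse than mutual cooperation:", mutually_harmful())
```

Both checks hold for any payoffs with T > R > P > S, which is the tension between individual incentive and joint outcome that the abstract describes.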