Deep Reinforcement Learning from Self-Play in Imperfect-Information Games
Many real-world applications can be described as large-scale games of
imperfect information. To deal with these challenging domains, prior work has
focused on computing Nash equilibria in a handcrafted abstraction of the
domain. In this paper we introduce the first scalable end-to-end approach to
learning approximate Nash equilibria without prior domain knowledge. Our method
combines fictitious self-play with deep reinforcement learning. When applied to
Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium,
whereas common reinforcement learning methods diverged. In Limit Texas Holdem,
a poker game of real-world scale, NFSP learnt a strategy that approached the
performance of state-of-the-art, superhuman algorithms based on significant
domain expertise.
Comment: updated version, incorporating conference feedback
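As a rough illustration of the fictitious-play idea that NFSP builds on, the sketch below runs classical tabular fictitious play on matching pennies. It is not NFSP and not the authors' implementation: there is no neural network and no reinforcement learning, and the function name `fictitious_play`, the payoff matrix, and the iteration count are all assumptions chosen for the example. Each player repeatedly best-responds to the opponent's empirical action frequencies, and in a two-player zero-sum game those frequencies approach a Nash equilibrium.

```python
from typing import List, Tuple


def fictitious_play(payoff: List[List[float]], iterations: int) -> Tuple[List[float], List[float]]:
    """Classical fictitious play for a two-player zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player receives
    the negative. Returns the empirical action frequencies of both players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Hypothetical initialisation: each player is assumed to have played action 0 once.
    row_counts = [1] + [0] * (n_rows - 1)
    col_counts = [1] + [0] * (n_cols - 1)

    for _ in range(iterations):
        # Value of each row action against the column player's empirical play.
        row_values = [
            sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        # Value of each column action against the row player's empirical play.
        col_values = [
            -sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Both players best-respond simultaneously (ties go to the lowest index).
        best_row = max(range(n_rows), key=row_values.__getitem__)
        best_col = max(range(n_cols), key=col_values.__getitem__)
        row_counts[best_row] += 1
        col_counts[best_col] += 1

    row_total, col_total = sum(row_counts), sum(col_counts)
    return ([c / row_total for c in row_counts],
            [c / col_total for c in col_counts])


# Matching pennies: the unique Nash equilibrium mixes both actions equally.
matching_pennies = [[1.0, -1.0], [-1.0, 1.0]]
row_freq, col_freq = fictitious_play(matching_pennies, 10_000)
print(row_freq, col_freq)
```

The printed frequencies should drift toward an even mix of the two actions, although convergence of tabular fictitious play is slow.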
Traditional Wisdom and Monte Carlo Tree Search Face-to-Face in the Card Game Scopone
We present the design of a competitive artificial intelligence for Scopone, a
popular Italian card game. We compare rule-based players using the most
established strategies (one for beginners and two for advanced players) against
players using Monte Carlo Tree Search (MCTS) and Information Set Monte Carlo
Tree Search (ISMCTS) with different reward functions and simulation strategies.
MCTS requires complete information about the game state and thus implements a
cheating player while ISMCTS can deal with incomplete information and thus
implements a fair player. Our results show that, as expected, the cheating MCTS
outperforms all the other strategies; ISMCTS is stronger than all the
rule-based players implementing well-known and most advanced strategies and it
also turns out to be a challenging opponent for human players.
Comment: Preprint. Accepted for publication in the IEEE Transaction on Game
Solving Games with Functional Regret Estimation
We propose a novel online learning method for minimizing regret in large
extensive-form games. The approach learns a function approximator online to
estimate the regret for choosing a particular action. A no-regret algorithm
uses these estimates in place of the true regrets to define a sequence of
policies.
We prove the approach sound by providing a bound relating the quality of the
function approximation and regret of the algorithm. A corollary is that the
method is guaranteed to converge to a Nash equilibrium in self-play so long as
the regrets are ultimately realizable by the function approximator. Our
technique can be understood as a principled generalization of existing work on
abstraction in large games; in our work, both the abstraction as well as the
equilibrium are learned during self-play. We demonstrate empirically that the method
achieves higher quality strategies than state-of-the-art abstraction techniques
given the same resources.
Comment: AAAI Conference on Artificial Intelligence 201
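To make the phrase "uses these estimates in place of the true regrets" concrete, here is a minimal sketch assuming regret matching as the no-regret rule. The abstract does not name the rule, so that choice is an assumption, as are the names `regret_matching` and `policy_from_estimator` and the toy estimator. The policy puts probability on each action in proportion to its positive estimated regret, and falls back to uniform when no estimate is positive.

```python
from typing import Callable, List, Sequence


def regret_matching(regrets: Sequence[float]) -> List[float]:
    """Turn (estimated) regrets into a policy proportional to positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        # No action has positive regret: play uniformly at random.
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def policy_from_estimator(estimator: Callable[[str], float],
                          actions: Sequence[str]) -> List[float]:
    """Query a regret estimator per action, then apply regret matching.

    `estimator` stands in for a learned function approximator; here it is
    any callable mapping an action name to an estimated regret.
    """
    return regret_matching([estimator(a) for a in actions])


# Toy estimator (hypothetical values), standing in for a learned regressor.
toy_estimates = {"fold": -1.0, "call": 1.0, "raise": 3.0}
print(policy_from_estimator(toy_estimates.get, ["fold", "call", "raise"]))
# -> [0.0, 0.25, 0.75]
```

In a full method the estimator would be retrained online from observed regrets and would take features of the information set as input; the sketch covers only the step from estimates to a policy.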
What differentiates professional poker players from recreational poker players? A qualitative interview study
The popularity of poker, and of online poker in particular, has grown worldwide
in recent years. Factors that may explain this growth include: (i) an
increasing number of celebrities endorsing and playing poker, (ii) an increased
number of televised poker tournaments, (iii) 24/7 access to poker via the
internet, and (iv) the low stakes needed to play online poker. This growth has
led to an increased incidence of the ‘professional poker player’. However, very
little empirical research has been carried out into this relatively new group
of gamblers. This research comprised a grounded theory study involving the
analysis of data from three professional poker players, one semi-professional
poker player and five recreational poker players. Results showed that all
players believed that poker was a game of skill. The central theme
distinguishing professional from recreational players was that professional
poker players were much more disciplined in their gambling behaviour. They
treated their poker playing as work, and as such were more likely to be logical
and controlled in their behaviour, took fewer risks, and were less likely to
chase losses. Recreational players were more likely to engage in chasing
behaviour, showed signs of lack of control, took more risks, and engaged in
gambling while under the influence of alcohol or drugs. Also of importance was
the number of games and time spent playing online. Recreational players only
played one or two games at a time, whereas professional poker players were much
more likely to engage in multi-table poker online, and played longer sessions,
thus increasing the potential amount of winnings. Playing poker for a living is
very possible for a minority of players but it takes a combination of talent,
dedication, patience, discipline and disposition to succeed
- …