Simplified three player Kuhn poker
We study a very small three player poker game (one-third street Kuhn poker),
and a simplified version of the game that is interesting because it has three
distinct equilibrium solutions. For one-third street Kuhn poker, we are able to
find all of the equilibrium solutions analytically. For large enough pot size,
there is a degree of freedom in the solution that allows one player to
transfer profit between the other two players without changing their own
profit. This has potentially interesting consequences in repeated play of the
game. We also show that a simplified version of the game has one equilibrium
solution in one parameter range and three distinct equilibrium solutions in
another. This may be the simplest
non-trivial multiplayer poker game with more than one distinct equilibrium
solution and provides us with a test case for theories of dynamic strategy
adjustment over multiple realisations of the game.
We then study a third order system of ordinary differential equations that
models the dynamics of three players who try to maximise their expectation by
continuously varying their betting frequencies. We find that the dynamics of
this system are oscillatory, with two distinct types of solution. We then study
a difference equation model, based on repeated play of the game, in which each
player continually updates their estimates of the other players' betting
frequencies. We find that the dynamics are noisy but essentially oscillatory for
short enough estimation periods and slow enough frequency adjustments, and that
the dynamics can be very different for other parameter values.
Comment: 41 pages, 2 Tables, 17 Figures
Safe Opponent Exploitation For Epsilon Equilibrium Strategies
In safe opponent exploitation, players hope to exploit their opponents'
potentially sub-optimal strategies while guaranteeing at least the value of the
game in expectation for themselves. Safe opponent exploitation algorithms have
been successfully applied to small instances of two-player zero-sum imperfect
information games, where Nash equilibrium strategies are typically known in
advance. Current methods available to compute these strategies are, however, not
scalable to desirable large domains of imperfect information such as No-Limit
Texas Hold 'em (NLHE) poker, where successful agents rely on game abstractions
in order to compute an equilibrium strategy approximation. This paper will
extend the concept of safe opponent exploitation by introducing prime-safe
opponent exploitation, in which we redefine the value of the game of a player
to be the worst-case payoff their strategy could be susceptible to. This allows
weaker epsilon equilibrium strategies to benefit from utilising a form of
opponent exploitation with our revised value of the game, still allowing for a
practical game-theoretical guaranteed lower-bound. We demonstrate the empirical
advantages of our generalisation when applied to the main safe opponent
exploitation algorithms.
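As a rough illustration of the "worst-case payoff" idea in the abstract above, the following minimal sketch computes the lowest expected payoff that a fixed mixed strategy can be held to. It assumes a small zero-sum matrix game (rock-paper-scissors) as a stand-in for the imperfect-information games the paper addresses; the function name `worst_case_payoff` and the example strategies are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: a zero-sum matrix game stands in for the
# extensive-form games discussed in the abstract. All names are hypothetical.

def worst_case_payoff(payoffs, strategy):
    """Return the lowest expected payoff that the row player's mixed
    strategy can be held to by any pure opponent response."""
    n_cols = len(payoffs[0])
    column_values = [
        sum(p * row[j] for p, row in zip(strategy, payoffs))
        for j in range(n_cols)
    ]
    return min(column_values)


# Rock-paper-scissors payoffs for the row player (assumed example game).
RPS = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

uniform = [1 / 3, 1 / 3, 1 / 3]  # exact equilibrium strategy; game value is 0
skewed = [0.5, 0.25, 0.25]       # approximate strategy that over-plays rock

print(worst_case_payoff(RPS, uniform))
print(worst_case_payoff(RPS, skewed))
```

In this toy game the uniform strategy secures the game value of 0, while the skewed strategy can only secure -0.25. As we read the abstract, prime-safe exploitation would take that lower figure as the guaranteed baseline for a player who holds only an approximate equilibrium strategy.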
Autonomous Agents Modelling Other Agents: A Comprehensive Survey and Open Problems
Much research in artificial intelligence is concerned with the development of
autonomous agents that can interact effectively with other agents. An important
aspect of such agents is the ability to reason about the behaviours of other
agents, by constructing models which make predictions about various properties
of interest (such as actions, goals, beliefs) of the modelled agents. A variety
of modelling approaches now exist which vary widely in their methodology and
underlying assumptions, catering to the needs of the different sub-communities
within which they were developed and reflecting the different practical uses
for which they are intended. The purpose of the present article is to provide a
comprehensive survey of the salient modelling methods which can be found in the
literature. The article concludes with a discussion of open problems which may
form the basis for fruitful future research.
Comment: Final manuscript (46 pages), published in Artificial Intelligence
Journal. The arXiv version also contains a table of contents after the
abstract, but is otherwise identical to the AIJ version. Keywords: autonomous
agents, multiagent systems, modelling other agents, opponent modelling
Reinforcement Learning from Self-Play in Imperfect-Information Games
This thesis investigates artificial agents learning to make strategic decisions in imperfect-information games. In particular, we introduce a novel approach to reinforcement learning from self-play. We introduce Smooth UCT, which combines the game-theoretic notion of fictitious play with Monte Carlo Tree Search (MCTS). Smooth UCT outperformed a classic MCTS method in several imperfect-information poker games and won three silver medals in the 2014 Annual Computer Poker Competition. We develop Extensive-Form Fictitious Play (XFP), which is implemented entirely in sequential strategies, thus extending this prominent game-theoretic model of learning to sequential games. XFP provides a principled foundation for self-play reinforcement learning in imperfect-information games. We introduce Fictitious Self-Play (FSP), a class of sample-based reinforcement learning algorithms that approximate XFP. We instantiate FSP with neural network function approximation and deep learning techniques, producing Neural FSP (NFSP). We demonstrate that (approximate) Nash equilibria and their representations (abstractions) can be learned using NFSP end to end, i.e. interfacing with the raw inputs and outputs of the domain. NFSP approached the performance of state-of-the-art, superhuman algorithms in Limit Texas Hold’em, an imperfect-information game at the absolute limit of tractability using massive computational resources. This is the first time that any reinforcement learning algorithm, learning solely from game outcomes without prior domain knowledge, achieved such a feat.
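For readers unfamiliar with fictitious play, the game-theoretic model of learning that the thesis builds on, the sketch below shows the classical normal-form version on matching pennies. It is not XFP, FSP or NFSP, which operate on sequential games and are not specified in this abstract. The function names, the example game and the tie-breaking rule are assumptions made for illustration.

```python
# Classical fictitious play on a two-player zero-sum matrix game.
# Illustrative only: not the thesis's XFP, FSP or NFSP algorithms.

def best_response(payoffs, opponent_counts):
    """Index of the action with the highest total payoff against the
    opponent's empirical action counts (ties go to the lowest index)."""
    values = [
        sum(count * payoffs[a][b] for b, count in enumerate(opponent_counts))
        for a in range(len(payoffs))
    ]
    return values.index(max(values))


def fictitious_play(row_payoffs, iterations):
    """Run fictitious play and return both players' empirical frequencies."""
    n_rows, n_cols = len(row_payoffs), len(row_payoffs[0])
    # Column player's payoffs, indexed [column action][row action] (zero-sum).
    col_payoffs = [
        [-row_payoffs[r][c] for r in range(n_rows)] for c in range(n_cols)
    ]
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    row_counts[0] += 1  # arbitrary initial actions (an assumption)
    col_counts[0] += 1
    for _ in range(iterations - 1):
        # Each player best-responds to the other's observed frequencies.
        r = best_response(row_payoffs, col_counts)
        c = best_response(col_payoffs, row_counts)
        row_counts[r] += 1
        col_counts[c] += 1
    total = sum(row_counts)
    return (
        [x / total for x in row_counts],
        [x / total for x in col_counts],
    )


# Matching pennies: the row player wins when the two actions match.
MATCHING_PENNIES = [[1, -1], [-1, 1]]

row_freq, col_freq = fictitious_play(MATCHING_PENNIES, 10000)
print(row_freq, col_freq)
```

In two-player zero-sum games the empirical frequencies produced by fictitious play are known to converge to a Nash equilibrium, so both printed frequency vectors should drift toward an even split between the two actions, even though the actions played in individual rounds keep cycling.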