Simplified three player Kuhn poker

Author: Billingham John
Publication date: 25/04/2017

We study a very small three player poker game (one-third street Kuhn poker), and a simplified version of the game that is interesting because it has three distinct equilibrium solutions. For one-third street Kuhn poker, we are able to find all of the equilibrium solutions analytically. For large enough pot size, $P$, there is a degree of freedom in the solution that allows one player to transfer profit between the other two players without changing their own profit. This has potentially interesting consequences in repeated play of the game. We also show that in a simplified version of the game with $P>5$, there is one equilibrium solution if $5 < P < P^* \equiv (5+\sqrt{73})/2$, and three distinct equilibrium solutions if $P > P^*$. This may be the simplest non-trivial multiplayer poker game with more than one distinct equilibrium solution and provides us with a test case for theories of dynamic strategy adjustment over multiple realisations of the game. We then study a third order system of ordinary differential equations that models the dynamics of three players who try to maximise their expectation by continuously varying their betting frequencies. We find that the dynamics of this system are oscillatory, with two distinct types of solution. We then study a difference equation model, based on repeated play of the game, in which each player continually updates their estimates of the other players' betting frequencies. We find that the dynamics are noisy, but basically oscillatory for short enough estimation periods and slow enough frequency adjustments, but that the dynamics can be very different for other parameter values.

Comment: 41 pages, 2 Tables, 17 Figures
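
As a quick numerical check of the threshold quoted in the abstract (using only the formula given there, with rounding to three decimal places):

\[
P^* \equiv \frac{5+\sqrt{73}}{2} \approx \frac{5 + 8.544}{2} \approx 6.772
\]

So the single-equilibrium range is roughly $5 < P < 6.772$, and three distinct equilibria appear for pot sizes above about $6.772$.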

arXiv.org e-Print Archive

Nottingham eTheses

Safe Opponent Exploitation For Epsilon Equilibrium Strategies

Author: Jeary Linus
Turrini Paolo
Publication date: 23/07/2023

In safe opponent exploitation players hope to exploit their opponents' potentially sub-optimal strategies while guaranteeing at least the value of the game in expectation for themselves. Safe opponent exploitation algorithms have been successfully applied to small instances of two-player zero-sum imperfect information games, where Nash equilibrium strategies are typically known in advance. Current methods available to compute these strategies are however not scalable to desirable large domains of imperfect information such as No-Limit Texas Hold 'em (NLHE) poker, where successful agents rely on game abstractions in order to compute an equilibrium strategy approximation. This paper will extend the concept of safe opponent exploitation by introducing prime-safe opponent exploitation, in which we redefine the value of the game of a player to be the worst-case payoff their strategy could be susceptible to. This allows weaker epsilon equilibrium strategies to benefit from utilising a form of opponent exploitation with our revised value of the game, still allowing for a practical game-theoretical guaranteed lower-bound. We demonstrate the empirical advantages of our generalisation when applied to the main safe opponent exploitation algorithms.
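
The idea of a strategy's worst-case payoff can be illustrated on a tiny normal-form game. The sketch below is only an illustration under stated assumptions: it uses rock-paper-scissors rather than an imperfect-information extensive-form game, the helper `worst_case_payoff` and the `perturbed` strategy are hypothetical names chosen here, and it does not reproduce the paper's prime-safe algorithms.

```python
import numpy as np

def worst_case_payoff(payoff, strategy):
    """Expected payoff of a row-player mixed strategy against a best-responding opponent."""
    payoff = np.asarray(payoff, dtype=float)
    strategy = np.asarray(strategy, dtype=float)
    # Expected payoff against each pure column action; the opponent picks the minimum.
    return float((strategy @ payoff).min())

# Rock-paper-scissors payoffs for the row player (zero-sum, game value 0).
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]

uniform = [1 / 3, 1 / 3, 1 / 3]   # exact equilibrium strategy
perturbed = [0.4, 0.3, 0.3]       # hypothetical approximate (epsilon) equilibrium

print(worst_case_payoff(RPS, uniform))    # about 0: the value of the game
print(worst_case_payoff(RPS, perturbed))  # about -0.1: the weaker guaranteed lower bound
```

In this toy setting, the perturbed strategy cannot guarantee the game value of 0, but it can still guarantee its own worst-case payoff of about -0.1, which is the kind of revised lower bound the abstract describes.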

arXiv.org e-Print Archive

Autonomous Agents Modelling Other Agents: A Comprehensive Survey and Open Problems

Author: Stefano V. Albrecht
Peter Stone
Publication venue: Elsevier BV
Publication date: 01/01/2018

Much research in artificial intelligence is concerned with the development of autonomous agents that can interact effectively with other agents. An important aspect of such agents is the ability to reason about the behaviours of other agents, by constructing models which make predictions about various properties of interest (such as actions, goals, beliefs) of the modelled agents. A variety of modelling approaches now exist which vary widely in their methodology and underlying assumptions, catering to the needs of the different sub-communities within which they were developed and reflecting the different practical uses for which they are intended. The purpose of the present article is to provide a comprehensive survey of the salient modelling methods which can be found in the literature. The article concludes with a discussion of open problems which may form the basis for fruitful future research.

Comment: Final manuscript (46 pages), published in Artificial Intelligence Journal. The arXiv version also contains a table of contents after the abstract, but is otherwise identical to the AIJ version. Keywords: autonomous agents, multiagent systems, modelling other agents, opponent modelling

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

Reinforcement Learning from Self-Play in Imperfect-Information Games

Author: Heinrich J
Publication venue: UCL (University College London)
Publication date: 28/04/2017

This thesis investigates artificial agents learning to make strategic decisions in imperfect-information games. In particular, we introduce a novel approach to reinforcement learning from self-play. We introduce Smooth UCT, which combines the game-theoretic notion of fictitious play with Monte Carlo Tree Search (MCTS). Smooth UCT outperformed a classic MCTS method in several imperfect-information poker games and won three silver medals in the 2014 Annual Computer Poker Competition. We develop Extensive-Form Fictitious Play (XFP) that is entirely implemented in sequential strategies, thus extending this prominent game-theoretic model of learning to sequential games. XFP provides a principled foundation for self-play reinforcement learning in imperfect-information games. We introduce Fictitious Self-Play (FSP), a class of sample-based reinforcement learning algorithms that approximate XFP. We instantiate FSP with neural network function approximation and deep learning techniques, producing Neural FSP (NFSP). We demonstrate that (approximate) Nash equilibria and their representations (abstractions) can be learned using NFSP end to end, i.e. interfacing with the raw inputs and outputs of the domain. NFSP approached the performance of state-of-the-art, superhuman algorithms in Limit Texas Hold’em - an imperfect-information game at the absolute limit of tractability using massive computational resources. This is the first time that any reinforcement learning algorithm, learning solely from game outcomes without prior domain knowledge, achieved such a feat.
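
For readers unfamiliar with fictitious play, the game-theoretic notion that the thesis builds on, the following is a minimal sketch of the classic normal-form version on a two-player zero-sum matrix game. It is not XFP, FSP or NFSP from the thesis; the function name `fictitious_play`, the matching-pennies example and the iteration count are assumptions made here for illustration.

```python
import numpy as np

def fictitious_play(payoff, iterations=10000):
    """Classic fictitious play; payoff[i, j] is the row player's payoff (zero-sum)."""
    payoff = np.asarray(payoff, dtype=float)
    n_rows, n_cols = payoff.shape
    row_counts = np.zeros(n_rows)
    col_counts = np.zeros(n_cols)
    # Arbitrary initial actions so the empirical mixtures are defined.
    row_counts[0] += 1
    col_counts[0] += 1
    for _ in range(iterations):
        # Each player best-responds to the opponent's empirical action counts
        # (counts are proportional to the empirical mixed strategy).
        row_action = int(np.argmax(payoff @ col_counts))   # row player maximises
        col_action = int(np.argmin(row_counts @ payoff))   # column player minimises
        row_counts[row_action] += 1
        col_counts[col_action] += 1
    # Return the empirical (average) strategies of both players.
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

# Matching pennies: the unique equilibrium mixes both actions equally.
MATCHING_PENNIES = [[1, -1],
                    [-1, 1]]

row_avg, col_avg = fictitious_play(MATCHING_PENNIES)
print(row_avg, col_avg)  # both average strategies should be close to [0.5, 0.5]
```

The individual best responses keep cycling, but the average strategies approach the equilibrium in two-player zero-sum games, which is the property that makes fictitious play a natural basis for self-play learning.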

UCL Discovery