Reinforcement Learning via AIXI Approximation
This paper introduces a principled approach for the design of a scalable
general reinforcement learning agent. This approach is based on a direct
approximation of AIXI, a Bayesian optimality notion for general reinforcement
learning agents. Previously, it has been unclear whether the theory of AIXI
could motivate the design of practical algorithms. We answer this hitherto open
question in the affirmative, by providing the first computationally feasible
approximation to the AIXI agent. To develop our approximation, we introduce a
Monte Carlo Tree Search algorithm along with an agent-specific extension of the
Context Tree Weighting algorithm. Empirically, we present a set of encouraging
results on a number of stochastic, unknown, and partially observable domains.
Comment: 8 LaTeX pages, 1 figure
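The Context Tree Weighting component mentioned above builds on the Krichevsky-Trofimov (KT) estimator for sequential binary prediction. As an illustration of that building block only (not the paper's agent-specific CTW extension), a minimal sketch:

```python
# Minimal Krichevsky-Trofimov (KT) estimator, the building block of
# Context Tree Weighting. Illustrative sketch only, not the paper's
# agent-specific extension.

class KTEstimator:
    """Sequential probability estimate for a binary source."""

    def __init__(self):
        self.counts = [0, 0]  # observed zeros and ones

    def prob(self, bit):
        # KT predictive probability: (count + 1/2) / (total + 1)
        return (self.counts[bit] + 0.5) / (sum(self.counts) + 1.0)

    def update(self, bit):
        self.counts[bit] += 1

est = KTEstimator()
for b in [1, 1, 0, 1]:
    est.update(b)

# After three ones and one zero: P(next bit = 1) = (3 + 0.5) / (4 + 1) = 0.7
print(est.prob(1))
```

In full CTW, one such estimator sits at every node of a context tree, and the node predictions are mixed by weighting over all tree depths.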
Safe Opponent Exploitation For Epsilon Equilibrium Strategies
In safe opponent exploitation, players hope to exploit their opponents'
potentially sub-optimal strategies while guaranteeing themselves at least the
value of the game in expectation. Safe opponent exploitation algorithms have
been successfully applied to small instances of two-player zero-sum imperfect
information games, where Nash equilibrium strategies are typically known in
advance. The methods currently available to compute these strategies, however,
do not scale to desirable large imperfect information domains such as No-Limit
Texas Hold 'em (NLHE) poker, where successful agents rely on game abstractions
in order to compute an equilibrium strategy approximation. This paper extends
the concept of safe opponent exploitation by introducing prime-safe opponent
exploitation, in which we redefine a player's value of the game to be the
worst-case payoff their strategy could be susceptible to. This allows weaker
epsilon equilibrium strategies to benefit from utilising a form of opponent
exploitation with our revised value of the game, while still providing a
practical game-theoretic lower-bound guarantee. We demonstrate the empirical
advantages of our generalisation when applied to the main safe opponent
exploitation algorithms.
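One way to picture the guarantee behind safe opponent exploitation: deviate from equilibrium only while the accumulated surplus over the game's value covers the worst-case cost of the deviation. The bookkeeping below is an illustrative sketch with hypothetical numbers and a simplified risk model, not the paper's prime-safe construction:

```python
# Illustrative bookkeeping behind safe opponent exploitation: deviate
# from equilibrium only while the accumulated surplus over the game's
# value covers the worst-case cost of the deviation. Numbers and the
# risk model are hypothetical, not the paper's prime-safe construction.

def choose_strategy(budget, worst_case_loss):
    """Exploit only if losing the worst case cannot drop us below the game value."""
    return "exploit" if budget >= worst_case_loss else "equilibrium"

budget = 0.0
game_value = 0.0  # convention for a symmetric two-player zero-sum game
history = []
for payoff, risk in [(0.4, 0.5), (0.3, 0.5), (-0.5, 0.5)]:
    history.append(choose_strategy(budget, risk))
    budget += payoff - game_value  # surplus banked (or spent) this hand

print(history, round(budget, 3))
```

The prime-safe variant described in the abstract changes what counts as the baseline: the budget is measured against the worst-case payoff of the player's (epsilon-equilibrium) strategy rather than the exact game value.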
Simplified three player Kuhn poker
We study a very small three-player poker game (one-third street Kuhn poker),
and a simplified version of the game that is interesting because it has three
distinct equilibrium solutions. For one-third street Kuhn poker, we are able to
find all of the equilibrium solutions analytically. For a large enough pot
size, there is a degree of freedom in the solution that allows one player to
transfer profit between the other two players without changing their own
profit. This has potentially interesting consequences in repeated play of the
game. We also show that in a simplified version of the game there is either one
equilibrium solution or three distinct equilibrium solutions, depending on the
parameter regime. This may be the simplest non-trivial multiplayer poker game
with more than one distinct equilibrium solution, and provides us with a test
case for theories of dynamic strategy adjustment over multiple realisations of
the game.
We then study a third-order system of ordinary differential equations that
models the dynamics of three players who try to maximise their expectation by
continuously varying their betting frequencies. We find that the dynamics of
this system are oscillatory, with two distinct types of solution. We then study
a difference equation model, based on repeated play of the game, in which each
player continually updates their estimates of the other players' betting
frequencies. We find that the dynamics are noisy but basically oscillatory for
short enough estimation periods and slow enough frequency adjustments, and that
the dynamics can be very different for other parameter values.
Comment: 41 pages, 2 tables, 17 figures
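The estimation-and-adjustment loop in the difference equation model above can be sketched as follows; the smoothing window and adjustment rate are hypothetical parameters, not those of the paper:

```python
# Sketch of the kind of difference-equation model described: a player
# keeps a running estimate of an opponent's betting frequency and
# nudges their own frequency toward a target. The window length and
# adjustment rate are hypothetical parameters.

def update_estimate(estimate, observed_bet, window=20):
    """Exponential-window estimate of the opponent's betting frequency."""
    return estimate + (observed_bet - estimate) / window

def adjust_frequency(own_freq, target_freq, rate=0.05):
    """Move own betting frequency a small step toward the target."""
    own_freq += rate * (target_freq - own_freq)
    return min(1.0, max(0.0, own_freq))  # keep it a valid frequency

estimate, own = 0.5, 0.5
for observed in [1, 0, 1, 1, 0, 1]:  # 1 = opponent bet, 0 = checked
    estimate = update_estimate(estimate, observed)
    own = adjust_frequency(own, estimate)

print(round(estimate, 3), round(own, 3))
```

The paper's finding that long estimation windows and slow adjustment give oscillatory dynamics, while other parameter values behave very differently, corresponds here to varying `window` and `rate`.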
Poker as a Domain of Expertise
Poker is a game of skill and chance involving economic decision-making under uncertainty. It is also a complex but well-defined real-world environment with a clear rule-structure. As such, poker has strong potential as a model system for studying high-stakes, high-risk expert performance. Poker has been increasingly used as a tool to study decision-making and learning, as well as emotion self-regulation. In this review, we discuss how these studies have begun to inform us about the interaction between emotions and technical skill, and how expertise develops and depends on these two factors. Expertise in poker critically requires both mastery of the technical aspects of the game, and proficiency in emotion regulation; poker thus offers a good environment for studying these skills in controlled experimental settings of high external validity. We conclude by suggesting ideas for future research on expertise, with new insights provided by poker.
A Survey of Monte Carlo Tree Search Methods
Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and non-game domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
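The core algorithm surveyed above rests on the UCT selection rule, which scores each child node by its mean value plus an exploration bonus. A minimal sketch, with the exploration constant `c` as a tunable parameter:

```python
import math

# Minimal sketch of the UCT selection rule at the heart of MCTS:
# pick the child maximising mean value plus an exploration bonus.
# The exploration constant c is a tunable parameter.

def uct_select(children, c=1.414):
    """children: list of (total_value, visit_count) pairs; returns an index."""
    parent_visits = sum(n for _, n in children)
    best, best_score = None, -math.inf
    for i, (value, visits) in enumerate(children):
        if visits == 0:
            return i  # always try unvisited children first
        score = value / visits + c * math.sqrt(math.log(parent_visits) / visits)
        if score > best_score:
            best, best_score = i, score
    return best

# A well-explored, high-value child vs. a rarely tried alternative:
# the exploration bonus makes the under-sampled child win.
print(uct_select([(9.0, 10), (1.0, 2)]))
```

A full MCTS iteration applies this rule from the root to a leaf, expands a node, runs a random rollout, and backs the result up along the visited path.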
Opponent modeling and exploitation in poker using evolved recurrent neural networks
As a classic example of imperfect information games, poker, in particular Heads-Up No-Limit Texas Hold'em (HUNL), has been studied extensively in recent years. A number of computer poker agents of increasingly high quality have been built. While agents based on approximated Nash equilibria have been successful, they lack the ability to exploit their opponents effectively. In addition, the performance of equilibrium strategies cannot be guaranteed in games with more than two players and multiple Nash equilibria. This dissertation focuses on devising an evolutionary method to discover opponent models based on recurrent neural networks.
A series of computer poker agents called Adaptive System for Hold’Em (ASHE) were evolved for HUNL. ASHE models the opponent explicitly using Pattern Recognition Trees (PRTs) and LSTM estimators. The default and board-texture-based PRTs maintain statistical data on the opponent strategies at different game states. The Opponent Action Rate Estimator predicts the opponent’s moves, and the Hand Range Estimator evaluates the showdown value of ASHE’s hand. Recursive Utility Estimation is used to evaluate the expected utility/reward for each available action.
Experimental results show that (1) ASHE exploits opponents with high to moderate levels of exploitability more effectively than Nash-equilibrium-based agents, and (2) ASHE can defeat top-ranking equilibrium-based poker agents. Thus, the dissertation introduces an effective new method for building high-performance computer agents for poker and other imperfect information games. It also provides a promising direction for future research in imperfect information games beyond the equilibrium-based approach.
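The per-state statistics a Pattern Recognition Tree maintains can be pictured as action counts keyed by a game-state abstraction. The sketch below is illustrative only; the state features and class names are hypothetical, not ASHE's actual design:

```python
from collections import defaultdict

# Illustrative sketch of the kind of per-state statistics a Pattern
# Recognition Tree maintains: opponent action counts keyed by a
# game-state abstraction. The state features here are hypothetical.

class OpponentStats:
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def record(self, state, action):
        self.counts[state][action] += 1

    def action_rate(self, state, action):
        total = sum(self.counts[state].values())
        if total == 0:
            return None  # no data for this state yet
        return self.counts[state][action] / total

stats = OpponentStats()
for action in ["raise", "raise", "fold", "call"]:
    stats.record(("preflop", "button"), action)

print(stats.action_rate(("preflop", "button"), "raise"))
```

In ASHE, such statistics feed the LSTM estimators that predict opponent action rates and hand ranges; this sketch only shows the counting layer.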
Using a high-level language to build a poker playing agent
Integrated master's thesis. Informatics and Computing Engineering. Faculty of Engineering, University of Porto. 200
Opponent Modelling in Multi-Agent Systems
Reinforcement Learning (RL) formalises a problem where an intelligent agent needs to learn and achieve certain goals by maximising a long-term return in an environment. Multi-agent reinforcement learning (MARL) extends traditional RL to multiple agents. Many RL algorithms lose their convergence guarantees in non-stationary environments due to adaptive opponents. Partial observation, caused by agents' different private observations, introduces high variance during training, which exacerbates data inefficiency. In MARL, training an agent to perform well against one set of opponents often leads to poor performance against another set. Non-stationarity, partial observation, and unclear learning objectives are three critical problems in MARL that hinder agents' learning, and they share a common cause: a lack of knowledge of the other agents. Therefore, in this thesis, we propose to solve these problems with opponent modelling methods. We tailor our solutions by combining opponent modelling with other techniques according to the characteristics of the problems we face. Specifically, we first propose ROMMEO, an algorithm inspired by Bayesian inference, as a solution to alleviate non-stationarity in cooperative games. We then study the partial observation problem caused by agents' private observations and design an implicit communication training method named PBL. Lastly, we investigate solutions to the non-stationarity and unclear learning objective problems in zero-sum games, proposing EPSOM, which aims to find safe exploitation strategies for playing against non-stationary opponents. We verify our proposed methods through varied experiments and show that they achieve the desired performance. Limitations and future work are discussed in the last chapter of this thesis.