Reinforcement Learning via AIXI Approximation
This paper introduces a principled approach for the design of a scalable
general reinforcement learning agent. This approach is based on a direct
approximation of AIXI, a Bayesian optimality notion for general reinforcement
learning agents. Previously, it has been unclear whether the theory of AIXI
could motivate the design of practical algorithms. We answer this hitherto open
question in the affirmative, by providing the first computationally feasible
approximation to the AIXI agent. To develop our approximation, we introduce a
Monte Carlo Tree Search algorithm along with an agent-specific extension of the
Context Tree Weighting algorithm. Empirically, we present a set of encouraging
results on a number of stochastic, unknown, and partially observable domains.
Comment: 8 LaTeX pages, 1 figure
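The Context Tree Weighting component mentioned above builds on the Krichevsky-Trofimov (KT) estimator for sequential binary prediction. As an illustration of that building block only (not the paper's agent-specific CTW extension), a minimal sketch:

```python
# Minimal Krichevsky-Trofimov (KT) estimator, the building block of
# Context Tree Weighting. Illustrative sketch only, not the paper's
# agent-specific extension.

class KTEstimator:
    """Sequential probability estimate for a binary source."""

    def __init__(self):
        self.counts = [0, 0]  # observed zeros and ones

    def prob(self, bit):
        # KT predictive probability: (count + 1/2) / (total + 1)
        return (self.counts[bit] + 0.5) / (sum(self.counts) + 1.0)

    def update(self, bit):
        self.counts[bit] += 1

est = KTEstimator()
for b in [1, 1, 0, 1]:
    est.update(b)

# After three ones and one zero: P(next bit = 1) = (3 + 0.5) / (4 + 1) = 0.7
print(est.prob(1))
```

In full CTW, one such estimator sits at every node of a context tree, and the node predictions are mixed by weighting over all tree depths.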
Safe Opponent Exploitation For Epsilon Equilibrium Strategies
In safe opponent exploitation, players hope to exploit their opponents'
potentially sub-optimal strategies while guaranteeing themselves at least the
value of the game in expectation. Safe opponent exploitation algorithms have
been successfully applied to small instances of two-player zero-sum imperfect
information games, where Nash equilibrium strategies are typically known in
advance. The methods currently available to compute these strategies, however,
do not scale to desirable large imperfect information domains such as No-Limit
Texas Hold 'em (NLHE) poker, where successful agents rely on game abstractions
in order to compute an equilibrium strategy approximation. This paper extends
the concept of safe opponent exploitation by introducing prime-safe opponent
exploitation, in which we redefine a player's value of the game to be the
worst-case payoff their strategy could be susceptible to. This allows weaker
epsilon equilibrium strategies to benefit from utilising a form of opponent
exploitation with our revised value of the game, while still providing a
practical game-theoretic lower-bound guarantee. We demonstrate the empirical
advantages of our generalisation when applied to the main safe opponent
exploitation algorithms.
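One way to picture the guarantee behind safe opponent exploitation: deviate from equilibrium only while the accumulated surplus over the game's value covers the worst-case cost of the deviation. The bookkeeping below is an illustrative sketch with hypothetical numbers and a simplified risk model, not the paper's prime-safe construction:

```python
# Illustrative bookkeeping behind safe opponent exploitation: deviate
# from equilibrium only while the accumulated surplus over the game's
# value covers the worst-case cost of the deviation. Numbers and the
# risk model are hypothetical, not the paper's prime-safe construction.

def choose_strategy(budget, worst_case_loss):
    """Exploit only if losing the worst case cannot drop us below the game value."""
    return "exploit" if budget >= worst_case_loss else "equilibrium"

budget = 0.0
game_value = 0.0  # convention for a symmetric two-player zero-sum game
history = []
for payoff, risk in [(0.4, 0.5), (0.3, 0.5), (-0.5, 0.5)]:
    history.append(choose_strategy(budget, risk))
    budget += payoff - game_value  # surplus banked (or spent) this hand

print(history, round(budget, 3))
```

The prime-safe variant described in the abstract changes what counts as the baseline: the budget is measured against the worst-case payoff of the player's (epsilon-equilibrium) strategy rather than the exact game value.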
Simplified three player Kuhn poker
We study a very small three-player poker game (one-third street Kuhn poker),
and a simplified version of the game that is interesting because it has three
distinct equilibrium solutions. For one-third street Kuhn poker, we are able to
find all of the equilibrium solutions analytically. For a large enough pot
size, there is a degree of freedom in the solution that allows one player to
transfer profit between the other two players without changing their own
profit. This has potentially interesting consequences in repeated play of the
game. We also show that in a simplified version of the game there is either one
equilibrium solution or three distinct equilibrium solutions, depending on the
parameter regime. This may be the simplest non-trivial multiplayer poker game
with more than one distinct equilibrium solution, and provides us with a test
case for theories of dynamic strategy adjustment over multiple realisations of
the game.
We then study a third-order system of ordinary differential equations that
models the dynamics of three players who try to maximise their expectation by
continuously varying their betting frequencies. We find that the dynamics of
this system are oscillatory, with two distinct types of solution. We then study
a difference equation model, based on repeated play of the game, in which each
player continually updates their estimates of the other players' betting
frequencies. We find that the dynamics are noisy but basically oscillatory for
short enough estimation periods and slow enough frequency adjustments, and that
the dynamics can be very different for other parameter values.
Comment: 41 pages, 2 tables, 17 figures
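The estimation-and-adjustment loop in the difference equation model above can be sketched as follows; the smoothing window and adjustment rate are hypothetical parameters, not those of the paper:

```python
# Sketch of the kind of difference-equation model described: a player
# keeps a running estimate of an opponent's betting frequency and
# nudges their own frequency toward a target. The window length and
# adjustment rate are hypothetical parameters.

def update_estimate(estimate, observed_bet, window=20):
    """Exponential-window estimate of the opponent's betting frequency."""
    return estimate + (observed_bet - estimate) / window

def adjust_frequency(own_freq, target_freq, rate=0.05):
    """Move own betting frequency a small step toward the target."""
    own_freq += rate * (target_freq - own_freq)
    return min(1.0, max(0.0, own_freq))  # keep it a valid frequency

estimate, own = 0.5, 0.5
for observed in [1, 0, 1, 1, 0, 1]:  # 1 = opponent bet, 0 = checked
    estimate = update_estimate(estimate, observed)
    own = adjust_frequency(own, estimate)

print(round(estimate, 3), round(own, 3))
```

The paper's finding that long estimation windows and slow adjustment give oscillatory dynamics, while other parameter values behave very differently, corresponds here to varying `window` and `rate`.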
Poker as a Domain of Expertise
Poker is a game of skill and chance involving economic decision-making under uncertainty. It is also a complex but well-defined real-world environment with a clear rule-structure. As such, poker has strong potential as a model system for studying high-stakes, high-risk expert performance. Poker has been increasingly used as a tool to study decision-making and learning, as well as emotion self-regulation. In this review, we discuss how these studies have begun to inform us about the interaction between emotions and technical skill, and how expertise develops and depends on these two factors. Expertise in poker critically requires both mastery of the technical aspects of the game, and proficiency in emotion regulation; poker thus offers a good environment for studying these skills in controlled experimental settings of high external validity. We conclude by suggesting ideas for future research on expertise, with new insights provided by poker.
A Survey of Monte Carlo Tree Search Methods
Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and non-game domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
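The core algorithm surveyed above rests on the UCT selection rule, which scores each child node by its mean value plus an exploration bonus. A minimal sketch, with the exploration constant `c` as a tunable parameter:

```python
import math

# Minimal sketch of the UCT selection rule at the heart of MCTS:
# pick the child maximising mean value plus an exploration bonus.
# The exploration constant c is a tunable parameter.

def uct_select(children, c=1.414):
    """children: list of (total_value, visit_count) pairs; returns an index."""
    parent_visits = sum(n for _, n in children)
    best, best_score = None, -math.inf
    for i, (value, visits) in enumerate(children):
        if visits == 0:
            return i  # always try unvisited children first
        score = value / visits + c * math.sqrt(math.log(parent_visits) / visits)
        if score > best_score:
            best, best_score = i, score
    return best

# A well-explored, high-value child vs. a rarely tried alternative:
# the exploration bonus makes the under-sampled child win.
print(uct_select([(9.0, 10), (1.0, 2)]))
```

A full MCTS iteration applies this rule from the root to a leaf, expands a node, runs a random rollout, and backs the result up along the visited path.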
Opponent modeling and exploitation in poker using evolved recurrent neural networks
As a classic example of imperfect information games, poker, in particular Heads-Up No-Limit Texas Hold'em (HUNL), has been studied extensively in recent years. A number of computer poker agents of increasingly high quality have been built. While agents based on approximated Nash equilibria have been successful, they lack the ability to exploit their opponents effectively. In addition, the performance of equilibrium strategies cannot be guaranteed in games with more than two players and multiple Nash equilibria. This dissertation focuses on devising an evolutionary method to discover opponent models based on recurrent neural networks.
A series of computer poker agents called Adaptive System for Hold’Em (ASHE) were evolved for HUNL. ASHE models the opponent explicitly using Pattern Recognition Trees (PRTs) and LSTM estimators. The default and board-texture-based PRTs maintain statistical data on the opponent strategies at different game states. The Opponent Action Rate Estimator predicts the opponent’s moves, and the Hand Range Estimator evaluates the showdown value of ASHE’s hand. Recursive Utility Estimation is used to evaluate the expected utility/reward for each available action.
Experimental results show that (1) ASHE exploits opponents with high to moderate levels of exploitability more effectively than Nash-equilibrium-based agents, and (2) ASHE can defeat top-ranking equilibrium-based poker agents. Thus, the dissertation introduces an effective new method for building high-performance computer agents for poker and other imperfect information games. It also provides a promising direction for future research in imperfect information games beyond the equilibrium-based approach.
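The per-state statistics a Pattern Recognition Tree maintains can be pictured as action counts keyed by a game-state abstraction. The sketch below is illustrative only; the state features and class names are hypothetical, not ASHE's actual design:

```python
from collections import defaultdict

# Illustrative sketch of the kind of per-state statistics a Pattern
# Recognition Tree maintains: opponent action counts keyed by a
# game-state abstraction. The state features here are hypothetical.

class OpponentStats:
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def record(self, state, action):
        self.counts[state][action] += 1

    def action_rate(self, state, action):
        total = sum(self.counts[state].values())
        if total == 0:
            return None  # no data for this state yet
        return self.counts[state][action] / total

stats = OpponentStats()
for action in ["raise", "raise", "fold", "call"]:
    stats.record(("preflop", "button"), action)

print(stats.action_rate(("preflop", "button"), "raise"))
```

In ASHE, such statistics feed the LSTM estimators that predict opponent action rates and hand ranges; this sketch only shows the counting layer.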
Using a high-level language to build a poker playing agent
Integrated master's thesis. Informatics and Computing Engineering. Faculty of Engineering, University of Porto. 200
Opponent Modelling in Multi-Agent Systems
Reinforcement Learning (RL) formalises a problem where an intelligent agent needs to learn and achieve certain goals by maximising a long-term return in an environment. Multi-agent reinforcement learning (MARL) extends traditional RL to multiple agents. Many RL algorithms lose their convergence guarantees in non-stationary environments due to adaptive opponents. Partial observation, caused by agents' different private observations, introduces high variance during training, which exacerbates data inefficiency. In MARL, training an agent to perform well against one set of opponents often leads to poor performance against another set. Non-stationarity, partial observation, and unclear learning objectives are three critical problems in MARL that hinder agents' learning, and they share a common cause: a lack of knowledge of the other agents. Therefore, in this thesis, we propose to solve these problems with opponent modelling methods. We tailor our solutions by combining opponent modelling with other techniques according to the characteristics of the problems we face. Specifically, we first propose ROMMEO, an algorithm inspired by Bayesian inference, as a solution to alleviate non-stationarity in cooperative games. We then study the partial observation problem caused by agents' private observations and design an implicit communication training method named PBL. Lastly, we investigate solutions to the non-stationarity and unclear learning objective problems in zero-sum games, proposing EPSOM, which aims to find safe exploitation strategies for playing against non-stationary opponents. We verify our proposed methods through varied experiments and show that they achieve the desired performance. Limitations and future work are discussed in the last chapter of this thesis.