92 research outputs found

    Playing Cassino with Reinforcement Learning

    Reinforcement learning algorithms have been used to create game-playing agents for various games, mostly deterministic games such as chess, shogi, and Go. This study used deep Q-learning to create an agent that plays Cassino, a non-deterministic card game. The agent's performance was compared against that of a Cassino mobile app. Results showed that the trained models did not perform well and had trouble learning build actions, which are important in Cassino. Future research could experiment with other reinforcement learning algorithms to see whether they handle build actions better.
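
    The abstract names deep Q-learning but gives no implementation detail. As a rough illustration, the sketch below shows the core DQN update such an agent might use; the state encoding, action count, and network sizes are illustrative assumptions, not the study's actual values.

        # Minimal DQN update sketch (PyTorch). STATE_DIM, N_ACTIONS, and the
        # replay-tuple layout are hypothetical stand-ins for whatever the
        # Cassino agent actually used.
        import random
        from collections import deque

        import torch
        import torch.nn as nn

        STATE_DIM = 120   # assumed: encoding of hand, table cards, and builds
        N_ACTIONS = 60    # assumed: enumerated trail/capture/build moves
        GAMMA = 0.99

        q_net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                              nn.Linear(128, N_ACTIONS))
        target_net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                                   nn.Linear(128, N_ACTIONS))
        target_net.load_state_dict(q_net.state_dict())
        optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
        replay: deque = deque(maxlen=10_000)  # (s, a, r, s', done) tuples

        def select_action(state: torch.Tensor, epsilon: float) -> int:
            # Epsilon-greedy exploration over the estimated Q-values.
            if random.random() < epsilon:
                return random.randrange(N_ACTIONS)
            with torch.no_grad():
                return int(q_net(state).argmax().item())

        def train_step(batch_size: int = 32) -> None:
            # One regression step toward the bootstrapped Bellman target.
            if len(replay) < batch_size:
                return
            batch = random.sample(replay, batch_size)
            states = torch.stack([b[0] for b in batch])
            actions = torch.tensor([b[1] for b in batch])
            rewards = torch.tensor([b[2] for b in batch], dtype=torch.float32)
            next_states = torch.stack([b[3] for b in batch])
            dones = torch.tensor([b[4] for b in batch], dtype=torch.float32)
            q = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
            with torch.no_grad():
                target = rewards + GAMMA * (1 - dones) * \
                    target_net(next_states).max(1).values
            loss = nn.functional.mse_loss(q, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    One plausible reason build actions are hard to learn under this scheme is that their payoff is deferred until a later capture, so the one-step Bellman target propagates credit to them only slowly.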

    Using a high-level language to build a poker playing agent

    Integrated master's thesis in Informatics and Computing Engineering. Faculdade de Engenharia, Universidade do Porto. 200

    Opponent Modelling in Multi-Agent Systems

    Reinforcement Learning (RL) formalises a problem in which an intelligent agent must learn to achieve certain goals by maximising a long-term return in an environment. Multi-agent reinforcement learning (MARL) extends traditional RL to multiple agents. Many RL algorithms lose their convergence guarantees in non-stationary environments created by adaptive opponents. Partial observation caused by agents' differing private observations introduces high variance during training, which exacerbates data inefficiency. Moreover, in MARL, training an agent to perform well against one set of opponents often leads to poor performance against another. Non-stationarity, partial observation, and unclear learning objectives are three critical problems in MARL that hinder agents' learning, and all three share a common cause: lack of knowledge of the other agents. Therefore, in this thesis, we propose to address these problems with opponent modelling methods, tailoring our solutions by combining opponent modelling with other techniques according to the characteristics of each problem. Specifically, we first propose ROMMEO, an algorithm inspired by Bayesian inference, as a solution to alleviate non-stationarity in cooperative games. We then study the partial observation problem caused by agents' private observations and design an implicit communication training method named PBL. Lastly, we investigate the non-stationarity and unclear-learning-objective problems in zero-sum games and propose a solution named EPSOM, which aims to find safe exploitation strategies for playing against non-stationary opponents. We verify the proposed methods in varied experiments and show that they achieve the desired performance. Limitations and future work are discussed in the last chapter of this thesis.
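
    As a generic illustration of the thesis's shared idea (explicitly modelling the other agent and responding to that model), the sketch below implements a simple fictitious-play-style opponent model. It is a minimal stand-in for the general technique, not ROMMEO, PBL, or EPSOM.

        # Frequency-based opponent model: keep empirical counts of the
        # opponent's actions per state and best-respond to the induced
        # expected payoff. Purely illustrative.
        from collections import defaultdict

        import numpy as np

        class FrequencyOpponentModel:
            def __init__(self, n_opponent_actions: int):
                self.n = n_opponent_actions
                # Start from all-ones counts (a Laplace prior) so early
                # predictions are uniform rather than degenerate.
                self.counts = defaultdict(lambda: np.ones(self.n))

            def update(self, state, opponent_action: int) -> None:
                self.counts[state][opponent_action] += 1

            def predict(self, state) -> np.ndarray:
                c = self.counts[state]
                return c / c.sum()

        def best_response(payoff: np.ndarray, opp_dist: np.ndarray) -> int:
            # payoff[a_self, a_opp]; maximise expected payoff under the model.
            return int((payoff @ opp_dist).argmax())

        # Usage: row player's rock-paper-scissors payoffs.
        rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
        model = FrequencyOpponentModel(3)
        for _ in range(20):
            model.update("s0", 0)                 # opponent keeps playing rock
        print(best_response(rps, model.predict("s0")))  # -> 1 (paper)

    Such frequency models are exactly what breaks down against the adaptive, non-stationary opponents the thesis targets, which motivates its more sophisticated modelling methods.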

    Game theoretic modeling and analysis: A co-evolutionary, agent-based approach

    Ph.D. thesis (Doctor of Philosophy)

    Evolutionary Multiagent Transfer Learning With Model-Based Opponent Behavior Prediction

    This article embarks on a study of multiagent transfer learning (TL), addressing the specific challenges that arise in complex multiagent systems where agents have different or even competing objectives. Specifically, building on the backbone of a state-of-the-art evolutionary TL framework (eTL), this article presents a novel TL framework with prediction (eTL-P) as an upgrade over the existing eTL, endowing agents with the ability to interact with their opponents effectively by building candidate models and predicting opponents' behavioral strategies accordingly. To reduce the complexity of candidate models, eTL-P constructs a monotone submodular function, which facilitates selecting the top-K models from all available candidates based on their representativeness in terms of behavioral coverage as well as reward diversity. eTL-P also integrates social selection mechanisms for agents to identify better-performing partners, thus improving learning performance and reducing the complexity of behavior prediction by reusing useful knowledge about partners' mind universes. Experiments on a partner-opponent minefield navigation task (PO-MNT) show that eTL-P achieves higher learning capability and efficiency than state-of-the-art multiagent TL approaches.
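
    The top-K selection described here rests on a standard fact: a monotone submodular objective can be maximised greedily with a (1 - 1/e) approximation guarantee. The sketch below shows that greedy loop with an illustrative coverage-plus-diversity surrogate; the article's actual objective is not specified in the abstract.

        # Greedy top-K selection under a monotone submodular objective.
        import numpy as np

        def objective(selected: list, behavior: np.ndarray,
                      rewards: np.ndarray) -> float:
            # Weighted coverage of behavior features (element-wise max over
            # selected rows) plus the number of distinct coarse reward bins
            # covered; both terms are monotone submodular for non-negative
            # inputs.
            if not selected:
                return 0.0
            coverage = float(behavior[selected].max(axis=0).sum())
            diversity = float(len(set(np.round(rewards[selected], 1))))
            return coverage + diversity

        def greedy_top_k(k: int, behavior: np.ndarray,
                         rewards: np.ndarray) -> list:
            selected: list = []
            remaining = set(range(len(behavior)))
            for _ in range(min(k, len(remaining))):
                # Pick the candidate with the largest marginal gain.
                best = max(remaining,
                           key=lambda m: objective(selected + [m],
                                                   behavior, rewards))
                selected.append(best)
                remaining.remove(best)
            return selected

        # Usage: six candidate models, four behavior features, keep the top two.
        rng = np.random.default_rng(0)
        print(greedy_top_k(2, rng.random((6, 4)), rng.random(6)))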