Search CORE

369 research outputs found

Simulation of a Texas Hold'Em poker player

Author: Bosilj P
Palašek P
Popović B
Štefić D
Publication venue
Publication date: 23/05/2011
Field of study

Copyright 2011 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. This is the accepted version of the article. The published version is available at

Queen Mary Research Online

Multiparty Dynamics and Failure Modes for Machine Learning and Artificial Intelligence

Author: Manheim David
Publication venue: 'MDPI AG'
Publication date: 01/04/2019
Field of study

An important challenge for safety in machine learning and artificial intelligence systems is a~set of related failures involving specification gaming, reward hacking, fragility to distributional shifts, and Goodhart's or Campbell's law. This paper presents additional failure modes for interactions within multi-agent systems that are closely related. These multi-agent failure modes are more complex, more problematic, and less well understood than the single-agent case, and are also already occurring, largely unnoticed. After motivating the discussion with examples from poker-playing artificial intelligence (AI), the paper explains why these failure modes are in some senses unavoidable. Following this, the paper categorizes failure modes, provides definitions, and cites examples for each of the modes: accidental steering, coordination failures, adversarial misalignment, input spoofing and filtering, and goal co-option or direct hacking. The paper then discusses how extant literature on multi-agent AI fails to address these failure modes, and identifies work which may be useful for the mitigation of these failure modes.Comment: 12 Pages, This version re-submitted to Big Data and Cognitive Computing, Special Issue "Artificial Superintelligence: Coordination & Strategy

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Traditional Wisdom and Monte Carlo Tree Search Face-to-Face in the Card Game Scopone

Author: Di Palma Stefano
Lanzi Pier Luca
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

We present the design of a competitive artificial intelligence for Scopone, a popular Italian card game. We compare rule-based players using the most established strategies (one for beginners and two for advanced players) against players using Monte Carlo Tree Search (MCTS) and Information Set Monte Carlo Tree Search (ISMCTS) with different reward functions and simulation strategies. MCTS requires complete information about the game state and thus implements a cheating player while ISMCTS can deal with incomplete information and thus implements a fair player. Our results show that, as expected, the cheating MCTS outperforms all the other strategies; ISMCTS is stronger than all the rule-based players implementing well-known and most advanced strategies and it also turns out to be a challenging opponent for human players.Comment: Preprint. Accepted for publication in the IEEE Transaction on Game

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

Recommended from our members

Bayesian opponent modeling in adversarial game environments.

Author: Baker Roderick J.S.
Publication venue: Department of Computing
Publication date: 01/01/2010
Field of study

This thesis investigates the use of Bayesian analysis upon an opponent¿s behaviour in order to determine the desired goals or strategy used by a given adversary. A terrain analysis approach utilising the A* algorithm is investigated, where a probability distribution between discrete behaviours of an opponent relative to a set of possible goals is generated. The Bayesian analysis of agent behaviour accurately determines the intended goal of an opponent agent, even when the opponent¿s actions are altered randomly. The environment of Poker is introduced and abstracted for ease of analysis. Bayes¿ theorem is used to generate an effective opponent model, categorizing behaviour according to its similarity with known styles of opponent. The accuracy of Bayes¿ rule yields a notable improvement in the performance of an agent once an opponent¿s style is understood. A hybrid of the Bayesian style predictor and a neuroevolutionary approach is shown to lead to effective dynamic play, in comparison to agents that do not use an opponent model. The use of recurrence in evolved networks is also shown to improve the performance and generalizability of an agent in a multiplayer environment. These strategies are then employed in the full-scale environment of Texas Hold¿em, where a betting round-based approach proves useful in determining and counteracting an opponent¿s play. It is shown that the use of opponent models, with the adaptive benefits of neuroevolution aid the performance of an agent, even when the behaviour of an opponent does not necessarily fit within the strict definitions of opponent ¿style¿.Engineering and Physical Sciences Research Council (EPSRC

Bradford Scholars

POKERFACE: EMOTION BASED GAME-PLAY TECHNIQUES FOR COMPUTER POKER PLAYERS

Author: Cockerham Lucas
Publication venue: UKnowledge
Publication date: 01/01/2004
Field of study

Numerous algorithms/methods exist for creating computer poker players. This thesis comparesand contrasts them. A set of poker agents for the system PokerFace are then introduced. A surveyof the problem of facial expression recognition is included in the hopes it may be used to build abetter computer poker player

University of Kentucky

Towards a theory of deception

Author: David Ettinger
Philippe Jehiel
Publication venue
Publication date
Field of study

This paper proposes an equilibrium approach to deception where deception is defined to be the process by which actions are chosen to induce erroneous inferences so as to take advantage of them. Specifically, we introduce a framework with boundedly rational players in which agents make inferences based on a coarse information about others' behaviors: Agents are assumed to know only the average reaction function of other agents over groups of situations. Equilibrium requires that the coarse information available to agents is correct, and that inferences and optimizations are made based on the simplest theories compatible with the available information. We illustrate the phenomenon of deception and how reputation concerns may arise even in zero-sum games in which there is no value to commitment. We further illustrate how the possibility of deception affects standard economic insights through a number of stylized applications including a monitoring game and two simple bargaining games. The approach can be viewed as formalizing into a game theoretic setting a well documented bias in social psychology, the Fundamental Attribution Error.deception ; game theory ; fundamental attribution error

Research Papers in Economics

Opponent Modelling in Multi-Agent Systems

Author: Tian Zheng
Publication venue: UCL (University College London)
Publication date: 28/11/2021
Field of study

Reinforcement Learning (RL) formalises a problem where an intelligent agent needs to learn and achieve certain goals by maximising a long-term return in an environment. Multi-agent reinforcement learning (MARL) extends traditional RL to multiple agents. Many RL algorithms lose convergence guarantee in non-stationary environments due to the adaptive opponents. Partial observation caused by agents’ different private observations introduces high variance during the training which exacerbates the data inefficiency. In MARL, training an agent to perform well against a set of opponents often leads to bad performance against another set of opponents. Non-stationarity, partial observation and unclear learning objective are three critical problems in MARL which hinder agents’ learning and they all share a cause which is the lack of knowledge of the other agents. Therefore, in this thesis, we propose to solve these problems with opponent modelling methods. We tailor our solutions by combining opponent modelling with other techniques according to the characteristics of problems we face. Specifically, we first propose ROMMEO, an algorithm inspired by Bayesian inference, as a solution to alleviate the non-stationarity in cooperative games. Then we study the partial observation problem caused by agents’ private observation and design an implicit communication training method named PBL. Lastly, we investigate solutions to the non-stationarity and unclear learning objective problems in zero-sum games. We propose a solution named EPSOM which aims for finding safe exploitation strategies to play against non-stationary opponents. We verify our proposed methods by varied experiments and show they can achieve the desired performance. Limitations and future works are discussed in the last chapter of this thesis

UCL Discovery