369 research outputs found
Simulation of a Texas Hold'Em poker player
Copyright 2011 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. This is the accepted version of the article. The published version is available at
Multiparty Dynamics and Failure Modes for Machine Learning and Artificial Intelligence
An important challenge for safety in machine learning and artificial
intelligence systems is a set of related failures involving specification
gaming, reward hacking, fragility to distributional shifts, and Goodhart's or
Campbell's law. This paper presents additional failure modes for interactions
within multi-agent systems that are closely related. These multi-agent failure
modes are more complex, more problematic, and less well understood than the
single-agent case, and are also already occurring, largely unnoticed. After
motivating the discussion with examples from poker-playing artificial
intelligence (AI), the paper explains why these failure modes are in some
senses unavoidable. Following this, the paper categorizes failure modes,
provides definitions, and cites examples for each of the modes: accidental
steering, coordination failures, adversarial misalignment, input spoofing and
filtering, and goal co-option or direct hacking. The paper then discusses how
extant literature on multi-agent AI fails to address these failure modes, and
identifies work which may be useful for the mitigation of these failure modes.
Comment: 12 pages. This version re-submitted to Big Data and Cognitive
Computing, Special Issue "Artificial Superintelligence: Coordination &
Strategy"
Traditional Wisdom and Monte Carlo Tree Search Face-to-Face in the Card Game Scopone
We present the design of a competitive artificial intelligence for Scopone, a
popular Italian card game. We compare rule-based players using the most
established strategies (one for beginners and two for advanced players) against
players using Monte Carlo Tree Search (MCTS) and Information Set Monte Carlo
Tree Search (ISMCTS) with different reward functions and simulation strategies.
MCTS requires complete information about the game state and thus implements a
cheating player while ISMCTS can deal with incomplete information and thus
implements a fair player. Our results show that, as expected, the cheating MCTS
outperforms all the other strategies; ISMCTS is stronger than all the
rule-based players implementing well-known and most advanced strategies and it
also turns out to be a challenging opponent for human players.
Comment: Preprint. Accepted for publication in the IEEE Transaction on Game
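The difference between the cheating and the fair player can be illustrated with a minimal sketch. It assumes the commonly described form of ISMCTS, in which each iteration samples a "determinization": a full game state consistent with what the player has actually observed. The function name `sample_determinization`, its parameters, and the toy deck are hypothetical and are not taken from the paper.

```python
import random


def sample_determinization(deck, my_hand, seen_cards, opponent_hand_sizes, rng=None):
    """Deal the cards this player cannot see at random to the opponents.

    A cheating player would read the true hands directly; a fair player
    only knows its own hand and the cards already revealed, so it samples
    one possible assignment of the remaining cards.
    """
    rng = rng or random.Random()
    known = set(my_hand) | set(seen_cards)
    unseen = [card for card in deck if card not in known]
    if sum(opponent_hand_sizes) > len(unseen):
        raise ValueError("not enough unseen cards to fill the opponents' hands")
    rng.shuffle(unseen)
    hands, start = [], 0
    for size in opponent_hand_sizes:
        hands.append(unseen[start:start + size])
        start += size
    return hands


# A toy 12-card deck stands in for the real one (assumption for illustration).
deck = [(rank, suit) for suit in "AB" for rank in range(1, 7)]
my_hand = [(1, "A"), (2, "A"), (3, "A")]
seen = [(4, "A")]
hands = sample_determinization(deck, my_hand, seen, [3, 3], random.Random(0))
```

Each search iteration would call this again, so the statistics gathered in the tree average over many plausible hidden hands rather than relying on the true one.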
Bayesian opponent modeling in adversarial game environments.
This thesis investigates the use of Bayesian analysis of an opponent's behaviour in order to determine the goals or strategy of a given adversary. A terrain analysis approach utilising the A* algorithm is investigated, where a probability distribution between discrete behaviours of an opponent relative to a set of possible goals is generated. The Bayesian analysis of agent behaviour accurately determines the intended goal of an opponent agent, even when the opponent's actions are altered randomly. The environment of Poker is introduced and abstracted for ease of analysis. Bayes' theorem is used to generate an effective opponent model, categorizing behaviour according to its similarity with known styles of opponent. The accuracy of Bayes' rule yields a notable improvement in the performance of an agent once an opponent's style is understood. A hybrid of the Bayesian style predictor and a neuroevolutionary approach is shown to lead to effective dynamic play, in comparison to agents that do not use an opponent model. The use of recurrence in evolved networks is also shown to improve the performance and generalizability of an agent in a multiplayer environment. These strategies are then employed in the full-scale environment of Texas Hold'em, where a betting round-based approach proves useful in determining and counteracting an opponent's play. It is shown that opponent models, combined with the adaptive benefits of neuroevolution, aid the performance of an agent, even when the behaviour of an opponent does not necessarily fit within the strict definitions of opponent 'style'.
Engineering and Physical Sciences Research Council (EPSRC)
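As a rough illustration of categorizing an opponent by similarity to known styles, the sketch below applies Bayes' rule to a sequence of observed betting actions. The style names, the action probabilities, and the function `update_posterior` are illustrative assumptions, not values or code from the thesis.

```python
def update_posterior(prior, likelihoods, action):
    """One Bayes' rule step: P(style | action) is proportional to
    P(action | style) * P(style)."""
    unnormalised = {
        style: prior[style] * likelihoods[style][action] for style in prior
    }
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("observed action has zero probability under every style")
    return {style: p / total for style, p in unnormalised.items()}


# Hypothetical P(action | style) tables; each row sums to 1.
STYLES = {
    "tight": {"fold": 0.6, "call": 0.3, "raise": 0.1},
    "loose": {"fold": 0.2, "call": 0.5, "raise": 0.3},
    "aggressive": {"fold": 0.2, "call": 0.2, "raise": 0.6},
}

# Start from a uniform prior and fold in each observed action in turn.
posterior = {style: 1 / len(STYLES) for style in STYLES}
for action in ["raise", "raise", "call"]:
    posterior = update_posterior(posterior, STYLES, action)

most_likely_style = max(posterior, key=posterior.get)
```

Because the update is incremental, the posterior can be refreshed after every betting round, and an opponent whose play fits no single style simply ends up with probability spread across several of them.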
POKERFACE: EMOTION BASED GAME-PLAY TECHNIQUES FOR COMPUTER POKER PLAYERS
Numerous algorithms/methods exist for creating computer poker players. This thesis compares and contrasts them. A set of poker agents for the system PokerFace are then introduced. A survey of the problem of facial expression recognition is included in the hopes it may be used to build a better computer poker player.
Towards a theory of deception
This paper proposes an equilibrium approach to deception where deception is defined to be the process by which actions are chosen to induce erroneous inferences so as to take advantage of them. Specifically, we introduce a framework with boundedly rational players in which agents make inferences based on coarse information about others' behaviors: agents are assumed to know only the average reaction function of other agents over groups of situations. Equilibrium requires that the coarse information available to agents is correct, and that inferences and optimizations are made based on the simplest theories compatible with the available information. We illustrate the phenomenon of deception and how reputation concerns may arise even in zero-sum games in which there is no value to commitment. We further illustrate how the possibility of deception affects standard economic insights through a number of stylized applications including a monitoring game and two simple bargaining games. The approach can be viewed as formalizing into a game theoretic setting a well documented bias in social psychology, the Fundamental Attribution Error.
Keywords: deception; game theory; fundamental attribution error
Opponent Modelling in Multi-Agent Systems
Reinforcement Learning (RL) formalises a problem where an intelligent agent needs to learn and achieve certain goals by maximising a long-term return in an environment. Multi-agent reinforcement learning (MARL) extends traditional RL to multiple agents. Many RL algorithms lose their convergence guarantees in non-stationary environments because the opponents adapt. Partial observation, caused by agents' different private observations, introduces high variance during training, which exacerbates data inefficiency. In MARL, training an agent to perform well against one set of opponents often leads to bad performance against another set of opponents. Non-stationarity, partial observation and an unclear learning objective are three critical problems in MARL which hinder agents' learning, and they all share a cause: the lack of knowledge of the other agents. Therefore, in this thesis, we propose to solve these problems with opponent modelling methods. We tailor our solutions by combining opponent modelling with other techniques according to the characteristics of the problems we face. Specifically, we first propose ROMMEO, an algorithm inspired by Bayesian inference, as a solution to alleviate non-stationarity in cooperative games. Then we study the partial observation problem caused by agents' private observations and design an implicit communication training method named PBL. Lastly, we investigate solutions to the non-stationarity and unclear learning objective problems in zero-sum games. We propose a solution named EPSOM which aims to find safe exploitation strategies to play against non-stationary opponents. We verify our proposed methods with varied experiments and show that they can achieve the desired performance. Limitations and future work are discussed in the last chapter of this thesis.