Search CORE

209 research outputs found

Deep Reinforcement Learning from Self-Play in Imperfect-Information Games

Author: Heinrich Johannes
Silver David
Publication venue
Publication date: 03/03/2016
Field of study

Many real-world applications can be described as large-scale games of imperfect information. To deal with these challenging domains, prior work has focused on computing Nash equilibria in a handcrafted abstraction of the domain. In this paper we introduce the first scalable end-to-end approach to learning approximate Nash equilibria without prior domain knowledge. Our method combines fictitious self-play with deep reinforcement learning. When applied to Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium, whereas common reinforcement learning methods diverged. In Limit Texas Holdem, a poker game of real-world scale, NFSP learnt a strategy that approached the performance of state-of-the-art, superhuman algorithms based on significant domain expertise.Comment: updated version, incorporating conference feedbac

arXiv.org e-Print Archive

UCL Discovery

Rational bidding using reinforcement learning: an application in automated resource allocation

Author: A. Sherstov
C. Watkins
D. Gode
D. Reeves
E. Medernach
H.J. Herik van den
I. Erev
K. Lai
L. Panait
M. He
M. Kearns
M. Wellman
P. Green
R. Luce
S. Kaplan
T. Saaty
W. Smith
Y. Shoham
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008
Field of study

The application of autonomous agents by the provisioning and usage of computational resources is an attractive research field. Various methods and technologies in the area of artificial intelligence, statistics and economics are playing together to achieve i) autonomic resource provisioning and usage of computational resources, to invent ii) competitive bidding strategies for widely used market mechanisms and to iii) incentivize consumers and providers to use such market-based systems. The contributions of the paper are threefold. First, we present a framework for supporting consumers and providers in technical and economic preference elicitation and the generation of bids. Secondly, we introduce a consumer-side reinforcement learning bidding strategy which enables rational behavior by the generation and selection of bids. Thirdly, we evaluate and compare this bidding strategy against a truth-telling bidding strategy for two kinds of market mechanisms – one centralized and one decentralized

Crossref

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems.

Author: Laurent Guillaume J.
Le Fort-Piat Nadine
Matignon Laëtitia
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 06/03/2012
Field of study

International audienceIn the framework of fully cooperative multi-agent systems, independent (non-communicative) agents that learn by reinforcement must overcome several difficulties to manage to coordinate. This paper identifies several challenges responsible for the non-coordination of independent agents: Pareto-selection, nonstationarity, stochasticity, alter-exploration and shadowed equilibria. A selection of multi-agent domains is classified according to those challenges: matrix games, Boutilier's coordination game, predators pursuit domains and a special multi-state game. Moreover the performance of a range of algorithms for independent reinforcement learners is evaluated empirically. Those algorithms are Q-learning variants: decentralized Q-learning, distributed Q-learning, hysteretic Q-learning, recursive FMQ and WoLF PHC. An overview of the learning algorithms' strengths and weaknesses against each challenge concludes the paper and can serve as a basis for choosing the appropriate algorithm for a new domain. Furthermore, the distilled challenges may assist in the design of new learning algorithms that overcome these problems and achieve higher performance in multi-agent applications

HAL - Université de Franche-Comté

Intrinsic fluctuations of reinforcement learning promote cooperation

Author: Barfuss Wolfram
Meylahn Janusz M.
Publication venue
Publication date: 01/09/2022
Field of study

In this work, we ask for and answer what makes classical reinforcement learning cooperative. Cooperating in social dilemma situations is vital for animals, humans, and machines. While evolutionary theory revealed a range of mechanisms promoting cooperation, the conditions under which agents learn to cooperate are contested. Here, we demonstrate which and how individual elements of the multi-agent learning setting lead to cooperation. Specifically, we consider the widely used temporal-difference reinforcement learning algorithm with epsilon-greedy exploration in the classic environment of an iterated Prisoner's dilemma with one-period memory. Each of the two learning agents learns a strategy that conditions the following action choices on both agents' action choices of the last round. We find that next to a high caring for future rewards, a low exploration rate, and a small learning rate, it is primarily intrinsic stochastic fluctuations of the reinforcement learning process which double the final rate of cooperation to up to 80\%. Thus, inherent noise is not a necessary evil of the iterative learning process. It is a critical asset for the learning of cooperation. However, we also point out the trade-off between a high likelihood of cooperative behavior and achieving this in a reasonable amount of time. Our findings are relevant for purposefully designing cooperative algorithms and regulating undesired collusive effects.Comment: 9 pages, 4 figure

arXiv.org e-Print Archive

University of Twente Research Information

Segregation Dynamics with Reinforcement Learning and Agent Based Modeling

Author: Bar-Yam Yaneer
Morales Alfredo J.
Sert Egemen
Publication venue
Publication date: 18/09/2019
Field of study

Societies are complex. Properties of social systems can be explained by the interplay and weaving of individual actions. Incentives are key to understand people's choices and decisions. For instance, individual preferences of where to live may lead to the emergence of social segregation. In this paper, we combine Reinforcement Learning (RL) with Agent Based Models (ABM) in order to address the self-organizing dynamics of social segregation and explore the space of possibilities that emerge from considering different types of incentives. Our model promotes the creation of interdependencies and interactions among multiple agents of two different kinds that want to segregate from each other. For this purpose, agents use Deep Q-Networks to make decisions based on the rules of the Schelling Segregation model and the Predator-Prey model. Despite the segregation incentive, our experiments show that spatial integration can be achieved by establishing interdependencies among agents of different kinds. They also reveal that segregated areas are more probable to host older people than diverse areas, which attract younger ones. Through this work, we show that the combination of RL and ABMs can create an artificial environment for policy makers to observe potential and existing behaviors associated to incentives.Comment: 14 pages, 4 figures + supplemental material, in revie

arXiv.org e-Print Archive

OpenMETU (Middle East Technical University)

Q-Strategy: A Bidding Strategy for Market-Based Allocation of Grid Services

Author: A. Sherstov
C. Watkins
D. Cliff
D. Gode
D. Minoli
E. Medernach
H.J. Herik van den
I. Erev
K. Lai
L. Panait
M. He
M. Wellman
P. Green
R. Luce
R. Wolski
S. Gjerstad
S. Kaplan
T. Saaty
W. Smith
Y. Shoham
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008
Field of study

The application of autonomous agents by the provisioning and usage of computational services is an attractive research field. Various methods and technologies in the area of artificial intelligence, statistics and economics are playing together to achieve i) autonomic service provisioning and usage of Grid services, to invent ii) competitive bidding strategies for widely used market mechanisms and to iii) incentivize consumers and providers to use such market-based systems. The contributions of the paper are threefold. First, we present a bidding agent framework for implementing artificial bidding agents, supporting consumers and providers in technical and economic preference elicitation as well as automated bid generation by the requesting and provisioning of Grid services. Secondly, we introduce a novel consumer-side bidding strategy, which enables a goal-oriented and strategic behavior by the generation and submission of consumer service requests and selection of provider offers. Thirdly, we evaluate and compare the Q-strategy, implemented within the presented framework, against the Truth-Telling bidding strategy in three mechanisms – a centralized CDA, a decentralized on-line machine scheduling and a FIFO-scheduling mechanisms

Crossref

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive