18 research outputs found
Deep Reinforcement Learning from Self-Play in Imperfect-Information Games
Many real-world applications can be described as large-scale games of
imperfect information. To deal with these challenging domains, prior work has
focused on computing Nash equilibria in a handcrafted abstraction of the
domain. In this paper we introduce the first scalable end-to-end approach to
learning approximate Nash equilibria without prior domain knowledge. Our method
combines fictitious self-play with deep reinforcement learning. When applied to
Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium,
whereas common reinforcement learning methods diverged. In Limit Texas Hold'em,
a poker game of real-world scale, NFSP learnt a strategy that approached the
performance of state-of-the-art, superhuman algorithms based on significant
domain expertise.
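The core idea of NFSP can be illustrated in a toy sketch: each agent keeps a best-response policy (learned here by tabular Q-learning, standing in for the paper's deep Q-network) and an average policy (learned by imitating the agent's own best-response actions), and mixes them via an anticipatory parameter. Everything below, including the class name and parameter values, is an illustrative assumption, not the paper's implementation.

```python
import random
from collections import defaultdict

class NFSPAgentSketch:
    """Toy sketch of the NFSP idea: mix a best-response policy
    (Q-learning) with an average policy (imitation of the agent's
    own best-response actions)."""

    def __init__(self, actions, eta=0.1, alpha=0.5, epsilon=0.1, seed=0):
        self.actions = actions
        self.eta = eta          # anticipatory parameter: P(play best response)
        self.alpha = alpha      # Q-learning step size
        self.epsilon = epsilon  # exploration rate inside the best response
        self.q = defaultdict(float)                          # (state, action) -> value
        self.counts = defaultdict(lambda: defaultdict(int))  # state -> action counts
        self.rng = random.Random(seed)

    def best_response(self, state):
        # epsilon-greedy action with respect to the current Q-values
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def average_policy(self, state):
        # sample an action in proportion to how often the best response
        # has chosen it in this state (the "average strategy")
        counts = self.counts[state]
        total = sum(counts.values())
        if total == 0:
            return self.rng.choice(self.actions)
        r = self.rng.random() * total
        for a in self.actions:
            r -= counts[a]
            if r < 0:
                return a
        return self.actions[-1]

    def act(self, state):
        if self.rng.random() < self.eta:
            a = self.best_response(state)
            self.counts[state][a] += 1  # supervised target: imitate best response
            return a
        return self.average_policy(state)

    def update_q(self, state, action, reward, next_state):
        # one-step Q-learning update (undiscounted for simplicity)
        target = reward + max(self.q[(next_state, a)] for a in self.actions)
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

The mixture is what lets the average policy converge toward a Nash equilibrium in self-play while the best-response component keeps adapting to it.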
Independent learners in abstract traffic scenarios
Traffic is a phenomenon that emerges from individual, uncoordinated and, most of the time, selfish route choices made by drivers. In general, this leads to poor global and individual performance regarding travel times and road network load balance. This work presents a reinforcement learning based approach for route choice which relies solely on drivers' experience to guide their decisions. There is no coordinated learning mechanism, thus driver agents are independent learners. Our approach is tested on two abstract traffic scenarios and compared to other route choice methods. Experimental results show that drivers learn routes in complex scenarios with no prior knowledge. Moreover, the approach outperforms the compared route choice methods regarding drivers' travel time, and satisfactory performance is achieved regarding road network load balance. The simplicity, realistic assumptions and performance of the proposed approach suggest that it is a feasible candidate for implementation in navigation systems for guiding drivers' route choice decisions.
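An independent learner in this setting can be sketched very compactly: each driver keeps a value estimate per route, updated only from its own experienced travel times, with no communication between drivers. The function below is a minimal illustration under assumed parameters (the paper's actual scenarios and learning rule are not reproduced here); `travel_time` is a hypothetical stand-in for the travel time the driver experiences.

```python
import random

def learn_route_choice(routes, travel_time, episodes=500,
                       alpha=0.1, epsilon=0.1, seed=0):
    """Minimal sketch of one independent-learner driver choosing
    among routes: epsilon-greedy selection over per-route value
    estimates, reward = negative experienced travel time."""
    rng = random.Random(seed)
    q = {r: 0.0 for r in routes}
    for _ in range(episodes):
        if rng.random() < epsilon:
            route = rng.choice(routes)          # occasionally explore
        else:
            route = max(routes, key=lambda r: q[r])
        reward = -travel_time(route)            # shorter trip -> higher reward
        q[route] += alpha * (reward - q[route]) # incremental value update
    return q

# hypothetical scenario: route "A" is consistently faster than route "B"
q = learn_route_choice(["A", "B"], lambda r: 10 if r == "A" else 20)
```

With a fixed faster route, the learned values come to favor it, even though the driver never observes anything beyond its own trips.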
Quantifying the Impact of Non-Stationarity in Reinforcement Learning-Based Traffic Signal Control
In reinforcement learning (RL), dealing with non-stationarity is a
challenging issue. However, some domains such as traffic optimization are
inherently non-stationary. Causes for and effects of this are manifold. In
particular, when dealing with traffic signal controls, addressing
non-stationarity is key since traffic conditions change over time and as a
function of traffic control decisions taken in other parts of a network. In
this paper we analyze the effects that different sources of non-stationarity
have in a network of traffic signals, in which each signal is modeled as a
learning agent. More precisely, we study both the effects of changing the
\textit{context} in which an agent learns (e.g., a change in flow rates
experienced by it), as well as the effects of reducing agent observability of
the true environment state. Partial observability may cause distinct states (in
which distinct actions are optimal) to be seen as the same by the traffic
signal agents. This, in turn, may lead to sub-optimal performance. We show that
the lack of suitable sensors to provide a representative observation of the
real state seems to affect the performance more drastically than the changes to
the underlying traffic patterns.
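The state-aliasing effect described in the abstract can be demonstrated with a toy experiment: two traffic states that require opposite actions are either observed directly or collapsed into a single observation by an uninformative sensor. The scenario, state names, and reward scheme below are illustrative assumptions, not the paper's experimental setup.

```python
import random
from collections import defaultdict

def q_learning_avg_reward(observe, episodes=2000, alpha=0.1, seed=0):
    """Toy demonstration of partial observability: two true states,
    'heavy_NS' and 'heavy_EW', reward opposite actions. `observe`
    is an assumed sensor mapping from true state to observation."""
    rng = random.Random(seed)
    actions = ["green_NS", "green_EW"]

    def reward(state, action):
        # reward 1 for giving green to the heavy direction, else 0
        return 1.0 if action.endswith(state.split("_")[1]) else 0.0

    q = defaultdict(float)
    total = 0.0
    for _ in range(episodes):
        state = rng.choice(["heavy_NS", "heavy_EW"])
        obs = observe(state)
        if rng.random() < 0.1:                  # epsilon-greedy exploration
            action = rng.choice(actions)
        else:
            action = max(actions, key=lambda a: q[(obs, a)])
        r = reward(state, action)
        q[(obs, action)] += alpha * (r - q[(obs, action)])
        total += r
    return total / episodes

full = q_learning_avg_reward(lambda s: s)         # sensors reveal the true state
aliased = q_learning_avg_reward(lambda s: "same") # both states look identical
```

With full observability the agent learns the correct action per state; under aliasing the two states are indistinguishable, so no policy over observations can beat a coin flip, mirroring the sub-optimal performance the abstract attributes to inadequate sensors.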