18 research outputs found
Deep Reinforcement Learning from Self-Play in Imperfect-Information Games
Many real-world applications can be described as large-scale games of
imperfect information. To deal with these challenging domains, prior work has
focused on computing Nash equilibria in a handcrafted abstraction of the
domain. In this paper we introduce the first scalable end-to-end approach to
learning approximate Nash equilibria without prior domain knowledge. Our method
combines fictitious self-play with deep reinforcement learning. When applied to
Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium,
whereas common reinforcement learning methods diverged. In Limit Texas Hold'em,
a poker game of real-world scale, NFSP learnt a strategy that approached the
performance of state-of-the-art, superhuman algorithms based on significant
domain expertise.
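The core idea of NFSP can be illustrated in a toy sketch: each agent keeps a best-response policy (learned here by tabular Q-learning, standing in for the paper's deep Q-network) and an average policy (learned by imitating the agent's own best-response actions), and mixes them via an anticipatory parameter. Everything below, including the class name and parameter values, is an illustrative assumption, not the paper's implementation.

```python
import random
from collections import defaultdict

class NFSPAgentSketch:
    """Toy sketch of the NFSP idea: mix a best-response policy
    (Q-learning) with an average policy (imitation of the agent's
    own best-response actions)."""

    def __init__(self, actions, eta=0.1, alpha=0.5, epsilon=0.1, seed=0):
        self.actions = actions
        self.eta = eta          # anticipatory parameter: P(play best response)
        self.alpha = alpha      # Q-learning step size
        self.epsilon = epsilon  # exploration rate inside the best response
        self.q = defaultdict(float)                          # (state, action) -> value
        self.counts = defaultdict(lambda: defaultdict(int))  # state -> action counts
        self.rng = random.Random(seed)

    def best_response(self, state):
        # epsilon-greedy action with respect to the current Q-values
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def average_policy(self, state):
        # sample an action in proportion to how often the best response
        # has chosen it in this state (the "average strategy")
        counts = self.counts[state]
        total = sum(counts.values())
        if total == 0:
            return self.rng.choice(self.actions)
        r = self.rng.random() * total
        for a in self.actions:
            r -= counts[a]
            if r < 0:
                return a
        return self.actions[-1]

    def act(self, state):
        if self.rng.random() < self.eta:
            a = self.best_response(state)
            self.counts[state][a] += 1  # supervised target: imitate best response
            return a
        return self.average_policy(state)

    def update_q(self, state, action, reward, next_state):
        # one-step Q-learning update (undiscounted for simplicity)
        target = reward + max(self.q[(next_state, a)] for a in self.actions)
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

The mixture is what lets the average policy converge toward a Nash equilibrium in self-play while the best-response component keeps adapting to it.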
Independent learners in abstract traffic scenarios
Traffic is a phenomenon that emerges from individual, uncoordinated and, most of the time, selfish route choices made by drivers. In general, this leads to poor global and individual performance regarding travel times and road network load balance. This work presents a reinforcement learning based approach for route choice which relies solely on drivers' experience to guide their decisions. There is no coordinated learning mechanism, thus driver agents are independent learners. Our approach is tested on two abstract traffic scenarios and compared to other route choice methods. Experimental results show that drivers learn routes in complex scenarios with no prior knowledge. Moreover, the approach outperforms the compared route choice methods regarding drivers' travel time, and satisfactory performance is achieved regarding road network load balance. The simplicity, realistic assumptions and performance of the proposed approach suggest that it is a feasible candidate for implementation in navigation systems for guiding drivers' route choice decisions.
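An independent learner in this setting can be sketched very compactly: each driver keeps a value estimate per route, updated only from its own experienced travel times, with no communication between drivers. The function below is a minimal illustration under assumed parameters (the paper's actual scenarios and learning rule are not reproduced here); `travel_time` is a hypothetical stand-in for the travel time the driver experiences.

```python
import random

def learn_route_choice(routes, travel_time, episodes=500,
                       alpha=0.1, epsilon=0.1, seed=0):
    """Minimal sketch of one independent-learner driver choosing
    among routes: epsilon-greedy selection over per-route value
    estimates, reward = negative experienced travel time."""
    rng = random.Random(seed)
    q = {r: 0.0 for r in routes}
    for _ in range(episodes):
        if rng.random() < epsilon:
            route = rng.choice(routes)          # occasionally explore
        else:
            route = max(routes, key=lambda r: q[r])
        reward = -travel_time(route)            # shorter trip -> higher reward
        q[route] += alpha * (reward - q[route]) # incremental value update
    return q

# hypothetical scenario: route "A" is consistently faster than route "B"
q = learn_route_choice(["A", "B"], lambda r: 10 if r == "A" else 20)
```

With a fixed faster route, the learned values come to favor it, even though the driver never observes anything beyond its own trips.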
Quantifying the Impact of Non-Stationarity in Reinforcement Learning-Based Traffic Signal Control
In reinforcement learning (RL), dealing with non-stationarity is a
challenging issue. However, some domains such as traffic optimization are
inherently non-stationary. Causes for and effects of this are manifold. In
particular, when dealing with traffic signal controls, addressing
non-stationarity is key since traffic conditions change over time and as a
function of traffic control decisions taken in other parts of a network. In
this paper we analyze the effects that different sources of non-stationarity
have in a network of traffic signals, in which each signal is modeled as a
learning agent. More precisely, we study both the effects of changing the
\textit{context} in which an agent learns (e.g., a change in flow rates
experienced by it), as well as the effects of reducing agent observability of
the true environment state. Partial observability may cause distinct states (in
which distinct actions are optimal) to be seen as the same by the traffic
signal agents. This, in turn, may lead to sub-optimal performance. We show that
the lack of suitable sensors to provide a representative observation of the
real state seems to affect the performance more drastically than the changes to
the underlying traffic patterns.
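The state-aliasing effect described in the abstract can be demonstrated with a toy experiment: two traffic states that require opposite actions are either observed directly or collapsed into a single observation by an uninformative sensor. The scenario, state names, and reward scheme below are illustrative assumptions, not the paper's experimental setup.

```python
import random
from collections import defaultdict

def q_learning_avg_reward(observe, episodes=2000, alpha=0.1, seed=0):
    """Toy demonstration of partial observability: two true states,
    'heavy_NS' and 'heavy_EW', reward opposite actions. `observe`
    is an assumed sensor mapping from true state to observation."""
    rng = random.Random(seed)
    actions = ["green_NS", "green_EW"]

    def reward(state, action):
        # reward 1 for giving green to the heavy direction, else 0
        return 1.0 if action.endswith(state.split("_")[1]) else 0.0

    q = defaultdict(float)
    total = 0.0
    for _ in range(episodes):
        state = rng.choice(["heavy_NS", "heavy_EW"])
        obs = observe(state)
        if rng.random() < 0.1:                  # epsilon-greedy exploration
            action = rng.choice(actions)
        else:
            action = max(actions, key=lambda a: q[(obs, a)])
        r = reward(state, action)
        q[(obs, action)] += alpha * (r - q[(obs, action)])
        total += r
    return total / episodes

full = q_learning_avg_reward(lambda s: s)         # sensors reveal the true state
aliased = q_learning_avg_reward(lambda s: "same") # both states look identical
```

With full observability the agent learns the correct action per state; under aliasing the two states are indistinguishable, so no policy over observations can beat a coin flip, mirroring the sub-optimal performance the abstract attributes to inadequate sensors.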