18 research outputs found

    Deep Reinforcement Learning from Self-Play in Imperfect-Information Games

    Many real-world applications can be described as large-scale games of imperfect information. To deal with these challenging domains, prior work has focused on computing Nash equilibria in a handcrafted abstraction of the domain. In this paper we introduce the first scalable end-to-end approach to learning approximate Nash equilibria without prior domain knowledge. Our method combines fictitious self-play with deep reinforcement learning. When applied to Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium, whereas common reinforcement learning methods diverged. In Limit Texas Hold'em, a poker game of real-world scale, NFSP learnt a strategy that approached the performance of state-of-the-art, superhuman algorithms based on significant domain expertise.
    Comment: updated version, incorporating conference feedback
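    The core mechanism is easy to sketch: each agent keeps two policies, a best response learned by reinforcement learning and an average policy learned by supervised learning on the agent's own best-response actions, mixed via an anticipatory parameter. The tabular sketch below is a simplification for illustration only; the paper uses deep networks, a DQN-style best response, and reservoir sampling for the supervised buffer, and the hyperparameters eta, eps, and alpha here are assumptions.

```python
import random
from collections import defaultdict

class NFSPAgent:
    """Minimal tabular sketch of the NFSP idea (the paper uses neural networks)."""

    def __init__(self, n_actions, eta=0.1, eps=0.1, alpha=0.1, gamma=1.0):
        self.n_actions = n_actions
        self.eta = eta      # anticipatory parameter: prob. of acting via best response
        self.eps = eps      # exploration rate for the best-response policy
        self.alpha = alpha  # Q-learning step size
        self.gamma = gamma
        self.q = defaultdict(lambda: [0.0] * n_actions)         # best-response values
        self.avg_counts = defaultdict(lambda: [1] * n_actions)  # average-policy counts

    def act(self, state):
        if random.random() < self.eta:
            # Best-response mode: epsilon-greedy on Q, recording the chosen
            # action as a supervised target for the average policy.
            if random.random() < self.eps:
                a = random.randrange(self.n_actions)
            else:
                a = max(range(self.n_actions), key=lambda i: self.q[state][i])
            self.avg_counts[state][a] += 1
        else:
            # Average-policy mode: sample from the normalized action counts.
            a = random.choices(range(self.n_actions),
                               weights=self.avg_counts[state])[0]
        return a

    def learn(self, s, a, r, s_next, done):
        # One-step Q-learning update toward the best-response values.
        target = r if done else r + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])
```

    In self-play, every player runs such an agent simultaneously; as in fictitious play, it is the average policies, not the best responses, that are expected to approach the Nash equilibrium.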

    Independent learners in abstract traffic scenarios

    Traffic is a phenomenon that emerges from individual, uncoordinated and, most of the time, selfish route choices made by drivers. In general, this leads to poor global and individual performance regarding travel times and road network load balance. This work presents a reinforcement learning based approach for route choice which relies solely on drivers' experience to guide their decisions. There is no coordinated learning mechanism; thus driver agents are independent learners. Our approach is tested on two abstract traffic scenarios and compared to other route choice methods. Experimental results show that drivers learn routes in complex scenarios with no prior knowledge. Moreover, the approach outperforms the compared route choice methods regarding drivers' travel time, and satisfactory performance is achieved regarding road network load balance. The simplicity, realistic assumptions and performance of the proposed approach suggest that it is a feasible candidate for implementation in navigation systems for guiding drivers' route choice decisions.
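    A toy version of this setup is easy to reproduce: each driver is a stateless, independent Q-learner choosing among routes, the reward is the negative of the experienced travel time, and congestion emerges from everyone's joint choices. The sketch below is an illustration under assumed parameters (two routes, a BPR-style congestion function), not the paper's exact scenarios.

```python
import random

N_DRIVERS, N_ROUTES, EPISODES = 100, 2, 500
FREE_FLOW = [10.0, 15.0]   # assumed free-flow travel times per route
CAPACITY = [50.0, 80.0]    # assumed route capacities

def travel_time(route, load):
    # BPR-like congestion function (an assumption, not the paper's model).
    return FREE_FLOW[route] * (1.0 + (load / CAPACITY[route]) ** 2)

class Driver:
    """Independent learner: no coordination, no knowledge of other drivers."""
    def __init__(self):
        self.q = [0.0] * N_ROUTES

    def choose(self, eps=0.1):
        if random.random() < eps:
            return random.randrange(N_ROUTES)
        return max(range(N_ROUTES), key=lambda r: self.q[r])

    def update(self, route, reward, alpha=0.1):
        self.q[route] += alpha * (reward - self.q[route])

drivers = [Driver() for _ in range(N_DRIVERS)]
for _ in range(EPISODES):
    choices = [d.choose() for d in drivers]
    loads = [choices.count(r) for r in range(N_ROUTES)]
    for driver, route in zip(drivers, choices):
        # Each driver learns solely from its own experienced travel time.
        driver.update(route, -travel_time(route, loads[route]))

print("final route loads:", loads)
```

    With these assumed numbers the population typically settles near a split where both routes offer similar travel times, a user-equilibrium-like outcome, even though no driver coordinates with any other.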

    Quantifying the Impact of Non-Stationarity in Reinforcement Learning-Based Traffic Signal Control

    In reinforcement learning (RL), dealing with non-stationarity is a challenging issue. However, some domains, such as traffic optimization, are inherently non-stationary. Causes for and effects of this are manifold. In particular, when dealing with traffic signal control, addressing non-stationarity is key, since traffic conditions change over time and as a function of traffic control decisions taken in other parts of the network. In this paper we analyze the effects that different sources of non-stationarity have in a network of traffic signals, in which each signal is modeled as a learning agent. More precisely, we study both the effects of changing the context in which an agent learns (e.g., a change in the flow rates it experiences), as well as the effects of reducing the agent's observability of the true environment state. Partial observability may cause distinct states (in which distinct actions are optimal) to be seen as the same by the traffic signal agents. This, in turn, may lead to sub-optimal performance. We show that the lack of suitable sensors to provide a representative observation of the real state seems to affect performance more drastically than changes to the underlying traffic patterns.
    Comment: 13 pages
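    Both effects studied here, a context change and reduced observability, can be made concrete in a toy simulation: a Q-learning agent controls a two-approach intersection, its sensors report only coarse queue bins (so distinct true states collapse into the same observation), and the arrival rates are swapped halfway through training. All rates, bin sizes, and service parameters below are assumptions for illustration, not the paper's experimental setup.

```python
import random
from collections import defaultdict

ARRIVAL = (0.4, 0.2)  # assumed arrival probability per step for each approach
SERVICE = 3           # vehicles released per step by a green signal
STEPS = 20000

def observe(queues, bin_size=5, n_bins=3):
    # Partial observability: exact queue lengths are aliased into coarse bins,
    # so distinct states (needing distinct actions) can look identical.
    return tuple(min(q // bin_size, n_bins - 1) for q in queues)

q_table = defaultdict(lambda: [0.0, 0.0])  # obs -> value of green for approach 0 or 1
queues = [0, 0]
alpha, gamma, eps = 0.1, 0.95, 0.1

for t in range(STEPS):
    if t == STEPS // 2:
        ARRIVAL = ARRIVAL[::-1]  # context change: the traffic pattern shifts

    obs = observe(queues)
    if random.random() < eps:
        action = random.randrange(2)
    else:
        action = max((0, 1), key=lambda a: q_table[obs][a])

    queues[action] = max(0, queues[action] - SERVICE)  # serve the green approach
    for i in range(2):
        queues[i] += 1 if random.random() < ARRIVAL[i] else 0

    reward = -sum(queues)  # objective: minimize total queue length
    next_obs = observe(queues)
    q_table[obs][action] += alpha * (
        reward + gamma * max(q_table[next_obs]) - q_table[obs][action]
    )

print("final queues:", queues)
```

    Shrinking bin_size (better sensors) or removing the mid-run swap isolates each source of non-stationarity in turn, which is roughly the kind of comparison the paper quantifies.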