Search CORE

12,364 research outputs found

Analysis and Optimization of Deep Counterfactual Value Networks

Author: J Nash
M Bowling
T Kanungo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 12/10/2018
Field of study

Recently a strong poker-playing algorithm called DeepStack was published, which is able to find an approximate Nash equilibrium during gameplay by using heuristic values of future states predicted by deep neural networks. This paper analyzes new ways of encoding the inputs and outputs of DeepStack's deep counterfactual value networks based on traditional abstraction techniques, as well as an unabstracted encoding, which was able to increase the network's accuracy.Comment: Long version of publication appearing at KI 2018: The 41st German Conference on Artificial Intelligence (http://dx.doi.org/10.1007/978-3-030-00111-7_26). Corrected typo in titl

arXiv.org e-Print Archive

Crossref

Deep Reinforcement Learning from Self-Play in Imperfect-Information Games

Author: Heinrich Johannes
Silver David
Publication venue
Publication date: 03/03/2016
Field of study

Many real-world applications can be described as large-scale games of imperfect information. To deal with these challenging domains, prior work has focused on computing Nash equilibria in a handcrafted abstraction of the domain. In this paper we introduce the first scalable end-to-end approach to learning approximate Nash equilibria without prior domain knowledge. Our method combines fictitious self-play with deep reinforcement learning. When applied to Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium, whereas common reinforcement learning methods diverged. In Limit Texas Holdem, a poker game of real-world scale, NFSP learnt a strategy that approached the performance of state-of-the-art, superhuman algorithms based on significant domain expertise.Comment: updated version, incorporating conference feedbac

arXiv.org e-Print Archive

UCL Discovery

Differentiable Algorithm Networks for Composable Robot Learning

Author: Hsu David
Kaelbling Leslie Pack
Karkus Peter
Lee Wee Sun
Lozano-Perez Tomas
Ma Xiao
Publication venue
Publication date: 28/05/2019
Field of study

This paper introduces the Differentiable Algorithm Network (DAN), a composable architecture for robot learning systems. A DAN is composed of neural network modules, each encoding a differentiable robot algorithm and an associated model; and it is trained end-to-end from data. DAN combines the strengths of model-driven modular system design and data-driven end-to-end learning. The algorithms and models act as structural assumptions to reduce the data requirements for learning; end-to-end learning allows the modules to adapt to one another and compensate for imperfect models and algorithms, in order to achieve the best overall system performance. We illustrate the DAN methodology through a case study on a simulated robot system, which learns to navigate in complex 3-D environments with only local visual observations and an image of a partially correct 2-D floor map.Comment: RSS 2019 camera ready. Video is available at https://youtu.be/4jcYlTSJF4

arXiv.org e-Print Archive

DSpace@MIT

The Hanabi Challenge: A New Frontier for AI Research

Author: Bard Nolan
Bellemare Marc G.
Bowling Michael
Burch Neil
Chandar Sarath
Dumoulin Vincent
Dunning Iain
Foerster Jakob N.
Hughes Edward
Lanctot Marc
Larochelle Hugo
Moitra Subhodeep
Mourad Shibl
Parisotto Emilio
Song H. Francis
Publication venue: 'Elsevier BV'
Publication date: 06/12/2019
Field of study

From the early days of computing, games have been important testbeds for studying how well machines can do sophisticated decision making. In recent years, machine learning has made dramatic advances with artificial agents reaching superhuman performance in challenge domains like Go, Atari, and some variants of poker. As with their predecessors of chess, checkers, and backgammon, these game domains have driven research by providing sophisticated yet well-defined challenges for artificial intelligence practitioners. We continue this tradition by proposing the game of Hanabi as a new challenge domain with novel problems that arise from its combination of purely cooperative gameplay with two to five players and imperfect information. In particular, we argue that Hanabi elevates reasoning about the beliefs and intentions of other agents to the foreground. We believe developing novel techniques for such theory of mind reasoning will not only be crucial for success in Hanabi, but also in broader collaborative efforts, especially those with human partners. To facilitate future research, we introduce the open-source Hanabi Learning Environment, propose an experimental framework for the research community to evaluate algorithmic advances, and assess the performance of current state-of-the-art techniques.Comment: 32 pages, 5 figures, In Press (Artificial Intelligence

arXiv.org e-Print Archive

PolyPublie