Analysis and Optimization of Deep Counterfactual Value Networks
Recently a strong poker-playing algorithm called DeepStack was published,
which is able to find an approximate Nash equilibrium during gameplay by using
heuristic values of future states predicted by deep neural networks. This paper
analyzes new ways of encoding the inputs and outputs of DeepStack's deep
counterfactual value networks based on traditional abstraction techniques, as
well as an unabstracted encoding, which was able to increase the network's
accuracy.
Comment: Long version of publication appearing at KI 2018: The 41st German
Conference on Artificial Intelligence
(http://dx.doi.org/10.1007/978-3-030-00111-7_26). Corrected typo in titl
Deep Reinforcement Learning from Self-Play in Imperfect-Information Games
Many real-world applications can be described as large-scale games of
imperfect information. To deal with these challenging domains, prior work has
focused on computing Nash equilibria in a handcrafted abstraction of the
domain. In this paper we introduce the first scalable end-to-end approach to
learning approximate Nash equilibria without prior domain knowledge. Our method
combines fictitious self-play with deep reinforcement learning. When applied to
Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium,
whereas common reinforcement learning methods diverged. In Limit Texas Holdem,
a poker game of real-world scale, NFSP learnt a strategy that approached the
performance of state-of-the-art, superhuman algorithms based on significant
domain expertise.
Comment: updated version, incorporating conference feedbac
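
For readers unfamiliar with the fictitious-play idea that the abstract above builds on, here is a minimal sketch of classical tabular fictitious play on a two-player zero-sum matrix game. It is not NFSP and not the authors' code: the deep reinforcement learning components are omitted, and the function name `fictitious_play`, the matching-pennies payoff matrix, and the iteration count are all illustrative assumptions. Each player repeatedly best-responds to the opponent's empirical action frequencies, and those frequencies are what approach a Nash equilibrium.

```python
def fictitious_play(payoff, iterations=20000):
    """Classical fictitious play on a zero-sum matrix game (illustrative only).

    payoff[i][j] is the row player's payoff when row plays i and column
    plays j; the column player receives the negative of that value.
    Returns the empirical (average) strategies of both players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Action counts; seeded with one arbitrary play each (an assumption).
    row_counts = [1] + [0] * (n_rows - 1)
    col_counts = [1] + [0] * (n_cols - 1)

    for _ in range(iterations):
        # Value of each row action against the column player's history.
        row_values = [
            sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        # Value of each column action against the row player's history.
        col_values = [
            -sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Both players best-respond simultaneously (ties go to the lowest index).
        best_row = max(range(n_rows), key=row_values.__getitem__)
        best_col = max(range(n_cols), key=col_values.__getitem__)
        row_counts[best_row] += 1
        col_counts[best_col] += 1

    row_total, col_total = sum(row_counts), sum(col_counts)
    return (
        [c / row_total for c in row_counts],
        [c / col_total for c in col_counts],
    )


# Matching pennies: the unique Nash equilibrium mixes both actions equally.
MATCHING_PENNIES = [[1, -1], [-1, 1]]
row_avg, col_avg = fictitious_play(MATCHING_PENNIES)
```

In this toy game the individual best responses keep cycling, while the averaged strategies in `row_avg` and `col_avg` drift toward the uniform mix. That distinction between the current best response and the average strategy is the property the sketch is meant to show.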
Differentiable Algorithm Networks for Composable Robot Learning
This paper introduces the Differentiable Algorithm Network (DAN), a
composable architecture for robot learning systems. A DAN is composed of neural
network modules, each encoding a differentiable robot algorithm and an
associated model; and it is trained end-to-end from data. DAN combines the
strengths of model-driven modular system design and data-driven end-to-end
learning. The algorithms and models act as structural assumptions to reduce the
data requirements for learning; end-to-end learning allows the modules to adapt
to one another and compensate for imperfect models and algorithms, in order to
achieve the best overall system performance. We illustrate the DAN methodology
through a case study on a simulated robot system, which learns to navigate in
complex 3-D environments with only local visual observations and an image of a
partially correct 2-D floor map.
Comment: RSS 2019 camera ready. Video is available at
https://youtu.be/4jcYlTSJF4
The Hanabi Challenge: A New Frontier for AI Research
From the early days of computing, games have been important testbeds for
studying how well machines can do sophisticated decision making. In recent
years, machine learning has made dramatic advances with artificial agents
reaching superhuman performance in challenge domains like Go, Atari, and some
variants of poker. As with their predecessors of chess, checkers, and
backgammon, these game domains have driven research by providing sophisticated
yet well-defined challenges for artificial intelligence practitioners. We
continue this tradition by proposing the game of Hanabi as a new challenge
domain with novel problems that arise from its combination of purely
cooperative gameplay with two to five players and imperfect information. In
particular, we argue that Hanabi elevates reasoning about the beliefs and
intentions of other agents to the foreground. We believe developing novel
techniques for such theory of mind reasoning will not only be crucial for
success in Hanabi, but also in broader collaborative efforts, especially those
with human partners. To facilitate future research, we introduce the
open-source Hanabi Learning Environment, propose an experimental framework for
the research community to evaluate algorithmic advances, and assess the
performance of current state-of-the-art techniques.
Comment: 32 pages, 5 figures, In Press (Artificial Intelligence
- …