Reinforcement Learning using Augmented Neural Networks
Neural networks allow Q-learning reinforcement learning agents such as deep
Q-networks (DQN) to approximate complex mappings from state spaces to value
functions. However, this also brings drawbacks compared to other function
approximators, such as tile coding or its generalisation, radial basis
functions (RBF), because neural networks introduce instability as a side effect
of their globalised updates. This instability does not even
vanish in neural networks that do not have any hidden layers. In this paper, we
show that simple modifications to the structure of the neural network can
improve stability of DQN learning when a multi-layer perceptron is used for
function approximation. Comment: 7 pages; two columns; 4 figures
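The "globalised updates" mentioned in this abstract can be illustrated with a small sketch. It is not the paper's proposed modification. It only assumes a linear approximator with no hidden layers, trained by one gradient step, and compares a shared raw-feature encoding with a one-hot encoding that stands in for a tile-like code. All function names and the four-state setup are hypothetical.

```python
# Illustrative sketch only: how one gradient step at one state changes the
# prediction at a different state under two feature encodings.

def predict(weights, features):
    """Linear value estimate (a network with no hidden layers)."""
    return sum(w * f for w, f in zip(weights, features))

def sgd_step(weights, features, target, lr=0.1):
    """One squared-error gradient step towards `target`."""
    error = target - predict(weights, features)
    return [w + lr * error * f for w, f in zip(weights, features)]

def raw_features(state):
    # Bias plus raw state value: every weight is shared by all states.
    return [1.0, float(state)]

def one_hot_features(state, n_states=4):
    # One "tile" per state: each weight belongs to exactly one state.
    return [1.0 if i == state else 0.0 for i in range(n_states)]

def change_elsewhere(feature_fn, n_weights, updated_state=1, probed_state=3):
    """Update at `updated_state`, report the value change at `probed_state`."""
    weights = [0.0] * n_weights
    before = predict(weights, feature_fn(probed_state))
    weights = sgd_step(weights, feature_fn(updated_state), target=1.0)
    after = predict(weights, feature_fn(probed_state))
    return after - before

global_shift = change_elsewhere(raw_features, 2)      # shared weights
local_shift = change_elsewhere(one_hot_features, 4)   # localised weights
print(global_shift, local_shift)
```

With shared weights, the update at state 1 also moves the estimate at state 3, whereas the one-hot code leaves state 3 untouched. This is the kind of side effect the abstract attributes to neural networks.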
A geometrical analysis of global stability in trained feedback networks
Recurrent neural networks have been extensively studied in the context of
neuroscience and machine learning due to their ability to implement complex
computations. While substantial progress in designing effective learning
algorithms has been achieved in recent years, a full understanding of trained
recurrent networks is still lacking. Specifically, the mechanisms that allow
computations to emerge from the underlying recurrent dynamics are largely
unknown. Here we focus on a simple, yet underexplored computational setup: a
feedback architecture trained to associate a stationary output to a stationary
input. As a starting point, we derive an approximate analytical description of
global dynamics in trained networks which assumes uncorrelated connectivity
weights in the feedback and in the random bulk. The resulting mean-field theory
suggests that the task admits several classes of solutions, which imply
different stability properties. Different classes are characterized in terms of
the geometrical arrangement of the readout with respect to the input vectors,
defined in the high-dimensional space spanned by the network population. We
find that such an approximate theoretical approach can be used to understand how
standard training techniques implement the input-output task in finite-size
feedback networks. In particular, our simplified description captures the local
and the global stability properties of the target solution, and thus predicts
training performance.
Deep Reinforcement Learning with Double Q-learning
The popular Q-learning algorithm is known to overestimate action values under
certain conditions. It was not previously known whether, in practice, such
overestimations are common, whether they harm performance, and whether they can
generally be prevented. In this paper, we answer all these questions
affirmatively. In particular, we first show that the recent DQN algorithm,
which combines Q-learning with a deep neural network, suffers from substantial
overestimations in some games in the Atari 2600 domain. We then show that the
idea behind the Double Q-learning algorithm, which was introduced in a tabular
setting, can be generalized to work with large-scale function approximation. We
propose a specific adaptation to the DQN algorithm and show that the resulting
algorithm not only reduces the observed overestimations, as hypothesized, but
that this also leads to much better performance on several games. Comment: AAAI 201
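The difference between the standard DQN target and the Double DQN target described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code. The Q-values are made-up numbers, the function names are hypothetical, and plain Python lists stand in for the outputs of the online and target networks at the next state.

```python
# Illustrative sketch only: bootstrapped targets for one transition.

def dqn_target(reward, gamma, q_target_next, done=False):
    """Standard DQN: the target network both selects and evaluates the action."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)

def double_dqn_target(reward, gamma, q_online_next, q_target_next, done=False):
    """Double DQN: the online network selects, the target network evaluates."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]

# Hypothetical action values at the next state (three actions).
q_online_next = [1.0, 2.5, 2.0]
q_target_next = [1.2, 1.8, 3.0]

standard = dqn_target(0.0, 0.5, q_target_next)
double = double_dqn_target(0.0, 0.5, q_online_next, q_target_next)
print(standard, double)
```

Because selection and evaluation use different estimates, the Double DQN target is never larger than the standard target for the same inputs, which is how decoupling the two steps curbs overestimation.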
Deep Reinforcement Learning from Self-Play in Imperfect-Information Games
Many real-world applications can be described as large-scale games of
imperfect information. To deal with these challenging domains, prior work has
focused on computing Nash equilibria in a handcrafted abstraction of the
domain. In this paper we introduce the first scalable end-to-end approach to
learning approximate Nash equilibria without prior domain knowledge. Our method
combines fictitious self-play with deep reinforcement learning. When applied to
Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium,
whereas common reinforcement learning methods diverged. In Limit Texas Holdem,
a poker game of real-world scale, NFSP learnt a strategy that approached the
performance of state-of-the-art, superhuman algorithms based on significant
domain expertise. Comment: updated version, incorporating conference feedback