1,497 research outputs found
Deep Ordinal Reinforcement Learning
Reinforcement learning usually makes use of numerical rewards, which have
nice properties but also come with drawbacks and difficulties. Using rewards on
an ordinal scale (ordinal rewards) is an alternative to numerical rewards that
has received more attention in recent years. In this paper, a general approach
to adapting reinforcement learning problems to the use of ordinal rewards is
presented and motivated. We show how to convert common reinforcement learning
algorithms to an ordinal variation by the example of Q-learning and introduce
Ordinal Deep Q-Networks, which adapt deep reinforcement learning to ordinal
rewards. Additionally, we run evaluations on problems provided by the OpenAI
Gym framework, showing that our ordinal variants exhibit a performance that is
comparable to the numerical variations for a number of problems. We also give
first evidence that our ordinal variant is able to produce better results for
problems with less engineered and simpler-to-design reward signals.Comment: replaced figures for better visibility, added github repository, more
details about source of experimental results, updated target value
calculation for standard and ordinal Deep Q-Networ
Grounding Language for Transfer in Deep Reinforcement Learning
In this paper, we explore the utilization of natural language to drive
transfer for reinforcement learning (RL). Despite the wide-spread application
of deep RL techniques, learning generalized policy representations that work
across domains remains a challenging problem. We demonstrate that textual
descriptions of environments provide a compact intermediate channel to
facilitate effective policy transfer. Specifically, by learning to ground the
meaning of text to the dynamics of the environment such as transitions and
rewards, an autonomous agent can effectively bootstrap policy learning on a new
domain given its description. We employ a model-based RL approach consisting of
a differentiable planning module, a model-free component and a factorized state
representation to effectively use entity descriptions. Our model outperforms
prior work on both transfer and multi-task scenarios in a variety of different
environments. For instance, we achieve up to 14% and 11.5% absolute improvement
over previously existing models in terms of average and initial rewards,
respectively.Comment: JAIR 201
Deep Reinforcement Learning with Double Q-learning
The popular Q-learning algorithm is known to overestimate action values under
certain conditions. It was not previously known whether, in practice, such
overestimations are common, whether they harm performance, and whether they can
generally be prevented. In this paper, we answer all these questions
affirmatively. In particular, we first show that the recent DQN algorithm,
which combines Q-learning with a deep neural network, suffers from substantial
overestimations in some games in the Atari 2600 domain. We then show that the
idea behind the Double Q-learning algorithm, which was introduced in a tabular
setting, can be generalized to work with large-scale function approximation. We
propose a specific adaptation to the DQN algorithm and show that the resulting
algorithm not only reduces the observed overestimations, as hypothesized, but
that this also leads to much better performance on several games.Comment: AAAI 201
- …