Crossmodal Attentive Skill Learner
This paper presents the Crossmodal Attentive Skill Learner (CASL), integrated
with the recently-introduced Asynchronous Advantage Option-Critic (A2OC)
architecture [Harb et al., 2017] to enable hierarchical reinforcement learning
across multiple sensory inputs. We provide concrete examples where the approach
not only improves performance in a single task, but also accelerates transfer to
new tasks. We demonstrate that the attention mechanism anticipates and identifies
useful latent features, while filtering irrelevant sensor modalities during execution.
We modify the Arcade Learning Environment [Bellemare et al., 2013] to support
audio queries, and conduct evaluations of crossmodal learning in the Atari 2600
game Amidar. Finally, building on the recent work of Babaeizadeh et al. [2017],
we open-source a fast hybrid CPU-GPU implementation of CASL.
Comment: International Conference on Autonomous Agents and Multiagent Systems
(AAMAS) 2018, NIPS 2017 Deep Reinforcement Learning Symposiu
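The filtering of irrelevant sensor modalities described in this abstract can be illustrated with a minimal sketch of softmax attention over per-modality feature vectors. This is not the CASL implementation: the function names, the modality labels, and the idea of passing relevance scores in by hand are all assumptions made for illustration (in the paper the attention is learned, not supplied).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def crossmodal_attention(features, scores):
    """Fuse per-modality features using attention weights.

    features: dict mapping modality name -> feature vector (equal lengths).
    scores:   dict mapping modality name -> unnormalised relevance score.
              Hypothetical input: a learned model would compute these.
    Returns (weights by modality, fused feature vector).
    """
    names = sorted(features)
    weights = softmax([scores[name] for name in names])
    dim = len(features[names[0]])
    fused = [
        sum(w * features[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]
    return dict(zip(names, weights)), fused
```

A modality with a very low score receives a weight close to zero, so its features contribute almost nothing to the fused vector.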
Classifying Options for Deep Reinforcement Learning
In this paper we combine one method for hierarchical reinforcement learning -
the options framework - with deep Q-networks (DQNs) through the use of
different "option heads" on the policy network, and a supervisory network for
choosing between the different options. We utilise our setup to investigate the
effects of architectural constraints in subtasks with positive and negative
transfer, across a range of network capacities. We empirically show that our
augmented DQN has lower sample complexity when simultaneously learning subtasks
with negative transfer, without degrading performance when learning subtasks
with positive transfer.
Comment: IJCAI 2016 Workshop on Deep Reinforcement Learning: Frontiers and
Challenge
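The "option heads" arrangement in this abstract can be sketched as a shared trunk feeding several Q-value heads, with a separate supervisory network that picks which head to use. The class below is a toy forward pass only, with random untrained weights. The layer sizes, the class and method names, and the choice to feed the supervisor the raw state are assumptions, not details taken from the paper.

```python
import random


def linear(weights, bias, x):
    """Affine layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


class OptionHeadQNet:
    """Hypothetical DQN-style network with one Q-value head per option."""

    def __init__(self, n_inputs, n_hidden, n_options, n_actions, seed=0):
        rng = random.Random(seed)

        def layer(n_out, n_in):
            weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                       for _ in range(n_out)]
            return weights, [0.0] * n_out

        self.trunk = layer(n_hidden, n_inputs)          # shared features
        self.heads = [layer(n_actions, n_hidden)        # one head per option
                      for _ in range(n_options)]
        self.supervisor = layer(n_options, n_inputs)    # scores the options

    def act(self, state):
        """Return (chosen option, greedy action, Q-values of that option)."""
        hidden = relu(linear(*self.trunk, state))
        option = argmax(linear(*self.supervisor, state))
        q_values = linear(*self.heads[option], hidden)
        return option, argmax(q_values), q_values
```

Because each head has its own output weights, subtasks that interfere with one another can be kept apart at the head level while still sharing the trunk.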
Learning and Transfer of Modulated Locomotor Controllers
We study a novel architecture and training procedure for locomotion tasks. A
high-frequency, low-level "spinal" network with access to proprioceptive
sensors learns sensorimotor primitives by training on simple tasks. This
pre-trained module is fixed and connected to a low-frequency, high-level
"cortical" network, with access to all sensors, which drives behavior by
modulating the inputs to the spinal network. Where a monolithic end-to-end
architecture fails completely, learning with a pre-trained spinal module
succeeds at multiple high-level tasks, and enables the effective exploration
required to learn from sparse rewards. We test our proposed architecture on
three simulated bodies: a 16-dimensional swimming snake, a 20-dimensional
quadruped, and a 54-dimensional humanoid. Our results are illustrated in the
accompanying video at https://youtu.be/sboPYvhpraQ
Comment: Supplemental video available at https://youtu.be/sboPYvhpra
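The two-timescale control loop in this abstract (a high-frequency low-level module modulated by a low-frequency high-level module) can be sketched as follows. The function name, the `period` parameter, and the callable signatures are hypothetical, and both controllers are passed in as plain functions rather than trained networks.

```python
def run_hierarchy(high_level, low_level, observations, period):
    """Run a two-timescale controller over a sequence of observations.

    high_level:   callable(full_observation) -> modulation signal.
                  Called only every `period` steps (low frequency).
    low_level:    callable(proprioception, modulation) -> action.
                  Called at every step (high frequency).
    observations: iterable of (proprioception, full_observation) pairs.
    period:       number of low-level steps per high-level update (assumed).
    """
    if period < 1:
        raise ValueError("period must be at least 1")
    actions = []
    modulation = None
    for step, (proprioception, full_observation) in enumerate(observations):
        if step % period == 0:
            # The high-level module sees all sensors and refreshes the signal.
            modulation = high_level(full_observation)
        # The low-level module sees only proprioception plus the modulation.
        actions.append(low_level(proprioception, modulation))
    return actions
```

Between high-level updates the modulation signal is held constant, so the low-level module keeps acting on the most recent high-level decision.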
OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning
Reinforcement learning has shown promise in learning policies that can solve
complex problems. However, manually specifying a good reward function can be
difficult, especially for intricate tasks. Inverse reinforcement learning
offers a useful paradigm to learn the underlying reward function directly from
expert demonstrations. Yet in reality, the corpus of demonstrations may contain
trajectories arising from a diverse set of underlying reward functions rather
than a single one. Thus, in inverse reinforcement learning, it is useful to
consider such a decomposition. The options framework in reinforcement learning
is specifically designed to decompose policies in a similar light. We therefore
extend the options framework and propose a method to simultaneously recover
reward options in addition to policy options. We leverage adversarial methods
to learn joint reward-policy options using only observed expert states. We show
that this approach works well in both simple and complex continuous control
tasks and shows significant performance increases in one-shot transfer
learning.
Comment: Accepted to the Thirty-Second AAAI Conference On Artificial
Intelligence (AAAI), 201