Toward Interpretable Deep Reinforcement Learning with Linear Model U-Trees
Deep Reinforcement Learning (DRL) has achieved impressive success in many
applications. A key component of many DRL models is a neural network
representing a Q function that estimates the expected cumulative reward of a state-action pair. The Q function network encodes substantial implicit knowledge about the RL problem, yet this knowledge often remains unexamined and uninterpreted. To our knowledge, this work develops the first mimic learning
framework for Q functions in DRL. We introduce Linear Model U-trees (LMUTs) to
approximate neural network predictions. An LMUT is learned using a novel online algorithm that is well suited to an active play setting, where the
mimic learner observes an ongoing interaction between the neural net and the
environment. Empirical evaluation shows that an LMUT mimics a Q function
substantially better than five baseline methods. The transparent tree structure
of an LMUT facilitates understanding the network's learned knowledge by
analyzing feature influence, extracting rules, and highlighting the
super-pixels in image inputs.
Comment: This paper is accepted by ECML-PKDD 2018.
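The abstract describes mimic learning with linear-model trees only at a high level. Below is a minimal, self-contained sketch of the core idea: fit per-leaf linear models that regress a teacher network's Q-values, choosing the split that minimizes the combined squared error. The toy teacher function, the quantile-based candidate splits, and all dimensions are illustrative assumptions; the actual LMUT algorithm grows the tree online with stochastic-gradient updates at the leaves.

```python
# Sketch of the mimic-learning idea: a depth-1 "linear model tree" that
# regresses a teacher's Q-values. Everything here (toy teacher, split
# rule, data) is illustrative, not the authors' code.
import numpy as np

rng = np.random.default_rng(0)

def teacher_q(states):
    """Stand-in for a trained DRL network's Q-value output (assumed)."""
    return np.sin(states[:, 0]) + 0.5 * states[:, 1] ** 2

# 1. Active play: collect (state, Q) pairs by observing the teacher.
states = rng.uniform(-2, 2, size=(1000, 2))
q_vals = teacher_q(states)

def fit_leaf(X, y):
    # Least-squares linear model with intercept, as in an LMUT leaf.
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ w
    return w, float(resid @ resid)

# 2. Pick the axis-aligned split that most reduces squared error when
#    each side gets its own linear leaf model.
best = None
for dim in range(states.shape[1]):
    for thr in np.quantile(states[:, dim], [0.25, 0.5, 0.75]):
        left = states[:, dim] <= thr
        _, sse_l = fit_leaf(states[left], q_vals[left])
        _, sse_r = fit_leaf(states[~left], q_vals[~left])
        if best is None or sse_l + sse_r < best[0]:
            best = (sse_l + sse_r, dim, thr)

sse, dim, thr = best
print(f"split on feature {dim} at {thr:.2f}, total SSE {sse:.2f}")
```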
Deep Predictive Policy Training using Reinforcement Learning
Skilled robot task learning is best implemented by predictive action policies
due to the inherent latency of sensorimotor processes. However, training such
predictive policies is challenging as it involves finding a trajectory of motor
activations for the full duration of the action. We propose a data-efficient
deep predictive policy training (DPPT) framework with a deep neural network
policy architecture which maps an image observation to a sequence of motor
activations. The architecture consists of three sub-networks referred to as the
perception, policy and behavior super-layers. The perception and behavior
super-layers force an abstraction of visual and motor data trained with
synthetic and simulated training samples, respectively. The policy super-layer
is a small sub-network with comparatively few parameters that maps data between the abstracted manifolds. It is trained for each task using policy-search reinforcement learning methods. We demonstrate the suitability of the proposed
architecture and learning framework by training predictive policies for skilled
object grasping and ball throwing on a PR2 robot. The effectiveness of the
method is illustrated by the fact that these tasks are trained using only about
180 real robot attempts with qualitative terminal rewards.
Comment: This work is submitted to the IEEE/RSJ International Conference on Intelligent Robots and Systems 2017 (IROS2017).
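As a rough illustration of the three-super-layer design described above, here is a hedged PyTorch sketch that maps an image to a fixed-length motor trajectory. The layer sizes, the 5-dimensional abstract state, and the 25-step, 7-joint trajectory are assumptions for illustration, not the paper's architecture; in DPPT the perception and behavior super-layers are pretrained on synthetic and simulated data, and only the small policy super-layer is trained per task by policy search.

```python
# Hedged sketch of the DPPT decomposition: perception (image -> state),
# policy (state -> action manifold), behavior (manifold -> trajectory).
import torch
import torch.nn as nn

class DPPTSketch(nn.Module):
    def __init__(self, state_dim=5, manifold_dim=5, horizon=25, joints=7):
        super().__init__()
        # Perception super-layer: assumed small CNN, pretrained on
        # synthetic images in the paper's setup.
        self.perception = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, state_dim),
        )
        # Policy super-layer: deliberately small, trained per task.
        self.policy = nn.Linear(state_dim, manifold_dim)
        # Behavior super-layer: decodes the full motor trajectory at
        # once, matching the predictive (sequence-output) policy idea.
        self.behavior = nn.Sequential(
            nn.Linear(manifold_dim, 128), nn.ReLU(),
            nn.Linear(128, horizon * joints),
        )
        self.horizon, self.joints = horizon, joints

    def forward(self, image):
        z = self.perception(image)   # abstract visual state
        u = self.policy(z)           # point on the action manifold
        traj = self.behavior(u)      # full sequence of motor activations
        return traj.view(-1, self.horizon, self.joints)

traj = DPPTSketch()(torch.randn(1, 1, 64, 64))
print(traj.shape)  # torch.Size([1, 25, 7])
```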
Data-efficient learning of feedback policies from image pixels using deep dynamical models
Data-efficient reinforcement learning (RL) in continuous state-action spaces using very high-dimensional observations remains a key challenge in developing fully autonomous systems. We consider a particularly important instance of this challenge, the pixels-to-torques problem, where an RL agent learns a closed-loop control policy (torques) from pixel information only. We introduce a data-efficient, model-based reinforcement learning algorithm that learns such a closed-loop policy directly from pixel information. The key ingredient is a deep dynamical model that learns a low-dimensional feature embedding of images jointly with a predictive model in this low-dimensional feature space. Joint learning is crucial for long-term predictions, which lie at the core of the adaptive nonlinear model predictive control strategy that we use for closed-loop control. Compared to state-of-the-art RL methods for continuous states and actions, our approach learns quickly, scales to high-dimensional state spaces, is lightweight, and is an important step toward fully autonomous end-to-end learning from pixels to torques.
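The joint-learning ingredient can be made concrete with a short sketch: an autoencoder embeds images into a low-dimensional latent space while, in the same objective, a transition model is trained to predict the next latent state from the current latent and the action. All architectures, dimensions, and loss weights below are assumptions for illustration; the paper's deep dynamical model and MPC controller differ in detail.

```python
# Minimal sketch of jointly learning an image embedding and a latent
# predictive model (one gradient step on random toy data).
import torch
import torch.nn as nn

latent_dim, action_dim = 4, 1
encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 128), nn.ReLU(),
                        nn.Linear(128, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                        nn.Linear(128, 64 * 64))
dynamics = nn.Linear(latent_dim + action_dim, latent_dim)  # predictive model

opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters(),
                        *dynamics.parameters()], lr=1e-3)

# Toy batch of (image_t, action_t, image_{t+1}) transitions.
img_t = torch.rand(32, 1, 64, 64)
act_t = torch.rand(32, action_dim)
img_next = torch.rand(32, 1, 64, 64)

z_t = encoder(img_t)
z_next = encoder(img_next)
z_pred = dynamics(torch.cat([z_t, act_t], dim=1))

# Joint loss: reconstruct both frames AND predict the next latent, so
# the embedding is shaped for prediction (the point of joint learning
# stressed in the abstract), not for reconstruction alone.
recon = nn.functional.mse_loss(decoder(z_t), img_t.flatten(1)) \
      + nn.functional.mse_loss(decoder(z_next), img_next.flatten(1))
pred = nn.functional.mse_loss(z_pred, z_next.detach())
loss = recon + pred
loss.backward()
opt.step()
print(float(loss))
```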