14,860 research outputs found
Hybrid Reinforcement Learning with Expert State Sequences
Existing imitation learning approaches often require that the complete
demonstration data, including sequences of actions and states, are available.
In this paper, we consider a more realistic and difficult scenario where a
reinforcement learning agent only has access to the state sequences of an
expert, while the expert actions are unobserved. We propose a novel
tensor-based model to infer the unobserved actions of the expert state
sequences. The policy of the agent is then optimized via a hybrid objective
combining reinforcement learning and imitation learning. We evaluated our
hybrid approach on an illustrative domain and Atari games. The empirical
results show that (1) the agents are able to leverage state expert sequences to
learn faster than pure reinforcement learning baselines, (2) our tensor-based
action inference model is advantageous compared to standard deep neural
networks in inferring expert actions, and (3) the hybrid policy optimization
objective is robust against noise in expert state sequences.Comment: AAAI 2019; https://github.com/XiaoxiaoGuo/tensor4r
Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
Model-free deep reinforcement learning algorithms have been shown to be
capable of learning a wide range of robotic skills, but typically require a
very large number of samples to achieve good performance. Model-based
algorithms, in principle, can provide for much more efficient learning, but
have proven difficult to extend to expressive, high-capacity models such as
deep neural networks. In this work, we demonstrate that medium-sized neural
network models can in fact be combined with model predictive control (MPC) to
achieve excellent sample complexity in a model-based reinforcement learning
algorithm, producing stable and plausible gaits to accomplish various complex
locomotion tasks. We also propose using deep neural network dynamics models to
initialize a model-free learner, in order to combine the sample efficiency of
model-based approaches with the high task-specific performance of model-free
methods. We empirically demonstrate on MuJoCo locomotion tasks that our pure
model-based approach trained on just random action data can follow arbitrary
trajectories with excellent sample efficiency, and that our hybrid algorithm
can accelerate model-free learning on high-speed benchmark tasks, achieving
sample efficiency gains of 3-5x on swimmer, cheetah, hopper, and ant agents.
Videos can be found at https://sites.google.com/view/mbm
Efficient Model Learning for Human-Robot Collaborative Tasks
We present a framework for learning human user models from joint-action
demonstrations that enables the robot to compute a robust policy for a
collaborative task with a human. The learning takes place completely
automatically, without any human intervention. First, we describe the
clustering of demonstrated action sequences into different human types using an
unsupervised learning algorithm. These demonstrated sequences are also used by
the robot to learn a reward function that is representative for each type,
through the employment of an inverse reinforcement learning algorithm. The
learned model is then used as part of a Mixed Observability Markov Decision
Process formulation, wherein the human type is a partially observable variable.
With this framework, we can infer, either offline or online, the human type of
a new user that was not included in the training set, and can compute a policy
for the robot that will be aligned to the preference of this new user and will
be robust to deviations of the human actions from prior demonstrations. Finally
we validate the approach using data collected in human subject experiments, and
conduct proof-of-concept demonstrations in which a person performs a
collaborative task with a small industrial robot
- …