Composable Deep Reinforcement Learning for Robotic Manipulation
Model-free deep reinforcement learning has been shown to exhibit good
performance in domains ranging from video games to simulated robotic
manipulation and locomotion. However, model-free methods are known to perform
poorly when the interaction time with the environment is limited, as is the
case for most real-world robotic tasks. In this paper, we study how maximum
entropy policies trained using soft Q-learning can be applied to real-world
robotic manipulation. The application of this method to real-world manipulation
is facilitated by two important features of soft Q-learning. First, soft
Q-learning can learn multimodal exploration strategies by learning policies
represented by expressive energy-based models. Second, we show that policies
learned with soft Q-learning can be composed to create new policies, and that
the optimality of the resulting policy can be bounded in terms of the
divergence between the composed policies. This compositionality provides an
especially valuable tool for real-world manipulation, where constructing new
policies by composing existing skills can provide a large gain in efficiency
over training from scratch. Our experimental evaluation demonstrates that soft
Q-learning is substantially more sample efficient than prior model-free deep
reinforcement learning methods, and that compositionality can be performed for
both simulated and real-world tasks.
Comment: Videos: https://sites.google.com/view/composing-real-world-policies
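The composition idea in the abstract above can be illustrated with a small sketch. The abstract does not say how the policies are combined, so this sketch assumes the composed Q-function is the plain average of the constituent Q-functions and that the policy is a Boltzmann distribution over it. It also uses a handful of discrete actions for readability, whereas the paper targets continuous control with energy-based models. The skill names, Q-values, and helper functions are hypothetical.

```python
import math

def soft_policy(q_values, temperature=1.0):
    """Boltzmann (maximum entropy) policy: pi(a) proportional to exp(Q(a) / temperature)."""
    scaled = [q / temperature for q in q_values]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

def compose_q(q_tables):
    """Average the Q-values of several skills, action by action (assumed composition rule)."""
    n = len(q_tables)
    return [sum(qs) / n for qs in zip(*q_tables)]

# Hypothetical Q-values for a single state over four discrete actions.
q_reach = [2.0, 0.0, 2.0, -1.0]  # skill 1 prefers actions 0 and 2
q_avoid = [-1.0, 0.0, 2.0, 2.0]  # skill 2 prefers actions 2 and 3

q_composed = compose_q([q_reach, q_avoid])
pi_composed = soft_policy(q_composed)

# The composed policy concentrates on the action both skills rate highly.
best_action = max(range(len(pi_composed)), key=pi_composed.__getitem__)
```

Because each constituent policy keeps probability mass on several good actions rather than a single one, the averaged Q-function can still find an action acceptable to both skills, which is the intuition behind composing maximum entropy policies instead of retraining from scratch.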
Robot eye-hand coordination learning by watching human demonstrations: a task function approximation approach
We present a robot eye-hand coordination learning method that can directly
learn visual task specification by watching human demonstrations. Task
specification is represented as a task function, which is learned using inverse
reinforcement learning (IRL) by inferring differential rewards between state
changes. The learned task function is then used as continuous feedback in an
uncalibrated visual servoing (UVS) controller designed for the execution phase.
Our proposed method can directly learn from raw videos, which removes the need
for hand-engineered task specification. It can also provide task
interpretability by directly approximating the task function. Moreover,
because it relies on a traditional UVS controller, our training process
is efficient and the learned policy is independent of any particular robot
platform. Various experiments were designed to show that, for a certain DOF
task, our method can adapt to task/environment variations in target position,
background, illumination, and occlusion without retraining.
Comment: Accepted in ICRA 201