Experiments with hierarchical reinforcement learning of multiple grasping policies
Robotic grasping has attracted considerable interest, but it remains a
challenging task. Data-driven approaches are a promising solution to the
robotic grasping problem: they leverage a grasp dataset and generalize grasps
to various objects. However, these methods often depend on the quality of the
given dataset, which is not trivial to obtain. Although reinforcement learning
approaches have recently been used to collect grasp datasets autonomously,
existing algorithms are often limited to specific grasp types. In this paper,
we present a framework for hierarchical reinforcement learning of grasping
policies. In our framework, the lower-level hierarchy learns multiple grasp
types, and the upper-level hierarchy learns a policy to select among the
learned grasp types according to a point cloud of a new object. Through
experiments, we validate that our approach learns grasping by constructing the
grasp dataset autonomously. The experimental results show that our approach
learns multiple grasping policies and generalizes the learned grasps by using
local point-cloud information.
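The two-level structure described above can be sketched as follows. This is a minimal illustration, not the paper's method: the grasp-type names, the linear stand-ins for the learned policies, and the feature dimensions are all hypothetical placeholders for learned networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical lower-level policies, one per grasp type. Each maps a
# point-cloud feature vector to a grasp parameter vector; linear maps
# stand in for learned policies purely for illustration.
GRASP_TYPES = ["precision", "power", "lateral"]
lower_policies = {g: rng.normal(size=(8, 4)) for g in GRASP_TYPES}

# Hypothetical upper-level gating weights: one score per grasp type.
upper_weights = rng.normal(size=(8, len(GRASP_TYPES)))

def select_and_grasp(point_cloud_feature):
    """Upper level picks a grasp type; lower level outputs grasp parameters."""
    scores = point_cloud_feature @ upper_weights           # gating scores
    chosen = GRASP_TYPES[int(np.argmax(scores))]           # selected grasp type
    params = point_cloud_feature @ lower_policies[chosen]  # lower-level output
    return chosen, params

feature = rng.normal(size=8)  # stand-in for a local point-cloud encoding
grasp_type, grasp_params = select_and_grasp(feature)
```

The key design point is that the upper level never outputs motor commands itself; it only routes the observation to one of the specialized lower-level policies.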
Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets
Imitation learning has traditionally been applied to learn a single task from
demonstrations thereof. The requirement of structured and isolated
demonstrations limits the scalability of imitation learning approaches as they
are difficult to apply to real-world scenarios, where robots have to be able to
execute a multitude of tasks. In this paper, we propose a multi-modal imitation
learning framework that is able to segment and imitate skills from unlabelled
and unstructured demonstrations by learning skill segmentation and imitation
learning jointly. The extensive simulation results indicate that our method can
efficiently separate the demonstrations into individual skills and learn to
imitate them using a single multi-modal policy. The video of our experiments is
available at http://sites.google.com/view/nips17intentiongan
Comment: Paper accepted to NIPS 2017
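The segmentation idea above, inferring a latent skill label for each unlabelled demonstration snippet, can be illustrated in miniature. This sketch replaces the paper's adversarial objective with a simple nearest-centroid assignment; the cluster locations and data are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical unstructured demonstrations: 2-D state snippets drawn from two
# unlabelled skills (two clusters). The GAN-based intention inference in the
# paper is replaced here by nearest-centroid assignment, purely to illustrate
# segmenting mixed demonstrations into skills.
skill_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
skill_b = rng.normal(loc=[4.0, 4.0], scale=0.3, size=(50, 2))
demos = np.vstack([skill_a, skill_b])

centroids = np.array([[0.5, 0.5], [3.5, 3.5]])  # assumed initial intentions
for _ in range(5):  # a few refinement steps
    dists = np.linalg.norm(demos[:, None, :] - centroids[None, :, :], axis=2)
    labels = np.argmin(dists, axis=1)  # latent skill label per snippet
    centroids = np.array([demos[labels == k].mean(axis=0) for k in range(2)])
```

Each snippet ends up tagged with a discrete latent label, which is the role the learned intention variable plays when a single multi-modal policy is conditioned on it.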
Hierarchical Policy Search via Return-Weighted Density Estimation
Learning an optimal policy from a multi-modal reward function is a
challenging problem in reinforcement learning (RL). Hierarchical RL (HRL)
tackles this problem by learning a hierarchical policy, where multiple option
policies are in charge of different strategies corresponding to modes of a
reward function and a gating policy selects the best option for a given
context. Although HRL has been demonstrated to be promising, current
state-of-the-art methods still cannot perform well in complex real-world
problems due to the difficulty of identifying modes of the reward function. In
this paper, we propose a novel method called hierarchical policy search via
return-weighted density estimation (HPSDE), which can efficiently identify the
modes through density estimation with return-weighted importance sampling. Our
proposed method finds option policies corresponding to the modes of the return
function and automatically determines the number and the location of option
policies, which significantly reduces the burden of hyper-parameter tuning.
Through experiments, we demonstrate that the proposed HPSDE successfully learns
option policies corresponding to modes of the return function and that it can
be successfully applied to a challenging motion planning problem of a redundant
robotic manipulator.
Comment: The 32nd AAAI Conference on Artificial Intelligence (AAAI 2018), 9 pages
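The core of the return-weighted density idea can be shown with a toy bimodal return function. This is a simplified sketch, not HPSDE itself: the return function is invented, and the mode count is fixed by hand rather than determined automatically as in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def ret(x):
    # Hypothetical bimodal return function with modes near -2 and +2.
    return np.exp(-(x + 2.0) ** 2) + np.exp(-(x - 2.0) ** 2)

# Sample candidate actions broadly, then weight each sample by its return.
xs = rng.uniform(-5.0, 5.0, size=5000)
ws = ret(xs)
ws /= ws.sum()

# Return-weighted resampling concentrates the samples around the return
# modes; a density estimate over the resampled set then yields one cluster
# per mode, each of which would seed an option policy.
resampled = rng.choice(xs, size=5000, p=ws)
left = resampled[resampled < 0.0]
right = resampled[resampled >= 0.0]
mode_left, mode_right = left.mean(), right.mean()
```

After reweighting, the empirical density peaks near the two return maxima, which is the signal the method uses to place option policies.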
Neural Task Programming: Learning to Generalize Across Hierarchical Tasks
In this work, we propose a novel robot learning framework called Neural Task
Programming (NTP), which bridges the idea of few-shot learning from
demonstration and neural program induction. NTP takes as input a task
specification (e.g., video demonstration of a task) and recursively decomposes
it into finer sub-task specifications. These specifications are fed to a
hierarchical neural program, where bottom-level programs are callable
subroutines that interact with the environment. We validate our method in three
robot manipulation tasks. NTP achieves strong generalization across sequential
tasks that exhibit hierarchical and compositional structures. The experimental
results show that NTP learns to generalize well towards unseen tasks with
increasing lengths, variable topologies, and changing objectives.
Comment: ICRA 201
Model Learning for Look-ahead Exploration in Continuous Control
We propose an exploration method that incorporates look-ahead search over
basic learnt skills and their dynamics, and use it for reinforcement learning
(RL) of manipulation policies. Our skills are multi-goal policies learned in
isolation in simpler environments using existing multi-goal RL formulations,
analogous to options or macro-actions. Coarse skill dynamics, i.e., the state
transition caused by a (complete) skill execution, are learnt and are unrolled
forward during look-ahead search. Policy search benefits from temporal
abstraction during exploration, though it itself operates over low-level
primitive actions, and thus the resulting policies do not suffer from the
suboptimality and inflexibility caused by coarse skill chaining. We show that
the proposed exploration strategy results in effective learning of complex
manipulation policies faster than current state-of-the-art RL methods, and
converges to better policies than methods that use options or parameterized
skills as building blocks of the policy itself, as opposed to guiding
exploration.
Comment: This is a pre-print of our paper which is accepted in AAAI 201
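Unrolling coarse skill dynamics for look-ahead can be sketched as below. The skills and their one-step displacement models are invented stand-ins for learnt dynamics; real skill models would be learned and stochastic.

```python
import numpy as np
from itertools import product

# Hypothetical coarse skill dynamics: each skill's (complete) execution is
# modelled as a fixed displacement of a 2-D state, standing in for a learnt
# skill-transition model.
SKILL_DYNAMICS = {
    "push_right": np.array([1.0, 0.0]),
    "push_up": np.array([0.0, 1.0]),
    "push_diag": np.array([1.0, 1.0]),
}

def lookahead(state, goal, depth=3):
    """Unroll every skill sequence up to `depth`; return the best first skill."""
    best_seq, best_dist = None, np.inf
    for seq in product(SKILL_DYNAMICS, repeat=depth):
        s = state.copy()
        for skill in seq:
            s = s + SKILL_DYNAMICS[skill]  # coarse one-step skill model
        d = np.linalg.norm(goal - s)
        if d < best_dist:
            best_seq, best_dist = seq, d
    return best_seq[0]

first = lookahead(np.zeros(2), goal=np.array([3.0, 0.0]))
```

Note that only exploration uses this coarse search; per the abstract, the policy itself still acts over low-level primitive actions, so it is not locked into the chosen skill sequence.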
Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning
Skilled robotic manipulation benefits from complex synergies between
non-prehensile (e.g. pushing) and prehensile (e.g. grasping) actions: pushing
can help rearrange cluttered objects to make space for arms and fingers;
likewise, grasping can help displace objects to make pushing movements more
precise and collision-free. In this work, we demonstrate that it is possible to
discover and learn these synergies from scratch through model-free deep
reinforcement learning. Our method involves training two fully convolutional
networks that map from visual observations to actions: one infers the utility
of pushes for a dense pixel-wise sampling of end effector orientations and
locations, while the other does the same for grasping. Both networks are
trained jointly in a Q-learning framework and are entirely self-supervised by
trial and error, where rewards are provided from successful grasps. In this
way, our policy learns pushing motions that enable future grasps, while
learning grasps that can leverage past pushes. During picking experiments in
both simulation and real-world scenarios, we find that our system quickly
learns complex behaviors amid challenging cases of clutter, and achieves better
grasping success rates and picking efficiencies than baseline alternatives
after only a few hours of training. We further demonstrate that our method is
capable of generalizing to novel objects. Qualitative results (videos), code,
pre-trained models, and simulation environments are available at
http://vpg.cs.princeton.edu
Comment: To appear at the International Conference On Intelligent Robots and
Systems (IROS) 2018. Project webpage: http://vpg.cs.princeton.edu Summary
video: https://youtu.be/-OkyX7Zlhi
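The action-selection scheme described above, dense pixel-wise Q maps from two heads with a greedy choice across both, can be sketched with random arrays standing in for the fully convolutional network outputs; the map sizes and rotation count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-ins for the two fully convolutional heads: each produces a dense
# pixel-wise Q map per end-effector rotation (here 4 rotations, 8x8 pixels).
q_push = rng.uniform(size=(4, 8, 8))
q_grasp = rng.uniform(size=(4, 8, 8))

def best_action(q_push, q_grasp):
    """Greedy action: the primitive, rotation, and pixel with the highest Q."""
    maps = {"push": q_push, "grasp": q_grasp}
    best = max(
        ((prim, np.unravel_index(np.argmax(q), q.shape), float(q.max()))
         for prim, q in maps.items()),
        key=lambda t: t[2],
    )
    primitive, (rot, y, x), value = best
    return primitive, rot, (y, x), value

primitive, rotation, pixel, value = best_action(q_push, q_grasp)
```

Selecting the argmax jointly over both maps is what lets a push be chosen whenever its predicted value (e.g. clearing clutter to enable a future grasp) exceeds that of every immediate grasp.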