Self-Supervised Sim-to-Real Adaptation for Visual Robotic Manipulation
Collecting real robotic visual data and automatically obtaining reward signals from it for training reinforcement learning algorithms can be quite challenging and time-consuming. Methods that exploit unlabeled data therefore have great potential to further accelerate robotic learning. We consider
here the problem of performing manipulation tasks from pixels. In such tasks,
choosing an appropriate state representation is crucial for planning and
control. This is even more relevant with real images where noise, occlusions
and resolution affect the accuracy and reliability of state estimation. In this
work, we learn a latent state representation implicitly with deep reinforcement
learning in simulation, and then adapt it to the real domain using unlabeled
real robot data. We propose to do so by optimizing sequence-based self-supervised objectives. These exploit the temporal nature of robot experience and are shared across the simulated and real domains, without assuming any alignment of underlying states between simulated and unlabeled real images. We propose a Contrastive Forward Dynamics loss, which combines dynamics-model learning with time-contrastive techniques. The learned state representation
that results from our methods can be used to robustly solve a manipulation task
in simulation and to successfully transfer the learned skill to a real system.
We demonstrate the effectiveness of our approaches by training a vision-based
reinforcement learning agent for cube stacking. Agents trained with our method,
using only 5 hours of unlabeled real robot data for adaptation, show a clear improvement over domain randomization and standard visual domain adaptation techniques for sim-to-real transfer.
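The abstract does not spell out the exact form of the Contrastive Forward Dynamics loss, but a minimal sketch of the general idea, predicting the next latent state from the current latent state and action and scoring that prediction contrastively (InfoNCE-style) against the other next states in the batch, might look as follows. The encoder, flattened observations, network sizes, and temperature are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveForwardDynamics(nn.Module):
    """Sketch: predict the next latent state from (z_t, a_t) and score it
    contrastively against the other next-states in the batch (InfoNCE)."""

    def __init__(self, obs_dim, act_dim, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim))
        self.dynamics = nn.Sequential(
            nn.Linear(latent_dim + act_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim))

    def forward(self, obs_t, act_t, obs_tp1, temperature=0.1):
        z_t = self.encoder(obs_t)                      # (B, D)
        z_tp1 = self.encoder(obs_tp1)                  # (B, D) positives
        z_pred = self.dynamics(torch.cat([z_t, act_t], dim=-1))
        # Cosine similarity of each prediction against every candidate next state
        logits = F.normalize(z_pred, dim=-1) @ F.normalize(z_tp1, dim=-1).T
        labels = torch.arange(z_pred.size(0), device=z_pred.device)
        return F.cross_entropy(logits / temperature, labels)

# Unlabeled real sequences only need (o_t, a_t, o_{t+1}) tuples, no reward labels.
```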
Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks
Contact-rich manipulation tasks in unstructured environments often require
both haptic and visual feedback. However, it is non-trivial to manually design
a robot controller that combines modalities with very different
characteristics. While deep reinforcement learning has shown success in
learning control policies for high-dimensional inputs, these algorithms are
generally impractical to deploy on real robots due to their sample complexity. We use
self-supervision to learn a compact and multimodal representation of our
sensory inputs, which can then be used to improve the sample efficiency of our
policy learning. We evaluate our method on a peg insertion task, generalizing
over different geometries, configurations, and clearances, while being robust to
external perturbations. Results for simulated and real robot experiments are
presented.
Comment: ICRA 201
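The abstract describes fusing haptic and visual feedback into a compact multimodal representation that the policy consumes; a rough sketch of such a fusion encoder is below. The modality dimensions, layer sizes, and simple concatenation-style fusion are assumptions for illustration rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class MultimodalEncoder(nn.Module):
    """Sketch: fuse image features and force/torque readings into one
    compact latent vector that a downstream policy can consume."""

    def __init__(self, force_dim=6, latent_dim=32):
        super().__init__()
        self.vision = nn.Sequential(              # tiny CNN over 64x64 RGB frames
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(128), nn.ReLU())
        self.haptics = nn.Sequential(
            nn.Linear(force_dim, 64), nn.ReLU())
        self.fusion = nn.Linear(128 + 64, latent_dim)

    def forward(self, image, force):
        return self.fusion(torch.cat(
            [self.vision(image), self.haptics(force)], dim=-1))

# enc = MultimodalEncoder()
# z = enc(torch.randn(8, 3, 64, 64), torch.randn(8, 6))   # (8, 32) latent
```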
Reinforcement and Imitation Learning for Diverse Visuomotor Skills
We propose a model-free deep reinforcement learning method that leverages a
small amount of demonstration data to assist a reinforcement learning agent. We
apply this approach to robotic manipulation tasks and train end-to-end
visuomotor policies that map directly from RGB camera inputs to joint
velocities. We demonstrate that our approach can solve a wide variety of
visuomotor tasks, for which engineering a scripted controller would be
laborious. In experiments, our reinforcement and imitation agent achieves
significantly better performance than agents trained with reinforcement
learning or imitation learning alone. We also illustrate that these policies,
trained with large visual and dynamics variations, can achieve preliminary
successes in zero-shot sim2real transfer. A brief visual description of this
work can be viewed at https://youtu.be/EDl8SQUNjj0
Comment: 13 pages, 6 figures, Published in RSS 201
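One common way to let a small amount of demonstration data assist a reinforcement learning agent, sketched below under my own assumptions rather than as the paper's exact recipe, is to add a behavior-cloning term computed on demonstration batches to the ordinary RL policy loss.

```python
import torch
import torch.nn.functional as F

def combined_policy_loss(policy, rl_loss, demo_obs, demo_actions, bc_weight=0.1):
    """Sketch: mix an ordinary RL policy loss with a behavior-cloning term
    computed on a batch of demonstration (observation, action) pairs."""
    predicted = policy(demo_obs)                       # continuous actions
    bc_loss = F.mse_loss(predicted, demo_actions)      # imitate the demos
    return rl_loss + bc_weight * bc_loss

# Example with a toy linear policy:
# policy = torch.nn.Linear(16, 7)          # 16-dim obs -> 7 joint velocities
# loss = combined_policy_loss(policy, rl_loss=torch.tensor(0.5),
#                             demo_obs=torch.randn(32, 16),
#                             demo_actions=torch.randn(32, 7))
```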
Time Reversal as Self-Supervision
A longstanding challenge in robot learning for manipulation tasks has been
the ability to generalize to varying initial conditions, diverse objects, and
changing objectives. Learning-based approaches have shown promise in producing
robust policies, but require heavy supervision to efficiently learn precise
control, especially from visual inputs. We propose a novel self-supervision
technique that uses time-reversal to learn goals and provide a high-level plan
to reach them. In particular, we introduce the time-reversal model (TRM), a
self-supervised model which explores outward from a set of goal states and
learns to predict these trajectories in reverse. This provides a high-level
plan towards goals, allowing us to learn complex manipulation tasks with no
demonstrations or exploration at test time. We test our method on the domain of
assembly, specifically the mating of tetris-style block pairs. Using our method
operating atop visual model predictive control, we are able to assemble tetris
blocks on a physical robot using only uncalibrated RGB camera input, and
generalize to unseen block pairs. sites.google.com/view/time-reversal
Comment: 7 pages, 10 figures
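A compact sketch of the time-reversal training signal described above: trajectories are collected by exploring outward from goal states and then reversed, so the model learns to predict states that move back toward the goal. The state representation and network are placeholders, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TimeReversalModel(nn.Module):
    """Sketch: given the current state, predict the next state of the
    *reversed* exploration trajectory, i.e. one step closer to the goal."""

    def __init__(self, state_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(), nn.Linear(256, state_dim))

    def forward(self, state):
        return self.net(state)

def trm_loss(model, trajectory):
    """trajectory: (T, state_dim), recorded while exploring *away* from a goal.
    Reversing it yields a path toward the goal, which the model learns to predict."""
    reversed_traj = torch.flip(trajectory, dims=[0])
    pred = model(reversed_traj[:-1])          # predict each following reversed state
    return nn.functional.mse_loss(pred, reversed_traj[1:])
```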
VRGym: A Virtual Testbed for Physical and Interactive AI
We propose VRGym, a virtual reality testbed for realistic human-robot
interaction. Different from existing toolkits and virtual reality environments,
VRGym emphasizes building and training both physical and interactive
agents for robotics, machine learning, and cognitive science. VRGym leverages
mechanisms that can generate diverse 3D scenes with high realism through
physics-based simulation. We demonstrate that VRGym is able to (i) collect
human interactions and fine manipulations, (ii) accommodate various robots with
a ROS bridge, (iii) support experiments for human-robot interaction, and (iv)
provide toolkits for training state-of-the-art machine learning algorithms.
We hope VRGym can help to advance general-purpose robotics and machine learning
agents, as well as assist human studies in the field of cognitive science.
Learning to Poke by Poking: Experiential Learning of Intuitive Physics
We investigate an experiential learning paradigm for acquiring an internal
model of intuitive physics. Our model is evaluated on a real-world robotic
manipulation task that requires displacing objects to target locations by
poking. The robot gathered over 400 hours of experience by executing more than
100K pokes on different objects. We propose a novel approach based on deep
neural networks for modeling the dynamics of the robot's interactions directly from
images, by jointly estimating forward and inverse models of dynamics. The
inverse model objective provides supervision to construct informative visual
features, which the forward model can then predict and in turn regularize the
feature space for the inverse model. The interplay between these two objectives
creates useful, accurate models that can then be used for multi-step decision
making. This formulation has the additional benefit that it is possible to
learn forward models in an abstract feature space and thus alleviate the need
to predict pixels. Our experiments show that this joint modeling approach
outperforms alternative methods.
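A minimal sketch of the joint forward/inverse dynamics objective described above: a shared feature extractor feeds an inverse model that predicts the action taken between consecutive frames and a forward model that predicts the next feature vector. The flattened observations, layer sizes, and loss weighting are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointDynamics(nn.Module):
    """Sketch: shared image features feed an inverse model (predict the action
    between two frames) and a forward model (predict the next feature vector)."""

    def __init__(self, obs_dim, act_dim, feat_dim=64):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                      nn.Linear(256, feat_dim))
        self.inverse = nn.Linear(2 * feat_dim, act_dim)        # (f_t, f_t+1) -> a_t
        self.forward_model = nn.Linear(feat_dim + act_dim, feat_dim)

    def loss(self, obs_t, act_t, obs_tp1, beta=0.5):
        f_t, f_tp1 = self.features(obs_t), self.features(obs_tp1)
        inv_loss = F.mse_loss(self.inverse(torch.cat([f_t, f_tp1], -1)), act_t)
        fwd_loss = F.mse_loss(self.forward_model(torch.cat([f_t, act_t], -1)),
                              f_tp1.detach())
        return inv_loss + beta * fwd_loss      # inverse supervises features,
                                               # forward regularizes them
```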
When Autonomous Systems Meet Accuracy and Transferability through AI: A Survey
With widespread applications of artificial intelligence (AI), the
capabilities of the perception, understanding, decision-making and control for
autonomous systems have improved significantly in recent years. When autonomous systems are evaluated in terms of accuracy and transferability, several AI methods, such as adversarial learning, reinforcement learning (RL) and meta-learning, show strong performance. Here, we review the
learning-based approaches in autonomous systems from the perspectives of
accuracy and transferability. Accuracy means that a well-trained model shows
good results during the testing phase, in which the testing set shares the same task or data distribution with the training set. Transferability means that
when a well-trained model is transferred to other testing domains, the accuracy
is still good. Firstly, we introduce some basic concepts of transfer learning
and then present some preliminaries of adversarial learning, RL and
meta-learning. Secondly, we review accuracy, transferability, or both to show the advantages of adversarial learning, such as generative
adversarial networks (GANs), in typical computer vision tasks in autonomous
systems, including image style transfer, image superresolution, image
deblurring/dehazing/rain removal, semantic segmentation, depth estimation,
pedestrian detection and person re-identification (re-ID). Then, we further
review the performance of RL and meta-learning in terms of accuracy, transferability, or both in autonomous systems, involving pedestrian
tracking, robot navigation and robotic manipulation. Finally, we discuss
several challenges and future topics for using adversarial learning, RL and
meta-learning in autonomous systems.
Learning Task-Oriented Grasping for Tool Manipulation from Simulated Self-Supervision
Tool manipulation is vital for enabling robots to complete challenging task goals. It requires reasoning about the desired effect of the task and thus grasping and manipulating the tool appropriately. Task-agnostic
grasping optimizes for grasp robustness while ignoring crucial task-specific
constraints. In this paper, we propose the Task-Oriented Grasping Network
(TOG-Net) to jointly optimize both task-oriented grasping of a tool and the
manipulation policy for that tool. The training process of the model is based
on large-scale simulated self-supervision with procedurally generated tool
objects. We perform both simulated and real-world experiments on two tool-based
manipulation tasks: sweeping and hammering. Our model achieves an overall 71.1% task success rate for sweeping and an 80.0% task success rate for hammering.
Supplementary material is available at: bit.ly/task-oriented-grasp
Comment: RSS 201
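A rough sketch of the kind of task-conditioned grasp scoring that simulated self-supervision enables: candidate grasps are scored for a specific task and trained against binary success labels collected from simulated rollouts. The inputs, embedding, and network below are hypothetical placeholders, not the TOG-Net architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskOrientedGraspScorer(nn.Module):
    """Sketch: score candidate grasps for a specific task, trained with
    binary success labels gathered from simulated rollouts."""

    def __init__(self, grasp_dim=6, obs_dim=128, num_tasks=2):
        super().__init__()
        self.task_embed = nn.Embedding(num_tasks, 16)    # e.g. sweep / hammer
        self.scorer = nn.Sequential(
            nn.Linear(grasp_dim + obs_dim + 16, 256), nn.ReLU(),
            nn.Linear(256, 1))

    def forward(self, grasp, obs, task_id):
        x = torch.cat([grasp, obs, self.task_embed(task_id)], dim=-1)
        return self.scorer(x).squeeze(-1)                # grasp success logit

# Training step on simulated self-supervision:
# logits = scorer(grasps, observations, task_ids)
# loss = F.binary_cross_entropy_with_logits(logits, task_success_labels)
```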
Task-Embedded Control Networks for Few-Shot Imitation Learning
Much like humans, robots should have the ability to leverage knowledge from
previously learned tasks in order to learn new tasks quickly in new and
unfamiliar environments. Despite this, most robot learning approaches have
focused on learning a single task, from scratch, with a limited notion of
generalisation, and no way of leveraging the knowledge to learn other tasks
more efficiently. One possible solution is meta-learning, but many of the
related approaches are limited in their ability to scale to a large number of
tasks and to learn further tasks without forgetting previously learned ones.
With this in mind, we introduce Task-Embedded Control Networks, which employ
ideas from metric learning in order to create a task embedding that can be used
by a robot to learn new tasks from one or more demonstrations. In the area of
visually-guided manipulation, we present simulation results in which we surpass
the performance of a state-of-the-art method when using only visual information
from each demonstration. Additionally, we demonstrate that our approach can
also be used in conjunction with domain randomisation to train our few-shot
learning ability in simulation and then deploy in the real world without any
additional training. Once deployed, the robot can learn new tasks from a single
real-world demonstration.
Comment: Published at the Conference on Robot Learning (CoRL) 201
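A small sketch of the metric-learning idea behind the task embedding: demonstrations of the same task are pulled together in embedding space while other tasks are pushed apart, here with a triplet margin loss. The demonstration summary vector, dimensions, and loss choice are assumptions for illustration, not the paper's exact objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskEmbeddingNet(nn.Module):
    """Sketch: embed a demonstration (here summarized as a feature vector)
    so demos of the same task are close and other tasks are far apart."""

    def __init__(self, demo_dim=128, embed_dim=20):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(demo_dim, 256), nn.ReLU(),
                                 nn.Linear(256, embed_dim))

    def forward(self, demo):
        return F.normalize(self.net(demo), dim=-1)

embed = TaskEmbeddingNet()
triplet = nn.TripletMarginLoss(margin=0.2)
# anchor/positive are demos of the same task, negative comes from another task
anchor, positive, negative = (torch.randn(16, 128) for _ in range(3))
loss = triplet(embed(anchor), embed(positive), embed(negative))
# At test time, a control network conditioned on embed(new_demo) acts on the task.
```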
Dexterous Manipulation with Deep Reinforcement Learning: Efficient, General, and Low-Cost
Dexterous multi-fingered robotic hands can perform a wide range of
manipulation skills, making them an appealing component for general-purpose
robotic manipulators. However, such hands pose a major challenge for autonomous
control, due to the high dimensionality of their configuration space and
complex intermittent contact interactions. In this work, we propose deep
reinforcement learning (deep RL) as a scalable solution for learning complex,
contact rich behaviors with multi-fingered hands. Deep RL provides an
end-to-end approach to directly map sensor readings to actions, without the
need for task-specific models or policy classes. We show that contact-rich
manipulation behavior with multi-fingered hands can be learned by directly
training with model-free deep RL algorithms in the real world, with minimal
additional assumptions and without the aid of simulation. We learn a variety of
complex behaviors on two different low-cost hardware platforms. We show that
each task can be learned entirely from scratch, and we further study how the learning process can be accelerated by using a small number of human
demonstrations to bootstrap learning. Our experiments demonstrate that complex
multi-fingered manipulation skills can be learned in the real world in about
4-7 hours for most tasks, and that demonstrations can decrease this to 2-3
hours, indicating that direct deep RL training in the real world is a viable
and practical alternative to simulation and model-based control.
\url{https://sites.google.com/view/deeprl-handmanipulation}
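One standard way to use a small number of human demonstrations to bootstrap direct real-world deep RL, sketched below as an assumption rather than the paper's exact method, is to pre-seed an off-policy replay buffer with demonstration transitions before autonomous data collection begins.

```python
import random
from collections import deque

class ReplayBuffer:
    """Sketch: a replay buffer for off-policy deep RL that is pre-seeded
    with demonstration transitions to bootstrap real-world learning."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs, done):
        self.buffer.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

def seed_with_demonstrations(buffer, demo_transitions):
    """Load human demonstration transitions before any autonomous rollouts."""
    for transition in demo_transitions:
        buffer.add(*transition)

# buffer = ReplayBuffer()
# seed_with_demonstrations(buffer, demo_transitions)   # a few human demos
# ...then alternate robot rollouts with gradient updates on buffer.sample(256)
```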