9,117 research outputs found
Combining Self-Supervised Learning and Imitation for Vision-Based Rope Manipulation
Manipulation of deformable objects, such as ropes and cloth, is an important
but challenging problem in robotics. We present a learning-based system where a
robot takes as input a sequence of images of a human manipulating a rope from
an initial to goal configuration, and outputs a sequence of actions that can
reproduce the human demonstration, using only monocular images as input. To
perform this task, the robot learns a pixel-level inverse dynamics model of
rope manipulation directly from images in a self-supervised manner, using about
60K interactions with the rope collected autonomously by the robot. The human
demonstration provides a high-level plan of what to do and the low-level
inverse model is used to execute the plan. We show that by combining the high
and low-level plans, the robot can successfully manipulate a rope into a
variety of target shapes using only a sequence of human-provided images for
direction.Comment: 8 pages, accepted to International Conference on Robotics and
Automation (ICRA) 201
Constraining the Size Growth of the Task Space with Socially Guided Intrinsic Motivation using Demonstrations
This paper presents an algorithm for learning a highly redundant inverse
model in continuous and non-preset environments. Our Socially Guided Intrinsic
Motivation by Demonstrations (SGIM-D) algorithm combines the advantages of both
social learning and intrinsic motivation, to specialise in a wide range of
skills, while lessening its dependence on the teacher. SGIM-D is evaluated on a
fishing skill learning experiment.Comment: JCAI Workshop on Agents Learning Interactively from Human Teachers
(ALIHT), Barcelona : Spain (2011
Time-Contrastive Networks: Self-Supervised Learning from Video
We propose a self-supervised approach for learning representations and
robotic behaviors entirely from unlabeled videos recorded from multiple
viewpoints, and study how this representation can be used in two robotic
imitation settings: imitating object interactions from videos of humans, and
imitating human poses. Imitation of human behavior requires a
viewpoint-invariant representation that captures the relationships between
end-effectors (hands or robot grippers) and the environment, object attributes,
and body pose. We train our representations using a metric learning loss, where
multiple simultaneous viewpoints of the same observation are attracted in the
embedding space, while being repelled from temporal neighbors which are often
visually similar but functionally different. In other words, the model
simultaneously learns to recognize what is common between different-looking
images, and what is different between similar-looking images. This signal
causes our model to discover attributes that do not change across viewpoint,
but do change across time, while ignoring nuisance variables such as
occlusions, motion blur, lighting and background. We demonstrate that this
representation can be used by a robot to directly mimic human poses without an
explicit correspondence, and that it can be used as a reward function within a
reinforcement learning algorithm. While representations are learned from an
unlabeled collection of task-related videos, robot behaviors such as pouring
are learned by watching a single 3rd-person demonstration by a human. Reward
functions obtained by following the human demonstrations under the learned
representation enable efficient reinforcement learning that is practical for
real-world robotic systems. Video results, open-source code and dataset are
available at https://sermanet.github.io/imitat
Learning Human-Robot Collaboration Insights through the Integration of Muscle Activity in Interaction Motion Models
Recent progress in human-robot collaboration makes fast and fluid
interactions possible, even when human observations are partial and occluded.
Methods like Interaction Probabilistic Movement Primitives (ProMP) model human
trajectories through motion capture systems. However, such representation does
not properly model tasks where similar motions handle different objects. Under
current approaches, a robot would not adapt its pose and dynamics for proper
handling. We integrate the use of Electromyography (EMG) into the Interaction
ProMP framework and utilize muscular signals to augment the human observation
representation. The contribution of our paper is increased task discernment
when trajectories are similar but tools are different and require the robot to
adjust its pose for proper handling. Interaction ProMPs are used with an
augmented vector that integrates muscle activity. Augmented time-normalized
trajectories are used in training to learn correlation parameters and robot
motions are predicted by finding the best weight combination and temporal
scaling for a task. Collaborative single task scenarios with similar motions
but different objects were used and compared. For one experiment only joint
angles were recorded, for the other EMG signals were additionally integrated.
Task recognition was computed for both tasks. Observation state vectors with
augmented EMG signals were able to completely identify differences across
tasks, while the baseline method failed every time. Integrating EMG signals
into collaborative tasks significantly increases the ability of the system to
recognize nuances in the tasks that are otherwise imperceptible, up to 74.6% in
our studies. Furthermore, the integration of EMG signals for collaboration also
opens the door to a wide class of human-robot physical interactions based on
haptic communication that has been largely unexploited in the field.Comment: 7 pages, 2 figures, 2 tables. As submitted to Humanoids 201
Efficiently Combining Human Demonstrations and Interventions for Safe Training of Autonomous Systems in Real-Time
This paper investigates how to utilize different forms of human interaction
to safely train autonomous systems in real-time by learning from both human
demonstrations and interventions. We implement two components of the
Cycle-of-Learning for Autonomous Systems, which is our framework for combining
multiple modalities of human interaction. The current effort employs human
demonstrations to teach a desired behavior via imitation learning, then
leverages intervention data to correct for undesired behaviors produced by the
imitation learner to teach novel tasks to an autonomous agent safely, after
only minutes of training. We demonstrate this method in an autonomous perching
task using a quadrotor with continuous roll, pitch, yaw, and throttle commands
and imagery captured from a downward-facing camera in a high-fidelity simulated
environment. Our method improves task completion performance for the same
amount of human interaction when compared to learning from demonstrations
alone, while also requiring on average 32% less data to achieve that
performance. This provides evidence that combining multiple modes of human
interaction can increase both the training speed and overall performance of
policies for autonomous systems.Comment: 9 pages, 6 figure
- …