SA-Net: Deep Neural Network for Robot Trajectory Recognition from RGB-D Streams
Learning from demonstration (LfD) and imitation learning offer new paradigms
for transferring task behavior to robots. A class of methods that enable such
online learning requires the robot to observe the task being performed and
decompose the sensed streaming data into sequences of state-action pairs, which
are then input to the methods. Thus, recognizing the state-action pairs
correctly and quickly in sensed data is a crucial prerequisite for these
methods. We present SA-Net, a deep neural network architecture that recognizes
state-action pairs from RGB-D data streams. SA-Net performed well in two
diverse robotic applications of LfD -- one involving mobile ground robots and
another involving a robotic manipulator -- which demonstrates that the
architecture generalizes well to differing contexts. Comprehensive evaluations
including deployment on a physical robot show that SA-Net significantly
improves on the accuracy of the previous method that utilizes traditional image
processing and segmentation.
Comment: (in press)
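To make the idea of recognizing state-action pairs from RGB-D frames concrete, the following is a minimal sketch of a network that maps a four-channel RGB-D image to a discrete state label and action label. The layer sizes, label counts, and two-head design are illustrative assumptions, not the SA-Net architecture described in the paper.

```python
# Minimal sketch (PyTorch): classify a (state, action) pair from one RGB-D frame.
# Hypothetical layer sizes and label counts; not the architecture from the paper.
import torch
import torch.nn as nn

class StateActionNet(nn.Module):
    def __init__(self, num_states: int, num_actions: int):
        super().__init__()
        # 4 input channels: RGB + depth
        self.backbone = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Two heads: one for the state label, one for the action label
        self.state_head = nn.Linear(64, num_states)
        self.action_head = nn.Linear(64, num_actions)

    def forward(self, rgbd):            # rgbd: (B, 4, H, W)
        features = self.backbone(rgbd)
        return self.state_head(features), self.action_head(features)

# Usage: predict logits over states and actions for a batch of frames.
net = StateActionNet(num_states=10, num_actions=4)
state_logits, action_logits = net(torch.randn(2, 4, 120, 160))
```

A sequence of such per-frame predictions would then be assembled into the state-action trajectory consumed by the LfD method.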
Time-Contrastive Networks: Self-Supervised Learning from Video
We propose a self-supervised approach for learning representations and
robotic behaviors entirely from unlabeled videos recorded from multiple
viewpoints, and study how this representation can be used in two robotic
imitation settings: imitating object interactions from videos of humans, and
imitating human poses. Imitation of human behavior requires a
viewpoint-invariant representation that captures the relationships between
end-effectors (hands or robot grippers) and the environment, object attributes,
and body pose. We train our representations using a metric learning loss, where
multiple simultaneous viewpoints of the same observation are attracted in the
embedding space, while being repelled from temporal neighbors which are often
visually similar but functionally different. In other words, the model
simultaneously learns to recognize what is common between different-looking
images, and what is different between similar-looking images. This signal
causes our model to discover attributes that do not change across viewpoint,
but do change across time, while ignoring nuisance variables such as
occlusions, motion blur, lighting and background. We demonstrate that this
representation can be used by a robot to directly mimic human poses without an
explicit correspondence, and that it can be used as a reward function within a
reinforcement learning algorithm. While representations are learned from an
unlabeled collection of task-related videos, robot behaviors such as pouring
are learned by watching a single 3rd-person demonstration by a human. Reward
functions obtained by following the human demonstrations under the learned
representation enable efficient reinforcement learning that is practical for
real-world robotic systems. Video results, open-source code and dataset are
available at https://sermanet.github.io/imitat
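The metric learning objective described above can be sketched as a triplet loss over embeddings: frames of the same moment seen from two viewpoints are pulled together, while a temporally nearby frame from the same viewpoint is pushed away. The margin value and squared-distance form below are illustrative assumptions, not the exact loss used in the paper.

```python
# Minimal sketch of a time-contrastive (multi-view) triplet objective.
import torch
import torch.nn.functional as F

def time_contrastive_loss(anchor, positive, negative, margin: float = 0.2):
    """anchor:   embedding of frame t from view 1, shape (B, D)
       positive: embedding of frame t from view 2 (same moment, different view)
       negative: embedding of frame t + dt from view 1 (nearby moment, same view)"""
    d_pos = (anchor - positive).pow(2).sum(dim=1)   # same moment: should be close
    d_neg = (anchor - negative).pow(2).sum(dim=1)   # temporal neighbor: should be far
    return F.relu(d_pos - d_neg + margin).mean()
```

Minimizing this loss encourages the embedding to capture attributes that are stable across viewpoints but change over time, which is the signal the abstract describes.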
A Hierarchical Bayesian model for Inverse RL in Partially-Controlled Environments
Robots learning from observations in the real world using inverse
reinforcement learning (IRL) may encounter objects or agents in the
environment, other than the expert, that cause nuisance observations during the
demonstration. These confounding elements are typically removed in
fully-controlled environments such as virtual simulations or lab settings. When
complete removal is impossible, the nuisance observations must be filtered out.
However, identifying the source of each observation is difficult when large
numbers of observations are made. To address this, we present a hierarchical
Bayesian model that incorporates both the expert's and the confounding
elements' observations thereby explicitly modeling the diverse observations a
robot may receive. We extend an existing IRL algorithm originally designed to
work under partial occlusion of the expert to consider the diverse
observations. In a simulated robotic sorting domain containing both occlusion
and confounding elements, we demonstrate the model's effectiveness. In
particular, our technique outperforms several other comparative methods, second
only to having perfect knowledge of the subject's trajectory.
Comment: 8 pages, 10 figures
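One way to picture the filtering step implied above is a soft assignment of each observation to "expert" versus "confounding element" before IRL is run. The sketch below uses placeholder likelihood functions and a flat prior; it is only a schematic of the idea, not the hierarchical Bayesian model from the paper.

```python
# Minimal sketch: posterior responsibility that each observation came from the expert.
# expert_lik / confounder_lik are hypothetical likelihood functions supplied by the user.
import numpy as np

def expert_responsibility(observations, expert_lik, confounder_lik, prior_expert=0.5):
    """Return P(source = expert | observation) for each observation."""
    p_e = np.array([expert_lik(o) for o in observations]) * prior_expert
    p_c = np.array([confounder_lik(o) for o in observations]) * (1.0 - prior_expert)
    return p_e / (p_e + p_c)

# Observations with low expert responsibility would be filtered out before the
# remaining data are decomposed into state-action pairs for the IRL step.
```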
Occlusion-Aware Crowd Navigation Using People as Sensors
Autonomous navigation in crowded spaces poses a challenge for mobile robots
due to the highly dynamic, partially observable environment. Occlusions are
highly prevalent in such settings due to a limited sensor field of view and
obstructing human agents. Previous work has shown that observed interactive
behaviors of human agents can be used to estimate potential obstacles despite
occlusions. We propose integrating such social inference techniques into the
planning pipeline. We use a variational autoencoder with a specially designed
loss function to learn representations that are meaningful for occlusion
inference. This work adopts a deep reinforcement learning approach to
incorporate the learned representation for occlusion-aware planning. In
simulation, our occlusion-aware policy achieves comparable collision avoidance
performance to fully observable navigation by estimating agents in occluded
spaces. We demonstrate successful policy transfer from simulation to a
real-world TurtleBot 2i. To the best of our knowledge, this work is the first
to use social occlusion inference for crowd navigation.
Comment: 7 pages, 4 figures
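As a rough illustration of the pipeline described above, the sketch below encodes observed neighbor behavior into a latent code with a VAE-style encoder and feeds that code, together with the robot state, into a policy network. All dimensions, layer sizes, and the action-logit head are assumptions for illustration; this is not the architecture or loss from the paper.

```python
# Minimal sketch: VAE encoder for occlusion inference + policy that consumes the latent.
import torch
import torch.nn as nn

class OcclusionEncoder(nn.Module):
    def __init__(self, obs_dim=32, latent_dim=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.mu = nn.Linear(64, latent_dim)       # mean of the latent posterior
        self.logvar = nn.Linear(64, latent_dim)   # log-variance of the latent posterior

    def forward(self, neighbor_obs):
        h = self.net(neighbor_obs)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
        return z, mu, logvar

class NavPolicy(nn.Module):
    def __init__(self, robot_dim=6, latent_dim=8, num_actions=5):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(robot_dim + latent_dim, 64), nn.ReLU(),
                                 nn.Linear(64, num_actions))

    def forward(self, robot_state, z):
        return self.net(torch.cat([robot_state, z], dim=-1))   # action logits
```

In this reading, the encoder is trained with a loss that makes the latent informative about occluded space, and the policy is trained with deep reinforcement learning on top of that representation.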