From virtual demonstration to real-world manipulation using LSTM and MDN
Robots assisting the disabled or elderly must perform complex manipulation
tasks and must adapt to the home environment and preferences of their user.
Learning from demonstration is a promising approach that would allow a
non-technical user to teach the robot different tasks. However, collecting
demonstrations in the home environment of a disabled user is time consuming,
disruptive to the comfort of the user, and presents safety challenges. It would
be desirable to perform the demonstrations in a virtual environment. In this
paper we describe a solution to the challenging problem of behavior transfer
from virtual demonstration to a physical robot. The virtual demonstrations are
used to train a deep neural network based controller, which uses a Long
Short-Term Memory (LSTM) recurrent neural network to generate trajectories. The
training process uses a Mixture Density Network (MDN) to calculate an error
signal suitable for the multimodal nature of demonstrations. The controller
learned in the virtual environment is transferred to a physical robot (a
Rethink Robotics Baxter). An off-the-shelf vision component is used to
substitute for geometric knowledge available in the simulation and an inverse
kinematics module is used to allow the Baxter to enact the trajectory. Our
experimental studies validate the three contributions of the paper: (1) the
controller learned from virtual demonstrations can be used to successfully
perform the manipulation tasks on a physical robot, (2) the LSTM+MDN
architectural choice outperforms other choices, such as the use of feedforward
networks and mean-squared error based training signals, and (3) including
imperfect demonstrations in the training set allows the controller to learn how
to correct its manipulation mistakes.
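Concretely, the MDN training signal is the negative log-likelihood of the demonstrated trajectory point under a predicted Gaussian mixture, which lets the loss accommodate multimodal demonstrations where a mean-squared error would average incompatible modes. The sketch below is a minimal illustration of that loss, not the paper's implementation; the function name, the diagonal-Gaussian assumption, and the array shapes are our own choices.

```python
import numpy as np

def mdn_nll(log_pi, mu, sigma, y):
    """Negative log-likelihood of target y under a diagonal Gaussian mixture.

    log_pi: (K,)   log mixture weights (assumed normalized)
    mu:     (K, D) component means
    sigma:  (K, D) component standard deviations (positive)
    y:      (D,)   demonstrated trajectory point
    """
    # Log density of y under each diagonal Gaussian component
    log_prob = (-0.5 * np.sum(((y - mu) / sigma) ** 2, axis=1)
                - np.sum(np.log(sigma), axis=1)
                - 0.5 * mu.shape[1] * np.log(2 * np.pi))
    # Mix with the weights via log-sum-exp for numerical stability
    z = log_pi + log_prob
    m = np.max(z)
    return -(m + np.log(np.sum(np.exp(z - m))))
```

Because the loss only requires the target to be likely under *some* component, two demonstrations that take different but valid paths do not penalize each other, unlike a single-Gaussian (MSE-style) objective.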
Vision-Based Multi-Task Manipulation for Inexpensive Robots Using End-To-End Learning from Demonstration
We propose a technique for multi-task learning from demonstration that trains
the controller of a low-cost robotic arm to accomplish several complex picking
and placing tasks, as well as non-prehensile manipulation. The controller is a
recurrent neural network using raw images as input and generating robot arm
trajectories, with the parameters shared across the tasks. The controller also
combines VAE-GAN-based reconstruction with autoregressive multimodal action
prediction. Our results demonstrate that it is possible to learn complex
manipulation tasks, such as picking up a towel, wiping an object, and
returning the towel to its previous position, entirely from raw images with
direct behavior cloning. We show that weight sharing and reconstruction-based
regularization substantially improve generalization and robustness, and
training on multiple tasks simultaneously increases the success rate on all
tasks.
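The weight-sharing idea can be illustrated with a simple surrogate objective: one parameter vector serves every task, so each task's behavior-cloning loss shapes the same controller. The sketch below uses a squared-error surrogate purely for illustration (the paper's action head is autoregressive and multimodal), and every name in it is hypothetical.

```python
import numpy as np

def multitask_bc_loss(predict, theta, batches):
    """Average behavior-cloning loss over several tasks with shared weights.

    predict(theta, obs) -> predicted action; the same parameter vector
    `theta` is used for every task, so gradients from all tasks update
    one shared controller rather than per-task copies.
    """
    losses = []
    for obs, act in batches:          # one (observation, action) batch per task
        err = predict(theta, obs) - act
        losses.append(np.mean(err ** 2))
    return np.mean(losses)            # tasks weighted equally
```

Training on the mean across tasks is what lets data from one task regularize the controller on the others, consistent with the reported gain from simultaneous multi-task training.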
Task Focused Robotic Imitation Learning
For many years, successful applications of robotics were confined to controlled environments, such as industrial assembly lines. Such environments are custom designed for the convenience of the robot and separated from human operators. In recent years, advances in artificial intelligence, in particular deep learning and computer vision, have allowed researchers to demonstrate robots that operate in unstructured environments and interact directly with humans. One of the major applications of such robots is assistive robotics. For instance, a wheelchair-mounted robotic arm can help disabled users perform activities of daily living (ADLs) such as feeding and personal grooming. Early systems relied entirely on control by the human operator, something that is difficult for a user with motor and/or cognitive disabilities. In this dissertation, we describe research results that advance the field of assistive robotics. The overall goal is to improve the ability of the wheelchair / robotic arm assembly to help the user perform ADLs while requiring only high-level commands from the user.
Let us consider an ADL involving the manipulation of an object in the user's home. This task can be naturally decomposed into two components: moving the wheelchair such that the manipulator can conveniently grasp the object, and moving the manipulator itself. In this dissertation, we provide an approach for finding a base position appropriate for the required manipulation. We introduce the ease-of-reach score (ERS), a metric that quantifies preferences for the positioning of the base while taking into account the shape and position of obstacles and clutter in the environment. As the brute-force computation of the ERS is computationally expensive, we propose a machine learning approach that estimates the ERS from features and characteristics of the obstacles.
This dissertation also addresses the second component, the ability of the robotic arm to manipulate objects. Recent work in end-to-end learning of robotic manipulation has demonstrated that a deep learning-based controller for a vision-enabled robotic arm can be taught to manipulate objects from a moderate number of demonstrations. However, current state-of-the-art systems are not robust to physical and visual disturbances and do not generalize well to new objects. We describe new techniques based on task-focused attention that significantly improve the robustness of manipulation and performance in clutter.
Teaching Robots Novel Objects by Pointing at Them
Robots that must operate in novel environments and collaborate with humans
must be capable of acquiring new knowledge from human experts during operation.
We propose teaching a robot novel objects it has not encountered before by
pointing a hand at the new object of interest. An end-to-end neural network is
used to attend to the novel object of interest indicated by the pointing hand
and then to localize the object in new scenes. In order to attend to the novel
object indicated by the pointing hand, we propose a spatial attention
modulation mechanism that learns to focus on the highlighted object while
ignoring the other objects in the scene. We show that a robot arm can
manipulate novel objects that are highlighted by pointing a hand at them. We
also evaluate the performance of the proposed architecture on a synthetic
dataset constructed using emojis and on a real-world dataset of common objects
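One way to read the proposed spatial attention modulation: a score map derived from the pointing hand is normalized over spatial locations and used to reweight the visual feature map, so features at the highlighted object pass through while the rest are suppressed. The sketch below is our own minimal illustration under that reading; the function names and shapes are assumptions, not the paper's architecture.

```python
import numpy as np

def spatial_softmax(scores):
    """Softmax over all spatial locations of an (H, W) score map."""
    z = scores - scores.max()         # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()                # weights over locations, summing to 1

def modulate(features, pointing_scores):
    """Reweight an (H, W, C) feature map by pointing-derived attention.

    Locations scored high by the pointing cue keep their features;
    low-scored locations (other objects, background) are attenuated.
    """
    attn = spatial_softmax(pointing_scores)   # (H, W)
    return features * attn[..., None]         # broadcast over channels
```

With uniform scores the modulation reduces to uniform spatial averaging; a sharp peak at the indicated object effectively gates out all other regions, which matches the abstract's description of focusing on the highlighted object while ignoring the rest of the scene.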
An Unbiased Look at Datasets for Visuo-Motor Pre-Training
Visual representation learning holds great promise for robotics, but is
severely hampered by the scarcity and homogeneity of robotics datasets. Recent
works address this problem by pre-training visual representations on
large-scale but out-of-domain data (e.g., videos of egocentric interactions)
and then transferring them to target robotics tasks. While the field is heavily
focused on developing better pre-training algorithms, we find that dataset
choice is just as important to this paradigm's success. After all, the
representation can only learn the structures or priors present in the
pre-training dataset. To this end, we flip the focus on algorithms, and instead
conduct a dataset centric analysis of robotic pre-training. Our findings call
into question some common wisdom in the field. We observe that traditional
vision datasets (like ImageNet, Kinetics and 100 Days of Hands) are
surprisingly competitive options for visuo-motor representation learning, and
that the pre-training dataset's image distribution matters more than its size.
Finally, we show that common simulation benchmarks are not a reliable proxy for
real world performance and that simple regularization strategies can
dramatically improve real world policy learning.
https://data4robotics.github.io
Comment: Accepted to CoRL 202