Programming by Demonstration on Riemannian Manifolds
This thesis presents a Riemannian approach to Programming by Demonstration (PbD).
It generalizes an existing PbD method from Euclidean space to Riemannian manifolds.
In this abstract, we review the objectives, methods and contributions of the presented
approach.
OBJECTIVES
PbD aims to provide a user-friendly method for skill transfer between human and
robot. It enables a user to teach a robot new tasks using few demonstrations. In order
to surpass simple record-and-replay, methods for PbD need to 'understand' what to
imitate; they need to extract the functional goals of a task from the demonstration data.
This is typically achieved through the application of statistical methods.
The variety of data encountered in robotics is large. Typical manipulation tasks involve
position, orientation, stiffness, force and torque data. These data are not solely
Euclidean. Instead, they originate from a variety of manifolds, curved spaces that are
only locally Euclidean. Elementary operations, such as summation, are not defined on
manifolds. Consequently, standard statistical methods are not well suited to analyze
demonstration data that originate from non-Euclidean manifolds. In order to effectively
extract what-to-imitate, methods for PbD should take into account the underlying geometry
of the demonstration manifold; they should be geometry-aware.
Successful task execution does not solely depend on the control of individual task
variables. When each variable is controlled individually, a task might fail if one is perturbed
and the others do not respond. Task execution also relies on couplings among task variables.
These couplings describe functional relations which are often called synergies. In
order to understand what-to-imitate, PbD methods should be able to extract and encode
synergies; they should be synergetic.
In unstructured environments, it is unlikely that tasks are found in the same scenario
twice. The circumstances under which a task is executed (the task context) are more
likely to differ each time it is executed. Task context not only varies during task execution;
it also varies while learning and recognizing tasks. To be effective, a robot should
be able to learn, recognize and synthesize skills in a variety of familiar and unfamiliar
contexts; this can be achieved when its skill representation is context-adaptive.
THE RIEMANNIAN APPROACH
In this thesis, we present a skill representation that is geometry-aware, synergetic and
context-adaptive. The presented method is probabilistic; it assumes that demonstrations
are samples from an unknown probability distribution. This distribution is approximated
using a Riemannian Gaussian Mixture Model (GMM).
Instead of using the 'standard' Euclidean Gaussian, we rely on the Riemannian Gaussian:
a distribution akin to the Gaussian, but defined on a Riemannian manifold. A Riemannian
manifold is a manifold (a curved space which is locally Euclidean) that provides
a notion of distance. This notion is essential for statistical methods as such methods
rely on a distance measure. Examples of Riemannian manifolds in robotics are: the
Euclidean space, which is used for spatial data, forces or torques; the spherical manifolds,
which can be used for orientation data defined as unit quaternions; and Symmetric Positive
Definite (SPD) manifolds, which can be used to represent stiffness and manipulability.
The Riemannian Gaussian is intrinsically geometry-aware. Its definition is based on
the geometry of the manifold, and therefore takes into account the manifold curvature.
In robotics, the manifold structure is often known beforehand. In the case of PbD, it follows
from the structure of the demonstration data. Like the Gaussian distribution, the
Riemannian Gaussian is defined by a mean and covariance. The covariance describes
the variance and correlation among the state variables. These can be interpreted as local
functional couplings among state variables: synergies. This makes the Riemannian
Gaussian synergetic. Furthermore, information encoded in multiple Riemannian Gaussians
can be fused using the Riemannian product of Gaussians. This feature allows us to
construct a probabilistic context-adaptive task representation.
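To make these notions concrete, the following is a minimal, illustrative numpy sketch (not the thesis implementation) of a Riemannian Gaussian on the unit sphere S^2: the exp/log maps provide the locally Euclidean coordinates, the mean is estimated by a simple fixed-point iteration, and the covariance lives in the tangent space at the mean.

```python
import numpy as np

def log_map(mu, x):
    """Log map on the unit sphere S^2: coordinates of x in the tangent space at mu."""
    v = x - np.dot(mu, x) * mu                      # component of x orthogonal to mu
    nv = np.linalg.norm(v)
    theta = np.arccos(np.clip(np.dot(mu, x), -1.0, 1.0))
    return np.zeros_like(mu) if nv < 1e-12 else theta * v / nv

def exp_map(mu, u):
    """Exp map on S^2: maps a tangent vector u at mu back onto the sphere."""
    nu = np.linalg.norm(u)
    return mu if nu < 1e-12 else np.cos(nu) * mu + np.sin(nu) * u / nu

def riemannian_gaussian_mle(X, iters=20):
    """MLE of a Riemannian Gaussian: Frechet mean plus tangent-space covariance."""
    mu = X[0] / np.linalg.norm(X[0])                # initial guess on the sphere
    for _ in range(iters):                          # fixed-point mean update
        U = np.stack([log_map(mu, x) for x in X])
        mu = exp_map(mu, U.mean(axis=0))
    U = np.stack([log_map(mu, x) for x in X])
    return mu, U.T @ U / len(X)                     # covariance at the estimated mean

# Example: unit directions scattered around the north pole
rng = np.random.default_rng(0)
X = rng.normal([0.0, 0.0, 1.0], 0.1, size=(50, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
mu, cov = riemannian_gaussian_mle(X)
```

The covariance estimated this way encodes exactly the tangent-space correlations, i.e. the synergies, described above.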
CONTRIBUTIONS
In particular, this thesis presents a generalization of existing methods of PbD, namely
GMM-GMR and TP-GMM. This generalization involves the definition of Maximum Likelihood
Estimation (MLE), Gaussian conditioning and the Gaussian product for the Riemannian
Gaussian, and the definition of Expectation Maximization (EM) and Gaussian Mixture
Regression (GMR) for the Riemannian GMM. In this generalization, we contributed
by proposing to use parallel transport for Gaussian conditioning. Furthermore, we presented
a unified approach to solve the aforementioned operations using a Gauss-Newton
algorithm. We demonstrated how synergies, encoded in a Riemannian Gaussian, can be
transformed into synergetic control policies using standard methods for the Linear Quadratic
Regulator (LQR). This is achieved by formulating the LQR problem in a (Euclidean) tangent
space of the Riemannian manifold. Finally, we demonstrated how the context-adaptive
Task-Parameterized Gaussian Mixture Model (TP-GMM) can be used for context
inference: the ability to extract context from demonstration data of known tasks.
Our approach is the first attempt at context inference in the setting of TP-GMM. We showed
that, although effective, it requires further improvements in terms of speed and reliability.
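As an illustration of where parallel transport enters, below is a simplified sketch of the Riemannian product of two Gaussians on S^2, reusing log_map and exp_map from the earlier sketch; the transport matrix and the Gauss-Newton-style update are our reading of the operations named here, not the thesis code.

```python
import numpy as np  # log_map / exp_map as defined in the earlier sketch

def transport_matrix(p, q):
    """Parallel transport on S^2 (as a 3x3 matrix) from T_p to T_q along the geodesic."""
    u = log_map(p, q)
    theta = np.linalg.norm(u)
    if theta < 1e-12:
        return np.eye(3)
    e = u / theta
    return (np.eye(3) + (np.cos(theta) - 1.0) * np.outer(e, e)
            - np.sin(theta) * np.outer(p, e))

def gaussian_product(mu1, cov1, mu2, cov2, iters=10):
    """Fuse two Riemannian Gaussians (product of Gaussians) by iterating on the mean."""
    mu = mu1
    for _ in range(iters):
        # bring both covariances into the tangent space at the current estimate
        A1, A2 = transport_matrix(mu1, mu), transport_matrix(mu2, mu)
        P1 = np.linalg.pinv(A1 @ cov1 @ A1.T)   # precisions at mu (pinv: rank-2 covs)
        P2 = np.linalg.pinv(A2 @ cov2 @ A2.T)
        # Gauss-Newton-style step: precision-weighted mean of the log-mapped means
        step = np.linalg.pinv(P1 + P2) @ (P1 @ log_map(mu, mu1) + P2 @ log_map(mu, mu2))
        mu = exp_map(mu, step)
    return mu, np.linalg.pinv(P1 + P2)
```

A fusion of this kind underlies the mixing of control intentions in the shared-control scenario described below.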
The efficacy of the Riemannian approach is demonstrated in a variety of scenarios.
In shared control, the Riemannian Gaussian is used to represent control intentions of a
human operator and an assistive system. In doing so, the properties of the Gaussian can
be employed to mix their control intentions. This yields shared-control systems that
continuously re-evaluate and assign control authority based on input confidence. The
context-adaptive TP-GMM is demonstrated in a Pick & Place task with changing pick and
place locations, a box-taping task with changing box sizes, and a trajectory tracking task
typically found in industry.
Advancing Robot Autonomy for Long-Horizon Tasks
Autonomous robots have real-world applications in diverse fields, such as
mobile manipulation and environmental exploration, and many such tasks benefit
from a hands-off approach in terms of human user involvement over a long task
horizon. However, the level of autonomy achievable by a deployment is limited
in part by the problem definition or task specification required by the system.
Task specifications often require technical, low-level information that is
unintuitive to describe and may result in generic solutions, burdening the user
technically both before and after task completion. In this thesis, we aim to
advance task specification abstraction toward the goal of increasing robot
autonomy in real-world scenarios. We do so by tackling problems that address
several different angles of this goal. First, we develop a way for the
automatic discovery of optimal transition points between subtasks in the
context of constrained mobile manipulation, removing the need for the human to
hand-specify these in the task specification. We further propose a way to
automatically describe constraints on robot motion by using demonstrated data
as opposed to manually-defined constraints. Then, within the context of
environmental exploration, we propose a flexible task specification framework,
requiring just a set of quantiles of interest from the user that allows the
robot to directly suggest locations in the environment for the user to study.
We next systematically study the effect of including a robot team in the task
specification and show that multirobot teams have the ability to improve
performance under certain specification conditions, including enabling
inter-robot communication. Finally, we propose methods for a communication
protocol that autonomously selects useful but limited information to share with
the other robots.
Comment: PhD dissertation. 160 pages.
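The abstract leaves the quantile-based specification abstract; one plausible minimal reading (hypothetical names, not the dissertation's code) is: given model predictions over candidate locations, suggest, for each user-requested quantile, the location whose predicted value best matches that quantile of the field.

```python
import numpy as np

def suggest_locations(locations, predicted_values, quantiles):
    """For each requested quantile of the modeled field, return the candidate
    location whose predicted value is closest to that quantile."""
    targets = np.quantile(predicted_values, quantiles)
    idx = [int(np.argmin(np.abs(predicted_values - t))) for t in targets]
    return locations[idx], targets

# Example: a 1-D transect of candidate sites and a hypothetical model output
xs = np.linspace(0.0, 100.0, 200)[:, None]
vals = np.exp(-((xs[:, 0] - 40.0) ** 2) / 200.0)
sites, targets = suggest_locations(xs, vals, [0.5, 0.9, 0.99])
```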
Learning from Demonstration with Weakly Supervised Disentanglement
Robotic manipulation tasks, such as wiping with a soft sponge, require
control from multiple rich sensory modalities. Human-robot interaction, aimed
at teaching robots, is difficult in this setting as there is potential for
mismatch between human and machine comprehension of the rich data streams. We
treat the task of interpretable learning from demonstration as an optimisation
problem over a probabilistic generative model. To account for the
high-dimensionality of the data, a high-capacity neural network is chosen to
represent the model. The latent variables in this model are explicitly aligned
with high-level notions and concepts that are manifested in a set of
demonstrations. We show that such alignment is best achieved through the use of
labels from the end user, in an appropriately restricted vocabulary, in
contrast to the conventional approach of the designer picking a prior over the
latent variables. Our approach is evaluated in the context of two table-top
robot manipulation tasks performed by a PR2 robot -- that of dabbing liquids
with a sponge (forcefully pressing a sponge and moving it along a surface) and
pouring between different containers. The robot provides visual information,
arm joint positions and arm joint efforts. We have made videos of the tasks and
data available - see supplementary materials at:
https://sites.google.com/view/weak-label-lfd
Comment: 18 pages, 16 figures, accepted at the International Conference on
Learning Representations (ICLR) 2021, supplementary website at
https://sites.google.com/view/weak-label-lfd
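The abstract contrasts user labels with designer-chosen priors; one common way to realize such alignment (a simplified sketch, not necessarily the paper's exact objective) is to add a supervision term on designated latent dimensions to the standard VAE loss:

```python
import numpy as np

def weakly_supervised_vae_loss(x, x_recon, mu, logvar, z, labels, label_dims,
                               beta=1.0, gamma=10.0):
    """Illustrative objective: VAE reconstruction + KL terms, plus a penalty tying
    a chosen subset of latent dimensions to coarse user-provided labels."""
    recon = np.mean((x - x_recon) ** 2)                         # reconstruction error
    kl = -0.5 * np.mean(1.0 + logvar - mu**2 - np.exp(logvar))  # KL to N(0, I) prior
    sup = np.mean((z[:, label_dims] - labels) ** 2)             # weak label supervision
    return recon + beta * kl + gamma * sup
```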
TimewarpVAE: Simultaneous Time-Warping and Representation Learning of Trajectories
Human demonstrations of trajectories are an important source of training data
for many machine learning problems. However, the difficulty of collecting human
demonstration data for complex tasks makes learning efficient representations
of those trajectories challenging. For many problems, such as for handwriting
or for quasistatic dexterous manipulation, the exact timings of the
trajectories should be factored from their spatial path characteristics. In
this work, we propose TimewarpVAE, a fully differentiable manifold-learning
algorithm that incorporates Dynamic Time Warping (DTW) to simultaneously learn
both timing variations and latent factors of spatial variation. We show how the
TimewarpVAE algorithm learns appropriate time alignments and meaningful
representations of spatial variations in small handwriting and fork
manipulation datasets. Our results show lower spatial reconstruction test error
than baseline approaches, and the learned low-dimensional representations can be
used to efficiently generate semantically meaningful novel trajectories.
Comment: 17 pages, 12 figures
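For reference, the alignment cost that TimewarpVAE makes differentiable builds on classic Dynamic Time Warping; the standard (non-differentiable) recursion is:

```python
import numpy as np

def dtw_cost(a, b):
    """Minimal cumulative DTW alignment cost between trajectories a (n x d), b (m x d)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])   # local distance between samples
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```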
K-VIL: Keypoints-based Visual Imitation Learning
Visual imitation learning provides efficient and intuitive solutions for
robotic systems to acquire novel manipulation skills. However, simultaneously
learning geometric task constraints and control policies from visual inputs
alone remains a challenging problem. In this paper, we propose an approach for
keypoint-based visual imitation (K-VIL) that automatically extracts sparse,
object-centric, and embodiment-independent task representations from a small
number of human demonstration videos. The task representation is composed of
keypoint-based geometric constraints on principal manifolds, their associated
local frames, and the movement primitives that are then needed for the task
execution. Our approach is capable of extracting such task representations from
a single demonstration video, and of incrementally updating them when new
demonstrations become available. To reproduce manipulation skills using the
learned set of prioritized geometric constraints in novel scenes, we introduce
a novel keypoint-based admittance controller. We evaluate our approach in
several real-world applications, showcasing its ability to deal with cluttered
scenes, new instances of categorical objects, and large object pose and shape
variations, as well as its efficiency and robustness in both one-shot and
few-shot imitation learning settings. Videos and source code are available at
https://sites.google.com/view/k-vil
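The abstract does not detail the keypoint-based admittance controller; a generic admittance law of the family it belongs to (a sketch with hypothetical names) renders a keypoint-derived reference compliant to external forces via a virtual mass-damper-spring:

```python
import numpy as np

def admittance_step(x, xd, x_ref, f_ext, M, D, K, dt=0.01):
    """One Euler step of M*xdd + D*xd + K*(x - x_ref) = f_ext:
    track a keypoint-derived reference x_ref while yielding to forces f_ext."""
    xdd = np.linalg.solve(M, f_ext - D @ xd - K @ (x - x_ref))
    xd = xd + dt * xdd
    return x + dt * xd, xd

# Example: 3-D keypoint tracking with unit mass, light damping, stiff spring
M, D, K = np.eye(3), 20.0 * np.eye(3), 200.0 * np.eye(3)
x, xd = admittance_step(np.zeros(3), np.zeros(3),
                        x_ref=np.array([0.1, 0.0, 0.0]),
                        f_ext=np.zeros(3), M=M, D=D, K=K)
```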