Robot Learning with Crash Constraints
In the past decade, numerous machine learning algorithms have been shown to
successfully learn optimal policies to control real robotic systems. However,
it is common to encounter failing behaviors as the learning loop progresses.
Specifically, in robot applications where failing is undesired but not
catastrophic, many algorithms struggle with leveraging data obtained from
failures. This is usually caused by (i) the failed experiment ending
prematurely, or (ii) the acquired data being scarce or corrupted. Both
complicate the design of proper reward functions to penalize failures. In this
paper, we propose a framework that addresses those issues. We consider failing
behaviors as those that violate a constraint and address the problem of
learning with crash constraints, where no data is obtained upon constraint
violation. The no-data case is addressed by a novel GP model (GPCR) for the
constraint that combines discrete events (failure/success) with continuous
observations (only obtained upon success). We demonstrate the effectiveness of
our framework on simulated benchmarks and on a real jumping quadruped, where
the constraint threshold is unknown a priori. Experimental data is collected,
by means of constrained Bayesian optimization, directly on the real robot. Our
results show that our framework outperforms manual tuning and that GPCR proves useful in estimating the constraint threshold.
Comment: 8 pages, 4 figures, 1 table, 1 algorithm. Accepted for publication in
IEEE Robotics and Automation Letters (RA-L). Video demonstration of the
experiments available at https://youtu.be/RAiIo0l6_rE . Algorithm
implementation available at
https://github.com/alonrot/classified_regression.git
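The constrained Bayesian optimization loop described above can be illustrated with a short sketch. The snippet below is not the paper's GPCR model: it substitutes two off-the-shelf surrogates (a GP regressor fit only on successful trials, since failures yield no continuous observation, and a GP classifier on the success/failure labels) and gates expected improvement by the predicted success probability. The toy objective, the 0.7 crash threshold, and all names are illustrative assumptions.

```python
# Sketch: Bayesian optimization with a crash constraint. NOT the paper's GPCR
# model (which fuses discrete and continuous observations in one GP); this
# simplified stand-in uses a separate regressor and classifier.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessClassifier, GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def run_experiment(x):
    """Hypothetical stand-in for a robot trial: returns (success, cost)."""
    success = x < 0.7                                      # unknown crash threshold
    cost = float(np.sin(5 * x) + x) if success else None   # no data upon failure
    return success, cost

X, labels, successes = [], [], []            # all trials, outcomes, safe results
for x0 in (0.1, 0.5, 0.9):                   # seed trials cover both classes
    ok, c = run_experiment(x0)
    X.append(x0); labels.append(ok)
    if ok: successes.append((x0, c))

candidates = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
for _ in range(10):
    Xs = np.array([p for p, _ in successes]).reshape(-1, 1)
    ys = np.array([c for _, c in successes])
    gp = GaussianProcessRegressor(kernel=RBF(0.2), normalize_y=True).fit(Xs, ys)
    clf = GaussianProcessClassifier(kernel=RBF(0.2)).fit(
        np.array(X).reshape(-1, 1), np.array(labels))
    mu, sd = gp.predict(candidates, return_std=True)
    best = ys.min()
    z = (best - mu) / np.maximum(sd, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sd * norm.pdf(z)   # expected improvement
    p_ok = clf.predict_proba(candidates)[:, 1]          # predicted P(no crash)
    x_next = float(candidates[np.argmax(ei * p_ok)])    # constrained acquisition
    ok, c = run_experiment(x_next)
    X.append(x_next); labels.append(ok)
    if ok: successes.append((x_next, c))

print("best safe cost found:", min(c for _, c in successes))
```

Gating expected improvement by the success probability mirrors the common recipe for constrained BO; GPCR's contribution is to model the constraint itself from mixed discrete/continuous data, which this two-model sketch deliberately sidesteps.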
Inferring Versatile Behavior from Demonstrations by Matching Geometric Descriptors
Humans intuitively solve tasks in versatile ways, varying their behavior both in trajectory-level planning and in individual steps. Thus, they can
easily generalize and adapt to new and changing environments. Current Imitation
Learning algorithms often only consider unimodal expert demonstrations and act
in a state-action-based setting, making it difficult for them to imitate human
behavior in the case of versatile demonstrations. Instead, we combine a mixture of
movement primitives with a distribution matching objective to learn versatile
behaviors that match the expert's behavior and versatility. To facilitate
generalization to novel task configurations, we do not directly match the
agent's and expert's trajectory distributions but rather work with concise
geometric descriptors which generalize well to unseen task configurations. We
empirically validate our method on various robot tasks using versatile human
demonstrations and compare to imitation learning algorithms in a state-action
setting as well as a trajectory-based setting. We find that the geometric
descriptors greatly help in generalizing to new task configurations and that
combining them with our distribution-matching objective is crucial for
representing and reproducing versatile behavior.
Comment: Accepted as a poster at the 6th Conference on Robot Learning (CoRL), 2022
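To make descriptor matching concrete, the sketch below generates trajectories from movement-primitive weights via RBF basis functions, summarizes each path by a geometric descriptor (its minimum distance to a task point), and compares agent and expert descriptor distributions with a maximum mean discrepancy. The single primitive, this particular descriptor, and the MMD objective are illustrative assumptions; the paper uses a mixture of movement primitives and its own matching objective.

```python
# Sketch: compare agent and expert behavior through geometric descriptors
# rather than raw state-action trajectories. All choices here (descriptor,
# MMD objective, single primitive) are simplifying assumptions.
import numpy as np

def primitive_trajectory(weights, n_steps=50, n_basis=8):
    """Trajectory from movement-primitive weights via normalized RBF bases."""
    t = np.linspace(0.0, 1.0, n_steps)
    centers = np.linspace(0.0, 1.0, n_basis)
    phi = np.exp(-0.5 * ((t[:, None] - centers[None, :]) / 0.1) ** 2)
    phi /= phi.sum(axis=1, keepdims=True)
    return phi @ weights                          # (n_steps, dim)

def descriptor(traj, target):
    """Geometric descriptor: minimum distance of the path to a task point."""
    return np.linalg.norm(traj - target, axis=1).min(keepdims=True)

def mmd(a, b, bw=0.5):
    """(Biased) maximum mean discrepancy between descriptor sets, RBF kernel."""
    def k(x, y):
        d = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
        return np.exp(-d / (2 * bw ** 2)).mean()
    return k(a, a) + k(b, b) - 2 * k(a, b)

rng = np.random.default_rng(1)
target = np.array([0.5, 0.8])
expert = np.stack([descriptor(primitive_trajectory(rng.normal(0.8, 0.1, (8, 2))), target)
                   for _ in range(30)])
agent = np.stack([descriptor(primitive_trajectory(rng.normal(0.0, 0.5, (8, 2))), target)
                  for _ in range(30)])
print("descriptor MMD (agent vs. expert):", mmd(agent, expert))
```

Because the descriptor depends only on the path's relation to the task point, it can transfer to task configurations where the raw trajectory distribution would not, which is the generalization argument the abstract makes.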
MimicPlay: Long-Horizon Imitation Learning by Watching Human Play
Imitation learning from human demonstrations is a promising paradigm for
teaching robots manipulation skills in the real world. However, learning
complex long-horizon tasks often requires an unattainable number of
demonstrations. To reduce the high data requirement, we resort to human play
data - video sequences of people freely interacting with the environment using
their hands. Even with different morphologies, we hypothesize that human play
data contain rich and salient information about physical interactions that can
readily facilitate robot policy learning. Motivated by this, we introduce a
hierarchical learning framework named MimicPlay that learns latent plans from
human play data to guide low-level visuomotor control trained on a small number
of teleoperated demonstrations. With systematic evaluations of 14 long-horizon
manipulation tasks in the real world, we show that MimicPlay outperforms
state-of-the-art imitation learning methods in task success rate,
generalization ability, and robustness to disturbances. Code and videos are
available at https://mimic-play.github.io
Comment: 7th Conference on Robot Learning (CoRL 2023, oral presentation)
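The two-level hierarchy in the abstract can be sketched in a few lines: a high-level planner, which would be trained on human play data, maps current and goal observations to a latent plan, and a low-level policy, which would be trained on a handful of teleoperated demonstrations, conditions on that plan to produce actions. The architectures, dimensions, and names below are illustrative assumptions, not MimicPlay's actual implementation (see the linked code for that).

```python
# Sketch: a latent-plan hierarchy in the spirit of MimicPlay. All module
# names and sizes are hypothetical placeholders.
import torch
import torch.nn as nn

class HighLevelPlanner(nn.Module):
    """Current + goal observation -> latent plan (trained on play data)."""
    def __init__(self, obs_dim=64, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * obs_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))

    def forward(self, obs, goal):
        return self.net(torch.cat([obs, goal], dim=-1))

class LowLevelPolicy(nn.Module):
    """Observation + latent plan -> action (trained on teleoperated demos)."""
    def __init__(self, obs_dim=64, latent_dim=16, act_dim=7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + latent_dim, 128), nn.ReLU(), nn.Linear(128, act_dim))

    def forward(self, obs, plan):
        return self.net(torch.cat([obs, plan], dim=-1))

planner, policy = HighLevelPlanner(), LowLevelPolicy()
obs, goal = torch.randn(1, 64), torch.randn(1, 64)
with torch.no_grad():
    plan = planner(obs, goal)       # stage 1: infer latent plan toward the goal
    action = policy(obs, plan)      # stage 2: plan-conditioned visuomotor control
print(action.shape)                 # torch.Size([1, 7])
```

The division of labor is the point: the planner absorbs the cheap, abundant play data, so the plan-conditioned policy can be trained from far fewer teleoperated demonstrations.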