Bridging Action Space Mismatch in Learning from Demonstrations
Learning from demonstrations (LfD) methods guide learning agents to a desired
solution using demonstrations from a teacher. While some LfD methods can handle
small mismatches in the action spaces of the teacher and student, here we
address the case where the teacher demonstrates the task in an action space
that can be substantially different from that of the student -- thereby
inducing a large action space mismatch. We bridge this gap with a framework,
Morphological Adaptation in Imitation Learning (MAIL), that allows training an
agent from demonstrations by other agents with significantly different
morphologies (from the student or each other). MAIL is able to learn from
suboptimal demonstrations, so long as they provide some guidance towards a
desired solution. We demonstrate MAIL on challenging household cloth
manipulation tasks and introduce a new DRY CLOTH task -- a 3D cloth manipulation
task with obstacles. In these tasks, we train a visual control policy for a
robot with one end-effector using demonstrations from a simulated agent with
two end-effectors. MAIL shows up to 27% improvement over LfD and non-LfD
baselines. It is deployed to a real Franka Panda robot, and can handle multiple
variations in cloth properties (color, thickness, size, material) and pose
(rotation and translation). We further show generalizability to transfers from
n-to-m end-effectors, in the context of a simple rearrangement task.
A Mathematical Framework for Unmanned Aerial Vehicle Obstacle Avoidance
The obstacle avoidance navigation problem for Unmanned Aerial Vehicles (UAVs) is a very challenging problem. It lies at the intersection of many fields such as probability, differential geometry, optimal control, and robotics. We build a mathematical framework to solve this problem for quadrotors using both a theoretical approach through a Hamiltonian system and a machine learning approach that learns from human sub-experts' multiple demonstrations in obstacle avoidance. Prior research on the machine learning approach uses an algorithm that does not incorporate geometry. We have developed tools to solve and test the obstacle avoidance problem through mathematics
Curriculum-Based Imitation of Versatile Skills
Learning skills by imitation is a promising concept for the intuitive teaching of robots. A common way to learn such skills is to learn a parametric model by maximizing the likelihood given the demonstrations. Yet, human demonstrations are often multi-modal, i.e., the same task is solved in multiple ways, which is a major challenge for most imitation learning methods that are based on such a maximum likelihood (ML) objective. Because the ML objective forces the model to cover all data, it prevents specialization in the context space and can cause mode-averaging in the behavior space, leading to suboptimal or potentially catastrophic behavior. Here, we alleviate those issues by introducing a curriculum using a weight for each data point, allowing the model to specialize on data it can represent while incentivizing it to cover as much data as possible by an entropy bonus. We extend our algorithm to a Mixture of (linear) Experts (MoE) such that the single components can specialize on local context regions, while the MoE covers all data points. We evaluate our approach in complex simulated and real robot control tasks and show it learns from versatile human demonstrations and significantly outperforms current SOTA methods. A reference implementation can be found at https://github.com/intuitive-robots/ml-cu
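The mode-averaging problem and the effect of per-sample weights can be illustrated on a toy problem. The sketch below is not the authors' implementation; it assumes a 1-D Gaussian model with fixed unit variance, two synthetic demonstration modes at -2 and +2, and weights chosen to maximize the weighted log-likelihood plus an entropy term scaled by a hypothetical temperature `eta` (for which the maximizer over the probability simplex is a softmax of the log-likelihoods). All names and constants are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def fit_mean(x, w):
    """Weighted maximum-likelihood mean of a fixed-variance Gaussian."""
    return float(np.sum(w * x) / np.sum(w))

def curriculum_fit(x, init_mean, eta=1.0, n_iters=20):
    """Alternate between re-weighting the data and re-fitting the model.

    Weights maximize sum_i w_i * log_lik_i + eta * entropy(w) on the
    simplex, which gives w = softmax(log_lik / eta). This is an assumed
    toy objective, not the published algorithm.
    """
    mean = init_mean
    w = np.full(x.shape, 1.0 / x.size)
    for _ in range(n_iters):
        # Unit-variance Gaussian log-likelihood, additive constants dropped.
        log_lik = -0.5 * (x - mean) ** 2
        w = softmax(log_lik / eta)
        mean = fit_mean(x, w)
    return mean, w

# Synthetic bimodal "demonstrations": two ways of solving the same task.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 0.1, 100), rng.normal(2.0, 0.1, 100)])

# Plain ML (uniform weights) averages the two modes.
ml_mean = fit_mean(x, np.ones_like(x))

# The weighted fit specializes on the mode nearest its initialization.
cur_mean, cur_w = curriculum_fit(x, init_mean=1.0)

print("uniform-weight ML mean:", ml_mean)
print("curriculum-weighted mean:", cur_mean)
print("weight mass on the +2 mode:", cur_w[x > 0].sum())
```

With uniform weights the fitted mean lands between the two modes, where no demonstration lies; with the softmax weights it settles on one mode. A larger `eta` spreads the weights more evenly across the data, trading specialization for coverage.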
Value function estimation using conditional diffusion models for control
A fairly reliable trend in deep reinforcement learning is that the
performance scales with the number of parameters, provided a complementary
scaling in the amount of training data. As the appetite for large models increases,
it is imperative to address, sooner rather than later, the potential problem of
running out of high-quality demonstrations. In this case, instead of collecting
only new data via costly human demonstrations or risking a simulation-to-real
transfer with uncertain effects, it would be beneficial to leverage vast
amounts of readily-available low-quality data. Since classical control
algorithms such as behavior cloning or temporal difference learning cannot be
used on reward-free or action-free data out-of-the-box, this solution warrants
novel training paradigms for continuous control. We propose a simple algorithm
called Diffused Value Function (DVF), which learns a joint multi-step model of
the environment-robot interaction dynamics using a diffusion model. This model
can be efficiently learned from state sequences (i.e., without access to reward
functions or actions), and subsequently used to estimate the value of each
action out-of-the-box. We show how DVF can be used to efficiently capture the
state visitation measure for multiple controllers, and show promising
qualitative and quantitative results on challenging robotics benchmarks.
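The abstract does not spell out how a state visitation measure relates to values. One standard identity that connects the two is that the discounted return equals the expected reward under the discounted state occupancy, divided by (1 - gamma). The sketch below checks this identity on a small hand-made Markov chain; the transition matrix, rewards, and discount are hypothetical, and no diffusion model is involved, so it should not be read as the DVF algorithm itself.

```python
import numpy as np

gamma = 0.9  # assumed discount factor

# Hypothetical 3-state Markov chain induced by some fixed controller.
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.2],
              [0.0, 0.0, 1.0]])
r = np.array([0.0, 0.0, 1.0])    # reward only in the final state
mu0 = np.array([1.0, 0.0, 0.0])  # start in state 0

# Route 1: truncated discounted sum of expected rewards over time.
value_sum = 0.0
dist = mu0.copy()
for t in range(2000):
    value_sum += gamma ** t * dist @ r
    dist = dist @ P

# Route 2: discounted occupancy rho = (1 - gamma) * mu0 (I - gamma P)^{-1},
# then value = E_{s ~ rho}[r(s)] / (1 - gamma).
occupancy = (1 - gamma) * mu0 @ np.linalg.inv(np.eye(3) - gamma * P)
value_occ = occupancy @ r / (1 - gamma)

print("value from discounted sum:", value_sum)
print("value from occupancy measure:", value_occ)
```

Both routes give the same number, which is why a model of where a controller's states end up can, in principle, be turned into a value estimate once rewards are available.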