Goal-Conditioned Imitation Learning using Score-based Diffusion Policies
We propose a new policy representation based on score-based diffusion models (SDMs). We apply our new policy representation in the domain of Goal-Conditioned Imitation Learning (GCIL) to learn general-purpose goal-specified policies from large uncurated datasets without rewards. Our new goal-conditioned policy architecture "BEhavior generation with ScOre-based Diffusion Policies" (BESO) leverages a generative, score-based diffusion model as its policy. BESO decouples the learning of the score model from the inference sampling process and hence allows for fast sampling strategies that generate goal-specified behavior in just 3 denoising steps, compared to the 30+ steps of other diffusion-based policies. Furthermore, BESO is highly expressive and can effectively capture the multi-modality present in the solution space of the play data. Unlike previous methods such as Latent Plans or C-Bet, BESO does not rely on complex hierarchical policies or additional clustering for effective goal-conditioned behavior learning. Finally, we show how BESO can even be used to learn a goal-independent policy from play data using classifier-free guidance. To the best of our knowledge, this is the first work that a) represents a behavior policy based on such a decoupled SDM, b) learns an SDM-based policy in the domain of GCIL, and c) provides a way to simultaneously learn a goal-dependent and a goal-independent policy from play data. We evaluate BESO through detailed simulation and show that it consistently outperforms several state-of-the-art goal-conditioned imitation learning methods on challenging benchmarks. We additionally provide extensive ablation studies and experiments to demonstrate the effectiveness of our method for goal-conditioned behavior generation.
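The two mechanisms named in this abstract, few-step sampling and classifier-free guidance, can be illustrated with a minimal sketch. It is not the authors' implementation. The Euler update on a noise-level schedule is one common way to sample from a denoising model, and the guidance formula follows one common convention. `toy_denoiser`, the noise levels in `sigmas`, and all function names are hypothetical stand-ins for a trained network and its tuned schedule.

```python
import numpy as np

def guided_denoise(denoiser, action, sigma, state, goal, guidance_weight):
    """Classifier-free guidance: blend goal-conditioned and goal-free predictions."""
    cond = denoiser(action, sigma, state, goal)
    uncond = denoiser(action, sigma, state, None)  # goal dropped
    return uncond + guidance_weight * (cond - uncond)

def sample_action(denoiser, state, goal, sigmas, guidance_weight=1.0, seed=0):
    """Euler sampler: one denoiser evaluation per consecutive pair of noise levels."""
    rng = np.random.default_rng(seed)
    # Start from pure noise at the largest noise level.
    action = sigmas[0] * rng.standard_normal(np.shape(goal))
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = guided_denoise(denoiser, action, sigma, state, goal, guidance_weight)
        slope = (action - denoised) / sigma  # direction of the probability-flow ODE
        action = action + (sigma_next - sigma) * slope
    return action

def toy_denoiser(action, sigma, state, goal):
    """Hypothetical stand-in for a trained network (assumption, not BESO's model)."""
    if goal is None:
        return np.zeros_like(action)
    return np.asarray(goal, dtype=float)

goal = np.array([0.5, -0.2])
sigmas = np.array([1.0, 0.3, 0.05, 0.0])  # four noise levels give three denoising steps
action = sample_action(toy_denoiser, state=None, goal=goal, sigmas=sigmas)
```

Because the sampler only needs a callable denoiser, the number of steps can be changed at inference time without retraining, which is the decoupling the abstract refers to. Setting `guidance_weight=0.0` uses only the goal-free prediction.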
End-to-End Learning of Hybrid Inverse Dynamics Models for Precise and Compliant Impedance Control
It is well known that inverse dynamics models can improve tracking performance in robot control. These models need to precisely capture the robot dynamics, which consist of well-understood components, e.g., rigid body dynamics, and effects that remain challenging to capture, e.g., stick-slip friction and mechanical flexibilities. Such effects exhibit hysteresis and partial observability, rendering them particularly challenging to model. Hence, hybrid models, which combine a physical prior with data-driven approaches, are especially well-suited in this setting. We present a novel hybrid model formulation that enables us to identify fully physically consistent inertial parameters of a rigid body dynamics model, which is paired with a recurrent neural network architecture, allowing us to capture unmodeled partially observable effects using the network memory. We compare our approach against state-of-the-art inverse dynamics models on a 7-degree-of-freedom manipulator. Using data sets obtained through an optimal experiment design approach, we study the accuracy of offline torque prediction and the generalization capabilities of joint learning methods. In control experiments on the real system, we evaluate the model as a feed-forward term for impedance control and show that the feedback gains can be drastically reduced to achieve a given tracking accuracy.
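The structure described here, a physical prior plus a recurrent residual used as a feed-forward term in impedance control, can be sketched for a single joint. This is an illustration under stated assumptions, not the paper's model. The pendulum parameters, the tiny untrained recurrent cell `RecurrentResidual`, and the gain values are all hypothetical, and the sketch does not cover the identification of physically consistent inertial parameters.

```python
import numpy as np

def rigid_body_torque(q, qd, qdd, inertia=0.05, mass=1.0, length=0.3, gravity=9.81):
    """Physical prior: inverse dynamics of a 1-DoF pendulum (assumed parameters)."""
    return inertia * qdd + mass * gravity * length * np.sin(q)

class RecurrentResidual:
    """Hypothetical, untrained recurrent cell standing in for the learned residual."""

    def __init__(self, hidden_size=4, seed=0, scale=0.1):
        rng = np.random.default_rng(seed)
        self.w_in = scale * rng.standard_normal((hidden_size, 3))
        self.w_rec = scale * rng.standard_normal((hidden_size, hidden_size))
        self.w_out = scale * rng.standard_normal(hidden_size)
        self.hidden = np.zeros(hidden_size)  # memory for hysteresis-like effects

    def step(self, q, qd, qdd):
        x = np.array([q, qd, qdd])
        self.hidden = np.tanh(self.w_in @ x + self.w_rec @ self.hidden)
        return float(self.w_out @ self.hidden)

def hybrid_feedforward(residual, q, qd, qdd):
    """Hybrid model: rigid body torque plus the recurrent residual torque."""
    return rigid_body_torque(q, qd, qdd) + residual.step(q, qd, qdd)

def impedance_command(tau_ff, q, qd, q_des, qd_des, stiffness, damping):
    """Impedance control law: feed-forward torque plus spring-damper feedback."""
    return tau_ff + stiffness * (q_des - q) + damping * (qd_des - qd)

residual = RecurrentResidual()
tau_ff = hybrid_feedforward(residual, 0.2, 0.0, 1.0)
tau = impedance_command(tau_ff, q=0.19, qd=0.0, q_des=0.2, qd_des=0.0,
                        stiffness=20.0, damping=2.0)
```

The more accurate the feed-forward torque, the smaller the position and velocity errors that the feedback terms have to correct, which is why lower `stiffness` and `damping` values can suffice for the same tracking accuracy.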
Information Maximizing Curriculum: A Curriculum-Based Approach for Training Mixtures of Experts
Mixtures of Experts (MoE) are known for their ability to learn complex
conditional distributions with multiple modes. However, despite their
potential, these models are challenging to train and often tend to produce poor
performance, explaining their limited popularity. Our hypothesis is that this
under-performance is a result of the commonly utilized maximum likelihood (ML)
optimization, which leads to mode averaging and a higher likelihood of getting
stuck in local maxima. We propose a novel curriculum-based approach to learning
mixture models in which each component of the MoE is able to select its own
subset of the training data for learning. This approach allows for independent
optimization of each component, resulting in a more modular architecture that
enables the addition and deletion of components on the fly, leading to an
optimization less susceptible to local optima. The curricula can ignore
data points from modes not represented by the MoE, reducing the mode-averaging
problem. To achieve good data coverage, we couple the optimization of the
curricula with a joint entropy objective and optimize a lower bound of this
objective. We evaluate our curriculum-based approach on a variety of multimodal
behavior learning tasks and demonstrate its superiority over competing methods
for learning MoE models and conditional generative models.
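The mode-averaging problem, and the effect of letting a component down-weight data it does not represent, can be shown with a small numerical sketch. It is not the paper's objective. The softmax-of-log-likelihood weighting in `curriculum_weights` is an assumed stand-in for a per-component curriculum, and the data, initial guess, and `temperature` are hypothetical.

```python
import numpy as np

def weighted_gaussian_fit(y, weights):
    """Weighted maximum-likelihood fit of a single 1-D Gaussian."""
    weights = weights / weights.sum()
    mean = np.sum(weights * y)
    var = np.sum(weights * (y - mean) ** 2)
    return mean, var

def curriculum_weights(y, mean, var, temperature=1.0):
    """Assumed curriculum: softmax over the component's own log-likelihoods."""
    log_lik = -0.5 * (y - mean) ** 2 / var - 0.5 * np.log(2.0 * np.pi * var)
    logits = log_lik / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

# Bimodal data: two well-separated modes at -2 and +2.
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(-2.0, 0.1, 500), rng.normal(2.0, 0.1, 500)])

# Plain maximum likelihood with uniform weights lands between the modes.
ml_mean, ml_var = weighted_gaussian_fit(y, np.ones_like(y))

# A component starting nearer the right mode mostly ignores the left mode.
weights = curriculum_weights(y, mean=1.0, var=1.0)
curriculum_mean, curriculum_var = weighted_gaussian_fit(y, weights)
```

With uniform weights the fitted mean sits near zero, where there is no data, while the curriculum-weighted fit stays close to one mode. Repeating the reweighting without any further term would keep narrowing the selected subset, which is the kind of coverage problem the entropy objective in the abstract is meant to address.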
End-to-End Learning of Hybrid Inverse Dynamics Models for Precise and Compliant Impedance Control
Comment: Accepted for publication at Robotics: Science and Systems XVIII (RSS), 2022. The paper is 13 pages long (9 pages of technical content, 1 page of bibliography/references, and 3 pages of appendix).