Product of experts for robot learning from demonstration

Abstract

Adaptability and ease of programming are key features for a wider adoption of robots in factories and in everyday assistance. Learning from demonstration (LfD) addresses this problem by developing algorithms and interfaces that allow a non-expert user to teach the robot new tasks by showing examples. While the configuration of a manipulator is defined by its joint angles, postures and movements are often best explained in several task spaces. Most existing LfD approaches exploit these task spaces, but the models are typically learned independently in each task space and only combined later, at the controller level. This simplification implies several limitations, such as the inability to recover the precision and hierarchy of the different tasks; these approaches are also unable to uncover secondary tasks masked by the resolution of primary ones. In this thesis, we aim to overcome these limitations by proposing a consistent framework for LfD based on product of experts models (PoEs). In a PoE, data is modelled as a fusion of multiple sources or "experts", each giving an "opinion" on a different view or transformation of the data, corresponding to the different task spaces. Mathematically, the experts are probability density functions that are multiplied together and renormalized. Two kinds of distributions are targeted in this thesis.

In the first part of the thesis, PoEs are proposed to model distributions of robot configurations. Such distributions are a key component of many LfD approaches: they are commonly used to define motions by introducing a dependence on time, as observation models in hidden Markov models, or transformed by a time-dependent basis matrix. Through multiple experiments, we show the advantages of learning models jointly in several task spaces within the PoE framework, and we compare PoEs against more general techniques such as variational autoencoders and generative adversarial networks. However, training a PoE requires costly approximations, to which performance can be very sensitive. As an alternative to contrastive divergence, we present a training approach based on variational inference with mixture model approximations. We also propose an extension of the PoE with a nullspace structure (PoENS), which can recover tasks that are masked by the resolution of higher-level objectives.

In the second part of the thesis, PoEs are used to learn stochastic policies. We propose to learn motion primitives as distributions of trajectories. Instead of approximating the complicated normalizing constants that arise in maximum entropy inverse optimal control, we adopt a generative adversarial approach. The policy is parametrized as a product of Gaussian distributions over velocities, accelerations, or forces, acting in different task spaces. Given an approximate and stochastic dynamic model of the system, the policy is trained by stochastic gradient descent such that the distribution of rollouts matches the distribution of demonstrations.
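As a point of reference, the general PoE form described above can be written as follows; the notation (x for the joint configuration, f_k for the map to the k-th task space, p_k for the k-th expert density) is illustrative rather than taken from the thesis:

\[
p(\mathbf{x}) = \frac{1}{Z} \prod_{k=1}^{K} p_k\big(f_k(\mathbf{x})\big),
\qquad
Z = \int \prod_{k=1}^{K} p_k\big(f_k(\mathbf{x})\big)\, d\mathbf{x}.
\]

The normalizing constant Z is generally intractable, which is why training requires the kinds of approximations (contrastive divergence, variational inference) mentioned in the abstract.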
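A minimal numerical sketch of such a model, assuming Gaussian experts and a hypothetical planar 2-link arm; all names (fk_2link, the expert means and covariances) are invented for illustration and are not the thesis's implementation:

import numpy as np

def fk_2link(q, l1=1.0, l2=1.0):
    """Forward kinematics of a planar 2-link arm (hypothetical task-space map)."""
    x = l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1])
    y = l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def gaussian_logpdf(z, mu, cov):
    """Log-density of a multivariate Gaussian."""
    d = z - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.solve(cov, d)
                   + logdet + len(z) * np.log(2.0 * np.pi))

# Two experts: a loose one on the joint angles and a precise one
# on the end-effector position (illustrative parameters).
experts = [
    (lambda q: q, np.array([0.3, 0.8]), 0.5 * np.eye(2)),
    (fk_2link,    np.array([1.2, 0.9]), 0.05 * np.eye(2)),
]

def poe_unnorm_logpdf(q):
    """Unnormalized PoE log-density: the sum of the experts' log-opinions,
    each evaluated on its own view of the configuration. The normalizing
    constant is left out, as it is intractable in general."""
    return sum(gaussian_logpdf(f(q), mu, cov) for f, mu, cov in experts)

print(poe_unnorm_logpdf(np.array([0.4, 0.7])))

Because the experts' densities are multiplied, the expert with the smallest covariance dominates wherever its view is informative, which mirrors the abstract's point that a joint model can capture the precision and hierarchy of the different tasks.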
