Versatile Inverse Reinforcement Learning via Cumulative Rewards
Inverse Reinforcement Learning infers a reward function from expert demonstrations, aiming to encode the behavior and intentions of the expert. Current approaches usually do this with generative, uni-modal models, meaning that they encode a single behavior. In the common setting where there are various solutions to a problem and the experts show versatile behavior, this severely limits the generalization capabilities of these methods. We propose a novel method for Inverse Reinforcement Learning that overcomes these problems by formulating the recovered reward as a sum of iteratively trained discriminators. We show on simulated tasks that our approach is able to recover general, high-quality reward functions and produces policies of the same quality as behavioral cloning approaches designed for versatile behavior.
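To make the idea concrete, below is a minimal, hypothetical Python sketch of a reward assembled as a sum of iteratively trained discriminators. The LinearDiscriminator class, its logistic-regression-style fit, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class LinearDiscriminator:
    """Toy discriminator: a linear scorer over state-action features (illustrative only)."""
    def __init__(self, dim, rng):
        self.w = rng.normal(scale=0.1, size=dim)

    def fit(self, expert_feats, agent_feats, lr=0.1, steps=200):
        """Logistic-regression-style updates separating expert samples from agent samples."""
        for _ in range(steps):
            for x, y in [(expert_feats, 1.0), (agent_feats, 0.0)]:
                p = 1.0 / (1.0 + np.exp(-x @ self.w))
                self.w += lr * x.T @ (y - p) / len(x)

    def score(self, feats):
        """Log-odds that the sample came from the expert."""
        return feats @ self.w


def cumulative_reward(discriminators, feats):
    """Recovered reward: sum of the scores of all iteratively trained discriminators."""
    return sum(d.score(feats) for d in discriminators)


# Usage sketch: train a new discriminator against each successive agent policy,
# then append it, so the reward accumulates one term per iteration.
rng = np.random.default_rng(0)
expert = rng.normal(loc=1.0, size=(256, 4))
discriminators = []
for it in range(3):
    agent = rng.normal(loc=0.0, size=(256, 4))  # stand-in for rollouts of the current policy
    d = LinearDiscriminator(dim=4, rng=rng)
    d.fit(expert, agent)
    discriminators.append(d)
print(cumulative_reward(discriminators, expert[:5]))
```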
Online Observer-Based Inverse Reinforcement Learning
In this paper, a novel approach to the output-feedback inverse reinforcement learning (IRL) problem is developed by casting the IRL problem, for linear systems with quadratic cost functions, as a state estimation problem. Two observer-based techniques for IRL are developed, including a novel observer method that re-uses previous state estimates via history stacks. Theoretical guarantees for convergence and robustness are established under appropriate excitation conditions. Simulations demonstrate the performance of the developed observers and filters under noisy and noise-free measurements.
Comment: 7 pages, 3 figures
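As a rough illustration of the history-stack idea (re-using stored data rather than only the newest sample), the Python sketch below implements a generic concurrent-learning-style parameter estimator on synthetic data. It stands in for the mechanism only; the paper's observers operate on output-feedback measurements of linear-quadratic problems, and the class name, gains, and regression model here are assumptions.

```python
import numpy as np

class HistoryStackEstimator:
    """Hypothetical estimator that updates a parameter estimate using a stack of past data."""
    def __init__(self, dim, stack_size=20, gain=0.05):
        self.theta_hat = np.zeros(dim)   # current parameter estimate
        self.stack = []                  # stored (regressor, measurement) pairs
        self.stack_size = stack_size
        self.gain = gain

    def record(self, phi, y):
        """Store a data point; oldest entries are discarded once the stack is full."""
        self.stack.append((phi, y))
        if len(self.stack) > self.stack_size:
            self.stack.pop(0)

    def update(self):
        """Gradient step that re-uses all stacked data, not just the newest sample."""
        grad = np.zeros_like(self.theta_hat)
        for phi, y in self.stack:
            grad += phi * (y - phi @ self.theta_hat)
        self.theta_hat += self.gain * grad / max(len(self.stack), 1)
        return self.theta_hat


# Usage sketch on synthetic data: y = phi^T theta_star with theta_star unknown.
rng = np.random.default_rng(1)
theta_star = np.array([2.0, -1.0, 0.5])
est = HistoryStackEstimator(dim=3)
for t in range(500):
    phi = rng.normal(size=3)             # regressors must be exciting enough for convergence
    est.record(phi, phi @ theta_star)
    est.update()
print(est.theta_hat)                      # approaches theta_star under sufficient excitation
```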
Primal Wasserstein Imitation Learning
Imitation Learning (IL) methods seek to match the behavior of an agent with that of an expert. In the present work, we propose a new IL method based on a conceptually simple algorithm: Primal Wasserstein Imitation Learning (PWIL), which ties to the primal form of the Wasserstein distance between the expert and the agent state-action distributions. We present a reward function which is derived offline, as opposed to recent adversarial IL algorithms that learn a reward function through interactions with the environment, and which requires little fine-tuning. We show that we can recover expert behavior on a variety of continuous control tasks of the MuJoCo domain in a sample-efficient manner, in terms of both agent interactions and expert interactions with the environment. Finally, we show that the behavior of the agent we train matches the behavior of the expert with the Wasserstein distance, rather than the commonly used proxy of performance.
Comment: Published in International Conference on Learning Representations (ICLR 2021)
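A simplified, hypothetical Python sketch of the offline reward idea: greedily couple agent state-action samples to expert samples and turn the resulting transport cost into a bounded reward. The equal-weight greedy matching, the exponential shaping, and the function name are illustrative choices, not a faithful re-implementation of PWIL.

```python
import numpy as np

def greedy_coupling_rewards(agent_sa, expert_sa, alpha=5.0, beta=5.0):
    """Assign each agent sample to its nearest unused expert sample (greedy transport)
    and convert the matching cost into a bounded reward (illustrative shaping)."""
    remaining = list(range(len(expert_sa)))
    rewards = []
    for x in agent_sa:
        dists = np.linalg.norm(expert_sa[remaining] - x, axis=1)
        j = int(np.argmin(dists))
        cost = dists[j]
        remaining.pop(j)                  # each expert atom is consumed at most once
        rewards.append(alpha * np.exp(-beta * cost))
        if not remaining:
            break
    return np.array(rewards)


# Usage sketch: rewards are computed purely from stored expert data (no discriminator
# is trained through environment interaction) and can be handed to any RL algorithm.
rng = np.random.default_rng(2)
expert_sa = rng.normal(size=(100, 6))     # concatenated (state, action) samples
agent_sa = expert_sa + 0.1 * rng.normal(size=(100, 6))
print(greedy_coupling_rewards(agent_sa, expert_sa)[:5])
```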