
    MESSI: Maximum Entropy Semi-Supervised Inverse Reinforcement Learning

    A popular approach to apprenticeship learning (AL) is to formulate it as an inverse reinforcement learning (IRL) problem. The MaxEnt-IRL algorithm successfully integrates the maximum entropy principle into IRL and, unlike its predecessors, resolves the ambiguity arising from the fact that a possibly large number of policies could match the expert's behavior. In this paper, we study an AL setting in which, in addition to the expert's trajectories, a number of unsupervised trajectories are available. We introduce MESSI, a novel algorithm that combines MaxEnt-IRL with principles from semi-supervised learning. In particular, MESSI integrates the unsupervised data into the MaxEnt-IRL framework using a pairwise penalty on trajectories. Empirical results on a highway-driving and a grid-world problem indicate that MESSI is able to take advantage of the unsupervised trajectories and improve the performance of MaxEnt-IRL.
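
    To make the combination concrete, the sketch below adds a similarity-weighted pairwise penalty over unsupervised trajectory features to a standard MaxEnt-IRL feature-matching gradient. The quadratic penalty form, the similarity weights, the pooled approximation of the partition function, and all names are illustrative assumptions rather than the authors' implementation.

        import numpy as np

        def messi_style_gradient(theta, expert_feats, unsup_feats, sim, lam=0.1):
            # Sketch of a MaxEnt-IRL gradient augmented with a pairwise penalty
            # on unsupervised trajectories (an assumed form, not MESSI itself).
            #
            # expert_feats : (N, d) feature counts of the expert trajectories
            # unsup_feats  : (M, d) feature counts of the unsupervised trajectories
            # sim          : (M, M) nonnegative similarity weights between them
            # lam          : strength of the semi-supervised penalty (assumed)

            # MaxEnt-IRL term: expert feature expectation minus the model's
            # expected features under P(tau) ~ exp(theta . f(tau)); the partition
            # function is approximated over the pooled trajectory set to keep
            # the sketch free of dynamics-dependent inference.
            all_feats = np.vstack([expert_feats, unsup_feats])
            w = np.exp(all_feats @ theta)
            w /= w.sum()
            grad = expert_feats.mean(axis=0) - w @ all_feats

            # Pairwise penalty lam * sum_ij sim_ij (theta.f_i - theta.f_j)^2:
            # similar unsupervised trajectories are pushed toward similar rewards.
            r = unsup_feats @ theta
            diff = 2.0 * sim * (r[:, None] - r[None, :])
            grad_pen = diff.sum(axis=1) @ unsup_feats - diff.sum(axis=0) @ unsup_feats
            return grad - lam * grad_pen  # ascend this direction to update theta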

    Maximum entropy approaches for inverse reinforcement learning

    We make decisions to maximize our perceived reward, but handcrafting a reward function for an autonomous agent is challenging. Inverse Reinforcement Learning (IRL), which is concerned with learning a reward function from expert demonstrations, has recently attracted significant interest, with the Maximum Entropy (MaxEnt) approach being a popular method. In this talk, we will explore and contrast a variety of MaxEnt IRL approaches. We show that in the presence of stochastic dynamics, a minimum KL-divergence condition provides a rigorous derivation of the MaxEnt model, improving over a prior heuristic derivation. Furthermore, we explore extensions of the MaxEnt IRL method to the case of unknown stochastic transition dynamics, including a generative model for trajectories, a discriminative model for action sequences, and a simple logistic regression model.
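
    To pin down what the KL-divergence view refers to, the standard MaxEnt IRL model and one common KL formulation of it can be written as follows; the notation (base distribution Q induced by the transition dynamics, feature map f, expert feature expectation f-hat) is ours and only summarizes the textbook formulation, not the specific models of this talk.

        \min_{P}\; D_{\mathrm{KL}}\big(P(\tau)\,\|\,Q(\tau)\big)
        \quad\text{s.t.}\quad \mathbb{E}_{P}[f(\tau)] = \hat{f}_{\mathrm{expert}},
        \qquad \textstyle\sum_{\tau} P(\tau) = 1,

        \text{with solution}\quad
        P_{\theta}(\tau) \;\propto\; Q(\tau)\,\exp\!\big(\theta^{\top} f(\tau)\big),
        \qquad Q(\tau) = p(s_0)\prod_{t} p(s_{t+1}\mid s_t, a_t).

    For deterministic dynamics, Q is uniform over feasible trajectories and the solution reduces to Ziebart et al.'s original P(tau) proportional to exp(theta^T f(tau)), which is why the KL condition is described as tightening the earlier heuristic derivation for the stochastic case.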

    Revisiting Maximum Entropy Inverse Reinforcement Learning: New Perspectives and Algorithms

    We provide new perspectives and inference algorithms for Maximum Entropy (MaxEnt) Inverse Reinforcement Learning (IRL), which provides a principled method to find the most non-committal reward function consistent with given expert demonstrations, among the many consistent reward functions. We first present a generalized MaxEnt formulation based on minimizing a KL-divergence instead of maximizing an entropy. This improves on the previous heuristic derivation of the MaxEnt IRL model (for stochastic MDPs), allows a unified view of MaxEnt IRL and Relative Entropy IRL, and leads to a model-free learning algorithm for the MaxEnt IRL model. Second, a careful review of existing inference algorithms and implementations shows that they only approximately compute the marginals required for learning the model. We provide examples to illustrate this and present an efficient and exact inference algorithm. Our algorithm can handle variable-length demonstrations; moreover, while a basic version takes time quadratic in the maximum demonstration length L, an improved version reduces this to linear time using a padding trick. Experiments show that our exact algorithm improves reward learning compared to the approximate ones. Furthermore, our algorithm scales up to a large, real-world dataset involving driver behaviour forecasting. We provide an optimized implementation compatible with the OpenAI Gym interface. Our new insights and algorithms could lead to further interest in and exploration of the original MaxEnt IRL model. Comment: Published as a conference paper at the 2020 IEEE Symposium Series on Computational Intelligence (SSCI).
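
    As a reference point for the marginals in question, the sketch below is the standard finite-horizon backward/forward dynamic programme for MaxEnt IRL state-visitation counts (in the spirit of Ziebart et al.); it is not the paper's exact or padded linear-time algorithm, and the uniform start distribution and all names are assumptions made for the sketch.

        import numpy as np

        def maxent_marginals(T, reward, horizon):
            # T       : (S, A, S) transition probabilities p(s' | s, a)
            # reward  : (S,) state rewards r_theta(s)
            # horizon : maximum demonstration length L
            # Returns expected state-visitation counts D, the marginals that
            # enter the gradient of the MaxEnt IRL log-likelihood.
            S, A, _ = T.shape

            # Backward pass: soft (log-sum-exp) Bellman backups yield the
            # stochastic policy that is optimal under the MaxEnt objective.
            V = np.zeros(S)
            policy = np.zeros((horizon, S, A))
            for t in reversed(range(horizon)):
                Q = reward[:, None] + np.einsum('sap,p->sa', T, V)
                m = Q.max(axis=1, keepdims=True)
                V = (m + np.log(np.exp(Q - m).sum(axis=1, keepdims=True))).ravel()
                policy[t] = np.exp(Q - V[:, None])

            # Forward pass: push an initial state distribution (assumed uniform)
            # through policy and dynamics, accumulating expected visitation counts.
            d = np.ones(S) / S
            D = np.zeros(S)
            for t in range(horizon):
                D += d
                d = np.einsum('s,sa,sap->p', d, policy[t], T)
            return D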

    BC-IRL: Learning Generalizable Reward Functions from Demonstrations

    How well do reward functions learned with inverse reinforcement learning (IRL) generalize? We illustrate that state-of-the-art IRL algorithms, which maximize a maximum-entropy objective, learn rewards that overfit to the demonstrations. Such rewards struggle to provide meaningful signals for states not covered by the demonstrations, a major detriment when using the reward to learn policies in new situations. We introduce BC-IRL, a new inverse reinforcement learning method that learns reward functions that generalize better than those of maximum-entropy IRL approaches. In contrast to the MaxEnt framework, which learns rewards that are maximized around the demonstrations, BC-IRL updates the reward parameters such that the policy trained with the new reward matches the expert demonstrations better. We show that BC-IRL learns rewards that generalize better on a simple illustrative task and on two continuous robotic control tasks, achieving over twice the success rate of baselines in challenging generalization settings.
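
    As a rough illustration of the bilevel idea, the sketch below performs one outer reward update: a policy is "trained" for the current reward (a single soft lookahead stands in for full RL), and the reward parameters are then moved to reduce a behavior-cloning loss of that policy on expert state-action pairs, with finite differences standing in for differentiating through policy optimization. Everything here is an assumed simplification, not the authors' method.

        import numpy as np

        def bc_irl_outer_step(theta, feats, T, expert_sa, lr=0.5, eps=1e-4):
            # theta     : (d,) reward parameters
            # feats     : (S, d) state features, so r_theta(s) = feats[s] . theta
            # T         : (S, A, S) transition probabilities
            # expert_sa : (K, 2) expert (state, action) index pairs

            def bc_loss(th):
                r = feats @ th                                  # state rewards
                Q = r[:, None] + np.einsum('sap,p->sa', T, r)   # one-step lookahead
                pi = np.exp(Q - Q.max(axis=1, keepdims=True))
                pi /= pi.sum(axis=1, keepdims=True)             # softmax "trained" policy
                s, a = expert_sa[:, 0], expert_sa[:, 1]
                return -np.log(pi[s, a] + 1e-12).mean()         # behavior-cloning NLL

            # Outer gradient of the BC loss with respect to the reward parameters.
            grad = np.array([(bc_loss(theta + eps * e) - bc_loss(theta - eps * e)) / (2 * eps)
                             for e in np.eye(theta.size)])
            return theta - lr * grad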