973 research outputs found
Revisiting Maximum Entropy Inverse Reinforcement Learning: New Perspectives and Algorithms
We provide new perspectives and inference algorithms for Maximum Entropy
(MaxEnt) Inverse Reinforcement Learning (IRL), which offers a principled way
to select, from among the many reward functions consistent with given expert
demonstrations, the most non-committal one.
We first present a generalized MaxEnt formulation based on minimizing a
KL-divergence instead of maximizing an entropy. This improves the previous
heuristic derivation of the MaxEnt IRL model (for stochastic MDPs), allows a
unified view of MaxEnt IRL and Relative Entropy IRL, and leads to a model-free
learning algorithm for the MaxEnt IRL model. Second, a careful review of
existing inference algorithms and implementations showed that they compute
the marginals required for learning the model only approximately. We provide
examples to illustrate this, and present an efficient and exact inference
algorithm. Our algorithm can handle variable length demonstrations; in
addition, while a basic version takes time quadratic in the maximum
demonstration length L, an improved version of this algorithm reduces this to
linear using a padding trick.
Experiments show that our exact algorithm improves reward learning as
compared to the approximate ones. Furthermore, our algorithm scales up to a
large, real-world dataset involving driver behaviour forecasting. We provide an
optimized implementation compatible with the OpenAI Gym interface. Our new
insights and algorithms could spur renewed interest in and exploration of the
original MaxEnt IRL model.
Comment: Published as a conference paper at the 2020 IEEE Symposium Series on
Computational Intelligence (SSCI).
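To make the marginals mentioned in the abstract concrete, here is a minimal MaxEnt IRL sketch on a toy chain MDP. This is entirely our own illustrative setup, not the paper's algorithm or its exact inference scheme: a soft backward pass yields the MaxEnt policy, a forward pass yields expected state visitations, and the learning gradient is the gap between empirical and expected visitation counts.

```python
import numpy as np

# Hypothetical toy setup (ours, not the paper's): a 1-D chain MDP with
# 5 states, actions left/right, deterministic transitions, horizon T.
n_states, n_actions, T = 5, 2, 6

def step(s, a):
    return max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)

def soft_backward(theta):
    """Finite-horizon soft value iteration -> MaxEnt policy per timestep."""
    V = np.zeros(n_states)                     # soft value at the horizon
    policies = []
    for _ in range(T):
        Q = np.array([[theta[s] + V[step(s, a)] for a in range(n_actions)]
                      for s in range(n_states)])
        V = np.log(np.exp(Q).sum(axis=1))      # log-sum-exp (soft max)
        policies.append(np.exp(Q - V[:, None]))
    policies.reverse()                          # index policies by timestep
    return policies

def expected_visitations(policies, s0=0):
    """Forward pass: expected state-visitation counts under the policy."""
    d = np.zeros(n_states); d[s0] = 1.0
    counts = np.zeros(n_states)
    for pi in policies:
        counts += d
        d_next = np.zeros(n_states)
        for s in range(n_states):
            for a in range(n_actions):
                d_next[step(s, a)] += d[s] * pi[s, a]
        d = d_next
    return counts

# A single expert demonstration: always move right from state 0.
demo = [0, 1, 2, 3, 4, 4]
empirical = np.bincount(demo, minlength=n_states).astype(float)

theta = np.zeros(n_states)                     # linear reward r(s) = theta[s]
for _ in range(200):                           # gradient ascent on likelihood
    grad = empirical - expected_visitations(soft_backward(theta))
    theta += 0.1 * grad

# The learned reward should favour the right end of the chain.
print(theta.argmax())                          # -> 4
```

The gradient (empirical minus expected visitations) is exactly why the marginals must be computed accurately: a biased forward pass biases every reward update.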
Human-robot cross-training: Computational formulation, modeling and evaluation of a human team training strategy
We design and evaluate human-robot cross-training, a strategy widely used and validated for effective human team training. Cross-training is an interactive planning method in which a human and a robot iteratively switch roles to learn a shared plan for a collaborative task. We first present a computational formulation of the robot's inter-role knowledge and show that it is quantitatively comparable to the human mental model. Based on this encoding, we formulate human-robot cross-training and evaluate it in human subject experiments (n = 36). We compare human-robot cross-training to standard reinforcement learning techniques, and show that cross-training provides statistically significant improvements in quantitative team performance measures. Additionally, significant differences emerge in the perceived robot performance and human trust. These results support the hypothesis that effective and fluent human-robot teaming may be best achieved by modeling effective practices for human teamwork.
Sponsors: ABB Inc.; U.S. Commercial Regional Center; Alexander S. Onassis Public Benefit Foundation.
Primal Wasserstein Imitation Learning
Imitation Learning (IL) methods seek to match the behavior of an agent with
that of an expert. In the present work, we propose a new IL method based on a
conceptually simple algorithm: Primal Wasserstein Imitation Learning (PWIL),
which ties to the primal form of the Wasserstein distance between the expert
and the agent state-action distributions. Unlike recent adversarial IL
algorithms, which learn a reward function through interactions with the
environment, PWIL uses a reward function that is derived offline and requires
little fine-tuning. We show that we can recover expert behavior on a variety
of continuous control tasks from the MuJoCo domain in a sample-efficient
manner, both in terms of agent interactions and of expert interactions with
the environment.
Finally, we show that the behavior of the agent we train matches the behavior
of the expert under the Wasserstein distance, rather than under the commonly
used proxy of performance.
Comment: Published in International Conference on Learning Representations
(ICLR 2021).
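To illustrate the "reward derived offline" idea, here is a heavily simplified sketch in the spirit of PWIL. The class name, the constants, and the unweighted greedy matching below are our assumptions, not the paper's exact method: agent state-action pairs are greedily coupled to unmatched expert ones, and the reward decays with the transport cost of that coupling, so no discriminator is trained online.

```python
import numpy as np

class GreedyWassersteinReward:
    """Sketch of a PWIL-style offline reward (simplified, unweighted):
    each expert state-action atom can be matched once per episode."""

    def __init__(self, expert_sa, alpha=5.0, beta=5.0):
        self.expert = [np.asarray(e, dtype=float) for e in expert_sa]
        self.alpha, self.beta = alpha, beta
        self.reset()

    def reset(self):
        # Refill the pool of unmatched expert atoms at episode start.
        self.pool = [e.copy() for e in self.expert]

    def reward(self, sa):
        """Greedily match the agent's (s, a) to the nearest unmatched
        expert atom; the reward decays with the transport cost."""
        sa = np.asarray(sa, dtype=float)
        dists = [np.linalg.norm(sa - e) for e in self.pool]
        i = int(np.argmin(dists))
        cost = dists[i]
        self.pool.pop(i)                  # consume the matched atom
        return self.alpha * np.exp(-self.beta * cost)

# Usage: transitions near the expert's earn higher reward.
expert = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
rw = GreedyWassersteinReward(expert)
print(rw.reward([0.0, 0.0]) > rw.reward([5.0, 5.0]))  # True: closer is better
```

Because the coupling is computed from a fixed expert dataset, this reward can be attached to any environment loop (e.g. one exposing the Gym `reset`/`step` interface) without adversarial training.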
- …