
    Imitating Driver Behavior with Generative Adversarial Networks

    The ability to accurately predict and simulate human driving behavior is critical for the development of intelligent transportation systems. Traditional modeling methods have employed simple parametric models and behavioral cloning. This paper adopts a method for overcoming the problem of cascading errors inherent in prior approaches, resulting in realistic behavior that is robust to trajectory perturbations. We extend Generative Adversarial Imitation Learning to the training of recurrent policies, and we demonstrate that our model outperforms rule-based controllers and maximum likelihood models in realistic highway simulations. Our model reproduces emergent behavior of human drivers, such as lane change rate, while maintaining realistic control over long time horizons.
    Comment: 8 pages, 6 figures
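    The core idea in GAIL is adversarial: a discriminator learns to separate expert state-action pairs from the policy's rollouts, and its output supplies a surrogate reward for policy optimization. The sketch below illustrates that structure in PyTorch with a GRU-based recurrent policy, as the abstract describes; all dimensions and module names are illustrative assumptions, and the standard -log(1 - D(s, a)) surrogate stands in for whatever reward form and optimizer (e.g. TRPO) the paper actually uses.

```python
import torch
import torch.nn as nn

# Hypothetical sizes; the paper's actual feature dimensions differ.
STATE_DIM, ACTION_DIM, HIDDEN = 51, 2, 64

class RecurrentPolicy(nn.Module):
    """GRU policy: maps a state sequence to an action mean per timestep."""
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(STATE_DIM, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, ACTION_DIM)

    def forward(self, states, h=None):
        out, h = self.gru(states, h)          # out: (batch, time, HIDDEN)
        return self.head(out), h              # action means, recurrent state

class Discriminator(nn.Module):
    """Scores (state, action) pairs; high logit = expert-like."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, HIDDEN), nn.Tanh(),
            nn.Linear(HIDDEN, 1))

    def forward(self, states, actions):
        return self.net(torch.cat([states, actions], dim=-1))

def surrogate_reward(disc, states, actions):
    """GAIL-style reward: -log(1 - D(s, a)), with D = sigmoid(logit)."""
    logits = disc(states, actions)
    # 1 - sigmoid(x) == sigmoid(-x), so -log(1 - D) == -logsigmoid(-logit).
    return -torch.nn.functional.logsigmoid(-logits)

bce = nn.BCEWithLogitsLoss()

def disc_loss(disc, expert_s, expert_a, policy_s, policy_a):
    """Binary classification loss: expert pairs labeled 1, rollouts 0."""
    exp_logits = disc(expert_s, expert_a)
    pol_logits = disc(policy_s, policy_a)
    return (bce(exp_logits, torch.ones_like(exp_logits)) +
            bce(pol_logits, torch.zeros_like(pol_logits)))
```

    In a training loop the two networks alternate: rollouts from the recurrent policy are scored by `surrogate_reward` and fed to a policy-gradient update, then `disc_loss` is minimized on fresh expert and policy batches.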

    Visual Imitation Learning with Recurrent Siamese Networks

    It would be desirable for a reinforcement learning (RL) based agent to learn behaviour by merely watching a demonstration. However, defining rewards that facilitate this goal within the RL paradigm remains a challenge. Here we address this problem with Siamese networks, trained to compute distances between observed behaviours and the agent's behaviours. Given a desired motion, such Siamese networks can be used to provide a reward signal to an RL agent via the distance between the desired motion and the agent's motion. We experiment with an RNN-based comparator model that can compute distances in space and time between motion clips while training an RL policy to minimize this distance. Through experimentation, we also found that the inclusion of multi-task data and an additional image encoding loss helps enforce temporal consistency. These two components appear to balance reward for matching a specific instance of a behaviour versus that behaviour in general. Furthermore, we focus here on a particularly challenging form of this problem in which only a single demonstration is provided for a given task -- the one-shot learning setting. We demonstrate our approach on humanoid agents in both 2D with 10 degrees of freedom (DoF) and 3D with 38 DoF.
    Comment: Preprint
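    To make the reward definition concrete: a shared recurrent encoder embeds both the demonstration clip and the agent's motion clip, and the RL reward is the negative distance between the two embeddings. A minimal PyTorch sketch follows, assuming hypothetical observation and embedding sizes and a Euclidean distance; the paper's actual comparator architecture, multi-task data, and image-encoding loss are not reproduced here.

```python
import torch
import torch.nn as nn

OBS_DIM, EMBED = 128, 32  # hypothetical observation and embedding sizes

class SiameseRNN(nn.Module):
    """Shared GRU encoder; both clips pass through the same weights."""
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(OBS_DIM, EMBED, batch_first=True)

    def embed(self, clip):
        # clip: (batch, time, OBS_DIM); use the final hidden state
        # as a fixed-size embedding of the whole motion clip.
        _, h = self.gru(clip)
        return h[-1]                           # (batch, EMBED)

def reward(encoder, demo_clip, agent_clip):
    """RL reward: negative Euclidean distance between clip embeddings."""
    with torch.no_grad():
        d = torch.norm(encoder.embed(demo_clip) - encoder.embed(agent_clip),
                       dim=-1)
    return -d
```

    A policy trained against this reward is pushed to produce motion whose embedding tracks the demonstration's, which is what lets a single demonstration clip stand in for a hand-designed reward in the one-shot setting.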