Search CORE

1,559 research outputs found

Apprenticeship Learning for Motion Planning with Application to Parking Lot Navigation -A Summary

Author: Twan Van Laarhoven
Publication venue
Publication date: 30/04/2020
Field of study

Stochastic Inverse Reinforcement Learning

Author: Ju Ce
Publication venue
Publication date: 10/09/2020
Field of study

The goal of the inverse reinforcement learning (IRL) problem is to recover the reward functions from expert demonstrations. However, the IRL problem like any ill-posed inverse problem suffers the congenital defect that the policy may be optimal for many reward functions, and expert demonstrations may be optimal for many policies. In this work, we generalize the IRL problem to a well-posed expectation optimization problem stochastic inverse reinforcement learning (SIRL) to recover the probability distribution over reward functions. We adopt the Monte Carlo expectation-maximization (MCEM) method to estimate the parameter of the probability distribution as the first solution to the SIRL problem. The solution is succinct, robust, and transferable for a learning task and can generate alternative solutions to the IRL problem. Through our formulation, it is possible to observe the intrinsic property for the IRL problem from a global viewpoint, and our approach achieves a considerable performance on the objectworld.Comment: 8+2 pages, 5 figures, Under Revie

arXiv.org e-Print Archive

Driving with Style: Inverse Reinforcement Learning in General-Purpose Planning for Automated Driving

Author: Großjohann Simon
Homoceanu Silviu
James Vinit
Rosbach Sascha
Roth Stefan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 12/09/2020
Field of study

Behavior and motion planning play an important role in automated driving. Traditionally, behavior planners instruct local motion planners with predefined behaviors. Due to the high scene complexity in urban environments, unpredictable situations may occur in which behavior planners fail to match predefined behavior templates. Recently, general-purpose planners have been introduced, combining behavior and local motion planning. These general-purpose planners allow behavior-aware motion planning given a single reward function. However, two challenges arise: First, this function has to map a complex feature space into rewards. Second, the reward function has to be manually tuned by an expert. Manually tuning this reward function becomes a tedious task. In this paper, we propose an approach that relies on human driving demonstrations to automatically tune reward functions. This study offers important insights into the driving style optimization of general-purpose planners with maximum entropy inverse reinforcement learning. We evaluate our approach based on the expected value difference between learned and demonstrated policies. Furthermore, we compare the similarity of human driven trajectories with optimal policies of our planner under learned and expert-tuned reward functions. Our experiments show that we are able to learn reward functions exceeding the level of manual expert tuning without prior domain knowledge.Comment: Appeared at IROS 2019. Accepted version. Added/updated footnote, minor correction in preliminarie

arXiv.org e-Print Archive

Crossref

Motion learning in variable environments using probabilistic flow tubes

Author: Dong Shuonan
Williams Brian Charles
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2011
Field of study

Commanding an autonomous system through complex motions at a low level can be tedious or impractical for systems with many degrees of freedom. Allowing an operator to demonstrate the desired motions directly can often enable more intuitive and efficient interaction. Two challenges in the field of learning from demonstration include (1) how to best represent learned motions to accurately reflect a human's intentions, and (2) how to enable learned motions to be easily applicable in new situations. This paper introduces a novel representation of continuous actions called probabilistic flow tubes that can provide flexibility during execution while robustly encoding a human's intended motions. Our approach also automatically determines certain qualitative characteristics of a motion so that these characteristics can be preserved when autonomously executing the motion in a new situation. We demonstrate the effectiveness of our motion learning approach both in a simulated two-dimensional environment and on the All Terrain Hex-Limbed Extra-Terrestrial Explorer (ATHLETE) robot performing object manipulation tasks.United States. Dept. of Defense (National Defense Science and Engineering Graduate Fellowship 32 CFR 168a)United States. National Aeronautics and Space Administration (JPL Strategic University Research Partnership

DSpace@MIT

Modeling Driver Behavior From Demonstrations in Dynamic Environments Using Spatiotemporal Lattices

Author: Dibangoye Jilles
Erkent Özgür
Laugier Christian
Romero-Cano Víctor
Sierra González David
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 21/05/2018
Field of study

International audienceOne of the most challenging tasks in the development of path planners for intelligent vehicles is the design of the cost function that models the desired behavior of the vehicle. While this task has been traditionally accomplished by hand-tuning the model parameters, recent approaches propose to learn the model automatically from demonstrated driving data using Inverse Reinforcement Learning (IRL). To determine if the model has correctly captured the demonstrated behavior, most IRL methods require obtaining a policy by solving the forward control problem repetitively. Calculating the full policy is a costly task in continuous or large domains and thus often approximated by finding a single trajectory using traditional path-planning techniques. In this work, we propose to find such a trajectory using a conformal spatiotemporal state lattice, which offers two main advantages. First, by conforming the lattice to the environment, the search is focused only on feasible motions for the robot, saving computational power. And second, by considering time as part of the state, the trajectory is optimized with respect to the motion of the dynamic obstacles in the scene. As a consequence, the resulting trajectory can be used for the model assessment. We show how the proposed IRL framework can successfully handle highly dynamic environments by modeling the highway tactical driving task from demonstrated driving data gathered with an instrumented vehicle

Crossref

INRIA a CCSD electronic archive server

HAL Descartes