Meta Inverse Reinforcement Learning via Maximum Reward Sharing for Human Motion Analysis
This work handles the inverse reinforcement learning (IRL) problem where only
a small number of demonstrations are available from a demonstrator for each
high-dimensional task, insufficient to estimate an accurate reward function.
Observing that each demonstrator has an inherent reward for each state and the
task-specific behaviors mainly depend on a small number of key states, we
propose a meta IRL algorithm that first models the reward function for each
task as a distribution conditioned on a baseline reward function shared by all
tasks and dependent only on the demonstrator, and then finds the most likely
reward function in the distribution that explains the task-specific behaviors.
We test the method in a simulated environment on path planning tasks with
limited demonstrations, and show that the accuracy of the learned reward
function is significantly improved. We also apply the method to analyze the
motion of a patient under rehabilitation.
Comment: arXiv admin note: text overlap with arXiv:1707.0939
Efficient Supervision for Robot Learning via Imitation, Simulation, and Adaptation
Recent successes in machine learning have led to a shift in the design of
autonomous systems, improving performance on existing tasks and rendering new
applications possible. Data-focused approaches gain relevance across diverse,
intricate applications when developing data collection and curation pipelines
becomes more effective than manual behaviour design. The following work aims at
increasing the efficiency of this pipeline in two principal ways: by utilising
more powerful sources of informative data and by extracting additional
information from existing data. In particular, we target three orthogonal
fronts: imitation learning, domain adaptation, and transfer from simulation.
Comment: Dissertation Summar
Driving with Style: Inverse Reinforcement Learning in General-Purpose Planning for Automated Driving
Behavior and motion planning play an important role in automated driving.
Traditionally, behavior planners instruct local motion planners with predefined
behaviors. Due to the high scene complexity in urban environments,
unpredictable situations may occur in which behavior planners fail to match
predefined behavior templates. Recently, general-purpose planners have been
introduced, combining behavior and local motion planning. These general-purpose
planners allow behavior-aware motion planning given a single reward function.
However, two challenges arise: First, this function has to map a complex
feature space into rewards. Second, the reward function has to be manually
tuned by an expert. Manually tuning this reward function becomes a tedious
task. In this paper, we propose an approach that relies on human driving
demonstrations to automatically tune reward functions. This study offers
important insights into the driving style optimization of general-purpose
planners with maximum entropy inverse reinforcement learning. We evaluate our
approach based on the expected value difference between learned and
demonstrated policies. Furthermore, we compare the similarity of human-driven
trajectories with optimal policies of our planner under learned and
expert-tuned reward functions. Our experiments show that we are able to learn
reward functions exceeding the level of manual expert tuning without prior
domain knowledge.
Comment: Appeared at IROS 2019. Accepted version. Added/updated footnote,
minor correction in preliminarie
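The expected value difference named in the abstract above can be illustrated on a toy problem. What follows is a minimal sketch, assuming a small tabular MDP with known transitions and a state-only reward; the planner in the paper is not tabular, and the abstract does not spell out its exact formulation. The sketch uses one common definition: the gap, measured under a reference reward, between the value of the policy that is optimal for that reference reward and the value of the policy that is optimal for the learned reward. The function names and the array layout are hypothetical.

```python
import numpy as np


def value_iteration(P, r, gamma=0.9, iters=500):
    """Return a greedy deterministic policy of shape (S,).

    P: transition probabilities, shape (A, S, S), P[a, s, s2] = Pr(s2 | s, a).
    r: state reward, shape (S,).
    """
    V = np.zeros(P.shape[1])
    for _ in range(iters):
        # Q[a, s] = r[s] + gamma * sum_s2 P[a, s, s2] * V[s2]
        Q = r[None, :] + gamma * (P @ V)
        V = Q.max(axis=0)
    return Q.argmax(axis=0)


def policy_value(P, r, policy, gamma=0.9):
    """Evaluate a deterministic policy exactly by solving (I - gamma * P_pi) V = r."""
    S = P.shape[1]
    P_pi = P[policy, np.arange(S), :]  # row s is the transition row of action policy[s]
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r)


def expected_value_difference(P, r_ref, r_learned, start_dist, gamma=0.9):
    """Value lost, under r_ref, by acting optimally for r_learned instead of r_ref."""
    pi_ref = value_iteration(P, r_ref, gamma)
    pi_learned = value_iteration(P, r_learned, gamma)
    v_ref = policy_value(P, r_ref, pi_ref, gamma)
    v_learned = policy_value(P, r_ref, pi_learned, gamma)  # scored under r_ref
    return float(start_dist @ (v_ref - v_learned))
```

A value of zero means the learned reward induces a policy that is as good as the reference policy under the reference reward; larger values indicate a worse match. Note that both policies are scored under the same reward, so the metric is insensitive to rescaling of the learned reward.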