18,520 research outputs found
A Function Approximation Method for Model-based High-Dimensional Inverse Reinforcement Learning
This work handles the inverse reinforcement learning problem in high-dimensional state spaces, which relies on an efficient solution of model-based high-dimensional reinforcement learning problems. To solve the computationally expensive reinforcement learning problems, we propose a function approximation method to ensure that the Bellman Optimality Equation always holds, and then estimate a function based on the observed human actions for inverse reinforcement learning problems. The time complexity of the proposed method is linearly proportional to the cardinality of the action set, so it can handle high-dimensional and even continuous state spaces efficiently. We test the proposed method in a simulated environment to show its accuracy, and on three clinical tasks to show how it can be used to evaluate a doctor's proficiency.
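A minimal sketch of the central device in this abstract: choosing the reward so that the Bellman Optimality Equation holds by construction, which removes the inner reinforcement-learning solve. The discrete feature tensor `phi`, weight vector `theta`, and explicit transition model `P` below are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

# Sketch only: a linear action-value function whose implied reward makes the
# Bellman Optimality Equation hold identically, so no inner RL problem has to
# be solved while searching over reward parameters.
def implied_reward(theta, phi, P, gamma=0.95):
    # phi: (S, A, d) features; theta: (d,) weights; P: (S, A, S) transitions.
    Q = phi @ theta            # (S, A) action values, linear in the features
    V = Q.max(axis=1)          # (S,) optimal state values under Q
    # r(s, a) = Q(s, a) - gamma * E_{s'}[ max_{a'} Q(s', a') ]
    return Q - gamma * (P @ V)
```

The only per-state work is a max over the actions, which is where the claimed linear dependence on the cardinality of the action set comes from.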
Inverse Reinforcement Learning in Large State Spaces via Function Approximation
This paper introduces a new method for inverse reinforcement learning in
large-scale and high-dimensional state spaces. To avoid solving the
computationally expensive reinforcement learning problems in reward learning,
we propose a function approximation method to ensure that the Bellman
Optimality Equation always holds, and then estimate a function to maximize the
likelihood of the observed motion. The time complexity of the proposed method
is linearly proportional to the cardinality of the action set, so it can
handle large state spaces efficiently. We test the proposed method in a
simulated environment, and show that it is more accurate than existing methods
and significantly better in scalability. We also show that the proposed method
can extend many existing methods to high-dimensional state spaces. We then
apply the method to evaluating the effect of rehabilitative stimulations on
patients with spinal cord injuries based on the observed patient motions.
Comment: Experiment update
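The other half of the method, estimating a function that maximises the likelihood of the observed motion, can be sketched in the same setting as above. The softmax demonstration model and the finite-difference optimiser below are assumptions for illustration only:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def neg_log_likelihood(theta, phi, demos):
    # demos: iterable of (state_index, action_index) pairs from the expert,
    # modelled here as acting softmax-greedily with respect to Q.
    Q = phi @ theta
    return -sum(np.log(softmax(Q[s])[a]) for s, a in demos)

def fit(phi, demos, lr=0.1, iters=200, eps=1e-5):
    # Finite-difference gradient descent, purely for illustration; a real
    # implementation would use analytic gradients or autodiff.
    theta = np.zeros(phi.shape[-1])
    for _ in range(iters):
        g = np.zeros_like(theta)
        for i in range(theta.size):
            d = np.zeros_like(theta)
            d[i] = eps
            g[i] = (neg_log_likelihood(theta + d, phi, demos)
                    - neg_log_likelihood(theta - d, phi, demos)) / (2 * eps)
        theta -= lr * g
    return theta
```

Each likelihood evaluation needs only a softmax over the actions at the visited states, so scaling remains linear in the action set.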
Meta Inverse Reinforcement Learning via Maximum Reward Sharing for Human Motion Analysis
This work handles the inverse reinforcement learning (IRL) problem where only
a small number of demonstrations are available from a demonstrator for each
high-dimensional task, too few to estimate an accurate reward function.
Observing that each demonstrator has an inherent reward for each state and the
task-specific behaviors mainly depend on a small number of key states, we
propose a meta IRL algorithm that first models the reward function for each
task as a distribution conditioned on a baseline reward function shared by all
tasks and dependent only on the demonstrator, and then finds the most likely
reward function in the distribution that explains the task-specific behaviors.
We test the method in a simulated environment on path planning tasks with
limited demonstrations, and show that the accuracy of the learned reward
function is significantly improved. We also apply the method to analyze the
motion of a patient under rehabilitation.
Comment: arXiv admin note: text overlap with arXiv:1707.0939
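A minimal sketch of the MAP objective the abstract describes, with the task reward treated as a Gaussian perturbation of a shared baseline. Tabular rewards, the Gaussian prior, and the demo log-likelihood `log_lik` (for example from a maximum-entropy IRL model) are assumptions for illustration:

```python
import numpy as np

def map_objective(r_task, r_base, log_lik, demos, sigma=1.0):
    # Treat the task reward as drawn from N(r_base, sigma^2 I). The MAP
    # estimate trades off explaining the task demonstrations against staying
    # close to the demonstrator's shared baseline, so a few key states can
    # deviate while the rest stay anchored by data pooled across tasks.
    prior = -np.sum((r_task - r_base) ** 2) / (2 * sigma ** 2)
    return log_lik(r_task, demos) + prior
```

In practice one would alternate between estimating the shared baseline from all tasks and maximising this objective per task; a small sigma keeps tasks with few demonstrations close to the baseline.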
Count-Based Exploration in Feature Space for Reinforcement Learning
We introduce a new count-based optimistic exploration algorithm for
Reinforcement Learning (RL) that is feasible in environments with
high-dimensional state-action spaces. The success of RL algorithms in these
domains depends crucially on generalisation from limited training experience.
Function approximation techniques enable RL agents to generalise in order to
estimate the value of unvisited states, but at present few methods enable
generalisation regarding uncertainty. This has prevented the combination of
scalable RL algorithms with efficient exploration strategies that drive the
agent to reduce its uncertainty. We present a new method for computing a
generalised state visit-count, which allows the agent to estimate the
uncertainty associated with any state. Our φ-pseudocount achieves
generalisation by exploiting the same feature representation of the state space
that is used for value function approximation. States that have less frequently
observed features are deemed more uncertain. The φ-Exploration-Bonus
algorithm rewards the agent for exploring in feature space rather than in the
untransformed state space. The method is simpler and less computationally
expensive than some previous proposals, and achieves near state-of-the-art
results on high-dimensional RL benchmarks.
Comment: Conference: Twenty-sixth International Joint Conference on Artificial Intelligence (IJCAI-17), 8 pages, 1 figure
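A rough sketch of a feature-space visit count of the kind the abstract describes. The naive-Bayes feature density, Laplace smoothing, and bonus scaling below are assumptions for illustration; the paper's exact pseudo-count construction differs:

```python
import numpy as np

class FeatureCountBonus:
    # Counts feature activations rather than raw states, so the pseudo-count
    # can be non-zero for states never visited exactly but sharing features
    # with visited ones.
    def __init__(self, num_features, beta=0.05):
        self.counts = np.zeros(num_features)   # per-feature visit counts
        self.t = 0                             # total steps observed
        self.beta = beta

    def update(self, phi):
        # phi: binary feature vector of the visited state (the same features
        # used for value-function approximation).
        self.counts += phi
        self.t += 1

    def pseudocount(self, phi):
        freqs = (self.counts + 1) / (self.t + 2)           # Laplace smoothing
        density = np.prod(np.where(phi > 0, freqs, 1 - freqs))
        return density * self.t                # generalised visit count

    def bonus(self, phi):
        # Optimistic exploration bonus added to the environment reward.
        return self.beta / np.sqrt(self.pseudocount(phi) + 0.01)
```

States whose active features have rarely been seen get a small pseudo-count and hence a large bonus, which drives the agent to explore in feature space.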
Kernelizing LSPE(λ)
We propose the use of kernel-based methods as the underlying function approximator in the least-squares based policy evaluation framework of LSPE(λ) and LSTD(λ). In particular, we present the "kernelization" of model-free LSPE(λ). The "kernelization" is made computationally feasible by using the subset of regressors approximation, which approximates the kernel using a vastly reduced number of basis functions. The core of our proposed solution is an efficient recursive implementation with automatic supervised selection of the relevant basis functions. The LSPE method is well-suited for optimistic policy iteration and can thus be used in the context of online reinforcement learning. We use the high-dimensional Octopus benchmark to demonstrate this.
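A minimal sketch of the subset-of-regressors idea underneath the method: approximate the kernel with features built from a small set of centres, then run a least-squares policy-evaluation solve on top. The RBF kernel, the fixed centres, and the batch LSTD(λ)-style solve below are assumptions for illustration; the paper itself uses a recursive LSPE(λ) implementation with automatic basis selection:

```python
import numpy as np

def rbf(x, c, width=1.0):
    return np.exp(-np.sum((x - c) ** 2) / (2 * width ** 2))

def features(state, centres):
    # Reduced basis: one kernel evaluation per centre instead of one per
    # training sample, which is what keeps the method tractable.
    return np.array([rbf(state, c) for c in centres])

def lstd_lambda(trajectory, centres, gamma=0.99, lam=0.9):
    # trajectory: list of (state, reward, next_state) under a fixed policy.
    d = len(centres)
    A = np.zeros((d, d))
    b = np.zeros(d)
    z = np.zeros(d)                            # eligibility trace
    for s, r, s_next in trajectory:
        phi = features(s, centres)
        phi_next = features(s_next, centres)
        z = gamma * lam * z + phi
        A += np.outer(z, phi - gamma * phi_next)
        b += z * r
    w = np.linalg.solve(A + 1e-6 * np.eye(d), b)
    return w    # value estimate: V(s) ~ features(s, centres) @ w
```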
- …