Search CORE

517 research outputs found

Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition

Author: Dietterich Thomas G.
Publication venue
Publication date: 01/01/1998
Field of study

This paper presents the MAXQ approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. The paper defines the MAXQ hierarchy, proves formal results on its representational power, and establishes five conditions for the safe use of state abstractions. The paper presents an online model-free learning algorithm, MAXQ-Q, and proves that it converges wih probability 1 to a kind of locally-optimal policy known as a recursively optimal policy, even in the presence of the five kinds of state abstraction. The paper evaluates the MAXQ representation and MAXQ-Q through a series of experiments in three domains and shows experimentally that MAXQ-Q (with state abstractions) converges to a recursively optimal policy much faster than flat Q learning. The fact that MAXQ learns a representation of the value function has an important benefit: it makes it possible to compute and execute an improved, non-hierarchical policy via a procedure similar to the policy improvement step of policy iteration. The paper demonstrates the effectiveness of this non-hierarchical execution experimentally. Finally, the paper concludes with a comparison to related work and a discussion of the design tradeoffs in hierarchical reinforcement learning.Comment: 63 pages, 15 figure

arXiv.org e-Print Archive

CiteSeerX

State Abstraction in MAXQ Hierarchical Reinforcement Learning

Author: Dietterich Thomas G.
Publication venue
Publication date: 21/05/1999
Field of study

Many researchers have explored methods for hierarchical reinforcement learning (RL) with temporal abstractions, in which abstract actions are defined that can perform many primitive actions before terminating. However, little is known about learning with state abstractions, in which aspects of the state space are ignored. In previous work, we developed the MAXQ method for hierarchical RL. In this paper, we define five conditions under which state abstraction can be combined with the MAXQ value function decomposition. We prove that the MAXQ-Q learning algorithm converges under these conditions and show experimentally that state abstraction is important for the successful application of MAXQ-Q learning.Comment: 7 pages, 2 figure

arXiv.org e-Print Archive

CiteSeerX

Eligibility Propagation to Speed up Time Hopping for Reinforcement Learning

Author: Dong Fangyan
Hirota Kaoru
Kormushev Petar
Nomoto Kohei
Publication venue
Publication date: 03/04/2009
Field of study

A mechanism called Eligibility Propagation is proposed to speed up the Time Hopping technique used for faster Reinforcement Learning in simulations. Eligibility Propagation provides for Time Hopping similar abilities to what eligibility traces provide for conventional Reinforcement Learning. It propagates values from one state to all of its temporal predecessors using a state transitions graph. Experiments on a simulated biped crawling robot confirm that Eligibility Propagation accelerates the learning process more than 3 times.Comment: 7 page

arXiv.org e-Print Archive

Crossref

Spiral - Imperial College Digital Repository

Recommended from our members

Action selection in modular reinforcement learning

Author: Zhang Ruohan
Publication venue
Publication date: 16/09/2014
Field of study

textModular reinforcement learning is an approach to resolve the curse of dimensionality problem in traditional reinforcement learning. We design and implement a modular reinforcement learning algorithm, which is based on three major components: Markov decision process decomposition, module training, and global action selection. We define and formalize module class and module instance concepts in decomposition step. Under our framework of decomposition, we train each modules efficiently using SARSA(

\lambda

) algorithm. Then we design, implement, test, and compare three action selection algorithms based on different heuristics: Module Combination, Module Selection, and Module Voting. For last two algorithms, we propose a method to calculate module weights efficiently, by using standard deviation of Q-values of each module. We show that Module Combination and Module Voting algorithms produce satisfactory performance in our test domain.Computer Science

Texas ScholarWorks

Classifying Options for Deep Reinforcement Learning

Author: Arulkumaran Kai
Bharath Anil Anthony
Dilokthanakul Nat
Shanahan Murray
Publication venue
Publication date: 23/05/2016
Field of study

In this paper we combine one method for hierarchical reinforcement learning - the options framework - with deep Q-networks (DQNs) through the use of different "option heads" on the policy network, and a supervisory network for choosing between the different options. We utilise our setup to investigate the effects of architectural constraints in subtasks with positive and negative transfer, across a range of network capacities. We empirically show that our augmented DQN has lower sample complexity when simultaneously learning subtasks with negative transfer, without degrading performance when learning subtasks with positive transfer.Comment: IJCAI 2016 Workshop on Deep Reinforcement Learning: Frontiers and Challenge

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository