On learning history based policies for controlling Markov decision processes

Mahajan, Aditya; Patil, Gandharv; Precup, Doina

Authors: Aditya Mahajan, Gandharv Patil, Doina Precup
Publication date: 5 November 2022

Abstract

Reinforcement learning (RL) folklore suggests that history-based function approximation methods, such as recurrent neural nets or history-based state abstraction, perform better than their memory-less counterparts because function approximation in Markov decision processes (MDPs) can be viewed as inducing a partially observable MDP. However, there has been little formal analysis of such history-based algorithms, as most existing frameworks focus exclusively on memory-less features. In this paper, we introduce a theoretical framework for studying the behaviour of RL algorithms that learn to control an MDP using history-based feature abstraction mappings. Furthermore, we use this framework to design a practical RL algorithm and we numerically evaluate its effectiveness on a set of continuous control tasks.

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2211.03011

Last time updated on 12/12/2022