Learning Optimal Policies using Bound Estimation

By Andrea Bonarini, Alessandro Lazaric and Marcello Restelli

Abstract

Reinforcement learning problems arising from real-world applications are often characterized by large state spaces, which imply high memory requirements. In the past, function approximators have been studied as a means of obtaining compact representations of the value function. Although function approximators perform well in the supervised learning context, when applied to the reinforcement learning framework they may find unsatisfactory solutions or even diverge. In this paper, we focus on a particular kind of function approximator: state aggregation. We show how it is possible to compute upper and lower bounds on the optimal values of the actions available in the states contained in each aggregate. We propose an algorithm that refines the state aggregation until the optimal solution is reached. Furthermore, this approach is extended to a multi-representation algorithm in which overlapping partitions are used to compute tighter bounds. Although this paper limits its analysis to deterministic environments, it establishes new and relevant results that will be extended to stochastic problems in the near future.
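
The abstract's core mechanism (maintaining, for each aggregate and action, lower and upper bounds on the optimal values of the states the aggregate contains, and splitting aggregates whose bounds cannot single out an optimal action) can be illustrated concretely. The Python sketch below is my own reading of that idea, not the authors' algorithm: the deterministic chain MDP, the backup and split rules, the optimistic initialization at r_max/(1 - gamma), and every identifier in it (GAMMA, bound_iteration, ambiguous) are illustrative assumptions.

    # A minimal sketch of bound estimation over a state aggregation in a
    # deterministic MDP. Everything here (the chain MDP, the backup rules,
    # the halving split) is an illustrative assumption, not the paper's
    # exact algorithm.

    GAMMA = 0.9

    # Hypothetical deterministic 6-state chain: action 0 moves left, action 1
    # moves right; only the step from state 4 to state 5 pays reward 1.
    N_STATES, ACTIONS = 6, (0, 1)
    transitions = {  # transitions[s][a] = (next_state, reward)
        s: {0: (max(s - 1, 0), 0.0),
            1: (min(s + 1, N_STATES - 1), 1.0 if s == N_STATES - 2 else 0.0)}
        for s in range(N_STATES)
    }

    # Initial aggregation: two blocks, the left and right halves of the chain.
    blocks = [set(range(3)), set(range(3, N_STATES))]

    def block_of(s):
        return next(i for i, b in enumerate(blocks) if s in b)

    def bound_iteration(n_sweeps=200):
        """Sweep all transitions, tightening per-(block, action) lower and
        upper bounds on the optimal Q-values of the states in each block."""
        lo = {(i, a): 0.0 for i in range(len(blocks)) for a in ACTIONS}
        # Optimistic init: r_max / (1 - gamma) with r_max = 1 on this chain.
        hi = {(i, a): 1.0 / (1.0 - GAMMA) for i in range(len(blocks)) for a in ACTIONS}
        for _ in range(n_sweeps):
            for i, b in enumerate(blocks):
                for a in ACTIONS:
                    # Deterministic backup for every member state; the block's
                    # bounds must cover the min and max over its members.
                    los, his = [], []
                    for s in b:
                        s2, r = transitions[s][a]
                        j = block_of(s2)
                        los.append(r + GAMMA * max(lo[j, a2] for a2 in ACTIONS))
                        his.append(r + GAMMA * max(hi[j, a2] for a2 in ACTIONS))
                    lo[i, a] = max(lo[i, a], min(los))  # bounds only tighten
                    hi[i, a] = min(hi[i, a], max(his))
        return lo, hi

    def ambiguous(i, lo, hi):
        """True when no action's lower bound dominates every other action's
        upper bound, i.e. the bounds do not certify an optimal action."""
        return not any(all(lo[i, a] >= hi[i, a2] for a2 in ACTIONS if a2 != a)
                       for a in ACTIONS)

    # Refinement loop: recompute bounds, split ambiguous multi-state blocks.
    while True:
        lo, hi = bound_iteration()
        to_split = [i for i in range(len(blocks))
                    if ambiguous(i, lo, hi) and len(blocks[i]) > 1]
        if not to_split:
            break
        for i in to_split:  # naive refinement: halve the block
            members = sorted(blocks[i])
            blocks[i] = set(members[:len(members) // 2])
            blocks.append(set(members[len(members) // 2:]))

    print("final aggregation:", blocks)

On this tiny chain the refinement bottoms out quickly; the point of the sketch is the certification test in ambiguous(): an aggregate is kept only once its bounds already prove a single action optimal for every state it contains. The multi-representation extension mentioned in the abstract uses overlapping partitions to compute tighter bounds of the same kind.
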

Year: 2005
OAI identifier: oai:CiteSeerX.psu:10.1.1.135.688
Provided by: CiteSeerX
Download PDF:
Sorry, we are unable to provide the full text, but you may find it at the following location(s):
  • http://citeseerx.ist.psu.edu/v...
  • http://home.dei.polimi.it/laza...