
Min Max Generalization for Deterministic Batch Mode Reinforcement Learning: Relaxation Schemes

By Raphaël Fonteneau

Abstract

We study the min max optimization problem introduced in [Fonteneau et al. (2011), "Towards min max reinforcement learning", Springer CCIS, vol. 129, pp. 61-77] for computing policies for batch mode reinforcement learning in a deterministic setting with a fixed, finite time horizon. First, we show that the min part of this problem is NP-hard. We then provide two relaxation schemes. The first works by dropping some constraints in order to obtain a problem that is solvable in polynomial time. The second, based on a Lagrangian relaxation in which all constraints are dualized, can also be solved in polynomial time. We also theoretically prove and empirically illustrate that both relaxation schemes provide better results than those given in [Fonteneau et al. (2011)].
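
As a rough illustration of the two kinds of relaxation named in the abstract, the sketch below uses a generic constrained minimization problem. The symbols $f$, $g_i$, $I$, $J$, $\lambda$ and $v^\star$ are assumptions introduced here for illustration and are not the paper's notation or its actual problem formulation. It only records the standard fact that both constraint dropping and Lagrangian dualization yield lower bounds on the optimal value of a minimization problem.

```latex
% Generic minimization problem (illustrative notation, not the paper's):
\[
  v^\star \;=\; \min_{x \in X} \; f(x)
  \quad \text{s.t.} \quad g_i(x) \le 0, \;\; i \in I .
\]

% Relaxation by dropping constraints: keep only a subset J of I.
% The feasible set can only grow, so the optimal value can only decrease.
\[
  v_J \;=\; \min_{x \in X} \; f(x)
  \quad \text{s.t.} \quad g_i(x) \le 0, \;\; i \in J \subseteq I ,
  \qquad v_J \le v^\star .
\]

% Lagrangian relaxation with all constraints dualized:
% for any multipliers \lambda \ge 0 and any feasible x,
% \sum_i \lambda_i g_i(x) \le 0, which gives weak duality.
\[
  d(\lambda) \;=\; \min_{x \in X} \Big( f(x) + \sum_{i \in I} \lambda_i \, g_i(x) \Big),
  \qquad
  \max_{\lambda \ge 0} \, d(\lambda) \;\le\; v^\star .
\]
```

In both cases the relaxed value bounds $v^\star$ from below; how tight each bound is depends on the specific problem structure, which this generic sketch does not capture.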

Topics: Reinforcement Learning, Engineering, computing & technology :: Computer science
Year: 2013
OAI identifier: oai:orbi.ulg.ac.be:2268/182290
