Min Max Generalization for Two-stage Deterministic Batch Mode Reinforcement Learning: Relaxation Schemes

Boigelot, Bernard; Ernst, Damien; Fonteneau, Raphael; Louveaux, Quentin

Authors: Bernard Boigelot, Damien Ernst, Raphael Fonteneau, Quentin Louveaux
Publication date: 1 January 2012

Abstract

We study the min max optimization problem introduced in [22] for computing policies for batch mode reinforcement learning in a deterministic setting. First, we show that this problem is NP-hard. For the two-stage case, we provide two relaxation schemes. The first drops some constraints to obtain a problem that is solvable in polynomial time. The second, based on a Lagrangian relaxation in which all constraints are dualized, leads to a conic quadratic programming problem. We also prove theoretically and illustrate empirically that both relaxation schemes provide better results than those given in [22].

Available Versions

Open Repository and Bibliography - Liège

oai:orbi.ulg.ac.be:2268/136851

Last time updated on 21/08/2013