Min Max Generalization for Two-stage Deterministic Batch Mode Reinforcement Learning: Relaxation Schemes
We study the minmax optimization problem introduced in [22] for computing
policies for batch mode reinforcement learning in a deterministic setting.
First, we show that this problem is NP-hard. In the two-stage case, we provide
two relaxation schemes. The first relaxation scheme works by dropping some
constraints in order to obtain a problem that is solvable in polynomial time.
The second relaxation scheme, based on a Lagrangian relaxation where all
constraints are dualized, leads to a conic quadratic programming problem. We
also theoretically prove and empirically illustrate that both relaxation
schemes provide better results than those given in [22].
A survey of time consistency of dynamic risk measures and dynamic performance measures in discrete time: LM-measure perspective
In this work we give a comprehensive overview of the time consistency property of dynamic risk and performance measures, focusing on the discrete time setup. The two key operational concepts used throughout are the notion of the LM-measure and the notion of the update rule, which, we believe, are the key tools for studying time consistency in a unified framework.