    Controlled approximation of the value function in stochastic dynamic programming for multi-reservoir systems

    We present a new approach for adaptive approximation of the value function in stochastic dynamic programming. Under convexity assumptions, our method is based on a simplicial partition of the state space. Bounds on the value function provide guidance as to where refinement should be done, if at all; the method thus allows a trade-off between solution time and accuracy. The proposed scheme is evaluated in the particular context of hydroelectric production across multiple reservoirs.
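    As a rough illustration of the refinement idea (not the paper's multi-reservoir method), the sketch below works in one dimension, where the simplicial partition reduces to intervals: convexity makes the chord between two evaluated points an upper bound and the tangents at those points lower bounds, and the interval with the largest bound gap is split first. The function name and tolerance handling are illustrative assumptions.

```python
import heapq

def adaptive_convex_approx(V, dV, lo, hi, tol=1e-3, max_evals=50):
    """Adaptively build a piecewise-linear model of a convex V on [lo, hi].

    On each interval [a, b], convexity gives two-sided bounds:
      - the chord through (a, V(a)) and (b, V(b)) is an upper bound;
      - the tangents at a and b (slopes from dV) are lower bounds,
        tightest at their intersection point.
    Intervals are refined greedily where the bound gap is largest,
    trading solution time against accuracy.
    """
    def bound_gap(a, b):
        # Point where the two tangent lower bounds intersect.
        if abs(dV(a) - dV(b)) < 1e-12:
            return 0.0, 0.5 * (a + b)  # V is affine here: bounds coincide
        x = (V(b) - V(a) + dV(a) * a - dV(b) * b) / (dV(a) - dV(b))
        upper = V(a) + (V(b) - V(a)) / (b - a) * (x - a)  # chord value
        lower = V(a) + dV(a) * (x - a)                    # tangent value
        return upper - lower, x

    knots = {lo: V(lo), hi: V(hi)}
    g, x = bound_gap(lo, hi)
    heap = [(-g, lo, hi, x)]  # max-heap on the bound gap
    while heap and len(knots) < max_evals:
        neg_g, a, b, x = heapq.heappop(heap)
        if -neg_g <= tol:
            break  # every remaining interval already meets the tolerance
        knots[x] = V(x)  # refine: evaluate V at the worst-case point
        for left, right in ((a, x), (x, b)):
            g, m = bound_gap(left, right)
            heapq.heappush(heap, (-g, left, right, m))
    return sorted(knots.items())

# Example: approximate a convex cost-to-go to within the tolerance.
pts = adaptive_convex_approx(lambda x: x * x, lambda x: 2 * x, 0.0, 4.0)
```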

    Deep Reinforcement Learning from Self-Play in Imperfect-Information Games

    Many real-world applications can be described as large-scale games of imperfect information. To deal with these challenging domains, prior work has focused on computing Nash equilibria in a handcrafted abstraction of the domain. In this paper we introduce the first scalable end-to-end approach to learning approximate Nash equilibria without prior domain knowledge. Our method combines fictitious self-play with deep reinforcement learning. When applied to Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium, whereas common reinforcement learning methods diverged. In Limit Texas Hold'em, a poker game of real-world scale, NFSP learnt a strategy that approached the performance of state-of-the-art, superhuman algorithms based on significant domain expertise.
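    A minimal sketch of the agent architecture the abstract describes, assuming PyTorch and illustrative network sizes, buffers, and hyperparameters rather than the paper's exact ones: each player trains a Q-network toward a best response with DQN-style updates, trains a second network to imitate its own best-response actions by supervised classification (this average strategy is what approaches a Nash equilibrium), and mixes the two behaviours with an anticipatory parameter eta when acting.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class NFSPAgent:
    """Sketch of one NFSP learner (illustrative sizes and hyperparameters)."""

    def __init__(self, n_obs, n_actions, eta=0.1, eps=0.06):
        # Best-response network, trained with DQN-style updates.
        self.q_net = nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(),
                                   nn.Linear(64, n_actions))
        # Average-policy network, trained by supervised classification.
        self.avg_net = nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(),
                                     nn.Linear(64, n_actions))
        self.q_opt = torch.optim.Adam(self.q_net.parameters(), lr=1e-3)
        self.avg_opt = torch.optim.Adam(self.avg_net.parameters(), lr=1e-3)
        self.rl_buffer = []  # transitions (a circular buffer in the paper)
        self.sl_buffer = []  # (obs, action) pairs (reservoir in the paper)
        self.eta, self.eps, self.n_actions = eta, eps, n_actions

    def act(self, obs):
        obs_t = torch.as_tensor(obs, dtype=torch.float32)
        if random.random() < self.eta:
            # Anticipatory mode: epsilon-greedy best response; record the
            # chosen action so the average policy can imitate it later.
            a = (random.randrange(self.n_actions)
                 if random.random() < self.eps
                 else int(self.q_net(obs_t).argmax()))
            self.sl_buffer.append((obs_t, a))
        else:
            # Otherwise act from the learned average strategy.
            probs = F.softmax(self.avg_net(obs_t), dim=-1)
            a = int(torch.multinomial(probs, 1))
        return a

    def observe(self, s, a, r, s2, done):
        self.rl_buffer.append((torch.as_tensor(s, dtype=torch.float32),
                               torch.tensor(a),
                               torch.tensor(float(r)),
                               torch.as_tensor(s2, dtype=torch.float32),
                               torch.tensor(float(done))))

    def train_step(self, gamma=0.99, batch=32):
        if len(self.rl_buffer) >= batch:
            s, a, r, s2, d = map(torch.stack,
                                 zip(*random.sample(self.rl_buffer, batch)))
            q = self.q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
            with torch.no_grad():  # one-step TD target (no target net here)
                target = r + gamma * (1.0 - d) * self.q_net(s2).max(1).values
            self.q_opt.zero_grad()
            F.mse_loss(q, target).backward()
            self.q_opt.step()
        if len(self.sl_buffer) >= batch:
            s, a = zip(*random.sample(self.sl_buffer, batch))
            self.avg_opt.zero_grad()
            F.cross_entropy(self.avg_net(torch.stack(s)),
                            torch.tensor(a)).backward()
            self.avg_opt.step()
```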

    Joint dynamic probabilistic constraints with projected linear decision rules

    We consider multistage stochastic linear optimization problems combining joint dynamic probabilistic constraints with hard constraints. We develop a method for projecting decision rules onto hard constraints of wait-and-see type. We establish the relation between the original (infinite-dimensional) problem and approximating problems that work with projections from different subclasses of decision policies. Considering the subclass of linear decision rules and a generalized linear model for the underlying stochastic process, with noises that are Gaussian or truncated Gaussian, we show that the values and gradients of the objective and constraint functions of the approximating problems can be computed analytically.
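    The analytical tractability claimed in the last sentence can be made concrete with a small sketch: under a linear decision rule with Gaussian noise, the constrained quantity stays Gaussian, so a joint probabilistic constraint evaluates to a multivariate normal CDF. The helper below is a hypothetical illustration using SciPy (the names and the linear structure are assumptions, and the projection step onto hard constraints is not shown).

```python
import numpy as np
from scipy.stats import multivariate_normal

def joint_chance_probability(a, B, xi_mean, xi_cov, A, b):
    """P(A @ x(xi) <= b jointly) for the linear rule x(xi) = a + B @ xi.

    With Gaussian xi, A @ x(xi) is Gaussian with the mean and covariance
    computed below, so the joint probability is a multivariate normal CDF
    (and differentiable in the rule's coefficients) rather than a quantity
    that must be estimated by sampling. The covariance A B xi_cov B^T A^T
    must be nonsingular for SciPy's CDF.
    """
    mean = A @ (a + B @ xi_mean)
    cov = A @ B @ xi_cov @ B.T @ A.T
    return multivariate_normal(mean=mean, cov=cov).cdf(b)

# Example: two joint constraints, two-dimensional standard Gaussian noise.
a = np.array([1.0, 0.5])
B = np.array([[0.3, 0.0], [0.1, 0.2]])
p = joint_chance_probability(a, B, np.zeros(2), np.eye(2),
                             A=np.eye(2), b=np.array([2.0, 1.5]))
```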