    Controlled approximation of the value function in stochastic dynamic programming for multi-reservoir systems

    We present a new approach for adaptive approximation of the value function in stochastic dynamic programming. Under convexity assumptions, our method is based on a simplicial partition of the state space. Bounds on the value function provide guidance as to where refinement should be done, if at all; the method thus allows a trade-off between solution time and accuracy. The proposed scheme is evaluated in the particular context of hydroelectric production across multiple reservoirs.
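    As a rough illustration of the refinement idea (not the paper's multi-reservoir method), the sketch below works in one dimension, where the simplicial partition reduces to intervals: convexity makes the chord between two evaluated points an upper bound and the tangents at those points lower bounds, and the interval with the largest bound gap is split first. The function name and tolerance handling are illustrative assumptions.

```python
import heapq

def adaptive_convex_approx(V, dV, lo, hi, tol=1e-3, max_evals=50):
    """Adaptively build a piecewise-linear model of a convex V on [lo, hi].

    On each interval [a, b], convexity gives two-sided bounds:
      - the chord through (a, V(a)) and (b, V(b)) is an upper bound;
      - the tangents at a and b (slopes from dV) are lower bounds,
        tightest at their intersection point.
    Intervals are refined greedily where the bound gap is largest,
    trading solution time against accuracy.
    """
    def bound_gap(a, b):
        # Point where the two tangent lower bounds intersect.
        if abs(dV(a) - dV(b)) < 1e-12:
            return 0.0, 0.5 * (a + b)  # V is affine here: bounds coincide
        x = (V(b) - V(a) + dV(a) * a - dV(b) * b) / (dV(a) - dV(b))
        upper = V(a) + (V(b) - V(a)) / (b - a) * (x - a)  # chord value
        lower = V(a) + dV(a) * (x - a)                    # tangent value
        return upper - lower, x

    knots = {lo: V(lo), hi: V(hi)}
    g, x = bound_gap(lo, hi)
    heap = [(-g, lo, hi, x)]  # max-heap on the bound gap
    while heap and len(knots) < max_evals:
        neg_g, a, b, x = heapq.heappop(heap)
        if -neg_g <= tol:
            break  # every remaining interval already meets the tolerance
        knots[x] = V(x)  # refine: evaluate V at the worst-case point
        for left, right in ((a, x), (x, b)):
            g, m = bound_gap(left, right)
            heapq.heappush(heap, (-g, left, right, m))
    return sorted(knots.items())

# Example: approximate a convex cost-to-go to within the tolerance.
pts = adaptive_convex_approx(lambda x: x * x, lambda x: 2 * x, 0.0, 4.0)
```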

    Deep Reinforcement Learning from Self-Play in Imperfect-Information Games

    Many real-world applications can be described as large-scale games of imperfect information. To deal with these challenging domains, prior work has focused on computing Nash equilibria in a handcrafted abstraction of the domain. In this paper we introduce the first scalable end-to-end approach to learning approximate Nash equilibria without prior domain knowledge. Our method combines fictitious self-play with deep reinforcement learning. When applied to Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium, whereas common reinforcement learning methods diverged. In Limit Texas Hold'em, a poker game of real-world scale, NFSP learnt a strategy that approached the performance of state-of-the-art, superhuman algorithms based on significant domain expertise.
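    A minimal sketch of the agent architecture the abstract describes, assuming PyTorch and illustrative network sizes, buffers, and hyperparameters rather than the paper's exact ones: each player trains a Q-network toward a best response with DQN-style updates, trains a second network to imitate its own best-response actions by supervised classification (this average strategy is what approaches a Nash equilibrium), and mixes the two behaviours with an anticipatory parameter eta when acting.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class NFSPAgent:
    """Sketch of one NFSP learner (illustrative sizes and hyperparameters)."""

    def __init__(self, n_obs, n_actions, eta=0.1, eps=0.06):
        # Best-response network, trained with DQN-style updates.
        self.q_net = nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(),
                                   nn.Linear(64, n_actions))
        # Average-policy network, trained by supervised classification.
        self.avg_net = nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(),
                                     nn.Linear(64, n_actions))
        self.q_opt = torch.optim.Adam(self.q_net.parameters(), lr=1e-3)
        self.avg_opt = torch.optim.Adam(self.avg_net.parameters(), lr=1e-3)
        self.rl_buffer = []  # transitions (a circular buffer in the paper)
        self.sl_buffer = []  # (obs, action) pairs (reservoir in the paper)
        self.eta, self.eps, self.n_actions = eta, eps, n_actions

    def act(self, obs):
        obs_t = torch.as_tensor(obs, dtype=torch.float32)
        if random.random() < self.eta:
            # Anticipatory mode: epsilon-greedy best response; record the
            # chosen action so the average policy can imitate it later.
            a = (random.randrange(self.n_actions)
                 if random.random() < self.eps
                 else int(self.q_net(obs_t).argmax()))
            self.sl_buffer.append((obs_t, a))
        else:
            # Otherwise act from the learned average strategy.
            probs = F.softmax(self.avg_net(obs_t), dim=-1)
            a = int(torch.multinomial(probs, 1))
        return a

    def observe(self, s, a, r, s2, done):
        self.rl_buffer.append((torch.as_tensor(s, dtype=torch.float32),
                               torch.tensor(a),
                               torch.tensor(float(r)),
                               torch.as_tensor(s2, dtype=torch.float32),
                               torch.tensor(float(done))))

    def train_step(self, gamma=0.99, batch=32):
        if len(self.rl_buffer) >= batch:
            s, a, r, s2, d = map(torch.stack,
                                 zip(*random.sample(self.rl_buffer, batch)))
            q = self.q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
            with torch.no_grad():  # one-step TD target (no target net here)
                target = r + gamma * (1.0 - d) * self.q_net(s2).max(1).values
            self.q_opt.zero_grad()
            F.mse_loss(q, target).backward()
            self.q_opt.step()
        if len(self.sl_buffer) >= batch:
            s, a = zip(*random.sample(self.sl_buffer, batch))
            self.avg_opt.zero_grad()
            F.cross_entropy(self.avg_net(torch.stack(s)),
                            torch.tensor(a)).backward()
            self.avg_opt.step()
```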

    Joint dynamic probabilistic constraints with projected linear decision rules

    We consider multistage stochastic linear optimization problems combining joint dynamic probabilistic constraints with hard constraints. We develop a method for projecting decision rules onto hard constraints of wait-and-see type. We establish the relation between the original (infinite-dimensional) problem and approximating problems that work with projections from different subclasses of decision policies. Considering the subclass of linear decision rules and a generalized linear model for the underlying stochastic process, with noises that are Gaussian or truncated Gaussian, we show that the values and gradients of the objective and constraint functions of the approximating problems can be computed analytically.
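    The analytical tractability claimed in the last sentence can be made concrete with a small sketch: under a linear decision rule with Gaussian noise, the constrained quantity stays Gaussian, so a joint probabilistic constraint evaluates to a multivariate normal CDF. The helper below is a hypothetical illustration using SciPy (the names and the linear structure are assumptions, and the projection step onto hard constraints is not shown).

```python
import numpy as np
from scipy.stats import multivariate_normal

def joint_chance_probability(a, B, xi_mean, xi_cov, A, b):
    """P(A @ x(xi) <= b jointly) for the linear rule x(xi) = a + B @ xi.

    With Gaussian xi, A @ x(xi) is Gaussian with the mean and covariance
    computed below, so the joint probability is a multivariate normal CDF
    (and differentiable in the rule's coefficients) rather than a quantity
    that must be estimated by sampling. The covariance A B xi_cov B^T A^T
    must be nonsingular for SciPy's CDF.
    """
    mean = A @ (a + B @ xi_mean)
    cov = A @ B @ xi_cov @ B.T @ A.T
    return multivariate_normal(mean=mean, cov=cov).cdf(b)

# Example: two joint constraints, two-dimensional standard Gaussian noise.
a = np.array([1.0, 0.5])
B = np.array([[0.3, 0.0], [0.1, 0.2]])
p = joint_chance_probability(a, B, np.zeros(2), np.eye(2),
                             A=np.eye(2), b=np.array([2.0, 1.5]))
```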