
    Gradient-Bounded Dynamic Programming with Submodular and Concave Extensible Value Functions

    We consider dynamic programming problems with finite, discrete-time horizons and discrete state spaces whose dimension is prohibitively high for direct computation of the value function from the Bellman equation. For the case where the value function of the dynamic program is concave extensible and submodular on its state space, we present a new algorithm that computes deterministic upper and stochastic lower bounds of the value function, in the spirit of dual dynamic programming. We then show that the proposed algorithm terminates after a finite number of iterations. Finally, we demonstrate the efficacy of our approach on a high-dimensional numerical example from delivery slot pricing in attended home delivery.
    Comment: 6 pages, 2 figures, accepted for the IFAC World Congress 2020
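
    The abstract's starting point is the finite-horizon Bellman equation over a discrete state space. As a rough illustration of the curse of dimensionality it alludes to, here is a minimal sketch of exact backward induction on a toy problem — a generic textbook recursion, not the paper's gradient-bounded algorithm; the reward, transition and all sizes below are hypothetical:

    ```python
    import itertools

    # Toy finite-horizon dynamic program solved by exact backward induction.
    # The state is a vector of n discrete components, so the state space has
    # K**n points -- the exponential growth that makes this direct approach
    # intractable for the high-dimensional problems the paper targets.

    n, K, T = 3, 4, 5                      # state dimension, levels per dim, horizon
    states = list(itertools.product(range(K), repeat=n))
    actions = [0, 1]                       # hypothetical binary action

    def reward(x, a, t):                   # hypothetical stage reward
        return sum(x) * a - 0.5 * a

    def step(x, a):                        # hypothetical deterministic transition
        return tuple(min(xi + a, K - 1) for xi in x)

    V = {x: 0.0 for x in states}           # terminal value V_T = 0
    for t in reversed(range(T)):           # Bellman backward recursion
        V = {x: max(reward(x, a, t) + V[step(x, a)] for a in actions)
             for x in states}

    print(V[(0,) * n])                     # value of the all-zeros initial state
    ```

    Enumerating all K**n states at every stage is exactly what becomes infeasible in high dimension, and what the paper's deterministic upper and stochastic lower bounds are designed to sidestep.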

    From Infinite to Finite Programs: Explicit Error Bounds with Applications to Approximate Dynamic Programming

    We consider linear programming (LP) problems in infinite-dimensional spaces that are in general computationally intractable. Under suitable assumptions, we develop an approximation bridge from the infinite-dimensional LP to tractable finite convex programs in which the performance of the approximation is quantified explicitly. To this end, we adopt recent developments in the two areas of randomized optimization and first-order methods, leading to a priori as well as a posteriori performance guarantees. We illustrate the generality and implications of our theoretical results in the special case of the long-run average cost and discounted cost optimal control problems for Markov decision processes on Borel spaces. The applicability of the theoretical results is demonstrated through a constrained linear quadratic optimal control problem and a fisheries management problem.
    Comment: 30 pages, 5 figures
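
    For context, the classical finite-state, finite-action instance of the LP approach to discounted-cost MDPs can be written down and solved directly; the sketch below does so for a hypothetical 3-state, 2-action model with scipy's linprog. This is the standard exact finite LP, not the paper's randomized approximation scheme:

    ```python
    import numpy as np
    from scipy.optimize import linprog

    # Classical finite LP formulation of a discounted MDP -- the finite
    # analogue of the infinite-dimensional LP the paper approximates.
    # All numbers below are a hypothetical 3-state, 2-action example.

    gamma = 0.9
    r = np.array([[1.0, 0.0],              # r[x, a]: reward in state x, action a
                  [0.0, 2.0],
                  [0.5, 0.5]])
    P = np.array([                         # P[a, x, y]: transition probabilities
        [[0.8, 0.2, 0.0], [0.1, 0.9, 0.0], [0.0, 0.3, 0.7]],
        [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.2, 0.0, 0.8]],
    ])
    nS, nA = r.shape

    # minimize alpha^T V  s.t.  V(x) >= r(x,a) + gamma * sum_y P(y|x,a) V(y),
    # rewritten into linprog's A_ub @ V <= b_ub form:
    A_ub = np.vstack([gamma * P[a] - np.eye(nS) for a in range(nA)])
    b_ub = np.concatenate([-r[:, a] for a in range(nA)])
    alpha = np.ones(nS) / nS               # uniform state-relevance weights

    res = linprog(c=alpha, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * nS)
    print("optimal value function:", res.x)
    ```

    The minimal V feasible for all actions is the optimal value function; in the infinite-dimensional setting treated in the paper, both the variable V and the constraint family become infinite, which is what the approximation bridge addresses.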

    Markov Decision Processes

    The theory of Markov Decision Processes is the theory of controlled Markov chains. Its origins can be traced back to R. Bellman and L. Shapley in the 1950s. Over the decades of the last century the theory has grown dramatically and has found applications in various areas such as computer science, engineering, operations research, biology and economics. In this article we give a short introduction to parts of this theory. We treat Markov Decision Processes with finite and infinite time horizons, restricting the presentation to the so-called (generalized) negative case. Solution algorithms such as Howard's policy improvement and linear programming are also explained. Various examples show the application of the theory: we treat stochastic linear-quadratic control problems, bandit problems and dividend pay-out problems.
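
    As an illustration of the Howard's policy improvement algorithm the article covers, here is a minimal sketch for a discounted finite MDP; the 2-state, 2-action model and the discount factor are hypothetical:

    ```python
    import numpy as np

    # Howard's policy improvement (policy iteration) for a discounted MDP:
    # alternate exact evaluation of the current policy with greedy improvement
    # until the policy stops changing, which happens after finitely many steps.

    gamma = 0.95
    r = np.array([[0.0, 1.0],
                  [2.0, 0.5]])             # r[x, a]: reward in state x, action a
    P = np.array([
        [[0.9, 0.1], [0.4, 0.6]],          # P[0, x, y]: transitions under action 0
        [[0.2, 0.8], [0.7, 0.3]],          # P[1, x, y]: transitions under action 1
    ])
    nS, nA = r.shape

    policy = np.zeros(nS, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = r_pi exactly.
        P_pi = P[policy, np.arange(nS)]    # row x is P[policy[x], x, :]
        r_pi = r[np.arange(nS), policy]
        V = np.linalg.solve(np.eye(nS) - gamma * P_pi, r_pi)

        # Policy improvement: act greedily with respect to V.
        Q = r + gamma * np.einsum("axy,y->xa", P, V)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy

    print("optimal policy:", policy, "value function:", V)
    ```

    Since there are only finitely many policies and each improvement step strictly increases the value unless the policy is already optimal, the loop is guaranteed to terminate.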