Successive approximations for Markov decision processes and Markov games with unbounded rewards
The aim of this paper is to give an overview of recent developments in the area of successive approximations for Markov decision processes and Markov games. We emphasize two aspects: the conditions under which successive approximations converge in some strong sense, and variations of these methods which diminish the amount of computational work to be executed. With respect to the first aspect, it is shown how much unboundedness of the rewards may be allowed without violating convergence. With respect to the second aspect, we present four ideas, which can be applied in conjunction, that may diminish the amount of work to be done: 1. the use of the actual convergence of the iterates for the construction of upper and lower bounds (MacQueen bounds); 2. the use of alternative policy-improvement procedures (based on stopping times); 3. a better evaluation of the values of actual policies in each iteration step by a value-oriented approach; 4. the elimination of suboptimal actions, not only permanently but also temporarily. The general presentation is given for Markov decision processes, with a final section devoted to the possibilities of extension to Markov games.
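To make the first of these ideas concrete, the sketch below runs value iteration on a made-up two-state discounted MDP and derives MacQueen-style upper and lower bounds on the optimal value from the span of successive iterates. The MDP data, discount factor, and stopping tolerance are all invented for illustration and are not taken from the paper.

```python
# Illustrative sketch: value iteration with MacQueen-style bounds on a
# toy 2-state, 2-action discounted MDP (all numbers invented).

GAMMA = 0.9
# P[s][a] = list of (next_state, probability); R[s][a] = immediate reward
P = {0: {0: [(0, 0.5), (1, 0.5)], 1: [(1, 1.0)]},
     1: {0: [(0, 1.0)],           1: [(1, 0.5), (0, 0.5)]}}
R = {0: {0: 1.0, 1: 0.0},
     1: {0: 2.0, 1: 0.5}}
states, actions = [0, 1], [0, 1]

def bellman(V):
    """One value-iteration step: V'(s) = max_a [R(s,a) + gamma * E V]."""
    return [max(R[s][a] + GAMMA * sum(p * V[t] for t, p in P[s][a])
                for a in actions) for s in states]

V = [0.0, 0.0]
for _ in range(1000):
    W = bellman(V)
    # MacQueen bounds: V* lies between
    #   W + gamma/(1-gamma) * min_s (W - V)  and
    #   W + gamma/(1-gamma) * max_s (W - V).
    d = [w - v for w, v in zip(W, V)]
    lo = [w + GAMMA / (1 - GAMMA) * min(d) for w in W]
    hi = [w + GAMMA / (1 - GAMMA) * max(d) for w in W]
    V = W
    if max(h - l for h, l in zip(hi, lo)) < 1e-6:  # stop once bounds close
        break

V_star = [(l + h) / 2 for l, h in zip(lo, hi)]  # certified estimate of V*
```

Because the stopping test uses the gap between the two bounds rather than the raw change in iterates, iteration can be terminated as soon as the optimal value is bracketed to the desired accuracy.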
Robust Markov Decision Processes
Markov decision processes (MDPs) are powerful tools for decision making in uncertain dynamic environments. However, the solutions of MDPs are of limited practical use due to their sensitivity to distributional model parameters, which are typically unknown and have to be estimated by the decision maker. To counter the detrimental effects of estimation errors, we consider robust MDPs that offer probabilistic guarantees in view of the unknown parameters. To this end, we assume that an observation history of the MDP is available. Based on this history, we derive a confidence region that contains the unknown parameters with a pre-specified probability 1 − β. Afterwards, we determine a policy that attains the highest worst-case performance over this confidence region. By construction, this policy achieves or exceeds its worst-case performance with a confidence of at least 1 − β. Our method involves the solution of tractable conic programs of moderate size.
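The worst-case optimization over an uncertainty set can be illustrated with a much cruder stand-in for the paper's data-driven confidence region: an interval of possible transition probabilities. In the hedged sketch below, a two-state toy MDP is solved by robust value iteration, where "nature" picks the worst probability in each interval; all states, rewards, and intervals are invented for this example.

```python
# Hedged sketch: robust value iteration on a toy 2-state MDP whose
# transition probabilities are only known up to an interval (a crude
# stand-in for a confidence region). All numbers are invented.

GAMMA = 0.9
# From state 0, action a jumps to absorbing state 1 (value 0) with some
# probability p known only to lie in the interval U[a].
U = {0: (0.6, 0.8),   # action 0: p in [0.6, 0.8]
     1: (0.2, 0.9)}   # action 1: p in [0.2, 0.9]
R = {0: 1.0, 1: 0.5}  # immediate reward of each action in state 0

def robust_value(iters=500):
    v0, v1 = 0.0, 0.0
    for _ in range(iters):
        best = -float("inf")
        for a, (p_lo, p_hi) in U.items():
            # Inner minimization: nature picks the worst p for this action.
            # The objective is linear in p, so an endpoint is worst.
            worst = min(R[a] + GAMMA * (p * v1 + (1 - p) * v0)
                        for p in (p_lo, p_hi))
            best = max(best, worst)
        v0, v1 = best, GAMMA * v1  # state 1 is absorbing with zero reward
    return v0, v1
```

The max-min update is the robust Bellman operator; the policy it induces is the one whose worst-case performance over the interval is best, mirroring (in miniature) the paper's worst-case optimization over a confidence region.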
Play selection in football : a case study in neuro-dynamic programming
Includes bibliographical references (p. 34-35).Supported by the US Army Research Office. AASERT-DAAH04-93-GD169Stephen D. Patek, Dimitri P. Bertsekas
Multigrid methods for two-player zero-sum stochastic games
We present a fast numerical algorithm for large-scale zero-sum stochastic games with perfect information, which combines policy iteration and algebraic multigrid methods. This algorithm can be applied either to a true finite-state-space zero-sum two-player game or to the discretization of an Isaacs equation. We present numerical tests on discretizations of Isaacs equations and variational inequalities. We also present a full multi-level policy iteration, similar to FMG, which substantially reduces the computation time for solving some variational inequalities.
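The policy-iteration half of such a scheme can be sketched in isolation. The toy example below runs Howard's policy iteration on the one-player (MDP) special case, with the linear system of each policy-evaluation step solved exactly for a two-state chain; in the paper that linear solve is where algebraic multigrid enters. The MDP data is invented for illustration.

```python
# Minimal policy-iteration sketch on a one-player (MDP) special case.
# Policy evaluation solves (I - gamma * P_pi) V = R_pi exactly; in the
# paper this linear solve is done by algebraic multigrid instead.
# All numbers are invented.

GAMMA = 0.95
P = {0: {0: [(0, 0.9), (1, 0.1)], 1: [(1, 1.0)]},
     1: {0: [(0, 1.0)],           1: [(1, 0.8), (0, 0.2)]}}
R = {0: {0: 0.0, 1: 1.0},
     1: {0: 2.0, 1: 0.1}}

def evaluate(policy):
    """Solve the 2x2 linear system (I - gamma * P_pi) V = R_pi."""
    rows = []
    for s in (0, 1):
        coeff = [1.0 if t == s else 0.0 for t in (0, 1)]
        for t, p in P[s][policy[s]]:
            coeff[t] -= GAMMA * p
        rows.append((coeff, R[s][policy[s]]))
    (a, b), r0 = rows[0]
    (c, d), r1 = rows[1]
    det = a * d - b * c                      # nonzero: diagonally dominant
    return [(r0 * d - b * r1) / det, (a * r1 - c * r0) / det]

def policy_iteration():
    policy = {0: 0, 1: 0}
    while True:
        V = evaluate(policy)                 # exact policy evaluation
        improved = {s: max((0, 1), key=lambda a: R[s][a] + GAMMA *
                           sum(p * V[t] for t, p in P[s][a]))
                    for s in (0, 1)}         # greedy policy improvement
        if improved == policy:
            return policy, V
        policy = improved
```

In the two-player game the improvement step alternates between the maximizing and minimizing player, and the evaluation systems grow large enough that a multigrid solver pays off; this sketch only shows the outer iteration structure.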
On Markov games
In this paper it is demonstrated how a dynamic programming approach can be useful for the analysis of Markov games. Markov games with finitely many stages are dealt with extensively. The existence of optimal Markov strategies is proven for finite-stage Markov games using a shortcut of a proof by Derman for the analogous result for Markov decision processes. For Markov games with a countably infinite number of stages, some results are summarized. Here again, the results and the methods of proof have much in common with those for Markov decision processes; indeed, the theory of Markov games is a generalisation. The paper also contains short introductions to the theories of matrix games and tree games.
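The dynamic programming approach for finite-stage Markov games amounts to backward induction in which each stage solves a matrix game. The sketch below illustrates this on an invented two-state, three-stage example; to avoid a linear-programming solver, the toy payoffs are chosen so that every stage game has a pure-strategy saddle point (in general, mixed strategies and an LP solve are needed at each stage).

```python
# Backward induction for a finite-stage zero-sum Markov game.
# Toy data, invented; chosen so that pure maximin == pure minimax at
# every stage, so no matrix-game (LP) solver is required.

N_STAGES = 3
STATES = (0, 1)
ACTS = (0, 1)  # both players choose from {0, 1}

def payoff(s, a1, a2):
    """Stage payoff to player 1 (player 2 pays it)."""
    M = {0: [[3, 1], [0, 1]],   # state 0: pure saddle, value 1
         1: [[2, 2], [4, 1]]}   # state 1: pure saddle, value 2
    return M[s][a1][a2]

def step(s, a1, a2):
    """Deterministic transition, for simplicity of the sketch."""
    return (s + a1 + a2) % 2

def solve():
    V = {s: 0.0 for s in STATES}            # value-to-go after last stage
    for _ in range(N_STAGES):               # backward over the stages
        Q = {s: [[payoff(s, a1, a2) + V[step(s, a1, a2)]
                  for a2 in ACTS] for a1 in ACTS] for s in STATES}
        lower = {s: max(min(row) for row in Q[s]) for s in STATES}
        upper = {s: min(max(Q[s][a1][a2] for a1 in ACTS) for a2 in ACTS)
                 for s in STATES}
        assert lower == upper, "stage game needs mixed strategies"
        V = lower                            # game value of this stage
    return V
```

The pure maximin (row player's guarantee) and minimax (column player's guarantee) coincide here by construction; their common value is the stage game's value, exactly the quantity the backward-induction argument propagates.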
Certified Reinforcement Learning with Logic Guidance
This paper proposes the first model-free Reinforcement Learning (RL) framework to synthesise policies for unknown, continuous-state Markov Decision Processes (MDPs), such that a given linear temporal property is satisfied. We convert the given property into a Limit Deterministic Büchi Automaton (LDBA), namely a finite-state machine expressing the property. Exploiting the structure of the LDBA, we shape a synchronous reward function on the fly, so that an RL algorithm can synthesise a policy resulting in traces that probabilistically satisfy the linear temporal property. This probability (certificate) is also calculated in parallel with policy learning when the state space of the MDP is finite: as such, the RL algorithm produces a policy that is certified with respect to the property. Under the assumption of a finite state space, theoretical guarantees are provided on the convergence of the RL algorithm to an optimal policy maximising the above probability. We also show that our method produces "best available" control policies when the logical property cannot be satisfied. In the general case of a continuous state space, we propose a neural network architecture for RL and empirically show that the algorithm finds satisfying policies, if such policies exist. The performance of the proposed framework is evaluated via a set of numerical examples and benchmarks, where we observe an improvement of one order of magnitude in the number of iterations required for policy synthesis, compared to existing approaches where available. (This article draws from arXiv:1801.08099, arXiv:1809.0782.)
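The core idea of running an automaton alongside the MDP and rewarding visits to accepting automaton states can be sketched in a few lines. Below, a 3-state line MDP, a 2-state automaton for "eventually reach the goal state", and all learning parameters are invented for illustration; the paper's LDBA construction and continuous-state architecture are far more general than this toy.

```python
import random

# Hedged sketch of logic-guided reward shaping: run an automaton in
# lockstep with the MDP and reward transitions into its accepting
# state, so that plain tabular Q-learning is steered toward traces
# satisfying the property. MDP, automaton, and parameters are invented.

random.seed(0)
GAMMA, ALPHA, EPS = 0.9, 0.5, 0.2
MDP_STATES, ACTS = (0, 1, 2), (-1, +1)          # walk left/right on a line

def mdp_step(s, a):
    return min(2, max(0, s + a))                # deterministic, clipped

def aut_step(q, s):
    """Automaton for 'eventually goal': accepts once state 2 is seen."""
    return 1 if (q == 1 or s == 2) else 0

# Q-table over the product state space (MDP state, automaton state).
Q = {(s, q): {a: 0.0 for a in ACTS} for s in MDP_STATES for q in (0, 1)}

for _ in range(500):                            # training episodes
    s, q = 0, 0
    for _ in range(10):                         # bounded episode length
        a = (random.choice(ACTS) if random.random() < EPS
             else max(ACTS, key=lambda a: Q[(s, q)][a]))
        s2 = mdp_step(s, a)
        q2 = aut_step(q, s2)
        r = 1.0 if (q2 == 1 and q == 0) else 0.0  # on-the-fly shaped reward
        target = r + GAMMA * max(Q[(s2, q2)].values())
        Q[(s, q)][a] += ALPHA * (target - Q[(s, q)][a])
        s, q = s2, q2
```

The learner never sees the property directly, only the shaped reward on the product states, which is what makes the approach model-free; a greedy rollout of the learned Q-table from the initial product state drives the automaton into its accepting state.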
Multiscale Markov Decision Problems: Compression, Solution, and Transfer Learning
Many problems in sequential decision making and stochastic control have natural multiscale structure: sub-tasks are assembled to accomplish complex goals. Systematically inferring and leveraging hierarchical structure, particularly beyond a single level of abstraction, has remained a longstanding challenge. We describe a fast multiscale procedure for repeatedly compressing, or homogenizing, Markov decision processes (MDPs), wherein a hierarchy of sub-problems at different scales is automatically determined. Coarsened MDPs are themselves independent, deterministic MDPs, and may be solved using existing algorithms. The multiscale representation delivered by this procedure decouples sub-tasks from each other and can lead to substantial improvements in convergence rates both locally within sub-problems and globally across sub-problems, yielding significant computational savings. A second fundamental aspect of this work is that these multiscale decompositions yield new transfer opportunities across different problems, where solutions of sub-tasks at different levels of the hierarchy may be amenable to transfer to new problems. Localized transfer of policies and potential operators at arbitrary scales is emphasized. Finally, we demonstrate compression and transfer in a collection of illustrative domains, including examples involving discrete and continuous state spaces.
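To give a flavour of what compressing an MDP can mean, the sketch below performs generic state aggregation on an invented 4-state chain: fine states are grouped into blocks and a coarse MDP is built by uniform averaging. This is only a crude stand-in, not the paper's homogenization procedure (which, among other differences, yields deterministic coarse MDPs).

```python
# Generic state-aggregation sketch (NOT the paper's homogenization):
# group fine states into blocks, then average transitions and rewards
# within each block. Toy 4-state chain with invented numbers.

BLOCKS = {0: 0, 1: 0, 2: 1, 3: 1}      # fine state -> coarse state
P = {s: {0: {min(3, s + 1): 1.0},      # action 0: step right
         1: {max(0, s - 1): 1.0}}      # action 1: step left
     for s in range(4)}
R = {s: {0: float(s), 1: 0.0} for s in range(4)}

def compress(P, R, blocks):
    """Build a coarse MDP by uniform averaging over each block's members."""
    coarse = sorted(set(blocks.values()))
    members = {c: [s for s in P if blocks[s] == c] for c in coarse}
    cP, cR = {}, {}
    for c in coarse:
        cP[c], cR[c] = {}, {}
        for a in (0, 1):
            dist = {d: 0.0 for d in coarse}
            for s in members[c]:
                for t, p in P[s][a].items():
                    dist[blocks[t]] += p / len(members[c])  # push mass up
            cP[c][a] = dist
            cR[c][a] = sum(R[s][a] for s in members[c]) / len(members[c])
    return cP, cR
```

Repeating such a construction on its own output yields a hierarchy of progressively smaller decision problems, which is the shape of the multiscale representation the paper describes, even though its actual coarsening operator is different.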