
460 research outputs found

Scalable First-Order Methods for Robust MDPs

Author: Grand-Clément Julien
Kroer Christian
Publication date: 14/01/2021

Robust Markov Decision Processes (MDPs) are a powerful framework for modeling sequential decision-making problems with model uncertainty. This paper proposes the first first-order framework for solving robust MDPs. Our algorithm interleaves primal-dual first-order updates with approximate Value Iteration updates. By carefully controlling the tradeoff between the accuracy and cost of Value Iteration updates, we achieve an ergodic convergence rate of $O\left(A^{2} S^{3} \log(S) \log(\epsilon^{-1}) \epsilon^{-1}\right)$ for the best choice of parameters on ellipsoidal and Kullback-Leibler $s$-rectangular uncertainty sets, where $S$ and $A$ are the numbers of states and actions, respectively. Our dependence on the number of states and actions is significantly better (by a factor of $O(A^{1.5} S^{1.5})$) than that of pure Value Iteration algorithms. In numerical experiments on ellipsoidal uncertainty sets we show that our algorithm is significantly more scalable than state-of-the-art approaches. Our framework is also the first one to solve robust MDPs with $s$-rectangular KL uncertainty sets.
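For orientation, the "pure Value Iteration" baseline that the abstract compares against can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the paper's first-order algorithm: the uncertainty set is assumed to be a finite list of candidate transition kernels that is rectangular per state-action pair (not the ellipsoidal or KL $s$-rectangular sets studied in the paper), and the names `robust_value_iteration`, `rewards`, `kernels`, and `gamma` are hypothetical.

```python
import numpy as np

def robust_value_iteration(rewards, kernels, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Robust value iteration over a finite set of candidate kernels.

    rewards: array of shape (S, A)
    kernels: array of shape (K, S, A, S); kernels[k, s, a] is a distribution
             over next states (assumption: the adversary may pick a different
             candidate k for every state-action pair).
    """
    S, A = rewards.shape
    v = np.zeros(S)
    for _ in range(max_iter):
        # Expected next-state value under each candidate kernel: shape (K, S, A).
        expected = kernels @ v
        # The adversary picks the worst candidate for each (s, a): shape (S, A).
        worst = expected.min(axis=0)
        q = rewards + gamma * worst
        v_new = q.max(axis=1)  # the decision maker picks the best action
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmax(axis=1)
        v = v_new
    return v, q.argmax(axis=1)
```

With a single candidate kernel (K = 1) the inner minimum is trivial and the routine reduces to ordinary value iteration, which gives a quick sanity check.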

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Robust Markov Decision Processes

Author: Berç Rustem
Daniel Kuhn
Wolfram Wiesemann

Markov decision processes (MDPs) are powerful tools for decision making in uncertain dynamic environments. However, the solutions of MDPs are of limited practical use due to their sensitivity to distributional model parameters, which are typically unknown and have to be estimated by the decision maker. To counter the detrimental effects of estimation errors, we consider robust MDPs that offer probabilistic guarantees in view of the unknown parameters. To this end, we assume that an observation history of the MDP is available. Based on this history, we derive a confidence region that contains the unknown parameters with a pre-specified probability 1 - β. Afterwards, we determine a policy that attains the highest worst-case performance over this confidence region. By construction, this policy achieves or exceeds its worst-case performance with a confidence of at least 1 - β. Our method involves the solution of tractable conic programs of moderate size.
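The max-min structure described in this abstract can be written out as below. This is a hedged sketch in generic notation, not the paper's own formulation: the symbols $\mathcal{P}$ (confidence region), $P^{0}$ (true but unknown transition kernel), $r$ (reward), and $\lambda$ (discount factor) are assumptions introduced here for illustration.

```latex
% Confidence region built from the observation history:
\mathbb{P}\left[ P^{0} \in \mathcal{P} \right] \ge 1 - \beta
% Robust policy: best worst-case expected discounted reward over that region:
\pi^{\star} \in \arg\max_{\pi} \; \min_{P \in \mathcal{P}} \;
  \mathbb{E}^{P,\pi}\left[ \sum_{t=0}^{\infty} \lambda^{t} \, r(s_t, a_t) \right]
```

Because the true kernel lies in $\mathcal{P}$ with probability at least $1 - \beta$, the inner minimum is a lower bound on the true performance of $\pi^{\star}$ with the same confidence, which is the guarantee the abstract states.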

Research Papers in Economics

Probabilistic Bisimulations for PCTL Model Checking of Interval MDPs

Author: Hashemi Vahid
Hatefi Hassan
Krčál Jan
Publication venue: 'Open Publishing Association'
Publication date: 10/04/2014

Verification of PCTL properties of MDPs with convex uncertainties has been investigated recently by Puggelli et al. However, model checking algorithms typically suffer from state space explosion. In this paper, we address probabilistic bisimulation to reduce the size of such MDPs while preserving the PCTL properties they satisfy. We discuss the different interpretations of uncertainty in the models that are studied in the literature and that result in two different definitions of bisimulation. We give algorithms to compute the quotients of these bisimulations in time polynomial in the size of the model and exponential in the uncertain branching. Finally, we show by a case study that large models in practice can have small branching and that a substantial state space reduction can be achieved by our approach. Comment: In Proceedings SynCoP 2014, arXiv:1403.784

arXiv.org e-Print Archive

Directory of Open Access Journals

Robust Control of Uncertain Markov Decision Processes with Temporal Logic Specifications

Author: Murray Richard M.
Topcu Ufuk
Wolff Eric M.
Publication venue: 'California Institute of Technology Library'
Publication date: 25/09/2011

We present a method for designing robust controllers for dynamical systems with linear temporal logic specifications. We abstract the original system by a finite Markov Decision Process (MDP) that has transition probabilities in a specified uncertainty set. A robust control policy for the MDP is generated that maximizes the worst-case probability of satisfying the specification over all transition probabilities in the uncertainty set. To do this, we use a procedure from probabilistic model checking to combine the system model with an automaton representing the specification. This new MDP is then transformed into an equivalent form that satisfies assumptions for stochastic shortest path dynamic programming. A robust version of dynamic programming allows us to solve for an $\epsilon$-suboptimal robust control policy with time complexity $O(\log 1/\epsilon)$ times that for the non-robust case. We then implement this control policy on the original dynamical system.
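The idea of maximizing a worst-case satisfaction probability by robust dynamic programming can be sketched as a reachability problem. This is an illustrative sketch only, under assumptions not taken from the paper: the uncertainty set is assumed to be a box of per-transition intervals `lo <= p <= hi` with `lo.sum() <= 1 <= hi.sum()`, the specification is assumed to be already reduced to reaching a set of `target` states, and the function names are hypothetical.

```python
import numpy as np

def worst_case_expectation(lo, hi, v):
    """Minimize p @ v subject to lo <= p <= hi and sum(p) == 1 (greedy)."""
    p = lo.copy()
    budget = 1.0 - lo.sum()          # mass left after meeting the lower bounds
    for j in np.argsort(v):          # give spare mass to low-value successors first
        add = min(hi[j] - lo[j], budget)
        p[j] += add
        budget -= add
    return float(p @ v)

def robust_reachability(lo, hi, target, iters=1000, tol=1e-10):
    """Worst-case probability of reaching `target`, maximized over actions.

    lo, hi: float arrays of shape (S, A, S) with interval bounds.
    target: boolean array of shape (S,) marking goal states.
    """
    S, A, _ = lo.shape
    v = target.astype(float)         # goal states have value 1, others start at 0
    for _ in range(iters):
        v_new = v.copy()
        for s in range(S):
            if target[s]:
                continue
            v_new[s] = max(
                worst_case_expectation(lo[s, a], hi[s, a], v) for a in range(A)
            )
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    return v_new
```

The inner minimization is exact for interval sets because the objective is linear and the feasible set is a box intersected with the simplex, so the adversary simply shifts its spare probability mass toward the successors with the lowest current value.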

CiteSeerX

Crossref

Caltech Authors