12 research outputs found
Stability estimating in optimal stopping problem
summary: We consider the optimal stopping problem for a discrete-time Markov process on a Borel state space. It is supposed that an unknown transition probability is approximated by a known transition probability, and the stopping rule optimal for the approximating model is applied to the process governed by the true one. We find an upper bound for the difference between the total expected cost resulting from applying this rule and the minimal total expected cost. The bound given is a constant times the distance between the two transition probabilities in total variation norm
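The stability phenomenon described above can be illustrated on a small finite chain, where both ingredients of the bound (the total variation distance between the models and the excess cost of the misspecified rule) are directly computable. The following Python sketch is illustrative only: the chain, costs and horizon are made up, and the finite setting is far simpler than the Borel-space framework of the paper.

```python
def stopping_values(P, g, c, T):
    """Backward induction for a finite-horizon optimal stopping problem.

    P[x][y] -- transition probabilities, g -- stopping cost,
    c -- running cost, T -- horizon.  Returns the value functions V
    and the stopping rule (stop[t][x] is True when stopping is optimal).
    """
    n = len(g)
    V = [list(g) for _ in range(T + 1)]        # V[T] = g: forced stop
    stop = [[True] * n for _ in range(T + 1)]
    for t in range(T - 1, -1, -1):
        for x in range(n):
            cont = c[x] + sum(P[x][y] * V[t + 1][y] for y in range(n))
            stop[t][x] = g[x] <= cont
            V[t][x] = min(g[x], cont)
    return V, stop

def rule_cost(P, g, c, stop, T):
    """Expected cost of a fixed stopping rule under transition law P."""
    W = list(g)
    for t in range(T - 1, -1, -1):
        W = [g[x] if stop[t][x]
             else c[x] + sum(P[x][y] * W[y] for y in range(len(g)))
             for x in range(len(g))]
    return W

# Toy data (made up): a 3-state chain and a perturbed model of it.
P_true = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]]
P_hat = [[0.5, 0.4, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]]
g, c, T = [1.0, 1.5, 2.0], [0.2, 0.1, 0.3], 8

V_true, _ = stopping_values(P_true, g, c, T)    # minimal expected cost
_, rule_hat = stopping_values(P_hat, g, c, T)   # rule from the wrong model
W = rule_cost(P_true, g, c, rule_hat, T)        # its true performance

tv = max(0.5 * sum(abs(p - q) for p, q in zip(rp, rq))
         for rp, rq in zip(P_true, P_hat))
gap = max(w - v for w, v in zip(W, V_true[0]))
print(f"TV distance {tv:.3f}, excess cost of approximate rule {gap:.4f}")
```

The excess cost `gap` is always nonnegative, and the abstract's result says it is dominated by a constant times `tv`.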
Distributionally Robust Markov Decision Processes and their Connection to Risk Measures
We consider robust Markov Decision Processes with Borel state and action
spaces, unbounded cost and finite time horizon. Our formulation leads to a
Stackelberg game against nature. Under integrability, continuity and
compactness assumptions we derive a robust cost iteration for a fixed policy of
the decision maker and a value iteration for the robust optimization problem.
Moreover, we show the existence of deterministic optimal policies for both
players. This is in contrast to classical zero-sum games. In case the state
space is the real line we show under some convexity assumptions that the
interchange of supremum and infimum is possible with the help of Sion's minimax
theorem. Further, we consider the problem with special ambiguity sets. In
particular we are able to derive some cases where the robust optimization
problem coincides with the minimization of a coherent risk measure. In the
final section we discuss two applications: a robust LQ problem and a robust
problem for managing regenerative energy.
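The robust value iteration described above can be sketched for the simplest case: a finite state and action space and a finite ambiguity set of transition kernels, with nature picking the worst kernel at every stage. This is only a toy instance of the Stackelberg structure; the paper works with Borel spaces, unbounded costs and far more general ambiguity sets, and all numbers below are made up.

```python
def robust_value_iteration(costs, kernels, T):
    """Finite-horizon robust value iteration against a finite ambiguity set.

    costs[a][x]         -- one-stage cost of action a in state x
    kernels[k][a][x][y] -- transition probability under candidate model k
    At each stage nature maximizes over models, then the controller
    minimizes over actions (a Stackelberg game against nature).
    """
    n = len(costs[0])
    V = [0.0] * n
    for _ in range(T):
        V = [min(costs[a][x]
                 + max(sum(Pk[a][x][y] * V[y] for y in range(n))
                       for Pk in kernels)
                 for a in range(len(costs)))
             for x in range(n)]
    return V

# Toy data (made up): 2 states, 2 actions, 2 candidate models.
costs = [[1.0, 2.0], [1.5, 0.5]]
P1 = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.6, 0.4]]]
P2 = [[[0.7, 0.3], [0.4, 0.6]], [[0.3, 0.7], [0.8, 0.2]]]

V_robust = robust_value_iteration(costs, [P1, P2], T=15)
V_nominal = robust_value_iteration(costs, [P1], T=15)
print("robust:", V_robust, " nominal (model 1 only):", V_nominal)
```

As expected, the robust value dominates the nominal value computed under any single model from the ambiguity set.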
First-order sensitivity of the optimal value in a Markov decision model with respect to deviations in the transition probability function
Markov decision models (MDMs) used in practical applications are most often less complex than the underlying ‘true’ MDM. The reduction of model complexity is performed for several reasons. However, it is obviously of interest to know what kind of model reduction is reasonable (in regard to the optimal value) and what kind is not. In this article we propose a way to address this question. We introduce a sort of derivative of the optimal value as a function of the transition probabilities, which can be used to measure the (first-order) sensitivity of the optimal value w.r.t. changes in the transition probabilities. ‘Differentiability’ is obtained for a fairly broad class of MDMs, and the ‘derivative’ is specified explicitly. Our theoretical findings are illustrated by means of optimization problems in inventory control and mathematical finance.
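The first-order sensitivity in question can be probed numerically: perturb the transition law in a fixed direction and watch the difference quotients of the optimal value stabilize as the perturbation shrinks. The sketch below is illustrative only (a finite MDM with made-up numbers, perturbed toward a uniform kernel); the article derives the derivative as an explicit object rather than a finite difference.

```python
def optimal_value(P, costs, T):
    """Finite-horizon optimal value by value iteration (cost minimization).

    P[a][x][y] -- transition probabilities, costs[a][x] -- stage costs.
    """
    n = len(costs[0])
    V = [0.0] * n
    for _ in range(T):
        V = [min(costs[a][x] + sum(P[a][x][y] * V[y] for y in range(n))
                 for a in range(len(costs)))
             for x in range(n)]
    return V

def mix(P, Q, h):
    """Transition law perturbed toward Q: (1 - h) P + h Q."""
    return [[[(1 - h) * p + h * q for p, q in zip(rp, rq)]
             for rp, rq in zip(Pa, Qa)]
            for Pa, Qa in zip(P, Q)]

# Made-up 2-state, 2-action model and a perturbation direction Q.
costs = [[1.0, 0.5], [0.8, 1.2]]
P = [[[0.7, 0.3], [0.4, 0.6]], [[0.2, 0.8], [0.9, 0.1]]]
Q = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]

T = 20
V0 = optimal_value(P, costs, T)[0]
quotients = []
for h in (1e-2, 1e-3, 1e-4):
    Vh = optimal_value(mix(P, Q, h), costs, T)[0]
    quotients.append((Vh - V0) / h)
    print(f"h = {h:g}: difference quotient {quotients[-1]:+.5f}")
```

The quotients for the two smallest step sizes should nearly agree, indicating a well-defined directional derivative.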
Robustness of Stochastic Optimal Control to Approximate Diffusion Models under Several Cost Evaluation Criteria
In control theory, typically a nominal model is assumed based on which an
optimal control is designed and then applied to an actual (true) system. This
gives rise to the problem of performance loss due to the mismatch between the
true model and the assumed model. A robustness problem in this context is to
show that the error due to the mismatch between a true model and an assumed
model decreases to zero as the assumed model approaches the true model. We
study this problem when the state dynamics of the system are governed by
controlled diffusion processes. In particular, we will discuss continuity and
robustness properties of finite-horizon and infinite-horizon
discounted/ergodic optimal control problems for a general class of
non-degenerate controlled diffusion processes, as well as for optimal control
up to an exit time. Under a general set of assumptions and a convergence
criterion on the models, we first establish that the optimal value of the
approximate model converges to the optimal value of the true model. We then
establish that the error due to mismatch that occurs by application of a
control policy, designed for an incorrectly estimated model, to a true model
decreases to zero as the incorrect model approaches the true model. We will see
that, compared to related results in the discrete-time setup, the
continuous-time theory will let us utilize the strong regularity properties of
solutions to optimality (HJB) equations, via the theory of uniformly elliptic
PDEs, to arrive at strong continuity and robustness properties.
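The continuity half of the result can be illustrated numerically for a one-dimensional controlled diffusion: hold a feedback law fixed, let an approximate drift approach the true one, and watch the Monte-Carlo cost gap shrink. Everything below (the drifts, the LQ-type cost, the step sizes) is made up for illustration; the paper's setting of non-degenerate diffusions with HJB regularity is far more general.

```python
import math
import random

def rollout_cost(drift, policy, x0, horizon, dt, n_paths, seed):
    """Monte-Carlo estimate of a finite-horizon cost for a controlled
    scalar diffusion dX = drift(X, u) dt + dW, discretized by
    Euler--Maruyama.  A common seed lets two models be compared on
    identical noise paths.
    """
    rng = random.Random(seed)
    steps = int(horizon / dt)
    total = 0.0
    for _ in range(n_paths):
        x, cost = x0, 0.0
        for _ in range(steps):
            u = policy(x)
            cost += (x * x + u * u) * dt       # LQ-type running cost
            x += drift(x, u) * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        total += cost
    return total / n_paths

policy = lambda x: -x                          # a fixed feedback law
true_drift = lambda x, u: -0.5 * x + u

J_true = rollout_cost(true_drift, policy, 1.0, 1.0, 0.01, 200, seed=7)
gaps = []
for eps in (0.4, 0.2, 0.1):                    # shrinking model error
    approx_drift = lambda x, u, e=eps: -(0.5 + e) * x + u
    J_apx = rollout_cost(approx_drift, policy, 1.0, 1.0, 0.01, 200, seed=7)
    gaps.append(abs(J_true - J_apx))
print("cost gaps as the model error shrinks:", gaps)
```

Because both models are evaluated on the same noise paths, the cost gap decreases as the approximate drift approaches the true one.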
Tractable POMDP-planning for robots with complex non-linear dynamics
Planning under partial observability is an essential capability of autonomous robots. While robots operate in the real world, they are inherently subject to various uncertainties such as control and sensing errors, and limited information regarding the operating environment. Conceptually, these types of planning problems can be solved in a principled manner when framed as a Partially Observable Markov Decision Process (POMDP). POMDPs model the aforementioned uncertainties as conditional probability functions and estimate the state of the system as probability functions over the state space, called beliefs. Instead of computing the best strategy with respect to single states, POMDP solvers compute the best strategy with respect to beliefs.

Solving a POMDP exactly is computationally intractable in general. However, in the past two decades we have seen tremendous progress in the development of approximately optimal solvers that trade optimality for computational tractability. Despite this progress, approximately solving POMDPs for systems with complex non-linear dynamics remains challenging. Most state-of-the-art solvers rely on a large number of expensive forward simulations of the system to find an approximately optimal strategy. For systems with complex non-linear dynamics that admit no closed-form solution, this strategy can become prohibitively expensive. Another difficulty in applying POMDPs to physical robots with complex transition dynamics is the fact that almost all implementations of state-of-the-art on-line POMDP solvers restrict the user to specific data structures for the POMDP model, and the model has to be hard-coded within the solver implementation. This, in turn, severely hinders the process of applying POMDPs to physical robots.

In this thesis we aim to make POMDPs more practical for realistic robotic motion planning tasks under partial observability. We show that systematic approximations of complex, non-linear transition dynamics can be used to design on-line POMDP solvers that are more efficient than current solvers. Furthermore, we propose a new software framework that supports the user in modeling complex planning problems under uncertainty with minimal implementation effort.
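The belief tracking that simulator-based POMDP solvers perform can be sketched with a simple particle filter: the model enters only through a black-box sampling function and an observation-likelihood function, mirroring the generic model interface argued for above. Everything below (the line-world robot, its noise model, the particle count) is a made-up toy, not the framework from the thesis.

```python
import random

def update_belief(particles, action, obs, simulate, obs_weight, k):
    """One particle-filter belief update for a generative POMDP model.

    simulate(s, a)       -- black-box sampler of the next state
    obs_weight(o, s2, a) -- likelihood of observing o in next state s2
    Returns k particles resampled in proportion to their weights.
    """
    candidates, weights = [], []
    for s in particles:
        s2 = simulate(s, action)
        w = obs_weight(obs, s2, action)
        if w > 0:
            candidates.append(s2)
            weights.append(w)
    if not candidates:
        raise RuntimeError("particle depletion: no particle explains obs")
    return random.choices(candidates, weights=weights, k=k)

# Toy example (made up): a robot on the integer line, with dynamics and
# sensing available only as samplers.
random.seed(1)

def simulate(s, a):                 # noisy move by the commanded step
    return s + a + random.choice([-1, 0, 0, 0, 1])

def obs_weight(o, s2, a):           # sensor reads the state +/- 1
    return {0: 0.6, 1: 0.2, -1: 0.2}.get(o - s2, 0.0)

true_state = 0
belief = [random.randint(-10, 10) for _ in range(300)]
for _ in range(6):                  # move right and observe, six times
    true_state = simulate(true_state, 1)
    o = true_state + random.choice([-1, 0, 0, 1])
    belief = update_belief(belief, 1, o, simulate, obs_weight, 300)

mean = sum(belief) / len(belief)
print(f"true state {true_state}, belief mean {mean:.2f}")
```

After a few updates the belief concentrates near the true state, even though the filter never inspects the dynamics analytically.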
State-similarity metrics for continuous Markov decision processes
In recent years, various metrics have been developed for measuring the similarity of states in probabilistic transition systems (Desharnais et al., 1999; van Breugel & Worrell, 2001a). In the context of Markov decision processes, we have devised metrics providing a robust quantitative analogue of bisimulation. Most importantly, the metric distances can be used to bound the differences in the optimal value function that is integral to reinforcement learning (Ferns et al., 2004; 2005). More recently, we have discovered an efficient algorithm to calculate distances in the case of finite systems (Ferns et al., 2006). In this thesis, we seek to properly extend state-similarity metrics to Markov decision processes with continuous state spaces, both in theory and in practice. In particular, we provide the first distance-estimation scheme for metrics based on bisimulation for continuous probabilistic transition systems. Our work, based on statistical sampling and infinite-dimensional linear programming, is a crucial first step in real-world planning; many practical problems are continuous in nature, e.g. robot navigation, and often a parametric model or crude finite approximation does not suffice. State-similarity metrics allow us to reason about the quality of replacing one model with another. In practice, they can be used directly to aggregate states.
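For a finite MDP with only two states, the Kantorovich term in a bisimulation metric of the Ferns et al. kind has a closed form (moving |p - q| mass a distance d(s0, s1)), so the metric's fixed-point iteration and its value-difference bound can be checked without any linear programming. The sketch below uses reward weight 1 and transport weight γ; the constants and the toy MDP are choices made for illustration, not those of the thesis.

```python
def bisim_metric_2state(r, P, gamma, iters=300):
    """Fixed-point iteration for a bisimulation-style metric on a
    2-state MDP.

    r[a][s] -- reward of action a in state s
    P[a][s] -- probability of moving to state 0 from state s under a
    On two points the Kantorovich distance between the next-state
    distributions is |P[a][0] - P[a][1]| * d(s0, s1).
    """
    d = 0.0
    for _ in range(iters):
        d = max(abs(r[a][0] - r[a][1])
                + gamma * abs(P[a][0] - P[a][1]) * d
                for a in range(len(r)))
    return d

def optimal_values(r, P, gamma, iters=500):
    """Value iteration for the same 2-state MDP (reward maximization)."""
    v = [0.0, 0.0]
    for _ in range(iters):
        v = [max(r[a][s] + gamma * (P[a][s] * v[0] + (1 - P[a][s]) * v[1])
                 for a in range(len(r)))
             for s in (0, 1)]
    return v

# Made-up 2-state, 2-action MDP.
r = [[1.0, 0.2], [0.5, 0.4]]
P = [[0.8, 0.3], [0.6, 0.9]]
gamma = 0.9

d = bisim_metric_2state(r, P, gamma)
v = optimal_values(r, P, gamma)
print(f"metric d(s0, s1) = {d:.4f}, "
      f"|V*(s0) - V*(s1)| = {abs(v[0] - v[1]):.4f}")
```

With these constants the metric dominates the optimal-value difference, |V*(s0) - V*(s1)| <= d(s0, s1), which is exactly the kind of bound the abstract refers to.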