12 research outputs found
Stability estimating in optimal stopping problem
summary: We consider the optimal stopping problem for a discrete-time Markov process on a Borel state space. It is supposed that an unknown transition probability is approximated by a known transition probability, and the stopping rule optimal for the approximating model is applied to the process governed by the true one. We find an upper bound for the difference between the total expected cost resulting from applying this rule and the minimal total expected cost. The bound given is a constant times the distance between the two transition probabilities in total variation norm
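The stability phenomenon described above can be illustrated on a small finite chain, where both ingredients of the bound (the total variation distance between the models and the excess cost of the misspecified rule) are directly computable. The following Python sketch is illustrative only: the chain, costs and horizon are made up, and the finite setting is far simpler than the Borel-space framework of the paper.

```python
def stopping_values(P, g, c, T):
    """Backward induction for a finite-horizon optimal stopping problem.

    P[x][y] -- transition probabilities, g -- stopping cost,
    c -- running cost, T -- horizon.  Returns the value functions V
    and the stopping rule (stop[t][x] is True when stopping is optimal).
    """
    n = len(g)
    V = [list(g) for _ in range(T + 1)]        # V[T] = g: forced stop
    stop = [[True] * n for _ in range(T + 1)]
    for t in range(T - 1, -1, -1):
        for x in range(n):
            cont = c[x] + sum(P[x][y] * V[t + 1][y] for y in range(n))
            stop[t][x] = g[x] <= cont
            V[t][x] = min(g[x], cont)
    return V, stop

def rule_cost(P, g, c, stop, T):
    """Expected cost of a fixed stopping rule under transition law P."""
    W = list(g)
    for t in range(T - 1, -1, -1):
        W = [g[x] if stop[t][x]
             else c[x] + sum(P[x][y] * W[y] for y in range(len(g)))
             for x in range(len(g))]
    return W

# Toy data (made up): a 3-state chain and a perturbed model of it.
P_true = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]]
P_hat = [[0.5, 0.4, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]]
g, c, T = [1.0, 1.5, 2.0], [0.2, 0.1, 0.3], 8

V_true, _ = stopping_values(P_true, g, c, T)    # minimal expected cost
_, rule_hat = stopping_values(P_hat, g, c, T)   # rule from the wrong model
W = rule_cost(P_true, g, c, rule_hat, T)        # its true performance

tv = max(0.5 * sum(abs(p - q) for p, q in zip(rp, rq))
         for rp, rq in zip(P_true, P_hat))
gap = max(w - v for w, v in zip(W, V_true[0]))
print(f"TV distance {tv:.3f}, excess cost of approximate rule {gap:.4f}")
```

The excess cost `gap` is always nonnegative, and the abstract's result says it is dominated by a constant times `tv`.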
Distributionally Robust Markov Decision Processes and their Connection to Risk Measures
We consider robust Markov Decision Processes with Borel state and action
spaces, unbounded cost and finite time horizon. Our formulation leads to a
Stackelberg game against nature. Under integrability, continuity and
compactness assumptions we derive a robust cost iteration for a fixed policy of
the decision maker and a value iteration for the robust optimization problem.
Moreover, we show the existence of deterministic optimal policies for both
players. This is in contrast to classical zero-sum games. In case the state
space is the real line we show under some convexity assumptions that the
interchange of supremum and infimum is possible with the help of Sion's minimax
theorem. Further, we consider the problem with special ambiguity sets. In
particular we are able to derive some cases where the robust optimization
problem coincides with the minimization of a coherent risk measure. In the
final section we discuss two applications: a robust LQ problem and a robust
problem for managing regenerative energy.
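The robust value iteration described above can be sketched for the simplest case: a finite state and action space and a finite ambiguity set of transition kernels, with nature picking the worst kernel at every stage. This is only a toy instance of the Stackelberg structure; the paper works with Borel spaces, unbounded costs and far more general ambiguity sets, and all numbers below are made up.

```python
def robust_value_iteration(costs, kernels, T):
    """Finite-horizon robust value iteration against a finite ambiguity set.

    costs[a][x]         -- one-stage cost of action a in state x
    kernels[k][a][x][y] -- transition probability under candidate model k
    At each stage nature maximizes over models, then the controller
    minimizes over actions (a Stackelberg game against nature).
    """
    n = len(costs[0])
    V = [0.0] * n
    for _ in range(T):
        V = [min(costs[a][x]
                 + max(sum(Pk[a][x][y] * V[y] for y in range(n))
                       for Pk in kernels)
                 for a in range(len(costs)))
             for x in range(n)]
    return V

# Toy data (made up): 2 states, 2 actions, 2 candidate models.
costs = [[1.0, 2.0], [1.5, 0.5]]
P1 = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.6, 0.4]]]
P2 = [[[0.7, 0.3], [0.4, 0.6]], [[0.3, 0.7], [0.8, 0.2]]]

V_robust = robust_value_iteration(costs, [P1, P2], T=15)
V_nominal = robust_value_iteration(costs, [P1], T=15)
print("robust:", V_robust, " nominal (model 1 only):", V_nominal)
```

As expected, the robust value dominates the nominal value computed under any single model from the ambiguity set.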
First-order sensitivity of the optimal value in a Markov decision model with respect to deviations in the transition probability function
Markov decision models (MDMs) used in practical applications are most often less complex than the underlying ‘true’ MDM. The reduction of model complexity is performed for several reasons. However, it is obviously of interest to know what kind of model reduction is reasonable (in regard to the optimal value) and what kind is not. In this article we propose a way to address this question. We introduce a sort of derivative of the optimal value as a function of the transition probabilities, which can be used to measure the (first-order) sensitivity of the optimal value w.r.t. changes in the transition probabilities. ‘Differentiability’ is obtained for a fairly broad class of MDMs, and the ‘derivative’ is specified explicitly. Our theoretical findings are illustrated by means of optimization problems in inventory control and mathematical finance.
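The first-order sensitivity in question can be probed numerically: perturb the transition law in a fixed direction and watch the difference quotients of the optimal value stabilize as the perturbation shrinks. The sketch below is illustrative only (a finite MDM with made-up numbers, perturbed toward a uniform kernel); the article derives the derivative as an explicit object rather than a finite difference.

```python
def optimal_value(P, costs, T):
    """Finite-horizon optimal value by value iteration (cost minimization).

    P[a][x][y] -- transition probabilities, costs[a][x] -- stage costs.
    """
    n = len(costs[0])
    V = [0.0] * n
    for _ in range(T):
        V = [min(costs[a][x] + sum(P[a][x][y] * V[y] for y in range(n))
                 for a in range(len(costs)))
             for x in range(n)]
    return V

def mix(P, Q, h):
    """Transition law perturbed toward Q: (1 - h) P + h Q."""
    return [[[(1 - h) * p + h * q for p, q in zip(rp, rq)]
             for rp, rq in zip(Pa, Qa)]
            for Pa, Qa in zip(P, Q)]

# Made-up 2-state, 2-action model and a perturbation direction Q.
costs = [[1.0, 0.5], [0.8, 1.2]]
P = [[[0.7, 0.3], [0.4, 0.6]], [[0.2, 0.8], [0.9, 0.1]]]
Q = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]

T = 20
V0 = optimal_value(P, costs, T)[0]
quotients = []
for h in (1e-2, 1e-3, 1e-4):
    Vh = optimal_value(mix(P, Q, h), costs, T)[0]
    quotients.append((Vh - V0) / h)
    print(f"h = {h:g}: difference quotient {quotients[-1]:+.5f}")
```

The quotients for the two smallest step sizes should nearly agree, indicating a well-defined directional derivative.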
Robustness of Stochastic Optimal Control to Approximate Diffusion Models under Several Cost Evaluation Criteria
In control theory, typically a nominal model is assumed based on which an
optimal control is designed and then applied to an actual (true) system. This
gives rise to the problem of performance loss due to the mismatch between the
true model and the assumed model. A robustness problem in this context is to
show that the error due to the mismatch between a true model and an assumed
model decreases to zero as the assumed model approaches the true model. We
study this problem when the state dynamics of the system are governed by
controlled diffusion processes. In particular, we will discuss continuity and
robustness properties of finite-horizon and infinite-horizon
discounted/ergodic optimal control problems for a general class of
non-degenerate controlled diffusion processes, as well as for optimal control
up to an exit time. Under a general set of assumptions and a convergence
criterion on the models, we first establish that the optimal value of the
approximate model converges to the optimal value of the true model. We then
establish that the error due to mismatch that occurs by application of a
control policy, designed for an incorrectly estimated model, to a true model
decreases to zero as the incorrect model approaches the true model. We will see
that, compared to related results in the discrete-time setup, the
continuous-time theory will let us utilize the strong regularity properties of
solutions to optimality (HJB) equations, via the theory of uniformly elliptic
PDEs, to arrive at strong continuity and robustness properties.
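The continuity half of the result can be illustrated numerically for a one-dimensional controlled diffusion: hold a feedback law fixed, let an approximate drift approach the true one, and watch the Monte-Carlo cost gap shrink. Everything below (the drifts, the LQ-type cost, the step sizes) is made up for illustration; the paper's setting of non-degenerate diffusions with HJB regularity is far more general.

```python
import math
import random

def rollout_cost(drift, policy, x0, horizon, dt, n_paths, seed):
    """Monte-Carlo estimate of a finite-horizon cost for a controlled
    scalar diffusion dX = drift(X, u) dt + dW, discretized by
    Euler--Maruyama.  A common seed lets two models be compared on
    identical noise paths.
    """
    rng = random.Random(seed)
    steps = int(horizon / dt)
    total = 0.0
    for _ in range(n_paths):
        x, cost = x0, 0.0
        for _ in range(steps):
            u = policy(x)
            cost += (x * x + u * u) * dt       # LQ-type running cost
            x += drift(x, u) * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        total += cost
    return total / n_paths

policy = lambda x: -x                          # a fixed feedback law
true_drift = lambda x, u: -0.5 * x + u

J_true = rollout_cost(true_drift, policy, 1.0, 1.0, 0.01, 200, seed=7)
gaps = []
for eps in (0.4, 0.2, 0.1):                    # shrinking model error
    approx_drift = lambda x, u, e=eps: -(0.5 + e) * x + u
    J_apx = rollout_cost(approx_drift, policy, 1.0, 1.0, 0.01, 200, seed=7)
    gaps.append(abs(J_true - J_apx))
print("cost gaps as the model error shrinks:", gaps)
```

Because both models are evaluated on the same noise paths, the cost gap decreases as the approximate drift approaches the true one.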
Tractable POMDP-planning for robots with complex non-linear dynamics
Planning under partial observability is an essential capability of autonomous robots. While robots operate in the real world, they are inherently subject to various uncertainties such as control and sensing errors, and limited information regarding the operating environment. Conceptually, these types of planning problems can be solved in a principled manner when framed as a Partially Observable Markov Decision Process (POMDP). POMDPs model the aforementioned uncertainties as conditional probability functions and estimate the state of the system as probability functions over the state space, called beliefs. Instead of computing the best strategy with respect to single states, POMDP solvers compute the best strategy with respect to beliefs.

Solving a POMDP exactly is computationally intractable in general. However, in the past two decades we have seen tremendous progress in the development of approximately optimal solvers that trade optimality for computational tractability. Despite this progress, approximately solving POMDPs for systems with complex non-linear dynamics remains challenging. Most state-of-the-art solvers rely on a large number of expensive forward simulations of the system to find an approximately optimal strategy. For systems with complex non-linear dynamics that admit no closed-form solution, this strategy can become prohibitively expensive. Another difficulty in applying POMDPs to physical robots with complex transition dynamics is the fact that almost all implementations of state-of-the-art on-line POMDP solvers restrict the user to specific data structures for the POMDP model, and the model has to be hard-coded within the solver implementation. This, in turn, severely hinders the process of applying POMDPs to physical robots.

In this thesis we aim to make POMDPs more practical for realistic robotic motion planning tasks under partial observability. We show that systematic approximations of complex, non-linear transition dynamics can be used to design on-line POMDP solvers that are more efficient than current solvers. Furthermore, we propose a new software framework that supports the user in modeling complex planning problems under uncertainty with minimal implementation effort.
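The belief tracking that simulator-based POMDP solvers perform can be sketched with a simple particle filter: the model enters only through a black-box sampling function and an observation-likelihood function, mirroring the generic model interface argued for above. Everything below (the line-world robot, its noise model, the particle count) is a made-up toy, not the framework from the thesis.

```python
import random

def update_belief(particles, action, obs, simulate, obs_weight, k):
    """One particle-filter belief update for a generative POMDP model.

    simulate(s, a)       -- black-box sampler of the next state
    obs_weight(o, s2, a) -- likelihood of observing o in next state s2
    Returns k particles resampled in proportion to their weights.
    """
    candidates, weights = [], []
    for s in particles:
        s2 = simulate(s, action)
        w = obs_weight(obs, s2, action)
        if w > 0:
            candidates.append(s2)
            weights.append(w)
    if not candidates:
        raise RuntimeError("particle depletion: no particle explains obs")
    return random.choices(candidates, weights=weights, k=k)

# Toy example (made up): a robot on the integer line, with dynamics and
# sensing available only as samplers.
random.seed(1)

def simulate(s, a):                 # noisy move by the commanded step
    return s + a + random.choice([-1, 0, 0, 0, 1])

def obs_weight(o, s2, a):           # sensor reads the state +/- 1
    return {0: 0.6, 1: 0.2, -1: 0.2}.get(o - s2, 0.0)

true_state = 0
belief = [random.randint(-10, 10) for _ in range(300)]
for _ in range(6):                  # move right and observe, six times
    true_state = simulate(true_state, 1)
    o = true_state + random.choice([-1, 0, 0, 1])
    belief = update_belief(belief, 1, o, simulate, obs_weight, 300)

mean = sum(belief) / len(belief)
print(f"true state {true_state}, belief mean {mean:.2f}")
```

After a few updates the belief concentrates near the true state, even though the filter never inspects the dynamics analytically.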
State-similarity metrics for continuous Markov decision processes
In recent years, various metrics have been developed for measuring the similarity of states in probabilistic transition systems (Desharnais et al., 1999; van Breugel & Worrell, 2001a). In the context of Markov decision processes, we have devised metrics providing a robust quantitative analogue of bisimulation. Most importantly, the metric distances can be used to bound the differences in the optimal value function that is integral to reinforcement learning (Ferns et al., 2004; 2005). More recently, we have discovered an efficient algorithm to calculate distances in the case of finite systems (Ferns et al., 2006). In this thesis, we seek to properly extend state-similarity metrics to Markov decision processes with continuous state spaces, both in theory and in practice. In particular, we provide the first distance-estimation scheme for metrics based on bisimulation for continuous probabilistic transition systems. Our work, based on statistical sampling and infinite-dimensional linear programming, is a crucial first step in real-world planning; many practical problems are continuous in nature, e.g. robot navigation, and often a parametric model or crude finite approximation does not suffice. State-similarity metrics allow us to reason about the quality of replacing one model with another. In practice, they can be used directly to aggregate states.
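For a finite MDP with only two states, the Kantorovich term in a bisimulation metric of the Ferns et al. kind has a closed form (moving |p - q| mass a distance d(s0, s1)), so the metric's fixed-point iteration and its value-difference bound can be checked without any linear programming. The sketch below uses reward weight 1 and transport weight γ; the constants and the toy MDP are choices made for illustration, not those of the thesis.

```python
def bisim_metric_2state(r, P, gamma, iters=300):
    """Fixed-point iteration for a bisimulation-style metric on a
    2-state MDP.

    r[a][s] -- reward of action a in state s
    P[a][s] -- probability of moving to state 0 from state s under a
    On two points the Kantorovich distance between the next-state
    distributions is |P[a][0] - P[a][1]| * d(s0, s1).
    """
    d = 0.0
    for _ in range(iters):
        d = max(abs(r[a][0] - r[a][1])
                + gamma * abs(P[a][0] - P[a][1]) * d
                for a in range(len(r)))
    return d

def optimal_values(r, P, gamma, iters=500):
    """Value iteration for the same 2-state MDP (reward maximization)."""
    v = [0.0, 0.0]
    for _ in range(iters):
        v = [max(r[a][s] + gamma * (P[a][s] * v[0] + (1 - P[a][s]) * v[1])
                 for a in range(len(r)))
             for s in (0, 1)]
    return v

# Made-up 2-state, 2-action MDP.
r = [[1.0, 0.2], [0.5, 0.4]]
P = [[0.8, 0.3], [0.6, 0.9]]
gamma = 0.9

d = bisim_metric_2state(r, P, gamma)
v = optimal_values(r, P, gamma)
print(f"metric d(s0, s1) = {d:.4f}, "
      f"|V*(s0) - V*(s1)| = {abs(v[0] - v[1]):.4f}")
```

With these constants the metric dominates the optimal-value difference, |V*(s0) - V*(s1)| <= d(s0, s1), which is exactly the kind of bound the abstract refers to.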