29 research outputs found
Discrete-time controlled Markov processes with average cost criterion: a survey
This work is a survey of the average cost control problem for discrete-time Markov processes. The authors have attempted to put together a comprehensive account of the considerable research on this problem over the past three decades. The exposition ranges from finite to Borel state and action spaces and includes a variety of methodologies to find and characterize optimal policies. The authors have included a brief historical perspective of the research efforts in this area and have compiled a substantial yet not exhaustive bibliography. The authors have also identified several important questions that are still open to investigation.
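For finite state and action spaces, one standard methodology covered by such surveys is relative value iteration. The sketch below runs it on a hypothetical two-state model; all costs and transition probabilities are illustrative assumptions, not data from the survey:

```python
# Relative value iteration for the average cost criterion on a finite MDP.
# Two-state, two-action example; the numbers below are illustrative.
c = [[1.0, 2.0], [3.0, 0.5]]          # c[x][a]: one-step cost
P = [
    [[0.9, 0.1], [0.2, 0.8]],         # P[0][a][y]: transitions from state 0
    [[0.5, 0.5], [0.1, 0.9]],         # P[1][a][y]: transitions from state 1
]

def relative_value_iteration(c, P, iters=500):
    n = len(c)
    h = [0.0] * n                     # relative value (bias) function
    g = 0.0                           # estimate of the optimal average cost
    for _ in range(iters):
        T = [min(c[x][a] + sum(P[x][a][y] * h[y] for y in range(n))
                 for a in range(len(c[x])))
             for x in range(n)]
        g = T[0]                      # pin the gain at reference state 0
        h = [T[x] - g for x in range(n)]
    return g, h

g, h = relative_value_iteration(c, P)
```

Under unichain and aperiodicity conditions the iterates converge; in this example the optimal policy steers the chain toward state 1 and the average cost g converges to 2/3.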
N-Person Stochastic Games: Extensions of the Finite State Space Case and Correlation
In this chapter, we present a framework for N-person stochastic games with an infinite state space. Our main purpose is to present a correlated equilibrium theorem proved by Nowak and Raghavan [42] for discounted stochastic games with a measurable state space, where the correlation o
Rolling Horizon Procedure on Controlled Semi-Markov Models. The Discounted Case
We study the behavior of the rolling horizon procedure for semi-Markov decision processes with infinite-horizon discounted reward, when the state space is a Borel set and the action spaces are compact. We prove the convergence of the rewards produced by the rolling horizon policies to the optimal reward function, as the horizon length tends to infinity, under different assumptions on the instantaneous reward function. The approach is based on extensions of the results obtained in [7] for the discrete-time Markov decision process case and in [3] for the case of discrete-time Markov games. Finally, we also analyse the performance of an approximate rolling horizon procedure. (Sociedad Argentina de Informática e Investigación Operativa)
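The rolling horizon idea can be sketched in discrete time: solve an N-horizon problem, keep only its first decision rule as a stationary policy, and compare that policy's infinite-horizon reward with the optimum as N grows. Everything below (a two-state discounted cost model, discount factor 0.9) is an illustrative assumption, not the semi-Markov model of the paper:

```python
beta = 0.9                                  # discount factor (assumed)
c = [[1.0, 2.0], [0.0, 5.0]]                # c[x][a]: one-step cost (illustrative)
P = [
    [[1.0, 0.0], [0.0, 1.0]],               # state 0: a0 stays, a1 moves to 1
    [[0.0, 1.0], [1.0, 0.0]],               # state 1: a0 stays, a1 moves to 0
]

def bellman(V):
    # one step of finite-horizon backward induction (cost minimization)
    Q = [[c[x][a] + beta * sum(P[x][a][y] * V[y] for y in range(len(V)))
          for a in range(len(c[x]))] for x in range(len(V))]
    return [min(q) for q in Q], [q.index(min(q)) for q in Q]

def rolling_horizon_policy(N):
    # first decision rule of the N-horizon problem, used as a stationary policy
    V = [0.0] * len(c)
    pi = None
    for _ in range(N):
        V, pi = bellman(V)
    return pi

def evaluate(pi, iters=2000):
    # infinite-horizon discounted cost of the stationary policy pi
    V = [0.0] * len(c)
    for _ in range(iters):
        V = [c[x][pi[x]] + beta * sum(P[x][pi[x]][y] * V[y] for y in range(len(c)))
             for x in range(len(c))]
    return V
```

In this example the myopic (N = 1) policy stays in state 0 forever at cost 10, while for N >= 3 the rolling horizon policy pays cost 2 once to reach the cost-free state and is optimal, illustrating the convergence as the horizon grows.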
Continuous-Time Markov Decision Processes with Exponential Utility
In this paper, we consider a continuous-time Markov decision process (CTMDP) in Borel spaces, where the certainty equivalent with respect to the exponential utility of the total undiscounted cost is to be minimized. The cost rate is nonnegative. We establish the optimality equation. Under the compactness-continuity condition, we show the existence of a deterministic stationary optimal policy. We reduce the risk-sensitive CTMDP problem to an equivalent risk-sensitive discrete-time Markov decision process, which has the same state and action spaces as the original CTMDP. In particular, the value iteration algorithm for the CTMDP problem follows from this reduction. We essentially do not need to impose a condition on the growth of the transition and cost rates in the state, and the controlled process could be explosive.
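In discrete time, the exponential-utility criterion leads to a multiplicative Bellman recursion on W(x) = E[exp(gamma * total cost)], minimized over actions; the certainty equivalent is then (1/gamma) log W(x). A toy sketch with one transient state and one absorbing cost-free state (gamma and all model data are assumed for illustration, not taken from the paper):

```python
import math

gamma = 0.5                      # risk-sensitivity parameter (assumed)
c0 = [1.0, 2.0]                  # costs of the two actions at the transient state 0
p_absorb = [0.5, 1.0]            # absorption probability under each action
# the absorbing state is cost-free, so it contributes a factor of 1

def risk_sensitive_vi(iters=100):
    W = 1.0                      # W(0): exponential-utility value at state 0
    for _ in range(iters):
        W = min(math.exp(gamma * c0[a]) *
                ((1.0 - p_absorb[a]) * W + p_absorb[a] * 1.0)
                for a in range(2))
    return W

W = risk_sensitive_vi()
certainty_equivalent = math.log(W) / gamma
```

Here action 1 (pay 2, absorb surely) is optimal: the risky action 0 has the same expected total cost (a geometric number of unit costs with mean 2) but a strictly larger certainty equivalent for gamma > 0, so the certainty equivalent converges to 2.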
Verification and Control of Partially Observable Probabilistic Real-Time Systems
We propose automated techniques for the verification and control of probabilistic real-time systems that are only partially observable. To formally model such systems, we define an extension of probabilistic timed automata in which local states are partially visible to an observer or controller. We give a probabilistic temporal logic that can express a range of quantitative properties of these models, relating to the probability of an event's occurrence or the expected value of a reward measure. We then propose techniques to either verify that such a property holds or to synthesise a controller for the model which makes it true. Our approach is based on an integer discretisation of the model's dense-time behaviour and a grid-based abstraction of the uncountable belief space induced by partial observability. The latter is necessarily approximate since the underlying problem is undecidable; however, we show how both lower and upper bounds on numerical results can be generated. We illustrate the effectiveness of the approach by implementing it in the PRISM model checker and applying it to several case studies from the domains of computer security and task scheduling.
Some contributions to Markov decision processes
In a nutshell, this thesis studies discrete-time Markov decision processes (MDPs) on Borel spaces, with possibly unbounded costs, under both the expected (discounted) total cost and the long-run expected average cost criteria. In Chapter 2, we systematically investigate a constrained absorbing MDP with the expected total cost criterion and possibly unbounded (from both above and below) cost functions. We apply the convex analytic approach to derive optimality and duality results, along with the existence of an optimal finite mixing policy. We also provide mild conditions under which a general constrained MDP model with state-action-dependent discount factors can be equivalently transformed into an absorbing MDP model. Chapter 3 treats a more constrained absorbing MDP than that of Chapter 2. The dynamic programming approach is applied to a reformulated unconstrained MDP model and optimality results are obtained. In addition, the correspondence between policies in the original model and the reformulated one is illustrated. In Chapter 4, we extend the dynamic programming approach for standard MDPs with the expected total cost criterion to the case where the (iterated) coherent risk measure of the cost is taken as the performance measure to be minimized. The cost function under consideration is allowed to be unbounded from below, and possibly arbitrarily unbounded from above. Under a fairly weak version of continuity-compactness conditions, we derive optimality results for both the finite and infinite horizon cases, and establish value iteration as well as policy iteration algorithms. The standard MDP and the iterated conditional value-at-risk of the cost function are illustrated as two examples. Chapters 5 and 6 tackle MDPs with the long-run expected average cost criterion. In Chapter 5, we consider a constrained MDP with possibly unbounded (from both above and below) cost functions.
Under Lyapunov-like conditions, we show the sufficiency of stable policies for the constrained problem under consideration. Furthermore, we introduce the corresponding space of performance vectors and characterize each of its extreme points with a deterministic stationary policy. Finally, the existence of an optimal finite mixing policy is justified. Chapter 6 concerns an unconstrained MDP with cost functions unbounded from below and possibly arbitrarily unbounded from above. We provide a detailed discussion of the issue of sufficient policies in the denumerable case, establish the average cost optimality inequality (ACOI) and show the existence of an optimal deterministic stationary policy. In Chapter 7, an inventory-production system is taken as an example of a real-world application to illustrate the main results of Chapters 2 and 5.
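The iterated conditional value-at-risk criterion mentioned for Chapter 4 replaces the expectation of the next-step value in the Bellman recursion with its CVaR. A finite-horizon sketch on a hypothetical model (all numbers and the risk level alpha are assumptions for illustration, not the thesis's Borel-space setting):

```python
def cvar(values, probs, alpha):
    # CVaR_alpha of a discrete cost distribution: mean of the worst alpha-tail
    order = sorted(range(len(values)), key=lambda i: -values[i])
    rem, acc = alpha, 0.0
    for i in order:
        w = min(probs[i], rem)
        acc += w * values[i]
        rem -= w
        if rem <= 1e-12:
            break
    return acc / alpha

def iterated_cvar_vi(c, P, alpha, horizon):
    # V(x) = min_a [ c(x,a) + CVaR_alpha of V over the next state ]
    n = len(c)
    V = [0.0] * n
    for _ in range(horizon):
        V = [min(c[x][a] + cvar(V, P[x][a], alpha) for a in range(len(c[x])))
             for x in range(n)]
    return V

c = [[1.0], [0.0]]                    # single action per state (illustrative)
P = [[[0.5, 0.5]], [[0.0, 1.0]]]

V_neutral = iterated_cvar_vi(c, P, alpha=1.0, horizon=2)  # alpha = 1 recovers the expectation
V_averse = iterated_cvar_vi(c, P, alpha=0.5, horizon=2)
```

With alpha = 1 the recursion coincides with standard finite-horizon expected cost; smaller alpha weights the bad next states more heavily, so V_averse dominates V_neutral.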
Verification and control of partially observable probabilistic systems
We present automated techniques for the verification and control of partially observable, probabilistic systems for both discrete and dense models of time. For the discrete-time case, we formally model these systems using partially observable Markov decision processes; for dense time, we propose an extension of probabilistic timed automata in which local states are partially visible to an observer or controller. We give probabilistic temporal logics that can express a range of quantitative properties of these models, relating to the probability of an event's occurrence or the expected value of a reward measure. We then propose techniques to either verify that such a property holds or synthesise a controller for the model which makes it true. Our approach is based on a grid-based abstraction of the uncountable belief space induced by partial observability and, for dense-time models, an integer discretisation of real-time behaviour. The former is necessarily approximate since the underlying problem is undecidable; however, we show how both lower and upper bounds on numerical results can be generated. We illustrate the effectiveness of the approach by implementing it in the PRISM model checker and applying it to several case studies from the domains of task and network scheduling, computer security and planning.