Search CORE

35 research outputs found

Continuous-time Markov decision processes under the risk-sensitive average cost criterion

Author: Chen Xian
Wei Qingda
Publication date: 21/12/2015

This paper studies continuous-time Markov decision processes under the risk-sensitive average cost criterion. The state space is a finite set, the action space is a Borel space, the cost and transition rates are bounded, and the risk-sensitivity coefficient can be any positive real number. Under mild conditions, we develop a new approach to establish the existence of a solution to the risk-sensitive average cost optimality equation and obtain the existence of an optimal deterministic stationary policy. Comment: 14 pages
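The criterion can be checked numerically for one fixed stationary policy. The sketch below makes several assumptions. The criterion is taken to be limsup (1/(θT)) log E[exp(θ ∫_0^T c(X_t) dt)]. The action set is finite, whereas the paper allows a Borel action space. Each induced chain is irreducible. Under these conditions, the Feynman-Kac formula gives growth like exp(λT), where λ is the largest real eigenvalue of Q + θ·diag(c), so the policy's value is λ/θ. All function names are hypothetical. The brute-force search over deterministic stationary policies is only a small-scale check and is not the paper's optimality-equation approach.

```python
from itertools import product


def spectral_abscissa(m, iters=10_000, tol=1e-12):
    """Largest real eigenvalue of a Metzler matrix (nonnegative off-diagonal entries).

    Shifting the diagonal makes the matrix nonnegative with a positive diagonal, so for
    an irreducible matrix plain power iteration converges to its Perron root.
    """
    n = len(m)
    shift = max(-m[i][i] for i in range(n)) + 1.0  # every shifted diagonal entry is >= 1
    a = [[m[i][j] + (shift if i == j else 0.0) for j in range(n)] for i in range(n)]
    v, r = [1.0] * n, 0.0
    for _ in range(iters):
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        r_new = max(w)
        v = [x / r_new for x in w]  # normalize so the largest component is 1
        if abs(r_new - r) < tol:
            return r_new - shift
        r = r_new
    return r - shift


def risk_sensitive_average_cost(q, c, theta):
    """(1/theta) * exponential growth rate of E[exp(theta * integral of c)] under generator q."""
    n = len(q)
    m = [[q[i][j] + (theta * c[i] if i == j else 0.0) for j in range(n)] for i in range(n)]
    return spectral_abscissa(m) / theta


def best_deterministic_stationary(rates, costs, theta):
    """Enumerate deterministic stationary policies of a small finite CTMDP.

    rates[i][a] is row i of the generator under action a; costs[i][a] is the cost rate.
    Returns (value, policy) with policy[i] the action chosen in state i.
    """
    best = None
    for policy in product(*(range(len(rates[i])) for i in range(len(rates)))):
        q = [rates[i][a] for i, a in enumerate(policy)]
        c = [costs[i][a] for i, a in enumerate(policy)]
        value = risk_sensitive_average_cost(q, c, theta)
        if best is None or value < best[0]:
            best = (value, policy)
    return best
```

For a two-state chain that switches at rate 1 with costs (0, 2), the risk-neutral average cost is 1. With θ = 1 the risk-sensitive value is √2, which shows the extra penalty on cost variability.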

arXiv.org e-Print Archive

Maximal Cost-Bounded Reachability Probability on Continuous-Time Markov Decision Processes

Author: Fu Hongfei
Publication date: 01/01/2014

In this paper, we consider multi-dimensional maximal cost-bounded reachability probability over continuous-time Markov decision processes (CTMDPs). Our major contributions are as follows. Firstly, we derive an integral characterization which states that the maximal cost-bounded reachability probability function is the least fixed point of a system of integral equations. Secondly, we prove that the maximal cost-bounded reachability probability can be attained by a measurable deterministic cost-positional scheduler. Thirdly, we provide a numerical approximation algorithm for maximal cost-bounded reachability probability. We present these results for both early and late schedulers.
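The paper's characterization is analytic: a least fixed point of integral equations. A simple sanity check for any numerical result is to estimate the probability by simulation under one fixed scheduler. The sketch below assumes a deterministic stationary scheduler and one nonnegative cost rate per cost dimension in each state. The goal counts as reached only if every accumulated cost stays within its bound. The paper shows that optimal schedulers are cost-positional, meaning their choice may depend on the cost accumulated so far. A stationary scheduler therefore yields only a lower bound on the maximum. The dictionary representation and the function name are hypothetical.

```python
import random


def estimate_cost_bounded_reach(rates, cost, policy, start, goal, budget,
                                samples=20_000, seed=0):
    """Monte Carlo estimate of P(reach goal while every accumulated cost <= its bound).

    rates[s][a]: dict successor -> rate; cost[s]: tuple of cost rates (one per dimension);
    policy[s]: index of the action a fixed stationary scheduler picks in state s.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        s, spent = start, [0.0] * len(budget)
        while s not in goal:
            out = rates[s][policy[s]]
            total = sum(out.values())
            if total == 0.0:
                break  # absorbing non-goal state: the goal is never reached
            sojourn = rng.expovariate(total)  # exponential residence time
            spent = [x + r * sojourn for x, r in zip(spent, cost[s])]
            if any(x > b for x, b in zip(spent, budget)):
                break  # some cost dimension exceeded its bound before the jump
            s = rng.choices(list(out), weights=list(out.values()))[0]
        else:
            hits += 1  # loop ended because s entered the goal set
    return hits / samples
```

Consider one state that moves to the goal at rate 1 and incurs cost at rate 2. With a cost bound of 1, the goal is reached within budget exactly when the sojourn is at most 0.5, which has probability 1 - e^(-0.5) ≈ 0.39.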

arXiv.org e-Print Archive

Publikationsserver der RWTH Aachen University

Transient Reward Approximation for Continuous-Time Markov Chains

Author: Becker Bernd
Hahn Ernst Moritz
Hermanns Holger
Wimmer Ralf
Publication date: 01/01/2015

We are interested in the analysis of very large continuous-time Markov chains (CTMCs) with many distinct rates. Such models arise naturally in reliability analysis, e.g., in computer network performability analysis, power grids, and computer virus vulnerability, as well as in the study of crowd dynamics. We use abstraction techniques together with novel algorithms for the computation of bounds on the expected final and accumulated rewards in continuous-time Markov decision processes (CTMDPs). These ingredients are combined in a partly symbolic and partly explicit (symblicit) analysis approach. In particular, we circumvent the use of multi-terminal decision diagrams, because the latter do not work well when facing a large number of different rates. We demonstrate the practical applicability and efficiency of the approach on two case studies. Comment: Accepted for publication in IEEE Transactions on Reliability
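The approach bounds two quantities: the expected final reward and the expected accumulated reward. The sketch below computes both exactly for a small explicit CTMC using standard uniformization. It assumes a dense generator matrix and a reward vector over states, and it is meant only for tiny models. It is not the paper's symblicit abstraction algorithm. The uniformized matrix is P = I + Q/Λ, where Λ is at least every exit rate. Step k is weighted by the Poisson probability Pois(k; Λt) for the final reward and by P(N > k)/Λ for the accumulated reward. The function name is hypothetical.

```python
import math


def transient_rewards(q, r, init, t):
    """Return (E[r(X_t)], E[integral_0^t r(X_s) ds]) for a CTMC via uniformization."""
    if t <= 0:
        raise ValueError("t must be positive")
    n = len(q)
    lam = max(-q[i][i] for i in range(n)) or 1.0  # uniformization rate; 1.0 if all absorbing
    p = [[(1.0 if i == j else 0.0) + q[i][j] / lam for j in range(n)] for i in range(n)]
    mu = lam * t
    last = int(mu + 10.0 * math.sqrt(mu) + 20)  # the Poisson tail beyond this is negligible
    dist, cdf = list(init), 0.0
    final = accumulated = 0.0
    for k in range(last + 1):
        pmf = math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))  # computed in log space
        cdf += pmf
        expected_r = sum(d * x for d, x in zip(dist, r))  # E[r] after k uniformized steps
        final += pmf * expected_r
        accumulated += max(0.0, 1.0 - cdf) / lam * expected_r  # P(N > k) / lam
        dist = [sum(dist[i] * p[i][j] for i in range(n)) for j in range(n)]
    return final, accumulated
```

For a state that leaves at rate 1 into an absorbing state, with reward 1 only in the first state, the results at t = 1 are e^(-1) ≈ 0.368 (final) and 1 - e^(-1) ≈ 0.632 (accumulated).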

arXiv.org e-Print Archive

Queen's University Belfast Research Portal

Institute Of Software, Chinese Academy Of Sciences

Policy learning in Continuous-Time Markov Decision Processes using Gaussian Processes

Author: Bartocci Ezio
Bortolussi Luca
Brázdil Tomás
Milios Dimitrios
Sanguinetti Guido
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017

Continuous-time Markov decision processes provide a very powerful mathematical framework to solve policy-making problems in a wide range of applications, ranging from the control of populations to cyber-physical systems. The key problem to solve for these models is to efficiently compute an optimal policy to control the system in order to maximise the probability of satisfying a set of temporal logic specifications. Here we introduce a novel method based on statistical model checking and an unbiased estimation of a functional gradient in the space of possible policies. Our approach presents several advantages over the classical methods based on discretisation techniques: it does not assume a priori knowledge of a model, which can be replaced by a black box, and it does not suffer from state-space explosion. The use of a stochastic moment-based gradient ascent algorithm to guide our search considerably improves the efficiency of learning policies and accelerates the convergence using the momentum term. We demonstrate the strong performance of our approach on two examples of non-linear population models: an epidemiology model with no permanent recovery and a queuing system with non-deterministic choice.
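The sketch below illustrates how a momentum term enters gradient ascent on a black-box objective. It is a simplification of the paper's method. The paper estimates a functional gradient with Gaussian processes from statistical model checking runs. Here, central finite differences over a finite policy-parameter vector replace that estimator. The callable `estimate` is a hypothetical black box, for example a simulation-based estimate of the satisfaction probability. With a noisy estimator, the finite-difference step and learning rate would need tuning.

```python
def momentum_ascent(estimate, theta0, lr=0.1, beta=0.9, h=1e-3, steps=300):
    """Maximize a black-box objective with finite-difference gradients plus momentum.

    estimate: hypothetical callable mapping a parameter list to a (possibly noisy) score.
    """
    theta = list(theta0)
    velocity = [0.0] * len(theta)
    for _ in range(steps):
        grad = []
        for i in range(len(theta)):
            up, down = theta[:], theta[:]
            up[i] += h
            down[i] -= h
            grad.append((estimate(up) - estimate(down)) / (2.0 * h))  # central difference
        velocity = [beta * v + g for v, g in zip(velocity, grad)]  # momentum accumulates past gradients
        theta = [t + lr * v for t, v in zip(theta, velocity)]
    return theta
```

On the deterministic objective -(x0 - 1)^2 - (x1 + 2)^2, the iterates spiral into the maximiser (1, -2). The momentum term carries the search through flat or oscillating regions that plain gradient steps cross slowly.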

Archivio istituzionale della ricerca - Università di Trieste

Edinburgh Research Explorer

Sissa Digital Library

A tutorial on interactive Markov chains

Author: A. Bianco
C. Baier
C. Eisentraut
D. Guck
D. Harel
D.P. Bertsekas
E. Böde
G.G.I. López
H. Boudali
H. Hansson
H. Hermanns
H.L.S. Younes
L. Alfaro de
L. Zhang
M. Bozzano
M. Bravetti
M. Kwiatkowska
M. Timmer
M.R. Neuhäußer
N. Coste
N. López
P. Buchholz
P. Crouzen
P.C. Kanellakis
R. Segala
T. Han
Publication venue: Springer
Publication date: 01/01/2014

Interactive Markov chains (IMCs) constitute a powerful stochastic model that extends both continuous-time Markov chains and labelled transition systems. IMCs enable a wide range of modelling and analysis techniques and serve as a semantic model for many industrial and scientific formalisms, such as AADL, GSPNs and many more. Applications cover various engineering contexts ranging from industrial system-on-chip manufacturing to satellite designs. We present a survey of the state-of-the-art in modelling and analysis of IMCs. We cover a set of techniques that can be utilised for compositional modelling, state space generation and reduction, and model checking. The significance of the presented material and corresponding tools is highlighted through multiple case studies.
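Compositional modelling with IMCs relies on parallel composition over a synchronisation set. The sketch below applies the usual rules. Interactive actions in the set synchronise. All other interactive actions interleave. Markovian transitions always interleave, which is sound because exponential delays are memoryless. Only the reachable product is built. The dictionary representation and the function name are hypothetical. The sketch covers composition only: hiding, the maximal progress assumption, and state space reduction are left out.

```python
from collections import deque


def compose(imc1, imc2, sync):
    """Reachable parallel composition imc1 ||_sync imc2.

    Each IMC is a dict with 'initial', 'interactive' [(s, action, t)] and
    'markovian' [(s, rate, t)] entries.
    """
    def outgoing(imc, s):
        inter = [(a, t) for (u, a, t) in imc["interactive"] if u == s]
        markov = [(r, t) for (u, r, t) in imc["markovian"] if u == s]
        return inter, markov

    start = (imc1["initial"], imc2["initial"])
    seen, queue = {start}, deque([start])
    interactive, markovian = [], []
    while queue:
        s1, s2 = queue.popleft()
        i1, m1 = outgoing(imc1, s1)
        i2, m2 = outgoing(imc2, s2)
        succ = []
        for a, t1 in i1:
            if a in sync:  # both components must take a together
                succ += [(a, (t1, t2)) for b, t2 in i2 if b == a]
            else:  # imc1 moves alone
                succ.append((a, (t1, s2)))
        succ += [(b, (s1, t2)) for b, t2 in i2 if b not in sync]  # imc2 moves alone
        moves = [(r, (t1, s2)) for r, t1 in m1] + [(r, (s1, t2)) for r, t2 in m2]
        interactive += [((s1, s2), a, target) for a, target in succ]
        markovian += [((s1, s2), r, target) for r, target in moves]
        for _, target in succ + moves:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return {"initial": start, "states": seen,
            "interactive": interactive, "markovian": markovian}
```

Take two components that each delay exponentially and then synchronise on a shared action `go`. Their composition has four reachable states and a single synchronised `go` transition back to the initial pair.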

Crossref

VU Research Portal

University of Twente Research Information

The exponential cost optimality for finite horizon semi-Markov decision processes

Author: Huo Haifeng
Wen Xian
Publication venue: 'Institute of Information Theory and Automation'
Publication date: 01/01/2022

This paper considers an exponential cost optimality problem for finite horizon semi-Markov decision processes (SMDPs). The objective is to calculate an optimal policy with minimal exponential costs over the full set of policies in a finite horizon. First, under the standard regular and compact-continuity conditions, we establish the optimality equation and, using the minimum nonnegative solution approach, prove that the value function is the unique solution of the optimality equation and that an optimal policy exists. Second, we establish a new value iteration algorithm to calculate both the value function and an ε-optimal policy. Finally, we give a computable machine maintenance example to illustrate the convergence of the algorithm.
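Finite-horizon exponential-cost recursions can be illustrated with a discrete-time analogue. The sketch below assumes finite states and actions, integer sojourn times of at least 1, and a cost rate c(i, a) per time unit. The criterion is E[exp(total cost accumulated up to the horizon T)], with no separate risk parameter. Dynamic programming runs backwards over the remaining time t. If the current sojourn outlasts t, cost accrues for exactly t units and the process stops. Otherwise the recursion continues from the next state with t - k time left. This is not the paper's value iteration for general SMDPs, and the names are hypothetical.

```python
import math


def exp_cost_value_iteration(kernel, cost, horizon):
    """Backward recursion for min E[exp(accumulated cost over the remaining horizon)].

    kernel[i][a]: list of (next_state, sojourn_k, prob) with integer sojourn_k >= 1;
    cost[i][a]: cost rate per time unit. Returns v[i][t] and the minimizing action.
    """
    n = len(kernel)
    v = [[1.0] * (horizon + 1) for _ in range(n)]  # t = 0: nothing left to pay, exp(0) = 1
    policy = [[None] * (horizon + 1) for _ in range(n)]
    for t in range(1, horizon + 1):
        for i in range(n):
            best, best_a = math.inf, None
            for a, outcomes in enumerate(kernel[i]):
                total = 0.0
                for j, k, p in outcomes:
                    if k < 1:
                        raise ValueError("sojourn times must be positive integers")
                    if k >= t:  # horizon ends during this sojourn
                        total += p * math.exp(cost[i][a] * t)
                    else:  # pay for k time units, then continue from j
                        total += p * math.exp(cost[i][a] * k) * v[j][t - k]
                if total < best:
                    best, best_a = total, a
            v[i][t], policy[i][t] = best, best_a
    return v, policy
```

With one state that loops back after one time unit at cost rate 1, the value for horizon 3 is e^3. Given a second action with cost rate 2, the recursion picks the cheaper action.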

Institute of Mathematics AS CR, v. v. i.

Efficient approximation of optimal control for continuous-time Markov games

Author: Fearnley J
Rabe MN
Schewe S
Zhang L
Publication venue: 'Elsevier BV'
Publication date: 30/12/2015

We study the time-bounded reachability problem for continuous-time Markov decision processes (CTMDPs) and games (CTMGs). Existing techniques for this problem use discretisation techniques to partition time into discrete intervals of size ε, and optimal control is approximated for each interval separately. Current techniques provide an accuracy of on each interval, which leads to an infeasibly large number of intervals. We propose a sequence of approximations that achieve accuracies of , , and , that allow us to drastically reduce the number of intervals that are considered. For CTMDPs, the performance of the resulting algorithms is comparable to the heuristic approach given by Buchholz and Schulz, while also being theoretically justified. All of our results generalise to CTMGs, where our results yield the first practically implementable algorithms for this problem. We also provide memoryless strategies for both players that achieve similar error bounds
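The sketch below shows a simple first-order discretisation of the kind the abstract attributes to existing techniques. Time is split into steps of length δ, and each step allows at most one jump. The maximising player picks the action with the highest value and the minimising player the lowest. It is not one of the paper's higher-order approximations, and it makes no claim about their accuracy. The game representation and the function name are hypothetical. δ times every exit rate must not exceed 1, so that the stay probability stays nonnegative.

```python
def time_bounded_reachability(game, goal, t, steps):
    """Approximate optimal P(reach goal within time t) in a continuous-time Markov game.

    game[s] = (owner, actions): owner is 'max' or 'min'; each action is a dict
    successor -> rate. States without actions are absorbing.
    """
    delta = t / steps
    v = {s: 1.0 if s in goal else 0.0 for s in game}
    for _ in range(steps):
        new_v = {}
        for s, (owner, actions) in game.items():
            if s in goal or not actions:
                new_v[s] = v[s]  # goal stays 1, dead ends stay 0
                continue
            values = []
            for rates in actions:
                exit_rate = sum(rates.values())
                if exit_rate * delta > 1.0:
                    raise ValueError("step too large for these rates")
                stay = (1.0 - exit_rate * delta) * v[s]  # no jump in this step
                move = delta * sum(r * v[u] for u, r in rates.items())  # exactly one jump
                values.append(stay + move)
            new_v[s] = max(values) if owner == "max" else min(values)
        v = new_v
    return v
```

Suppose a state can reach the goal at rate 1 or at rate 2. A minimiser picks rate 1, which gives about 1 - e^(-1) ≈ 0.632 for t = 1. A maximiser picks rate 2, which gives about 1 - e^(-2) ≈ 0.865. The gap to these exact values shrinks as the number of steps grows.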

University of Liverpool Repository

Institute Of Software, Chinese Academy Of Sciences

Formal methods for motion planning and control in dynamic and partially known environments

Author: Medina Ayala Ana Ivonne
Publication date: 12/03/2016

This thesis is motivated by time and safety critical applications involving the use of autonomous vehicles to accomplish complex tasks in dynamic and partially known environments. We use temporal logic to formally express such complex tasks. Temporal logic specifications generalize the classical notions of stability and reachability widely studied within the control and hybrid systems communities. Given a model describing the motion of a robotic system in an environment and a formal task specification, the aim is to automatically synthesize a control policy that guarantees the satisfaction of the specification. This thesis presents novel control synthesis algorithms to tackle the problem of motion planning from temporal logic specifications in uncertain environments. For each one of the planning and control synthesis problems addressed in this dissertation, the proposed algorithms are implemented, evaluated, and validated through experiments and/or simulations. The first part of this thesis focuses on a mobile robot whose success is measured by the completion of temporal logic tasks within a given period of time. In addition to such time constraints, the planning algorithm must also deal with the uncertainty that arises from the changes in the robot's workspace during task execution. In particular, we consider a robot deployed in a partitioned environment subjected to structural changes such as doors that can open and close. The motion of the robot is modeled as a continuous time Markov decision process and the robot's mission is expressed as a Continuous Stochastic Logic (CSL) formula. A complete framework to find a control strategy that satisfies a specification given as a CSL formula is introduced. The second part of this thesis addresses the synthesis of controllers that guarantee the satisfaction of a task specification expressed as a syntactically co-safe Linear Temporal Logic (scLTL) formula.
In this case, uncertainty is characterized by the partial knowledge of the robot's environment. Two scenarios are considered. First, a distributed team of robots required to satisfy the specification over a set of service requests occurring at the vertices of a known graph representing the environment is examined. Second, a single-agent motion planning problem from the specification over a set of properties known to be satisfied at the vertices of the known graph environment is studied. In both cases, we exploit the existence of off-the-shelf model checking and runtime verification tools, the efficiency of graph search algorithms, and the efficacy of exploration techniques to solve the motion planning problem constrained by the absence of complete information about the environment. The final part of this thesis extends uncertainty beyond incomplete knowledge of the environment by considering a robot equipped with a noisy sensing system. In particular, the robot is tasked with satisfying a scLTL specification over a set of regions of interest known to be present in the environment. In such a case, although the robot is able to measure the properties characterizing such regions of interest, precisely determining the identity of these regions is not feasible. A mixed observability Markov decision process is used to represent the robot's actuation and sensing models. The control synthesis problem from scLTL formulas is then formulated as a maximum probability reachability problem on this model. The integration of dynamic programming, formal methods, and frontier-based exploration tools allows us to derive an algorithm to solve such a reachability problem.
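Maximum probability reachability is the dynamic-programming core of the final part. The sketch below computes it for a fully observable finite MDP using Gauss-Seidel value iteration from below. The thesis works on a mixed observability MDP, whose partially observed component would add a belief state, and this sketch omits that. The representation and the function name are hypothetical. Stopping when the largest update falls below `tol` is a common heuristic and does not guarantee that precision.

```python
def max_reach_probability(mdp, goal, tol=1e-10, max_iters=100_000):
    """Value iteration for the maximal probability of eventually reaching goal.

    mdp[s] is a list of actions; each action is a list of (successor, probability) pairs.
    """
    v = {s: 1.0 if s in goal else 0.0 for s in mdp}
    for _ in range(max_iters):
        largest_change = 0.0
        for s, actions in mdp.items():
            if s in goal or not actions:
                continue  # goal values stay 1, dead ends stay 0
            best = max(sum(p * v[u] for u, p in action) for action in actions)
            largest_change = max(largest_change, abs(best - v[s]))
            v[s] = best  # in-place (Gauss-Seidel) update
        if largest_change < tol:
            break
    return v
```

In state `s0` below, a risky action reaches the goal with probability 0.5 and otherwise fails. A retry action loops back with probability 0.5, so the maximal reachability probability is 1.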

Boston University Institutional Repository (OpenBU)