55 research outputs found

On Reward Structures of Markov Decision Processes

Author: Dai Falcon Z.
Publication date: 31/08/2023

A Markov decision process can be parameterized by a transition kernel and a reward function. Both play essential roles in the study of reinforcement learning, as evidenced by their presence in the Bellman equations. In our inquiry into various kinds of "costs" associated with reinforcement learning, inspired by the demands of robotic applications, rewards are central to understanding the structure of a Markov decision process, and reward-centric notions can elucidate important concepts in reinforcement learning. Specifically, we study the sample complexity of policy evaluation and develop a novel estimator with an instance-specific error bound of $\tilde{O}(\sqrt{\frac{\tau_s}{n}})$ for estimating a single state value. Under the online regret minimization setting, we refine the transition-based MDP constant, diameter, into a reward-based constant, maximum expected hitting cost, and with it provide a theoretical explanation for how a well-known technique, potential-based reward shaping, could accelerate learning with expert knowledge. In an attempt to study safe reinforcement learning, we model hazardous environments with irrecoverability and propose a quantitative notion of safe learning via reset efficiency. In this setting, we modify a classic algorithm to account for resets, achieving promising preliminary numerical results. Lastly, for MDPs with multiple reward functions, we develop a computationally efficient planning algorithm that finds Pareto-optimal stochastic policies. Comment: This PhD thesis draws heavily from arXiv:1907.02114 and arXiv:2002.06299; minor edit

arXiv.org e-Print Archive
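
The abstract of "On Reward Structures of Markov Decision Processes" mentions the Bellman equations and potential-based reward shaping. As a minimal sketch of how the two interact, the Python below evaluates a fixed policy on a toy chain and checks that shaping with a potential shifts each state value by that potential. The discounted setting, the 3-state chain, the potential `PHI`, and the helper names `evaluate` and `shape` are all assumptions made for illustration; none of them comes from the thesis, which may work in a different (for example, average-reward) setting.

```python
# Hypothetical 3-state Markov reward process (the policy is already fixed).
GAMMA = 0.9
P = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]
R = [0.0, 0.0, 1.0]      # expected one-step reward in each state
PHI = [0.0, 5.0, 10.0]   # hypothetical potential function over states


def evaluate(P, r, gamma, sweeps=2000):
    """Iterate the Bellman expectation equation v = r + gamma * P v."""
    n = len(r)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [r[s] + gamma * sum(P[s][t] * v[t] for t in range(n))
             for s in range(n)]
    return v


def shape(P, r, gamma, phi):
    """Expected shaped reward: r(s) + sum_t P(s,t) * (gamma*phi(t) - phi(s))."""
    n = len(r)
    return [r[s] + sum(P[s][t] * (gamma * phi[t] - phi[s]) for t in range(n))
            for s in range(n)]


v = evaluate(P, R, GAMMA)
v_shaped = evaluate(P, shape(P, R, GAMMA, PHI), GAMMA)

# Shaping shifts every state value by the potential: v_shaped[s] == v[s] - PHI[s].
for s in range(len(R)):
    print(s, round(v[s], 6), round(v_shaped[s], 6), round(v[s] - PHI[s], 6))
```

Because the shift depends only on the state, the ranking of policies is unchanged in this discounted sketch; what changes is how informative the one-step rewards are to a learner.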

Simultaneous determination of production and maintenance schedules using in-line equipment condition and yield information

Author: Ben-Daya
Boukas
Dedopoulos
Denardo
Derman
Filliger
Freimer
Gilardoni
Groenevelt
Groenevelt
Heyman
Hong
Hopp
Howard
Hunter
Iravani
Ivy
Jianyong
Kamien
Kao
Kenné
Kim
Krass
Lee
Lee
Lee
Lou
Makis
Meller
Nurani
Porteus
Puterman
Rosenblatt
Sennott
Sheu
Sloan
Sloan
Song
Su
Takahashi
Taylor
Valdez-Flores
Van der Duyn Schouten
Wang
Wang
Yano
Publication venue: 'Wiley'

Crossref

Perturbation theory for semi-Markov control problems

Author: Abbad Mohammed
Filar Jerzy A
Publication venue: Institute of Electrical and Electronic Engineers
Publication date: 01/01/1991

In earlier work, the authors considered the perturbation of systems undergoing Markov processes in which consecutive decision time points were equidistant. They now consider perturbations of processes for which the times between transitions are random variables. These are called semi-Markov processes.

Flinders Academic Commons

Analysis of Timed and Long-Run Objectives for Markov Automata

Author: Guck Dennis
Hatefi Hassan
Hermanns Holger
Katoen Joost-Pieter
Timmer Mark
Publication venue: 'Logical Methods in Computer Science e.V.'
Publication date: 01/01/2014

Markov automata (MAs) extend labelled transition systems with random delays and probabilistic branching. Action-labelled transitions are instantaneous and yield a distribution over states, whereas timed transitions impose a random delay governed by an exponential distribution. MAs are thus a nondeterministic variation of continuous-time Markov chains. MAs are compositional and are used to provide a semantics for engineering frameworks such as (dynamic) fault trees, (generalised) stochastic Petri nets, and the Architecture Analysis & Design Language (AADL). This paper considers the quantitative analysis of MAs. We consider three objectives: expected time, long-run average, and timed (interval) reachability. Expected time objectives focus on determining the minimal (or maximal) expected time to reach a set of states. Long-run objectives determine the fraction of time spent in a set of states when considering an infinite time horizon. Timed reachability objectives are about computing the probability of reaching a set of states within a given time interval. This paper presents the foundations and details of the algorithms and their correctness proofs. We report on several case studies conducted using a prototypical tool implementation of the algorithms, driven by the MAPA modelling language for efficiently generating MAs. Comment: arXiv admin note: substantial text overlap with arXiv:1305.705

arXiv.org e-Print Archive

Episciences.org

University of Twente Research Information
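
The expected time objective described in the abstract of "Analysis of Timed and Long-Run Objectives for Markov Automata" can be illustrated on a toy model. The Python below is a rough sketch, not the paper's algorithm: it assumes a fixed-point characterization in which a goal state has value 0, a Markovian state adds its mean sojourn time (1 / exit rate) to the expected value of its successors, and a probabilistic state takes the minimum over its actions. The automaton itself (states 0 to 3, the rates, the actions `a` and `b`) and the function name `min_expected_time` are hypothetical.

```python
# Toy Markov automaton; every state, rate, and probability here is hypothetical.
GOAL = {3}

# Markovian states: (exit rate, branching distribution over successors).
MARKOVIAN = {
    1: (2.0, {3: 1.0}),
    2: (1.0, {3: 0.5, 0: 0.5}),
}

# Probabilistic states: each action yields a distribution over successors.
PROBABILISTIC = {
    0: {"a": {1: 1.0}, "b": {2: 1.0}},
}


def min_expected_time(sweeps=1000):
    """Value iteration, started from zero, for the minimal expected time to GOAL."""
    states = set(GOAL) | set(MARKOVIAN) | set(PROBABILISTIC)
    v = {s: 0.0 for s in states}
    for _ in range(sweeps):
        new = {}
        for s in states:
            if s in GOAL:
                new[s] = 0.0
            elif s in MARKOVIAN:
                rate, succ = MARKOVIAN[s]
                # Mean sojourn time plus expected remaining time.
                new[s] = 1.0 / rate + sum(p * v[t] for t, p in succ.items())
            else:
                # Instantaneous action choice: pick the best distribution.
                new[s] = min(
                    sum(p * v[t] for t, p in dist.items())
                    for dist in PROBABILISTIC[s].values()
                )
        v = new
    return v


expected = min_expected_time()
print(expected[0])  # 0.5: action "a" reaches the goal through the fast state 1
```

Replacing `min` with `max` would give the corresponding sketch for the maximal expected time, under the same assumptions.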

Discrete-time controlled markov processes with average cost criterion: a survey

Author: Arapostathis Aristotle
Borkar Vivek S.
Fernandez-Gaucherand Emmanuel
Ghosh Mrinal K.
Marcus Steven I.
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/03/1993

This work is a survey of the average cost control problem for discrete-time Markov processes. The authors have attempted to put together a comprehensive account of the considerable research on this problem over the past three decades. The exposition ranges from finite to Borel state and action spaces and includes a variety of methodologies to find and characterize optimal policies. The authors have included a brief historical perspective of the research efforts in this area and have compiled a substantial yet not exhaustive bibliography. The authors have also identified several important questions that are still open to investigation.