On Reward Structures of Markov Decision Processes
A Markov decision process can be parameterized by a transition kernel and a
reward function. Both play essential roles in the study of reinforcement
learning, as evidenced by their presence in the Bellman equations. In our
inquiry into various kinds of "costs" associated with reinforcement learning,
inspired by the demands of robotic applications, rewards are central to
understanding the structure of a Markov decision process, and reward-centric
notions can elucidate important concepts in reinforcement learning.
Specifically, we study the sample complexity of policy evaluation and develop
a novel estimator with an instance-specific error bound for estimating a
single state value. Under
the online regret minimization setting, we refine the transition-based MDP
constant, diameter, into a reward-based constant, maximum expected hitting
cost, and with it, provide a theoretical explanation for how a well-known
technique, potential-based reward shaping, could accelerate learning with
expert knowledge. In an attempt to study safe reinforcement learning, we model
hazardous environments with irrecoverability and propose a quantitative notion
of safe learning via reset efficiency. In this setting, we modify a classic
algorithm to account for resets, achieving promising preliminary numerical
results. Lastly, for MDPs with multiple reward functions, we develop a planning
algorithm that efficiently computes Pareto-optimal stochastic policies.
Comment: This PhD thesis draws heavily from arXiv:1907.02114 and arXiv:2002.06299; minor edit.
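For context, potential-based reward shaping, the technique named in this abstract, augments the reward with gamma * Phi(s') - Phi(s) for a user-chosen potential function Phi, a transformation known to preserve optimal policies while potentially speeding up learning. Below is a minimal, hypothetical Python sketch of that transformation on a toy chain MDP; the function and state names are illustrative and are not taken from the thesis.

    # Illustrative sketch of potential-based reward shaping (not the thesis code).
    # The shaped reward adds gamma * Phi(s') - Phi(s) to the original reward.
    def shaped_reward(reward, potential, gamma, s, a, s_next):
        """Return r(s, a, s') + gamma * Phi(s') - Phi(s)."""
        return reward(s, a, s_next) + gamma * potential(s_next) - potential(s)

    # Toy 1-D chain: the goal is state 4; the potential is the negated distance
    # to the goal, so shaping rewards progress toward the goal.
    reward = lambda s, a, s_next: 1.0 if s_next == 4 else 0.0
    potential = lambda s: -abs(4 - s)
    print(shaped_reward(reward, potential, gamma=0.99, s=2, a=+1, s_next=3))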
Perturbation theory for semi-Markov control problems
In earlier work, the authors considered the perturbation of systems undergoing Markov processes in which the times between two consecutive decision time points were equidistant. They now consider perturbations of processes for which the times between transitions are random variables. These are called semi-Markov processes.
Analysis of Timed and Long-Run Objectives for Markov Automata
Markov automata (MAs) extend labelled transition systems with random delays
and probabilistic branching. Action-labelled transitions are instantaneous and
yield a distribution over states, whereas timed transitions impose a random
delay governed by an exponential distribution. MAs are thus a nondeterministic
variation of continuous-time Markov chains. MAs are compositional and are used
to provide a semantics for engineering frameworks such as (dynamic) fault
trees, (generalised) stochastic Petri nets, and the Architecture Analysis &
Design Language (AADL). This paper considers the quantitative analysis of MAs.
We consider three objectives: expected time, long-run average, and timed
(interval) reachability. Expected time objectives focus on determining the
minimal (or maximal) expected time to reach a set of states. Long-run
average objectives determine the fraction of time spent in a set of states over
an infinite time horizon. Timed reachability objectives are about
computing the probability to reach a set of states within a given time
interval. This paper presents the foundations and details of the algorithms and
their correctness proofs. We report on several case studies conducted using a
prototypical tool implementation of the algorithms, driven by the MAPA
modelling language for efficiently generating MAs.
Comment: arXiv admin note: substantial text overlap with arXiv:1305.705
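To make the expected-time objective concrete, the sketch below runs value iteration for the minimal expected time to reach a goal set in a tiny Markov automaton: Markovian states contribute their mean sojourn time 1/E(s) before moving on, while action-labelled transitions are instantaneous, so only a minimisation over the available successor distributions remains. The model, state names, and code are invented for illustration and are not the MAPA-based tool implementation reported in the paper.

    # Hypothetical toy value iteration for minimal expected time in a Markov automaton.
    goal = {"g"}
    # Markovian states: (exit rate, successor distribution).
    markovian = {"m": (2.0, {"p": 0.5, "g": 0.5})}
    # Probabilistic states: per action, a successor distribution (taken instantaneously).
    probabilistic = {"p": {"a": {"m": 1.0}, "b": {"g": 0.3, "m": 0.7}}}

    T = {s: 0.0 for s in set(markovian) | set(probabilistic) | goal}
    for _ in range(1000):  # iterate the Bellman operator until (approximately) converged
        for s, (rate, dist) in markovian.items():
            if s not in goal:
                # mean sojourn time 1/E(s) plus expected remaining time of successors
                T[s] = 1.0 / rate + sum(p * T[t] for t, p in dist.items())
        for s, actions in probabilistic.items():
            if s not in goal:
                # instantaneous transition: minimise over the available distributions
                T[s] = min(sum(p * T[t] for t, p in dist.items()) for dist in actions.values())
    print(T)  # minimal expected time to reach the goal set from each state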
Discrete-time controlled Markov processes with average cost criterion: a survey
This work is a survey of the average cost control problem for discrete-time Markov processes. The authors have attempted to put together a comprehensive account of the considerable research on this problem over the past three decades. The exposition ranges from finite to Borel state and action spaces and includes a variety of methodologies to find and characterize optimal policies. The authors have included a brief historical perspective of the research efforts in this area and have compiled a substantial yet not exhaustive bibliography. The authors have also identified several important questions that are still open to investigation.
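As background, the object at the centre of this literature is the average cost optimality equation, written here in a Borel state space notation that is assumed for illustration rather than quoted from the survey:

\[
  \rho^{*} + h(x) \;=\; \min_{a \in A(x)} \Big[\, c(x,a) + \int_{X} h(y)\, Q(dy \mid x,a) \Big],
\]

where \(\rho^{*}\) is the optimal average cost, \(h\) a relative value (bias) function, \(c\) the one-stage cost, and \(Q\) the transition kernel; under suitable conditions, a solution \((\rho^{*}, h)\) yields an optimal stationary policy by selecting a minimising action in each state.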
- …