55 research outputs found

    On Reward Structures of Markov Decision Processes

    Full text link
    A Markov decision process can be parameterized by a transition kernel and a reward function. Both play essential roles in the study of reinforcement learning as evidenced by their presence in the Bellman equations. In our inquiry of various kinds of "costs" associated with reinforcement learning inspired by the demands in robotic applications, rewards are central to understanding the structure of a Markov decision process and reward-centric notions can elucidate important concepts in reinforcement learning. Specifically, we study the sample complexity of policy evaluation and develop a novel estimator with an instance-specific error bound of O~(Ï„sn)\tilde{O}(\sqrt{\frac{\tau_s}{n}}) for estimating a single state value. Under the online regret minimization setting, we refine the transition-based MDP constant, diameter, into a reward-based constant, maximum expected hitting cost, and with it, provide a theoretical explanation for how a well-known technique, potential-based reward shaping, could accelerate learning with expert knowledge. In an attempt to study safe reinforcement learning, we model hazardous environments with irrecoverability and proposed a quantitative notion of safe learning via reset efficiency. In this setting, we modify a classic algorithm to account for resets achieving promising preliminary numerical results. Lastly, for MDPs with multiple reward functions, we develop a planning algorithm that computationally efficiently finds Pareto-optimal stochastic policies.Comment: This PhD thesis draws heavily from arXiv:1907.02114 and arXiv:2002.06299; minor edit

    Perturbation theory for semi-Markov control problems

    Get PDF
    In earlier work, the authors considered the perturbation of systems undergoing Markov processes in which the times between two consecutive decision time points were equidistant. They now consider perturbations of processes for which the times between transition are random variables. These are called semi-Markov processes

    Analysis of Timed and Long-Run Objectives for Markov Automata

    Get PDF
    Markov automata (MAs) extend labelled transition systems with random delays and probabilistic branching. Action-labelled transitions are instantaneous and yield a distribution over states, whereas timed transitions impose a random delay governed by an exponential distribution. MAs are thus a nondeterministic variation of continuous-time Markov chains. MAs are compositional and are used to provide a semantics for engineering frameworks such as (dynamic) fault trees, (generalised) stochastic Petri nets, and the Architecture Analysis & Design Language (AADL). This paper considers the quantitative analysis of MAs. We consider three objectives: expected time, long-run average, and timed (interval) reachability. Expected time objectives focus on determining the minimal (or maximal) expected time to reach a set of states. Long-run objectives determine the fraction of time to be in a set of states when considering an infinite time horizon. Timed reachability objectives are about computing the probability to reach a set of states within a given time interval. This paper presents the foundations and details of the algorithms and their correctness proofs. We report on several case studies conducted using a prototypical tool implementation of the algorithms, driven by the MAPA modelling language for efficiently generating MAs.Comment: arXiv admin note: substantial text overlap with arXiv:1305.705

    Discrete-time controlled markov processes with average cost criterion: a survey

    Get PDF
    This work is a survey of the average cost control problem for discrete-time Markov processes. The authors have attempted to put together a comprehensive account of the considerable research on this problem over the past three decades. The exposition ranges from finite to Borel state and action spaces and includes a variety of methodologies to find and characterize optimal policies. The authors have included a brief historical perspective of the research efforts in this area and have compiled a substantial yet not exhaustive bibliography. The authors have also identified several important questions that are still open to investigation
    • …
    corecore