
    Modelling and analysis of Markov reward automata (extended version)

    Costs and rewards are important ingredients for cyberphysical systems, modelling critical aspects like energy consumption, task completion, repair costs, and memory usage. This paper introduces Markov reward automata, an extension of Markov automata that allows the modelling of systems incorporating rewards (or costs) in addition to nondeterminism, discrete probabilistic choice and continuous stochastic timing. Rewards come in two flavours: action rewards, acquired instantaneously when taking a transition; and state rewards, acquired while residing in a state. We present algorithms to optimise three reward functions: the expected accumulative reward until a goal is reached; the expected accumulative reward until a certain time bound; and the long-run average reward. We have implemented these algorithms in the SCOOP/IMCA tool chain and show their feasibility via several case studies.

    Modelling and analysis of Markov reward automata

    Costs and rewards are important ingredients for many types of systems, modelling critical aspects like energy consumption, task completion, repair costs, and memory usage. This paper introduces Markov reward automata, an extension of Markov automata that allows the modelling of systems incorporating rewards (or costs) in addition to nondeterminism, discrete probabilistic choice and continuous stochastic timing. Rewards come in two flavours: action rewards, acquired instantaneously when taking a transition; and state rewards, acquired while residing in a state. We present algorithms to optimise three reward functions: the expected cumulative reward until a goal is reached, the expected cumulative reward until a certain time bound, and the long-run average reward. We have implemented these algorithms in the SCOOP/IMCA tool chain and show their feasibility via several case studies.

    Towards efficient analysis of Markov automata

    Markov automata are one of the most expressive formalisms for modelling concurrent systems. They serve as a semantics for many higher-level formalisms, such as generalised stochastic Petri nets and dynamic fault trees. Two of the most challenging problems for Markov automata to date are (i) the optimal time-bounded reachability probability and (ii) the optimal long-run average rewards. In this thesis, we aim at designing efficient and sound techniques to analyse them. We approach the problem of time-bounded reachability from two different angles. First, we study the properties of the optimal solution and exploit this knowledge to construct an efficient algorithm that approximates the optimal values up to a guaranteed error bound. This algorithm is exhaustive, i.e., it computes values for each state of the Markov automaton. This may be a limitation for very large or even infinite Markov automata. To address this issue, we design a second algorithm that approximates the optimal solution by working with only part of the total state space. For the problem of long-run average rewards, there exists a polynomial-time algorithm based on linear programming. Instead of chasing a better theoretical complexity bound, we search for a practical solution based on an iterative approach. We design a value iteration algorithm that, in our empirical evaluation, turns out to scale several orders of magnitude better than the linear-programming-based approach.

    The Complexity of POMDPs with Long-run Average Objectives

    We study the problem of approximating optimal values in partially-observable Markov decision processes (POMDPs) with long-run average objectives. POMDPs are a standard model for dynamic systems with probabilistic and nondeterministic behavior in uncertain environments. In long-run average objectives, rewards are associated with every transition of the POMDP, and the payoff is the long-run average of the rewards along the executions of the POMDP. We establish strategy complexity and computational complexity results. Our main result shows that finite-memory strategies suffice for approximation of optimal values, and the related decision problem is recursively enumerable complete.

    One-Counter Stochastic Games

    We study the computational complexity of basic decision problems for one-counter simple stochastic games (OC-SSGs), under various objectives. OC-SSGs are 2-player turn-based stochastic games played on the transition graph of classic one-counter automata. We study primarily the termination objective, where the goal of one player is to maximize the probability of reaching counter value 0, while the other player wishes to avoid this. Partly motivated by the goal of understanding termination objectives, we also study certain "limit" and "long run average" reward objectives that are closely related to some well-studied objectives for stochastic games with rewards. Examples of problems we address include: does player 1 have a strategy to ensure that the counter eventually hits 0, i.e., terminates, almost surely, regardless of what player 2 does? Or that the liminf (or limsup) counter value equals infinity with a desired probability? Or that the long run average reward is >0 with desired probability? We show that the qualitative termination problem for OC-SSGs is in NP ∩ coNP, and is in P-time for 1-player OC-SSGs, or equivalently for one-counter Markov Decision Processes (OC-MDPs). Moreover, we show that quantitative limit problems for OC-SSGs are in NP ∩ coNP, and are in P-time for 1-player OC-MDPs. Both qualitative limit problems and qualitative termination problems for OC-SSGs are already at least as hard as Condon's quantitative decision problem for finite-state SSGs. Comment: 20 pages, 1 figure. This is a full version of a paper accepted for publication in proceedings of FSTTCS 201

    Discrete-time rewards model-checked

    This paper presents a model-checking approach for analyzing discrete-time Markov reward models. For this purpose, the temporal logic probabilistic CTL is extended with reward constraints. This allows complex measures, involving expected as well as accumulated rewards, to be formulated in a precise and succinct way. Algorithms to efficiently analyze such formulae are introduced. The approach is illustrated by model-checking a probabilistic cost model of the IPv4 zeroconf protocol for distributed address assignment in ad-hoc networks.

    POMDPs under Probabilistic Semantics

    We consider partially observable Markov decision processes (POMDPs) with limit-average payoff, where a reward value in the interval [0,1] is associated with every transition, and the payoff of an infinite path is the long-run average of the rewards. We consider two types of path constraints: (i) a quantitative constraint, which defines the set of paths where the payoff is at least a given threshold lambda_1 in (0,1]; and (ii) a qualitative constraint, which is the special case of a quantitative constraint with lambda_1=1. We consider the computation of the almost-sure winning set, where the controller needs to ensure that the path constraint is satisfied with probability 1. Our main results for the qualitative path constraint are as follows: (i) the problem of deciding the existence of a finite-memory controller is EXPTIME-complete; and (ii) the problem of deciding the existence of an infinite-memory controller is undecidable. For the quantitative path constraint, we show that the problem of deciding the existence of a finite-memory controller is undecidable. Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)
