    What is known about the Value 1 Problem for Probabilistic Automata?

    The value 1 problem is a decision problem for probabilistic automata over finite words: are there words accepted by the automaton with arbitrarily high probability? Although undecidable, this problem has attracted a lot of attention over the last few years. The aim of this paper is to review and relate the results pertaining to the value 1 problem. In particular, several algorithms have been proposed to partially solve it. We show the relations between them, leading to the following conclusion: the Markov Monoid Algorithm is the most correct algorithm known to (partially) solve the value 1 problem.
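
    A minimal sketch in Python, with an illustrative automaton (not taken from the paper), of what the problem asks: a probabilistic automaton assigns each finite word an acceptance probability, and the value 1 problem is whether these probabilities can be made arbitrarily close to 1.

    import numpy as np

    # One row-stochastic matrix per letter, an initial distribution, and an
    # indicator vector of accepting states define the automaton.
    transitions = {
        'a': np.array([[0.5, 0.5, 0.0],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0]]),
    }
    initial = np.array([1.0, 0.0, 0.0])
    accepting = np.array([0.0, 1.0, 0.0])

    def acceptance_probability(word):
        """Probability that the automaton accepts the given finite word."""
        dist = initial
        for letter in word:
            dist = dist @ transitions[letter]
        return float(dist @ accepting)

    # 'a' * n is accepted with probability 1 - 0.5**n, so the supremum over
    # all words is 1 even though no single word is accepted with probability
    # 1: this automaton is a positive instance of the value 1 problem.
    print(acceptance_probability('a' * 10))  # 0.9990234375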

    Limit Synchronization in Markov Decision Processes

    Markov decision processes (MDPs) are finite-state systems with both strategic and probabilistic choices. After fixing a strategy, an MDP produces a sequence of probability distributions over states. The sequence is eventually synchronizing if the probability mass accumulates in a single state, possibly in the limit. Precisely, for 0 <= p <= 1, the sequence is p-synchronizing if some probability distribution in the sequence assigns probability at least p to some state, and we distinguish three synchronization modes: (i) sure winning, if there exists a strategy that produces a 1-synchronizing sequence; (ii) almost-sure winning, if there exists a strategy that produces a sequence that is, for all epsilon > 0, (1-epsilon)-synchronizing; (iii) limit-sure winning, if for all epsilon > 0 there exists a strategy that produces a (1-epsilon)-synchronizing sequence. We consider the problem of deciding whether an MDP is sure, almost-sure, or limit-sure winning, and we establish the decidability and optimal complexity for all modes, as well as the memory requirements for winning strategies. Our main contributions are as follows: (a) for each winning mode we present characterizations that give a PSPACE complexity for the decision problems, and we establish matching PSPACE lower bounds; (b) we show that for sure winning strategies, exponential memory is sufficient and may be necessary, and that, in general, infinite memory is necessary for almost-sure winning and unbounded memory is necessary for limit-sure winning; (c) along with our results, we establish new complexity results for alternating finite automata over a one-letter alphabet.
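
    A minimal sketch, with illustrative numbers (not from the paper): fixing a strategy turns the MDP into a Markov chain, and p-synchronization of the induced sequence of distributions can be checked up to a finite horizon.

    import numpy as np

    # Markov chain induced by one fixed memoryless strategy, and an initial
    # distribution; the sequence of distributions is d_{n+1} = d_n @ P.
    P = np.array([[0.0, 0.5, 0.5],
                  [0.0, 0.5, 0.5],
                  [0.0, 0.0, 1.0]])
    d = np.array([0.5, 0.5, 0.0])

    def is_p_synchronizing(P, d, p, horizon=1000):
        """Finite-horizon check: does some distribution in the sequence
        assign probability at least p to a single state?"""
        for _ in range(horizon):
            if d.max() >= p:
                return True
            d = d @ P
        return False

    # All mass accumulates in the last state only in the limit, so the
    # sequence is (1-eps)-synchronizing for every eps > 0 yet never exactly
    # 1-synchronizing: the situation captured by the almost-sure mode.
    print(is_p_synchronizing(P, d, 0.99))  # True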

    A distance for probability spaces, and long-term values in Markov Decision Processes and Repeated Games

    Given a finite set $K$, we denote by $X = \Delta(K)$ the set of probabilities on $K$ and by $Z = \Delta_f(X)$ the set of Borel probabilities on $X$ with finite support. Studying a Markov Decision Process with partial information on $K$ naturally leads to a Markov Decision Process with full information on $X$. We introduce a new metric $d_*$ on $Z$ such that the transitions become 1-Lipschitz from $(X, \|\cdot\|_1)$ to $(Z, d_*)$. In the first part of the article, we define and prove several properties of the metric $d_*$. In particular, $d_*$ satisfies a Kantorovich-Rubinstein type duality formula and can be characterized using disintegrations. In the second part, we characterize the limit values in several classes of "compact non-expansive" Markov Decision Processes. In particular, we use the metric $d_*$ to characterize the limit value in Partial Observation MDPs with finitely many states and in Repeated Games with an informed controller with finite sets of states and actions. Moreover, in each case we prove the existence of a generalized notion of uniform value where we consider not only the Cesàro mean when the number of stages is large enough but any evaluation function $\theta \in \Delta(\mathbb{N}^*)$ for which the impatience $I(\theta) = \sum_{t \geq 1} |\theta_{t+1} - \theta_t|$ is small enough.
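
    As a concrete instance of the impatience condition (a standard computation, not quoted from the paper), the Cesàro mean over the first $n$ stages corresponds to the evaluation function $\theta_t = 1/n$ for $1 \le t \le n$ and $\theta_t = 0$ for $t > n$. All consecutive differences vanish except at $t = n$, so

    \[
      I(\theta) \;=\; \sum_{t \geq 1} |\theta_{t+1} - \theta_t|
                \;=\; \Bigl| 0 - \tfrac{1}{n} \Bigr|
                \;=\; \tfrac{1}{n},
    \]

    which tends to 0 as $n \to \infty$: long Cesàro averages are a special case of evaluations with small impatience.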

    Parameterised verification of randomised distributed systems using state-based models

    Model checking is a powerful technique for the verification of distributed systems but is limited to verifying systems with a fixed number of processes. The verification of a system for an arbitrary number of processes is known as the parameterised model checking problem and is, in general, undecidable. Parameterised model checking has been studied in depth for non-probabilistic distributed systems. We extend some of this work to tackle the parameterised model checking problem for distributed protocols that exhibit probabilistic behaviour, a problem that has not been widely addressed to date. In particular, we consider the application of network invariants and explicit induction to the parameterised verification of state-based models of randomised distributed systems.

    We demonstrate the use of network invariants by constructing invariant models for non-probabilistic and probabilistic forms of a simple counter token ring protocol. We show that proving properties of the invariants equates to proving properties of the token ring protocol for any number of processes.

    The use of induction is considered for the verification of a class of randomised distributed systems. These systems, termed degenerative, have the property that a model of a system with a given communication graph eventually behaves like a model of a system with a reduced graph, where reduction is by removal of a set of nodes. We distinguish between deterministically, probabilistically and semi-degenerative systems, according to the manner in which a system degenerates. For the first two classes we describe induction schemas for reasoning about models of these systems over arbitrary communication graphs. We show that certain properties hold for models of such systems with any graph if they hold for all models of a system with some base graph, and demonstrate this via case studies: two randomised leader election protocols. We illustrate how induction can also be employed to prove properties of semi-degenerative systems by considering a simple gossip protocol.
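
    A minimal sketch in Python of the parameterisation issue, with an illustrative protocol (not the thesis's model): a fixed-size model of a token ring can be checked directly, but model checking alone never covers all ring sizes at once, which is the gap that network invariants and induction are meant to close.

    import random

    def step(ring):
        """Pass the single token one position clockwise with probability 1/2."""
        n = len(ring)
        holder = ring.index(True)
        if random.random() < 0.5:
            ring[holder] = False
            ring[(holder + 1) % n] = True

    def exactly_one_token(ring):
        return sum(ring) == 1

    # The property can be checked for a handful of fixed sizes; the
    # parameterised problem asks for a proof that covers every n.
    for n in (3, 5, 8):
        ring = [True] + [False] * (n - 1)
        for _ in range(100):
            step(ring)
            assert exactly_one_token(ring)
    print("property holds on all sampled runs for n in {3, 5, 8}")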

    Certified Reinforcement Learning with Logic Guidance

    This paper proposes the first model-free Reinforcement Learning (RL) framework to synthesise policies for unknown continuous-state Markov Decision Processes (MDPs) such that a given linear temporal property is satisfied. We convert the given property into a Limit Deterministic Büchi Automaton (LDBA), namely a finite-state machine expressing the property. Exploiting the structure of the LDBA, we shape a synchronous reward function on the fly, so that an RL algorithm can synthesise a policy resulting in traces that probabilistically satisfy the linear temporal property. This probability (certificate) is also calculated in parallel with policy learning when the state space of the MDP is finite: as such, the RL algorithm produces a policy that is certified with respect to the property. Under the assumption of a finite state space, theoretical guarantees are provided on the convergence of the RL algorithm to an optimal policy maximising the above probability. We also show that our method produces "best available" control policies when the logical property cannot be satisfied. In the general case of a continuous state space, we propose a neural network architecture for RL and empirically show that the algorithm finds satisfying policies, if such policies exist. The performance of the proposed framework is evaluated via a set of numerical examples and benchmarks, where we observe an improvement of one order of magnitude in the number of iterations required for policy synthesis, compared to existing approaches whenever available.

    Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782
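
    A minimal sketch in Python of the reward-shaping idea (the automaton and labels are hypothetical, and the paper's construction is more refined): the learner tracks the LDBA state alongside the MDP state and receives reward exactly when the automaton enters an accepting set, so that maximising accumulated reward steers the policy toward traces satisfying the property.

    # Hypothetical LDBA for "eventually goal": q0 waits, q1 is an accepting sink.
    ldba_delta = {
        ('q0', 'goal'): 'q1',
        ('q0', 'other'): 'q0',
        ('q1', 'goal'): 'q1',
        ('q1', 'other'): 'q1',
    }
    accepting = {'q1'}

    def shaped_reward(q, label):
        """Advance the automaton on the current state label and pay reward 1
        exactly when an accepting LDBA state is entered."""
        q_next = ldba_delta[(q, label)]
        return q_next, (1.0 if q_next in accepting else 0.0)

    # Inside any RL loop the agent would use (mdp_state, q) as its product
    # state and shaped_reward in place of an external reward signal.
    q, r = shaped_reward('q0', 'goal')
    print(q, r)  # q1 1.0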

    Symbolic Verification and Strategy Synthesis for Linearly-Priced Probabilistic Timed Automata

    Probabilistic timed automata are a formalism for modelling systems, including real-time systems, whose dynamics combine probabilistic, nondeterministic and timed aspects. A variety of techniques have been proposed for the analysis of this formalism and successfully employed to analyse, for example, wireless communication protocols and computer security systems. Augmenting the model with prices (or, equivalently, costs or rewards) provides a means to verify more complex quantitative properties, such as the expected energy usage of a device or the expected number of messages sent during a protocol’s execution. However, the analysis of these properties on probabilistic timed automata currently relies on a technique based on integer discretisation of real-valued clocks, which can be expensive in some cases. In this paper, we propose symbolic techniques for verification and optimal strategy synthesis for priced probabilistic timed automata which avoid this discretisation. We build upon recent work for the special case of expected time properties, using value iteration over a zone-based abstraction of the model.
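
    A minimal sketch in Python of the fixed-point computation involved (illustrative numbers; the paper's method runs value iteration over a zone-based abstraction of the priced PTA, not over the plain finite model below): each nondeterministic choice is resolved so as to minimise the expected accumulated price to the target.

    # states 0 and 1, target state 2; each action is a (price, distribution) pair.
    actions = {
        0: [(2.0, {1: 1.0}),            # certain but expensive move
            (1.0, {0: 0.5, 2: 0.5})],   # cheap gamble that may stay put
        1: [(1.0, {2: 1.0})],
    }
    V = {0: 0.0, 1: 0.0, 2: 0.0}        # expected price-to-target values

    for _ in range(100):                # iterate to numerical convergence
        V_new = {2: 0.0}                # the target accumulates no further price
        for s, choices in actions.items():
            V_new[s] = min(price + sum(p * V[t] for t, p in dist.items())
                           for price, dist in choices)
        V = V_new

    print(V)  # converges to {2: 0.0, 0: 2.0, 1: 1.0}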