What is known about the Value 1 Problem for Probabilistic Automata?
The value 1 problem is a decision problem for probabilistic automata over
finite words: are there words accepted by the automaton with arbitrarily high
probability? Although undecidable, this problem has attracted considerable
attention over the last few years. The aim of this paper is to review and
relate the results pertaining to the value 1 problem. In particular, several
algorithms have been proposed to partially solve this problem. We show the
relations between them, leading to the following conclusion: the Markov Monoid
Algorithm is the most general correct algorithm known for (partially) solving
the value 1 problem.
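As a minimal sketch of the objects involved (the function names and the toy automaton below are our own illustration, not taken from the paper): the acceptance probability of a word is the initial distribution pushed through one stochastic matrix per letter, summed over accepting states. The example has value 1 even though no single word is accepted with probability 1, which is exactly the phenomenon the value 1 problem asks about.

```python
# Hedged sketch: acceptance probability in a probabilistic automaton.
# All names and the example automaton are illustrative assumptions.

def acceptance_probability(initial, transitions, accepting, word):
    """initial: list of probabilities over states; transitions: dict mapping
    each letter to a row-stochastic matrix; accepting: set of state indices."""
    dist = list(initial)
    for letter in word:
        matrix = transitions[letter]
        dist = [sum(dist[i] * matrix[i][j] for i in range(len(dist)))
                for j in range(len(dist))]
    return sum(p for j, p in enumerate(dist) if j in accepting)

# Two states; on letter 'a', state 0 moves to the accepting state 1 with
# probability 1/2 and state 1 is absorbing.
trans = {'a': [[0.5, 0.5],
               [0.0, 1.0]]}
# P(accept a^n) = 1 - (1/2)^n, so sup over words is 1 but is never attained:
# this automaton has value 1.
p3 = acceptance_probability([1.0, 0.0], trans, {1}, 'aaa')  # 1 - (1/2)^3
```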
Limit Synchronization in Markov Decision Processes
Markov decision processes (MDP) are finite-state systems with both strategic
and probabilistic choices. After fixing a strategy, an MDP produces a sequence
of probability distributions over states. The sequence is eventually
synchronizing if the probability mass accumulates in a single state, possibly
in the limit. Precisely, for 0 <= p <= 1 the sequence is p-synchronizing if a
probability distribution in the sequence assigns probability at least p to some
state, and we distinguish three synchronization modes: (i) sure winning if
there exists a strategy that produces a 1-synchronizing sequence; (ii)
almost-sure winning if there exists a strategy that produces a sequence that
is, for all epsilon > 0, a (1-epsilon)-synchronizing sequence; (iii) limit-sure
winning if for all epsilon > 0, there exists a strategy that produces a
(1-epsilon)-synchronizing sequence.
We consider the problem of deciding whether an MDP is sure, almost-sure, or
limit-sure winning, and we establish the decidability and optimal complexity
of all modes, as well as the memory requirements for winning strategies. Our
main contributions are as follows: (a) for each winning mode we present
characterizations that give a PSPACE complexity for the decision problems, and
we establish matching PSPACE lower bounds; (b) we show that for sure winning
strategies, exponential memory is sufficient and may be necessary, and that in
general infinite memory is necessary for almost-sure winning, and unbounded
memory is necessary for limit-sure winning; (c) along with our results, we
establish new complexity results for alternating finite automata over a
one-letter alphabet.
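The definitions above can be illustrated concretely (the chain and helper names below are our own toy example, not the paper's): after fixing a memoryless strategy, an MDP induces a Markov chain, and we can follow its sequence of distributions and ask whether some distribution in the sequence puts mass at least p on a single state.

```python
# Hedged sketch: the distribution sequence of a fixed Markov chain, and a
# check for p-synchronization in the sense described above.

def distribution_sequence(initial, chain, steps):
    """Iterate the distribution of a Markov chain (row-stochastic matrix)."""
    dists = [list(initial)]
    for _ in range(steps):
        prev = dists[-1]
        dists.append([sum(prev[i] * chain[i][j] for i in range(len(prev)))
                      for j in range(len(prev))])
    return dists

def is_p_synchronizing(dists, p):
    """Some distribution in the sequence assigns mass >= p to one state."""
    return any(max(d) >= p for d in dists)

# Toy chain: all mass eventually drains into the absorbing state 2.
chain = [[0.0, 0.5, 0.5],
         [0.0, 0.0, 1.0],
         [0.0, 0.0, 1.0]]
dists = distribution_sequence([0.5, 0.5, 0.0], chain, 5)
# Here the sequence is in fact 1-synchronizing after finitely many steps;
# the interesting almost-sure/limit-sure cases only reach 1 in the limit.
```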
A distance for probability spaces, and long-term values in Markov Decision Processes and Repeated Games
Given a finite set X, we denote by Delta(X) the set of probabilities on X and
by Delta_f(Delta(X)) the set of Borel probabilities on Delta(X) with finite
support. Studying a Markov Decision Process with partial information on X
naturally leads to a Markov Decision Process with full information on
Delta(X). We introduce a new metric d on Delta_f(Delta(X)) such that the
transitions become 1-Lipschitz from Delta(X) to Delta_f(Delta(X)). In the
first part of the article, we define and prove several properties of the
metric d. In particular, d satisfies a Kantorovich-Rubinstein type duality
formula and can be characterized by using disintegrations. In the second
part, we characterize the limit values in several classes of "compact
non-expansive" Markov Decision Processes. In particular, we use the metric d
to characterize the limit value in Partial Observation MDPs with finitely
many states and in Repeated Games with an informed controller with finite
sets of states and actions. Moreover, in each case we prove the existence of
a generalized notion of uniform value, where we consider not only the
Ces\`aro mean when the number of stages is large enough but any evaluation
function when the impatience is small enough.
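The paper's metric and its duality formula are more general; as background, here is the classical Kantorovich-Rubinstein picture for the Wasserstein-1 distance between two distributions on finitely many real points (the function name and the example are our own illustration).

```python
# Hedged sketch: on the real line, W1 equals the integral of the absolute
# difference of cumulative distribution functions, and any 1-Lipschitz test
# function gives a matching lower bound (Kantorovich-Rubinstein duality).

def wasserstein_1d(points, mu, nu):
    """W1 between mu and nu supported on sorted real points."""
    total, f_mu, f_nu = 0.0, 0.0, 0.0
    for k in range(len(points) - 1):
        f_mu += mu[k]   # CDF of mu up to points[k]
        f_nu += nu[k]   # CDF of nu up to points[k]
        total += abs(f_mu - f_nu) * (points[k + 1] - points[k])
    return total

points = [0.0, 1.0, 2.0]
mu = [0.5, 0.5, 0.0]
nu = [0.0, 0.5, 0.5]
w1 = wasserstein_1d(points, mu, nu)  # nu is mu shifted right by 1: cost 1.0

# Duality: for any 1-Lipschitz f, E_mu[f] - E_nu[f] <= W1(mu, nu);
# here f(x) = -x attains the bound.
f = [-x for x in points]
dual = sum(fi * (m - n) for fi, m, n in zip(f, mu, nu))
```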
Parameterised verification of randomised distributed systems using state-based models
Model checking is a powerful technique for the verification of distributed systems but is limited to verifying systems with a fixed number of processes. The verification of a system for an arbitrary number of processes is known as the parameterised model checking problem and is, in general, undecidable. Parameterised model checking has been studied in depth for non-probabilistic distributed systems. We extend some of this work in order to tackle the parameterised model checking problem for distributed protocols that exhibit probabilistic behaviour, a problem that has not been widely addressed to date.
In particular, we consider the application of network invariants and explicit induction to the parameterised verification of state-based models of randomised distributed systems. We demonstrate the use of network invariants by constructing invariant models for non-probabilistic and probabilistic forms of a simple counter token ring protocol. We show that proving properties of the invariants equates to proving properties of the token ring protocol for any number of processes.
The use of induction is considered for the verification of a class of randomised distributed systems. These systems, termed degenerative, have the property that a model of a system with a given communication graph eventually behaves like a model of a system with a reduced graph, where reduction is by removal of a set of nodes. We distinguish between deterministically, probabilistically and semi-degenerative systems, according to the manner in which a system degenerates. For the former two classes we describe induction schemas for reasoning about models of these systems over arbitrary communication graphs. We show that certain properties hold for models of such systems with any graph if they hold for all models of a system with some base graph, and demonstrate this via case studies: two randomised leader election protocols. We illustrate how induction can also be employed to prove properties of semi-degenerative systems by considering a simple gossip protocol.
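A tiny, hypothetical token ring (our own toy, not one of the thesis's models) shows the kind of parameterised property at stake: one token circulates among n processes, and "exactly one process holds the token at every step" can be checked for each fixed n, while the network-invariant technique described above aims to prove it once for every ring size.

```python
# Hedged sketch: a deterministic counter-free token ring and the mutual
# exclusion invariant, checked for several fixed ring sizes.

def run_token_ring(n, steps):
    """Simulate n processes passing a single token around a ring."""
    holder = 0  # index of the process currently holding the token
    states = []
    for _ in range(steps):
        states.append([i == holder for i in range(n)])
        holder = (holder + 1) % n  # pass the token to the ring neighbour
    return states

def exactly_one_token(states):
    """The parameterised safety property, evaluated on one finite run."""
    return all(sum(state) == 1 for state in states)

# The invariant holds for every fixed ring size we try; an invariant model
# would let a single finite check stand in for all n.
ok = all(exactly_one_token(run_token_ring(n, 3 * n)) for n in range(2, 8))
```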
Certified Reinforcement Learning with Logic Guidance
This paper proposes the first model-free Reinforcement Learning (RL)
framework to synthesise policies for unknown continuous-state Markov Decision
Processes (MDPs) such that a given linear temporal property is satisfied. We
convert the given property into a Limit-Deterministic Büchi Automaton (LDBA),
namely a finite-state machine expressing the property.
Exploiting the structure of the LDBA, we shape a synchronous reward function
on-the-fly, so that an RL algorithm can synthesise a policy resulting in traces
that probabilistically satisfy the linear temporal property. This probability
(certificate) is also calculated in parallel with policy learning when the
state space of the MDP is finite: as such, the RL algorithm produces a policy
that is certified with respect to the property. Under the assumption of finite
state space, theoretical guarantees are provided on the convergence of the RL
algorithm to an optimal policy, maximising the above probability. We also show
that our method produces "best available" control policies when the logical
property cannot be satisfied. In the general case of a continuous state space,
we propose a neural network architecture for RL and we empirically show that
the algorithm finds satisfying policies, if there exist such policies. The
performance of the proposed framework is evaluated via a set of numerical
examples and benchmarks, where we observe an improvement of one order of
magnitude in the number of iterations required for the policy synthesis,
compared to existing approaches whenever available.
Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782
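The automaton-shaped reward can be sketched on a heavily simplified example (toy MDP, toy property; all names below are our own assumptions, not the paper's algorithm): the property "eventually reach state 3" corresponds to a two-state automaton that moves to its accepting state on first visiting state 3, reward 1 is handed out exactly on that automaton transition, and tabular Q-learning runs on the product state space.

```python
import random

# Hedged sketch: Q-learning on the product of a 4-state line MDP with a
# two-state "eventually GOAL" automaton; reward fires only when the
# automaton accepts, as in the reward-shaping idea described above.

random.seed(0)
N_STATES, ACTIONS, GOAL = 4, (-1, +1), 3
q_table = {(s, q, a): 0.0
           for s in range(N_STATES) for q in (0, 1) for a in ACTIONS}

def step(s, a):
    """Deterministic line MDP: move left/right, clipped to [0, N_STATES-1]."""
    return min(max(s + a, 0), N_STATES - 1)

for _ in range(2000):
    s, q = 0, 0  # MDP state, automaton state
    for _ in range(20):
        a = (random.choice(ACTIONS) if random.random() < 0.3
             else max(ACTIONS, key=lambda x: q_table[(s, q, x)]))
        s2 = step(s, a)
        q2 = 1 if (q == 1 or s2 == GOAL) else 0
        reward = 1.0 if (q == 0 and q2 == 1) else 0.0  # automaton accepts now
        best_next = max(q_table[(s2, q2, x)] for x in ACTIONS)
        q_table[(s, q, a)] += 0.5 * (reward + 0.9 * best_next
                                     - q_table[(s, q, a)])
        s, q = s2, q2
        if q == 1:
            break  # property satisfied on this trace

# The learned greedy action from the initial product state heads toward GOAL.
greedy = max(ACTIONS, key=lambda a: q_table[(0, 0, a)])
```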
Symbolic Verification and Strategy Synthesis for Linearly-Priced Probabilistic Timed Automata
Probabilistic timed automata are a formalism for modelling systems whose dynamics includes probabilistic, nondeterministic and timed aspects including real-time systems. A variety of techniques have been proposed for the analysis of this formalism and successfully employed to analyse, for example, wireless communication protocols and computer security systems. Augmenting the model with prices (or, equivalently, costs or rewards) provides a means to verify more complex quantitative properties, such as the expected energy usage of a device or the expected number of messages sent during a protocol’s execution. However, the analysis of these properties on probabilistic timed automata currently relies on a technique based on integer discretisation of real-valued clocks, which can be expensive in some cases. In this paper, we propose symbolic techniques for verification and optimal strategy synthesis for priced probabilistic timed automata which avoid this discretisation. We build upon recent work for the special case of expected time properties, using value iteration over a zone-based abstraction of the model
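The paper's algorithm works over zone-based abstractions of priced probabilistic timed automata; as a minimal finite illustration (the model and all names below are our own toy, not the paper's zone machinery), value iteration computes the minimal expected accumulated price to reach a target in a finite MDP with priced actions.

```python
# Hedged sketch: value iteration for minimal expected price-to-target on a
# finite MDP; each action carries a price and a distribution over successors.

def min_expected_price(states, actions, target, iters=100):
    """actions: dict state -> list of (price, {next_state: probability})."""
    value = {s: 0.0 for s in states}
    for _ in range(iters):
        new = {}
        for s in states:
            if s == target:
                new[s] = 0.0  # nothing left to pay at the target
            else:
                new[s] = min(price + sum(p * value[t] for t, p in dist.items())
                             for price, dist in actions[s])
        value = new
    return value

states = ['init', 'retry', 'done']
actions = {
    'init':  [(1.0, {'done': 0.5, 'retry': 0.5}),  # cheap but may fail
              (5.0, {'done': 1.0})],                # expensive but certain
    'retry': [(1.0, {'init': 1.0})],
}
v = min_expected_price(states, actions, 'done')
# Fixed point: v['init'] = 1 + 0.5*(1 + v['init']) => v['init'] = 3 < 5,
# so the optimal strategy keeps choosing the cheap action.
```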