Scalable Verification of Markov Decision Processes
Markov decision processes (MDPs) are useful to model concurrent process
optimisation problems, but verifying them with numerical methods is often
intractable. Existing approximative approaches do not scale well and are
limited to memoryless schedulers. Here we present the basis of scalable
verification for MDPs, using an O(1) memory representation of
history-dependent schedulers. We thus facilitate scalable learning techniques
and the use of massively parallel verification.
Comment: V4: FMDS version, 12 pages, 4 figure
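To make the idea of a constant-memory, history-dependent scheduler concrete, here is a minimal Python sketch. It is an assumption about how such a representation could work, not necessarily the authors' construction: a scheduler is just an integer, the trace is folded into a fixed-size running hash, and the action is derived from the pair (scheduler id, trace hash). The toy `MDP` and the names `step_hash`, `choose_action` and `simulate` are all hypothetical.

```python
import hashlib
import random

# Hypothetical toy MDP: state -> list of actions; each action is a list of
# (probability, next_state) pairs. Illustrative only.
MDP = {
    "s0": [[(0.5, "s1"), (0.5, "s0")], [(0.9, "s0"), (0.1, "goal")]],
    "s1": [[(1.0, "goal")], [(0.7, "s0"), (0.3, "s1")]],
    "goal": [[(1.0, "goal")]],
}


def step_hash(prev_hash: int, state: str) -> int:
    """Fold the newly visited state into a fixed-size hash of the trace."""
    data = prev_hash.to_bytes(8, "big") + state.encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def choose_action(scheduler_id: int, trace_hash: int, n_actions: int) -> int:
    """Pick an action index from (scheduler id, trace hash) alone."""
    data = scheduler_id.to_bytes(8, "big") + trace_hash.to_bytes(8, "big")
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:8], "big") % n_actions


def simulate(scheduler_id: int, seed: int, max_steps: int = 20) -> list:
    """Simulate one trace under the scheduler named by scheduler_id."""
    rng = random.Random(seed)  # resolves the probabilistic choices only
    state, trace_hash, trace = "s0", 0, ["s0"]
    for _ in range(max_steps):
        trace_hash = step_hash(trace_hash, state)  # depends on whole history
        actions = MDP[state]
        dist = actions[choose_action(scheduler_id, trace_hash, len(actions))]
        r, acc = rng.random(), 0.0
        for p, nxt in dist:
            acc += p
            if r < acc:
                state = nxt
                break
        else:
            state = dist[-1][1]  # guard against floating-point round-off
        trace.append(state)
        if state == "goal":
            break
    return trace
```

Under these assumptions the only per-scheduler storage is the integer id plus one running hash, regardless of trace length, and different ids can be simulated independently, which is what makes parallel sampling straightforward.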
Probabilistic Guarantees for Safe Deep Reinforcement Learning
Deep reinforcement learning has been successfully applied to many control
tasks, but the application of such agents in safety-critical scenarios has been
limited due to safety concerns. Rigorous testing of these controllers is
challenging, particularly when they operate in probabilistic environments due
to, for example, hardware faults or noisy sensors. We propose MOSAIC, an
algorithm for measuring the safety of deep reinforcement learning agents in
stochastic settings. Our approach is based on the iterative construction of a
formal abstraction of a controller's execution in an environment, and leverages
probabilistic model checking of Markov decision processes to produce
probabilistic guarantees on safe behaviour over a finite time horizon. It
produces bounds on the probability of safe operation of the controller for
different initial configurations and identifies regions where correct behaviour
can be guaranteed. We implement and evaluate our approach on agents trained for
several benchmark control problems.
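The kind of finite-horizon guarantee described above can be illustrated with a small backward-induction sketch. This is not the MOSAIC algorithm itself, only the generic computation a probabilistic model checker performs on an MDP: lower and upper bounds on the probability of staying safe for a given number of steps, taken over all resolutions of nondeterminism. The abstract model `ABSTRACT_MDP`, the set `UNSAFE` and the function `safety_bounds` are hypothetical.

```python
# Hypothetical abstract MDP: state -> {action: [(prob, next_state), ...]}.
ABSTRACT_MDP = {
    "ok": {"a": [(0.9, "ok"), (0.1, "risky")], "b": [(1.0, "ok")]},
    "risky": {"a": [(0.5, "ok"), (0.5, "fail")],
              "b": [(0.8, "risky"), (0.2, "fail")]},
    "fail": {"a": [(1.0, "fail")]},
}
UNSAFE = {"fail"}


def safety_bounds(mdp, unsafe, horizon):
    """Return (lo, hi): min/max probability of avoiding `unsafe` for
    `horizon` steps from each state, over all schedulers."""
    # With zero steps remaining, a state is safe iff it is not unsafe.
    lo = {s: 0.0 if s in unsafe else 1.0 for s in mdp}
    hi = dict(lo)
    for _ in range(horizon):
        new_lo, new_hi = {}, {}
        for s, actions in mdp.items():
            if s in unsafe:
                new_lo[s] = new_hi[s] = 0.0
                continue
            # Worst case and best case over the available actions.
            new_lo[s] = min(sum(p * lo[t] for p, t in d)
                            for d in actions.values())
            new_hi[s] = max(sum(p * hi[t] for p, t in d)
                            for d in actions.values())
        lo, hi = new_lo, new_hi
    return lo, hi


lo, hi = safety_bounds(ABSTRACT_MDP, UNSAFE, horizon=2)
```

States whose lower bound equals 1 correspond to initial configurations where safe behaviour over the horizon is guaranteed in this toy model, whichever action is taken.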
Should We Learn Probabilistic Models for Model Checking? A New Approach and An Empirical Study
Many automated system analysis techniques (e.g., model checking, model-based
testing) rely on first obtaining a model of the system under analysis. System
modeling is often done manually, which is widely considered a hindrance to
adopting model-based system analysis and development techniques. To overcome this
problem, researchers have proposed to automatically "learn" models based on
sample system executions and shown that the learned models can sometimes be
useful. There are, however, many questions to be answered. For instance, how
much shall we generalize from the observed samples and how fast would learning
converge? Or, would the analysis result based on the learned model be more
accurate than the estimation we could have obtained by sampling many system
executions within the same amount of time? In this work, we investigate
existing algorithms for learning probabilistic models for model checking,
propose an evolution-based approach for better controlling the degree of
generalization and conduct an empirical study in order to answer the questions.
One of our findings is that the effectiveness of learning may sometimes be
limited.
Comment: 15 pages, plus 2 reference pages, accepted by FASE 2017 in ETAP
Learning Markov Decision Processes for Model Checking
Constructing an accurate system model for formal model verification can be
both resource demanding and time-consuming. To alleviate this shortcoming,
algorithms have been proposed for automatically learning system models based on
observed system behaviors. In this paper we extend the algorithm for learning
probabilistic automata to reactive systems, where the observed system behavior
is in the form of alternating sequences of inputs and outputs. We propose an
algorithm for automatically learning a deterministic labeled Markov decision
process model from the observed behavior of a reactive system. The proposed
learning algorithm is adapted from algorithms for learning deterministic
probabilistic finite automata, and extended to include both probabilistic and
nondeterministic transitions. The algorithm is empirically analyzed and
evaluated by learning system models of slot machines. The evaluation is
performed by analyzing the probabilistic linear temporal logic properties of
the system as well as by analyzing the schedulers, in particular the optimal
schedulers, induced by the learned models.
Comment: In Proceedings QFM 2012, arXiv:1212.345
Stochastic Shortest Path with Energy Constraints in POMDPs
We consider partially observable Markov decision processes (POMDPs) with a
set of target states and positive integer costs associated with every
transition. The traditional optimization objective (stochastic shortest path)
asks to minimize the expected total cost until the target set is reached. We
extend the traditional framework of POMDPs to model energy consumption, which
represents a hard constraint. The energy levels may increase and decrease with
transitions, and the hard constraint requires that the energy level must remain
positive in all steps until the target is reached. First, we present a novel
algorithm for solving POMDPs with energy levels, building on existing POMDP
solvers and using RTDP as its main method. Our second contribution is related
to policy representation. For larger POMDP instances the policies computed by
existing solvers are too large to be understandable. We present an automated
procedure based on machine learning techniques that extracts the
important decisions of the policy, allowing us to compute succinct,
human-readable policies. Finally, we show experimentally that our algorithm performs
well and computes succinct policies on a number of POMDP instances from the
literature that were naturally enhanced with energy levels.
Comment: Technical report accompanying a paper published in proceedings of AAMAS 201
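The objective described in this abstract can be written out as a constrained optimisation problem. The notation below is an assumption introduced for illustration and does not come from the paper: $\sigma$ is a policy, $c$ the positive integer transition cost, $\delta$ the energy change of a transition, $e_0$ the initial energy, $G$ the target set and $T$ the first step at which the target is reached.

```latex
\begin{aligned}
\min_{\sigma}\quad & \mathbb{E}^{\sigma}\!\left[\sum_{t=0}^{T-1} c(s_t, a_t, s_{t+1})\right],
  \qquad T = \min\{\, t \ge 0 : s_t \in G \,\} \\
\text{subject to}\quad & e_{t+1} = e_t + \delta(s_t, a_t, s_{t+1}), \\
 & e_t > 0 \quad \text{for all } 0 \le t \le T .
\end{aligned}
```

Read this way, the expected cost is the quantity being optimised, while the energy condition is a hard constraint on every step up to $T$ rather than a term in the objective.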
Smart Sampling for Lightweight Verification of Markov Decision Processes
Markov decision processes (MDPs) are useful to model optimisation problems in
concurrent systems. Verifying MDPs with efficient Monte Carlo techniques
requires that their nondeterminism be resolved by a scheduler. Recent work has
introduced the elements of lightweight techniques to sample directly from
scheduler space, but finding optimal schedulers by simple sampling may be
inefficient. Here we describe "smart" sampling algorithms that can make
substantial improvements in performance.
Comment: IEEE conference style, 11 pages, 5 algorithms, 11 figures, 1 tabl