18,089 research outputs found
Automated Experiment Design for Data-Efficient Verification of Parametric Markov Decision Processes
We present a new method for statistical verification of quantitative
properties over a partially unknown system with actions, utilising a
parameterised model (in this work, a parametric Markov decision process) and
data collected from experiments performed on the underlying system. We obtain
the confidence that the underlying system satisfies a given property, and show
that the method uses data efficiently and thus is robust to the amount of data
available. These characteristics are achieved by firstly exploiting parameter
synthesis to establish a feasible set of parameters for which the underlying
system will satisfy the property; secondly, by actively synthesising
experiments to increase the amount of information in the collected data that is
relevant to the property; and finally propagating this information over the
model parameters, obtaining a confidence that reflects our belief whether or
not the system parameters lie in the feasible set, thereby solving the
verification problem.
Comment: QEST 2017, 18 pages, 7 figures
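The last step in the abstract above, a confidence that the system parameters lie in the feasible set, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: a single unknown transition probability, a Beta prior over it, a feasible set given as an interval, and a Monte Carlo estimate of the posterior mass. The function name and arguments are hypothetical.

```python
import random


def confidence_in_feasible_set(successes, failures, feasible_low, feasible_high,
                               prior_a=1.0, prior_b=1.0, samples=20_000, seed=0):
    """Estimate the posterior probability that an unknown transition
    probability lies in [feasible_low, feasible_high].

    Illustrative assumptions (not from the paper): one parameter, a
    Beta(prior_a, prior_b) prior, and Bernoulli observations of the transition.
    """
    rng = random.Random(seed)
    # Conjugate update: Beta prior + Bernoulli data gives a Beta posterior.
    a = prior_a + successes
    b = prior_b + failures
    # Monte Carlo estimate of the posterior mass inside the feasible interval.
    hits = sum(
        feasible_low <= rng.betavariate(a, b) <= feasible_high
        for _ in range(samples)
    )
    return hits / samples


if __name__ == "__main__":
    # Hypothetical data: the transition was taken 80 times out of 100 trials,
    # and the property is assumed to hold whenever the probability is >= 0.7.
    print(confidence_in_feasible_set(80, 20, 0.7, 1.0))
```

In this toy setting, the returned value plays the role of the confidence described in the abstract: it approaches 1 as the data concentrate the posterior inside the feasible interval and approaches 0 as they concentrate it outside.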
Certified Reinforcement Learning with Logic Guidance
This paper proposes the first model-free Reinforcement Learning (RL)
framework to synthesise policies for unknown, continuous-state Markov
Decision Processes (MDPs), such that a given linear temporal property is
satisfied. We convert the given property into a Limit Deterministic Büchi
Automaton (LDBA), namely a finite-state machine expressing the property.
Exploiting the structure of the LDBA, we shape a synchronous reward function
on-the-fly, so that an RL algorithm can synthesise a policy resulting in traces
that probabilistically satisfy the linear temporal property. This probability
(certificate) is also calculated in parallel with policy learning when the
state space of the MDP is finite: as such, the RL algorithm produces a policy
that is certified with respect to the property. Under the assumption of finite
state space, theoretical guarantees are provided on the convergence of the RL
algorithm to an optimal policy, maximising the above probability. We also show
that our method produces "best available" control policies when the logical
property cannot be satisfied. In the general case of a continuous state space,
we propose a neural network architecture for RL and we empirically show that
the algorithm finds satisfying policies whenever such policies exist. The
performance of the proposed framework is evaluated via a set of numerical
examples and benchmarks, where we observe an improvement of one order of
magnitude in the number of iterations required for the policy synthesis,
compared to existing approaches whenever available.
Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782
Toward Specification-Guided Active Mars Exploration for Cooperative Robot Teams
As a step towards achieving autonomy in space exploration missions, we consider a cooperative robotics system consisting of a copter and a rover. The goal of the copter is to explore an unknown environment so as to maximize knowledge about a science mission expressed in linear temporal logic that is to be executed by the rover. We model environmental uncertainty as a belief space Markov decision process and formulate the problem as a two-step stochastic dynamic program that we solve in a way that leverages the decomposed nature of the overall system. We demonstrate in simulations that the robot team makes intelligent decisions in the face of uncertainty.
Verification of Uncertain POMDPs Using Barrier Certificates
We consider a class of partially observable Markov decision processes
(POMDPs) with uncertain transition and/or observation probabilities. The
uncertainty takes the form of probability intervals. Such uncertain POMDPs can
be used, for example, to model autonomous agents with sensors with limited
accuracy, or agents undergoing a sudden component failure, or structural damage
[1]. Given an uncertain POMDP representation of the autonomous agent, our goal
is to propose a method for checking whether the system achieves optimal
performance while not violating a safety requirement (e.g. fuel level,
velocity, etc.). To this end, we cast the POMDP problem into a switched
system scenario. We then take advantage of this switched system
characterization and propose a method based on barrier certificates for
optimality and/or safety verification. We then show that the verification task
can be carried out computationally by sum-of-squares programming. We illustrate
the efficacy of our method by applying it to a Mars rover exploration example.
Comment: 8 pages, 4 figures
Computing Nash Equilibrium in Wireless Ad Hoc Networks: A Simulation-Based Approach
This paper studies the problem of computing Nash equilibrium in wireless
networks modeled by Weighted Timed Automata. This formalism comes with
a logic that can be used to describe complex features such as timed energy
constraints. Our contribution is a method for solving this problem using
Statistical Model Checking. The method has been implemented in the UPPAAL model
checker and has been applied to the analysis of Aloha CSMA/CD and IEEE 802.15.4
CSMA/CA protocols.
Comment: In Proceedings IWIGP 2012, arXiv:1202.422
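The core idea of Statistical Model Checking named in the abstract above, estimating the probability of a property from simulated runs with a statistical guarantee, can be sketched as follows. This is a toy under stated assumptions, not the paper's method or UPPAAL's implementation: the sample size comes from the Hoeffding bound, the simulated system is a hypothetical slotted-Aloha channel, and all function names and parameters are invented for illustration.

```python
import math
import random


def required_samples(epsilon, delta):
    """Number of runs so that the estimate is within epsilon of the true
    probability with confidence at least 1 - delta (Hoeffding bound)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def success_within(slots, nodes, tx_prob, rng):
    """Simulate one run of a hypothetical slotted-Aloha channel.

    Each node transmits independently with probability tx_prob in every slot;
    a slot succeeds when exactly one node transmits. Returns True if some slot
    among the first `slots` succeeds.
    """
    for _ in range(slots):
        transmitting = sum(rng.random() < tx_prob for _ in range(nodes))
        if transmitting == 1:
            return True
    return False


def estimate_probability(epsilon, delta, slots=5, nodes=3, tx_prob=1 / 3, seed=0):
    """Monte Carlo estimate of P(successful transmission within `slots`)."""
    rng = random.Random(seed)
    n = required_samples(epsilon, delta)
    hits = sum(success_within(slots, nodes, tx_prob, rng) for _ in range(n))
    return hits / n


if __name__ == "__main__":
    print(required_samples(0.05, 0.05))
    print(estimate_probability(0.05, 0.05))
```

For this toy channel the exact answer is available in closed form, 1 - (5/9)^5 with the default parameters, which makes it easy to check the estimate; for models such as timed automata with energy constraints no such formula is generally at hand, which is where simulation-based estimation becomes useful.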