2,070 research outputs found
A Learning Based Approach to Control Synthesis of Markov Decision Processes for Linear Temporal Logic Specifications
We propose to synthesize a control policy for a Markov decision process (MDP)
such that the resulting traces of the MDP satisfy a linear temporal logic (LTL)
property. We construct a product MDP that incorporates a deterministic Rabin
automaton generated from the desired LTL property. The reward function of the
product MDP is defined from the acceptance condition of the Rabin automaton.
This construction allows us to apply techniques from learning theory to the
problem of synthesis for LTL specifications even when the transition
probabilities are not known a priori. We prove that our method is guaranteed to
find a controller that satisfies the LTL property with probability one if such
a policy exists, and we suggest empirically with a case study in traffic
control that our method produces reasonable control strategies even when the
LTL property cannot be satisfied with probability one
Parametric LTL on Markov Chains
This paper is concerned with the verification of finite Markov chains against
parametrized LTL (pLTL) formulas. In pLTL, the until-modality is equipped with
a bound that contains variables; e.g., asserts that
holds within time steps, where is a variable on natural
numbers. The central problem studied in this paper is to determine the set of
parameter valuations for which the probability to
satisfy pLTL-formula in a Markov chain meets a given threshold , where is a comparison on reals and a probability. As for pLTL
determining the emptiness of is undecidable, we consider
several logic fragments. We consider parametric reachability properties, a
sub-logic of pLTL restricted to next and , parametric B\"uchi
properties and finally, a maximal subclass of pLTL for which emptiness of is decidable.Comment: TCS Track B 201
An Iterative Abstraction Algorithm for Reactive Correct-by-Construction Controller Synthesis
In this paper, we consider the problem of synthesizing
correct-by-construction controllers for discrete-time dynamical systems. A
commonly adopted approach in the literature is to abstract the dynamical system
into a Finite Transition System (FTS) and thus convert the problem into a two
player game between the environment and the system on the FTS. The controller
design problem can then be solved using synthesis tools for general linear
temporal logic or generalized reactivity(1) specifications. In this article, we
propose a new abstraction algorithm. Instead of generating a single FTS to
represent the system, we generate two FTSs, which are under- and
over-approximations of the original dynamical system. We further develop an
iterative abstraction scheme by exploiting the concept of winning sets, i.e.,
the sets of states for which there exists a winning strategy for the system.
Finally, the efficiency of the new abstraction algorithm is illustrated by
numerical examples.Comment: A shorter version has been accepted for publication in the 54th IEEE
Conference on Decision and Control (held Tuesday through Friday, December
15-18, 2015 at the Osaka International Convention Center, Osaka, Japan
Certified Reinforcement Learning with Logic Guidance
This paper proposes the first model-free Reinforcement Learning (RL)
framework to synthesise policies for unknown, and continuous-state Markov
Decision Processes (MDPs), such that a given linear temporal property is
satisfied. We convert the given property into a Limit Deterministic Buchi
Automaton (LDBA), namely a finite-state machine expressing the property.
Exploiting the structure of the LDBA, we shape a synchronous reward function
on-the-fly, so that an RL algorithm can synthesise a policy resulting in traces
that probabilistically satisfy the linear temporal property. This probability
(certificate) is also calculated in parallel with policy learning when the
state space of the MDP is finite: as such, the RL algorithm produces a policy
that is certified with respect to the property. Under the assumption of finite
state space, theoretical guarantees are provided on the convergence of the RL
algorithm to an optimal policy, maximising the above probability. We also show
that our method produces ''best available'' control policies when the logical
property cannot be satisfied. In the general case of a continuous state space,
we propose a neural network architecture for RL and we empirically show that
the algorithm finds satisfying policies, if there exist such policies. The
performance of the proposed framework is evaluated via a set of numerical
examples and benchmarks, where we observe an improvement of one order of
magnitude in the number of iterations required for the policy synthesis,
compared to existing approaches whenever available.Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782
- …