Discounting in LTL
In recent years there has been a growing need and interest in formalizing and
reasoning about the quality of software and hardware systems. As opposed to
traditional verification, where one asks whether or not a system satisfies a
given specification, reasoning about quality addresses the question of
\emph{how well} the system satisfies the specification. One direction in this
effort is to refine the "eventually" operators of temporal logic to {\em
discounting operators}: the satisfaction value of a specification is a value
in [0, 1], where the longer it takes to fulfill eventuality requirements, the
smaller the satisfaction value is.
In this paper we augment Linear Temporal Logic (LTL) with discounting, and
study the resulting logic as well as its combination with propositional
quality operators. We show that one can augment LTL with an arbitrary set of
discounting functions, while preserving the decidability of the model-checking
problem. Further augmenting the logic with unary propositional quality
operators preserves decidability, whereas adding an average operator makes some
problems undecidable. We also discuss the complexity of the model-checking
problem, as well as various extensions.
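To make the discounting idea concrete, here is a minimal sketch (not from the paper) of a discounted "eventually" over a finite trace, assuming an exponential discounting function d(i) = lam**i; the logic itself admits an arbitrary set of discounting functions.

```python
# Minimal sketch of a discounted "eventually": the satisfaction value of
# F_d p over a trace is sup_i d(i) taken over the steps i where p holds,
# so the later p first holds, the smaller the value in [0, 1]. The
# exponential discount is an assumption; the paper allows arbitrary
# discounting functions.

def discounted_eventually(trace, lam=0.9):
    """Satisfaction value of discounted-eventually p over a Boolean trace."""
    return max((lam ** i for i, p in enumerate(trace) if p), default=0.0)

# p first holds at step 3, so the satisfaction value is 0.9 ** 3 = 0.729.
print(discounted_eventually([False, False, False, True]))
```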
Policy Synthesis and Reinforcement Learning for Discounted LTL
The difficulty of manually specifying reward functions has led to an interest
in using linear temporal logic (LTL) to express objectives for reinforcement
learning (RL). However, LTL has the downside that it is sensitive to small
perturbations in the transition probabilities, which prevents probably
approximately correct (PAC) learning without additional assumptions. Time
discounting provides a way of removing this sensitivity, while retaining the
high expressivity of the logic. We study the use of discounted LTL for policy
synthesis in Markov decision processes with unknown transition probabilities,
and show how to reduce discounted LTL to discounted-sum reward via a reward
machine when all discount factors are identical.
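As an illustration of this reduction, the sketch below (hypothetical names, not the paper's exact construction) implements a reward machine whose transitions emit rewards, and computes the discounted-sum return under the single discount factor shared by all operators.

```python
# Illustrative reward machine: a finite automaton whose transitions emit
# rewards. With one discount factor lam shared by the whole formula, the
# discounted LTL value of a trajectory reduces to its discounted-sum return.

class RewardMachine:
    def __init__(self, delta, rho, initial):
        self.delta = delta    # (state, label) -> next state
        self.rho = rho        # (state, label) -> reward
        self.state = initial

    def step(self, label):
        reward = self.rho[(self.state, label)]
        self.state = self.delta[(self.state, label)]
        return reward

def discounted_return(rm, labels, lam=0.99):
    """Discounted-sum reward of a label sequence under the reward machine."""
    return sum(lam ** t * rm.step(l) for t, l in enumerate(labels))

# Two-state machine for "eventually goal": reward 1 on first seeing "goal".
delta = {(0, "goal"): 1, (0, "other"): 0, (1, "goal"): 1, (1, "other"): 1}
rho = {(0, "goal"): 1.0, (0, "other"): 0.0, (1, "goal"): 0.0, (1, "other"): 0.0}
rm = RewardMachine(delta, rho, initial=0)
print(discounted_return(rm, ["other", "other", "goal", "other"]))  # 0.99 ** 2
```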
Near-Optimal Scheduling for LTL with Future Discounting
We study the search problem for optimal schedulers for linear temporal logic
(LTL) with future discounting. The logic, introduced by Almagor, Boker and
Kupferman, is a quantitative variant of LTL in which an event in the far
future makes only a discounted contribution to the truth value (a real number
in the unit interval [0, 1]). The precise problem we study, which arises
naturally, e.g., in the search for a scheduler that recovers from an internal
error state as soon as possible, is the following: given a Kripke frame, a
formula, and a number in [0, 1] called a margin, find a path of the Kripke
frame that is optimal with respect to the formula up to the prescribed margin
(a truly optimal path may not exist). We present an algorithm for the problem;
it works even in the extended setting with propositional quality operators, a
setting where (threshold) model checking is known to be undecidable.
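One intuition for why a margin makes the search finite: under an exponential discount lam, everything beyond horizon n contributes at most lam**n, so exploring prefixes with lam**n <= margin suffices. The sketch below (hypothetical names and a toy Kripke frame; the paper's algorithm handles the full logic) illustrates this for a discounted "eventually".

```python
import math

# Bounded search for a near-optimal path: beyond the horizon where
# lam ** depth <= margin, no position can improve the value by more than
# the margin, so a finite exploration is enough.

def best_value_up_to_margin(succ, label, init, lam=0.9, margin=0.05):
    """Approximate the optimal discounted-eventually value up to `margin`."""
    horizon = math.ceil(math.log(margin) / math.log(lam))
    best = 0.0
    stack = [(init, 0)]
    while stack:
        state, depth = stack.pop()
        if label(state):
            best = max(best, lam ** depth)
        if depth < horizon:
            stack.extend((s, depth + 1) for s in succ[state])
    return best

# Toy frame 0 -> 1 -> 2 (self-loop on 2), with p holding only in state 2:
succ = {0: [1], 1: [2], 2: [2]}
print(best_value_up_to_margin(succ, lambda s: s == 2, init=0))  # 0.9 ** 2
```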
Model checking Quantitative Linear Time Logic
This paper considers QLtl, a quantitative analogue of Ltl, and presents algorithms for model checking QLtl over quantitative versions of Kripke structures and Markov chains.
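For flavor, one common quantitative semantics for the propositional connectives (an assumption here; the paper's exact definitions may differ) takes truth values in [0, 1], with min for conjunction, max for disjunction, and 1 - x for negation:

```python
# A common quantitative semantics for propositional connectives over [0, 1];
# assumed for illustration, not necessarily the semantics used by QLtl.

def q_and(x, y): return min(x, y)
def q_or(x, y): return max(x, y)
def q_not(x): return 1.0 - x

# A state where p holds to degree 0.7 and q to degree 0.4:
print(q_or(q_and(0.7, 0.4), q_not(0.7)))  # max(0.4, 0.3) = 0.4
```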
Sample Efficient Model-free Reinforcement Learning from LTL Specifications with Optimality Guarantees
Linear Temporal Logic (LTL) is widely used to specify high-level objectives
for system policies, and it is highly desirable for autonomous systems to learn
the optimal policy with respect to such specifications. However, learning the
optimal policy from LTL specifications is not trivial. We present a model-free
Reinforcement Learning (RL) approach that efficiently learns an optimal policy
for an unknown stochastic system modelled as a Markov Decision Process (MDP).
We propose a novel and more general product MDP, reward structure, and
discounting mechanism that, in conjunction with off-the-shelf model-free RL
algorithms, efficiently learns the optimal policy maximizing the probability
of satisfying a given LTL specification, with optimality guarantees. We also
provide improved theoretical results on choosing the key
parameters in RL to ensure optimality. To directly evaluate the learned policy,
we adopt the probabilistic model checker PRISM to compute the probability of
the policy satisfying such specifications. Several experiments on various
tabular MDP environments across different LTL tasks demonstrate the improved
sample efficiency and optimal policy convergence.
Comment: Accepted at the International Joint Conference on Artificial Intelligence 2023 (IJCAI 2023).
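To illustrate the general shape of such constructions, here is a sketch of the standard product-MDP idea behind LTL-to-RL reductions (not the paper's specific product, reward structure, or discounting mechanism; all names are hypothetical): pair each environment state with an automaton state tracking progress toward the LTL objective, and reward visits to accepting automaton states.

```python
import random

def product_step(env_step, aut_delta, accepting, state, action):
    """One step of a product MDP: advance environment and automaton together."""
    s, q = state
    s2, label = env_step(s, action)           # environment move + its label
    q2 = aut_delta[(q, label)]                # automaton move on that label
    reward = 1.0 if q2 in accepting else 0.0  # reward progress toward the goal
    return (s2, q2), reward

# Toy environment: action 1 moves toward the "goal"-labelled state w.p. 0.9.
def env_step(s, a):
    s2 = min(1, s + a) if random.random() < 0.9 else s
    return s2, ("goal" if s2 == 1 else "none")

aut_delta = {(0, "none"): 0, (0, "goal"): 1, (1, "none"): 1, (1, "goal"): 1}
state, ret = (0, 0), 0.0
for t in range(5):
    state, r = product_step(env_step, aut_delta, {1}, state, action=1)
    ret += 0.99 ** t * r  # discounted return of the product trajectory
print(ret)
```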