
    Discounting in LTL

    In recent years there has been a growing need and interest in formalizing and reasoning about the quality of software and hardware systems. Whereas traditional verification asks whether a system satisfies a given specification, reasoning about quality asks how well the system satisfies it. One direction in this effort is to refine the "eventually" operators of temporal logic into "discounting" operators: the satisfaction value of a specification is a value in [0,1], and the longer it takes to fulfill eventuality requirements, the smaller the satisfaction value is. In this paper we introduce an augmentation of Linear Temporal Logic (LTL) with discounting, and study it as well as its combination with propositional quality operators. We show that LTL can be augmented with an arbitrary set of discounting functions while preserving the decidability of the model-checking problem. Further augmenting the logic with unary propositional quality operators preserves decidability, whereas adding an average operator makes some problems undecidable. We also discuss the complexity of the problem, as well as various extensions.
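
    To give a rough sense of how such operators behave (a generic formulation, not necessarily the paper's exact definitions), a discounted "eventually" can be given a value semantics along the following lines, where \eta is a discounting function (a decreasing function from the naturals to [0,1] tending to 0) and \pi^j is the suffix of the computation \pi starting at position j:

    \[
      [\![ \mathrm{F}_{\eta}\,\varphi ]\!](\pi) \;=\; \sup_{j \ge 0}\; \eta(j) \cdot [\![ \varphi ]\!](\pi^{j})
    \]

    With the exponential discounting \eta(j) = \lambda^j for some \lambda in (0,1), a requirement fulfilled only after j steps contributes at most \lambda^j to the overall satisfaction value.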

    Policy Synthesis and Reinforcement Learning for Discounted LTL

    The difficulty of manually specifying reward functions has led to an interest in using linear temporal logic (LTL) to express objectives for reinforcement learning (RL). However, LTL has the downside that it is sensitive to small perturbations in the transition probabilities, which prevents probably approximately correct (PAC) learning without additional assumptions. Time discounting provides a way of removing this sensitivity, while retaining the high expressivity of the logic. We study the use of discounted LTL for policy synthesis in Markov decision processes with unknown transition probabilities, and show how to reduce discounted LTL to discounted-sum reward via a reward machine when all discount factors are identical.
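
    As a minimal sketch of this kind of reduction (assuming a single discount factor and a hand-written reward machine; the names and the toy machine below are illustrative, not the paper's construction), a reward machine maps label sequences to immediate rewards, and the discounted sum of those rewards plays the role of the discounted LTL value:

    # Sketch: a reward machine whose discounted-sum reward mirrors a discounted
    # "eventually goal" objective when every discount factor equals LAMBDA.
    # All names here (RewardMachine, delta, reward) are illustrative assumptions.
    LAMBDA = 0.9  # the single discount factor assumed for all operators

    class RewardMachine:
        """Finite-state machine over atomic-proposition labels, with rewards."""
        def __init__(self, states, init, delta, reward):
            self.states = states   # machine states
            self.state = init      # current state
            self.delta = delta     # (state, label) -> next state
            self.reward = reward   # (state, label) -> immediate reward

        def step(self, label):
            r = self.reward[(self.state, label)]
            self.state = self.delta[(self.state, label)]
            return r

    def discounted_return(machine, labels, gamma=LAMBDA):
        """Discounted-sum reward of a finite label trace under the machine."""
        total, discount = 0.0, 1.0
        for lab in labels:
            total += discount * machine.step(lab)
            discount *= gamma
        return total

    # Toy machine for "eventually goal": reward 1 the first time 'goal' holds.
    delta = {("u0", "goal"): "u1", ("u0", "other"): "u0",
             ("u1", "goal"): "u1", ("u1", "other"): "u1"}
    reward = {("u0", "goal"): 1.0, ("u0", "other"): 0.0,
              ("u1", "goal"): 0.0, ("u1", "other"): 0.0}
    rm = RewardMachine({"u0", "u1"}, "u0", delta, reward)
    print(discounted_return(rm, ["other", "other", "goal"]))  # ~ LAMBDA**2 = 0.81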

    Near-Optimal Scheduling for LTL with Future Discounting

    We study the search problem for optimal schedulers for linear temporal logic (LTL) with future discounting. The logic, introduced by Almagor, Boker and Kupferman, is a quantitative variant of LTL in which an event in the far future contributes only a discounted amount to a truth value (a real number in the unit interval [0, 1]). The precise problem we study arises naturally, for example, in the search for a scheduler that recovers from an internal error state as soon as possible. It is the following: given a Kripke frame, a formula and a number in [0, 1] called a margin, find a path of the Kripke frame that is optimal with respect to the formula up to the prescribed margin (a truly optimal path may not exist). We present an algorithm for the problem; it works even in the extended setting with propositional quality operators, a setting in which (threshold) model checking is known to be undecidable.
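
    For the simplest special case, a single exponentially discounted eventuality F_\lambda p over a Kripke frame, the role of the margin can be seen in a short breadth-first search: the optimal value from a state is \lambda^d for d the distance to the nearest p-state, and the search may stop once \lambda^depth falls below the margin (this sketch is illustrative only; the paper's algorithm handles full discounted LTL, not just this case).

    from collections import deque

    def near_optimal_eventually(succ, labels, start, lam, margin):
        """BFS over a finite Kripke frame; return (value, witness path)."""
        queue = deque([(start, [start])])
        seen = {start}
        value = 1.0                          # lam ** (current BFS depth)
        while queue and value >= margin:
            for _ in range(len(queue)):      # one BFS layer = one time step
                state, path = queue.popleft()
                if "p" in labels[state]:
                    return value, path       # nearest p-state, hence optimal
                for nxt in succ[state]:
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append((nxt, path + [nxt]))
            value *= lam
        return 0.0, []                       # anything reachable now is below the margin

    # Toy frame: s0 -> s1 -> s2, with p holding only in s2.
    succ = {"s0": ["s1"], "s1": ["s2"], "s2": ["s2"]}
    labels = {"s0": set(), "s1": set(), "s2": {"p"}}
    print(near_optimal_eventually(succ, labels, "s0", 0.9, 0.01))  # (0.81, ['s0', 's1', 's2'])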

    Model checking Quantitative Linear Time Logic

    This paper considers QLtl, a quantitative analogue of Ltl, and presents algorithms for model checking QLtl over quantitative versions of Kripke structures and Markov chains.

    Sample Efficient Model-free Reinforcement Learning from LTL Specifications with Optimality Guarantees

    Linear Temporal Logic (LTL) is widely used to specify high-level objectives for system policies, and it is highly desirable for autonomous systems to learn the optimal policy with respect to such specifications. However, learning the optimal policy from LTL specifications is not trivial. We present a model-free Reinforcement Learning (RL) approach that efficiently learns an optimal policy for an unknown stochastic system, modelled using Markov Decision Processes (MDPs). We propose a novel and more general product MDP, reward structure and discounting mechanism that, when applied in conjunction with off-the-shelf model-free RL algorithms, efficiently learn the optimal policy that maximizes the probability of satisfying a given LTL specification, with optimality guarantees. We also provide improved theoretical results on choosing the key parameters in RL to ensure optimality. To directly evaluate the learned policy, we adopt the probabilistic model checker PRISM to compute the probability of the policy satisfying such specifications. Several experiments on various tabular MDP environments across different LTL tasks demonstrate the improved sample efficiency and optimal policy convergence. (Accepted at the International Joint Conference on Artificial Intelligence, IJCAI 2023.)
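
    The common shape of such a construction (shown here only as a hedged sketch of the general recipe, not the paper's more general product MDP and reward/discounting scheme) is to run an automaton for the LTL formula in lockstep with the environment, reward visits to accepting automaton states, and apply a state-dependent discount:

    # Sketch: product of an MDP with an automaton for the LTL formula.
    # Accepting automaton states yield reward 1 - gamma_b and discount gamma_b;
    # all other states yield reward 0 and discount gamma. The interfaces
    # (mdp.reset/step/label, aut.initial/delta/accepting) are assumptions.
    class ProductEnv:
        def __init__(self, mdp, aut, gamma=0.999, gamma_b=0.99):
            self.mdp, self.aut = mdp, aut
            self.gamma, self.gamma_b = gamma, gamma_b

        def reset(self):
            self.s, self.q = self.mdp.reset(), self.aut.initial
            return (self.s, self.q)

        def step(self, action):
            self.s = self.mdp.step(self.s, action)                    # MDP move
            self.q = self.aut.delta(self.q, self.mdp.label(self.s))   # automaton move
            if self.aut.accepting(self.q):
                return (self.s, self.q), 1.0 - self.gamma_b, self.gamma_b
            return (self.s, self.q), 0.0, self.gamma

    An off-the-shelf algorithm such as tabular Q-learning can then be run over (MDP state, automaton state) pairs, applying the per-step discount returned by step; as the discount factors approach 1, the optimal policy of the product tends toward one maximizing the satisfaction probability (stated here informally).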