77 research outputs found

Reinforcement Learning With Temporal Logic Rewards

Author: Belta Calin
Li Xiao
Vasile Cristian-Ioan
Publication date: 01/01/2017

Reinforcement learning (RL) depends critically on the choice of reward functions used to capture the desired behavior and constraints of a robot. Usually, these are handcrafted by an expert designer and represent heuristics for relatively simple tasks. Real-world applications typically involve more complex tasks with rich temporal and logical structure. In this paper we take advantage of the expressive power of temporal logic (TL) to specify complex rules the robot should follow, and incorporate domain knowledge into learning. We propose Truncated Linear Temporal Logic (TLTL) as a specification language that is arguably well suited for robotics applications, together with quantitative semantics, i.e., a robustness degree. We propose an RL approach to learn tasks expressed as TLTL formulae that uses their associated robustness degree as the reward function, instead of manually crafted heuristics trying to capture the same specifications. We show in simulated trials that learning is faster and that policies obtained using the proposed approach outperform the ones learned using heuristic rewards in terms of the robustness degree, i.e., how well the tasks are satisfied. Furthermore, we demonstrate the proposed RL approach in a toast-placing task learned by a Baxter robot.
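
The idea of using a robustness degree as a reward can be illustrated with a minimal Python sketch. It assumes a one-dimensional trajectory and the common max/min quantitative semantics for "eventually", "always", and conjunction. The helper names (`gt`, `lt`, `eventually`, `always`, `conj`, `episode_reward`) and the example task are hypothetical, and the paper's exact TLTL definitions may differ.

```python
from typing import Callable, Sequence

Trace = Sequence[float]
Robustness = Callable[[Trace], float]


def gt(c: float) -> Robustness:
    # Robustness of the predicate "x > c", evaluated at the first state.
    return lambda tr: tr[0] - c


def lt(c: float) -> Robustness:
    # Robustness of the predicate "x < c", evaluated at the first state.
    return lambda tr: c - tr[0]


def eventually(f: Robustness) -> Robustness:
    # Best robustness of f over all suffixes of the (finite) trace.
    return lambda tr: max(f(tr[t:]) for t in range(len(tr)))


def always(f: Robustness) -> Robustness:
    # Worst robustness of f over all suffixes of the (finite) trace.
    return lambda tr: min(f(tr[t:]) for t in range(len(tr)))


def conj(f: Robustness, g: Robustness) -> Robustness:
    # A conjunction is only as robust as its weakest conjunct.
    return lambda tr: min(f(tr), g(tr))


# Hypothetical task: eventually reach x > 1 while always keeping x < 2.
spec = conj(eventually(gt(1.0)), always(lt(2.0)))


def episode_reward(trajectory: Trace) -> float:
    # Positive values mean the task is satisfied, negative values violated.
    return spec(trajectory)
```

In this sketch the reward is a single number per episode, so it would suit an episodic policy-search method rather than a per-step update.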

arXiv.org e-Print Archive

Crossref

Boston University Institutional Repository (OpenBU)

A Metric for Linear Temporal Logic

Author: Iannopollo Antonio
Lee Edward A.
Lohstroh Marten
Romeo Íñigo Íncer
Sangiovanni-Vincentelli Alberto
Publication date: 30/11/2018

We propose a measure and a metric on the sets of infinite traces generated by a set of atomic propositions. To compute these quantities, we first map properties to subsets of the real numbers and then take the Lebesgue measure of the resulting sets. We analyze how this measure is computed for Linear Temporal Logic (LTL) formulas. An implementation for computing the measure of bounded LTL properties is provided and explained. This implementation leverages SAT model counting and performs independence checks on subexpressions to compute the measure and metric compositionally.
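
As a rough illustration of measuring a bounded property by counting, the following Python sketch enumerates all traces up to a fixed horizon and returns the fraction that satisfy a property. It is a brute-force stand-in under stated assumptions, not the paper's SAT-based compositional implementation: the function names are hypothetical, and taking the distance to be the measure of the symmetric difference is an assumption of this sketch.

```python
from itertools import product
from typing import Callable, Tuple

Step = Tuple[bool, ...]          # truth value of each atomic proposition
BoundedTrace = Tuple[Step, ...]  # one Step per time point
Property = Callable[[BoundedTrace], bool]


def measure(prop: Property, num_props: int, horizon: int) -> float:
    # Fraction of all bounded traces (uniformly weighted) that satisfy prop.
    valuations = list(product([False, True], repeat=num_props))
    total = len(valuations) ** horizon
    satisfying = sum(1 for tr in product(valuations, repeat=horizon) if prop(tr))
    return satisfying / total


def distance(f: Property, g: Property, num_props: int, horizon: int) -> float:
    # Assumed metric: measure of the traces on which f and g disagree.
    return measure(lambda tr: f(tr) != g(tr), num_props, horizon)


def eventually_p(trace: BoundedTrace) -> bool:
    # "p holds at some step", with p the first atomic proposition.
    return any(step[0] for step in trace)


def always_p(trace: BoundedTrace) -> bool:
    # "p holds at every step".
    return all(step[0] for step in trace)
```

Enumeration is exponential in the number of propositions and the horizon, which is the cost that model counting and independence checks are meant to reduce.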

arXiv.org e-Print Archive

eScholarship - University of California

Service composition in stochastic settings

Author: B Medjahed
D Berardi
D Menasce
D Wu
G Giacomo De
HJ Levesque
J Bronsted
J Cardoso
J Yang
L Zeng
M Pistore
MJ Fischer
R Hull
S Thiébaux
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

With the growth of the Internet-of-Things and online Web services, more services with more capabilities are available to us. The ability to generate new, more useful services from existing ones has been the focus of much research for over a decade. The goal is, given a specification of the behavior of the target service, to build a controller, known as an orchestrator, that uses existing services to satisfy the requirements of the target service. The model of services and requirements used in most work is that of a finite state machine. This implies that the specification can either be satisfied or not, with no middle ground. This is a major drawback, since often an exact solution cannot be obtained. In this paper we study a simple stochastic model for service composition: we annotate the target service with probabilities describing the likelihood of requesting each action in a state, and with rewards for being able to execute actions. We show how to solve the resulting problem by solving a certain Markov Decision Process (MDP) derived from the service and requirement specifications. The solution to this MDP induces an orchestrator that coincides with the exact solution if a composition exists. Otherwise, it provides an approximate solution that maximizes the expected sum of values of user requests that can be serviced. The model studied, although simple, sheds light on composition in stochastic settings, and we discuss several possible extensions.

Crossref

Archivio della ricerca- Università di Roma La Sapienza

Clock specifications for temporal tasks in planning and learning

Author: De Giacomo Giuseppe
Favorito Marco
Patrizi Fabio
Publication venue: CEUR Workshop Proceedings
Publication date: 29/01/2024

Recently, Linear Temporal Logics on finite traces, such as LTLf (or LDLf), have been advocated as high-level formalisms to express dynamic properties, such as goals in planning domains or rewards in Reinforcement Learning (RL). This paper addresses the challenge of separating high-level temporal specifications from the low-level details of the underlying environment (domain or MDP), by allowing the specifications to be expressed at a different time granularity than the environment. We study the notion of a clock which progresses the high-level LTLf specification, and whose ticks are triggered by dynamic (low-level) properties defined on the underlying environment. The obtained separation enables terse high-level specifications while allowing for very expressive forms of clocks, expressed as general LTLf properties over low-level features, such as counting or occurrence/alternation of special events. We devise an automata-based construction to compile away the clock into a deterministic automaton that is polynomial in the size of the automata characterizing the high-level and clock specifications. We show the correctness of the approach and discuss its application in several contexts, including FOND planning, RL with LTLf Restraining Bolts, and Reward Machines.

Oxford University Research Archive

LTLf/LDLf Non-Markovian Rewards

Author: Brafman Ronen Israel
De Giacomo Giuseppe
Patrizi Fabio
Publication venue: AAAI Press
Publication date: 01/01/2018

In Markov Decision Processes (MDPs), the reward obtained in a state is Markovian, i.e., it depends only on the last state and action. This dependency makes it difficult to reward more interesting long-term behaviors, such as always closing a door after it has been opened, or providing coffee only following a request. Extending MDPs to handle non-Markovian reward functions was the subject of two previous lines of work. Both use LTL variants to specify the reward function and then compile the new model back into a Markovian model. Building on recent progress in temporal logics over finite traces, we adopt LDLf for specifying non-Markovian rewards and provide an elegant automata construction for building a Markovian model, which extends that of previous work and offers strong minimality and compositionality guarantees.
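
The "coffee only following a request" example can be sketched as a small hand-written automaton that runs alongside the environment. This is an illustration only, not the LDLf-to-automaton construction from the paper: the state names, event names, and reward value are hypothetical. The point is that the reward becomes Markovian once the automaton state is added to the environment state.

```python
from typing import Iterable, Tuple

IDLE, REQUESTED = "idle", "requested"


def step(q: str, event: str) -> Tuple[str, float]:
    # One transition of the reward automaton: returns (next state, reward).
    if q == IDLE:
        # Coffee served without a pending request earns nothing.
        return (REQUESTED, 0.0) if event == "request" else (IDLE, 0.0)
    # q == REQUESTED: a request is pending.
    if event == "coffee":
        return IDLE, 1.0
    return REQUESTED, 0.0


def total_reward(events: Iterable[str]) -> float:
    # Run the automaton over a finite trace of events and sum the rewards.
    q, total = IDLE, 0.0
    for event in events:
        q, reward = step(q, event)
        total += reward
    return total
```

A learner that observes the pair (environment state, automaton state) sees a reward that depends only on the current pair and event, even though the reward depends on history in the original state space.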

Archivio della ricerca- Università di Roma La Sapienza

Association for the Advancement of Artificial Intelligence: AAAI Publications

LTLf and LDLf Monitoring: A Technical Report

Author: De Giacomo Giuseppe
De Masellis Riccardo
Grasso Marco
Maggi Fabrizio
Montali Marco
Publication date: 01/01/2014

Runtime monitoring is one of the central tasks in providing operational decision support to running business processes, and in checking on-the-fly whether they comply with constraints and rules. We study runtime monitoring of properties expressed in LTL on finite traces (LTLf) and in its extension LDLf. LDLf is a powerful logic that captures all monadic second-order logic on finite traces, and is obtained by combining regular expressions and LTLf, adopting the syntax of propositional dynamic logic (PDL). Interestingly, in spite of its greater expressivity, LDLf has exactly the same computational complexity as LTLf. We show that LDLf is able to capture, in the logic itself, not only the constraints to be monitored, but also the de-facto standard RV-LTL monitors. This makes it possible to declaratively capture monitoring metaconstraints, and to check them by relying on usual logical services instead of ad-hoc algorithms. This, in turn, makes it possible to flexibly monitor constraints depending on the monitoring state of other constraints, e.g., "compensation" constraints that are only checked when others are detected to be violated. In addition, we devise a direct translation of LDLf formulas into nondeterministic automata, avoiding a detour through Büchi automata or alternating automata, and we use it to implement a monitoring plug-in for the ProM suite.
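
A four-valued, RV-LTL-style verdict can be sketched over a deterministic automaton by asking which kinds of states are still reachable from the current one. The Python sketch below is a minimal illustration under assumptions: the automaton is written by hand (it is not produced by the LDLf translation described above), the verdict labels are this sketch's wording, and the example constraint ("every a is eventually followed by b, and c never occurs") is hypothetical.

```python
from typing import Dict, List, Set, Tuple

Delta = Dict[Tuple[int, str], int]


def reachable(delta: Delta, start: int) -> Set[int]:
    # All states reachable from start, including start itself.
    seen, stack = {start}, [start]
    while stack:
        q = stack.pop()
        for (src, _symbol), dst in delta.items():
            if src == q and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def verdict(delta: Delta, accepting: Set[int], q: int) -> str:
    reach = reachable(delta, q)
    can_accept = any(s in accepting for s in reach)
    can_reject = any(s not in accepting for s in reach)
    if q in accepting:
        return "temporarily satisfied" if can_reject else "permanently satisfied"
    return "temporarily violated" if can_accept else "permanently violated"


# State 0: nothing pending (accepting); 1: an "a" awaits a "b"; 2: "c" seen (sink).
DELTA: Delta = {
    (0, "a"): 1, (0, "b"): 0, (0, "c"): 2,
    (1, "a"): 1, (1, "b"): 0, (1, "c"): 2,
    (2, "a"): 2, (2, "b"): 2, (2, "c"): 2,
}
ACCEPTING: Set[int] = {0}


def monitor(trace: List[str]) -> List[str]:
    # Verdict after each observed event, starting from state 0.
    q, verdicts = 0, []
    for symbol in trace:
        q = DELTA[(q, symbol)]
        verdicts.append(verdict(DELTA, ACCEPTING, q))
    return verdicts
```

Recomputing reachability at every step keeps the sketch short; a practical monitor would precompute the verdict of each state once.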

arXiv.org e-Print Archive

Pure OAI Repository

Interestingness of traces in declarative process mining: The Janus LTLpf Approach

Author: C Ciccio Di
D Gabbay
FM Maggi
J Adamo
JG Henriksen
LT Ly
M Leoni de
M Reynolds
N Markey
O Kupferman
O Lichtenstein
P Gastin
T Hildebrandt
YS Ramakrishna
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

Declarative process mining is the set of techniques aimed at extracting behavioural constraints from event logs. These constraints are inherently of a reactive nature, in that their activation restricts the occurrence of other activities. In this way, they are prone to the principle of ex falso quodlibet: they can be satisfied even when not activated. As a consequence, constraints can be mined that are hardly interesting to users or even potentially misleading. In this paper, we build on the observation that users typically read and write temporal constraints as if-statements with an explicit indication of the activation condition. Our approach is called Janus, because it permits the specification and verification of reactive constraints that, upon activation, look forward into the future and backwards into the past of a trace. Reactive constraints are expressed using Linear-time Temporal Logic with Past on Finite Traces (LTLpf). To mine them out of event logs, we devise a time bi-directional valuation technique based on triplets of automata operating in an on-line fashion. Our solution proves efficient, being at most quadratic w.r.t. trace length, and effective in recognising interestingness of discovered constraints.
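
The difference between a constraint that is satisfied because it was activated and fulfilled, and one that is satisfied only because it was never activated, can be sketched with a naive check. The Python below is a plain quadratic scan and does not reproduce the automata-based technique of the paper. The function names and the example constraint ("whenever the activation occurs, `past_target` occurred before it or `future_target` occurs after it") are hypothetical.

```python
from typing import List, Tuple


def evaluate(trace: List[str], activation: str,
             past_target: str, future_target: str) -> Tuple[int, int]:
    # Returns (number of activations, number of fulfilled activations).
    activations = fulfilled = 0
    for i, event in enumerate(trace):
        if event != activation:
            continue
        activations += 1
        # Look backwards into the past and forwards into the future of the trace.
        if past_target in trace[:i] or future_target in trace[i + 1:]:
            fulfilled += 1
    return activations, fulfilled


def is_vacuously_satisfied(trace: List[str], activation: str,
                           past_target: str, future_target: str) -> bool:
    # True when the constraint holds only because it was never activated.
    activations, _ = evaluate(trace, activation, past_target, future_target)
    return activations == 0
```

Reporting activations and fulfilments separately lets a trace with no activations be told apart from one in which every activation was fulfilled.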

Crossref

Archivio della ricerca- Università di Roma La Sapienza