Recently, Linear Temporal Logics on finite traces, such as LTLf (or LDLf), have been advocated as high-level formalisms to express dynamic properties, such as goals in planning domains or rewards in Reinforcement Learning (RL). This paper addresses the challenge of separating high-level temporal specifications from the low-level details of the underlying environment (domain or MDP), by allowing the specifications to be expressed at a different time granularity than the environment. We study the notion of a clock which progresses the high-level LTLf specification, whose ticks are triggered by dynamic (low-level) properties defined on the underlying environment. The obtained separation enables terse high-level specifications while allowing for very expressive forms of clock expressed as general LTLf properties over low-level features, such as counting or occurrence/alternation of special events. We devise an automata-based construction to compile away the clock into a deterministic automaton that is polynomial in the size of the automata characterizing the high-level and clock specifications. We show the correctness of the approach and discuss its application in several contexts, including FOND planning, RL with LTLf Restraining Bolts, and Reward Machines.

De Giacomo, Giuseppe

Favorito, Marco

Patrizi, Fabio

English

Oxford University Research Archive

Clock Specifications for Temporal Tasks in Planning and Learning

Giuseppe De Giacomo (1,2), Marco Favorito (3) and Fabio Patrizi (2)
(1) University of Oxford, UK
(2) Sapienza University of Rome, Italy
(3) Banca d'Italia, Italy

Abstract

Recently, Linear Temporal Logics on finite traces, such as LTLf (or LDLf), have been advocated as high-level formalisms to express dynamic properties, such as goals in planning domains or rewards in Reinforcement Learning (RL). This paper addresses the challenge of separating high-level temporal specifications from the low-level details of the underlying environment (domain or MDP), by allowing the specifications to be expressed at a different time granularity than the environment. We study the notion of a clock which progresses the high-level LTLf specification, whose ticks are triggered by dynamic (low-level) properties defined on the underlying environment. The obtained separation enables terse high-level specifications while allowing for very expressive forms of clock expressed as general LTLf properties over low-level features, such as counting or occurrence/alternation of special events. We devise an automata-based construction to compile away the clock into a deterministic automaton that is polynomial in the size of the automata characterizing the high-level and clock specifications. We show the correctness of the approach and discuss its application in several contexts, including FOND planning, RL with LTLf Restraining Bolts, and Reward Machines.

Keywords

Temporal Logics, Automata Theory, Planning and Learning for Temporal Tasks

1. Introduction

Linear Temporal Logic on finite traces (LTLf) [1] has been advocated as a proper variant of LTL interpreted over finite traces. Moreover, the authors of [1] propose a novel formalism, Linear Dynamic Logic on finite traces (LDLf), which has higher expressive power at no additional cost in computational complexity; it is as expressive as regular expressions, while retaining the declarative nature and intuitive appeal of LTLf.
Both LTLf and LDLf have been quite successful in the AI and Formal Methods communities in recent years. For example, they have been used for finite temporal synthesis [2, 3, 4, 5], in Fully-Observable Non-Deterministic (FOND) Planning for LTLf goals [6, 7, 8, 9], for reward function specification in the theory of Markov Decision Processes (MDP) [10, 11] and in Reinforcement Learning (RL) [12] with temporal logic rewards [13, 14].

OVERLAY 2023: 5th Workshop on Artificial Intelligence and Formal Verification, Logic, Automata, and Synthesis, November 7, 2023, Rome, Italy
Email: giuseppe.degiacomo@cs.ox.ac.uk (G. De Giacomo); marco.favorito@bancaditalia.it (M. Favorito); patrizi@diag.uniroma1.it (F. Patrizi)
ORCID: 0000-0001-9680-7658 (G. De Giacomo); 0000-0001-9566-3576 (M. Favorito); 0000-0002-9116-251X (F. Patrizi)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.

The use of task specification languages, e.g., in the form of LTLf/LDLf formulas, has allowed greater richness in goal specifications, and has improved the modularity of AI systems by providing a clear separation between the goal and the environment. However, despite these successes, there is a crucial issue that, to the best of our knowledge, has not been studied yet. So far, it has been implicitly assumed that the time granularity of the task specification and the time granularity at which the agent acts in the world are synchronized. In other words, agent timesteps are in one-to-one correspondence with task timesteps. While this assumption is not limiting in terms of what specifications can be expressed, we argue that it is limiting in terms of how they can be expressed.
Conceptually, the synchrony assumption between the designer and the agent is not realistic, as these are two different entities which might have different cognitive systems, and therefore different perceptions of the world. In particular, the designer and the agent might have different temporal processing capacities. The task desired by the designer is expressed from the designer's perspective but has to be executed by the agent, which has its own understanding of the world and the task.

Consider the following scenario: an RL agent (a computer program) playing the Atari game Breakout [15], and a human designer who assigns the task of breaking the columns of bricks from left to right (as in [14]). The designer's task can be expressed in LTLf in terms of the "next" operator, denoted by "$\circ$": $\circ(c_1 \wedge \circ(c_2 \wedge \ldots))$. However, these two entities have completely different perceptions of the world. On the one hand, the RL agent observes the pixels of the game screen, and has access to the Atari Breakout simulator; hence, the timestep of the environment is under the control of the agent itself. On the other hand, the designer has a common-sense understanding of the environment, and has proposed a task based on their own perception. In particular, here we focus on the notion of what the "next timestep" is for such entities. While for the agent the "next timestep" coincides with the "next frame", for the designer it makes more sense to consider more abstract or higher-level timesteps, such as "the next removed brick" or "the next removed column". Given this unavoidable discrepancy, the designer should instruct the agent about how to interpret the designer's task according to the time resolution of the agent's perception. Without any further instruction, the original designer's task cannot be correctly interpreted by the agent, because the meaning of the "next" operator is based on the agent's timestep resolution, i.e. the next frame.
Therefore, the designer is forced to express the goal specification in terms of stutter-invariant operators [16], e.g. in terms of eventually operators: $\Diamond(c_1 \wedge \Diamond(c_2 \wedge \ldots))$. The task specification might be more naturally expressed in a different time granularity than the agent's, but there must still be a sort of "glue" between the two granularities.

[Figure 1: How clock specifications work.]

[Figure 2: An example of clock product. From left to right: DFA for $\phi_{goal} = \Diamond b$, DFA for $\phi_{clock} = \Diamond(a \wedge last)$, and minimized $\mathcal{A}_{goal \times clock}$.]

Related works. The topic of different temporal abstractions within the same information system has been investigated for decades in computer science. Several different formalisms to finitely represent infinite-time granularities have been proposed in the literature, based on algebraic [17, 18, 19, 20], logical [21, 22, 23, 24], string-based [25], and automaton-based [26, 27, 28, 29] approaches; see [30] for a survey on the topic. However, instead of devising ad-hoc temporal goal specification languages or specific automata-based techniques, as the references above do, we would like both to keep the LTLf formalism intact and to rely on classic automata theory, while allowing the designer to specify the clock and automated techniques to use it. This would give us broader impact in the wide community that is using LTLf, and better reliance on the wide availability of supporting tools. Another line of research, aimed at extending temporal logic with the so-called clock operator, is described in [31, 32]. The clock operator was proposed in the context of modern hardware design, in which there is no notion of a single clock. Such an operator allows us to disambiguate which clock to use in order to evaluate a temporal formula or, in other words, what the "next timestep" is. Both LTL@ [31] and PSL [32] extend LTL to support clock operators.
Again, our purpose is not to change the amenable syntax of LTLf, but to provide a tool for AI designers to specify the timestep granularity for semantic evaluation. Moreover, in their case, the clock only depends on the current instant, while we consider clocks specified using temporal logic formulas.

Contributions. In this work, we are interested in the notion of clock specification, i.e. the explicit specification of what "the next step" is for the task given by the designer. The core contribution of this paper is to formalize and study the properties and expressivity of clock specifications in the context of temporal goal specifications. We formalize our approach by introducing a clock specification formula for a temporal goal, and we show how the two can be compiled together, by means of an automata-based construction, in order to change the time granularity for the evaluation of the goal formula. This technique can be used to solve the problem of temporal goal satisfaction, both in planning and in learning, in the presence of clocks.

2. Clock Specifications

Let $\mathcal{P}$ be a set of propositions that capture facets of interest. In the context of clock specifications, we have an LTLf/LDLf formula $\phi_{goal}$ specifying the desired temporal task, i.e. the goal formula. In addition, we have an LTLf/LDLf formula $\phi_{clock}$, called the clock formula, describing the timesteps to consider when evaluating the goal formula. We call the pair $(\phi_{goal}, \phi_{clock})$ a clocked specification and say that $\phi_{goal}$ is under clock specification $\phi_{clock}$. We assume, without loss of generality, that both $\phi_{clock}$ and $\phi_{goal}$ are defined over $\mathcal{P}$. Figure 1 intuitively explains the scenario we are considering. Circles represent trace timesteps. The bottom trace has the finest time granularity $t$. The formula $\phi_{clock}$ is evaluated on every prefix of the trace. If the trace prefix at some time $t_i$ makes the formula $\phi_{clock}$ true, then the timestep is passed to the evaluation of $\phi_{goal}$, and becomes a timestep of the coarser-grained timestep sequence $t'$.
On the other hand, if for some timestep $t_i$ the trace prefix up to that timestep does not satisfy $\phi_{clock}$, then the current timestep is ignored at the higher level $t'$.

We now formalize the semantics of the evaluation of $\phi_{goal}$ under clock formula $\phi_{clock}$. We start with the notion of trace projection. Given a finite trace $\pi = \pi[0], \ldots, \pi[n]$, the projection of $\pi$ onto clock formula $\phi_{clock}$ is the trace $\pi|_{\phi_{clock}} = p_0, p_1, \ldots, p_n$, where $p_i = \pi[i]$ if $\pi(0, i+1) \models \phi_{clock}$, and $p_i = \epsilon$ otherwise. Here $\pi(0, i+1)$ is the prefix of $\pi$ up to and including position $i$, and $\epsilon$ is the empty trace, so positions at which the clock does not tick are dropped. We define the clocked semantics of an LTLf/LDLf formula $\phi$ under clock formula $\phi_{clock}$ in terms of the original semantics, but considering the projection of a trace $\pi$ onto the clock formula $\phi_{clock}$. That is, we say that $\pi$ models $\phi$ under clock formula $\phi_{clock}$, written $\pi \models_{\phi_{clock}} \phi$, iff $\pi|_{\phi_{clock}} \models \phi$.

Now we introduce an automata-based construction to reason over clocked LTLf/LDLf specifications. This technique will be useful for automata-based constructions in planning and learning for LTLf/LDLf goals. Let $(\phi_{goal}, \phi_{clock})$ be an LTLf/LDLf clocked specification. Firstly, we compute the DFAs $\mathcal{A}_{goal} = \langle Q_g, 2^{\mathcal{P}}, q^g_0, \delta_g, F_g \rangle$ and $\mathcal{A}_{clock} = \langle Q_c, 2^{\mathcal{P}}, q^c_0, \delta_c, F_c \rangle$ of $\phi_{goal}$ and $\phi_{clock}$, respectively. Then, we compute the clocked product $\mathcal{A}_{goal \times clock} = \langle Q', 2^{\mathcal{P}}, q'_0, \delta', F' \rangle$, defined as follows: $Q' = Q_g \times Q_c$, $q'_0 = (q^g_0, q^c_0)$, $F' = F_g \times Q_c$, and

$$\delta'((q_g, q_c), a) = \begin{cases} (\delta_g(q_g, a), \delta_c(q_c, a)) & \text{if } \delta_c(q_c, a) \in F_c, \\ (q_g, \delta_c(q_c, a)) & \text{otherwise.} \end{cases}$$

Intuitively, the clocked product is like the classical synchronous product between DFAs, except that the state component $q_g$ coming from the goal automaton is progressed only if the clock component $q_c$ transitions into an accepting state of $\mathcal{A}_{clock}$. An example is shown in Figure 2. We have the following result:

Theorem 1. Let $(\phi_{goal}, \phi_{clock})$ be a clocked specification, and let $\mathcal{A}_{goal \times clock}$ be the clocked product of $\mathcal{A}_{goal}$ and $\mathcal{A}_{clock}$. For any finite trace $\pi$, $\pi \models_{\phi_{clock}} \phi_{goal}$ iff $\pi \in \mathcal{L}(\mathcal{A}_{goal \times clock})$.

Theorem 1 tells us that clocked LDLf specifications are not more expressive than regular expressions and, therefore, than LDLf.
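The trace projection and the clocked product defined above can be illustrated with a small Python sketch. This is not the authors' implementation: the `DFA` class, the functions `project` and `clocked_product`, and the hand-written automata `GOAL` and `CLOCK` are hypothetical names introduced here, and the two automata are assumed encodings of the Figure 2 example ($\phi_{goal} = \Diamond b$ and $\phi_{clock} = \Diamond(a \wedge last)$). Transition functions are given as plain Python callables rather than being compiled from formulas.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Hashable, Iterable, List

# A symbol is a propositional interpretation: the set of atoms that are true.
Symbol = FrozenSet[str]


@dataclass(frozen=True)
class DFA:
    """Illustrative DFA with a functional transition relation (an assumption)."""
    initial: Hashable
    delta: Callable[[Hashable, Symbol], Hashable]
    is_accepting: Callable[[Hashable], bool]

    def accepts(self, trace: Iterable[Symbol]) -> bool:
        state = self.initial
        for symbol in trace:
            state = self.delta(state, symbol)
        return self.is_accepting(state)


def project(trace: Iterable[Symbol], clock: DFA) -> List[Symbol]:
    """Keep position i iff the prefix ending at i is accepted by the clock DFA."""
    kept: List[Symbol] = []
    state = clock.initial
    for symbol in trace:
        state = clock.delta(state, symbol)
        if clock.is_accepting(state):
            kept.append(symbol)
    return kept


def clocked_product(goal: DFA, clock: DFA) -> DFA:
    """Advance the goal component only when the clock moves into an accepting state."""
    def delta(state, symbol):
        q_g, q_c = state
        next_c = clock.delta(q_c, symbol)
        if clock.is_accepting(next_c):
            return (goal.delta(q_g, symbol), next_c)
        return (q_g, next_c)

    return DFA(
        initial=(goal.initial, clock.initial),
        delta=delta,
        # F' = F_g x Q_c: only the goal component decides acceptance.
        is_accepting=lambda state: goal.is_accepting(state[0]),
    )


# Assumed encodings of the Figure 2 automata.
# GOAL: "eventually b"; CLOCK: "the last step of the prefix satisfies a".
GOAL = DFA("g0", lambda q, s: "g1" if q == "g1" or "b" in s else "g0",
           lambda q: q == "g1")
CLOCK = DFA("c0", lambda q, s: "c1" if "a" in s else "c0",
            lambda q: q == "c1")

clocked = clocked_product(GOAL, CLOCK)
trace = [frozenset({"b"}), frozenset({"a"}), frozenset({"a", "b"})]
print(clocked.accepts(trace))               # True: the third step has both a and b
print(GOAL.accepts(project(trace, CLOCK)))  # True: same verdict via the projection
```

On this example the two routes agree, which is the statement of Theorem 1 for one trace: running the product on the original trace gives the same answer as running the goal automaton on the projected trace. The product is built lazily from callables, so its reachable states are explored only as a trace is read; an explicit state-set construction would be needed to minimize it as in Figure 2.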
Conversely, it is easy to see that LDLf is not more expressive than clocked LDLf specifications:

Theorem 2. Given an LTLf/LDLf formula $\phi$, the clocked specification $(\phi, tt)$ is equivalent to $\phi$. Indeed, every prefix satisfies $tt$, so the projection leaves the trace unchanged.

We say that a formula $\psi$ is unclocked-equivalent to $\phi$ under clock formula $\phi_{clock}$ if, for every trace $\pi$, we have $\pi \models_{\phi_{clock}} \phi$ iff $\pi \models \psi$. Here we show that we can automatically find "unclocked" LTLf/LDLf formulas that are semantically equivalent to clocked LTLf/LDLf specifications.

Theorem 3. Given a clocked specification $(\phi_{goal}, \phi_{clock})$, there exists an LDLf formula $\psi$ that is unclocked-equivalent to $\phi_{goal}$ under $\phi_{clock}$.

Proof sketch. Compute the regular expression $r$ equivalent to $\mathcal{A}_{goal \times clock}$, and take $\psi = \langle r \rangle end$. Correctness follows by construction and by Theorem 1.

3. Discussion

We have sketched the theoretical bases for clock specifications for temporal tasks. This framework can be applied to FOND planning for LTLf/LDLf goals [6], by using $\mathcal{A}_{goal \times clock}$ (instead of $\mathcal{A}_{goal}$) in the cross-product with the DFA of the domain, or for specifying non-Markovian "clocked" rewards in Non-Markovian Reward Decision Processes (NMRDP) [10], by means of the usual product construction between the MDP and the reward specification represented by $\mathcal{A}_{goal \times clock}$. The same approach can be combined with logic-based reward specifications in a Reinforcement Learning setting, as in RL with Restraining Bolts [14, 33]; the reward is given only when both the goal formula and the clock formula are satisfied. A similar construction can be obtained when dealing with Reward Machines [34].

Acknowledgements

This work has been partially supported by the EU H2020 project AIPlan4EU (No. 101016442), the ERC-ADG WhiteMech (No. 834228), the EU ICT-48 2020 project TAILOR (No. 952215), the PRIN project RIPER (No. 20203FFYLK), and the PNRR MUR project FAIR (No. PE0000013).

References

[1] G. De Giacomo, M. Y. Vardi, Linear temporal logic and linear dynamic logic on finite traces, in: IJCAI, IJCAI/AAAI, 2013, pp. 854–860.
[2] G. De Giacomo, M. Y. Vardi, Synthesis for LTL and LDL on finite traces, in: IJCAI, AAAI Press, 2015, pp. 1558–1564.
[3] G. De Giacomo, M. Y. Vardi, LTLf and LDLf synthesis under partial observability, in: IJCAI, IJCAI/AAAI Press, 2016, pp. 1044–1050.
[4] A. Camacho, J. A. Baier, C. J. Muise, S. A. McIlraith, Finite LTL synthesis as planning, in: ICAPS, AAAI Press, 2018, pp. 29–38.
[5] S. Zhu, L. M. Tabajara, J. Li, G. Pu, M. Y. Vardi, Symbolic LTLf synthesis, in: IJCAI, 2017.
[6] G. De Giacomo, S. Rubin, Automata-theoretic foundations of FOND planning for LTLf and LDLf goals, in: IJCAI, ijcai.org, 2018, pp. 4729–4735.
[7] R. I. Brafman, G. De Giacomo, Planning for LTLf/LDLf goals in non-Markovian fully observable nondeterministic domains, in: IJCAI, ijcai.org, 2019, pp. 1602–1608.
[8] A. Camacho, S. A. McIlraith, Strong fully observable non-deterministic planning with LTL and LTLf goals, in: IJCAI, ijcai.org, 2019, pp. 5523–5531.
[9] G. De Giacomo, M. Favorito, F. Fuggitti, Planning for temporally extended goals in pure-past linear temporal logic: A polynomial reduction to standard planning, CoRR abs/2204.09960 (2022).
[10] R. I. Brafman, G. De Giacomo, F. Patrizi, LTLf/LDLf non-Markovian rewards, in: AAAI, AAAI Press, 2018, pp. 1771–1778.
[11] R. I. Brafman, G. De Giacomo, Regular decision processes: A model for non-Markovian domains, in: IJCAI, ijcai.org, 2019, pp. 5516–5522.
[12] R. S. Sutton, A. G. Barto, Reinforcement learning - an introduction, Adaptive computation and machine learning, MIT Press, 1998.
[13] A. Camacho, R. T. Icarte, T. Q. Klassen, R. A. Valenzano, S. A. McIlraith, LTL and beyond: Formal languages for reward function specification in reinforcement learning, in: IJCAI, ijcai.org, 2019, pp. 6065–6073.
[14] G. De Giacomo, L. Iocchi, M. Favorito, F. Patrizi, Foundations for restraining bolts: Reinforcement learning with LTLf/LDLf restraining specifications, in: ICAPS, AAAI Press, 2019, pp. 128–136.
[15] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al., Human-level control through deep reinforcement learning, Nature 518 (2015) 529–533.
[16] L. Lamport, What good is temporal logic?, in: IFIP Congress, North-Holland/IFIP, 1983, pp. 657–668.
[17] C. Bettini, S. Jajodia, X. S. Wang, Time granularities in databases, data mining, and temporal reasoning, Springer, 2000.
[18] B. Leban, D. McDonald, D. Forster, A representation for collections of temporal intervals, in: AAAI, Morgan Kaufmann, 1986, pp. 367–371.
[19] M. Niezette, J. Stevenne, An efficient symbolic representation of periodic time, in: Proceedings of the International Conference on Information and Knowledge Management (CIKM), 1992, pp. 161–168.
[20] P. Ning, X. S. Wang, S. Jajodia, An algebraic representation of calendars, Ann. Math. Artif. Intell. 36 (2002) 5–38.
[21] C. Combi, M. Franceschet, A. Peron, Representing and reasoning about temporal granularities, J. Log. Comput. 14 (2004) 51–77.
[22] S. Demri, LTL over integer periodicity constraints, Theor. Comput. Sci. 360 (2006) 96–123.
[23] H. Bowman, S. J. Thompson, A decision procedure and complete axiomatization of finite interval temporal logic with projection, J. Log. Comput. 13 (2003) 195–239.
[24] G. Hariharan, B. Kempa, T. Wongpiromsarn, P. H. Jones, K. Y. Rozier, MLTL multi-type (MLTLM): A logic for reasoning about signals of different types, in: NSV/FoMLAS@CAV, volume 13466 of Lecture Notes in Computer Science, Springer, 2022, pp. 187–204.
[25] J. Wijsen, A string-based model for infinite granularities, in: Proceedings of the AAAI Workshop on Spatial and Temporal Granularities, 2000, pp. 9–16.
[26] U. Dal Lago, A. Montanari, Calendars, time granularities, and automata, in: SSTD, volume 2121 of Lecture Notes in Computer Science, Springer, 2001, pp. 279–298.
[27] D. Bresolin, A. Montanari, G. Puppis, Time granularities and ultimately periodic automata, in: JELIA, volume 3229 of Lecture Notes in Computer Science, Springer, 2004, pp. 513–525.
[28] U. Dal Lago, A. Montanari, G. Puppis, Compact and tractable automaton-based representations of time granularities, Theor. Comput. Sci. 373 (2007) 115–141.
[29] U. Dal Lago, A. Montanari, G. Puppis, On the equivalence of automaton-based representations of time granularities, in: TIME, IEEE Computer Society, 2007, pp. 82–93.
[30] J. Euzenat, A. Montanari, Time granularity, in: Handbook of Temporal Reasoning in Artificial Intelligence, volume 1 of Foundations of Artificial Intelligence, Elsevier, 2005, pp. 59–118.
[31] C. Eisner, D. Fisman, J. Havlicek, A. McIsaac, D. Van Campenhout, The definition of a temporal clock operator, in: ICALP, volume 2719 of Lecture Notes in Computer Science, Springer, 2003, pp. 857–870.
[32] C. Eisner, D. Fisman, A Practical Introduction to PSL, Series on Integrated Circuits and Systems, Springer, 2006.
[33] G. De Giacomo, M. Favorito, L. Iocchi, F. Patrizi, A. Ronca, Temporal logic monitoring rewards via transducers, in: KR, 2020, pp. 860–870.
[34] R. T. Icarte, T. Q. Klassen, R. A. Valenzano, S. A. McIlraith, Teaching multiple tasks to an RL agent using LTL, in: AAMAS, International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, USA / ACM, 2018, pp. 452–461.

Clock specifications for temporal tasks in planning and learning

https://ora.ox.ac.uk/objects/uuid:14fc64fc-2aa8-4c5e-8dfb-e3efc4944eac/files/sc247dt936
