2 research outputs found
LTLf/LDLf Non-Markovian Rewards
In Markov Decision Processes (MDPs), the reward obtained in a state is Markovian, i.e., it depends only on the last state and action. This restriction makes it difficult to reward more interesting long-term behaviors, such as always closing a door after it has been opened, or providing coffee only following a request. Extending MDPs to handle non-Markovian reward functions was the subject of two previous lines of work. Both use LTL variants to specify the reward function and then compile the new model back into a Markovian model. Building on recent progress in temporal logics over finite traces, we adopt LDLf for specifying non-Markovian rewards and provide an elegant automata construction for building a Markovian model, which extends that of previous work and offers strong minimality and compositionality guarantees.
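The idea of compiling a non-Markovian reward back into a Markovian one can be illustrated with a minimal sketch. It is not the construction from the paper: the automaton below is written by hand for the "coffee only following a request" example rather than compiled from an LDLf formula, and the names `REQUEST`, `COFFEE`, `step`, and `rewards` are hypothetical. The point is only that once the automaton state `q` is added to the environment state, the reward depends on the current extended state and observation alone.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical proposition names, used only for this illustration.
REQUEST, COFFEE = "request", "coffee"

# Hand-written automaton states (an assumption, not derived from a formula):
#   0 = no pending request, 1 = a request is pending.


def step(q: int, label: FrozenSet[str]) -> Tuple[int, float]:
    """Return (next automaton state, reward) after observing one label.

    The reward is a function of (q, label) only, so it is Markovian
    over the extended state even though it is not over labels alone.
    """
    if q == 1 and COFFEE in label:
        # Coffee delivered while a request is pending: reward it.
        # Simplification: a new request in the same step is ignored.
        return 0, 1.0
    if REQUEST in label:
        return 1, 0.0
    return q, 0.0


def rewards(trace: Iterable[Iterable[str]]) -> List[float]:
    """Run the automaton along a finite trace and collect per-step rewards."""
    q = 0
    out: List[float] = []
    for label in trace:
        q, r = step(q, frozenset(label))
        out.append(r)
    return out


if __name__ == "__main__":
    trace = [{"coffee"}, {"request"}, set(), {"coffee"}, {"coffee"}]
    print(rewards(trace))
```

In this sketch only the coffee delivered after the request is rewarded; coffee served before any request, or a second time with no new request, earns nothing. In a full product construction the pair (environment state, `q`) would serve as the state of the resulting Markovian model.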
Semipositive LTL with an uninterpreted past operator
intended to be true just at the end of some behaviour of interest, that is, to mark the end of the accepted (finite) words of some language. There is an effectively recognisable class of $LTL formulae which express behaviours, but in a sense different from the standard one of temporal logics like LTL or CTL. This representation is useful for solving a class of decision processes with temporally extended goals, which in turn are useful for representing an important class of AI planning problems.