Decision-Theoretic Planning with non-Markovian Rewards
A decision process in which rewards depend on history rather than merely on
the current state is called a decision process with non-Markovian rewards
(NMRDP). In decision-theoretic planning, where many desirable behaviours are
more naturally expressed as properties of execution sequences rather than as
properties of states, NMRDPs form a more natural model than the commonly
adopted fully Markovian decision process (MDP) model. While the more tractable
solution methods developed for MDPs do not directly apply in the presence of
non-Markovian rewards, a number of solution methods for NMRDPs have been
proposed in the literature. These all exploit a compact specification of the
non-Markovian reward function in temporal logic, to automatically translate the
NMRDP into an equivalent MDP which is solved using efficient MDP solution
methods. This paper presents NMRDPP (Non-Markovian Reward Decision Process
Planner), a software platform for developing and experimenting with methods
for decision-theoretic planning with non-Markovian rewards. The current
version of NMRDPP implements, under a single interface, a family of methods
based on existing as well as new approaches which we describe in detail. These
include dynamic programming, heuristic search, and structured methods. Using
NMRDPP, we compare the methods and identify certain problem features that
affect their performance. NMRDPP's treatment of non-Markovian rewards is
inspired by the treatment of domain-specific search control knowledge in the
TLPlan planner, which it incorporates as a special case. In the First
International Probabilistic Planning Competition, NMRDPP was able to compete
and perform well in both the domain-independent and hand-coded tracks, using
search control knowledge in the latter.
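To make the translation concrete, here is a minimal sketch (in Python, and emphatically not NMRDPP's actual code) of temporal-formula progression, the TLPlan-style mechanism the paper builds on: each world state is augmented with the progressed reward formula, so the reward becomes Markovian over (state, formula) pairs. The tuple-based formula encoding and the atom names are illustrative assumptions.

```python
# Minimal sketch (illustrative, not NMRDPP's code) of temporal-formula
# progression. Formulas are tuples: ('atom', p), ('and', f, g), ('next', f),
# ('eventually', f); 'true' and 'false' are terminal markers.

def progress(formula, state):
    """Rewrite a temporal reward formula after observing `state` (a set of atoms)."""
    if formula in ('true', 'false'):
        return formula
    op = formula[0]
    if op == 'atom':
        return 'true' if formula[1] in state else 'false'
    if op == 'and':
        f, g = progress(formula[1], state), progress(formula[2], state)
        if 'false' in (f, g):
            return 'false'
        if f == 'true':
            return g
        return f if g == 'true' else ('and', f, g)
    if op == 'next':
        return formula[1]
    if op == 'eventually':
        # eventually f  ==  f or next(eventually f)
        return 'true' if progress(formula[1], state) == 'true' else formula
    raise ValueError('unknown operator: %r' % op)

# The expanded model pairs each world state with the current formula; the
# non-Markovian reward is earned exactly when the formula collapses to 'true'.
phi = ('eventually', ('atom', 'coffee_served'))
print(progress(phi, {'coffee_requested'}))  # formula unchanged, keep waiting
print(progress(phi, {'coffee_served'}))     # 'true': reward is paid out
```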
Service composition in stochastic settings
With the growth of the Internet-of-Things and online Web services, more services with more capabilities are available to us. The ability to generate new, more useful services from existing ones has been the focus of much research for over a decade. The goal is, given a specification of the behavior of the target service, to build a controller, known as an orchestrator, that uses existing services to satisfy the requirements of the target service. The model of services and requirements used in most work is that of a finite state machine. This implies that the specification can either be satisfied or not, with no middle ground. This is a major drawback, since often an exact solution cannot be obtained. In this paper we study a simple stochastic model for service composition: we annotate the target service with probabilities describing the likelihood of requesting each action in a state, and rewards for being able to execute actions. We show how to solve the resulting problem by solving a certain Markov Decision Process (MDP) derived from the service and requirement specifications. The solution to this MDP induces an orchestrator that coincides with the exact solution if a composition exists. Otherwise it provides an approximate solution that maximizes the expected sum of values of user requests that can be serviced. The model studied, although simple, sheds light on composition in stochastic settings, and indeed we discuss several possible extensions.
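As a rough illustration of the reduction (the MDP below is handed in as plain dicts; in the paper it is derived from the target and community service specifications, which this sketch does not model), value iteration yields the value function and the greedy policy that plays the role of the orchestrator:

```python
# Hedged sketch: states would pair the target service's state with the states
# of the available services; an action assigns the requested operation to one
# concrete service. The dict-based MDP interface here is an assumption.

def value_iteration(states, actions, P, R, gamma=0.95, eps=1e-6):
    """P[s][a] -> list of (prob, next_state); R[s][a] -> immediate reward."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                for a in actions[s]
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            break
    # The orchestrator: in each composite state, delegate via a maximizing action.
    policy = {
        s: max(actions[s],
               key=lambda a: R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a]))
        for s in states
    }
    return V, policy
```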
Engineering a Conformant Probabilistic Planner
We present a partial-order, conformant, probabilistic planner, Probapop, which
competed in the blind track of the Probabilistic Planning Competition in IPC-4.
We explain how we adapt distance-based heuristics for use with probabilistic
domains. Probapop also incorporates heuristics based on probability of success.
We explain the successes and difficulties encountered during the design and
implementation of Probapop.
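The abstract does not give the heuristics' exact form, so the following is only an assumed illustration: under an independence approximation, a conformant plan prefix could be scored by the product of its actions' success probabilities, combined with a distance-to-goal estimate. Both the scoring rule and the weight are hypothetical.

```python
# Illustrative only: not Probapop's actual heuristics. Assumes action
# successes are independent, so a plan's success probability is a product.
from math import prod

def success_probability(plan_action_probs):
    """plan_action_probs: success probability of each action in the plan."""
    return prod(plan_action_probs)

def plan_score(plan_action_probs, distance_to_goal, w=1.0):
    # Lower is better: penalize remaining distance and likely-to-fail plans.
    return distance_to_goal + w * (1.0 - success_probability(plan_action_probs))
```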
LTLf/LDLf Non-Markovian Rewards
In Markov Decision Processes (MDPs), the reward obtained in a state is Markovian, i.e., depends on the last state and action. This dependency makes it difficult to reward more interesting long-term behaviors, such as always closing a door after it has been opened, or providing coffee only following a request. Extending MDPs to handle non-Markovian reward functions was the subject of two previous lines of work. Both use LTL variants to specify the reward function and then compile the new model back into a Markovian model. Building on recent progress in temporal logics over finite traces, we adopt LDLf for specifying non-Markovian rewards and provide an elegant automata construction for building a Markovian model, which extends that of previous work and offers strong minimality and compositionality guarantees.
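Schematically (and only schematically; the paper's contribution is the automata construction itself, which is not reproduced here), the compiled Markovian model runs a DFA for the reward formula in lockstep with the original process and pays the reward on accepting states. The `dfa_delta`/`dfa_accepting` interface below is an assumption:

```python
# Sketch of the standard product construction these results refine: given a
# DFA for the LDLf reward formula, track its state alongside the MDP state
# and grant the reward whenever the DFA enters an accepting state.

def product_step(env_step, dfa_delta, dfa_accepting, reward_value):
    """Wrap an environment step function into a Markovian (state, q) model."""
    def step(state, q, action):
        next_state, label = env_step(state, action)   # label: atoms observed
        next_q = dfa_delta(q, frozenset(label))
        r = reward_value if dfa_accepting(next_q) else 0.0
        return (next_state, next_q), r
    return step
```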
Using Experience Classification for Training Non-Markovian Tasks
Unlike tasks in the standard Reinforcement Learning (RL) model, many
real-world tasks are non-Markovian: their rewards are predicated on state
history rather than solely on the current state. Solving a non-Markovian task,
which arises frequently in practical applications such as autonomous driving,
financial trading, and medical diagnosis, can be quite challenging. We propose
a novel RL approach to achieve non-Markovian rewards expressed in the temporal
logic LTLf (Linear Temporal Logic over Finite Traces). To this end, an
encoding of linear complexity from LTLf into MDPs (Markov Decision Processes)
is introduced to take advantage of advanced RL algorithms. Then, a prioritized
experience replay technique based on the automaton structure (semantically
equivalent to the LTLf specification) is utilized to improve the training
process. We empirically evaluate several benchmark problems augmented with
non-Markovian tasks to demonstrate the feasibility and effectiveness of our
approach.
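A hedged sketch of the classification idea, with assumed priority levels (the paper's exact scheme may differ): transitions that advance the reward automaton's state are replayed before those that do not.

```python
# Illustrative experience classification: transitions are prioritized by
# whether they move the reward automaton's state toward acceptance.
import heapq, itertools

class ClassifiedReplayBuffer:
    def __init__(self):
        self._heap, self._counter = [], itertools.count()

    def add(self, transition, q_before, q_after, accepting):
        if accepting(q_after):
            priority = 0          # reached an accepting automaton state
        elif q_after != q_before:
            priority = 1          # made progress in the automaton
        else:
            priority = 2          # no automaton progress
        heapq.heappush(self._heap, (priority, next(self._counter), transition))

    def sample(self):
        priority, _, transition = heapq.heappop(self._heap)
        return transition
```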
Pure-Past Linear Temporal and Dynamic Logic on Finite Traces
LTLf and LDLf are well-known logics on finite traces. We review PLTLf and PLDLf, their pure-past versions. These are interpreted backward from the end of the trace towards the beginning. Because of this, we can exploit a foundational result on reverse languages to get an exponential improvement, with respect to LTLf/LDLf, in computing the corresponding DFA. This exponential improvement is reflected in several forms of sequential decision making involving temporal specifications, such as planning and decision problems in non-deterministic and non-Markovian domains. Interestingly, PLTLf (resp. PLDLf) has the same expressive power as LTLf (resp. LDLf), but transforming a PLTLf (resp. PLDLf) formula into its equivalent in LTLf (resp. LDLf) is quite expensive. Hence, to take advantage of the exponential improvement, properties of interest must be directly expressed in PLTLf/PLDLf.
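The operational payoff can be seen in a small sketch (an illustration, not the paper's construction): because pure-past operators refer only to earlier positions, each subformula's truth value can be updated forward along the trace from the previous step's values, which is exactly the structure a DFA exploits. The operator encoding is an assumption.

```python
# Illustrative forward evaluation of pure-past formulas. Operators:
# ('atom', p), ('yesterday', f), ('once', f), ('since', f, g).
# `subformulas` must list every subformula bottom-up (children first).

def update(subformulas, letter, prev):
    """Truth of each subformula at the current position, given the previous
    position's values in `prev` (an empty dict at the first position)."""
    vals = {}
    for f in subformulas:
        op = f[0]
        if op == 'atom':
            vals[f] = f[1] in letter
        elif op == 'yesterday':   # Y f: f held at the previous position
            vals[f] = prev.get(f[1], False)
        elif op == 'once':        # O f: f holds now, or O f held before
            vals[f] = vals[f[1]] or prev.get(f, False)
        elif op == 'since':       # f S g: g now, or (f now and f S g before)
            vals[f] = vals[f[2]] or (vals[f[1]] and prev.get(f, False))
    return vals

def holds(formula, subformulas, trace):
    prev = {}
    for letter in trace:          # a single forward pass, bounded state
        prev = update(subformulas, letter, prev)
    return prev.get(formula, False)
```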
Reinforcement Learning with Non-Markovian Rewards
The standard RL world model is that of a Markov Decision Process (MDP). A
basic premise of MDPs is that the rewards depend on the last state and action
only. Yet, many real-world rewards are non-Markovian. For example, a reward for
bringing coffee only if it was requested earlier and not yet served is non-Markovian
if the state only records current requests and deliveries. Past work considered
the problem of modeling and solving MDPs with non-Markovian rewards (NMR), but
we know of no principled approaches for RL with NMR. Here, we address the
problem of policy learning from experience with such rewards. We describe and
evaluate empirically four combinations of the classical RL algorithms
Q-learning and R-max with automata learning algorithms to obtain new RL
algorithms for domains with NMR. We also prove that some of these variants
converge to an optimal policy in the limit.
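For intuition only (the `env` and `dfa` interfaces are assumptions, and the automata-learning component the paper combines with Q-learning is abstracted away), the product construction underlying such variants looks like:

```python
# Sketch: tabular Q-learning over pairs (environment state, state of a
# hypothesized reward automaton). In the paper's variants the automaton is
# itself learned from experience; here it is taken as given.
import random
from collections import defaultdict

def q_learning_with_automaton(env, dfa, episodes=1000,
                              alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = defaultdict(float)
    for _ in range(episodes):
        s, q = env.reset(), dfa.initial
        done = False
        while not done:
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda a: Q[(s, q, a)])
            s2, label, reward, done = env.step(a)
            q2 = dfa.delta(q, label)   # track the non-Markovian context
            target = reward + gamma * max(Q[(s2, q2, b)] for b in env.actions)
            Q[(s, q, a)] += alpha * (target - Q[(s, q, a)])
            s, q = s2, q2
    return Q
```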