Sound Abstraction of Probabilistic Actions in The Constraint Mass Assignment Framework
This paper provides a formal and practical framework for sound abstraction of
probabilistic actions. We start by precisely defining the concept of sound
abstraction within the context of finite-horizon planning (where each plan is a
finite sequence of actions). Next we show that such abstraction cannot be
performed within the traditional probabilistic action representation, which
models a world with a single probability distribution over the state space. We
then present the constraint mass assignment representation, which models the
world with a set of probability distributions and is a generalization of mass
assignment representations. Within this framework, we present sound abstraction
procedures for three types of action abstraction. We end the paper with
discussions and related work on sound and approximate abstraction. We give
pointers to papers in which we discuss other sound abstraction-related issues,
including applications, estimating loss due to abstraction, and automatically
generating abstraction hierarchies.
Comment: Appears in Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence (UAI 1996).
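To make the set-of-distributions idea concrete, here is a minimal sketch of why interval-style bounds survive state aggregation; the function and the bound arithmetic are illustrative assumptions, not the paper's constraint mass assignment procedure.

```python
# Hypothetical sketch: world uncertainty as lower/upper probability bounds
# (a simple set of distributions). Aggregating states by summing bounds is
# sound: every distribution consistent with the concrete bounds remains
# consistent with the abstract ones. A single point distribution offers no
# analogous guarantee once states with different probabilities are merged.

def merge_states(bounds, group):
    """bounds: dict state -> (lower, upper); group: states to collapse."""
    lo = sum(bounds[s][0] for s in group)            # under-approximation
    hi = min(1.0, sum(bounds[s][1] for s in group))  # over-approximation
    abstract = {s: b for s, b in bounds.items() if s not in group}
    abstract[frozenset(group)] = (lo, hi)
    return abstract

concrete = {"s1": (0.25, 0.5), "s2": (0.125, 0.25), "s3": (0.375, 0.625)}
print(merge_states(concrete, {"s1", "s2"}))
# the merged abstract state carries the bounds (0.375, 0.75)
```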
Structured Reachability Analysis for Markov Decision Processes
Recent research in decision theoretic planning has focussed on making the
solution of Markov decision processes (MDPs) more feasible. We develop a family
of algorithms for structured reachability analysis of MDPs that are suitable
when an initial state (or set of states) is known. Using compact, structured
representations of MDPs (e.g., Bayesian networks), our methods, which vary in
the tradeoff between complexity and accuracy, produce structured descriptions
of (estimated) reachable states that can be used to eliminate variables or
variable values from the problem description, reducing the size of the MDP and
making it easier to solve. One contribution of our work is the extension of
ideas from GRAPHPLAN to deal with the distributed nature of action
representations typically embodied within Bayes nets and the problem of
correlated action effects. We also demonstrate that our algorithm can be made
more complete by using k-ary constraints instead of binary constraints. Another
contribution is the illustration of how the compact representation of
reachability constraints can be exploited by several existing (exact and
approximate) abstraction algorithms for MDPs.
Comment: Appears in Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI 1998).
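As a rough illustration of the GRAPHPLAN-style idea, the toy sketch below computes per-variable reachable values for a factored action representation; the variable and action names are invented, and, unlike the paper's algorithms, it ignores correlated effects entirely (the binary- vs. k-ary-constraint refinement is also omitted).

```python
# Toy forward reachability over a factored state space: track, per variable,
# which values any action sequence can produce. Values never found reachable
# can be pruned from the MDP description before solving it.

def reachable_values(init, actions):
    """init: dict var -> set of initially possible values.
    actions: list of (preconditions, effects), each a dict var -> value."""
    vals = {v: set(vs) for v, vs in init.items()}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for pre, eff in actions:
            if all(val in vals[v] for v, val in pre.items()):
                for v, val in eff.items():
                    if val not in vals[v]:
                        vals[v].add(val)
                        changed = True
    return vals

init = {"loc": {"home"}, "has_key": {False}}
actions = [({"loc": "home"}, {"loc": "office"}),
           ({"loc": "office"}, {"has_key": True})]
print(reachable_values(init, actions))
# e.g. {'loc': {'home', 'office'}, 'has_key': {False, True}}
```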
Theoretical Foundations for Abstraction-Based Probabilistic Planning
Modeling worlds and actions under uncertainty is one of the central problems
in the framework of decision-theoretic planning. The representation must be
general enough to capture real-world problems but at the same time it must
provide a basis upon which theoretical results can be derived. The central
notion in the framework we propose here is that of the affine-operator, which
serves as a tool for constructing (convex) sets of probability distributions,
and which can be considered as a generalization of belief functions and
interval mass assignments. Uncertainty in the state of the worlds is modeled
with sets of probability distributions, represented by affine-trees while
actions are defined as tree-manipulators. A small set of key properties of the
affine-operator is presented, forming the basis for most existing
operator-based definitions of probabilistic action projection and action
abstraction. We derive and prove correct three projection rules, which vividly
illustrate the precision-complexity tradeoff in plan projection. Finally, we
show how the three types of action abstraction identified by Haddawy and Doan
are manifested in the present framework.
Comment: Appears in Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence (UAI 1996).
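The abstract does not spell the operator out, but an operator that "constructs (convex) sets of probability distributions" plausibly includes convex combination as its core case; the display below is a hedged reconstruction for orientation, not the paper's definition.

\[
\mathcal{A}_{\alpha}(P, Q) \;=\; \{\, \alpha\,p + (1-\alpha)\,q \;:\; p \in P,\; q \in Q \,\}, \qquad \alpha \in [0,1].
\]

Applied to singleton sets this reduces to ordinary probabilistic mixing; applied to interval-shaped sets it reproduces interval mass assignments, consistent with the abstract's claim that the operator generalizes belief functions and interval mass assignments.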
Information-Theoretic Considerations in Batch Reinforcement Learning
Value-function approximation methods that operate in batch mode have
foundational importance to reinforcement learning (RL). Finite sample
guarantees for these methods often crucially rely on two types of assumptions:
(1) mild distribution shift, and (2) representation conditions that are
stronger than realizability. However, the necessity ("why do we need them?")
and the naturalness ("when do they hold?") of such assumptions have largely
eluded the literature. In this paper, we revisit these assumptions and provide
theoretical results towards answering the above questions, and make steps
towards a deeper understanding of value-function approximation.
Comment: Published in ICML 2019.
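For readers outside this subarea, the two assumption families have standard formalizations in the batch-RL literature, sketched here as background rather than as the paper's exact statements: distribution shift is commonly bounded by a concentrability coefficient, and the representation condition strengthens realizability to Bellman completeness.

\[
\sup_{\pi}\,\sup_{s,a}\ \frac{d^{\pi}(s,a)}{\mu(s,a)} \;\le\; C \quad \text{(mild distribution shift / concentrability)},
\]
\[
Q^{*} \in \mathcal{F} \ \ \text{(realizability)} \qquad \text{vs.} \qquad \mathcal{T}\mathcal{F} \subseteq \mathcal{F} \ \ \text{(completeness, strictly stronger)},
\]

where \(\mu\) is the data distribution, \(d^{\pi}\) the state-action distribution induced by policy \(\pi\), \(\mathcal{F}\) the function class, and \(\mathcal{T}\) the Bellman optimality operator.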
Proximity-Based Non-uniform Abstractions for Approximate Planning
In a deterministic world, a planning agent can be certain of the consequences
of its planned sequence of actions. Not so, however, in dynamic, stochastic
domains where Markov decision processes are commonly used. Unfortunately these
suffer from the curse of dimensionality: if the state space is a Cartesian
product of many small sets (dimensions), planning is exponential in the number
of those dimensions.
Our new technique exploits the intuitive strategy of selectively ignoring
various dimensions in different parts of the state space. The resulting
non-uniformity has strong implications, since the approximation is no longer
Markovian, requiring the use of a modified planner. We also use a spatial and
temporal proximity measure, which responds to continued planning as well as
movement of the agent through the state space, to dynamically adapt the
abstraction as planning progresses.
We present qualitative and quantitative results across a range of
experimental domains showing that an agent exploiting this novel approximation
method successfully finds solutions to the planning problem using much less
than the full state space. We assess and analyse the features of domains which
our method can exploit.
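A minimal sketch of the "selectively ignore dimensions" idea follows; the dimension names, proximity scores, and wildcard masking are all illustrative assumptions, and the paper's proximity measure and modified planner are not reproduced.

```python
# Non-uniform abstraction sketch: mask out state dimensions that score low
# on proximity to the agent's current region, so distant detail is ignored
# and many concrete states collapse into one abstract state there.

def abstract_state(state, proximity, threshold):
    """state: dict dimension -> value; proximity: dict dimension -> score."""
    return tuple(state[d] if proximity[d] >= threshold else "*"
                 for d in sorted(state))

state = {"x": 3, "y": 7, "door_open": True, "far_switch": False}
prox = {"x": 1.0, "y": 1.0, "door_open": 0.8, "far_switch": 0.1}
print(abstract_state(state, prox, 0.5))
# (True, '*', 3, 7): far_switch is masked, the rest keep full detail
```

Because the masking varies across regions of the state space, the induced abstract process is no longer Markovian, which is why the approach pairs the abstraction with a modified planner.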
State-Continuity Approximation of Markov Decision Processes via Finite Element Methods for Autonomous System Planning
Motion planning under uncertainty for an autonomous system can be formulated
as a Markov Decision Process with a continuous state space. In this paper, we
propose a novel solution to this decision-theoretic planning problem that
directly obtains the continuous value function with only the first and second
moments of the transition probabilities, alleviating the requirement for an
explicit transition model in the literature. We achieve this by expressing the
value function as a linear combination of basis functions and approximating the
Bellman equation by a partial differential equation, where the value function
can be naturally constructed using a finite element method. We have validated
our approach via extensive simulations, and the evaluations reveal that, compared to baseline methods, our solution leads to better performance in terms of path smoothness, travel distance, and time costs.
Comment: 9 pages, 6 figures, accepted by RA-L.
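One plausible reading of the construction (a sketch under assumptions, not the paper's derivation): expand the value function in a basis and Taylor-expand the Bellman backup to second order, so that only the first two transition moments enter.

\[
V(s) = \sum_i w_i\,\phi_i(s), \qquad
\mathbb{E}\big[V(s') \mid s, a\big] \;\approx\; V(s) + \mu_a(s)^{\top}\nabla V(s) + \tfrac{1}{2}\operatorname{tr}\!\big(\Sigma_a(s)\,\nabla^{2} V(s)\big),
\]

where \(\mu_a\) and \(\Sigma_a\) are the first and second (central) moments of the transition. Substituting this expansion into the Bellman equation turns it into a second-order PDE in \(V\), which a finite element method then solves for the weights \(w_i\).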
A Method for Planning Given Uncertain and Incomplete Information
This paper describes ongoing research into planning in an uncertain
environment. In particular, it introduces U-Plan, a planning system that
constructs quantitatively ranked plans given an incomplete description of the
state of the world. U-Plan uses a Dempster-Shafer interval to characterise
uncertain and incomplete information about the state of the world. The planner
takes as input what is known about the world, and constructs a number of
possible initial states with representations at different abstraction levels. A
plan is constructed for the initial state with the greatest support, and this
plan is tested to see if it will work for other possible initial states. All,
part, or none of the existing plans may be used in the generation of the plans
for the remaining possible worlds. Planning takes place in an abstraction
hierarchy where strategic decisions are made before tactical decisions. A
super-plan is then constructed, based on merging the set of plans and the
appropriately timed acquisition of essential knowledge, which is used to decide
between plan alternatives. U-Plan usually produces a super-plan in less time
than a classical planner would take to produce a set of plans, one for each
possible world.
Comment: Appears in Proceedings of the Ninth Conference on Uncertainty in Artificial Intelligence (UAI 1993).
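The Dempster-Shafer interval the planner relies on is standard; the toy computation below shows the belief/plausibility bounds for a query over possible initial worlds (the worlds and masses are invented for illustration, and none of U-Plan's plan-merging machinery appears here).

```python
# Dempster-Shafer interval sketch: a mass assignment over *sets* of worlds
# induces, for any query set Q, the interval [belief(Q), plausibility(Q)].

def belief(masses, query):
    """Total mass of focal sets wholly contained in the query."""
    return sum(m for focal, m in masses.items() if focal <= query)

def plausibility(masses, query):
    """Total mass of focal sets that overlap the query."""
    return sum(m for focal, m in masses.items() if focal & query)

masses = {frozenset({"w1"}): 0.5,               # evidence pinning down w1
          frozenset({"w1", "w2"}): 0.25,        # coarser evidence
          frozenset({"w1", "w2", "w3"}): 0.25}  # total ignorance mass
q = frozenset({"w1", "w2"})
print(belief(masses, q), plausibility(masses, q))  # 0.75 1.0
```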
Practical Linear Value-approximation Techniques for First-order MDPs
Recent work on approximate linear programming (ALP) techniques for
first-order Markov Decision Processes (FOMDPs) represents the value function
linearly w.r.t. a set of first-order basis functions and uses linear
programming techniques to determine suitable weights. This approach offers the
advantage that it does not require simplification of the first-order value
function, and allows one to solve FOMDPs independent of a specific domain
instantiation. In this paper, we address several questions to enhance the
applicability of this work: (1) Can we extend the first-order ALP framework to
approximate policy iteration to address performance deficiencies of previous
approaches? (2) Can we automatically generate basis functions and evaluate
their impact on value function quality? (3) How can we decompose intractable
problems with universally quantified rewards into tractable subproblems? We
propose answers to these questions along with a number of novel optimizations
and provide a comparative empirical evaluation on logistics problems from the
ICAPS 2004 Probabilistic Planning Competition.
Comment: Appears in Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (UAI 2006).
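For orientation, the following is a ground (propositional) approximate linear program on a randomly generated toy MDP; the paper's contribution is the lifted, first-order version of this LP plus policy iteration, basis generation, and reward decomposition, none of which is shown here.

```python
# Ground-level ALP sketch: V(s) = phi(s) @ w, minimize sum_s V(s) subject to
# the Bellman inequalities V(s) >= R[s,a] + gamma * P[s,a] @ V for all s, a.
# Any feasible w gives a pointwise upper bound on the optimal value function.
import numpy as np
from scipy.optimize import linprog

gamma, n_states, n_actions = 0.9, 3, 2
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a]
R = rng.random((n_states, n_actions))
phi = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # one row per state

c = phi.sum(axis=0)  # objective: sum of approximate values over states
A_ub, b_ub = [], []
for s in range(n_states):
    for a in range(n_actions):
        # phi[s] @ w >= R[s,a] + gamma * (P[s,a] @ phi) @ w, in <= form:
        A_ub.append(-(phi[s] - gamma * P[s, a] @ phi))
        b_ub.append(-R[s, a])
res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub,
              bounds=[(None, None)] * phi.shape[1])
print("weights:", res.x, "value upper bounds:", phi @ res.x)
```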
A Graph-Theoretic Analysis of Information Value
We derive qualitative relationships about the informational relevance of
variables in graphical decision models based on a consideration of the topology
of the models. Specifically, we identify dominance relations for the expected
value of information on chance variables in terms of their position and
relationships in influence diagrams. The qualitative relationships can be
harnessed to generate nonnumerical procedures for ordering uncertain variables
in a decision model by their informational relevance.
Comment: Appears in Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence (UAI 1996).
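The quantity whose ordering the paper characterizes qualitatively is the expected value of information; the numeric toy below computes it for a two-action, one-observation decision (the weather example is invented), whereas the paper orders variables without such numbers, from the influence diagram's topology alone.

```python
# Expected value of (perfect) information on a chance variable X:
# EVPI = E_x[ max_a U(a, x) ] - max_a E_x[ U(a, x) ].

def evpi(p_x, utility):
    """p_x: dict x -> probability; utility: dict (action, x) -> utility."""
    actions = {a for a, _ in utility}
    with_info = sum(p * max(utility[a, x] for a in actions)
                    for x, p in p_x.items())
    without_info = max(sum(p * utility[a, x] for x, p in p_x.items())
                       for a in actions)
    return with_info - without_info

p_x = {"rain": 0.25, "sun": 0.75}
u = {("umbrella", "rain"): 1.0, ("umbrella", "sun"): 0.25,
     ("none", "rain"): 0.0, ("none", "sun"): 1.0}
print(evpi(p_x, u))  # 0.25: observing the weather is worth 0.25 utiles
```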
Learning What Information to Give in Partially Observed Domains
In many robotic applications, an autonomous agent must act within and explore
a partially observed environment that is unobserved by its human teammate. We
consider such a setting in which the agent can, while acting, transmit
declarative information to the human that helps them understand aspects of this
unseen environment. In this work, we address the algorithmic question of how
the agent should plan out what actions to take and what information to
transmit. Naturally, one would expect the human to have preferences, which we
model information-theoretically by scoring transmitted information based on the
change it induces in weighted entropy of the human's belief state. We formulate
this setting as a belief MDP and give a tractable algorithm for solving it
approximately. Then, we give an algorithm that allows the agent to learn the
human's preferences online, through exploration. We validate our approach
experimentally in simulated discrete and continuous partially observed
search-and-recover domains. Visit http://tinyurl.com/chitnis-corl-18 for a
supplementary video.
Comment: CoRL 2018 final version.
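A minimal sketch of the scoring rule described in the abstract, assuming a weighted-entropy form H_w(b) = -sum_s w(s) b(s) log b(s); the state names, weights, and beliefs are illustrative, and the belief-MDP planning and online preference learning are not shown.

```python
# Score a candidate message by the reduction in *weighted* entropy of the
# human's belief; the weights encode which parts of the unseen environment
# the human cares about most.
import math

def weighted_entropy(belief, weights):
    return -sum(weights[s] * p * math.log(p)
                for s, p in belief.items() if p > 0)

def info_score(prior, posterior, weights):
    """Higher score = the message removes more of the uncertainty that
    matters to the human."""
    return (weighted_entropy(prior, weights)
            - weighted_entropy(posterior, weights))

prior = {"door_open": 0.5, "door_closed": 0.5}
posterior = {"door_open": 0.9, "door_closed": 0.1}  # belief after a message
weights = {"door_open": 2.0, "door_closed": 1.0}
print(info_score(prior, posterior, weights))
```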