Speeding Up the Convergence of Value Iteration in Partially Observable Markov Decision Processes
Partially observable Markov decision processes (POMDPs) have recently become
popular among many AI researchers because they serve as a natural model for
planning under uncertainty. Value iteration is a well-known algorithm for
finding optimal policies for POMDPs. It typically takes a large number of
iterations to converge. This paper proposes a method for accelerating the
convergence of value iteration. The method has been evaluated on an array of
benchmark problems and was found to be very effective: It enabled value
iteration to converge after only a few iterations on all the test problems.
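The abstract does not describe the acceleration technique itself, so the sketch below only makes the baseline concrete: standard value iteration with the usual Bellman-residual stopping rule, run on a small fully observable MDP (not a POMDP) whose states, actions, and rewards are hypothetical. Even this two-state example needs well over a hundred sweeps at a discount of 0.9, which illustrates why the number of iterations is worth reducing. It is not the paper's method.

```python
from typing import Dict, List, Tuple

# Hypothetical toy MDP (illustrative names and numbers only):
# P[s][a] lists (next_state, probability) pairs; R[s][a] is the immediate reward.
Transitions = Dict[int, Dict[str, List[Tuple[int, float]]]]
Rewards = Dict[int, Dict[str, float]]


def value_iteration(P: Transitions, R: Rewards, gamma: float = 0.95,
                    eps: float = 1e-6, max_iters: int = 10_000):
    """Plain (unaccelerated) value iteration; returns (values, iterations used)."""
    V = {s: 0.0 for s in P}
    threshold = eps * (1 - gamma) / (2 * gamma)
    for it in range(1, max_iters + 1):
        # One synchronous Bellman backup over every state.
        V_new = {
            s: max(R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a]) for a in P[s])
            for s in P
        }
        residual = max(abs(V_new[s] - V[s]) for s in P)
        V = V_new
        # Classic stopping rule: once the Bellman residual drops below
        # eps * (1 - gamma) / (2 * gamma), the greedy policy is eps-optimal.
        if residual < threshold:
            return V, it
    return V, max_iters


# Two states: "stay" in state 1 pays 1 per step; state 0 pays nothing.
P = {0: {"stay": [(0, 1.0)], "go": [(1, 1.0)]},
     1: {"stay": [(1, 1.0)], "go": [(0, 1.0)]}}
R = {0: {"stay": 0.0, "go": 0.0}, 1: {"stay": 1.0, "go": 0.0}}
V, iters = value_iteration(P, R, gamma=0.9)
print(V, iters)  # V is close to {0: 9.0, 1: 10.0}; this takes well over 100 iterations
```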
LTL Control in Uncertain Environments with Probabilistic Satisfaction Guarantees
We present a method to generate a robot control strategy that maximizes the
probability of accomplishing a task. The task is given as a Linear Temporal Logic
(LTL) formula over a set of properties that can be satisfied at the regions of
a partitioned environment. We assume that the probabilities with which the
properties are satisfied at the regions are known, and the robot can determine
the truth value of a proposition only at the current region. Motivated by
several results on partition-based abstractions, we assume that the motion is
performed on a graph. To account for noisy sensors and actuators, we assume
that a control action enables several transitions with known probabilities. We
show that this problem can be reduced to the problem of generating a control
policy for a Markov Decision Process (MDP) such that the probability of
satisfying an LTL formula over its states is maximized. We provide a complete
solution for the latter problem that builds on existing results from
probabilistic model checking. We include an illustrative case study.
Comment: Technical Report accompanying IFAC 201
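The abstract reduces the problem to maximizing, over MDP policies, the probability of satisfying an LTL formula. In standard probabilistic model checking, once the MDP has been composed with an automaton for the formula and the accepting end components have been identified (steps not shown here), what remains is a maximal reachability probability. A minimal sketch of that last step, assuming the product MDP and its target set are already given; the state and action names are hypothetical, and this is not claimed to be the authors' exact procedure.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical product MDP: mdp[s][a] lists (next_state, probability) pairs.
MDP = Dict[str, Dict[str, List[Tuple[str, float]]]]


def max_reach_prob(mdp: MDP, targets: Set[str], tol: float = 1e-10,
                   max_iters: int = 100_000) -> Dict[str, float]:
    """Iterate x(s) = max_a sum_t P(s,a,t) x(t) from below, with x = 1 on targets."""
    x = {s: (1.0 if s in targets else 0.0) for s in mdp}
    for _ in range(max_iters):
        x_new = {}
        for s, actions in mdp.items():
            if s in targets:
                x_new[s] = 1.0
            elif actions:
                x_new[s] = max(sum(p * x[t] for t, p in succ) for succ in actions.values())
            else:
                x_new[s] = 0.0  # no enabled action: the target is unreachable
        change = max(abs(x_new[s] - x[s]) for s in mdp)
        x = x_new
        # Heuristic stop: a small change is not a sound error bound in general.
        if change < tol:
            break
    return x


mdp = {
    "s0": {"safe": [("s1", 1.0)], "risky": [("goal", 0.5), ("trap", 0.5)]},
    "s1": {"try": [("goal", 0.8), ("s0", 0.2)]},
    "goal": {"loop": [("goal", 1.0)]},
    "trap": {"loop": [("trap", 1.0)]},
}
probs = max_reach_prob(mdp, targets={"goal"})
print(round(probs["s0"], 6))  # 1.0: always taking "safe" reaches goal almost surely
```

Taking "risky" in s0 would cap the probability at 0.5, so the maximum comes from retrying through s1; extracting an optimal policy from these values needs extra care in general and is omitted.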
ρ-POMDPs have Lipschitz-Continuous ϵ-Optimal Value Functions
Many state-of-the-art algorithms for solving Partially Observable Markov Decision Processes (POMDPs) rely on turning the problem into a "fully observable" problem (a belief MDP) and exploiting the piecewise linearity and convexity (PWLC) of the optimal value function in this new state space (the belief simplex ∆). This approach has been extended to solving ρ-POMDPs (i.e., POMDPs with information-oriented criteria) when the reward ρ is convex in ∆. General ρ-POMDPs can also be turned into "fully observable" problems, but with no means to exploit the PWLC property. In this paper, we focus on POMDPs and ρ-POMDPs with a λρ-Lipschitz reward function, and demonstrate that, for finite horizons, the optimal value function is Lipschitz-continuous. We then propose value function approximators for both upper- and lower-bounding the optimal value function, and show that they provide uniformly improvable bounds. This allows us to derive two algorithms from HSVI, which are empirically evaluated on various benchmark problems.
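The property that makes such approximators possible is that a λ-Lipschitz value function can be bounded everywhere from a finite set of sampled beliefs: an upper bound v_i on V*(b_i) gives V*(b) <= v_i + λ‖b - b_i‖ for every b, and a lower bound w_i gives V*(b) >= w_i - λ‖b - b_i‖. A minimal sketch, assuming the L1 norm on the belief simplex and a known constant λ; the abstract fixes neither, and the point sets below are hypothetical.

```python
from typing import List, Sequence, Tuple

Belief = Sequence[float]


def l1(b: Belief, c: Belief) -> float:
    """L1 distance between two beliefs (assumed norm)."""
    return sum(abs(x - y) for x, y in zip(b, c))


def upper_bound(b: Belief, points: List[Tuple[Belief, float]], lam: float) -> float:
    """Lower envelope of upward cones: min_i v_i + lam * |b - b_i|_1."""
    return min(v + lam * l1(b, bi) for bi, v in points)


def lower_bound(b: Belief, points: List[Tuple[Belief, float]], lam: float) -> float:
    """Upper envelope of downward cones: max_i w_i - lam * |b - b_i|_1."""
    return max(w - lam * l1(b, bi) for bi, w in points)


# Hypothetical bounds known at the two corners of a 2-state simplex.
upper_pts = [([1.0, 0.0], 2.0), ([0.0, 1.0], 4.0)]
lower_pts = [([1.0, 0.0], 1.0), ([0.0, 1.0], 3.0)]
b = [0.5, 0.5]
print(lower_bound(b, lower_pts, 3.0), upper_bound(b, upper_pts, 3.0))  # 0.0 5.0
```

Adding a new sampled point can only tighten both envelopes, which is the intuition behind bounds that improve as an HSVI-style search visits more beliefs.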
Anytime Point-Based Approximations for Large POMDPs
The Partially Observable Markov Decision Process has long been recognized as
a rich framework for real-world planning and control problems, especially in
robotics. However, exact solutions in this framework are typically
computationally intractable for all but the smallest problems. A well-known
technique for speeding up POMDP solving involves performing value backups at
specific belief points, rather than over the entire belief simplex. The
efficiency of this approach, however, depends greatly on the selection of
points. This paper presents a set of novel techniques for selecting informative
belief points which work well in practice. The point selection procedure is
combined with point-based value backups to form an effective anytime POMDP
algorithm called Point-Based Value Iteration (PBVI). The first aim of this
paper is to introduce this algorithm and present a theoretical analysis
justifying the choice of belief selection technique. The second aim of this
paper is to provide a thorough empirical comparison between PBVI and other
state-of-the-art POMDP methods, in particular the Perseus algorithm, in an
effort to highlight their similarities and differences. Evaluation is performed
using both standard POMDP domains and realistic robotic tasks.
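A point-based backup computes, for one belief b, only the α-vector that is best at b given the current vector set Γ, instead of all vectors over the whole simplex. A minimal sketch of that standard backup, with T[a][s][s2] as transition probabilities, Z[a][s2][o] as observation probabilities, R[a][s] as rewards, and a hand-picked belief set B; these names and the toy model are assumptions. It deliberately omits the belief-selection techniques that the paper contributes.

```python
from typing import List, Optional, Sequence

Vector = List[float]


def dot(b: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(b, v))


def point_based_backup(b: Sequence[float], Gamma: List[Vector], T, Z, R,
                       gamma: float) -> Vector:
    """Return the alpha-vector produced by one Bellman backup at belief b."""
    n = len(b)
    best_vec: Optional[Vector] = None
    best_val = float("-inf")
    for a in range(len(T)):
        alpha_a = list(R[a])  # immediate reward term R(., a)
        for o in range(len(Z[a][0])):
            # Back-project every current alpha-vector through action a and observation o:
            # g(s) = sum_{s2} T[a][s][s2] * Z[a][s2][o] * alpha(s2)
            projections = [
                [sum(T[a][s][t] * Z[a][t][o] * alpha[t] for t in range(n)) for s in range(n)]
                for alpha in Gamma
            ]
            # Keep only the projection that is best at this particular belief.
            g = max(projections, key=lambda v: dot(b, v))
            alpha_a = [x + gamma * y for x, y in zip(alpha_a, g)]
        val = dot(b, alpha_a)
        if val > best_val:
            best_vec, best_val = alpha_a, val
    return best_vec


# Hypothetical toy model: 2 states, 2 actions, 1 (uninformative) observation.
identity = [[1.0, 0.0], [0.0, 1.0]]
T = [identity, identity]              # actions do not change the state
Z = [[[1.0], [1.0]], [[1.0], [1.0]]]  # Z[a][s2][o]
R = [[1.0, 0.0], [0.0, 1.0]]          # action a pays 1 in state a
B = [[0.8, 0.2], [0.2, 0.8]]          # hand-picked belief points
Gamma = [[0.0, 0.0]]
for _ in range(2):
    Gamma = [point_based_backup(b, Gamma, T, Z, R, gamma=0.5) for b in B]
print(Gamma)  # [[1.5, 0.0], [0.0, 1.5]]
```

Each sweep produces at most |B| vectors, which is what keeps the cost of a backup bounded by the size of the belief set rather than by the full simplex.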
On Polynomial Sized MDP Succinct Policies
Policies of Markov Decision Processes (MDPs) determine the next action to
execute from the current state and, possibly, the history (the past states).
When the number of states is large, succinct representations are often used to
compactly represent both the MDPs and the policies in a reduced amount of
space. In this paper, some problems related to the size of succinctly
represented policies are analyzed. Namely, it is shown that some MDPs have
policies that can only be represented in space super-polynomial in the size of
the MDP, unless the polynomial hierarchy collapses. This fact motivates the
study of the problem of deciding whether a given MDP has a policy of a given
size and reward. Since some algorithms for MDPs work by finding a succinct
representation of the value function, the problem of deciding the existence of
a succinct representation of a value function of a given size and reward is
also considered.
Restricted Value Iteration: Theory and Algorithms
Value iteration is a popular algorithm for finding near-optimal policies for
POMDPs. It is inefficient due to the need to account for the entire belief
space, which necessitates the solution of large numbers of linear programs. In
this paper, we study value iteration restricted to belief subsets. We show
that, together with properly chosen belief subsets, restricted value iteration
yields near-optimal policies and we give a condition for determining whether a
given belief subset would bring about savings in space and time. We also apply
restricted value iteration to two interesting classes of POMDPs, namely
informative POMDPs and near-discernible POMDPs.
On knowledge representation and decision making under uncertainty
Designing systems with the ability to make optimal decisions under uncertainty is one of the goals of artificial intelligence. However, in many applications the design of optimal planners is complicated due to imprecise inputs and uncertain outputs resulting from stochastic dynamics. Partially Observable Markov Decision Processes (POMDPs) provide a rich mathematical framework to model these kinds of problems. However, the high computational demand of solution methods for POMDPs is a drawback for applying them in practice. In this thesis, we present a two-fold approach for improving the tractability of POMDP planning. First, we focus on designing good heuristics for POMDP approximation algorithms. We aim to scale up the efficiency of a class of POMDP approximations called point-based planning methods by designing a good planning space. We study the effect of three properties of reachable belief state points that may influence the performance of point-based approximation methods. Second, we investigate approaches to designing good controllers using an alternative representation of systems with partial observability called Predictive State Representation (PSR). This part of the thesis advocates the usefulness and practicality of PSRs in planning under uncertainty. We also attempt to move some useful characteristics of the PSR model, which has a predictive view of the world, to the POMDP model, which has a probabilistic view of the hidden states of the world. We propose a planning algorithm motivated by the connections between the two models.
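Point-based methods plan over belief points that are actually reachable from an initial belief b0. A minimal sketch of how such points can be enumerated breadth-first with the standard Bayesian belief update b'(s') ∝ Z[a][s'][o] · Σ_s T[a][s][s'] · b(s), using a hypothetical two-state "listen" model; the thesis does not specify this procedure, and rounding to 12 decimals to merge duplicate beliefs is an arbitrary choice.

```python
from typing import List, Optional, Sequence, Tuple


def belief_update(b: Sequence[float], a: int, o: int, T, Z) -> Tuple[Optional[List[float]], float]:
    """Bayes filter step: returns (new belief, P(o | b, a)), or (None, 0.0) if o is impossible."""
    n = len(b)
    unnorm = [Z[a][t][o] * sum(T[a][s][t] * b[s] for s in range(n)) for t in range(n)]
    p_o = sum(unnorm)
    if p_o == 0.0:
        return None, 0.0
    return [x / p_o for x in unnorm], p_o


def reachable_beliefs(b0: Sequence[float], T, Z, depth: int) -> List[Tuple[float, ...]]:
    """Breadth-first enumeration of distinct beliefs reachable within `depth` steps."""
    start = tuple(round(x, 12) for x in b0)
    seen, order, frontier = {start}, [start], [start]
    for _ in range(depth):
        nxt = []
        for b in frontier:
            for a in range(len(T)):
                for o in range(len(Z[a][0])):
                    b2, _ = belief_update(b, a, o, T, Z)
                    if b2 is None:
                        continue
                    key = tuple(round(x, 12) for x in b2)  # merge numerically equal beliefs
                    if key not in seen:
                        seen.add(key)
                        order.append(key)
                        nxt.append(key)
        frontier = nxt
    return order


# Hypothetical model: one "listen" action that leaves the state unchanged and
# reports the true state with probability 0.85.
T = [[[1.0, 0.0], [0.0, 1.0]]]
Z = [[[0.85, 0.15], [0.15, 0.85]]]  # Z[a][s2][o]
points = reachable_beliefs([0.5, 0.5], T, Z, depth=2)
print(len(points))  # 5
```

Conflicting observations send the belief back to [0.5, 0.5], so the reachable set grows more slowly than the number of action-observation histories; this is one reason reachable points can be a compact planning space.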