
    Restricted Value Iteration: Theory and Algorithms

    Value iteration is a popular algorithm for finding near-optimal policies for POMDPs. It is inefficient because it must account for the entire belief space, which necessitates solving large numbers of linear programs. In this paper, we study value iteration restricted to belief subsets. We show that, with properly chosen belief subsets, restricted value iteration yields near-optimal policies, and we give a condition for determining whether a given belief subset would bring about savings in space and time. We also apply restricted value iteration to two interesting classes of POMDPs, namely informative POMDPs and near-discernible POMDPs.
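
    The abstract does not spell out the algorithm, so the following is only a rough point-based sketch of value iteration carried out over a fixed, finite belief subset; the array layout, the point-based simplification, and all function names are assumptions for illustration, not the paper's restricted value-iteration procedure or its LP-based pruning.

```python
import numpy as np

def restricted_value_iteration(T, Z, R, beliefs, gamma=0.95, iters=50):
    """Point-based backups over a restricted belief subset (illustrative sketch).

    T[a] : (S, S) array, P(s' | s, a)
    Z[a] : (S, O) array, P(o | s', a)
    R    : (A, S) array, immediate reward for action a in state s
    beliefs : list of length-S belief vectors spanning the restricted subset
    Returns one alpha-vector per belief point (a piecewise-linear value bound).
    """
    n_states, n_actions, n_obs = T[0].shape[0], len(T), Z[0].shape[1]
    # Start from a single conservative alpha-vector.
    alphas = [np.full(n_states, R.min() / (1.0 - gamma))]
    for _ in range(iters):
        new_alphas = []
        for b in beliefs:
            best_val, best_vec = -np.inf, None
            for a in range(n_actions):
                vec = R[a].astype(float)
                for o in range(n_obs):
                    # Back-project every alpha-vector through (a, o) and keep
                    # the one that is best at this particular belief point.
                    cands = [T[a] @ (Z[a][:, o] * al) for al in alphas]
                    vec = vec + gamma * max(cands, key=lambda g: b @ g)
                if b @ vec > best_val:
                    best_val, best_vec = b @ vec, vec
            new_alphas.append(best_vec)
        alphas = new_alphas
    return alphas
```

    Intuitively, restricting the backups to a well-chosen belief subset keeps the number of vectors, and hence the work per iteration, small, which is the space and time saving the abstract refers to.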

    Decision-Theoretic Planning with Person Trajectory Prediction for Social Navigation

    Robots navigating in a social way should reason about people's intentions when acting. For instance, in applications such as robot guidance or meeting with a person, the robot has to consider the goals of the people involved. Intentions are inherently non-observable, and thus we propose Partially Observable Markov Decision Processes (POMDPs) as a decision-making tool for these applications. One issue with POMDPs is that the prediction models are usually handcrafted. In this paper, we use machine learning techniques to build prediction models from observations. A novel technique is employed to discover points of interest (goals) in the environment, and a variant of Growing Hidden Markov Models (GHMMs) is used to learn the transition probabilities of the POMDP. The approach is applied to an autonomous telepresence robot.
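
    As a loosely related illustration of learning the prediction model from observed trajectories, the sketch below estimates a transition matrix by counting grid-cell transitions and treats frequently visited endpoints as candidate goals; the discretisation, the counting scheme, and the function names are assumptions for illustration only, not the Growing-HMM variant used in the paper.

```python
import numpy as np
from collections import defaultdict

def learn_transitions(trajectories, n_cells):
    """Row-stochastic transition matrix from observed person trajectories.

    trajectories : list of sequences of discrete cell indices in [0, n_cells)
    Returns P[s, s'] ~ P(next cell | current cell).
    (Simple counting sketch; the paper instead grows an HMM incrementally.)
    """
    counts = np.zeros((n_cells, n_cells))
    for traj in trajectories:
        for s, s_next in zip(traj[:-1], traj[1:]):
            counts[s, s_next] += 1.0
    counts += 1e-3  # smoothing so unseen transitions stay possible
    return counts / counts.sum(axis=1, keepdims=True)

def discover_goals(trajectories, min_support=3):
    """Treat cells where many trajectories end as candidate goals."""
    end_counts = defaultdict(int)
    for traj in trajectories:
        end_counts[traj[-1]] += 1
    return [cell for cell, n in end_counts.items() if n >= min_support]
```

    A learned model of this kind could then serve as the POMDP's transition model, with the discovered goals playing the role of the hidden intentions over which the robot maintains a belief.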

    Influence-Optimistic Local Values for Multiagent Planning --- Extended Version

    Recent years have seen the development of methods for multiagent planning under uncertainty that scale to tens or even hundreds of agents. However, most of these methods either make restrictive assumptions on the problem domain or provide approximate solutions without any guarantees on quality. Methods in the former category typically build on heuristic search using upper bounds on the value function. Unfortunately, no techniques exist to compute such upper bounds for problems with non-factored value functions. To allow for meaningful benchmarking through measurable quality guarantees on a very general class of problems, this paper introduces a family of influence-optimistic upper bounds for factored decentralized partially observable Markov decision processes (Dec-POMDPs) that do not have factored value functions. Intuitively, we derive bounds on very large multiagent planning problems by subdividing them into sub-problems and, for each sub-problem, making optimistic assumptions about the influence exerted by the rest of the system. We numerically compare the different upper bounds and demonstrate how we can achieve a non-trivial guarantee that a heuristic solution for problems with hundreds of agents is close to optimal. Furthermore, we provide evidence that the upper bounds may improve the effectiveness of heuristic influence search, and discuss further potential applications to multiagent planning. Comment: Long version of IJCAI 2015 paper (and extended abstract at AAMAS 2015).
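
    The following sketch tries to convey the influence-optimistic idea in its simplest form: each sub-problem is evaluated while letting the external influence take its most favourable value at every step, and the resulting per-sub-problem values are summed. The data layout and function name are assumptions for illustration; the paper's bounds are defined for factored Dec-POMDPs rather than the small fully observed sub-problems used here.

```python
import numpy as np

def influence_optimistic_upper_bound(subproblems, horizon, gamma=1.0):
    """Sum of per-sub-problem values under optimistic external influence.

    Each sub-problem is a dict with:
      'T'  : T[a][x], an (S, S) transition matrix for local action a under
             external influence setting x
      'R'  : (A, S) array of local rewards
      'b0' : initial state distribution over the sub-problem's states
    Because every sub-problem may pick the best influence at every step,
    the sum of these values over-estimates (upper-bounds) the value any
    actual joint policy can achieve.  Illustrative sketch only, not the
    paper's bound family.
    """
    total = 0.0
    for sp in subproblems:
        T, R, b0 = sp['T'], sp['R'], sp['b0']
        V = np.zeros(len(b0))
        for _ in range(horizon):  # finite-horizon backward induction
            Q = np.full(len(b0), -np.inf)
            for a in range(len(T)):
                for x in range(len(T[a])):
                    Q = np.maximum(Q, R[a] + gamma * T[a][x] @ V)
            V = Q
        total += b0 @ V
    return total
```

    Comparing such an optimistic bound with the value of a heuristic joint policy gives the kind of measurable quality guarantee the abstract describes: if the two numbers are close, the heuristic solution is provably close to optimal.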