14,251 research outputs found

    Planning in Discrete and Continuous Markov Decision Processes by Probabilistic Programming

    Get PDF
    Real-world planning problems frequently involve mixtures of continuous and discrete state variables and actions, and are formulated in environments with an unknown number of objects. In recent years, probabilistic programming has emerged as a natural approach to capture and characterize such complex probability distributions with general-purpose inference methods. While it is known that a probabilistic programming language can be easily extended to represent Markov Decision Processes (MDPs) for planning tasks, solving such tasks is challenging. Building on related efforts in reinforcement learning, we introduce a conceptually simple but powerful planning algorithm for MDPs realized as a probabilistic program. This planner constructs approximations to the optimal policy by importance sampling, while exploiting the knowledge of the MDP model.status: publishe

    Perseus: Randomized Point-based Value Iteration for POMDPs

    Full text link
    Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agents belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of Perseus in large scale POMDP problems

    Verification of Uncertain POMDPs Using Barrier Certificates

    Full text link
    We consider a class of partially observable Markov decision processes (POMDPs) with uncertain transition and/or observation probabilities. The uncertainty takes the form of probability intervals. Such uncertain POMDPs can be used, for example, to model autonomous agents with sensors with limited accuracy, or agents undergoing a sudden component failure, or structural damage [1]. Given an uncertain POMDP representation of the autonomous agent, our goal is to propose a method for checking whether the system will satisfy an optimal performance, while not violating a safety requirement (e.g. fuel level, velocity, and etc.). To this end, we cast the POMDP problem into a switched system scenario. We then take advantage of this switched system characterization and propose a method based on barrier certificates for optimality and/or safety verification. We then show that the verification task can be carried out computationally by sum-of-squares programming. We illustrate the efficacy of our method by applying it to a Mars rover exploration example.Comment: 8 pages, 4 figure

    Computational Approaches for Stochastic Shortest Path on Succinct MDPs

    Full text link
    We consider the stochastic shortest path (SSP) problem for succinct Markov decision processes (MDPs), where the MDP consists of a set of variables, and a set of nondeterministic rules that update the variables. First, we show that several examples from the AI literature can be modeled as succinct MDPs. Then we present computational approaches for upper and lower bounds for the SSP problem: (a)~for computing upper bounds, our method is polynomial-time in the implicit description of the MDP; (b)~for lower bounds, we present a polynomial-time (in the size of the implicit description) reduction to quadratic programming. Our approach is applicable even to infinite-state MDPs. Finally, we present experimental results to demonstrate the effectiveness of our approach on several classical examples from the AI literature
    • …
    corecore