
    A Scalable Method for Solving High-Dimensional Continuous POMDPs Using Local Approximation

    Partially Observable Markov Decision Processes (POMDPs) are typically solved by finding an approximate global solution to a corresponding belief-MDP. In this paper, we offer a new planning algorithm for POMDPs with continuous state, action, and observation spaces. Since such domains have an inherent notion of locality, we can find an approximate solution using local optimization methods. We parameterize the belief distribution as a Gaussian mixture and use the Extended Kalman Filter (EKF) to approximate the belief update. Since the EKF is a first-order filter, we can marginalize over the observations analytically. By using feedback control and state estimation during policy execution, we recover a behavior that is effectively conditioned on incoming observations despite the unconditioned planning. Local optimization provides no guarantees of global optimality, but it allows us to tackle domains that are at least an order of magnitude larger than the current state of the art. We demonstrate the scalability of our algorithm on a simulated hand-eye coordination domain with 16 continuous state dimensions and 6 continuous action dimensions.
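    To make the analytic marginalization concrete, here is a minimal sketch of a single EKF belief update under a maximum-likelihood-observation assumption: in an EKF the posterior covariance does not depend on the realized observation, so marginalizing over observations leaves the predicted mean unchanged while the covariance still contracts. The function names, Jacobian arguments, and noise models below are illustrative placeholders, not the paper's notation.

```python
import numpy as np

def ekf_belief_update(mu, Sigma, u, f, f_jac, h_jac, Q, R):
    """One EKF step on a Gaussian belief N(mu, Sigma), marginalizing
    over the observation analytically (illustrative sketch).

    f      : dynamics function, x' = f(x, u)
    f_jac  : Jacobian of f w.r.t. x, evaluated at (mu, u)
    h_jac  : Jacobian of the observation model, evaluated at mu_pred
    Q, R   : process and observation noise covariances (assumed known)
    """
    # Predict: push the mean through the dynamics and linearize to
    # propagate the covariance.
    mu_pred = f(mu, u)
    A = f_jac(mu, u)
    Sigma_pred = A @ Sigma @ A.T + Q

    # Update: in the EKF the covariance update is identical for every
    # possible observation, and under the expected (maximum-likelihood)
    # observation the innovation is zero, so the mean stays at mu_pred.
    C = h_jac(mu_pred)
    S = C @ Sigma_pred @ C.T + R
    K = Sigma_pred @ C.T @ np.linalg.inv(S)
    Sigma_new = (np.eye(len(mu)) - K @ C) @ Sigma_pred
    return mu_pred, Sigma_new
```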

    Which States Matter? An Application of an Intelligent Discretization Method to Solve a Continuous POMDP in Conservation Biology

    When managing populations of threatened species, conservation managers seek to make the best conservation decisions to avoid extinction. Making the best decision is difficult because the true population size and the effects of management are uncertain. Managers must allocate limited resources between actively protecting the species and monitoring it. Resources spent on monitoring reduce the expenditure on management that could directly improve species persistence; however, monitoring may prevent sub-optimal management actions from being taken as a result of observation error. Partially observable Markov decision processes (POMDPs) can optimize management for populations with partial detectability, but the solution methods can only be applied when there are few discrete states. We use the Continuous U-Tree (CU-Tree) algorithm to represent a continuous state space discretely, using only the states that are necessary to maintain an optimal management policy. We then exploit the compact discretization created by CU-Tree to solve a POMDP on the original continuous state space. We apply our method to a population of sea otters and explore the trade-off between allocating resources to management and to monitoring. We show that accurately discovering the population size is less important than management for the long-term survival of our otter population.
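    The core of a U-Tree style discretization is a statistical split test: a continuous region is divided only where the data show that its sub-regions behave differently, so the resulting discrete states are exactly the ones the policy needs. The sketch below illustrates that idea for a one-dimensional state (e.g. population size) using a Kolmogorov-Smirnov test on observed returns; the class, thresholds, and significance level are illustrative assumptions, not the paper's implementation.

```python
from scipy.stats import ks_2samp

class UTreeLeaf:
    """One leaf of a U-Tree style discretization over a 1-D continuous
    state. The leaf splits only where the distribution of observed
    returns differs across a candidate threshold, so regions that need
    the same policy stay merged into a single discrete state.
    (Illustrative sketch; parameters are not the paper's settings.)"""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.samples = []                # (state, return) pairs seen here
        self.left = self.right = None

    def add(self, state, ret):
        self.samples.append((state, ret))

    def try_split(self, alpha=0.05, min_samples=20):
        if len(self.samples) < 2 * min_samples:
            return False
        xs = sorted(self.samples)        # sort by state value
        best = None
        for i in range(min_samples, len(xs) - min_samples):
            thr = xs[i][0]
            below = [r for s, r in xs if s < thr]
            above = [r for s, r in xs if s >= thr]
            if len(below) < min_samples or len(above) < min_samples:
                continue
            # Do returns on either side of thr come from different
            # distributions? If not, one discrete state suffices.
            _, p = ks_2samp(below, above)
            if p < alpha and (best is None or p < best[1]):
                best = (thr, p)
        if best is None:
            return False
        thr = best[0]
        self.left = UTreeLeaf(self.lo, thr)
        self.right = UTreeLeaf(thr, self.hi)
        for s, r in self.samples:
            (self.left if s < thr else self.right).add(s, r)
        self.samples = []
        return True
```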

    A hypothesis-based algorithm for planning and control in non-Gaussian belief spaces

    We consider the partially observable control problem where it is potentially necessary to perform complex information-gathering operations in order to localize state. One approach to solving these problems is to create plans in belief space, the space of probability distributions over the underlying state of the system. The belief-space plan encodes a strategy for performing a task while gaining information as necessary. Most approaches to belief-space planning rely upon representing the belief state in a particular way (typically as a Gaussian). Unfortunately, this can lead to large errors between the assumed density representation and the true belief state. We propose a new, computationally efficient algorithm for planning in non-Gaussian belief spaces: a receding-horizon re-planning approach where planning occurs in a low-dimensional sampled representation of the belief state, while the true belief state of the system is monitored using an arbitrarily accurate high-dimensional representation. Our key contribution is a planning problem that, when solved optimally on each re-planning step, is guaranteed, under certain conditions, to enable the system to gain information. We prove that when these conditions are met, the algorithm converges with probability one. We characterize the algorithm's performance for different parameter settings in simulation and report results from a robot experiment that illustrates the application of the algorithm to robot grasping.
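    A minimal skeleton of such a receding-horizon loop is sketched below: a small set of state hypotheses sampled from a particle filter stands in for the low-dimensional belief representation, while the particle filter itself plays the role of the arbitrarily accurate monitor. The callables `plan`, `execute`, and `update` and all parameter names are hypothetical stand-ins, not an interface from the paper.

```python
import numpy as np

def receding_horizon_loop(particles, weights, plan, execute, update,
                          horizon=10, n_hyp=8, steps=50, seed=0):
    """Sketch of receding-horizon belief-space re-planning.

    particles : (N, d) array of state samples (the high-dim belief)
    weights   : length-N array of normalized particle weights
    plan      : plan(hypotheses, horizon) -> action sequence
    execute   : execute(action) -> observation (robot interface stub)
    update    : particle-filter update of (particles, weights)
    """
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        # Draw a small hypothesis set from the monitored belief; this
        # is the low-dimensional representation the planner sees.
        idx = rng.choice(len(particles), size=n_hyp, p=weights)
        hypotheses = particles[idx]
        # Solve the fixed-horizon planning problem against the
        # hypotheses only; re-solving it optimally at every step is
        # what the paper ties its information-gain guarantee to.
        actions = plan(hypotheses, horizon)
        # Execute just the first action, observe, and update the full
        # belief before re-planning (receding horizon).
        observation = execute(actions[0])
        particles, weights = update(particles, weights,
                                    actions[0], observation)
    return particles, weights
```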