A POMDP Extension with Belief-dependent Rewards
Partially Observable Markov Decision Processes (POMDPs) model sequential decision-making problems under uncertainty and partial observability. Unfortunately, some problems cannot be modeled with state-dependent reward functions, e.g., problems whose objective explicitly involves reducing the uncertainty about the state. To that end, we introduce ρPOMDPs, an extension of POMDPs in which the reward function ρ depends on the belief state. We show that, under the common assumption that ρ is convex, the value function is also convex, which makes it possible to (1) approximate ρ arbitrarily well with a piecewise linear and convex (PWLC) function, and (2) use state-of-the-art exact or approximate solving algorithms with limited changes.
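The PWLC idea above can be illustrated with a small sketch (an illustration, not the paper's implementation): for the convex belief-dependent reward ρ(b) = Σ_s b(s) log b(s) (negative entropy), the tangent hyperplane at a sampled belief b₀ is the alpha-vector α_s = log b₀(s), and taking the maximum over tangents gives a lower bound that is tight at the sample points.

```python
import numpy as np

def neg_entropy(b):
    """rho(b) = sum_s b(s) log b(s): convex in b, higher for low-uncertainty beliefs."""
    b = np.clip(b, 1e-12, 1.0)
    return float(np.sum(b * np.log(b)))

def pwlc_alphas(sample_beliefs):
    """One tangent hyperplane (alpha-vector) per sampled belief.
    For rho(b) = sum b log b, the tangent at b0 is alpha . b with
    alpha_s = log b0(s); by Gibbs' inequality it never exceeds rho(b)."""
    return [np.log(np.clip(b0, 1e-12, 1.0)) for b0 in sample_beliefs]

def rho_pwlc(b, alphas):
    """PWLC approximation: max over tangents, a lower bound on the true rho(b)."""
    return max(float(a @ b) for a in alphas)
```

Adding more sampled beliefs refines the approximation arbitrarily well, which is what lets standard alpha-vector-based POMDP solvers be reused with limited changes.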
Online algorithms for POMDPs with continuous state, action, and observation spaces
Online solvers for partially observable Markov decision processes have been
applied to problems with large discrete state spaces, but continuous state,
action, and observation spaces remain a challenge. This paper begins by
investigating double progressive widening (DPW) as a solution to this
challenge. However, we prove that this modification alone is not sufficient
because the belief representations in the search tree collapse to a single
particle, causing the algorithm to converge to a policy that is suboptimal
regardless of the computation time. This paper proposes and evaluates two new
algorithms, POMCPOW and PFT-DPW, that overcome this deficiency by using
weighted particle filtering. Simulation results show that these modifications
allow the algorithms to be successful where previous approaches fail.
Measurement Simplification in ρ-POMDP with Performance Guarantees
Decision making under uncertainty is at the heart of any autonomous system
acting with imperfect information. The cost of solving the decision making
problem is exponential in the action and observation spaces, thus rendering it
infeasible for many online systems. This paper introduces a novel approach to
efficient decision-making by partitioning the high-dimensional observation
space. Using the partitioned observation space, we formulate analytical bounds
on the expected information-theoretic reward, for general belief distributions.
These bounds are then used to plan efficiently while keeping performance
guarantees. We show that the bounds are adaptive, computationally efficient,
and that they converge to the original solution. We extend the partitioning
paradigm and present a hierarchy of partitioned spaces that allows greater
efficiency in planning. We then propose a specific variant of these bounds for
Gaussian beliefs and show a theoretical performance improvement of at least a
factor of 4. Finally, we compare our novel method to other state-of-the-art
algorithms in active SLAM scenarios, in simulation and in real experiments. In
both cases we show a significant speed-up in planning with performance
guarantees.