60 research outputs found
Anytime Point-Based Approximations for Large POMDPs
The Partially Observable Markov Decision Process has long been recognized as
a rich framework for real-world planning and control problems, especially in
robotics. However, exact solutions in this framework are typically
computationally intractable for all but the smallest problems. A well-known
technique for speeding up POMDP solving involves performing value backups at
specific belief points, rather than over the entire belief simplex. The
efficiency of this approach, however, depends greatly on the selection of
points. This paper presents a set of novel techniques for selecting informative
belief points which work well in practice. The point selection procedure is
combined with point-based value backups to form an effective anytime POMDP
algorithm called Point-Based Value Iteration (PBVI). The first aim of this
paper is to introduce this algorithm and present a theoretical analysis
justifying the choice of belief selection technique. The second aim of this
paper is to provide a thorough empirical comparison between PBVI and other
state-of-the-art POMDP methods, in particular the Perseus algorithm, in an
effort to highlight their similarities and differences. Evaluation is performed
using both standard POMDP domains and realistic robotic tasks.
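The point-based value backup mentioned in the abstract above can be sketched as follows. This is a minimal illustration of a generic point-based backup at a single belief point, not the paper's implementation; the array layouts for `T`, `O`, and `R`, and the function name `point_based_backup`, are assumptions made for this example.

```python
import numpy as np

def point_based_backup(b, alphas, T, O, R, gamma):
    """One value backup at a single belief point b.

    Assumed (hypothetical) layouts:
      T[a, s, s2] = P(s2 | s, a)
      O[a, s2, o] = P(o | s2, a)
      R[a, s]     = immediate reward
      alphas      = current alpha vectors, shape (n_alpha, n_states)
    Returns the new alpha vector that is best at b, and its action.
    """
    n_actions = T.shape[0]
    n_obs = O.shape[2]
    best_val, best_alpha, best_a = -np.inf, None, None
    for a in range(n_actions):
        alpha_a = R[a].astype(float).copy()
        for o in range(n_obs):
            # g[i, s] = sum_{s2} T[a, s, s2] * O[a, s2, o] * alphas[i, s2]
            g = (alphas * O[a, :, o]) @ T[a].T
            # Keep only the projected vector that is best at this belief.
            alpha_a += gamma * g[np.argmax(g @ b)]
        val = float(alpha_a @ b)
        if val > best_val:
            best_val, best_alpha, best_a = val, alpha_a, a
    return best_alpha, best_a

# Toy two-state, two-action, two-observation model (illustrative only).
T = np.stack([np.eye(2), np.eye(2)])   # states never change
O = np.full((2, 2, 2), 0.5)            # observations are uninformative
R = np.array([[1.0, 0.0], [0.0, 1.0]]) # action a pays off in state a
alphas = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([0.9, 0.1])
new_alpha, action = point_based_backup(b, alphas, T, O, R, gamma=0.5)
```

Because the backup is evaluated only at `b`, its cost grows with the number of belief points rather than with the size of the belief simplex, which is why the choice of points matters so much.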
Memory Bounded Open-Loop Planning in Large POMDPs using Thompson Sampling
State-of-the-art approaches to partially observable planning like POMCP are
based on stochastic tree search. While these approaches are computationally
efficient, they may still construct search trees of considerable size, which
could limit the performance due to restricted memory resources. In this paper,
we propose Partially Observable Stacked Thompson Sampling (POSTS), a memory
bounded approach to open-loop planning in large POMDPs, which optimizes a fixed
size stack of Thompson Sampling bandits. We empirically evaluate POSTS in four
large benchmark problems and compare its performance with different tree-based
approaches. We show that POSTS achieves competitive performance compared to
tree-based open-loop planning and offers a performance-memory tradeoff, making
it suitable for partially observable planning with highly restricted
computational and memory resources.
Comment: Presented at AAAI 201
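The idea of optimizing a fixed-size stack of Thompson Sampling bandits can be sketched as below. This is a simplified illustration under stated assumptions, not the POSTS implementation: the class `GaussianBandit`, the function `plan_open_loop`, the generative model `toy_step`, and the simplified Gaussian sampling rule are all hypothetical choices made for this example.

```python
import numpy as np

class GaussianBandit:
    """One bandit per planning step; a simplified Gaussian sampling rule
    (an assumption for this sketch, not the paper's exact posterior)."""
    def __init__(self, n_actions):
        self.counts = np.zeros(n_actions)
        self.means = np.zeros(n_actions)

    def sample_action(self, rng):
        # Draw one value per arm; uncertainty shrinks as an arm is tried.
        draws = rng.normal(self.means, 1.0 / np.sqrt(self.counts + 1.0))
        return int(np.argmax(draws))

    def update(self, action, ret):
        self.counts[action] += 1
        self.means[action] += (ret - self.means[action]) / self.counts[action]

def plan_open_loop(step, particles, n_actions, horizon, n_sims, gamma, seed=0):
    """Optimize an action sequence with a stack of `horizon` bandits.
    Memory use is fixed by `horizon`, independent of `n_sims`."""
    rng = np.random.default_rng(seed)
    stack = [GaussianBandit(n_actions) for _ in range(horizon)]
    for _ in range(n_sims):
        # Sample a start state from the particle belief.
        state = particles[rng.integers(len(particles))]
        actions, rewards = [], []
        for bandit in stack:
            a = bandit.sample_action(rng)
            state, r = step(state, a, rng)
            actions.append(a)
            rewards.append(r)
        # Update each bandit with the discounted return from its step on.
        ret = 0.0
        for t in reversed(range(horizon)):
            ret = rewards[t] + gamma * ret
            stack[t].update(actions[t], ret)
    return [int(np.argmax(b.means)) for b in stack], stack

def toy_step(state, action, rng):
    # Hypothetical generative model: action 1 always pays 1, action 0 pays 0.
    return state, float(action)

plan, stack = plan_open_loop(toy_step, [0], n_actions=2, horizon=3,
                             n_sims=200, gamma=0.5)
```

The planner ignores observations during a simulation (open loop), so it needs no search tree; the trade-off is that the plan cannot react to what is observed within the planning horizon.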
Active Sensing as Bayes-Optimal Sequential Decision Making
Sensory inference under conditions of uncertainty is a major problem in both
machine learning and computational neuroscience. An important but poorly
understood aspect of sensory processing is the role of active sensing. Here, we
present a Bayes-optimal inference and control framework for active sensing,
C-DAC (Context-Dependent Active Controller). Unlike previously proposed
algorithms that optimize abstract statistical objectives such as information
maximization (Infomax) [Butko & Movellan, 2010] or one-step look-ahead accuracy
[Najemnik & Geisler, 2005], our active sensing model directly minimizes a
combination of behavioral costs, such as temporal delay, response error, and
effort. We simulate these algorithms on a simple visual search task to
illustrate scenarios in which context-sensitivity is particularly beneficial
and optimization with respect to generic statistical objectives particularly
inadequate. Motivated by the geometric properties of the C-DAC policy, we
present both parametric and non-parametric approximations, which retain
context-sensitivity while significantly reducing computational complexity.
These approximations enable us to investigate the more complex problem
involving peripheral vision, and we notice that the difference between C-DAC
and statistical policies becomes even more evident in this scenario.
Comment: Scheduled to appear in UAI 201
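The inference half of such a framework can be illustrated with a Bayesian belief update for a visual search task. This is only a sketch under assumptions not stated in the abstract: a single target among a few locations, a binary observation at the fixated location, and a hypothetical sensor reliability parameter `beta`. It does not implement the C-DAC control policy itself.

```python
import numpy as np

def update_belief(belief, fixated, observation, beta=0.8):
    """Posterior over target locations after one fixation.

    Assumed observation model (hypothetical):
      P(obs = 1 | target at fixated location) = beta
      P(obs = 1 | target elsewhere)           = 1 - beta
    """
    belief = np.asarray(belief, dtype=float)
    p_one = np.full(belief.shape, 1.0 - beta)
    p_one[fixated] = beta
    likelihood = p_one if observation == 1 else 1.0 - p_one
    posterior = likelihood * belief
    return posterior / posterior.sum()

# Uniform prior over three locations; fixate location 0 and observe a 1.
posterior = update_belief([1 / 3, 1 / 3, 1 / 3], fixated=0, observation=1)
```

A controller would then choose the next fixation, or stop and respond, by weighing this belief against costs such as time, error, and effort.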