16,902 research outputs found
Anytime Point-Based Approximations for Large POMDPs
The Partially Observable Markov Decision Process has long been recognized as
a rich framework for real-world planning and control problems, especially in
robotics. However exact solutions in this framework are typically
computationally intractable for all but the smallest problems. A well-known
technique for speeding up POMDP solving involves performing value backups at
specific belief points, rather than over the entire belief simplex. The
efficiency of this approach, however, depends greatly on the selection of
points. This paper presents a set of novel techniques for selecting informative
belief points which work well in practice. The point selection procedure is
combined with point-based value backups to form an effective anytime POMDP
algorithm called Point-Based Value Iteration (PBVI). The first aim of this
paper is to introduce this algorithm and present a theoretical analysis
justifying the choice of belief selection technique. The second aim of this
paper is to provide a thorough empirical comparison between PBVI and other
state-of-the-art POMDP methods, in particular the Perseus algorithm, in an
effort to highlight their similarities and differences. Evaluation is performed
using both standard POMDP domains and realistic robotic tasks
Hierarchical Policy Search via Return-Weighted Density Estimation
Learning an optimal policy from a multi-modal reward function is a
challenging problem in reinforcement learning (RL). Hierarchical RL (HRL)
tackles this problem by learning a hierarchical policy, where multiple option
policies are in charge of different strategies corresponding to modes of a
reward function and a gating policy selects the best option for a given
context. Although HRL has been demonstrated to be promising, current
state-of-the-art methods cannot still perform well in complex real-world
problems due to the difficulty of identifying modes of the reward function. In
this paper, we propose a novel method called hierarchical policy search via
return-weighted density estimation (HPSDE), which can efficiently identify the
modes through density estimation with return-weighted importance sampling. Our
proposed method finds option policies corresponding to the modes of the return
function and automatically determines the number and the location of option
policies, which significantly reduces the burden of hyper-parameters tuning.
Through experiments, we demonstrate that the proposed HPSDE successfully learns
option policies corresponding to modes of the return function and that it can
be successfully applied to a challenging motion planning problem of a redundant
robotic manipulator.Comment: The 32nd AAAI Conference on Artificial Intelligence (AAAI 2018), 9
page
Sampling-Based Methods for Factored Task and Motion Planning
This paper presents a general-purpose formulation of a large class of
discrete-time planning problems, with hybrid state and control-spaces, as
factored transition systems. Factoring allows state transitions to be described
as the intersection of several constraints each affecting a subset of the state
and control variables. Robotic manipulation problems with many movable objects
involve constraints that only affect several variables at a time and therefore
exhibit large amounts of factoring. We develop a theoretical framework for
solving factored transition systems with sampling-based algorithms. The
framework characterizes conditions on the submanifold in which solutions lie,
leading to a characterization of robust feasibility that incorporates
dimensionality-reducing constraints. It then connects those conditions to
corresponding conditional samplers that can be composed to produce values on
this submanifold. We present two domain-independent, probabilistically complete
planning algorithms that take, as input, a set of conditional samplers. We
demonstrate the empirical efficiency of these algorithms on a set of
challenging task and motion planning problems involving picking, placing, and
pushing
- …