The Complexity of Planning Revisited - A Parameterized Analysis
The early classifications of the computational complexity of planning under
various restrictions in STRIPS (Bylander) and SAS+ (Bäckström and Nebel) have
influenced subsequent research in planning in many ways. We go back and
reanalyse their subclasses, but this time using the more modern tool of
parameterized complexity analysis. This provides new results that together with
the old results give a more detailed picture of the complexity landscape. We
demonstrate separation results not possible with standard complexity theory,
which contributes to explaining why certain cases of planning have seemed
simpler in practice than theory has predicted. In particular, we show that
certain restrictions of practical interest are tractable in the parameterized
sense of the term, and that a simple heuristic is sufficient to make a
well-known partial-order planner exploit this fact.
Comment: Author's self-archived copy.
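To make the parameterized reading concrete, here is a minimal sketch (not taken from the paper) of bounded plan search in a STRIPS-style model, with the plan length k as the parameter: the search explores at most |A|^k nodes, so the exponential blow-up is confined to the parameter rather than the instance size, which is the shape of running time that fixed-parameter tractability asks about. All action and fluent names below are invented for illustration.

    # Minimal STRIPS model: an action is (name, preconditions, add effects,
    # delete effects), the last three being frozensets of fluents.
    def applicable(state, action):
        _, pre, _, _ = action
        return pre <= state

    def apply_action(state, action):
        _, _, add, delete = action
        return (state - delete) | add

    def bounded_plan(state, goal, actions, k):
        """Depth-first search for a plan of length at most k.

        With plan length k as the parameter, the search visits at most
        len(actions)**k nodes: exponential in k alone, not in the size
        of the whole instance.
        """
        if goal <= state:
            return []
        if k == 0:
            return None
        for a in actions:
            if applicable(state, a):
                rest = bounded_plan(apply_action(state, a), goal, actions, k - 1)
                if rest is not None:
                    return [a[0]] + rest
        return None

    # Toy instance (hypothetical names): move a package from A to B.
    acts = [
        ("load", frozenset({"pkg_at_A", "truck_at_A"}),
         frozenset({"pkg_in_truck"}), frozenset({"pkg_at_A"})),
        ("drive", frozenset({"truck_at_A"}),
         frozenset({"truck_at_B"}), frozenset({"truck_at_A"})),
        ("unload", frozenset({"pkg_in_truck", "truck_at_B"}),
         frozenset({"pkg_at_B"}), frozenset({"pkg_in_truck"})),
    ]
    init = frozenset({"pkg_at_A", "truck_at_A"})
    print(bounded_plan(init, frozenset({"pkg_at_B"}), acts, 3))
    # -> ['load', 'drive', 'unload']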
Active Markov Information-Theoretic Path Planning for Robotic Environmental Sensing
Recent research in multi-robot exploration and mapping has focused on
sampling environmental fields, which are typically modeled using the Gaussian
process (GP). Existing information-theoretic exploration strategies for
learning GP-based environmental field maps adopt the non-Markovian problem
structure and consequently scale poorly with the length of history of
observations. Hence, it becomes computationally impractical to use these
strategies for in situ, real-time active sampling. To ease this computational
burden, this paper presents a Markov-based approach to efficient
information-theoretic path planning for active sampling of GP-based fields. We
analyze the time complexity of solving the Markov-based path planning problem,
and demonstrate analytically that it scales better than that of deriving the
non-Markovian strategies with increasing length of planning horizon. For a
class of exploration tasks called the transect sampling task, we provide
theoretical guarantees on the active sampling performance of our Markov-based
policy, from which ideal environmental field conditions and sampling task
settings can be established to limit its performance degradation due to
violation of the Markov assumption. Empirical evaluation on real-world
temperature and plankton density field data shows that our Markov-based policy
can generally achieve active sampling performance comparable to that of the
widely used non-Markovian greedy policies under less favorable realistic field
conditions and task settings while enjoying significant computational gain over
them.
Comment: 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2011); extended version with proofs, 11 pages.
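As a rough illustration of the Markov idea, and not the authors' exact algorithm, the sketch below greedily picks one sampling location per transect column while conditioning the GP's posterior variance only on a short window of recent observations, so the per-step planning cost stays bounded instead of growing with the full observation history. The squared-exponential kernel, its hyperparameters, and the candidate grid are all assumptions made for the example.

    import numpy as np

    def rbf(X, Y, ls=1.0, var=1.0):
        # Squared-exponential kernel; hyperparameters are illustrative.
        d = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
        return var * np.exp(-0.5 * d / ls ** 2)

    def posterior_var(x_star, X_obs, noise=0.01):
        # GP posterior variance at x_star given observed locations X_obs.
        # Gaussian entropy is monotone in variance, so ranking candidates
        # by variance ranks them by entropy as well.
        if len(X_obs) == 0:
            return rbf(x_star, x_star)[0, 0]
        K = rbf(X_obs, X_obs) + noise * np.eye(len(X_obs))
        k = rbf(X_obs, x_star)
        return rbf(x_star, x_star)[0, 0] - (k.T @ np.linalg.solve(K, k))[0, 0]

    def markov_greedy_path(columns, window):
        # One sampling location per transect column, chosen greedily by
        # posterior variance while conditioning only on the last `window`
        # observations: the Markov approximation that keeps the per-step
        # cost independent of the history length.
        visited = []
        for col in columns:
            recent = np.array(visited[-window:]) if visited else np.empty((0, 2))
            best = max(col, key=lambda p: posterior_var(np.array([p], float), recent))
            visited.append(best)
        return visited

    # Hypothetical 5-column transect with 4 candidate rows per column.
    cols = [[(float(x), float(y)) for y in range(4)] for x in range(5)]
    print(markov_greedy_path(cols, window=4))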
Influence-Optimistic Local Values for Multiagent Planning --- Extended Version
Recent years have seen the development of methods for multiagent planning
under uncertainty that scale to tens or even hundreds of agents. However, most
of these methods either make restrictive assumptions on the problem domain, or
provide approximate solutions without any guarantees on quality. Methods in the
former category typically build on heuristic search using upper bounds on the
value function. Unfortunately, no techniques exist to compute such upper bounds
for problems with non-factored value functions. To allow for meaningful
benchmarking through measurable quality guarantees on a very general class of
problems, this paper introduces a family of influence-optimistic upper bounds
for factored decentralized partially observable Markov decision processes
(Dec-POMDPs) that do not have factored value functions. Intuitively, we derive
bounds on very large multiagent planning problems by subdividing them into
subproblems and, for each subproblem, making optimistic assumptions about the
influence that the rest of the system will exert.
We numerically compare the different upper bounds and demonstrate how we can
achieve a non-trivial guarantee that a heuristic solution for problems with
hundreds of agents is close to optimal. Furthermore, we provide evidence that
the upper bounds may improve the effectiveness of heuristic influence search,
and discuss further potential applications to multiagent planning.
Comment: Long version of IJCAI 2015 paper (and extended abstract at AAMAS 2015).
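The flavor of the influence-optimistic construction can be shown in a few lines: let every subproblem independently assume the most favorable influence from the rest of the system, and sum the resulting local optima. The toy value tables below are invented; they merely illustrate why this quantity upper-bounds any jointly consistent value.

    def influence_optimistic_bound(subproblems):
        # Each subproblem independently assumes the most favorable external
        # influence. Optimistic because those choices need not be jointly
        # consistent, so any achievable joint value can only be lower.
        return sum(max(table.values()) for table in subproblems)

    def best_consistent_value(subproblems, influences):
        # Best value when every subproblem must experience the SAME
        # influence -- a stand-in for the real inter-agent coupling.
        return max(
            sum(max(v for (a, i), v in table.items() if i == infl)
                for table in subproblems)
            for infl in influences
        )

    # Toy local value tables, keyed by (local_action, external_influence);
    # all numbers are invented.
    sub1 = {("a0", "hi"): 5, ("a0", "lo"): 1, ("a1", "hi"): 2, ("a1", "lo"): 4}
    sub2 = {("b0", "hi"): 1, ("b0", "lo"): 6, ("b1", "hi"): 3, ("b1", "lo"): 2}
    subs = [sub1, sub2]

    print(influence_optimistic_bound(subs))           # 5 + 6 = 11 (upper bound)
    print(best_consistent_value(subs, ["hi", "lo"]))  # max(5 + 3, 4 + 6) = 10

The gap between 11 and 10 is exactly the looseness introduced by letting each subproblem dream up its own best-case influence.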
Experimental results: Reinforcement Learning of POMDPs using Spectral Methods
We propose a new reinforcement learning algorithm for partially observable
Markov decision processes (POMDP) based on spectral decomposition methods.
While spectral methods have been previously employed for consistent learning of
(passive) latent variable models such as hidden Markov models, POMDPs are more
challenging since the learner interacts with the environment and possibly
changes the future observations in the process. We devise a learning algorithm
that runs through epochs; in each epoch, we employ spectral techniques to learn
the POMDP parameters from a trajectory generated by a fixed policy. At the end
of the epoch, an optimization oracle returns the optimal memoryless planning
policy which maximizes the expected reward based on the estimated POMDP model.
We prove an order-optimal regret bound with respect to the optimal memoryless
policy and efficient scaling with respect to the dimensionality of observation
and action spaces.
Comment: 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.
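For a taste of the spectral step, the sketch below recovers observable operators of an uncontrolled HMM from empirical pair and triple statistics, in the spirit of Hsu, Kakade and Zhang's spectral algorithm for HMMs; the paper's actual estimator additionally handles actions and memoryless policies via tensor methods, and every model parameter below is made up.

    import numpy as np

    rng = np.random.default_rng(0)

    # Ground-truth 2-state, 3-observation HMM (invented numbers), used only
    # to generate data; the learner sees the observation sequence alone.
    # pi is the stationary distribution of T, so empirical statistics from
    # a long trajectory match it.
    T = np.array([[0.8, 0.3], [0.2, 0.7]])              # T[s_next, s]
    O = np.array([[0.7, 0.1], [0.2, 0.3], [0.1, 0.6]])  # O[x, s]
    pi = np.array([0.6, 0.4])

    s = rng.choice(2, p=pi)
    xs = []
    for _ in range(100_000):
        xs.append(rng.choice(3, p=O[:, s]))
        s = rng.choice(2, p=T[:, s])

    # Empirical statistics over consecutive observation pairs and triples.
    P1, P21, P3x1 = np.zeros(3), np.zeros((3, 3)), np.zeros((3, 3, 3))
    for t in range(len(xs) - 2):
        a, b, c = xs[t], xs[t + 1], xs[t + 2]
        P1[a] += 1; P21[b, a] += 1; P3x1[b, c, a] += 1
    P1 /= P1.sum(); P21 /= P21.sum(); P3x1 /= P3x1.sum()

    # Spectral step: an SVD basis of the pair matrix lets us recover one
    # "observable operator" B_x per observation symbol.
    U = np.linalg.svd(P21)[0][:, :2]                    # rank = #hidden states
    B = [U.T @ P3x1[x] @ np.linalg.pinv(U.T @ P21) for x in range(3)]
    b1 = U.T @ P1
    binf = np.linalg.pinv(P21.T @ U) @ P1

    def spectral_prob(seq):
        # Estimated probability of an observation sequence.
        v = b1
        for x in seq:
            v = B[x] @ v
        return float(binf @ v)

    def true_prob(seq):
        # Forward algorithm on the ground-truth model, for comparison.
        alpha = pi * O[seq[0], :]
        for x in seq[1:]:
            alpha = O[x, :] * (T @ alpha)
        return float(alpha.sum())

    seq = [0, 2, 1]
    print(spectral_prob(seq), true_prob(seq))           # should be close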