74,205 research outputs found

    The Complexity of Planning Revisited - A Parameterized Analysis

    Full text link
    The early classifications of the computational complexity of planning under various restrictions in STRIPS (Bylander) and SAS+ (Baeckstroem and Nebel) have influenced following research in planning in many ways. We go back and reanalyse their subclasses, but this time using the more modern tool of parameterized complexity analysis. This provides new results that together with the old results give a more detailed picture of the complexity landscape. We demonstrate separation results not possible with standard complexity theory, which contributes to explaining why certain cases of planning have seemed simpler in practice than theory has predicted. In particular, we show that certain restrictions of practical interest are tractable in the parameterized sense of the term, and that a simple heuristic is sufficient to make a well-known partial-order planner exploit this fact.Comment: (author's self-archived copy

    Active Markov Information-Theoretic Path Planning for Robotic Environmental Sensing

    Full text link
    Recent research in multi-robot exploration and mapping has focused on sampling environmental fields, which are typically modeled using the Gaussian process (GP). Existing information-theoretic exploration strategies for learning GP-based environmental field maps adopt the non-Markovian problem structure and consequently scale poorly with the length of history of observations. Hence, it becomes computationally impractical to use these strategies for in situ, real-time active sampling. To ease this computational burden, this paper presents a Markov-based approach to efficient information-theoretic path planning for active sampling of GP-based fields. We analyze the time complexity of solving the Markov-based path planning problem, and demonstrate analytically that it scales better than that of deriving the non-Markovian strategies with increasing length of planning horizon. For a class of exploration tasks called the transect sampling task, we provide theoretical guarantees on the active sampling performance of our Markov-based policy, from which ideal environmental field conditions and sampling task settings can be established to limit its performance degradation due to violation of the Markov assumption. Empirical evaluation on real-world temperature and plankton density field data shows that our Markov-based policy can generally achieve active sampling performance comparable to that of the widely-used non-Markovian greedy policies under less favorable realistic field conditions and task settings while enjoying significant computational gain over them.Comment: 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2011), Extended version with proofs, 11 page

    Influence-Optimistic Local Values for Multiagent Planning --- Extended Version

    Get PDF
    Recent years have seen the development of methods for multiagent planning under uncertainty that scale to tens or even hundreds of agents. However, most of these methods either make restrictive assumptions on the problem domain, or provide approximate solutions without any guarantees on quality. Methods in the former category typically build on heuristic search using upper bounds on the value function. Unfortunately, no techniques exist to compute such upper bounds for problems with non-factored value functions. To allow for meaningful benchmarking through measurable quality guarantees on a very general class of problems, this paper introduces a family of influence-optimistic upper bounds for factored decentralized partially observable Markov decision processes (Dec-POMDPs) that do not have factored value functions. Intuitively, we derive bounds on very large multiagent planning problems by subdividing them in sub-problems, and at each of these sub-problems making optimistic assumptions with respect to the influence that will be exerted by the rest of the system. We numerically compare the different upper bounds and demonstrate how we can achieve a non-trivial guarantee that a heuristic solution for problems with hundreds of agents is close to optimal. Furthermore, we provide evidence that the upper bounds may improve the effectiveness of heuristic influence search, and discuss further potential applications to multiagent planning.Comment: Long version of IJCAI 2015 paper (and extended abstract at AAMAS 2015

    Experimental results : Reinforcement Learning of POMDPs using Spectral Methods

    Get PDF
    We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDP) based on spectral decomposition methods. While spectral methods have been previously employed for consistent learning of (passive) latent variable models such as hidden Markov models, POMDPs are more challenging since the learner interacts with the environment and possibly changes the future observations in the process. We devise a learning algorithm running through epochs, in each epoch we employ spectral techniques to learn the POMDP parameters from a trajectory generated by a fixed policy. At the end of the epoch, an optimization oracle returns the optimal memoryless planning policy which maximizes the expected reward based on the estimated POMDP model. We prove an order-optimal regret bound with respect to the optimal memoryless policy and efficient scaling with respect to the dimensionality of observation and action spaces.Comment: 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spai
    corecore