8 research outputs found
Decentralized Control of Partially Observable Markov Decision Processes using Belief Space Macro-actions
The focus of this paper is on solving multi-robot planning problems in
continuous spaces with partial observability. Decentralized partially
observable Markov decision processes (Dec-POMDPs) are general models for
multi-robot coordination problems, but representing and solving Dec-POMDPs is
often intractable for large problems. To allow for a high-level representation
that is natural for multi-robot problems and scalable to large discrete and
continuous problems, this paper extends the Dec-POMDP model to the
decentralized partially observable semi-Markov decision process (Dec-POSMDP).
The Dec-POSMDP formulation allows asynchronous decision-making by the robots,
which is crucial in multi-robot domains. We also present an algorithm for
solving this Dec-POSMDP which is much more scalable than previous methods since
it can incorporate closed-loop belief space macro-actions in planning. These
macro-actions are automatically constructed to produce robust solutions. The
proposed method's performance is evaluated on a complex multi-robot package
delivery problem under uncertainty, showing that our approach can naturally
represent multi-robot problems and provide high-quality solutions for
large-scale problems.
Efficient POMDP Forward Search by Predicting the Posterior Belief Distribution
Online, forward-search techniques have demonstrated promising results for solving problems in partially observable environments. These techniques depend on the ability to efficiently search and evaluate the set of beliefs reachable from the current belief. However, enumerating or sampling action-observation sequences to compute the reachable beliefs is computationally demanding; coupled with the need to satisfy real-time constraints, existing online solvers can only search to a limited depth. In this paper, we propose that policies can be generated directly from the distribution of the agent's posterior belief. When the underlying state distribution is Gaussian, and the observation function is an exponential family distribution, we can calculate this distribution of beliefs without enumerating the possible observations. This property not only enables us to plan in problems with large observation spaces, but also allows us to search deeper by considering policies composed of multi-step action sequences. We present the Posterior Belief Distribution (PBD) algorithm, an efficient forward-search POMDP planner for continuous domains, demonstrating that better policies are generated when we can perform deeper forward search.
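The claim that the distribution of posterior beliefs can be computed without enumerating observations can be illustrated in the simplest special case: a scalar Gaussian belief with a Gaussian observation model (one member of the exponential family). The sketch below is not the PBD algorithm itself; the function names and the example numbers are assumptions made for illustration only. It shows that the posterior variance does not depend on which observation arrives, and that the posterior mean is itself Gaussian-distributed around the prior mean.

```python
def kalman_update(mean, var, z, obs_var):
    """Scalar measurement update for z = x + noise, noise ~ N(0, obs_var)."""
    gain = var / (var + obs_var)
    post_mean = mean + gain * (z - mean)
    post_var = (1.0 - gain) * var
    return post_mean, post_var


def posterior_belief_distribution(mean, var, obs_var):
    """Describe the posterior belief before any observation is seen.

    Returns (expected posterior mean, variance of the posterior mean,
    posterior variance). Hypothetical helper, not from the paper.
    """
    gain = var / (var + obs_var)
    post_var = (1.0 - gain) * var
    # Before observing, z - mean ~ N(0, var + obs_var), so the posterior
    # mean is Gaussian, centred on the prior mean, with this variance:
    var_of_post_mean = gain * gain * (var + obs_var)
    return mean, var_of_post_mean, post_var


# Illustrative numbers (assumed): prior N(0, 4), observation noise variance 1.
prior_mean, prior_var, obs_var = 0.0, 4.0, 1.0
print(posterior_belief_distribution(prior_mean, prior_var, obs_var))

# Two very different observations give different means but the same variance.
print(kalman_update(prior_mean, prior_var, -3.0, obs_var))
print(kalman_update(prior_mean, prior_var, 7.0, obs_var))
```

In this special case the spread of the posterior mean plus the posterior variance equals the prior variance, which is why a planner can reason about future beliefs in closed form instead of branching on each possible observation.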
APPSSAT: Approximate probabilistic planning using stochastic satisfiability
We describe APPSSAT, an anytime probabilistic contingent planner based on ZANDER, a probabilistic contingent planner that operates by converting the planning problem to a stochastic satisfiability (SSAT) problem and solving that problem instead [S.M. Majercik, M.L. Littman, Contingent planning under uncertainty via stochastic satisfiability, Artificial Intelligence 147 (2003) 119–162]. The values of some of the variables in an SSAT instance are probabilistically determined; APPSSAT considers the most likely instantiations of these variables (the most probable situations facing the agent) and attempts to construct an approximation of the optimal plan that succeeds under those circumstances, improving that plan as time permits. Given more time, less likely instantiations/situations are considered and the plan is revised as necessary. In some cases, a plan constructed to address a relatively low percentage of possible situations will succeed for situations not explicitly considered as well, and may return an optimal or near-optimal plan. We describe experimental results showing that APPSSAT can find suboptimal plans in cases in which ZANDER is unable to find the optimal (or any) plan. Although the test problems are small, the anytime quality of APPSSAT means that it has the potential to efficiently derive suboptimal plans in larger, time-critical domains in which ZANDER might not have sufficient time to calculate any plan. We also suggest further work needed to bring APPSSAT closer to attacking real-world problems.
Multiscale Markov Decision Problems: Compression, Solution, and Transfer Learning
Many problems in sequential decision making and stochastic control often have
natural multiscale structure: sub-tasks are assembled together to accomplish
complex goals. Systematically inferring and leveraging hierarchical structure,
particularly beyond a single level of abstraction, has remained a longstanding
challenge. We describe a fast multiscale procedure for repeatedly compressing,
or homogenizing, Markov decision processes (MDPs), wherein a hierarchy of
sub-problems at different scales is automatically determined. Coarsened MDPs
are themselves independent, deterministic MDPs, and may be solved using
existing algorithms. The multiscale representation delivered by this procedure
decouples sub-tasks from each other and can lead to substantial improvements in
convergence rates both locally within sub-problems and globally across
sub-problems, yielding significant computational savings. A second fundamental
aspect of this work is that these multiscale decompositions yield new transfer
opportunities across different problems, where solutions of sub-tasks at
different levels of the hierarchy may be amenable to transfer to new problems.
Localized transfer of policies and potential operators at arbitrary scales is
emphasized. Finally, we demonstrate compression and transfer in a collection of
illustrative domains, including examples involving discrete and continuous
state spaces. Comment: 86 pages, 15 figures
Processos de Decisão de Markov: um tutorial
There are situations in which decisions must be made in sequence, and the outcome of each decision is not clear to the decision maker. These situations can be formulated mathematically as Markov decision processes, and given the probabilities of the values resulting from the decisions, it is possible to determine a policy that maximizes the expected value of the sequence of decisions. This tutorial describes Markov decision processes (both the fully observable and the partially observable case) and briefly discusses some methods for solving them. Semi-Markov processes are not discussed.
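For the fully observable case, one standard way to determine a policy that maximizes expected discounted value is value iteration. The following is a minimal sketch, not taken from the tutorial: the two-state problem, its probabilities, its rewards, and all names are assumptions chosen only to make the example runnable.

```python
def value_iteration(states, actions, transition, reward, gamma=0.9, tol=1e-8):
    """Return (values, policy) for a small finite MDP.

    transition[(s, a)] maps each next state to its probability;
    reward(s, a, s2) is the immediate reward for that step.
    """
    def q_value(s, a, values):
        return sum(p * (reward(s, a, s2) + gamma * values[s2])
                   for s2, p in transition[(s, a)].items())

    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(q_value(s, a, values) for a in actions)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:  # stop once the largest change in a sweep is tiny
            break
    policy = {s: max(actions, key=lambda a: q_value(s, a, values))
              for s in states}
    return values, policy


# Hypothetical two-state example (all numbers are illustrative assumptions).
STATES = ["low", "high"]
ACTIONS = ["wait", "work"]
TRANSITION = {
    ("low", "wait"): {"low": 1.0},
    ("low", "work"): {"high": 0.8, "low": 0.2},
    ("high", "wait"): {"high": 0.5, "low": 0.5},
    ("high", "work"): {"high": 0.9, "low": 0.1},
}


def reward(s, a, s2):
    # Reaching "high" pays 1; working costs 0.3.
    return (1.0 if s2 == "high" else 0.0) - (0.3 if a == "work" else 0.0)


values, policy = value_iteration(STATES, ACTIONS, TRANSITION, reward)
print(values)
print(policy)
```

The partially observable case replaces the state with a belief over states, so the same backup is applied over a continuous belief space, which is what makes it much harder to solve exactly.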
A POMDP approach to the hide and seek game
Official Master's final project, carried out in collaboration with the Institut de Robòtica i Informàtica Industrial. Partially observable Markov decision processes (POMDPs) provide an elegant
mathematical framework for modeling complex decision and planning problems
in uncertain and dynamic environments. They have been successfully applied to
various robotic tasks. The modeling advantage of POMDPs, however, comes at
a price: exact methods for solving them are computationally very expensive and
thus applicable in practice only to simple problems. A major challenge is to scale
up POMDP algorithms for more complex robotic systems. Our goal is to have
an autonomous mobile robot learn and play the children's game hide and seek
against a human opponent. Motion planning in uncertain and dynamic
environments is an essential capability for autonomous robots. We focus on an efficient
point-based POMDP algorithm, SARSOP, that exploits the notion of optimally
reachable belief spaces to improve computational efficiency. Moreover, we explore
the mixed observability MDP (MOMDP) model, a special class of POMDPs.
Robotic systems often have mixed observability: even when a robot's state is not
fully observable, some components of the state may still be fully observable.
Exploiting this, we use the factored model proposed in the literature to represent
separately the fully and partially observable components of a robot's state and derive a compact lower-dimensional representation of its belief space. We then use
this factored representation in conjunction with the point-based algorithm to
compute approximate POMDP solutions. Experiments show that on our problem, the
new algorithm is many times faster than a leading point-based POMDP algorithm
without significant loss in the quality of the solution.
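The belief that point-based solvers operate on is maintained with a Bayes filter: predict through the transition model, then reweight by the observation likelihood. The sketch below is a generic discrete belief update, not SARSOP and not the MOMDP factorization; the three-cell corridor, the sensor model, and every name and number in it are assumptions made for illustration.

```python
def belief_update(belief, action, observation, transition, observation_model):
    """One Bayes-filter step: b'(s2) is proportional to
    O(o | s2, a) * sum over s of T(s2 | s, a) * b(s)."""
    predicted = {}
    for s, p in belief.items():
        for s2, t in transition[(s, action)].items():
            predicted[s2] = predicted.get(s2, 0.0) + p * t
    unnormalized = {s2: observation_model(observation, s2, action) * p
                    for s2, p in predicted.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s2: p / total for s2, p in unnormalized.items()}


# Hypothetical hide-and-seek corridor: the hider occupies cell 0, 1, or 2.
# The hider stays put with probability 0.6, otherwise moves to a neighbour.
TRANSITION = {
    (0, "look"): {0: 0.6, 1: 0.4},
    (1, "look"): {0: 0.2, 1: 0.6, 2: 0.2},
    (2, "look"): {1: 0.4, 2: 0.6},
}


def observation_model(observation, hider_cell, action):
    # Assumed sensor: the seeker watches cell 0 and detects a hider there
    # with probability 0.9; it never reports a hider in the other cells.
    p_seen = 0.9 if hider_cell == 0 else 0.0
    return p_seen if observation == "seen" else 1.0 - p_seen


uniform = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}
posterior = belief_update(uniform, "look", "not_seen",
                          TRANSITION, observation_model)
print(posterior)
```

In a mixed-observability model, the fully observable components (for example the seeker's own position) would be kept out of this distribution, so only the hidden components need a belief.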
Efficient planning under uncertainty with macro-actions
Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 163-168).
Planning in large, partially observable domains is challenging, especially when good performance requires considering situations far in the future. Existing planners typically construct a policy by performing fully conditional planning, where each future action is conditioned on a set of possible observations that could be obtained at every timestep. Unfortunately, fully conditional planning can be computationally expensive, and state-of-the-art solvers are either limited in the size of problems that can be solved, or can only plan out to a limited horizon. We propose that for a large class of real-world planning-under-uncertainty problems, it is necessary to perform far-lookahead decision-making, but unnecessary to construct policies that condition all actions on observations obtained at the previous timestep. Instead, these problems can be solved by performing semi-conditional planning, where the constructed policy only conditions actions on observations at certain key points. Between these key points, the policy assumes that a macro-action is executed: a temporally extended, fixed-length, open-loop action sequence comprising a series of primitive actions. These macro-actions are evaluated within a forward-search framework, which only considers beliefs that are reachable from the agent's current belief under different actions and observations; a belief summarizes an agent's past history of actions and observations. Together, semi-conditional planning in a forward-search manner restricts the policy space in exchange for conditional planning out to a longer horizon.
Two technical challenges have to be overcome in order to perform semi-conditional planning efficiently: how the macro-actions can be automatically generated, and how to efficiently incorporate the macro-actions into the forward-search framework. We propose an algorithm which automatically constructs the macro-actions that are evaluated within a forward-search planning framework, iteratively refining the macro-actions as more computation time is made available for planning. In addition, we show that for a subset of problem domains, it is possible to analytically compute the distribution over posterior beliefs that results from a single macro-action. This ability to directly compute a distribution over posterior beliefs yields computational savings when performing macro-action forward search. Performance and computational analyses for the algorithms proposed in this thesis are presented, as well as simulation experiments that demonstrate superior performance relative to existing state-of-the-art solvers on large planning-under-uncertainty domains. We also demonstrate our planning-under-uncertainty algorithms on target-tracking applications for an actual autonomous helicopter, highlighting the practical potential for planning in real-world, long-horizon, partially observable domains.
by Ruijie He. Ph.D.