5 research outputs found

    Optimally solving Dec-POMDPs as Continuous-State MDPs: Theory and Algorithms

    Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general model for decision-making under uncertainty in cooperative decentralized settings, but are difficult to solve optimally (NEXP-complete). As a new way of solving these problems, we introduce the idea of transforming a Dec-POMDP into a continuous-state deterministic MDP with a piecewise-linear and convex value function. This approach makes use of the fact that planning can be accomplished in a centralized offline manner, while execution can still be distributed. This new Dec-POMDP formulation, which we call an occupancy MDP, allows powerful POMDP and continuous-state MDP methods to be used for the first time. When the curse of dimensionality becomes too prohibitive, we refine this basic approach and present ways to combine heuristic search and compact representations that exploit the structure present in multi-agent domains, without losing the ability to eventually converge to an optimal solution. In particular, we introduce feature-based heuristic search that relies on feature-based compact representations, point-based updates, and efficient action selection. A theoretical analysis demonstrates that our feature-based heuristic search algorithms terminate in finite time with an optimal solution. We include an extensive empirical analysis using well-known benchmarks, thereby demonstrating that our approach provides significant scalability improvements compared to the state of the art.
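The deterministic continuous-state reformulation can be illustrated with a toy occupancy update. This is a minimal sketch under simplifying assumptions: the occupancy here is a distribution over hidden states only (the occupancy states in the paper also condition on joint action-observation histories), and the names (`update_occupancy`, `T`) are illustrative, not from the paper.

```python
def update_occupancy(occ, T, joint_action):
    """Deterministic next occupancy under a fixed joint action.

    occ: dict mapping state -> probability.
    T:   dict mapping (state, joint_action) -> dict of next_state -> probability.
    """
    new_occ = {}
    for s, p in occ.items():
        for s2, p2 in T[(s, joint_action)].items():
            new_occ[s2] = new_occ.get(s2, 0.0) + p * p2
    return new_occ

# Starting certain in "s0", the stochastic transition becomes a
# deterministic move in the space of distributions:
occ = update_occupancy({"s0": 1.0},
                       {("s0", "a"): {"s0": 0.3, "s1": 0.7}}, "a")
# occ == {"s0": 0.3, "s1": 0.7}
```

The point of the reformulation is exactly this: randomness in state transitions becomes a deterministic transition between occupancy points, so continuous-state MDP machinery applies.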

    A Novel Point-based Algorithm for Multi-agent Control Using the Common Information Approach

    The Common Information (CI) approach provides a systematic way to transform a multi-agent stochastic control problem into a single-agent partially observable Markov decision problem (POMDP) called the coordinator's POMDP. However, such a POMDP can be hard to solve due to its extraordinarily large action space. We propose a new algorithm for multi-agent stochastic control problems, called coordinator's heuristic search value iteration (CHSVI), that combines the CI approach and point-based POMDP algorithms for large action spaces. We demonstrate the algorithm by optimally solving several benchmark problems. (Comment: 11 pages, 4 figures)
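The point-based backup at a single belief, the core operation that HSVI-style algorithms apply along search trajectories, can be sketched as follows. This is a generic POMDP backup with assumed model dictionaries (`T`, `O`, `R`), not the CHSVI algorithm itself, which additionally has to cope with the coordinator's very large action space.

```python
def backup(belief, S, A, Z, T, O, R, alphas, gamma=0.9):
    """Best one-step lookahead value at `belief`, given current alpha-vectors."""
    best = float("-inf")
    for a in A:
        # immediate expected reward under action a
        val = sum(belief[s] * R[s][a] for s in S)
        for z in Z:
            # best alpha-vector evaluated on the (unnormalized) successor belief
            val += gamma * max(
                sum(belief[s] * T[s][a][s2] * O[s2][a][z] * alpha[s2]
                    for s in S for s2 in S)
                for alpha in alphas)
        best = max(best, val)
    return best

# Tiny one-state sanity check: reward 1 now, alpha-vector promising 2 later.
v = backup(belief={"s": 1.0}, S=["s"], A=["a"], Z=["z"],
           T={"s": {"a": {"s": 1.0}}}, O={"s": {"a": {"z": 1.0}}},
           R={"s": {"a": 1.0}}, alphas=[{"s": 2.0}], gamma=0.9)
# v == 1.0 + 0.9 * 2.0 == 2.8
```

The loop over `A` is the step that blows up in the coordinator's POMDP, since each coordinator "action" is a whole profile of agent prescriptions; taming that loop is what CHSVI's large-action-space machinery addresses.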

    Generic Reward Model of Partially Observed Markov Decision Processes (POMDP) for Pattern Detection

    Research on deep reinforcement learning and stochastic modeling for optimizing bottleneck phenomena, using big-data technology and IoT-based sensors, is the motivation for this work. In this paper we propose a generic representation of a bottleneck phenomenon that narrows (limits) the possible actions in the observed field, such as the impact of dangerous epidemics on human activity, the economy, society, and many other areas, disturbing the related scheduling processes; the activity threshold must remain within an interval of admissible actions in order not to enter the bottleneck. We pair this with a reinforcement learning model that can handle difficult situations approaching real-world complexity: at each level, the data from the previous level enable a better new action that may yield higher rewards in subsequent transitions, and a precise representation of the reward at the studied situation level supports better-informed future decisions.
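The admissible-interval idea can be made concrete with a hedged sketch: reward accrues only while the activity level stays inside an interval, and leaving it (entering the bottleneck) incurs a penalty. The thresholds and reward values are illustrative assumptions, not taken from the paper.

```python
def bottleneck_reward(activity, low, high, in_reward=1.0, out_penalty=-5.0):
    """Reward while the activity level stays in [low, high]; penalty otherwise."""
    return in_reward if low <= activity <= high else out_penalty

inside = bottleneck_reward(0.5, low=0.2, high=0.8)   # -> 1.0  (admissible)
outside = bottleneck_reward(0.9, low=0.2, high=0.8)  # -> -5.0 (bottleneck entered)
```

A reward shaped this way is what steers a reinforcement learning agent toward policies whose actions stay within the interval the paper describes.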

    Online risk-aware conditional planning with qualitative autonomous driving applications

    Thesis: S.M., Massachusetts Institute of Technology, Department of Aeronautics and Astronautics, 2018. Cataloged from PDF version of thesis. Includes bibliographical references (pages 89-91).
    Driving is often stressful and dangerous due to uncertainty in the actions of nearby vehicles. Having the ability to model driving maneuvers qualitatively and to guarantee safety bounds in uncertain traffic scenarios are two steps towards building trust in vehicle autonomy. In this thesis, we present an approach to the problem of Qualitative Autonomous Driving (QAD) using risk-bounded conditional planning. First, we present Incremental Risk-aware AO* (iRAO*), an online conditional planning algorithm that builds on RAO* for use in larger dynamic systems like driving. An illustrative example is included to better explain the behavior and performance of the algorithm. Second, we present a Chance-Constrained Hybrid Multi-Agent MDP as a framework for modeling our autonomous vehicle in traffic scenarios using qualitative driving maneuvers. Third, we extend our driving model by adding variable duration to maneuvers and develop two approaches to the resulting complexity. We present planning results from various driving scenarios, as well as from scaled instances of the illustrative example, that show the potential for further applications. Finally, we propose a QAD system, using the different tools developed in the context of this thesis, and show how it would fit within an autonomous driving architecture.
    by Matthew Quinn Deyo, S.M.
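The chance-constrained flavor of this planning can be sketched as a simple risk-bounded selection rule: among candidate maneuvers, keep those whose estimated failure probability respects the risk bound, then pick the highest-value one. The maneuver names and numbers below are hypothetical; the thesis's iRAO* propagates risk bounds through a conditional plan rather than filtering a flat list like this.

```python
def safest_best(maneuvers, risk_bound):
    """maneuvers: list of (name, value, risk). Return the best admissible name.

    A maneuver is admissible when its risk does not exceed the bound;
    returns None when no maneuver satisfies the chance constraint.
    """
    admissible = [m for m in maneuvers if m[2] <= risk_bound]
    if not admissible:
        return None
    return max(admissible, key=lambda m: m[1])[0]

plan = safest_best([("overtake", 10.0, 0.08),
                    ("follow",    4.0, 0.01),
                    ("brake",     1.0, 0.001)], risk_bound=0.05)
# plan == "follow": "overtake" is worth more but violates the 5% risk bound
```

The same trade-off, value maximization subject to a bound on failure probability, is what the Chance-Constrained Hybrid Multi-Agent MDP formalizes over whole traffic scenarios.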

    Decision uncertainty minimization and autonomous information gathering

    Thesis: Ph. D., Massachusetts Institute of Technology, Department of Aeronautics and Astronautics, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (pages 272-283).
    Over the past several decades, technologies for remote sensing and exploration have become increasingly powerful but continue to face limitations in the areas of information gathering and analysis. These limitations affect technologies that use autonomous agents: devices that can make routine decisions independently of operator instructions. Bandwidth and other communications limitations require that autonomous agents differentiate between relevant and irrelevant information in a computationally efficient manner. This thesis presents a novel approach to this problem by framing it as an adaptive sensing problem. Adaptive sensing allows agents to modify their information collection strategies in response to the information gathered in real time. We developed and tested optimization algorithms that apply information guides to Monte Carlo planners. Information guides provide a mechanism by which the algorithms may blend online (real-time) and offline (previously simulated) planning in order to incorporate uncertainty into the decision-making process. This greatly reduces computational operations as well as decisional and communications overhead. We begin by introducing a 3-level hierarchy that visualizes adaptive sensing at synoptic (global), mesoscale (intermediate) and microscale (close-up) levels (a spatial hierarchy). We then introduce new algorithms for decision uncertainty minimization (DUM) and representational uncertainty minimization (RUM). Finally, we demonstrate the utility of this approach to real-world sensing problems, including bathymetric mapping and disaster relief. We also examine its potential in space exploration tasks by describing its use in a hypothetical aerial exploration of Mars. Our ultimate goal is to facilitate future large-scale missions to extraterrestrial objects for the purposes of scientific advancement and human exploration.
    by Lawrence A. M. Bush, Ph. D.
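One standard way to quantify "relevant information" in adaptive sensing, consistent with the uncertainty-minimization theme above, is expected entropy reduction: score a sensing action by how much, in expectation, its possible observations shrink the entropy of the agent's belief. The probabilities below are made-up numbers, and this sketch is a generic information-gain criterion, not the thesis's DUM/RUM formulation.

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution given as a list."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def expected_gain(prior, posteriors):
    """Expected entropy reduction of a sensing action.

    posteriors: list of (probability_of_observation, posterior_belief).
    """
    return entropy(prior) - sum(pz * entropy(post) for pz, post in posteriors)

# A sensor that resolves a 50/50 hypothesis to 90/10 either way:
gain = expected_gain([0.5, 0.5],
                     [(0.5, [0.9, 0.1]), (0.5, [0.1, 0.9])])
# gain is about 0.53 bits out of the 1 bit of prior uncertainty
```

An agent under bandwidth constraints can rank candidate measurements by such a score and transmit or collect only the high-gain ones.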