
    Stochastic Shortest Path with Energy Constraints in POMDPs

    We consider partially observable Markov decision processes (POMDPs) with a set of target states and positive integer costs associated with every transition. The traditional optimization objective (stochastic shortest path) asks to minimize the expected total cost until the target set is reached. We extend the traditional framework of POMDPs to model energy consumption, which represents a hard constraint. The energy levels may increase and decrease with transitions, and the hard constraint requires that the energy level must remain positive in all steps until the target is reached. First, we present a novel algorithm for solving POMDPs with energy levels, building on existing POMDP solvers and using RTDP as its main method. Our second contribution concerns policy representation. For larger POMDP instances, the policies computed by existing solvers are too large to be understandable. We present an automated procedure, based on machine-learning techniques, that extracts the important decisions of a policy, allowing us to compute succinct, human-readable policies. Finally, we show experimentally that our algorithm performs well and computes succinct policies on a number of POMDP instances from the literature that were naturally enhanced with energy levels.
    Comment: Technical report accompanying a paper published in the proceedings of AAMAS 201
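    For intuition, the hard constraint can be folded into the model by tracking the current energy level alongside the state. Below is a minimal sketch of one such product construction; the data structures, names, and the pruning shortcut are our assumptions for illustration, not the paper's code.

```python
# Illustrative product construction: augment each state with its energy
# level so the hard constraint (energy stays positive until the target)
# becomes a reachability property of the augmented model.
from collections import defaultdict, deque

def energy_product(transitions, deltas, targets, s0, e0, e_max):
    """transitions[s][a] -> list of (s_next, prob);
    deltas[(s, a, s_next)] -> (possibly negative) integer energy change;
    energy is capped at e_max."""
    product = defaultdict(list)            # ((s, e), a) -> [((s', e'), p)]
    seen, queue = {(s0, e0)}, deque([(s0, e0)])
    while queue:
        s, e = queue.popleft()
        if s in targets:
            continue                       # target reached: stop expanding
        for a, succs in transitions.get(s, {}).items():
            for s_next, p in succs:
                e_next = min(e + deltas[(s, a, s_next)], e_max)
                if e_next <= 0:
                    # Energy exhausted; a full construction would redirect
                    # this mass to a losing sink rather than drop it.
                    continue
                product[((s, e), a)].append(((s_next, e_next), p))
                if (s_next, e_next) not in seen:
                    seen.add((s_next, e_next))
                    queue.append((s_next, e_next))
    return product
```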

    Influence Diagrams With Memory States: Representation and Algorithms

    Influence diagrams (IDs) offer a powerful framework for decision making under uncertainty, but their applicability has been hindered by the exponential growth of runtime and memory usage, largely due to the no-forgetting assumption. We present a novel way to maintain a limited amount of memory to inform each decision and still obtain near-optimal policies. The approach is based on augmenting the graphical model with memory states that represent key aspects of previous observations, a method that has proved useful in POMDP solvers. We also derive an efficient EM-based message-passing algorithm to compute the policy. Experimental results show that this approach produces high-quality approximate policies and offers better scalability than existing methods.
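    The memory-state idea amounts to replacing the full observation history with a small controller state that is updated on each observation. A minimal sketch of such a bounded-memory policy, with invented names and an invented example:

```python
# Bounded-memory policy sketch: each decision sees only a small memory
# state summarizing past observations, not the full no-forgetting history.
class MemoryPolicy:
    def __init__(self, n_memory, update, act):
        self.m = 0                # current memory state in {0..n_memory-1}
        self.n = n_memory
        self.update = update      # update(m, observation) -> new memory
        self.act = act            # act(m) -> decision

    def step(self, observation):
        self.m = self.update(self.m, observation) % self.n
        return self.act(self.m)

# Example: one bit of memory that latches whether "alarm" was ever seen.
policy = MemoryPolicy(
    n_memory=2,
    update=lambda m, o: 1 if (m == 1 or o == "alarm") else 0,
    act=lambda m: "inspect" if m == 1 else "wait",
)
print(policy.step("ok"), policy.step("alarm"), policy.step("ok"))
# -> wait inspect inspect
```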

    Processos de Decisão de Markov: um tutorial (Markov Decision Processes: A Tutorial)

    There are situations in which decisions must be made in sequence, and the outcome of each decision is not clear to the decision maker. These situations can be formulated mathematically as Markov decision processes and, given the probabilities of the values resulting from the decisions, it is possible to determine a policy that maximizes the expected value of the sequence of decisions. This tutorial describes Markov decision processes (both the fully observable and the partially observable case) and briefly discusses some methods for solving them. Semi-Markov processes are not discussed.
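    For the fully observable case the tutorial covers, the standard solution method is value iteration on the Bellman optimality equation. A minimal sketch on an invented two-state MDP (the states, rewards, and probabilities below are ours, not the tutorial's):

```python
# Value iteration: repeatedly apply the Bellman optimality backup until
# the value function stops changing by more than eps.
def value_iteration(states, actions, P, R, gamma=0.95, eps=1e-6):
    """P[(s, a)] -> list of (s_next, prob); R[(s, a)] -> immediate reward."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)])
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

# Toy two-state, two-action MDP for illustration.
states, actions = ["s0", "s1"], ["a", "b"]
P = {("s0", "a"): [("s0", 0.5), ("s1", 0.5)], ("s0", "b"): [("s1", 1.0)],
     ("s1", "a"): [("s1", 1.0)],             ("s1", "b"): [("s0", 1.0)]}
R = {("s0", "a"): 0.0, ("s0", "b"): 1.0, ("s1", "a"): 2.0, ("s1", "b"): 0.0}
print(value_iteration(states, actions, P, R))
```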

    On the relationship between satisfiability and partially observable Markov decision processes

    Stochastic satisfiability (SSAT), quantified Boolean formula (QBF) satisfiability, and decision-theoretic planning in finite-horizon partially observable Markov decision processes (POMDPs) are all PSPACE-complete problems. Since they are all complete for the same complexity class, I show how to convert them into one another in polynomial time and space. I discuss various properties of each encoding and how they translate into equivalent constructs in the other encodings. An important lesson of these reductions is that the states in SSAT and flat POMDPs do not match. Therefore, comparing the scalability of satisfiability and flat POMDP solvers based on the size of the state spaces they can tackle is misleading. A new SSAT solver called SSAT-Prime is proposed and implemented. It includes improvements to watched literals, component caching, and detecting symmetries with upper and lower bounds under certain conditions. SSAT-Prime is compared against a state-of-the-art solver for probabilistic inference and a native POMDP solver on challenging benchmarks.
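    The SSAT semantics these reductions rely on are easy to state: existential variables are maximized, and randomized variables are averaged by their probabilities. A toy recursive evaluator makes this concrete (this is an illustration of the semantics, not SSAT-Prime, and the encoding is our assumption):

```python
# Toy SSAT evaluator: max over existential choices, expectation over
# randomized choices, applied down the quantifier prefix.
def ssat_value(prefix, clauses, assign=None):
    """prefix: list of ('E', var) or ('R', var, prob_true);
    clauses: CNF as a list of lists of signed ints (DIMACS-style)."""
    assign = assign or {}
    if not prefix:
        sat = all(any((lit > 0) == assign[abs(lit)] for lit in clause)
                  for clause in clauses)
        return 1.0 if sat else 0.0
    q, var = prefix[0][0], prefix[0][1]
    vals = {}
    for b in (True, False):
        assign[var] = b
        vals[b] = ssat_value(prefix[1:], clauses, assign)
    del assign[var]
    if q == 'E':
        return max(vals.values())
    p = prefix[0][2]
    return p * vals[True] + (1 - p) * vals[False]

# x1 XOR x2: exists x1, then random x2 with P(true) = 0.5.
print(ssat_value([('E', 1), ('R', 2, 0.5)], [[1, 2], [-1, -2]]))  # 0.5
```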

    Efficient model learning for dialog management

    Thesis (S.M.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 118-122).
    Partially Observable Markov Decision Processes (POMDPs) have succeeded in many planning domains because they can optimally trade off between actions that will increase an agent's knowledge about its environment and actions that will increase an agent's reward. However, POMDPs are defined by a large number of parameters which are difficult to specify from domain knowledge, and gathering enough data to specify the parameters a priori may be expensive. This work develops several efficient algorithms for learning the POMDP parameters online and demonstrates them on a dialog manager for a robotic wheelchair. In particular, we show how a combination of specialized queries ("meta-actions") enables us to create a robust dialog manager that avoids the pitfalls of other POMDP-learning approaches. The dialog manager's ability to reason about its uncertainty, and to take advantage of low-risk opportunities to reduce that uncertainty, leads to more robust policy learning.
    By Finale Doshi. S.M.
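    One standard way to learn POMDP observation parameters online, in the spirit of what the abstract describes, is to keep Dirichlet counts and update them whenever a query resolves the hidden state. The sketch below is our illustration under assumed names; the "confirmed pair" mechanics stand in for the thesis's meta-action machinery rather than reproduce it:

```python
# Posterior-mean estimate of P(o | s) from Dirichlet(prior) counts,
# updated online when a clarifying query reveals the true state.
from collections import defaultdict

class DirichletObsModel:
    def __init__(self, observations, prior=1.0):
        self.observations = list(observations)
        self.counts = defaultdict(
            lambda: {o: prior for o in self.observations})

    def update(self, state, observation):
        self.counts[state][observation] += 1.0   # credit a confirmed pair

    def prob(self, observation, state):
        row = self.counts[state]
        return row[observation] / sum(row.values())

# Hypothetical dialog example: a query confirmed the user wanted "left".
model = DirichletObsModel(observations=["go_left", "go_right"])
model.update("want_left", "go_left")
print(round(model.prob("go_left", "want_left"), 3))  # 0.667 with prior=1
```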

    Value of information in decision systems

    Ph.D. (Doctor of Philosophy) thesis.

    Operational Decision Making under Uncertainty: Inferential, Sequential, and Adversarial Approaches

    Modern security threats are characterized by a stochastic, dynamic, partially observable, and ambiguous operational environment. This dissertation addresses such complex security threats using operations research techniques for decision making under uncertainty in operations planning, analysis, and assessment. First, this research develops a new method for robust queue inference with partially observable, stochastic arrival and departure times, motivated by cybersecurity and terrorism applications. In the dynamic setting, this work develops a new variant of Markov decision processes and an algorithm for robust information collection in dynamic, partially observable, and ambiguous environments, with an application to a cybersecurity detection problem. In the adversarial setting, this work presents a new application of counterfactual regret minimization and robust optimization to a multi-domain cyber and air defense problem in a partially observable environment.
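    At the core of the counterfactual regret minimization technique named in the adversarial setting is regret matching: play each action with probability proportional to its positive cumulative regret. A minimal sketch; the two-action example is invented, and this shows only the update rule, not the dissertation's full game model:

```python
# Regret matching: strategy proportional to positive cumulative regrets.
def regret_matching(cum_regret):
    pos = [max(r, 0.0) for r in cum_regret]
    total = sum(pos)
    n = len(cum_regret)
    return [p / total for p in pos] if total > 0 else [1.0 / n] * n

def step(cum_regret, utils, action):
    """Accumulate regret for not having played each alternative action."""
    for a, u in enumerate(utils):
        cum_regret[a] += u - utils[action]
    return regret_matching(cum_regret)

regrets = [0.0, 0.0]
strategy = step(regrets, utils=[0.0, 1.0], action=0)  # action 1 was better
print(strategy)  # -> [0.0, 1.0]
```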

    Timed model-based programming : executable specifications for robust mission-critical sequences

    Thesis (Sc. D.), Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2003. Includes bibliographical references (p. 195-204).
    There is growing demand for high-reliability embedded systems that operate robustly and autonomously in the presence of tight real-time constraints. For robotic spacecraft, robust plan execution is essential during time-critical mission sequences, due to the very short time available for recovery from anomalies. Traditional approaches to encoding these sequences can lead to brittle behavior under off-nominal execution conditions, due to the high level of complexity in the control specification required to manage the complex spacecraft system interactions. This work describes timed model-based programming, a novel approach for encoding and robustly executing mission-critical spacecraft sequences. The timed model-based programming approach addresses the issues of sequence complexity and unanticipated low-level system interactions by allowing control programs to directly read or write "hidden" states of the plant, that is, states that are not directly observable or controllable. It is then the responsibility of the program's execution kernel to map between hidden states and the plant sensors and control variables. This mapping is performed automatically by a deductive controller using a common-sense plant model, freeing the programmer from the error-prone process of reasoning through a complex set of interactions under a range of possible failure situations. Time is central to the execution of mission-critical sequences; a robust executive must consider time in its control and behavior models, in addition to reactively managing complexity. In timed model-based programming, control programs express goals and constraints in terms of both system state and time. Plant models capture the underlying behavior of the system components, including nominal and off-nominal modes, probabilistic transitions, and timed effects such as state transition latency.
    The contributions of this work are threefold. First, a semantic specification of the timed model-based programming approach is provided. The execution semantics of a timed model-based program are defined in terms of legal state evolutions of a physical plant, represented as a factored Partially Observable Semi-Markov Decision Process. The second contribution is the definition of graphical and textual languages for encoding timed control programs and plant models. The adoption of a visual programming paradigm allows timed model-based programs to be specified and readily inspected by the systems engineers in charge of designing the mission-critical sequences. The third contribution is the development of a Timed Model-based Executive, which takes as input a timed control program and executes it, using timed plant models to track states, diagnose faults, and generate control actions. The Timed Model-based Executive has been implemented and demonstrated on a representative spacecraft scenario for Mars entry, descent, and landing.
    By Michel Donald Ingham. Sc.D.
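    The state-tracking role the deductive controller plays can be pictured as a discrete Bayes filter over hidden plant modes: predict through the probabilistic mode transitions, then correct with the sensor likelihood. A minimal sketch; the valve model, names, and numbers are invented for illustration and are not the executive's actual algorithm:

```python
# Discrete Bayes filter over hidden plant modes.
def belief_update(belief, trans, likelihood, obs):
    """belief: {mode: prob}; trans: {mode: {mode2: prob}};
    likelihood[(obs, mode)]: prob of seeing obs in that mode."""
    # Predict: push the belief through the probabilistic mode transitions.
    predicted = {}
    for m, p in belief.items():
        for m2, t in trans[m].items():
            predicted[m2] = predicted.get(m2, 0.0) + p * t
    # Correct: weight by how likely the observation is in each mode.
    posterior = {m: p * likelihood[(obs, m)] for m, p in predicted.items()}
    z = sum(posterior.values())
    return {m: p / z for m, p in posterior.items()}

# Made-up valve example: a "no_flow" reading shifts mass to the fault mode.
belief = {"valve_open": 0.95, "valve_stuck": 0.05}
trans = {"valve_open": {"valve_open": 0.99, "valve_stuck": 0.01},
         "valve_stuck": {"valve_stuck": 1.0}}
likelihood = {("no_flow", "valve_open"): 0.02,
              ("no_flow", "valve_stuck"): 0.9}
print(belief_update(belief, trans, likelihood, "no_flow"))
```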