127 research outputs found

    Monte Carlo Tree Search Guided by Symbolic Advice for MDPs

    In this paper, we consider the online computation of a strategy that aims at optimizing the expected average reward in a Markov decision process. The strategy is computed with a receding horizon and using Monte Carlo tree search (MCTS). We augment the MCTS algorithm with the notion of symbolic advice, and show that its classical theoretical guarantees are maintained. Symbolic advice is used to bias the selection and simulation strategies of MCTS. We describe how to use QBF and SAT solvers to implement symbolic advice efficiently. We illustrate our new algorithm using the popular game Pac-Man and show that the performance of our algorithm exceeds that of plain MCTS as well as that of human players.
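    The biased selection step can be pictured as ordinary UCT restricted to the actions the advice permits. Below is a minimal sketch under assumed interfaces (`node` and `advice` are hypothetical stand-ins; the paper answers the advice check with QBF/SAT solvers rather than a Python predicate):

```python
import math

def uct_select(node, advice, c=1.4):
    """Select a child action among those permitted by the symbolic advice.

    `node` is a hypothetical tree node with `state`, `visits`, and a
    `children` dict mapping actions to child nodes carrying `visits` and
    `total_reward`; `advice(state, action)` stands in for the predicate
    that the paper implements with QBF/SAT solvers.
    """
    allowed = [a for a in node.children if advice(node.state, a)]
    if not allowed:               # advice rules out every action: fall back
        allowed = list(node.children)

    def ucb(a):
        child = node.children[a]
        if child.visits == 0:
            return float("inf")   # visit untried actions first
        exploit = child.total_reward / child.visits
        explore = c * math.sqrt(math.log(node.visits) / child.visits)
        return exploit + explore

    return max(allowed, key=ucb)
```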

    A Review of Symbolic, Subsymbolic and Hybrid Methods for Sequential Decision Making

    The field of Sequential Decision Making (SDM) provides tools for solving Sequential Decision Processes (SDPs), where an agent must make a series of decisions in order to complete a task or achieve a goal. Historically, two competing SDM paradigms have vied for supremacy. Automated Planning (AP) proposes to solve SDPs by performing a reasoning process over a model of the world, often represented symbolically. Conversely, Reinforcement Learning (RL) proposes to learn the solution of the SDP from data, without a world model, and to represent the learned knowledge subsymbolically. In the spirit of reconciliation, we provide a review of symbolic, subsymbolic and hybrid methods for SDM. We cover both methods for solving SDPs (e.g., AP, RL and techniques that learn to plan) and for learning aspects of their structure (e.g., world models, state invariants and landmarks). To the best of our knowledge, no other review in the field provides the same scope. As an additional contribution, we discuss what properties an ideal method for SDM should exhibit and argue that neurosymbolic AI is the current approach which most closely resembles this ideal method. Finally, we outline several proposals to advance the field of SDM via the integration of symbolic and subsymbolic AI.
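    To make the contrast between the two paradigms concrete, here is a minimal sketch (all names are ours, not the review's): the symbolic view checks an action against an explicit world model, while the subsymbolic view adjusts a learned value estimate directly from data:

```python
# Symbolic (AP) view: applicability is read off an explicit model.
def applicable(state, action):
    # states and preconditions modeled as sets of ground facts
    return action.preconditions <= state

# Subsymbolic (RL) view: knowledge lives in a learned value function,
# here a tabular Q-learning update with no world model at all.
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```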

    Reinforcement Learning and Tree Search Methods for the Unit Commitment Problem

    The unit commitment (UC) problem, which determines operating schedules of generation units to meet demand, is a fundamental task in power systems operation. Existing UC methods using mixed-integer programming are not well-suited to highly stochastic systems. Approaches which more rigorously account for uncertainty could yield large reductions in operating costs by reducing spinning reserve requirements; operating power stations at higher efficiencies; and integrating greater volumes of variable renewables. A promising approach to solving the UC problem is reinforcement learning (RL), a methodology for optimal decision-making which has been used to conquer long-standing grand challenges in artificial intelligence. This thesis explores the application of RL to the UC problem and addresses challenges including robustness under uncertainty; generalisability across multiple problem instances; and scaling to larger power systems than previously studied. To tackle these issues, we develop guided tree search, a novel methodology combining model-free RL and model-based planning. The UC problem is formalised as a Markov decision process and we develop an open-source environment based on real data from Great Britain's power system to train RL agents. In problems of up to 100 generators, guided tree search is shown to be competitive with deterministic UC methods, reducing operating costs by up to 1.4%. An advantage of RL is that the framework can be easily extended to incorporate considerations important to power systems operators such as robustness to generator failure, wind curtailment or carbon prices. When generator outages are considered, guided tree search saves over 2% in operating costs as compared with methods using conventional N-x reserve criteria.
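    A minimal sketch of the "guided" part of guided tree search, under assumed interfaces (`policy_net` and `node` are hypothetical; the thesis's actual architecture may differ): a trained model-free policy proposes priors, and the tree expands only the most promising commitment decisions:

```python
def guided_expand(node, policy_net, k=5):
    """Expand only the k commitment decisions the learned policy favours.

    `policy_net(state)` returns a dict mapping each candidate action
    (an on/off schedule for the generators) to a probability. Keeping
    only the top-k keeps the branching factor manageable as the number
    of generators grows.
    """
    probs = policy_net(node.state)
    top_k = sorted(probs, key=probs.get, reverse=True)[:k]
    for action in top_k:
        node.add_child(action, prior=probs[action])
```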

    Rule-Based Policy Interpretation and Shielding for Partially Observable Monte Carlo Planning

    Partially Observable Monte Carlo Planning (POMCP) is a powerful online algorithm that can generate approximate policies for large Partially Observable Markov Decision Processes. The online nature of this method supports scalability by avoiding complete policy representation. However, the lack of an explicit representation of the policy hinders interpretability. In this thesis, we propose a methodology based on Maximum Satisfiability Modulo Theory (MAX-SMT) for analyzing POMCP policies by inspecting their traces, namely, sequences of belief-action pairs generated by the algorithm. The proposed method explores local properties of the policy to build a compact and informative summary of the policy behaviour. This representation exploits a high-level description encoded using logical formulas that domain experts can provide. The final formula can be used to identify unexpected decisions, namely, decisions that violate the expert indications. We show that this identification process can be used offline (to improve the explainability of the policy and to identify anomalous behaviours) or online (to shield the decisions of the POMCP algorithm). We also present an active methodology that can effectively query a POMCP policy to build more reliable descriptions quickly. We extensively evaluate our methodologies on two standard benchmarks for POMDPs, namely, tiger and rocksample, and on a problem related to velocity regulation in mobile robot navigation. Results show that our approach achieves good performance due to its capability to exploit experts' knowledge of the domains. Specifically, our approach can be used both to identify anomalous behaviours in faulty POMCPs and to improve the performance of the system by using the shielding mechanism. In the first case, we test the methodology against a state-of-the-art anomaly detection algorithm, while in the second, we compare the performance of shielded and unshielded POMCPs. We implemented our methodology in C++, and the code is open-source and available at https://github.com/GiuMaz/XPOMCP.
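    The online shielding use can be pictured as a veto layer between the planner and the environment. A minimal sketch, assuming rule predicates distilled from the MAX-SMT analysis (all names here are hypothetical):

```python
def shielded_action(planner_action, belief, rules, fallback):
    """Let a rule set veto a POMCP decision before it is executed.

    `rules` is a list of predicates over (belief, action), standing in
    for the logical formulas obtained from the MAX-SMT analysis;
    `fallback(belief)` picks a rule-compliant action when the planner's
    choice is vetoed.
    """
    if all(rule(belief, planner_action) for rule in rules):
        return planner_action       # consistent with the expert indications
    return fallback(belief)         # shield: override the unexpected decision
```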

    On the connection of probabilistic model checking, planning, and learning for system verification

    This thesis presents approaches using techniques from the model checking, planning, and learning communities to make systems more reliable and perspicuous. First, two heuristic search and dynamic programming algorithms are adapted to check extremal reachability probabilities, expected accumulated rewards, and their bounded versions, on general Markov decision processes (MDPs). Thereby, the problem space originally solvable by these algorithms is enlarged considerably. Correctness and optimality proofs for the adapted algorithms are given, and in a comprehensive case study on established benchmarks it is shown that the implementation, called Modysh, is competitive with state-of-the-art model checkers and even outperforms them on very large state spaces. Second, Deep Statistical Model Checking (DSMC) is introduced, usable for quality assessment and learning pipeline analysis of systems incorporating trained decision-making agents, like neural networks (NNs). The idea of DSMC is to use statistical model checking to assess NNs resolving nondeterminism in systems modeled as MDPs. The versatility of DSMC is exemplified in a number of case studies on Racetrack, an MDP benchmark designed for this purpose, flexibly modeling the autonomous driving challenge. In a comprehensive scalability study it is demonstrated that DSMC is a lightweight technique tackling the complexity of NN analysis in combination with the state space explosion problem.
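    At its core, DSMC is plain statistical model checking with the NN closing the nondeterminism. A minimal sketch under assumed interfaces (`mdp` and `nn_policy` are hypothetical stand-ins, not the thesis's tooling):

```python
import math

def dsmc_estimate(mdp, nn_policy, episodes=10_000, delta=0.01):
    """Estimate the goal-reachability probability of an NN-controlled MDP.

    Each episode lets the NN resolve the nondeterministic choices until a
    terminal state is reached. Returns the empirical probability together
    with a Hoeffding error bound that holds with probability 1 - delta.
    """
    successes = 0
    for _ in range(episodes):
        state, done, reached_goal = mdp.reset(), False, False
        while not done:
            state, done, reached_goal = mdp.step(nn_policy(state))
        successes += reached_goal
    p_hat = successes / episodes
    eps = math.sqrt(math.log(2 / delta) / (2 * episodes))  # Hoeffding bound
    return p_hat, eps
```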

    Multi-task Hierarchical Reinforcement Learning for Compositional Tasks

    This thesis presents algorithms for solving multiple compositional tasks with high sample efficiency and strong generalization ability. Central to this work is the subtask graph, which models the structure of compositional tasks in graph form. We formulate compositional tasks as multi-task and meta-RL problems using the subtask graph and discuss different approaches to tackle them. Specifically, we present four contributions, where the common idea is to exploit the inductive bias in the hierarchical task structure for efficient learning and strong generalization. The first part of the thesis formally introduces the subtask graph execution problem: a modeling of the compositional task as a multi-task RL problem where the agent is given a task description in graph form as an additional input. We present a hierarchical architecture where a high-level policy determines the subtask to execute and a low-level policy executes the given subtask. The high-level policy learns a modular neural network that can be dynamically assembled according to the input task description to choose the optimal sequence of subtasks to maximize the reward. We demonstrate that the proposed method achieves strong zero-shot task generalization, and also improves the search efficiency of an existing planning method when the two are combined. The second part studies the more general setting where the task structure is not available to the agent, such that the task must be inferred from the agent's own experience; i.e., the few-shot reinforcement learning setting. Specifically, we combine meta-reinforcement learning with an inductive logic programming (ILP) method to explicitly infer the latent task structure, in the form of a subtask graph, from the agent's trajectory. Our empirical study shows that the underlying task structure can be accurately inferred from a small amount of environment interaction, without any explicit supervision, in complex 3D environments with high-dimensional state and action spaces. The third contribution extends the second by transfer-learning the prior over task structures from training tasks to unseen test tasks to achieve faster adaptation. Although the meta-policy learns a general exploration strategy over the distribution of tasks, in the previous part the task structure was inferred from scratch independently for each task. We overcome this limitation by modeling the prior of the tasks from the subtask graph inferred via ILP, and transfer-learning the learned prior when performing inference on novel test tasks. To achieve this, we propose a novel prior sampling and posterior update method to incorporate the knowledge learned from the seen task that is most relevant to the current task. The last part investigates a more indirect form of inductive bias, implemented as a constraint on the trajectory rolled out by the policy in the MDP. We present a theoretical result proving that the proposed constraint preserves optimality while reducing the policy search space. Empirically, the proposed method improves the sample efficiency of the policy gradient method on a wide range of challenging sparse-reward tasks. Overall, this work formalizes the hierarchical structure in compositional tasks and provides evidence that such structure exists in many important problems. In addition, we present diverse principled approaches to exploiting the inductive bias of hierarchical structure in MDPs under different problem settings and assumptions, and demonstrate the usefulness of such inductive bias when tackling compositional tasks.
    Ph.D. thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/169951/1/srsohn_1.pd
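    A minimal sketch of the data structure at the centre of the thesis (the encoding is ours, not the author's): a subtask graph as a dependency map, with the high-level policy choosing among the subtasks whose preconditions are satisfied:

```python
def eligible_subtasks(graph, completed):
    """Return subtasks whose precondition subtasks are all completed.

    `graph` maps each subtask to the set of subtasks it depends on.
    The high-level policy picks among the eligible subtasks; a
    low-level policy then executes the chosen one.
    """
    return [task for task, deps in graph.items()
            if task not in completed and deps <= completed]

# e.g. with graph = {"get_wood": set(), "make_plank": {"get_wood"}}:
# eligible_subtasks(graph, set())         -> ["get_wood"]
# eligible_subtasks(graph, {"get_wood"})  -> ["make_plank"]
```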

    Decision uncertainty minimization and autonomous information gathering

    Thesis: Ph.D., Massachusetts Institute of Technology, Department of Aeronautics and Astronautics, 2013. Cataloged from the PDF version of the thesis. Includes bibliographical references (pages 272-283).
    Over the past several decades, technologies for remote sensing and exploration have become increasingly powerful but continue to face limitations in the areas of information gathering and analysis. These limitations affect technologies that use autonomous agents, which are devices that can make routine decisions independent of operator instructions. Bandwidth and other communications limitations require that autonomous agents differentiate between relevant and irrelevant information in a computationally efficient manner. This thesis presents a novel approach to this problem by framing it as an adaptive sensing problem. Adaptive sensing allows agents to modify their information collection strategies in response to the information gathered in real time. We developed and tested optimization algorithms that apply information guides to Monte Carlo planners. Information guides provide a mechanism by which the algorithms may blend online (real-time) and offline (previously simulated) planning in order to incorporate uncertainty into the decision-making process. This greatly reduces computational operations as well as decisional and communications overhead. We begin by introducing a 3-level hierarchy that visualizes adaptive sensing at synoptic (global), mesoscale (intermediate) and microscale (close-up) levels (a spatial hierarchy). We then introduce new algorithms for decision uncertainty minimization (DUM) and representational uncertainty minimization (RUM). Finally, we demonstrate the utility of this approach on real-world sensing problems, including bathymetric mapping and disaster relief. We also examine its potential in space exploration tasks by describing its use in a hypothetical aerial exploration of Mars. Our ultimate goal is to facilitate future large-scale missions to extraterrestrial objects for the purposes of scientific advancement and human exploration.
    By Lawrence A. M. Bush. Ph.D.
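    The "information guide" idea can be pictured as blending an offline value estimate into the online rollout average. A minimal sketch under assumed interfaces (`offline_value`, `rollout`, and the fixed mixing weight are our simplifications, not the thesis's exact scheme):

```python
def guided_value(state, offline_value, rollout, n=32, w=0.5):
    """Blend offline (previously simulated) and online (real-time) estimates.

    `offline_value(state)` is a value computed ahead of time;
    `rollout(state)` samples one online Monte Carlo return. Mixing the
    two lets the planner reuse offline knowledge while adapting to the
    information gathered in real time.
    """
    online = sum(rollout(state) for _ in range(n)) / n
    return w * offline_value(state) + (1 - w) * online
```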

    Bounded Uncertainty Roadmaps

    Ph.D. (Doctor of Philosophy)

    Computer Aided Verification

    The open access two-volume set LNCS 12224 and 12225 constitutes the refereed proceedings of the 32nd International Conference on Computer Aided Verification, CAV 2020, held in Los Angeles, CA, USA, in July 2020.* The 43 full papers, presented together with 18 tool papers and 4 case studies, were carefully reviewed and selected from 240 submissions. The papers were organized in the following topical sections: Part I: AI verification; blockchain and security; concurrency; hardware verification and decision procedures; and hybrid and dynamic systems. Part II: model checking; software verification; stochastic systems; and synthesis. *The conference was held virtually due to the COVID-19 pandemic.