10 research outputs found

    Toward Interpretable Deep Reinforcement Learning with Linear Model U-Trees

    Deep Reinforcement Learning (DRL) has achieved impressive success in many applications. A key component of many DRL models is a neural network representing a Q function, which estimates the expected cumulative reward following a state-action pair. The Q function network contains a great deal of implicit knowledge about the RL problem, but this knowledge often remains unexamined and uninterpreted. To our knowledge, this work develops the first mimic learning framework for Q functions in DRL. We introduce Linear Model U-trees (LMUTs) to approximate neural network predictions. An LMUT is learned using a novel online algorithm that is well suited to an active play setting, where the mimic learner observes an ongoing interaction between the neural net and the environment. Empirical evaluation shows that an LMUT mimics a Q function substantially better than five baseline methods. The transparent tree structure of an LMUT facilitates understanding the network's learned knowledge by analyzing feature influence, extracting rules, and highlighting the super-pixels in image inputs.

    Comment: This paper is accepted by ECML-PKDD 2018.
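
    As a rough illustration of the mimic-learning setting described in this abstract, the sketch below queries a stand-in for a trained Q network during play and fits a tree regressor to its outputs. The sklearn DecisionTreeRegressor, the toy Q function, and all dimensions are assumptions for demonstration only; an actual LMUT fits linear models at its leaves and is updated online.

```python
# Minimal sketch of mimic learning for a Q function (stand-in for an LMUT).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n_actions, state_dim = 4, 8

def q_network(states):
    """Stand-in for a trained DQN: returns Q-values for all actions."""
    w = np.linspace(0.5, 2.0, n_actions)
    return np.tanh(states.sum(axis=1, keepdims=True)) * w  # shape (n, n_actions)

# "Active play": observe states, query the network, and log (state, action, Q).
states = rng.normal(size=(5000, state_dim))
q_values = q_network(states)
actions = rng.integers(0, n_actions, size=len(states))
inputs = np.hstack([states, np.eye(n_actions)[actions]])   # state-action features
targets = q_values[np.arange(len(states)), actions]

# Fit the transparent mimic model to the network's predictions.
mimic = DecisionTreeRegressor(max_depth=6).fit(inputs, targets)
print("mimic R^2 on training data:", mimic.score(inputs, targets))
```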

    Identifying Critical States by the Action-Based Variance of Expected Return

    The balance of exploration and exploitation plays a crucial role in accelerating reinforcement learning (RL). To deploy an RL agent in human society, its explainability is also essential. However, basic RL approaches have difficulty both in deciding when to exploit and in extracting useful points for a brief explanation of their operation. One reason for these difficulties is that such approaches treat all states the same way. Here, we show that identifying critical states and treating them specially benefits both problems. These critical states are the states at which the choice of action substantially changes the potential for success or failure. We propose to identify critical states using the variance of the Q-function over the available actions and to exploit with high probability at the identified states. These simple methods accelerate RL in a grid world with cliffs and in two baseline deep RL tasks. Our results also demonstrate that the identified critical states are intuitively interpretable with respect to the crucial nature of the action selection. Furthermore, our analysis of the relationship between the timing at which especially critical states are identified and the rapid progress of learning suggests that a few especially critical states carry the information needed to accelerate RL rapidly.

    Comment: 12 pages, 6 figures
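
    The selection criterion described above can be sketched directly: treat a state as critical when the variance of its Q-values across actions exceeds a threshold, and exploit with high probability there. The threshold, exploration rates, and toy Q-values below are illustrative assumptions, not values from the paper.

```python
# Sketch: variance-based critical-state detection inside epsilon-greedy selection.
import numpy as np

rng = np.random.default_rng(1)

def select_action(q_row, var_threshold=1.0, eps_normal=0.3, eps_critical=0.02):
    """Epsilon-greedy whose exploration rate drops sharply on critical states."""
    critical = np.var(q_row) > var_threshold
    eps = eps_critical if critical else eps_normal
    if rng.random() < eps:
        return int(rng.integers(len(q_row))), critical
    return int(np.argmax(q_row)), critical

# Example: a state near a cliff, where one action is much worse than the rest.
q_cliff = np.array([1.0, 0.9, -10.0, 1.1])
print(select_action(q_cliff))   # almost always exploits (action 3), flagged critical
```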

    Learning Interpretable Models of Aircraft Handling Behaviour by Reinforcement Learning from Human Feedback

    We propose a method to capture the handling abilities of fast jet pilots in a software model via reinforcement learning (RL) from human preference feedback. We use pairwise preferences over simulated flight trajectories to learn an interpretable rule-based model called a reward tree, which enables the automated scoring of trajectories alongside an explanatory rationale. We train an RL agent to execute high-quality handling behaviour by using the reward tree as the objective, and thereby generate data for iterative preference collection and further refinement of both tree and agent. Experiments with synthetic preferences show reward trees to be competitive with uninterpretable neural network reward models on both quantitative and qualitative evaluations.
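
    A minimal sketch of the reward-tree role described above: a tree maps per-timestep trajectory features to rewards, trajectory scores are the summed per-step predictions, and pairs of trajectories are compared by score. The plain sklearn regressor, the hand-made proxy targets, and the feature dimensions are assumptions standing in for the paper's iteratively refined reward tree.

```python
# Sketch: scoring and comparing trajectories with a tree-based reward model.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
feat_dim = 5

# Toy per-step features and a crude proxy reward used only to fit the stand-in tree.
X = rng.normal(size=(2000, feat_dim))
y = X[:, 0] - np.abs(X[:, 1])          # e.g. "track altitude, penalise roll error"
reward_tree = DecisionTreeRegressor(max_depth=4).fit(X, y)

def trajectory_score(traj_features):
    """Trajectory score = sum of per-timestep predicted rewards."""
    return reward_tree.predict(traj_features).sum()

# Two simulated trajectories; the model "prefers" the one with the higher score.
traj_a, traj_b = rng.normal(size=(50, feat_dim)), rng.normal(size=(50, feat_dim))
preferred = "A" if trajectory_score(traj_a) > trajectory_score(traj_b) else "B"
print("model prefers trajectory", preferred)
```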

    Redefining Counterfactual Explanations for Reinforcement Learning: Overview, Challenges and Opportunities

    While AI algorithms have shown remarkable success in various fields, their lack of transparency hinders their application to real-life tasks. Although explanations targeted at non-experts are necessary for user trust and human-AI collaboration, the majority of explanation methods for AI are focused on developers and expert users. Counterfactual explanations are local explanations that offer users advice on what can be changed in the input for the output of the black-box model to change. Counterfactuals are user-friendly and provide actionable advice for achieving the desired output from the AI system. While extensively researched in supervised learning, few methods apply them to reinforcement learning (RL). In this work, we explore why this powerful explanation method is underrepresented in RL. We start by reviewing the current work on counterfactual explanations in supervised learning. We then explore the differences between counterfactual explanations in supervised learning and in RL, and identify the main challenges that prevent the adoption of methods from supervised learning in reinforcement learning. Finally, we redefine counterfactuals for RL and propose research directions for implementing them.

    Comment: 32 pages, 6 figures
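
    A hedged sketch of a counterfactual query for an RL policy, in the spirit of the definition above: search for a small change to the current state that flips the greedy action of a Q-function. The random-perturbation search and the toy Q-function are illustrative assumptions, not a method proposed in the paper.

```python
# Sketch: find a nearby state where the greedy action of a toy Q-function changes.
import numpy as np

rng = np.random.default_rng(3)

def q_function(state):
    """Toy Q-function over 2 actions: prefers action 1 when state[0] > 0."""
    return np.array([-state[0], state[0]])

def counterfactual(state, n_samples=2000, max_radius=2.0):
    """Return the closest sampled perturbation that changes the greedy action."""
    base_action = int(np.argmax(q_function(state)))
    best, best_dist = None, np.inf
    for _ in range(n_samples):
        delta = rng.uniform(-max_radius, max_radius, size=state.shape)
        cand = state + delta
        if int(np.argmax(q_function(cand))) != base_action:
            dist = np.linalg.norm(delta)
            if dist < best_dist:
                best, best_dist = cand, dist
    return base_action, best

state = np.array([0.8, 0.0])
print(counterfactual(state))   # a nearby state where the greedy action differs
```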

    Extracting tactics learned from self-play in general games

    Local, spatial state-action features can be used to effectively train linear policies from self-play in a wide variety of board games. Such policies can play games directly, or be used to bias tree search agents. However, the resulting feature sets can be large, with a significant amount of overlap and redundancy between features. This is a problem for two reasons. Firstly, large feature sets can be computationally expensive, which reduces the playing strength of agents based on them. Secondly, redundancies and correlations between features impair the ability of humans to analyse, interpret, or understand the tactics learned by the policies. We look to decision trees for their ability to perform feature selection and to serve as interpretable models. Previous work on distilling policies into decision trees uses states as inputs and distributions over the complete action space as outputs. In contrast, we propose and evaluate a variety of decision tree types that take state-action pairs as inputs and provide various types of outputs on a per-action basis. We present an empirical evaluation over 43 different board games, and use two of those games as case studies in which we attempt to interpret the discovered features.
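
    The state-action input scheme described above can be sketched as follows: instead of mapping a state to a distribution over the whole action space, each (state, action) pair gets its own feature vector, and the tree predicts a per-action label (here, whether that action was the one chosen). The toy feature construction and imitation target are assumptions; the paper's spatial board-game features differ.

```python
# Sketch: decision tree with (state, action) pairs as inputs and per-action outputs.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
n_states, state_dim, n_actions = 1000, 6, 3

states = rng.normal(size=(n_states, state_dim))
chosen = (states[:, 0] > 0).astype(int) * 2          # toy "policy" to imitate

rows, labels = [], []
for s, a_star in zip(states, chosen):
    for a in range(n_actions):
        rows.append(np.concatenate([s, np.eye(n_actions)[a]]))  # state + action one-hot
        labels.append(int(a == a_star))                          # was this the chosen action?

tree = DecisionTreeClassifier(max_depth=4).fit(np.array(rows), np.array(labels))
print("per-(state, action) accuracy:", tree.score(np.array(rows), np.array(labels)))
```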

    Policy Extraction via Online Q-Value Distillation

    Recently, deep neural networks have been capable of solving complex control tasks in certain challenging environments. However, these deep learning policies remain hard to interpret, explain, and verify, which limits their practical applicability. Decision trees lend themselves well to explanation and verification tools but are not easy to train, especially in an online fashion. The aim of this thesis is to explore online tree construction algorithms and demonstrate the effectiveness of distilling reinforcement learning policies into a Bayesian tree structure. We introduce Q-BSP Trees and an Ordered Sequential Monte Carlo training algorithm that condenses the Q-function of fully trained Deep Q-Networks into the tree structure. Q-BSP Forests generate partitioning rules that transparently reconstruct the value function for all possible states. The approach convincingly beats performance benchmarks set by earlier policy distillation methods, coming closest to the performance of the original deep learning policy.
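
    As a rough illustration of the transparent partitioning rules mentioned above, the sketch below fits a plain decision tree to a toy value function and prints its splits as human-readable rules over state features. The sklearn tree and the synthetic data stand in for a Q-BSP tree; the Ordered Sequential Monte Carlo training itself is not implemented here.

```python
# Sketch: reading partitioning rules off a tree fit to a toy value function.
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(5)
states = rng.uniform(-1, 1, size=(3000, 2))
values = np.where(states[:, 0] > 0, 1.0, -1.0) + 0.1 * states[:, 1]  # toy V(s)

tree = DecisionTreeRegressor(max_depth=3).fit(states, values)
# Each root-to-leaf path is a transparent partitioning rule over the state space.
print(export_text(tree, feature_names=["x_position", "velocity"]))
```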

    Systematic Approaches for Telemedicine and Data Coordination for COVID-19 in Baja California, Mexico

    Conference proceedings info: ICICT 2023: The 6th International Conference on Information and Computer Technologies, Raleigh, NC, United States, March 24-26, 2023, pages 529-542. https://doi.org/10.1007/978-981-99-3236-

    We provide a model for the systematic implementation of telemedicine within a large evaluation center for COVID-19 in the area of Baja California, Mexico. Our model is based on human-centric design factors and cross-disciplinary collaborations for scalable, data-driven enablement of smartphone, cellular, and video teleconsultation technologies to link hospitals, clinics, and emergency medical services for point-of-care assessments of COVID testing, and for subsequent treatment and quarantine decisions. A multidisciplinary team was rapidly created in cooperation with different institutions, including the Autonomous University of Baja California, the Ministry of Health, the Command, Communication and Computer Control Center of the Ministry of the State of Baja California (C4), Colleges of Medicine, and the College of Psychologists. Our objective is to provide information to the public, to evaluate COVID-19 in real time, and to track regional, municipal, and state-wide data in real time that informs supply chains and resource allocation in anticipation of a surge in COVID-19 cases.