16 research outputs found

    Planning with learned ignorance-aware models

    One of the goals of artificial intelligence research is to create decision-makers (i.e., agents) that improve from experience (i.e., data) collected through interaction with an environment. Models of the environment (i.e., world models) are an explicit way for agents to represent their knowledge, enabling them to make counterfactual predictions and plans without requiring additional environment interactions. Although agents that plan with a perfect model of the environment have led to impressive demonstrations, e.g., superhuman performance in board games, they are limited to problems for which their designer can specify a perfect model. Therefore, learning models from experience holds the promise of going beyond the scope of their designers’ reach, giving rise to a self-improving virtuous circle of (i) learning a model from past experience; (ii) planning with the learned model; and (iii) interacting with the environment, collecting new experiences. Ideally, learned models should generalise to situations beyond their training regime. Nonetheless, this is ambitious and often unrealistic when finite data is used for learning the models, leading to generally imperfect models, with which naive planning could be catastrophic in novel, out-of-training-distribution situations. A more pragmatic goal is to have agents that are aware of and quantify their lack of knowledge (i.e., ignorance or epistemic uncertainty). In this thesis, we motivate and propose novel ignorance-aware agents that plan with learned models, and demonstrate their effectiveness. Naively applying powerful planning algorithms to learned models can yield negative results when the planning algorithm exploits the model imperfections in out-of-training-distribution situations. This phenomenon is often termed overoptimisation and can be addressed by optimising ignorance-augmented objectives, called knowledge equivalents. 
We verify the validity of our ideas and methods in a number of problem settings, including learning from (i) expert demonstrations (imitation learning, §3); (ii) sub-optimal demonstrations (social learning, §4); and (iii) interacting with an environment with rewards (reinforcement learning, §5). Our empirical evidence is based on simulated autonomous driving environments, continuous control and video games from pixels, and didactic small-scale grid-worlds. Throughout the thesis, we use neural networks to parameterise the (learnable) models and either use existing scalable approximate ignorance quantification deep learning methods, such as ensembles, or introduce novel planning-specific ways to quantify the agents’ ignorance. The main chapters of this thesis are based on publications (Filos et al., 2020, 2021, 2022).
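The idea of optimising an ignorance-augmented objective can be illustrated with a minimal sketch. It assumes an ensemble of learned models and uses their disagreement (the standard deviation of their predicted returns) as a stand-in for epistemic uncertainty. The function names, the penalty weight `beta`, and the mean-minus-deviation form are illustrative assumptions, not the thesis's own objective.

```python
from statistics import mean, pstdev


def penalised_score(predicted_returns, beta=1.0):
    """Mean predicted return minus beta times the ensemble disagreement.

    predicted_returns: one predicted return per ensemble member (hypothetical).
    beta: assumed penalty weight; beta = 0 recovers naive planning.
    """
    return mean(predicted_returns) - beta * pstdev(predicted_returns)


def choose_plan(candidates, ensemble, beta=1.0):
    """Return the candidate plan with the best disagreement-penalised score.

    candidates: list of plans (any hashable objects here).
    ensemble: list of callables, each mapping a plan to a predicted return.
    """
    return max(
        candidates,
        key=lambda plan: penalised_score([model(plan) for model in ensemble], beta),
    )


# Toy ensemble: the models agree on the familiar plan and disagree on the novel one.
tables = [
    {"familiar": 1.0, "novel": 3.0},
    {"familiar": 1.1, "novel": -1.0},
    {"familiar": 0.9, "novel": 2.5},
]
ensemble = [lambda plan, table=table: table[plan] for table in tables]
```

With these toy numbers the naive planner (`beta=0`) prefers the novel plan because its mean predicted return is higher, whereas the penalised planner (`beta=1`) prefers the familiar plan on which the ensemble agrees.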

    ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages

    In this paper, we introduce a novel method for enhancing the effectiveness of on-policy Deep Reinforcement Learning (DRL) algorithms. Current on-policy algorithms, such as Proximal Policy Optimization (PPO) and Asynchronous Advantage Actor-Critic (A3C), do not sufficiently account for cautious interaction with the environment. Our method addresses this gap by explicitly integrating cautious interaction in two critical ways: by maximizing a lower bound on the true value function plus a constant, thereby promoting conservative value estimation, and by incorporating Thompson sampling for cautious exploration. These features are realized through three surprisingly simple modifications to the A3C algorithm: processing advantage estimates through a ReLU function, spectral normalization, and dropout. We provide a theoretical proof that our algorithm maximizes the lower bound, which also grounds Regret Matching Policy Gradients (RMPG), a discrete-action on-policy method for multi-agent reinforcement learning. Our rigorous empirical evaluations across various benchmarks consistently demonstrate our approach's improved performance against existing on-policy algorithms. This research represents a substantial step towards more cautious and effective DRL algorithms, with the potential to unlock application to complex, real-world problems.
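The first of the three modifications, passing advantage estimates through a ReLU, can be sketched as a surrogate policy loss. This is a simplified illustration in plain Python that assumes per-sample log-probabilities and advantage estimates are already available. The function names are hypothetical, and spectral normalization and dropout are not shown.

```python
def relu(x):
    """Rectified linear unit: keeps positive values, zeroes out the rest."""
    return x if x > 0.0 else 0.0


def positive_advantage_loss(log_probs, advantages):
    """Policy-gradient surrogate loss that uses only positive advantages.

    log_probs: log pi(a_t | s_t) for each sampled action.
    advantages: advantage estimate for each sampled action.
    Samples with a non-positive advantage contribute nothing to the loss.
    """
    terms = [-lp * relu(adv) for lp, adv in zip(log_probs, advantages)]
    return sum(terms) / len(terms)


loss = positive_advantage_loss(
    log_probs=[-0.5, -1.0, -2.0],
    advantages=[2.0, -3.0, 0.5],
)
```

In the toy call, the second sample has a negative advantage and is therefore ignored, so only the first and third samples push the policy towards their actions.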

    Invariant Causal Prediction for Block MDPs

    Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges. In this paper, we consider the problem of learning abstractions that generalize in block MDPs, families of environments with a shared latent state space and dynamics structure over that latent space, but varying observations. We leverage tools from causal inference to propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting. We prove that for certain classes of environments, this approach outputs with high probability a state abstraction corresponding to the causal feature set with respect to the return. We further provide more general bounds on model error and generalization error in the multi-environment setting, in the process showing a connection between causal variable selection and the state abstraction framework for MDPs. We give empirical evidence that our methods work in both linear and nonlinear settings, attaining improved generalization over single- and multi-task baselines. Comment: Accepted to ICML 2020. 16 pages, 8 figures.

    Combining Behaviors with the Successor Features Keyboard

    The Option Keyboard (OK) was recently proposed as a method for transferring behavioral knowledge across tasks. OK transfers knowledge by adaptively combining subsets of known behaviors using Successor Features (SFs) and Generalized Policy Improvement (GPI). However, it relies on hand-designed state-features and task encodings which are cumbersome to design for every new environment. In this work, we propose the "Successor Features Keyboard" (SFK), which enables transfer with discovered state-features and task encodings. To enable discovery, we propose the "Categorical Successor Feature Approximator" (CSFA), a novel learning algorithm for estimating SFs while jointly discovering state-features and task encodings. With SFK and CSFA, we achieve the first demonstration of transfer with SFs in a challenging 3D environment where all the necessary representations are discovered. We first compare CSFA against other methods for approximating SFs and show that only CSFA discovers representations compatible with SF&GPI at this scale. We then compare SFK against transfer learning baselines and show that it transfers most quickly to long-horizon tasks. Comment: NeurIPS 202
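For readers unfamiliar with SF&GPI, the action-selection step can be sketched as follows. The sketch assumes the successor features of each known policy are already given for the current state, and that a task is described by a weight vector `w` so that action values are the dot product of the two. It illustrates standard Generalized Policy Improvement only, not SFK or CSFA, and all names are illustrative.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gpi_action(successor_features, w):
    """Pick an action by Generalized Policy Improvement over known policies.

    successor_features[i][a]: successor-feature vector of policy i for
        action a in the current state (assumed given).
    w: task weight vector, so that Q_i(s, a) = dot(successor_features[i][a], w).
    Returns the action maximising the best value across all known policies.
    """
    n_actions = len(successor_features[0])

    def best_value(action):
        return max(dot(psi[action], w) for psi in successor_features)

    return max(range(n_actions), key=best_value)


# Two known policies, two actions, two state-features (toy numbers).
sfs = [
    [[1.0, 0.0], [0.0, 0.0]],  # policy 0 collects feature 0 via action 0
    [[0.0, 0.0], [0.0, 1.0]],  # policy 1 collects feature 1 via action 1
]
```

Changing `w` changes which stored behaviour is reused: a task that rewards feature 0 selects action 0, while a task that rewards feature 1 selects action 1, without any further learning.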

    QU-BraTS: MICCAI BraTS 2020 challenge on quantifying uncertainty in brain tumor segmentation -- analysis of ranking metrics and benchmarking results

    Deep learning (DL) models have provided state-of-the-art performance in a wide variety of medical imaging benchmarking challenges, including the Brain Tumor Segmentation (BraTS) challenges. However, the task of focal pathology multi-compartment segmentation (e.g., tumor and lesion sub-regions) is particularly challenging, and potential errors hinder the translation of DL models into clinical workflows. Quantifying the reliability of DL model predictions in the form of uncertainties could enable clinical review of the most uncertain regions, thereby building trust and paving the way towards clinical translation. Recently, a number of uncertainty estimation methods have been introduced for DL medical image segmentation tasks. Developing metrics to evaluate and compare the performance of uncertainty measures will assist the end-user in making more informed decisions. In this study, we explore and evaluate a metric developed during the BraTS 2019-2020 task on uncertainty quantification (QU-BraTS), and designed to assess and rank uncertainty estimates for brain tumor multi-compartment segmentation. This metric (1) rewards uncertainty estimates that produce high confidence in correct assertions, and those that assign low confidence levels at incorrect assertions, and (2) penalizes uncertainty measures that lead to a higher percentage of under-confident correct assertions. We further benchmark the segmentation uncertainties generated by 14 independent participating teams of QU-BraTS 2020, all of which also participated in the main BraTS segmentation task. Overall, our findings confirm the importance and complementary value that uncertainty estimates provide to segmentation algorithms, and hence highlight the need for uncertainty quantification in medical image analyses. 
Finally, in favor of transparency and reproducibility, our evaluation code is made publicly available at https://github.com/RagMeh11/QU-BraTS.
    Research reported in this publication was partly supported by the Informatics Technology for Cancer Research (ITCR) program of the National Cancer Institute (NCI) of the National Institutes of Health (NIH), under award numbers NIH/NCI/ITCR:U01CA242871 and NIH/NCI/ITCR:U24CA189523. It was also partly supported by the National Institute of Neurological Disorders and Stroke (NINDS) of the NIH, under award number NIH/NINDS:R01NS042645.
    Document signed by 92 authors: Raghav Mehta1, Angelos Filos2, Ujjwal Baid3,4,5, Chiharu Sako3,4, Richard McKinley6, Michael Rebsamen6, Katrin Dätwyler6,53, Raphael Meier54, Piotr Radojewski6, Gowtham Krishnan Murugesan7, Sahil Nalawade7, Chandan Ganesh7, Ben Wagner7, Fang F. Yu7, Baowei Fei8, Ananth J. Madhuranthakam7,9, Joseph A. Maldjian7,9, Laura Daza10, Catalina Gómez10, Pablo Arbeláez10, Chengliang Dai11, Shuo Wang11, Hadrien Raynaud11, Yuanhan Mo11, Elsa Angelini12, Yike Guo11, Wenjia Bai11,13, Subhashis Banerjee14,15,16, Linmin Pei17, Murat AK17, Sarahi Rosas-González18, Illyess Zemmoura18,52, Clovis Tauber18, Minh H. Vu19, Tufve Nyholm19, Tommy Löfstedt20, Laura Mora Ballestar21, Veronica Vilaplana21, Hugh McHugh22,23, Gonzalo Maso Talou24, Alan Wang22,24, Jay Patel25,26, Ken Chang25,26, Katharina Hoebel25,26, Mishka Gidwani25, Nishanth Arun25, Sharut Gupta25, Mehak Aggarwal25, Praveer Singh25, Elizabeth R. Gerstner25, Jayashree Kalpathy-Cramer25, Nicolas Boutry27, Alexis Huard27, Lasitha Vidyaratne28, Md Monibor Rahman28, Khan M. Iftekharuddin28, Joseph Chazalon29, Elodie Puybareau29, Guillaume Tochon29, Jun Ma30, Mariano Cabezas31, Xavier Llado31, Arnau Oliver31, Liliana Valencia31, Sergi Valverde31, Mehdi Amian32, Mohammadreza Soltaninejad33, Andriy Myronenko34, Ali Hatamizadeh34, Xue Feng35, Quan Dou35, Nicholas Tustison36, Craig Meyer35,36, Nisarg A. Shah37, Sanjay Talbar38, Marc-André Weber39, Abhishek Mahajan48, Andras Jakab47, Roland Wiest6,46, Hassan M. Fathallah-Shaykh45, Arash Nazeri40, Mikhail Milchenko40,44, Daniel Marcus40,44, Aikaterini Kotrotsou43, Rivka Colen43, John Freymann41,42, Justin Kirby41,42, Christos Davatzikos3,4, Bjoern Menze49,50, Spyridon Bakas∗3,4,5, Yarin Gal∗2, Tal Arbel∗1,51
    Affiliations: 1Centre for Intelligent Machines (CIM), McGill University, Montreal, QC, Canada; 2Oxford Applied and Theoretical Machine Learning (OATML) Group, University of Oxford, Oxford, England; 3Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA, USA; 4Department of Radiology, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA; 5Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; 6Support Center for Advanced Neuroimaging (SCAN), University Institute of Diagnostic and Interventional Neuroradiology, University of Bern, Inselspital, Bern University Hospital, Bern, Switzerland; 7Department of Radiology, University of Texas Southwestern Medical Center, Dallas, TX, USA; 8Department of Bioengineering, University of Texas at Dallas, Texas, USA; 9Advanced Imaging Research Center, University of Texas Southwestern Medical Center, Dallas, TX, USA; 10Universidad de los Andes, Bogotá, Colombia; 11Data Science Institute, Imperial College London, London, UK; 12NIHR Imperial BRC, ITMAT Data Science Group, Imperial College London, London, UK; 13Department of Brain Sciences, Imperial College London, London, UK; 14Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India; 15Department of CSE, University of Calcutta, Kolkata, India; 16Division of Visual Information and Interaction (Vi2), Department of Information Technology, Uppsala University, Uppsala, Sweden; 17Department of Diagnostic Radiology, The University of Pittsburgh Medical Center, Pittsburgh, PA, USA; 18UMR U1253 iBrain, Université de Tours, Inserm, Tours, France; 19Department of Radiation Sciences, Umeå University, Umeå, Sweden; 20Department of Computing Science, Umeå University, Umeå, Sweden; 21Signal Theory and Communications Department, Universitat Politècnica de Catalunya, BarcelonaTech, Barcelona, Spain; 22Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand; 23Radiology Department, Auckland City Hospital, Auckland, New Zealand; 24Auckland Bioengineering Institute, University of Auckland, New Zealand; 25Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA, USA; 26Massachusetts Institute of Technology, Cambridge, MA, USA; 27EPITA Research and Development Laboratory (LRDE), France; 28Vision Lab, Electrical and Computer Engineering, Old Dominion University, Norfolk, VA 23529, USA; 29EPITA Research and Development Laboratory (LRDE), Le Kremlin-Bicêtre, France; 30School of Science, Nanjing University of Science and Technology; 31Research Institute of Computer Vision and Robotics, University of Girona, Spain; 32Department of Electrical and Computer Engineering, University of Tehran, Iran; 33School of Computer Science, University of Nottingham, UK; 34NVIDIA, Santa Clara, CA, US; 35Biomedical Engineering, University of Virginia, Charlottesville, USA; 36Radiology and Medical Imaging, University of Virginia, Charlottesville, USA; 37Department of Electrical Engineering, Indian Institute of Technology - Jodhpur, Jodhpur, India; 38SGGS Institute of Engineering and Technology, Nanded, India; 39Institute of Diagnostic and Interventional Radiology, Pediatric Radiology and Neuroradiology, University Medical Center; 40Department of Radiology, Washington University, St. Louis, MO, USA; 41Leidos Biomedical Research, Inc, Frederick National Laboratory for Cancer Research, Frederick, MD, USA; 42Cancer Imaging Program, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA; 43Department of Diagnostic Radiology, University of Texas MD Anderson Cancer Center, Houston, TX, USA; 44Neuroimaging Informatics and Analysis Center, Washington University, St. Louis, MO, USA; 45Department of Neurology, The University of Alabama at Birmingham, Birmingham, AL, USA; 46Institute for Surgical Technology and Biomechanics, University of Bern, Bern, Switzerland; 47Center for MR-Research, University Children’s Hospital Zurich, Zurich, Switzerland; 48Tata Memorial Centre, Homi Bhabha National Institute, Mumbai, India; 49Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland; 50Department of Informatics, Technical University of Munich, Munich, Germany; 51MILA - Quebec Artificial Intelligence Institute, Montreal, QC, Canada; 52Neurosurgery department, CHRU de Tours, Tours, France; 53Human Performance Lab, Schulthess Clinic, Zurich, Switzerland; 54armasuisse S+T, Thun, Switzerland.
    Preprint. ©2021 Mehta et al. License: CC-BY 4.0. arXiv:2112.10074v1 [eess.IV], 19 Dec 2021.