
    Invariant Causal Prediction for Block MDPs

    Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges. In this paper, we consider the problem of learning abstractions that generalize in block MDPs, families of environments with a shared latent state space and dynamics structure over that latent space, but varying observations. We leverage tools from causal inference to propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting. We prove that for certain classes of environments, this approach outputs, with high probability, a state abstraction corresponding to the causal feature set with respect to the return. We further provide more general bounds on model error and generalization error in the multi-environment setting, in the process showing a connection between causal variable selection and the state abstraction framework for MDPs. We give empirical evidence that our methods work in both linear and nonlinear settings, attaining improved generalization over single- and multi-task baselines.
    Comment: Accepted to ICML 2020. 16 pages, 8 figures.
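
    As a toy illustration of the invariant-prediction idea (this is not the paper's MISA algorithm; the features, coefficients, and threshold below are made up), one can fit a per-environment regression and keep only the features whose relationship to the return is stable across environments:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_env(spurious_coef, n=2000):
    # x1 is the shared causal feature; x2 is an environment-specific distractor.
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    r = 2.0 * x1 + spurious_coef * x2 + 0.1 * rng.normal(size=n)
    return np.column_stack([x1, x2]), r

# Two environments: the coefficient on x2 varies, the one on x1 does not.
envs = [make_env(0.0), make_env(1.5)]

# Fit a least-squares model separately in each environment.
coefs = np.array([np.linalg.lstsq(X, r, rcond=None)[0] for X, r in envs])

# Keep features whose coefficient is (approximately) invariant across envs.
invariant = np.ptp(coefs, axis=0) < 0.2
print(invariant)  # → [ True False]: x1 selected, x2 rejected
```

    The selected feature set plays the role of the causal parents of the return; a model restricted to it transfers to a new environment with a different spurious coefficient.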

    Seeing is not Believing: Robust Reinforcement Learning against Spurious Correlation

    Robustness has been extensively studied in reinforcement learning (RL) to handle various forms of uncertainty such as random perturbations, rare events, and malicious attacks. In this work, we consider one critical type of robustness: robustness against spurious correlation, where different portions of the state do not have causal relationships yet are correlated through unobserved confounders. These spurious correlations are ubiquitous in real-world tasks; for instance, a self-driving car usually observes heavy traffic in the daytime and light traffic at night due to unobservable human activity. A model that learns such a useless or even harmful correlation could fail catastrophically when the confounder in the test case deviates from the training one. Although well motivated, enabling robustness against spurious correlation poses significant challenges, since the uncertainty set, shaped by the unobserved confounder and the causal structure, is difficult to characterize and identify. Existing robust algorithms that assume simple, unstructured uncertainty sets are therefore inadequate for this challenge. To address it, we propose Robust State-Confounded Markov Decision Processes (RSC-MDPs) and theoretically demonstrate their superiority in avoiding spurious correlations compared with other robust RL counterparts. We also design an empirical algorithm to learn the robust optimal policy for RSC-MDPs, which outperforms all baselines on eight realistic self-driving and manipulation tasks.
    Comment: Accepted to NeurIPS 202
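
    The failure mode can be reproduced in a few lines (a hypothetical regression toy echoing the traffic example, not the paper's RSC-MDP algorithm): an unobserved confounder (daytime human activity) couples a causal signal (traffic) with a non-causal one (brightness) during training, and the coupling breaks at test time:

```python
import numpy as np

rng = np.random.default_rng(1)

def rollout(confounded, n=4000):
    day = rng.random(n) < 0.5                      # unobserved confounder
    if confounded:
        heavy = day.copy()                         # traffic follows human activity
    else:
        heavy = rng.random(n) < 0.5                # confounder shifts at test time
    traffic = heavy + 0.8 * rng.normal(size=n)     # noisy traffic sensor (causal)
    brightness = day + 0.05 * rng.normal(size=n)   # clean daylight sensor (spurious)
    X = np.column_stack([np.ones(n), traffic, brightness])
    return X, heavy.astype(float)

X_tr, y_tr = rollout(confounded=True)
X_te, y_te = rollout(confounded=False)

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

w_all = np.linalg.lstsq(X_tr, y_tr, rcond=None)[0]            # leans on brightness
w_causal = np.linalg.lstsq(X_tr[:, :2], y_tr, rcond=None)[0]  # traffic only

# Under the shifted confounder, the brightness-reliant model degrades sharply
# while the causal model's error is unchanged from training.
print(mse(w_all, X_te, y_te), mse(w_causal, X_te[:, :2], y_te))
```

    During training, brightness is an almost noiseless proxy for heavy traffic, so an unconstrained least-squares fit prefers it; once the confounder shifts, that proxy is worthless, which is exactly the catastrophe the abstract describes.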

    The Human Mental State Multiple Realizability in Silicon Thesis is False: An Argument Against Computational Theories of Consciousness

    In this paper, I argue that all computational theories of consciousness (CTCs) fail. CTCs hold that the right kind of computation is sufficient for the instantiation of consciousness. Given the widely recognized importance of his work, I will use David J. Chalmers’ Thesis of Computational Sufficiency as a paradigm case. I will argue that it fails for a reason that can be formalized as a general problem plaguing any CTC: the medium-independent properties (MIPs) constitutive of computational processes are insufficient to instantiate the medium-dependent properties (MDPs) constitutive of consciousness. MIPs, like graphemes, are properties whose causal role (e.g., symbolic meaning) does not depend upon the physical properties of the vehicles by which the relevant information is transferred (e.g., paper), while MDPs, like digestion, have causal roles (e.g., decomposition) that directly depend on the physical properties of the relevant vehicles (e.g., enzymes). Since computations, as abstract descriptions, are MIPs, they must be implemented to generate MDPs. However, this makes potential implementation properties central to the feasibility of instantiating consciousness in artificial systems. The problems that arise for CTC advocates are twofold: taxonomic and empirical. The taxonomic problem is that adverting to detailed physio-causal properties of implementation vehicles threatens to subvert the legitimacy of calling such theories “computational.” The empirical problem is the following: given the necessary role of implementation properties, and the fact that functions supervene on structures, it follows that physical differences can legislate mental differences. After distinguishing weaker and stronger varieties of implementation requirements, and showing why a plausible CTC requires a thick theory of implementation, I will examine the implementation requirements for human consciousness. Given empirical data suggesting that consciousness depends on very specific physical properties of the brain, for which there are no known implementation surrogates, I argue that CTCs will fail to generate the relevant MDPs. I will conclude by showing why this implies that the Human Mental State Multiple Realizability in Silicon Thesis (HMSMRST) is almost certainly false.

    A Survey of Zero-shot Generalisation in Deep Reinforcement Learning

    The study of zero-shot generalisation (ZSG) in deep reinforcement learning (RL) aims to produce RL algorithms whose policies generalise well to novel, unseen situations at deployment time, avoiding overfitting to their training environments. Tackling this is vital if we are to deploy reinforcement learning algorithms in real-world scenarios, where the environment will be diverse, dynamic, and unpredictable. This survey is an overview of this nascent field. We rely on a unifying formalism and terminology for discussing different ZSG problems, building upon previous works. We go on to categorise existing benchmarks for ZSG, as well as current methods for tackling these problems. Finally, we provide a critical discussion of the current state of the field, including recommendations for future work. Among other conclusions, we argue that taking a purely procedural content generation approach to benchmark design is not conducive to progress in ZSG, we suggest fast online adaptation and tackling RL-specific problems as some areas for future work on methods for ZSG, and we recommend building benchmarks in underexplored problem settings such as offline RL ZSG and reward-function variation.
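
    The evaluation protocol underlying ZSG can be stated in a few lines (a generic sketch, not tied to any particular benchmark or to the survey's formalism): the train and test context sets are disjoint, and the policy is evaluated on test contexts with no prior interaction in them:

```python
import random

def split_contexts(num_contexts=200, train_frac=0.8, seed=0):
    # Zero-shot protocol: disjoint train/test context (e.g., level-seed) sets.
    # The agent trains only on train_ctx and is evaluated once on test_ctx.
    rng = random.Random(seed)
    contexts = list(range(num_contexts))
    rng.shuffle(contexts)
    cut = int(train_frac * num_contexts)
    return contexts[:cut], contexts[cut:]

train_ctx, test_ctx = split_contexts()
assert not set(train_ctx) & set(test_ctx)  # no test context seen in training
```

    The gap between return on train_ctx and return on test_ctx is then the generalisation gap the survey's benchmarks measure.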

    Causally-Inspired Generalizable Deep Learning Methods under Distribution Shifts

    Deep learning methods have achieved remarkable success in various areas of artificial intelligence, due to their powerful distribution-matching capabilities. However, these successes rely heavily on the i.i.d. assumption, i.e., that the data distributions of the training and test datasets are the same. As a consequence, current deep learning methods typically exhibit poor generalization under distribution shift, performing poorly on test data whose distribution differs from the training data. This significantly hinders the application of deep learning methods to real-world scenarios, as the distribution of test data is not always the same as the training distribution in our rapidly evolving world. This thesis discusses how to construct generalizable deep learning methods under distribution shifts. To achieve this, the thesis first models a prediction task as a structural causal model (SCM), which establishes the relationships between variables using a directed acyclic graph. In an SCM, some variables change easily across domains while others do not. However, deep learning methods often unintentionally mix invariant variables with easily changed variables, causing the learned model to deviate from the true one and resulting in poor generalization under distribution shift. To remedy this issue, we propose specific algorithms to model the invariant part of the SCM with deep learning methods, and experimentally show that this helps the trained model generalize well to different distributions of the same task. Last, we further propose to identify and model the variant information in the new test distribution so that we can fully adapt the trained deep learning model accordingly. We show that the method can be extended to several practical applications, such as classification under label shift, image translation under semantics shift, robot control under dynamics generalization, and generalizing large language models to visual question-answering tasks.
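
    The core premise — that the mechanism generating the label from its causal parents is stable across domains while other mechanisms vary — can be checked on a synthetic SCM (the variable names and mechanisms below are illustrative, not taken from the thesis):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_scm(shift, n=100_000):
    # DAG: x_inv -> y -> x_var. The x_inv -> y mechanism is shared across
    # domains; the y -> x_var mechanism changes with the domain ("shift").
    x_inv = rng.normal(size=n)
    y = np.tanh(x_inv) + 0.1 * rng.normal(size=n)
    x_var = shift * y + 0.1 * rng.normal(size=n)
    return x_inv, x_var, y

slopes = {}
for shift in (1.0, -1.0):
    x_inv, x_var, y = sample_scm(shift)
    slopes[shift] = (
        np.polyfit(x_inv, y, 1)[0],  # regression of y on the invariant parent
        np.polyfit(x_var, y, 1)[0],  # regression of y on the varying child
    )

# The y-on-x_inv slope is stable across domains; the y-on-x_var slope flips
# sign with the domain, so a predictor built on x_var cannot transfer.
```

    A model restricted to x_inv therefore generalizes across both domains, which is the behaviour the proposed algorithms aim to recover from data.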