12 research outputs found

    Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning

    Full text link
    The quintessential model-based reinforcement-learning agent iteratively refines its estimates or prior beliefs about the true underlying model of the environment. Recent empirical successes in model-based reinforcement learning with function approximation, however, eschew the true model in favor of a surrogate that, while ignoring various facets of the environment, still facilitates effective planning over behaviors. Recently formalized as the value equivalence principle, this algorithmic technique is perhaps unavoidable as real-world reinforcement learning demands consideration of a simple, computationally-bounded agent interacting with an overwhelmingly complex environment, whose underlying dynamics likely exceed the agent's capacity for representation. In this work, we consider the scenario where agent limitations may entirely preclude identifying an exactly value-equivalent model, immediately giving rise to a trade-off between identifying a model that is simple enough to learn and incurring only bounded sub-optimality. To address this problem, we introduce an algorithm that, using rate-distortion theory, iteratively computes an approximately-value-equivalent, lossy compression of the environment which an agent may feasibly target in lieu of the true model. We prove an information-theoretic, Bayesian regret bound for our algorithm that holds for any finite-horizon, episodic sequential decision-making problem. Crucially, our regret bound can be expressed in one of two possible forms, providing a performance guarantee for finding either the simplest model that achieves a desired sub-optimality gap or, alternatively, the best model given a limit on agent capacity. Comment: Accepted to Neural Information Processing Systems (NeurIPS) 2022.
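
    The rate-distortion step described here can be made concrete with a small sketch. Assuming a finite set of candidate environments with a posterior over them, a finite menu of simpler surrogate models, and a precomputed distortion matrix whose (i, j) entry stands in for the value loss of planning with surrogate j when environment i is the truth (the names, toy numbers, and distortion choice below are illustrative, not the paper's exact construction), a standard Blahut-Arimoto iteration traces out the trade-off between model complexity and sub-optimality:

```python
import numpy as np

def blahut_arimoto(p_env, distortion, beta, n_iters=200, tol=1e-9):
    """Rate-distortion channel from candidate environments to surrogate models.

    p_env      : (N,) posterior over candidate true environments.
    distortion : (N, M) stand-in for the value loss of planning with surrogate
                 j when environment i is the truth (illustrative choice).
    beta       : Lagrange multiplier trading rate against distortion.
    Returns the channel q(j | i), the surrogate marginal, the rate in nats,
    and the expected distortion.
    """
    N, M = distortion.shape
    q_m = np.full(M, 1.0 / M)                      # marginal over surrogates
    for _ in range(n_iters):
        # Optimal channel given the current marginal.
        logits = np.log(q_m)[None, :] - beta * distortion
        q_cond = np.exp(logits - logits.max(axis=1, keepdims=True))
        q_cond /= q_cond.sum(axis=1, keepdims=True)
        # Marginal induced by the channel.
        new_q_m = p_env @ q_cond
        converged = np.max(np.abs(new_q_m - q_m)) < tol
        q_m = new_q_m
        if converged:
            break
    rate = np.sum(p_env[:, None] * q_cond *
                  (np.log(q_cond) - np.log(q_m)[None, :]))
    expected_distortion = np.sum(p_env[:, None] * q_cond * distortion)
    return q_cond, q_m, rate, expected_distortion

# Toy usage: 4 candidate environments compressed onto 3 simpler surrogates.
rng = np.random.default_rng(0)
p_env = np.array([0.4, 0.3, 0.2, 0.1])
d = rng.uniform(0.0, 1.0, size=(4, 3))
channel, marginal, rate, dist = blahut_arimoto(p_env, d, beta=5.0)
print(f"rate = {rate:.3f} nats, expected distortion = {dist:.3f}")
```

    Sweeping beta moves along the rate-distortion curve, loosely mirroring the two forms of the regret bound: fix a sub-optimality target and read off the smallest rate, or fix a capacity budget and read off the best achievable distortion.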

    Temporal abstraction and generalisation in reinforcement learning

    Get PDF
    The ability of agents to generalise, that is, to perform well when presented with previously unseen situations and data, is deeply important to the reliability, autonomy, and functionality of artificial intelligence systems. The generalisation test examines an agent's ability to reason about the world in an abstract manner. In reinforcement learning problem settings, where an agent interacts continually with the environment, multiple notions of abstraction are possible. State-based abstraction allows for generalised behaviour across different observations in the environment that share similar properties. Temporal abstraction, on the other hand, is concerned with generalisation over an agent's own behaviour; it allows an agent to reason in a unified manner over different sequences of actions that may lead to similar outcomes. Data abstraction refers to the fact that agents may need to make use of information gleaned from data drawn under one sampling distribution while being evaluated under a different sampling distribution. This thesis develops algorithmic, theoretical, and empirical results on the questions of state abstraction, temporal abstraction, and finite-data generalisation performance for reinforcement learning algorithms. To focus on data abstraction, we explore an imitation learning setting. We provide a novel algorithm for completely offline imitation learning, as well as an empirical evaluation pipeline for offline reinforcement learning algorithms, encouraging honest and principled data-complexity results and discouraging overfitting of algorithm hyperparameters to the environment on which test scores are reported. To explore state abstraction more deeply, we provide a finite-sample analysis of target network performance, a key architectural element of deep reinforcement learning. By conducting our analysis in the fully nonlinear setting, we help explain the strong performance of nonlinear, neural-network-based function approximation. Finally, we consider the question of temporal abstraction, providing an algorithm for semi-supervised (partially reward-free) learning of skills. This algorithm improves on the variational option discovery framework, solving a key under-specification problem in that domain, by defining skills in terms of a learned, reward-dependent state abstraction.
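
    Since target networks are the architectural element whose finite-sample behaviour this thesis analyses, a minimal sketch of the mechanism itself may help. It uses a linear Q-function purely for brevity (the thesis works in the fully nonlinear setting), with bootstrap targets computed from frozen weights that are hard-synced every few steps; all names, shapes, and constants are illustrative:

```python
import numpy as np

def td_step(w_online, w_target, phi_s, a, r, phi_next, gamma, lr):
    """One semi-gradient Q-learning step with a separate target network.

    w_online, w_target : (n_actions, d) weights of the online and target
                         linear Q-functions (linear purely for brevity).
    phi_s, phi_next    : (d,) feature vectors of the current / next state.
    """
    # Bootstrapped target uses the *frozen* target weights, not the online ones.
    target = r + gamma * np.max(w_target @ phi_next)
    td_error = target - w_online[a] @ phi_s
    w_online = w_online.copy()
    w_online[a] += lr * td_error * phi_s           # update only the taken action
    return w_online

# Periodic hard sync: the mechanism whose finite-sample behaviour is analysed.
d, n_actions, sync_every = 8, 4, 100
rng = np.random.default_rng(1)
w_online = 0.01 * rng.normal(size=(n_actions, d))
w_target = w_online.copy()
for step in range(1000):
    phi_s, phi_next = rng.normal(size=d), rng.normal(size=d)   # stand-in data
    a, r = rng.integers(n_actions), rng.normal()
    w_online = td_step(w_online, w_target, phi_s, a, r, phi_next,
                       gamma=0.99, lr=1e-2)
    if step % sync_every == 0:
        w_target = w_online.copy()                 # hard target-network update
```

    Soft (Polyak) averaging of the target weights is a common alternative to the hard sync shown here.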

    Generalization through the lens of learning dynamics

    Get PDF
    A machine learning (ML) system must learn not only to match the output of a target function on a training set, but also to generalize to novel situations in order to yield accurate predictions at deployment. In most practical applications, the user cannot exhaustively enumerate every possible input to the model; strong generalization performance is therefore crucial to the development of ML systems which are performant and reliable enough to be deployed in the real world. While generalization is well-understood theoretically in a number of hypothesis classes, the impressive generalization performance of deep neural networks has stymied theoreticians. In deep reinforcement learning (RL), our understanding of generalization is further complicated by the conflict between generalization and stability in widely-used RL algorithms. This thesis will provide insight into generalization by studying the learning dynamics of deep neural networks in both supervised and reinforcement learning tasks. We begin with a study of generalization in supervised learning. We propose new PAC-Bayes generalization bounds for invariant models and for models trained with data augmentation. We go on to consider more general forms of inductive bias, connecting a notion of training speed with Bayesian model selection. This connection yields a family of marginal likelihood estimators which require only sampled losses from an iterative gradient descent trajectory, and analogous performance estimators for neural networks. We then turn our attention to reinforcement learning, laying out the learning dynamics framework for the RL setting which will be leveraged throughout the remainder of the thesis. We identify a new phenomenon, which we term capacity loss, whereby neural networks lose their ability to adapt to new target functions over the course of training in deep RL problems, and we propose a novel regularization approach to mitigate it. Follow-up analysis of more subtle forms of capacity loss reveals that deep RL agents are prone to memorization due to the unstructured form of early prediction targets, and highlights a solution in the form of distillation. We conclude by returning to a different notion of invariance from the one with which this thesis began, presenting a novel representation learning method that promotes invariance to spurious factors of variation in the environment.
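
    The training-speed estimators mentioned above can be read through the prequential decomposition log p(D) = sum_i log p(y_i | y_<i): if each example is scored before the model takes a gradient step on it, the running sum of losses acts as a proxy for the (negative) log marginal likelihood, so models that train faster score better. A minimal sketch under that reading, with squared error standing in for a negative log likelihood and all names illustrative:

```python
import numpy as np

def prequential_score(features, targets, lr=0.1):
    """Sum of per-example losses along an online SGD trajectory.

    Each example is scored *before* the model is updated on it, so the total
    plays the role of -log p(D) = -sum_i log p(y_i | y_<i) under the
    prequential reading, with squared error standing in for a negative
    log likelihood. Lower totals correspond to "faster training".
    """
    d = features.shape[1]
    w = np.zeros(d)
    total = 0.0
    for x, y in zip(features, targets):
        pred = w @ x
        total += 0.5 * (pred - y) ** 2             # loss on the unseen example
        w -= lr * (pred - y) * x                   # then one SGD step on it
    return total

# Toy model comparison: informative features vs. pure noise features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0]) + 0.1 * rng.normal(size=200)
print(f"informative: {prequential_score(X, y):.1f}")
print(f"noise only : {prequential_score(rng.normal(size=(200, 5)), y):.1f}")
```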

    Generalization Through the Lens of Learning Dynamics

    Full text link
    A machine learning (ML) system must learn not only to match the output of a target function on a training set, but also to generalize to novel situations in order to yield accurate predictions at deployment. In most practical applications, the user cannot exhaustively enumerate every possible input to the model; strong generalization performance is therefore crucial to the development of ML systems which are performant and reliable enough to be deployed in the real world. While generalization is well-understood theoretically in a number of hypothesis classes, the impressive generalization performance of deep neural networks has stymied theoreticians. In deep reinforcement learning (RL), our understanding of generalization is further complicated by the conflict between generalization and stability in widely-used RL algorithms. This thesis will provide insight into generalization by studying the learning dynamics of deep neural networks in both supervised and reinforcement learning tasks. Comment: PhD Thesis.