
    Probabilistic inference for determining options in reinforcement learning

    Tasks that require many sequential decisions or complex solutions are hard to solve using conventional reinforcement learning algorithms. Based on the semi-Markov decision process (SMDP) setting and the option framework, we propose a model which aims to alleviate these concerns. Instead of learning a single monolithic policy, the agent learns a set of simpler sub-policies as well as the initiation and termination probabilities for each of those sub-policies. While existing option learning algorithms frequently require manual specification of components such as the sub-policies, we present an algorithm which infers all relevant components of the option framework from data. Furthermore, the proposed approach is based on parametric option representations and works well in combination with current policy search methods, which are particularly well suited for continuous real-world tasks. We present results on SMDPs with discrete as well as continuous state-action spaces. The results show that the presented algorithm can combine simple sub-policies to solve complex tasks and can improve learning performance on simpler tasks.
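
    For orientation, the sketch below shows what a parametric option of the kind described above might look like: a simple sub-policy together with learnable initiation and termination probabilities. All names and modeling choices (a linear-Gaussian sub-policy, sigmoid initiation/termination models) are illustrative assumptions, not the paper's actual parameterization.

    import numpy as np

    class ParametricOption:
        """Illustrative option: a linear-Gaussian sub-policy plus sigmoid
        initiation and termination models, all with learnable parameters."""

        def __init__(self, state_dim, action_dim, rng=None):
            self.rng = rng or np.random.default_rng(0)
            self.K = self.rng.normal(scale=0.1, size=(action_dim, state_dim))  # sub-policy gain
            self.w_init = self.rng.normal(scale=0.1, size=state_dim)           # initiation weights
            self.w_term = self.rng.normal(scale=0.1, size=state_dim)           # termination weights

        def initiation_prob(self, s):
            # Probability that this option may be started in state s.
            return 1.0 / (1.0 + np.exp(-self.w_init @ s))

        def termination_prob(self, s):
            # Probability that the option terminates in state s.
            return 1.0 / (1.0 + np.exp(-self.w_term @ s))

        def act(self, s, noise=0.05):
            # Sub-policy: linear-Gaussian action, suitable for policy search.
            return self.K @ s + noise * self.rng.standard_normal(self.K.shape[0])

    def run_option(option, s, step, max_steps=50):
        """Execute a single option until its termination model fires."""
        for _ in range(max_steps):
            s = step(s, option.act(s))
            if option.rng.random() < option.termination_prob(s):
                break
        return s

    # Toy usage with hypothetical stable linear dynamics as the environment step.
    opt = ParametricOption(state_dim=2, action_dim=1)
    final_state = run_option(opt, np.array([1.0, 0.0]),
                             step=lambda s, a: 0.95 * s + np.array([0.0, float(a[0])]))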

    Hierarchical Decomposition of Nonlinear Dynamics and Control for System Identification and Policy Distillation

    The control of nonlinear dynamical systems remains a major challenge for autonomous agents. Current trends in reinforcement learning (RL) focus on complex representations of dynamics and policies, which have yielded impressive results in solving a variety of hard control tasks. However, this new sophistication and these extremely over-parameterized models come at the cost of an overall reduction in our ability to interpret the resulting policies. In this paper, we take inspiration from the control community and apply the principles of hybrid switching systems in order to break down complex dynamics into simpler components. We exploit the rich representational power of probabilistic graphical models and derive an expectation-maximization (EM) algorithm for learning a sequence model that captures the temporal structure of the data and automatically decomposes nonlinear dynamics into stochastic switching linear dynamical systems. Moreover, we show how this framework of switching models enables extracting hierarchies of Markovian and auto-regressive locally linear controllers from nonlinear experts in an imitation learning scenario. Comment: 2nd Annual Conference on Learning for Dynamics and Control
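
    To make the modeling assumption concrete, here is a minimal sketch of the generative model that such an EM procedure would fit: a switching linear dynamical system in which a discrete Markov mode selects per-mode linear dynamics. The function name, mode count, and dynamics matrices below are illustrative, not taken from the paper.

    import numpy as np

    def simulate_slds(A_list, B_list, Pi, T=100, state_dim=2, noise=0.01, seed=0):
        """Sample a trajectory from a switching linear dynamical system: a discrete
        mode z_t follows a Markov chain with transition matrix Pi, and each mode has
        its own linear dynamics x_{t+1} = A[z] x_t + b[z] + noise."""
        rng = np.random.default_rng(seed)
        x = rng.standard_normal(state_dim)
        z = 0
        xs, zs = [x], [z]
        for _ in range(T - 1):
            z = rng.choice(len(A_list), p=Pi[z])                 # switch discrete mode
            x = A_list[z] @ x + B_list[z] + noise * rng.standard_normal(state_dim)
            xs.append(x)
            zs.append(z)
        return np.array(xs), np.array(zs)

    # Two illustrative modes: a slow rotation and a contraction toward the origin.
    theta = 0.1
    A_list = [np.array([[np.cos(theta), -np.sin(theta)],
                        [np.sin(theta),  np.cos(theta)]]),
              0.9 * np.eye(2)]
    B_list = [np.zeros(2), np.zeros(2)]
    Pi = np.array([[0.95, 0.05],
                   [0.05, 0.95]])
    states, modes = simulate_slds(A_list, B_list, Pi)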

    Bayesian Learning Models of Pain: A Call to Action

    Learning is fundamentally about action, enabling the successful navigation of a changing and uncertain environment. The experience of pain is central to this process, indicating the need for a change in action so as to mitigate potential threats to bodily integrity. This review considers the application of Bayesian models of learning in pain, which inherently accommodate uncertainty and action and which, we propose, are essential to understanding learning in both acute and persistent cases of pain.
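
    As a minimal illustration of the kind of Bayesian learning model the review discusses, the sketch below performs a conjugate Gaussian belief update over a latent threat level, so that more uncertain prior beliefs are revised more strongly by new noisy evidence. The variable names and numbers are purely illustrative and are not drawn from the review.

    def bayes_update(prior_mean, prior_var, obs, obs_var):
        """Conjugate Gaussian update: combine a prior belief about threat level with a
        noisy observation; the posterior precision-weights the two, so greater prior
        uncertainty leads to larger belief revisions."""
        k = prior_var / (prior_var + obs_var)      # Kalman-style gain
        post_mean = prior_mean + k * (obs - prior_mean)
        post_var = (1.0 - k) * prior_var
        return post_mean, post_var

    belief = (0.2, 1.0)                            # initial threat estimate and its uncertainty
    for obs in [0.8, 0.7, 0.1]:                    # sequence of noisy pain signals
        belief = bayes_update(*belief, obs=obs, obs_var=0.5)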

    Determining a Role for Ventromedial Prefrontal Cortex in Encoding Action-Based Value Signals During Reward-Related Decision Making

    Considerable evidence has emerged to implicate ventromedial prefrontal cortex in encoding expectations of future reward during value-based decision making. However, the nature of the learned associations upon which such representations depend is much less clear. Here, we aimed to determine whether expected reward representations in this region could be driven by action–outcome associations, rather than being dependent on the associative value assigned to particular discriminative stimuli. Subjects were scanned with functional magnetic resonance imaging while performing 2 variants of a simple reward-related decision task. In one version, subjects made choices between 2 different physical motor responses in the absence of discriminative stimuli, whereas in the other version, subjects chose between 2 different stimuli that were randomly assigned to different responses on a trial-by-trial basis. Using an extension of a reinforcement learning algorithm, we found that activity in ventromedial prefrontal cortex tracked expected future reward during the action-based task as well as during the stimulus-based task, indicating that value representations in this region can be driven by action–outcome associations. These findings suggest that ventromedial prefrontal cortex may play a role in encoding the value of chosen actions, irrespective of whether those actions denote physical motor responses or more abstract decision options.
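
    The abstract does not spell out the reinforcement learning model, but a typical model-based fMRI analysis of a two-choice task uses a simple action-value learner of the following kind, whose trial-by-trial expected values are then regressed against the BOLD signal. The parameter values and function names below are illustrative assumptions, not the authors' fitted model.

    import numpy as np

    def simulate_action_values(rewards, alpha=0.2, beta=3.0, seed=0):
        """Minimal action-value learner for a two-choice task: softmax choice over
        Q-values plus a prediction-error update; the recorded expected values are the
        kind of regressor used in model-based fMRI analyses."""
        rng = np.random.default_rng(seed)
        q = np.zeros(2)                      # expected reward for each action
        expected, chosen = [], []
        for p_reward in rewards:             # per-trial reward probabilities for the two actions
            probs = np.exp(beta * q) / np.exp(beta * q).sum()
            a = rng.choice(2, p=probs)
            r = float(rng.random() < p_reward[a])
            expected.append(q[a])            # model-derived expected value of the chosen action
            q[a] += alpha * (r - q[a])       # prediction-error update
            chosen.append(a)
        return np.array(expected), np.array(chosen)

    trial_probs = [(0.7, 0.3)] * 30 + [(0.3, 0.7)] * 30   # hypothetical reversal halfway through
    values, choices = simulate_action_values(trial_probs)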

    DAC: The Double Actor-Critic Architecture for Learning Options

    We reformulate the option framework as two parallel augmented MDPs. Under this novel formulation, all policy optimization algorithms can be used off the shelf to learn intra-option policies, option termination conditions, and a master policy over options. We apply an actor-critic algorithm on each augmented MDP, yielding the Double Actor-Critic (DAC) architecture. Furthermore, we show that, when state-value functions are used as critics, one critic can be expressed in terms of the other, and hence only one critic is necessary. We conduct an empirical study on challenging robot simulation tasks. In a transfer learning setting, DAC outperforms both its hierarchy-free counterpart and previous gradient-based option learning algorithms. Comment: NeurIPS 2019
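
    As a structural sketch only (not the DAC algorithm itself), the code below illustrates the two levels the abstract describes: a master policy that selects options, and per-option intra-option policies with termination conditions; the actor-critic updates on the two augmented MDPs are omitted. All class and parameter names are hypothetical.

    import numpy as np

    class TwoLevelOptionAgent:
        """Schematic two-level agent: a master policy over options plus per-option
        intra-option policies and termination models (learning updates omitted)."""

        def __init__(self, n_options, n_actions, state_dim, seed=0):
            self.rng = np.random.default_rng(seed)
            self.W_master = self.rng.normal(scale=0.1, size=(n_options, state_dim))
            self.W_intra = self.rng.normal(scale=0.1, size=(n_options, n_actions, state_dim))
            self.w_term = self.rng.normal(scale=0.1, size=(n_options, state_dim))
            self.option = None

        @staticmethod
        def _softmax(logits):
            z = np.exp(logits - logits.max())
            return z / z.sum()

        def _terminate(self, s):
            return 1.0 / (1.0 + np.exp(-self.w_term[self.option] @ s))

        def step(self, s):
            # High level: pick a new option when none is active or the current one terminates.
            if self.option is None or self.rng.random() < self._terminate(s):
                self.option = self.rng.choice(len(self.W_master),
                                              p=self._softmax(self.W_master @ s))
            # Low level: the active option's intra-option policy picks a primitive action.
            action = self.rng.choice(self.W_intra.shape[1],
                                     p=self._softmax(self.W_intra[self.option] @ s))
            return self.option, action

    agent = TwoLevelOptionAgent(n_options=3, n_actions=4, state_dim=5)
    option, action = agent.step(np.random.default_rng(1).standard_normal(5))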