Probabilistic inference for determining options in reinforcement learning
Tasks that require many sequential decisions or complex solutions are hard to solve using conventional reinforcement learning algorithms. Based on the semi-Markov decision process (SMDP) setting and the option framework, we propose a model which aims to alleviate these concerns. Instead of learning a single monolithic policy, the agent learns a set of simpler sub-policies as well as the initiation and termination probabilities for each of those sub-policies. While existing option learning algorithms frequently require manual specification of components such as the sub-policies, we present an algorithm which infers all relevant components of the option framework from data. Furthermore, the proposed approach is based on parametric option representations and works well in combination with current policy search methods, which are particularly well suited for continuous real-world tasks. We present results on SMDPs with discrete as well as continuous state-action spaces. The results show that the presented algorithm can combine simple sub-policies to solve complex tasks and can improve learning performance on simpler tasks.
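The option ingredients the abstract describes (a sub-policy together with per-state initiation and termination probabilities) can be sketched as a minimal data structure with a call-and-return execution loop. All names below are illustrative, not taken from the paper:

```python
import random

class Option:
    """Minimal option: a sub-policy plus initiation and termination
    probability functions. Names are illustrative, not from the paper."""
    def __init__(self, policy, p_init, p_term):
        self.policy = policy    # maps state -> action
        self.p_init = p_init    # maps state -> probability the option may start
        self.p_term = p_term    # maps state -> probability the option stops

def run_option(option, state, step, max_steps=100):
    """Execute one option until its termination condition fires.
    From the SMDP view, this whole loop is a single temporally
    extended decision of variable duration."""
    t = 0
    while t < max_steps:
        action = option.policy(state)
        state = step(state, action)   # environment transition
        t += 1
        if random.random() < option.p_term(state):
            break
    return state, t
```

Learning, in this framing, means fitting the parameters of `policy`, `p_init`, and `p_term` for every option from data rather than specifying them by hand.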
Hierarchical Decomposition of Nonlinear Dynamics and Control for System Identification and Policy Distillation
The control of nonlinear dynamical systems remains a major challenge for autonomous agents. Current trends in reinforcement learning (RL) focus on complex representations of dynamics and policies, which have yielded impressive results in solving a variety of hard control tasks. However, this new sophistication and these extremely over-parameterized models have come at the cost of an overall reduction in our ability to interpret the resulting policies. In this paper, we take inspiration from the control community and apply the principles of hybrid switching systems in order to break down complex dynamics into simpler components. We exploit the rich representational power of probabilistic graphical models and derive an expectation-maximization (EM) algorithm for learning a sequence model to capture the temporal structure of the data and automatically decompose nonlinear dynamics into stochastic switching linear dynamical systems. Moreover, we show how this framework of switching models enables extracting hierarchies of Markovian and auto-regressive locally linear controllers from nonlinear experts in an imitation learning scenario. Comment: 2nd Annual Conference on Learning for Dynamics and Control
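The decomposition rests on switching linear dynamical systems: a discrete Markov mode selects which local linear dynamics matrix drives the continuous state at each step. A minimal generative sketch (noiseless, with illustrative names; the paper's EM inference is not shown):

```python
import numpy as np

def slds_rollout(A_list, pi, trans, x0, T, rng):
    """Sample a trajectory from a switching linear dynamical system:
    the discrete mode z_t follows a Markov chain with initial
    distribution pi and transition matrix trans, and within mode k
    the continuous state evolves as x_{t+1} = A_k @ x_t (process
    noise omitted for brevity)."""
    z = rng.choice(len(pi), p=pi)
    x = np.array(x0, dtype=float)
    xs, zs = [x], []
    for _ in range(T):
        zs.append(int(z))
        x = A_list[z] @ x          # locally linear dynamics of mode z
        xs.append(x)
        z = rng.choice(len(trans[z]), p=trans[z])
    return np.array(xs), zs
```

Fitting `A_list`, `pi`, and `trans` from observed trajectories, with the mode sequence treated as a latent variable, is what the EM algorithm in the paper automates.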
Bayesian Learning Models of Pain: A Call to Action
Learning is fundamentally about action, enabling the successful navigation of a changing and uncertain environment. The experience of pain is central to this process, indicating the need for a change in action so as to mitigate potential threat to bodily integrity. This review considers the application of Bayesian models of learning in pain that inherently accommodate uncertainty and action, which, we shall propose, are essential in understanding learning in both acute and persistent cases of pain.
Determining a Role for Ventromedial Prefrontal Cortex in Encoding Action-Based Value Signals During Reward-Related Decision Making
Considerable evidence has emerged to implicate ventromedial prefrontal cortex in encoding expectations of future reward during value-based decision making. However, the nature of the learned associations upon which such representations depend is much less clear. Here, we aimed to determine whether expected reward representations in this region could be driven by action–outcome associations, rather than being dependent on the associative value assigned to particular discriminative stimuli. Subjects were scanned with functional magnetic resonance imaging while performing 2 variants of a simple reward-related decision task. In one version, subjects made choices between 2 different physical motor responses in the absence of discriminative stimuli, whereas in the other version, subjects chose between 2 different stimuli that were randomly assigned to different responses on a trial-by-trial basis. Using an extension of a reinforcement learning algorithm, we found activity in ventromedial prefrontal cortex tracked expected future reward during the action-based task as well as during the stimulus-based task, indicating that value representations in this region can be driven by action–outcome associations. These findings suggest that ventromedial prefrontal cortex may play a role in encoding the value of chosen actions irrespective of whether those actions denote physical motor responses or more abstract decision options.
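The kind of model-free learner typically used to generate trial-by-trial expected-reward regressors in such studies is a prediction-error update rule. A generic sketch (not the paper's specific extension; names are illustrative):

```python
def update_value(values, action, reward, alpha=0.1):
    """Prediction-error update of an action's expected reward:
        V(a) <- V(a) + alpha * (reward - V(a))
    The expected value V(a) before each choice is the quantity
    regressed against the fMRI signal; delta is the prediction error."""
    delta = reward - values[action]
    values[action] = values[action] + alpha * delta
    return values, delta
```

The same rule applies whether `action` indexes a physical motor response or an abstract stimulus choice, which is what lets the two task variants be modeled in a common framework.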
DAC: The Double Actor-Critic Architecture for Learning Options
We reformulate the option framework as two parallel augmented MDPs. Under this novel formulation, all policy optimization algorithms can be used off the shelf to learn intra-option policies, option termination conditions, and a master policy over options. We apply an actor-critic algorithm on each augmented MDP, yielding the Double Actor-Critic (DAC) architecture. Furthermore, we show that, when state-value functions are used as critics, one critic can be expressed in terms of the other, and hence only one critic is necessary. We conduct an empirical study on challenging robot simulation tasks. In a transfer learning setting, DAC outperforms both its hierarchy-free counterpart and previous gradient-based option learning algorithms. Comment: NeurIPS 2019
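The "off the shelf" building block applied on each augmented MDP is an ordinary critic update; a one-step TD(0) sketch for a tabular state-value critic (illustrative, not the paper's exact implementation):

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """One-step TD(0) critic update on a tabular state-value function:
        V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))
    In a DAC-style setup, "states" of each augmented MDP carry the
    option identity alongside the environment state, so the same
    update trains critics for both the master and intra-option levels."""
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha * delta
    return delta
```

That only one such critic is ultimately needed (the other being expressible in terms of it) is the paper's result, not something this sketch demonstrates.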