315 research outputs found

Model-based and model-free learning strategies for wet clutch control

Author: De Keyser Robain
Depraetere Bruno
Dutta Abhishek
Ionescu Clara-Mihaela
Nowe Ann
Pinte Gregory
Swevers Jan
Van Vaerenbergh Kevin
Wyns Bart
Zhong Yu
Publication venue: Elsevier BV
Publication date: 01/01/2014

Task complexity interacts with state-space uncertainty in the arbitration between model-based and model-free learning

Author: Kim Dongjae
Lee Sang Wan
O'Doherty John P.
Park Geon Yeong
Publication venue: Nature Publishing Group
Publication date: 16/12/2019

It has previously been shown that the relative reliability of model-based and model-free reinforcement-learning (RL) systems plays a role in the allocation of behavioral control between them. However, the role of task complexity in the arbitration between these two strategies remains largely unknown. Here, using a combination of novel task design, computational modelling, and model-based fMRI analysis, we examined the role of task complexity alongside state-space uncertainty in the arbitration process. Participants tended to increase model-based RL control in response to increasing task complexity. However, they resorted to model-free RL when both uncertainty and task complexity were high, suggesting that these two variables interact during the arbitration process. Computational fMRI revealed that task complexity interacts with neural representations of the reliability of the two systems in the inferior prefrontal cortex.

Caltech Authors

Policy Optimization with Model-based Explorations

Author: Cai Qingpeng
Da Qing
He Hualin
He Qing
Pan Chun-Xiang
Pan Feiyang
Tang Pingzhong
Zeng An-Xiang
Publication date: 18/11/2018

Model-free reinforcement learning methods such as the Proximal Policy Optimization algorithm (PPO) have been successfully applied to complex decision-making problems such as Atari games. However, these methods suffer from high variance and high sample complexity. On the other hand, model-based reinforcement learning methods that learn the transition dynamics are more sample efficient, but they often suffer from bias in the transition estimation. How to make use of both model-based and model-free learning is a central problem in reinforcement learning. In this paper, we present a new technique to address the trade-off between exploration and exploitation, which regards the difference between model-free and model-based estimations as a measure of exploration value. We apply this new technique to the PPO algorithm and arrive at a new policy optimization method, named Policy Optimization with Model-based Explorations (POME). POME uses two components to predict the actions' target values: a model-free one estimated by Monte-Carlo sampling and a model-based one which learns a transition model and predicts the value of the next state. POME adds the error between these two target estimations as an additional exploration value for each state-action pair, i.e., it encourages the algorithm to explore the states with larger target errors, which are hard to estimate. We compare POME with PPO on Atari 2600 games, and the results show that POME outperforms PPO on 33 of 49 games. Comment: Accepted at AAAI-1

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
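
The exploration idea in the POME abstract above can be illustrated with a loose sketch. It is not the paper's implementation: the abstract does not say how the bonus is scaled or normalized, so the weight `alpha`, the input `predicted_next_values` (standing in for the output of a learned transition and value model), and all function names are assumptions made for illustration.

```python
def discounted_returns(rewards, gamma):
    """Model-free target: Monte-Carlo discounted return from each step."""
    returns = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def model_based_targets(rewards, predicted_next_values, gamma):
    """Model-based target: reward plus discounted value of the predicted next state.

    predicted_next_values is assumed to come from a learned model (hypothetical input).
    """
    return [r + gamma * v for r, v in zip(rewards, predicted_next_values)]


def targets_with_exploration(rewards, predicted_next_values, gamma=0.99, alpha=0.1):
    """Add the disagreement between the two targets as an exploration bonus.

    alpha is a hypothetical bonus weight, not a value taken from the paper.
    """
    mf = discounted_returns(rewards, gamma)
    mb = model_based_targets(rewards, predicted_next_values, gamma)
    bonus = [abs(a - b) for a, b in zip(mf, mb)]
    return [q + alpha * b for q, b in zip(mf, bonus)]


if __name__ == "__main__":
    print(targets_with_exploration([1.0, 0.0, 2.0], [0.0, 4.0, 0.0], gamma=0.5))
```

Steps where the two estimates disagree most receive the largest bonus, which is the sense in which the abstract describes exploring states that are hard to estimate.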

On the Utility of Model Learning in HRI

Author: Choudhury Rohan
Dragan Anca D.
Hadfield-Menell Dylan
Swamy Gokul
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/03/2019

Fundamental to robotics is the debate between model-based and model-free learning: should the robot build an explicit model of the world, or learn a policy directly? In the context of HRI, part of the world to be modeled is the human. One option is for the robot to treat the human as a black box and learn a policy for how they act directly. But it can also model the human as an agent, and rely on a “theory of mind” to guide or bias the learning (grey box). We contribute a characterization of the performance of these methods under the optimistic case of having an ideal theory of mind, as well as under different scenarios in which the assumptions behind the robot's theory of mind for the human are wrong, as they inevitably will be in practice. We find that there is a significant sample complexity advantage to theory of mind methods and that they are more robust to covariate shift, but that when enough interaction data is available, black box approaches eventually dominate.

Crossref

Caltech Authors

Model-free and model-based reward prediction errors in EEG

Author: Goslin Jeremy
Hardwick Ben
Sambrook Tom D
Wills Andy J
Publication venue: Elsevier BV
Publication date: 01/09/2018

Learning theorists posit two reinforcement learning systems: model-free and model-based. Model-based learning incorporates knowledge about structure and contingencies in the world to assign an expected value to candidate actions. Model-free learning is ignorant of the world’s structure; instead, actions hold a value based on prior reinforcement, with this value updated by expectancy violation in the form of a reward prediction error. Because they use such different learning mechanisms, it has been previously assumed that model-based and model-free learning are computationally dissociated in the brain. However, recent fMRI evidence suggests that the brain may compute reward prediction errors to both model-free and model-based estimates of value, signalling the possibility that these systems interact. Because of its poor temporal resolution, fMRI risks confounding reward prediction errors with other feedback-related neural activity. In the present study, EEG was used to show the presence of both model-based and model-free reward prediction errors and their place in a temporal sequence of events including state prediction errors and action value updates. This demonstration of model-based prediction errors questions a long-held assumption that model-free and model-based learning are dissociated in the brain.

Crossref

Plymouth Electronic Archive and Research Library

University of East Anglia digital repository
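
The model-free update described in the abstract above, where an action's value is adjusted by a reward prediction error, can be sketched with a simple delta rule. This is a textbook-style illustration, not the analysis used in the study; the function name, the dictionary of action values, and the `learning_rate` parameter are assumptions.

```python
def update_action_value(values, action, reward, learning_rate=0.1):
    """Delta-rule update: move the stored value toward the received reward.

    Returns the reward prediction error (received reward minus expected value).
    """
    prediction_error = reward - values[action]
    values[action] += learning_rate * prediction_error
    return prediction_error


if __name__ == "__main__":
    values = {"left": 0.0, "right": 0.0}  # hypothetical two-action task
    for _ in range(3):
        error = update_action_value(values, "left", reward=1.0, learning_rate=0.5)
        print(error, values["left"])
```

The prediction error shrinks as the stored value approaches the reward, so a fully expected outcome produces no further update.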