Dopamine reward prediction errors reflect hidden state inference across time

Starkweather, Clara Kwon; Babayan, Benedicte M.; Uchida, Naoshige; Gershman, Samuel J.

oai:dash.harvard.edu:1/34492073

Authors: Clara Kwon Starkweather, Benedicte M. Babayan, Naoshige Uchida, Samuel J. Gershman
Publication date: 5 December 2017
Publisher: Springer Science and Business Media LLC

Abstract

Midbrain dopamine neurons signal reward prediction error (RPE), or actual minus expected reward. The temporal difference (TD) learning model has been a cornerstone in understanding how dopamine RPEs could drive associative learning. Classically, TD learning imparts value to features that serially track elapsed time relative to observable stimuli. In the real world, however, sensory stimuli provide ambiguous information about the hidden state of the environment, leading to the proposal that TD learning might instead compute a value signal based on an inferred distribution of hidden states (a ‘belief state’). In this work, we asked whether dopaminergic signaling supports a TD learning framework that operates over hidden states. We found that dopamine signaling exhibited a striking difference between two tasks that differed only with respect to whether reward was delivered deterministically. Our results favor an associative learning rule that combines cached values with hidden state inference.
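
As a rough illustration of the quantities named in the abstract, the sketch below computes a TD(0) reward prediction error when value is a linear function of a belief state (a probability distribution over hidden states). It is a generic textbook-style formulation, not the model fitted in the paper; the function names, the two-state example, and the parameters `alpha` and `gamma` are all assumptions made for illustration.

```python
# Hypothetical illustration: TD(0) with a linear value function over a belief state.
# All names and numbers are assumptions for this sketch, not taken from the paper.

def value(weights, belief):
    """Value of a belief state: cached per-state values weighted by the belief."""
    return sum(w * b for w, b in zip(weights, belief))

def td_update(weights, belief, next_belief, reward, alpha=0.1, gamma=0.9):
    """Run one TD(0) step over belief states and return (rpe, new_weights)."""
    # RPE: actual reward plus discounted next value, minus the current expectation.
    rpe = reward + gamma * value(weights, next_belief) - value(weights, belief)
    # Credit each hidden state in proportion to how strongly it was believed.
    new_weights = [w + alpha * rpe * b for w, b in zip(weights, belief)]
    return rpe, new_weights

weights = [0.0, 0.0]        # cached values for two hypothetical hidden states
belief = [0.5, 0.5]         # ambiguous cue: either state is equally likely
next_belief = [0.0, 1.0]    # after the outcome, the second state is certain
rpe, weights = td_update(weights, belief, next_belief, reward=1.0)
# With all cached values starting at zero, the first RPE equals the reward: 1.0
```

The only difference from TD learning over observable states is that the belief vector stands in for a one-hot state indicator, so a single RPE updates the cached value of every hidden state that had non-zero probability.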

Journal Article

Last updated on 17/04/2018

This paper was published in Harvard University - DASH.
