49 research outputs found

Temporal-Difference Reinforcement Learning with Distributed Representations

Author: A Johnson
A Johnson
A Kacelnik
A. David Redish
AD Redish
AD Redish
AD Redish
AG Barto
AG Sanfey
AL Odum
AM Graybiel
AV Beylin
B Reynolds
CD Fiorillo
CD Fiorillo
CD Fiorillo
CR Gallistel
D Read
D Self
DC Rubin
DC Rubin
DI Laibson
DW Stephens
E Pastalkova
EA Ludvig
EA Ludvig
F Wörgötter
G Ainslie
G Ainslie
G Ainslie
G Thibaudeau
GD Stuber
GE Alexander
GE Alexander
GJ Madden
HM Bayer
HM Bayer
I Pavlov
J Gibbon
J Mazur
J Mirenowicz
J Mirenowicz
JC Jackson
JE Mazur
JER Staddon
JF Cheer
JJ Day
JP O'Doherty
JP O'Doherty
JR Hollerman
JR Norris
K Doya
K Doya
K Doya
K Doya
K Samejima
K Samejima
M Bertin
M Kawato
MF Roitman
N Schweighofer
N Schweighofer
N Schweighofer
ND Daw
ND Daw
ND Daw
ND Daw
NJ Mackintosh
NM Petry
Olaf Sporns
P Brémaud
P Dayan
P Dayan
PD Sozou
PEM Phillips
PL Strick
PR Montague
PR Solomon
PS Kaplan
R Bellman
RA Rescorla
RE Suri
RE Suri
RE Vuchinich
RJ Herrnstein
RM Wightman
RN Cardinal
RS Sutton
RS Sutton
RS Zemel
S Kakade
SC Tanaka
SC Tanaka
SH Mitchell
SJ Badtke
SM Alessi
SM McClure
SN Haber
T Das
T Kalenscher
T Ljungberg
TJ Shors
W Schultz
W Schultz
W Schultz
W Schultz
W Schultz
W Schultz
WB Levy
WB Levy
WX Pan
Y Niv
Zeb Kurth-Nelson
Publication venue: Public Library of Science
Publication date: 01/01/2009

Temporal-difference (TD) algorithms have been proposed as models of reinforcement learning (RL). We examine two issues of distributed representation in these TD algorithms: distributed representations of belief and distributed discounting factors. Distributed representation of belief allows the believed state of the world to distribute across sets of equivalent states. Distributed exponential discounting factors produce hyperbolic discounting in the behavior of the agent itself. We examine these issues in the context of a TD RL model in which state-belief is distributed over a set of exponentially-discounting “micro-Agents”, each of which has a separate discounting factor (γ). Each µAgent maintains an independent hypothesis about the state of the world, and a separate value-estimate of taking actions within that hypothesized state. The overall agent thus instantiates a flexible representation of an evolving world-state. As with other TD models, the value-error (δ) signal within the model matches dopamine signals recorded from animals in standard conditioning reward-paradigms. The distributed representation of belief provides an explanation for the decrease in dopamine at the conditioned stimulus seen in overtrained animals, for the differences between trace and delay conditioning, and for transient bursts of dopamine seen at movement initiation. Because each µAgent also includes its own exponential discounting factor, the overall agent shows hyperbolic discounting, consistent with behavioral experiments
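The statement that a population of exponential discounters behaves hyperbolically can be checked numerically. The sketch below assumes, purely for illustration, that the discount factors γ are spread uniformly over (0, 1); the abstract does not say which distribution the authors used, and the names `mean_discount` and `hyperbolic` are hypothetical. Under that assumption the average of γ^d across agents equals the integral of γ^d over (0, 1), which is 1/(1 + d), a hyperbolic function of the delay d.

```python
def mean_discount(delay, n_agents=100_000):
    """Average the exponential discount gamma**delay over many micro-agents.

    Assumption (not from the abstract): gammas are spread uniformly over
    (0, 1), taken at the midpoints of n_agents equal-width bins.
    """
    total = 0.0
    for i in range(n_agents):
        gamma = (i + 0.5) / n_agents  # midpoint of the i-th bin
        total += gamma ** delay
    return total / n_agents


def hyperbolic(delay, k=1.0):
    """Hyperbolic discount function 1 / (1 + k * delay)."""
    return 1.0 / (1.0 + k * delay)


if __name__ == "__main__":
    # Print the population average next to the hyperbolic curve.
    for d in [0, 1, 2, 5, 10]:
        print(d, round(mean_discount(d), 4), round(hyperbolic(d), 4))
```

Each individual agent still discounts exponentially; only the population average follows the hyperbolic curve. A different spread of γ values would give a different aggregate curve.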

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

UCL Discovery

Nonspecific synaptic plasticity improves the recognition of sparse patterns degraded by local noise

Author: A Badura
A Philippides
A Radulescu
A Spanne
AL Person
B Babadi
C Clopath
CD Harvey
CD Harvey
CD Wilms
D Marr
D Willshaw
DJ Linden
DJ Willshaw
DJ Willshaw
E De Schutter
E De Schutter
EA Rancz
F Engert
F Johansson
F Johansson
F Sultan
FP Chabrol
G Billings
G Neves
H Künzle
H Ogasawara
J Mapelli
JH Shin
JP Nadal
JP Nadal
JT Walter
K Shibuki
K Shibuki
M Casado
M Chistiakova
M Iino
M Ito
M Ito
M Palkovits
M Schonewille
ML Hines
N Brunel
N Gutierrez-Castellanos
N Schweighofer
NA Hartell
P Chadderton
P Dayan
P Dean
P Husbands
RM Napper
S Ganguli
S Namiki
SR Ott
SS Wang
T Reynolds
T Tyrrell
V Lev-Ram
V Lev-Ram
V Lev-Ram
V Steuber
Publication venue: Springer Science and Business Media LLC
Publication date: 20/04/2017

Safaryan, K. et al. Nonspecific synaptic plasticity improves the recognition of sparse patterns degraded by local noise. Sci. Rep. 7, 46550; doi: 10.1038/srep46550 (2017). This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ © The Author(s) 2017.

Many forms of synaptic plasticity require the local production of volatile or rapidly diffusing substances such as nitric oxide. The nonspecific plasticity these neuromodulators may induce at neighboring non-active synapses is thought to be detrimental for the specificity of memory storage. We show here that memory retrieval may benefit from this non-specific plasticity when the applied sparse binary input patterns are degraded by local noise. Simulations of a biophysically realistic model of a cerebellar Purkinje cell in a pattern recognition task show that, in the absence of noise, leakage of plasticity to adjacent synapses degrades the recognition of sparse static patterns. However, above a local noise level of 20%, the model with nonspecific plasticity outperforms the standard, specific model. The gain in performance is greatest when the spatial distribution of noise in the input matches the range of diffusion-induced plasticity. Hence non-specific plasticity may offer a benefit in noisy environments or when the pressure to generalize is strong. Peer reviewed

Crossref

University of Hertfordshire Research Archive

Modelling negative feedback networks for activating transcription factor 3 predicts a dominant role for miRNAs in immediate early gene regulation

Author: A Baccarini
A Clerk
A Clerk
A Giraldo
AE Pasquinelli
AK Marshall
Angela Clerk
B Schweighofer
B Uzonyi
CD Wolfgang
CW Ni
D Lu
E Amirak
G Liang
GJ Babu
GP Sapkota
H Zhou
JG Harrison
JR Woodgett
JT Mendell
K Tamura
Marcus J. Tindall
MP Gantier
MR Thompson
PH Sugden
PH Sugden
PV Nazarov
R Khanin
RA Kennedy
RD Everett
Satoru Miyano
SB McMahon
SI Mayer
SP Davies
SQ Wu
T Hai
T Hai
T Nagashima
TE Cullingford
Y Cai
Y Okamoto
Publication venue: Public Library of Science (PLoS)
Publication date: 01/05/2014

Activating transcription factor 3 (Atf3) is rapidly and transiently upregulated in numerous systems, and is associated with various disease states. Atf3 is required for negative feedback regulation of other genes, but is itself subject to negative feedback regulation possibly by autorepression. In cardiomyocytes, Atf3 and Egr1 mRNAs are upregulated via ERK1/2 signalling and Atf3 suppresses Egr1 expression. We previously developed a mathematical model for the Atf3-Egr1 system. Here, we adjusted and extended the model to explore mechanisms of Atf3 feedback regulation. Introduction of an autorepressive loop for Atf3 tuned down its expression and inhibition of Egr1 was lost, demonstrating that negative feedback regulation of Atf3 by Atf3 itself is implausible in this context. Experimentally, signals downstream from ERK1/2 suppress Atf3 expression. Mathematical modelling indicated that this cannot occur by phosphorylation of pre-existing inhibitory transcriptional regulators because the time delay is too short. De novo synthesis of an inhibitory transcription factor (ITF) with a high affinity for the Atf3 promoter could suppress Atf3 expression, but (as with the Atf3 autorepression loop) inhibition of Egr1 was lost. Developing the model to include newly-synthesised miRNAs very efficiently terminated Atf3 protein expression and, with a 4-fold increase in the rate of degradation of mRNA from the mRNA/miRNA complex, profiles for Atf3 mRNA, Atf3 protein and Egr1 mRNA approximated to the experimental data. Combining the ITF model with that of the miRNA did not improve the profiles suggesting that miRNAs are likely to play a dominant role in switching off Atf3 expression post-induction

Central Archive at the University of Reading

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Effects of a robot-assisted training of grasp and pronation/supination in chronic stroke: a pilot study

Author: AM Krylow
AR Fugl-Meyer
BT Volpe
BT Volpe
C Butefisch
C Collin
C Winstein
CD Takahashi
Chee Leong Teo
Christopher WK Kuah
DJ Reinkensmeyer
DJ Wilson
E Burdet
Etienne Burdet
G Kwakkel
HI Krebs
Hong Yun
J Mehrholz
J van Der Lee
JH Carr
JR Carey
Karen SG Chua
KO Grice
Ludovic Dovat
N Hogan
N Schweighofer
O Lambercy
O Lambercy
O Lambercy
Olivier Lambercy
P Vasquez
PS Lum
Roger Gassert
RW Bohannon
RW Teasell
S Barreca
S Hesse
SE Fasoli
SE Fasoli
Seng Kwee Wee
T Nef
Theodore E Milner
V Mathiowetz
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Rehabilitation of hand function is challenging, and only a few studies have investigated robot-assisted rehabilitation focusing on distal joints of the upper limb. This paper investigates the feasibility of using the HapticKnob, a table-top end-effector device, for robot-assisted rehabilitation of grasping and forearm pronation/supination, two functions that are important for activities of daily living involving the hand and that are often impaired in chronic stroke patients. It evaluates the effectiveness of this device for improving hand function and the transfer of improvement to arm function. Methods: A single group of fifteen chronic stroke patients with impaired arm and hand functions (Fugl-Meyer motor assessment scale (FM) 10-45/66) participated in a 6-week, 3-hours/week rehabilitation program with the HapticKnob. Outcome measures consisted primarily of the FM and Motricity Index (MI) and their respective subsections related to distal and proximal arm function, and were assessed at the beginning of treatment, at the end of treatment and at a 6-week follow-up. Results: Thirteen subjects successfully completed robot-assisted therapy, with significantly improved hand and arm motor functions, demonstrated by an average increase of 3.00 points on the FM and 4.55 on the MI at the completion of the therapy (4.85 FM and 6.84 MI six weeks post-therapy). Improvements were observed both in distal and proximal components of the clinical scales at the completion of the study (2.00 FM wrist/hand, 2.55 FM shoulder/elbow, 2.23 MI hand and 4.23 MI shoulder/elbow). In addition, improvements in hand function were observed, as measured by the Motor Assessment Scale, grip force, and a decrease in arm muscle spasticity. These results were confirmed by motion data collected by the robot. Conclusions: The results of this study show the feasibility of this robot-assisted therapy with patients presenting a large range of impairment levels. A significant homogeneous improvement in both hand and arm function was observed, which was maintained 6 weeks after the end of the therapy.

Repository for Publications and Research Data

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

ScholarBank@NUS

Interaction between Purkinje Cells and Inhibitory Interneurons May Create Adjustable Output Waveforms to Generate Timed Cerebellar Output

Author: A Gruart
A Rancillac
AR Gibson
AR Gibson
B Rhodes
BD Armstrong
BE McKay
BG Schreurs
BG Schreurs
BG Schreurs
BG Schreurs
C Batini
C Chen
C de'Sperati
C Hoang
C Sekirnjak
CD Aizenman
CH Yeo
CI De Zeeuw
D Aksenov
D Bullock
D Heck
D Marr
DA McCormick
DA McCormick
DG Lavond
DG Lavond
DJ Krupa
DJ Krupa
DL Ringach
DS Woodruff-Pak
DZ Wetmore
EA Finch
ES Boyden
F Jow
F Santamaria
FA Miles
FK Hoehler
G Chen
G Hesslow
GT Kenyon
H Jorntell
J Albus
J Hamori
J Steinmetz
JA Kleim
JC Callaway
JC Fiala
JE Desmond
JE Slemmer
JE Steinmetz
JF Medina
JM Billard
JR Huguenard
JR Pugh
JS Choi
JT Green
K Detweiler
K Takehara
Karl J. Friston
KM Horn
L Kreiner
Lance M. Optican
LL Sears
M Casado
M Coesmans
M Hausser
M Ito
M Stopfer
MD Mauk
MD Mauk
MD Womack
ME Scheibel
MH Karakossian
MJ Hardiman
N Ramnani
N Schneiderman
N Schweighofer
N Schweighofer
N Schweighofer
NL Cerminara
P Chadderton
P Svensson
P Svensson
P Svensson
PG Shinkman
R Feil
R Gellman
R Llinas
R Llinas
RD Traub
RF Thompson
RF Thompson
S Bao
S Hong
S Kotani
S Kotani
Simon Hong
SJ Kim
SP Perrett
T Doi
T Ohyama
TJ Gould
V Lev-Ram
V Steuber
W Mittmann
Y Manor
Y Shinoda
Publication venue: Public Library of Science
Publication date: 01/01/2008

We develop a new model that explains how the cerebellum may generate the timing in classical delay eyeblink conditioning. Recent studies show that both Purkinje cells (PCs) and inhibitory interneurons (INs) have parallel signal processing streams with two time scales: an AMPA receptor-mediated fast process and a metabotropic glutamate receptor (mGluR)-mediated slow process. Moreover, one consistent finding is an increased excitability of PC dendrites (in Larsell's lobule HVI) in animals when they acquire the classical delay eyeblink conditioning naturally, in contrast to in vitro studies, where learning involves long-term depression (LTD). Our model proposes that the delayed response comes from the slow dynamics of mGluR-mediated IP3 activation, and the ensuing calcium concentration change, and not from LTP/LTD. The conditioned stimulus (tone), arriving on the parallel fibers, triggers this slow activation in INs and PC spines. These excitatory (from PC spines) and inhibitory (from INs) signals then interact at the PC dendrites to generate variable waveforms of PC activation. When the unconditioned stimulus (puff), arriving on the climbing fibers, is coupled frequently with this slow activation, the waveform is amplified (due to an increased excitability) and leads to a timed pause in the PC population. The disinhibition of deep cerebellar nuclei by this timed pause causes the delayed conditioned response. This suggested PC-IN interaction emphasizes a richer role of the INs in learning and also conforms to the recent evidence that mGluR in the cerebellar cortex may participate in slow motor execution. We show that the suggested mechanism can endow the cerebellar cortex with the versatility to learn almost any temporal pattern, in addition to those that arise in classical conditioning.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Structural Analysis of a Peptide Fragment of Transmembrane Transporter Protein Bilitranslocase

The combined use of genomic and post-genomic approaches is rapidly altering the number of identified human influx carriers. The transmembrane protein bilitranslocase (TCDB 2.A.65) has long attracted attention because of its function as an organic anion carrier. It has also been identified as a potential membrane transporter for cellular uptake of several drugs, and because of its implication in drug uptake it is extremely important to advance the knowledge about its structure. However, at present, only the primary structure of bilitranslocase is known. In our work, transmembrane subunits of bilitranslocase were predicted by a previously developed chemometrics model, and the stability of these polypeptide chains was studied by molecular dynamics (MD) simulation. Furthermore, sodium dodecyl sulfate (SDS) micelles were used as a model of the cell membrane, and herein we present a high-resolution 3D structure of an 18-amino-acid-residue peptide corresponding to the third transmembrane part of bilitranslocase, obtained by multidimensional NMR spectroscopy. It has been experimentally confirmed that one of the transmembrane segments of bilitranslocase has an alpha-helical structure with hydrophilic amino acid residues oriented towards one side, and is thus capable of forming a channel in the membrane.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Modeling the Violation of Reward Maximization and Invariance in Reinforcement Schedules

Author: A Kacelnik
A Tversky
A Tversky
B De Martino
B Lau
B Marsh
Barry J. Richmond
CD Fiorillo
D Joel
D Kahneman
DE Bell
DM Egelman
EM Bowman
G La Camera
G La Camera
Giancarlo La Camera
HA Simon
HE Atallah
HR Arkes
HR Arkes
J O'Doherty
JH Zar
JM Simmons
JM Simmons
JM Simmons
JW Dickson
K Samejima
Karl J. Friston
KJ Arrow
KN Kirby
KR Janmaat
L Pompilio
LA Marascuilo
LD Brown
LJ Savage
LP Sugrue
M Haruno
M Pessiglione
M Shidara
M Shidara
N Schweighofer
N So
ND Daw
P Dayan
P Dayan
P Dayan
P Dayan
PJ Schoemaker
PL Meyer
PR Montague
R Thaler
RS Sutton
RS Sutton
S Kobayashi
S Ravel
SM McClure
W Schultz
WX Pan
Y Niv
Y Niv
Y Sugase-Miyamoto
Z Liu
Z Liu
Z Liu
Publication venue: Public Library of Science
Publication date: 01/01/2008

It is often assumed that animals and people adjust their behavior to maximize reward acquisition. In visually cued reinforcement schedules, monkeys make errors in trials that are not immediately rewarded, despite having to repeat error trials. Here we show that error rates are typically smaller in trials equally distant from reward but belonging to longer schedules (referred to as “schedule length effect”). This violates the principles of reward maximization and invariance and cannot be predicted by the standard methods of Reinforcement Learning, such as the method of temporal differences. We develop a heuristic model that accounts for all of the properties of the behavior in the reinforcement schedule task but whose predictions are not different from those of the standard temporal difference model in choice tasks. In the modification of temporal difference learning introduced here, the effect of schedule length emerges spontaneously from the sensitivity to the immediately preceding trial. We also introduce a policy for general Markov Decision Processes, where the decision made at each node is conditioned on the motivation to perform an instrumental action, and show that the application of our model to the reinforcement schedule task and the choice task are special cases of this general theoretical framework. Within this framework, Reinforcement Learning can approach contextual learning with the mixture of empirical findings and principled assumptions that seem to coexist in the best descriptions of animal behavior. As examples, we discuss two phenomena observed in humans that often derive from the violation of the principle of invariance: “framing,” wherein equivalent options are treated differently depending on the context in which they are presented, and the “sunk cost” effect, the greater tendency to continue an endeavor once an investment in money, effort, or time has been made. The schedule length effect might be a manifestation of these phenomena in monkeys

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

The degree of segmental aneuploidy measured by total copy number abnormalities predicts survival and recurrence in superficial gastroesophageal adenocarcinoma

Author: A Janssen
AH Marx
AJ Lee
AM Dulak
AM Dulak
B Fu
BJ Reid
CA Ong
CD Schweighofer
Christin M. Sciulli
CM Jacobsen
D Choma
D Hanahan
D Hirsch
DJ Nancarrow
DP Cahill
ER Thompson
F Boudreau
G Ng
GA Prasad
George K. Michalopoulos
GJ Kops
H Lango Allen
H Pohl
HJ Stein
J Gu
J Theisen
J. Michael Krill-Burger
James D. Luketich
JC Saldivar
JO Korbel
Jon M. Davison
Katie S. Nason
L Lin
LA Lai
LO Baumbusch
Lori A. Kelly
M Greaves
M Sarbia
M Schweigert
M Tuefferd
Maureen A. Lyons-Weiler
MB Menke-Pluymers
Melissa Yee
N Deng
N McGranahan
NJ Birkbak
NS Chang
O Pech
OM Radu
PJ Stephens
R Beroukhim
R Roylance
S Bayraktar
S Li
SE Araujo
SL Carter
Svetlana Pack
T Nakamura
TM Kim
TW Rice
TW Rice
VI Gaidzik
William A. LaFramboise
WJ Blot
WJ Kent
XY Goh
YJ Bang
Publication venue: Public Library of Science (PLoS)
Publication date: 16/01/2014

Background: Prognostic biomarkers are needed for superficial gastroesophageal adenocarcinoma (EAC) to predict clinical outcomes and select therapy. Although recurrent mutations have been characterized in EAC, little is known about their clinical and prognostic significance. Aneuploidy is predictive of clinical outcome in many malignancies but has not been evaluated in superficial EAC. Methods: We quantified copy number changes in 41 superficial EAC using Affymetrix SNP 6.0 arrays. We identified recurrent chromosomal gains and losses and calculated the total copy number abnormality (CNA) count for each tumor as a measure of aneuploidy. We correlated CNA count with overall survival and time to first recurrence in univariate and multivariate analyses. Results: Recurrent segmental gains and losses involved multiple genes, including: HER2, EGFR, MET, CDK6, KRAS (recurrent gains); and FHIT, WWOX, CDKN2A/B, SMAD4, RUNX1 (recurrent losses). There was a 40-fold variation in CNA count across all cases. Tumors with the lowest and highest quartile CNA count had significantly better overall survival (p = 0.032) and time to first recurrence (p = 0.010) compared to those with intermediate CNA counts. These associations persisted when controlling for other prognostic variables. Significance: SNP arrays facilitate the assessment of recurrent chromosomal gain and loss and allow high resolution, quantitative assessment of segmental aneuploidy (total CNA count). The non-monotonic association of segmental aneuploidy with survival has been described in other tumors. The degree of aneuploidy is a promising prognostic biomarker in a potentially curable form of EAC. © 2014 Davison et al

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

D-Scholarship@Pitt

FigShare

Determinants of synaptic integration and heterogeneity in rebound firing explored with data-driven models of deep cerebellar nucleus cells

Author: A Destexhe
A Destexhe
A Traboulsie
AA Prinz
AK Seth
AL Taylor
AM Brown
AM Castelfranco
AR Gibson
B Hille
BJ Fredette
C Alzheimer
C Gunay
C Quaia
C Rivera
CD Aizenman
CD Aizenman
CM Pedroarena
D Anchisi
D Jaeger
D Jaeger
D Timmann
DA Robinson
Dieter Jaeger
E Schutter De
EA Stern
EJ Lang
EP Gardner
Erik De Schutter
F Sultan
G Baranauskas
G Baranauskas
GYY Shen
H Daniel
H Jahnsen
H Jahnsen
HC Pape
HHC Lee
HP Goodkin
IM Raman
J Magistretti
JC Houk
JD Schmahmann
JF Kleine
JF Medina
JM Bower
JM Bower
JR Pugh
JR Pugh
JS Choi
JS Rothman
K Alvina
K Alvina
K Hepp
L Zhu
LK Purvis
M Ito
M Palkovits
M Uusisaari
MG Paulin
ML Molineux
MM Jagodic
N Schweighofer
N Zheng
Nathan W. Schultheiss
NC Rowland
P Achard
P Aracri
P Telgkamp
PL Kan van
R Gardette
R Gardette
R Llinas
R Muri
R Surges
R Tadayonnejad
R. Angus Silver
RB Ivry
S Shin
S Çavdar
T Otsuka
TG Banke
V Gauck
V Gauck
V Gauck
V Steuber
V Steuber
Volker Steuber
W Rall
WA MacKay
Publication venue: Springer US
Publication date: 01/01/2010

Significant inroads have been made to understand cerebellar cortical processing but neural coding at the output stage of the cerebellum in the deep cerebellar nuclei (DCN) remains poorly understood. The DCN are unlikely to represent just a relay nucleus because Purkinje cell inhibition has to be turned into an excitatory output signal, and DCN neurons exhibit complex intrinsic properties. In particular, DCN neurons exhibit a range of rebound spiking properties following hyperpolarizing current injection, raising the question of how this could contribute to signal processing in behaving animals. Computer modeling presents an ideal tool to investigate how intrinsic voltage-gated conductances in DCN neurons could generate the heterogeneous firing behavior observed, and what input conditions could result in rebound responses. To enable such an investigation we built a compartmental DCN neuron model with a full dendritic morphology and appropriate active conductances. We generated a good match of our simulations with DCN current clamp data we recorded in acute slices, including the heterogeneity in the rebound responses. We then examined how inhibitory and excitatory synaptic input interacted with these intrinsic conductances to control DCN firing. We found that the output spiking of the model reflected the ongoing balance of excitatory and inhibitory input rates and that changing the level of inhibition performed an additive operation. Rebound firing following strong Purkinje cell input bursts was also possible, but only if the chloride reversal potential was more negative than −70 mV to allow de-inactivation of rebound currents. Fast rebound bursts due to T-type calcium current and slow rebounds due to persistent sodium current could be differentially regulated by synaptic input, and the pattern of these rebounds was further influenced by HCN current. Our findings suggest that active properties of DCN neurons could play a crucial role for signal processing in the cerebellum.

Crossref

Springer - Publisher Connector

PubMed Central

UCL Discovery

Institutional Repository Universiteit Antwerpen

University of Hertfordshire Research Archive

An Imperfect Dopaminergic Error Signal Can Drive Temporal-Difference Learning

Author: A Barto
A Garthe
A Hanuschkin
A Soltani
A Soltani
AA Prinz
Abigail Morrison
AG Barto
B Porr
B Seymour
B Seymour
BI Hyland
BJ Knowlton
CA Paladini
CD Fiorillo
D Baras
D Joel
DC Dennett
DJ Foster
DZ Jin
E Brazhnik
E Nordlie
E Vasilaki
EA Ludvig
EM Izhikevich
F Wörgötter
G La Camera
G Morris
G Morris
GS Berns
HC Tuckwell
HE Attalah
HM Bayer
HS Seung
IH Witten
J Brown
J O'Doherty
J Wickens
J Yacubian
JC Horvitz
JC Houk
JC Houk
JC Houk
JC Houk
JJ Hopfield
JL Contreras-Vidal
JM Tepper
JN Reynolds
JNJ Reynolds
K Doya
K Gurney
KD Sethi
KJ Friston
M Dai
M Helias
M Matsumoto
M Matsumoto
M Pessiglione
MA Farries
MA Häusser
Markus Diesmann
MD Humphries
MO Gewaltig
N Bowery
N Frémaux
N Schweighofer
ND Daw
O Arias-Carrion
P Calabresi
P Dayan
P Dayan
P Dayan
P Montague
P Redgrave
P Redgrave
PA Garris
PN Tobler
PR Montague
PR Montague
R Legenstein
R Suri
R VanRullen
RC Froemke
RE Suri
RE Suri
RJ McDonald
RJ Steele
RPN Rao
RS Sutton
RS Sutton
RV Florian
S Fusi
S Pecina
S Schrader
S Sugita
SM Reynolds
SM Reynolds
T Ljungberg
T Nakano
Tim Behrens
V Pawlak
V Pawlak
W Potjans
W Potjans
W Schultz
W Schultz
W Schultz
Wiebke Potjans
X Xie
Y Loewenstein
Y Niv
Publication venue: Public Library of Science
Publication date: 01/01/2011

An open problem in the field of computational neuroscience is how to link synaptic plasticity to system-level learning. A promising framework in this context is temporal-difference (TD) learning. Experimental evidence that supports the hypothesis that the mammalian brain performs temporal-difference learning includes the resemblance of the phasic activity of the midbrain dopaminergic neurons to the TD error and the discovery that cortico-striatal synaptic plasticity is modulated by dopamine. However, as the phasic dopaminergic signal does not reproduce all the properties of the theoretical TD error, it is unclear whether it is capable of driving behavior adaptation in complex tasks. Here, we present a spiking temporal-difference learning model based on the actor-critic architecture. The model dynamically generates a dopaminergic signal with realistic firing rates and exploits this signal to modulate the plasticity of synapses as a third factor. The predictions of our proposed plasticity dynamics are in good agreement with experimental results with respect to dopamine, pre- and post-synaptic activity. An analytical mapping from the parameters of our proposed plasticity dynamics to those of the classical discrete-time TD algorithm reveals that the biological constraints of the dopaminergic signal entail a modified TD algorithm with self-adapting learning parameters and an adapting offset. We show that the neuronal network is able to learn a task with sparse positive rewards as fast as the corresponding classical discrete-time TD algorithm. However, the performance of the neuronal network is impaired with respect to the traditional algorithm on a task with both positive and negative rewards and breaks down entirely on a task with purely negative rewards. Our model demonstrates that the asymmetry of a realistic dopaminergic signal enables TD learning when learning is driven by positive rewards but not when driven by negative rewards
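For orientation, the "classical discrete-time TD algorithm" that the abstract uses as its baseline can be sketched as tabular TD(0). This is the textbook update rule, not the spiking actor-critic model described above; the chain task, the parameter values and the name `td0_chain` are illustrative assumptions.

```python
def td0_chain(n_states=5, reward=1.0, gamma=0.9, alpha=0.1, episodes=2000):
    """Tabular TD(0) on a deterministic chain 0 -> 1 -> ... -> n_states-1 -> end.

    A reward is delivered only on the transition out of the last state
    (a sparse positive reward). Returns the learned state values.
    """
    values = [0.0] * n_states
    for _ in range(episodes):
        for s in range(n_states):
            last = s == n_states - 1
            r = reward if last else 0.0
            v_next = 0.0 if last else values[s + 1]  # terminal value is 0
            delta = r + gamma * v_next - values[s]   # TD error
            values[s] += alpha * delta               # move estimate toward target
    return values


if __name__ == "__main__":
    print([round(v, 3) for v in td0_chain()])
```

In this toy task the learned value of each state approaches gamma raised to the number of steps remaining before the reward, so states closer to the reward carry higher values.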

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Publikationsserver der RWTH Aachen University

Juelich Shared Electronic Resources