
    Reinforcement learning or active inference?

    This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming, namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.
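
    For orientation, the mountain-car benchmark mentioned above is usually stated as a simple discrete-time control problem in which an under-powered car must build momentum to escape a valley. The sketch below shows only those standard benchmark dynamics, with the conventional constants used in reinforcement-learning texts; it is not the continuous free-energy agent the paper describes, and the function name and interface are illustrative assumptions.

        # Standard discrete-time mountain-car dynamics (conventional benchmark values);
        # a minimal sketch for reference, not the paper's free-energy formulation.
        import math

        def mountain_car_step(position, velocity, action):
            """One step of the under-powered car; action is -1 (left), 0 (coast) or +1 (right)."""
            velocity += 0.001 * action - 0.0025 * math.cos(3 * position)  # engine force vs. gravity
            velocity = max(-0.07, min(0.07, velocity))                    # speed limits
            position = position + velocity
            if position <= -1.2:                                          # inelastic wall on the left
                position, velocity = -1.2, 0.0
            done = position >= 0.5                                        # goal on top of the right hill
            return position, velocity, done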

    Mathematical properties of neuronal TD-rules and differential Hebbian learning: a comparison

    A confusingly wide variety of temporally asymmetric learning rules exists related to reinforcement learning and/or spike-timing-dependent plasticity, many of which look exceedingly similar while displaying strongly different behavior. These rules often find their use in control tasks, for example in robotics, and for this rigorous convergence and numerical stability are required. The goal of this article is to review these rules and compare them, to provide a better overview of their different properties. Two main classes will be discussed: temporal difference (TD) rules and correlation-based (differential Hebbian) rules, together with some transition cases. In general we will focus on neuronal implementations with changeable synaptic weights and a time-continuous representation of activity. In a machine learning (non-neuronal) context, a solid mathematical theory of TD-learning has existed for several years; this can partly be transferred to a neuronal framework, too. A more complete theory for differential Hebbian rules, by contrast, has only emerged recently. In general, the rules differ in their convergence conditions and their numerical stability, which can lead to very undesirable behavior when one tries to apply them. For TD, convergence can be enforced with a certain output condition assuring that the δ-error drops on average to zero (output control). Correlation-based rules, on the other hand, converge when one input drops to zero (input control). Temporally asymmetric learning rules treat situations where incoming stimuli follow each other in time. Thus, it is necessary to remember the first stimulus in order to relate it to the second one occurring later. To this end, different types of so-called eligibility traces are used by these two types of rules. This aspect again leads to different properties of TD and differential Hebbian learning, as discussed here. This paper, while also presenting several novel mathematical results, is therefore mainly meant to provide a road map through the different neuronally emulated temporally asymmetric learning rules and their behavior, giving some guidance for possible applications.
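
    To make the two classes concrete, the sketch below writes down one representative (discretized) update rule from each family: a TD rule driven by the δ-error (output control) and a differential Hebbian rule driven by the correlation of an earlier input trace with the temporal derivative of the output (input control). The exact forms, learning rates and trace handling are illustrative assumptions rather than the equations analysed in the article.

        import numpy as np

        def td_update(w, x, x_next, r, alpha=0.05, gamma=0.9):
            """Temporal-difference rule: the delta-error drives learning (output control).
            w: weight vector; x, x_next: state feature vectors; r: reward."""
            delta = r + gamma * (w @ x_next) - (w @ x)   # TD error; learning settles when it averages to zero
            return w + alpha * delta * x

        def differential_hebbian_update(w, u, v_prev, v, dt=0.01, mu=0.05):
            """Differential Hebbian rule: weight change correlates the earlier input (trace) u
            with the temporal derivative of the output v; learning stops when u drops to zero
            (input control)."""
            dv_dt = (v - v_prev) / dt
            return w + mu * u * dv_dt

        # Example: one TD update after moving from state [1, 0] to state [0, 1] with reward 1
        w = td_update(np.zeros(2), np.array([1.0, 0.0]), np.array([0.0, 1.0]), r=1.0)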

    Towards a General Theory of Neural Computation Based on Prediction by Single Neurons

    Although there has been tremendous progress in understanding the mechanics of the nervous system, there has not been a general theory of its computational function. Here I present a theory that relates the established biophysical properties of single generic neurons to principles of Bayesian probability theory, reinforcement learning and efficient coding. I suggest that this theory addresses the general computational problem facing the nervous system. Each neuron is proposed to mirror the function of the whole system in learning to predict aspects of the world related to future reward. According to the model, a typical neuron receives current information about the state of the world from a subset of its excitatory synaptic inputs, and prior information from its other inputs. Prior information would be contributed by synaptic inputs representing distinct regions of space, and by different types of non-synaptic, voltage-regulated channels representing distinct periods of the past. The neuron's membrane voltage is proposed to signal the difference between current and prior information (“prediction error” or “surprise”). A neuron would apply a Hebbian plasticity rule to select those excitatory inputs that are the most closely correlated with reward but are the least predictable, since unpredictable inputs provide the neuron with the most “new” information about future reward. To minimize the error in its predictions and to respond only when excitation is “new and surprising,” the neuron selects amongst its prior information sources through an anti-Hebbian rule. The unique inputs of a mature neuron would therefore result from learning about spatial and temporal patterns in its local environment, and by extension, the external world. Thus the theory describes how the structure of the mature nervous system could reflect the structure of the external world, and how the complexity and intelligence of the system might develop from a population of undifferentiated neurons, each implementing similar learning algorithms.
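
    One possible toy reading of this proposal, in code, is a single unit whose "voltage" is the mismatch between its current excitatory drive and a prediction formed from its prior inputs, with a reward-gated Hebbian update on the current inputs and an error-cancelling update on the prior inputs that plays the role of the anti-Hebbian selection described above. Every name, rate and functional form below is an illustrative assumption, not the paper's model.

        import numpy as np

        def neuron_step(w_exc, w_prior, x_current, x_prior, reward, eta=0.01):
            """Toy prediction-error unit (illustrative sketch only)."""
            excitation = w_exc @ x_current        # current information about the world
            prediction = w_prior @ x_prior        # prior information (spatially / temporally lagged inputs)
            voltage = excitation - prediction     # "prediction error" or "surprise"
            # Hebbian: strengthen current inputs that are surprising and reward-related
            w_exc = w_exc + eta * reward * voltage * x_current
            # Error-cancelling update on prior inputs, so predictable excitation is suppressed
            w_prior = w_prior + eta * voltage * x_prior
            return w_exc, w_prior, voltage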

    A Cognitive Architecture Based on a Learning Classifier System with Spiking Classifiers

    © 2015, Springer Science+Business Media New York. Learning classifier systems (LCS) are population-based reinforcement learners that were originally designed to model various cognitive phenomena. This paper presents an explicitly cognitive LCS by using spiking neural networks as classifiers, providing each classifier with a measure of temporal dynamism. We employ a constructivist model of growth of both neurons and synaptic connections, which permits a genetic algorithm to automatically evolve sufficiently complex neural structures. The spiking classifiers are coupled with a temporally sensitive reinforcement learning algorithm, which allows the system to perform temporal state decomposition by appropriately rewarding “macro-actions”, created by chaining together multiple atomic actions. The combination of temporal reinforcement learning and neural information processing is shown to outperform benchmark neural classifier systems, and successfully solve a robotic navigation task.
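
    The "macro-action" mechanism can be pictured as a classifier committing to one atomic action for several steps and being credited with the discounted return of the whole chain, as in the minimal sketch below. The environment interface, the discount factor and the function names are hypothetical; the actual system uses spiking classifiers evolved by a genetic algorithm, which are not modelled here.

        def run_macro_action(env_step, state, atomic_action, length, gamma=0.99):
            """Execute one atomic action for `length` steps and return the discounted
            return credited to the macro-action (illustrative sketch only)."""
            total, discount = 0.0, 1.0
            for _ in range(length):
                state, reward, done = env_step(state, atomic_action)  # hypothetical environment interface
                total += discount * reward
                discount *= gamma
                if done:
                    break
            return state, total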

    Temporal-Difference Reinforcement Learning with Distributed Representations

    Temporal-difference (TD) algorithms have been proposed as models of reinforcement learning (RL). We examine two issues of distributed representation in these TD algorithms: distributed representations of belief and distributed discounting factors. Distributed representation of belief allows the believed state of the world to distribute across sets of equivalent states. Distributed exponential discounting factors produce hyperbolic discounting in the behavior of the agent itself. We examine these issues in the context of a TD RL model in which state-belief is distributed over a set of exponentially discounting “micro-Agents”, each of which has a separate discounting factor (γ). Each µAgent maintains an independent hypothesis about the state of the world, and a separate value-estimate of taking actions within that hypothesized state. The overall agent thus instantiates a flexible representation of an evolving world-state. As with other TD models, the value-error (δ) signal within the model matches dopamine signals recorded from animals in standard conditioning reward paradigms. The distributed representation of belief provides an explanation for the decrease in dopamine at the conditioned stimulus seen in overtrained animals, for the differences between trace and delay conditioning, and for transient bursts of dopamine seen at movement initiation. Because each µAgent also includes its own exponential discounting factor, the overall agent shows hyperbolic discounting, consistent with behavioral experiments.
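
    The claim that a population of exponentially discounting µAgents yields hyperbolic discounting at the level of the whole agent can be checked numerically: averaging γ^t over discount factors drawn uniformly from (0, 1) gives exactly 1/(t + 1), a hyperbolic curve. The uniform distribution over γ is an illustrative assumption here; the distribution of discount factors used in the paper may differ.

        import numpy as np

        # Population-averaged exponential discounting vs. a hyperbolic curve (k = 1)
        gammas = np.random.uniform(0.0, 1.0, size=100_000)   # one discount factor per micro-agent
        for t in [0, 1, 2, 5, 10, 20]:
            mixture = np.mean(gammas ** t)                    # mean of gamma**t across the population
            hyperbolic = 1.0 / (t + 1)                        # analytic value of the uniform mixture
            print(f"t={t:2d}  mixture={mixture:.3f}  hyperbolic={hyperbolic:.3f}")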

    Defect Characterization in SiGe/SOI Epitaxial Semiconductors by Positron Annihilation

    The potential of positron annihilation spectroscopy (PAS) for defect characterization at the atomic scale in semiconductors has been demonstrated in thin multilayer structures of SiGe (50 nm) grown on UTB (ultra-thin body) SOI (silicon-on-insulator). A slow positron beam was used to probe the defect profile. The SiO2/Si interface in the UTB-SOI was well characterized, and a good estimate of its depth was obtained. The chemical analysis indicates that the interface does not contain defects, but only strongly localized charged centers. To promote relaxation, the samples were subjected to a post-growth annealing treatment in vacuum. After this treatment, it was possible to observe the modifications of the defect structure of the relaxed film. Chemical analysis of the SiGe layers suggests a prevalent trapping site surrounded by germanium atoms, presumably Si vacancies associated with misfit dislocations and threading dislocations in the SiGe films.

    Functional MRI in Awake Unrestrained Dogs

    Because of dogs' prolonged evolution with humans, many canine cognitive skills are thought to represent a selection of traits that make dogs particularly sensitive to human cues. But how does the dog mind actually work? To develop a methodology to answer this question, we trained two dogs to remain motionless for the duration required to collect quality fMRI images, using positive reinforcement and no sedation or physical restraints. The task was designed to determine which brain circuits differentially respond to human hand signals denoting the presence or absence of a food reward. Head motion within trials was less than 1 mm. Consistent with prior reinforcement learning literature, we observed caudate activation in both dogs in response to the hand signal denoting reward versus no reward.