
Learning to reach by reinforcement learning using a receptive field based function approximation approach with continuous actions

Author: A Arleo
B Espiau
C Breazeal
CJ Watkins
DJ Foster
F Chaumette
F Wörgötter
Florentin Wörgötter
G Tesauro
GJ Gordon
J Peters
J Soechting
M Moussa
M Tamosiunaite
Minija Tamosiunaite
R Dillmann
R Horaud
R Sutton
RJ Williams
RS Sutton
T Strösslin
Tamim Asfour
V Ruis de Angulo
Publication venue: Springer-Verlag
Publication date: 01/01/2009

Reinforcement learning methods can be used in robotics applications, especially for specific target-oriented problems such as the reward-based recalibration of goal-directed actions. To this end, relatively large and continuous state-action spaces still need to be handled efficiently. The goal of this paper is, thus, to develop a novel, rather simple method which uses reinforcement learning with function approximation in conjunction with different reward strategies for solving such problems. To test our method, we use a four-degree-of-freedom reaching problem in 3D space, simulated by a two-joint robot arm system with two DOF each. Function approximation is based on 4D, overlapping kernels (receptive fields), and the state-action space contains about 10,000 of these. Different types of reward structures are compared, for example, reward-on-touching-only against reward-on-approach. Furthermore, forbidden joint configurations are punished. A continuous action space is used. In spite of the rather large number of states and the continuous action space, these reward/punishment strategies allow the system to find a good solution, usually within about 20 trials. The efficiency of our method in this test scenario suggests that it might be possible to use it on a real robot for problems in which mixed rewards can be defined, in situations where other types of learning might be difficult.
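The abstract combines a Q-function approximated by overlapping receptive fields with a continuous action space. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes Gaussian kernels with one shared width placed on a regular grid over the joint state-action vector, a SARSA-style TD update of the linear weights, and greedy selection among a finite set of sampled candidate actions. The class `RBFQ` and all of its parameters are hypothetical.

```python
import numpy as np


class RBFQ:
    """Linear Q-function over overlapping Gaussian receptive fields (hypothetical sketch).

    Kernel centers lie on a regular grid spanning the joint (state, action) space.
    """

    def __init__(self, lows, highs, n_per_dim, width, alpha=0.1, gamma=0.9):
        axes = [np.linspace(lo, hi, n) for lo, hi, n in zip(lows, highs, n_per_dim)]
        grid = np.meshgrid(*axes, indexing="ij")
        self.centers = np.stack([g.ravel() for g in grid], axis=1)  # (K, D) kernel centers
        self.width = width
        self.w = np.zeros(len(self.centers))  # one weight per receptive field
        self.alpha, self.gamma = alpha, gamma

    def features(self, state, action):
        sa = np.concatenate([state, action]).astype(float)
        d2 = np.sum((self.centers - sa) ** 2, axis=1)
        phi = np.exp(-d2 / (2.0 * self.width ** 2))
        return phi / max(phi.sum(), 1e-12)  # normalized receptive-field activations

    def q(self, state, action):
        return float(self.w @ self.features(state, action))

    def greedy_action(self, state, candidates):
        # Continuous actions: score a finite set of sampled candidates and keep the best.
        values = [self.q(state, a) for a in candidates]
        return candidates[int(np.argmax(values))]

    def update(self, state, action, reward, next_state, next_action, done):
        # SARSA-style TD update of the linear weights (assumed, not taken from the paper).
        phi = self.features(state, action)
        target = reward if done else reward + self.gamma * self.q(next_state, next_action)
        delta = target - float(self.w @ phi)  # TD error
        self.w += self.alpha * delta * phi
        return delta
```

Scoring a handful of sampled candidates keeps the maximization over a continuous action space cheap; the paper's actual action-selection scheme may differ.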

Vytautas Magnus University Institutional Repository (VMU ePub)

Public Library of Science (PLOS)

Reinforcement learning or active inference?

This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimise their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming, namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.

CiteSeerX

Directory of Open Access Journals

Cortical column design: a link between the maps of preferred orientation and orientation tuning strength?

Author: A Grinvald
AL Humphrey
AS Rojer
B Chapman
BB Lee
C Malsburg von der
CR Legendy
D Ferster
DH Hubel
DP Edwards
DY Ts'o
E Batschelet
E Niebur
Ernst Niebur
F Wörgötter
Florentin Wörgötter
G Götz
GA Orban
GG Blasdel
J Szentagothai
K Albus
K Obermayer
K Toyama
KD Miller
LN Cooper
MM Nass
NV Swindale
R Bauer
R Durbin
R Elsdale
R Hess
R Linsker
R Perez
RD Frostig
RE Soodak
RN Bracewell
T Bonhoeffer
T Kohonen
V Braitenberg
WT Baxter
Y Hata
YC Diao
Publication venue: Springer Science and Business Media LLC

arXiv.org e-Print Archive

Coverage, Continuity and Visual Cortical Architecture

Author: A Antonini
A Cimponeriu
A Das
A Grabska-Barwinska
A Grinvald
A Newell
A Turing
AA Koulakov
B Chapman
BJ Farley
BS Wang
C Beaulieu
C Blakemore
C Chiu
C von der Malsburg
CE Giacomantonio
DH Hubel
DJ Field
DL Applegate
DL Ringach
DN Spinelli
DR Muir
E Bartfeld
E Erwin
F Hoffsümmer
F Wolf
F Wörgötter
G Blasdel
GG Blasdel
GJ Goodhill
H Ritter
H Simpson
H Yu
HC Lee
HU Bauer
HY Lee
JC Crowley
JC Horton
JH Kaas
JO Kriegs
JR Wible
JS Lund
K Kang
K Obermayer
K Ohki
K Pawelzik
K Rose
KD Miller
L Reichl
LE White
M Abramowitz
M Huang
M Kaschube
M Schnabel
M Stetter
MA Carreira-Perpiñán
MC Cross
N Mayer
N Mermin
NM Mayer
NV Swindale
OR Bininda-Emonds
P Berkes
P Buzas
P Dayan
P Manneville
PA Hetherington
PC Bressloff
PJ Thomas
R Durbin
R Linsker
S Schuett
S Tanaka
S Wimbauer
SDV Hooser
T Bonhoeffer
T Hensch
T Kohonen
UA Ernst
V Braitenberg
W Keil
WH Bosking
Y Frégnac
Z Kielian-Jaworowsk
Publication date: 01/01/2011

The primary visual cortex of many mammals contains a continuous representation of visual space, with a roughly repetitive aperiodic map of orientation preferences superimposed. It was recently found that orientation preference maps (OPMs) obey statistical laws which are apparently invariant among species widely separated in eutherian evolution. Here, we examine whether one of the most prominent models for the optimization of cortical maps, the elastic net (EN) model, can reproduce this common design. The EN model generates representations which optimally trade off stimulus space coverage and map continuity. While this model has been used in numerous studies, no analytical results about the precise layout of the predicted OPMs have been obtained so far. We present a mathematical approach to analytically calculate the cortical representations predicted by the EN model for the joint mapping of stimulus position and orientation. We find that in all previously studied regimes, predicted OPM layouts are perfectly periodic. An unbiased search through the EN parameter space identifies a novel regime of aperiodic OPMs with pinwheel densities lower than those found in experiments. In an extreme limit, aperiodic OPMs quantitatively resembling experimental observations emerge. Stabilization of these layouts results from strong nonlocal interactions rather than from a coverage-continuity compromise. Our results demonstrate that optimization models for stimulus representations dominated by nonlocal suppressive interactions are in principle capable of correctly predicting the common OPM design. They call into question whether visual cortical feature representations can be explained by a coverage-continuity compromise.
Comment: 100 pages, including an Appendix, 21 + 7 figures
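The coverage-continuity trade-off is explicit in the standard elastic net update: a coverage term pulls each unit towards the stimuli softly assigned to it (Gaussian weights of width K, normalized over units), and a continuity term, scaled by beta * K, applies the discrete Laplacian over neighbouring units. The sketch below only illustrates that generic update under simplifying assumptions: a closed 1-D chain of units and Euclidean stimulus vectors, whereas the paper analyses a 2-D cortical sheet jointly mapping position and orientation. The function name and parameter values are hypothetical.

```python
import numpy as np


def elastic_net_step(y, x, K, alpha=0.2, beta=1.0):
    """One update of the elastic net on a closed 1-D chain of units (illustrative sketch).

    y: (M, D) stimulus-space positions of the M units; x: (N, D) stimulus samples;
    K: width of the soft assignment. The continuity term alone is stable for beta * K <= 0.5.
    """
    diff = x[:, None, :] - y[None, :, :]                  # (N, M, D) stimulus-minus-unit vectors
    logits = -np.sum(diff ** 2, axis=2) / (2.0 * K ** 2)
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                      # soft assignment of each stimulus to units
    coverage = alpha * np.einsum("nm,nmd->md", w, diff)   # pull units towards their stimuli
    continuity = beta * K * (np.roll(y, 1, axis=0) - 2.0 * y + np.roll(y, -1, axis=0))
    return y + coverage + continuity
```

In practice the elastic net is usually run while gradually decreasing K (annealing); that schedule is omitted here for brevity.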

CiteSeerX

MPG.PuRe

Motion processing with wide-field neurons in the retino-tecto-rotundal pathway

Author: A Mahani
A Nguyen
A Revzin
A Schmidt
AV Laverghetta
B Bessete
B Dellen
B Frost
B Hellmann
B Pakkenberg
Babette Dellen
C Deng
C Koch
D Heeger
EH Adelson
F Prevost
F Wörgötter
Florentin Wörgötter
G Marin
GE Hinton
H Karten
H Luksch
H Sun
J Engelage
J Letelier
J Mpodozis
JL Barron
John W. Clark
K Macko
K Nakayama
L Wu
LI Benowitz
M Hennig
N Troje
O Güntürkün
P Dayan
P Mulvanny
R Granit
R Khanbabaie
Ralf Wessel
RD Mooney
S Watanabe
T Ngo
U Meyer
W Hodos
Y Gu
Y Wang
Publication venue: Springer US
Publication date: 01/01/2009

The retino-tecto-rotundal pathway is the main visual pathway in non-mammalian vertebrates and has been found to be highly involved in visual processing. Despite the extensive receptive fields of tectal and rotundal wide-field neurons, pattern discrimination tasks suggest a system with high spatial resolution. In this paper, we address the problem of how global processing performed by motion-sensitive wide-field neurons can be brought into agreement with the concept of a local analysis of visual stimuli. As a solution to this problem, we propose a firing-rate model of the retino-tecto-rotundal pathway which describes how spatiotemporal information can be organized and retained by tectal and rotundal wide-field neurons while processing Fourier-based motion in the absence of periodic receptive-field structures. The model incorporates anatomical and electrophysiological experimental data on tectal and rotundal neurons, and the basic response characteristics of tectal and rotundal neurons to moving stimuli are captured by the model cells. We show that local velocity estimates may be derived from rotundal-cell responses via superposition in a subsequent processing step. Experimentally testable predictions that are both specific to and characteristic of the model are provided. Thus, a conclusive explanation can be given of how the retino-tecto-rotundal pathway enables the animal to detect and localize moving objects or to estimate its self-motion parameters.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Springer

Secretaría de Estado de Cultura

Digital.CSIC

Mathematical properties of neuronal TD-rules and differential Hebbian learning: a comparison

Author: A Arleo
A Barto
A Saudargiene
AH Klopf
B Porr
Bernd Porr
Christoph Kolodziejski
CJCH Watkins
CL Hull
DJ Foster
F Wörgötter
Florentin Wörgötter
GQ Bi
H Markram
IH Witten
JC Magee
JD Miller
JL Krichmar
JP Pfister
LP Kaelbling
M Tsukamoto
P Dayan
P Manoonpong
P Roberts
PR Montague
R Sutton
RE Suri
RS Sutton
RV Florian
SP Singh
T Kulvicius
T Strösslin
TB Boykina
W Gerstner
W Schultz
Y Humeau
Publication venue: Springer-Verlag
Publication date: 01/01/2008

A confusingly wide variety of temporally asymmetric learning rules exists related to reinforcement learning and/or to spike-timing-dependent plasticity, many of which look exceedingly similar while displaying strongly different behavior. These rules are often used in control tasks, for example in robotics, where rigorous convergence and numerical stability are required. The goal of this article is to review these rules and compare them to provide a better overview of their different properties. Two main classes will be discussed: temporal difference (TD) rules and correlation-based (differential Hebbian) rules, together with some transition cases. In general, we will focus on neuronal implementations with changeable synaptic weights and a time-continuous representation of activity. In a machine learning (non-neuronal) context, a solid mathematical theory of TD learning has existed for several years. This can partly be transferred to a neuronal framework, too. On the other hand, a more complete theory for differential Hebb rules has only now emerged. In general, the rules differ in their convergence conditions and their numerical stability, which can lead to highly undesirable behavior when they are applied. For TD, convergence can be enforced with a certain output condition assuring that the δ-error drops on average to zero (output control). Correlation-based rules, on the other hand, converge when one input drops to zero (input control). Temporally asymmetric learning rules treat situations where incoming stimuli follow each other in time. Thus, it is necessary to remember the first stimulus in order to relate it to the later-occurring second one. To this end, different types of so-called eligibility traces are used by these two types of rules. This aspect again leads to different properties of TD and differential Hebbian learning, as discussed here.
Thus, this paper, while also presenting several novel mathematical results, is mainly meant to provide a road map through the different neuronally emulated temporally asymmetric learning rules and their behavior, offering some guidance for possible applications.
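The contrast between output control and input control can be illustrated with two discrete-time updates. This is a simplified sketch of the time-continuous rules the paper discusses, not its formulation: a TD(0) update for a linear value function, whose weight change vanishes when the δ-error is zero, and an ICO-style differential Hebbian rule (one member of the correlation-based family, chosen here because it makes input control explicit), in which the trace of a predictive input u1 is correlated with the derivative of a later reflex trace u0, so the weight stops changing when u0 stays at zero. The leaky trace in `low_pass` stands in for an eligibility trace; all function names and constants are hypothetical.

```python
import numpy as np


def low_pass(x, tau=5.0):
    """Leaky trace of a pulse train, used here as a simple eligibility trace."""
    u = np.zeros(len(x))
    for t in range(1, len(x)):
        u[t] = u[t - 1] + (x[t] - u[t - 1]) / tau
    return u


def td0_update(w, x_t, x_next, r_next, mu=0.1, gamma=0.9):
    """TD(0) for a linear value v = w . x.

    The weight change is proportional to the delta-error, so learning stops
    when delta is zero (output control).
    """
    delta = r_next + gamma * (w @ x_next) - (w @ x_t)
    return w + mu * delta * x_t, delta


def ico_update(w1, u1, u0, mu=0.1):
    """ICO-style differential Hebbian rule, summed over one trial.

    dw1 ~ mu * u1 * du0/dt: the predictive trace u1 is correlated with the
    derivative of the reflex trace u0, so learning stops when u0 stays at
    zero (input control).
    """
    du0 = np.diff(u0, prepend=u0[0])  # discrete-time derivative of the reflex trace
    return w1 + mu * float(np.sum(u1 * du0))
```

When the predictive pulse precedes the reflex pulse, the summed change is positive; reversing their order makes it negative, reflecting the temporal asymmetry of these rules.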

GRO.publications (Univ. Göttingen)

Enlighten

Spike-Based Reinforcement Learning in Continuous State and Action Space: When Policy Gradient Methods Fail

Author: A Arleo
A Barto
A Klopf
A Morrison
B Devan
B Poucet
C Clopath
C Hull
C von der Malsburg
C Watkins
D Baras
D Di Castro
D Foster
D Sheynikhovich
DO Hebb
E Bienenstock
E Izhikevich
E Nordlie
E Oja
E Thorndike
E Toleman
E Vasilaki
Eleni Vasilaki
F Wörgötter
H Eichenbaum
H Markram
H Seung
I Fiete
J Baxter
J Wickens
JC Zhang
JNJ Reynolds
JP Pfister
K Doya
Karl J. Friston
KG Reymann
LF Abbott
M Packard
M Tsodyks
MA Farries
MCW van Rossum
N White
Nicolas Frémaux
P Dayan
P Redgrave
P Roberts
PJ Sjöström
R Jolivet
R Kempter
R Legenstein
R Morris
R Rao
R Rescorla
R Suri
R Sutton
R Urbanczik
R Williams
RB Stein
RC Malenka
Robert Urbanczik
RS Sutton
RV Florian
S Sajikumar
T Kohonen
T Stroesslin
TVP Bliss
U Frey
V Pawlak
W Gerstner
W Potjans
W Schultz
W Senn
Walter Senn
Wulfram Gerstner
X Xie
XJ Wang
Y Loewenstein
Publication venue: Public Library of Science
Publication date: 01/01/2009

Changes in synaptic connections between neurons are thought to be the physiological basis of learning. These changes can be gated by neuromodulators that encode the presence of reward. We study a family of reward-modulated synaptic learning rules for spiking neurons on a learning task in continuous space inspired by the Morris water maze. The synaptic update rule modifies the release probability of synaptic transmission and depends on the timing of presynaptic spike arrival, postsynaptic action potentials, as well as the membrane potential of the postsynaptic neuron. The family of learning rules includes an optimal rule derived from policy gradient methods as well as reward-modulated Hebbian learning. The synaptic update rule is implemented in a population of spiking neurons using a network architecture that combines feedforward input with lateral connections. Actions are represented by a population of hypothetical action cells with strong Mexican-hat connectivity and are read out at theta frequency. We show that in this architecture, a standard policy gradient rule fails to solve the Morris water maze task, whereas a variant with a Hebbian bias can learn the task within 20 trials, consistent with experiments. This result does not depend on implementation details such as the size of the neuronal populations. Our theoretical approach shows how learning new behaviors can be linked to reward-modulated plasticity at the level of single synapses and makes predictions about the voltage and spike-timing dependence of synaptic plasticity and the influence of neuromodulators such as dopamine. It is an important step towards connecting formal theories of reinforcement learning with neuronal and synaptic properties.
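Reward-modulated rules of this kind share a three-factor structure: a local eligibility built from pre- and postsynaptic activity, multiplied by the reward relative to a baseline. The rate-based sketch below is not the paper's spiking rule. It assumes that `pre`, `post` and `post_mean` are firing rates (or spike indicators and their expectation); the `(post - post_mean) * pre` term plays the role of a REINFORCE-like policy-gradient eligibility, and the hypothetical `hebbian_bias` parameter adds a plain `post * pre` term to illustrate what a Hebbian bias changes.

```python
import numpy as np


def three_factor_update(w, pre, post, post_mean, reward, baseline, eta=0.05, hebbian_bias=0.0):
    """Reward-modulated plasticity: dw = eta * (reward - baseline) * eligibility.

    w: (n_post, n_pre) weights; pre: (n_pre,); post, post_mean: (n_post,).
    The (post - post_mean) * pre term is a REINFORCE-like policy-gradient
    eligibility; hebbian_bias (hypothetical) adds a plain post * pre term.
    """
    eligibility = np.outer(post - post_mean, pre) + hebbian_bias * np.outer(post, pre)
    return w + eta * (reward - baseline) * eligibility
```

With `hebbian_bias=0`, the update vanishes whenever postsynaptic activity equals its expectation; a nonzero bias lets co-activity alone drive plasticity whenever the reward exceeds the baseline.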

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Public Library of Science (PLOS)

Directory of Open Access Journals

Bern Open Repository and Information System (BORIS)

White Rose Research Online

Disambiguating Multi–Modal Scene Representations Using Perceptual Grouping Constraints

Author: A Baumberg
A Sha'ashua
A Verri
C Harris
C Schmid
D Crevier
D Field
D Kraft
D Lowe
D Scharstein
E Baseski
E Brunswik
F Schaffalitzky
Florentin Wörgötter
HH Nagel
J Elder
J Koenderink
J Mayhew
J Rodrigues
J Shi
K Koffka
K Köhler
K Mikolajczyk
L van Gool
L Wolff
M Brown
M Felsberg
M Oram
M Popović
N Kim
N Krüger
N Pugeault
Nicolas Pugeault
Norbert Krüger
O Faugeras
P Kovesi
P König
P Parent
P Perona
R Chung
R Hartley
R Horaud
R Mohan
S Geman
S Sarkar
S Se
SH Lee
Teresa Serrano-Gotarredona
W Freeman
W Geisler
Y Aloimonos
Y Ohta
Publication venue: Public Library of Science
Publication date: 01/01/2010

In its early stages, the visual system suffers from a lot of ambiguity and noise that severely limit the performance of early vision algorithms. This article presents feedback mechanisms between early visual processes, such as perceptual grouping, stereopsis and depth reconstruction, that allow the system to reduce this ambiguity and improve the early representation of visual information. In the first part, the article proposes a local perceptual grouping algorithm that, in addition to commonly used geometric information, makes use of a novel multi-modal measure between local edge/line features. The grouping information is then used to: 1) disambiguate stereopsis by enforcing that stereo matches preserve groups; and 2) correct the reconstruction error due to image pixel sampling using a linear interpolation over the groups. The integration of mutual feedback between early vision processes is shown to considerably reduce ambiguity and noise without the need for global constraints.

Public Library of Science (PLOS)

Directory of Open Access Journals

GRO.publications (Univ. Göttingen)

Open Research Exeter

Enlighten

Syddansk Universitets Forskerportal

Surrey Research Insight

Odor supported place cell model and goal navigation in rodents

Author: A. Arleo
A. D. Ekstrom
A. J. Hill
A. S. Etienne
B. Diekmann
B. L. McNaughton
B. McNaughton
B. Takács
C. Barnes
C. Barry
C. G. Atkeson
C. Hölscher
C. W. Harley
D. Eilam
D. G. Wallace
D. J. Foster
D. J. Hines
D. S. Touretzky
D. Yoganarasimha
E. J. Markus
E. Markus
E. Save
F. Nemati
F. P. Battaglia
F. Sargolini
Florentin Wörgötter
G. E. Carvell
H. Eichenbaum
H. Maaswinkel
H. Tanila
I. Q. Whishaw
J. Calton
J. J. Knierim
J. L. Krichmar
J. O’Keefe
J. Prados
J. S. Taube
James Ainge
K. Balakrishnan
K. Jeffery
M. A. Brown
M. A. Wilson
M. Franzius
M. L. Shapiro
M. Recce
Minija Tamosiunaite
N. Brunel
P. A. Dudchenko
P. E. Sharp
P. Gaussier
P. Lavenex
Paul Dudchenko
R. A. Russell
R. Morris
R. Sutton
R. U. Muller
T. Hafting
T. Hartley
T. S. Collett
T. Strösslin
Tomas Kulvicius
W. T. Tomlinson
Publication venue: Springer US
Publication date: 01/01/2008

Experiments with rodents demonstrate that visual cues play an important role in the control of hippocampal place cells and spatial navigation. Nevertheless, rats may also rely on auditory, olfactory and somatosensory stimuli for orientation. It is also known that rats can track odors or self-generated scent marks to find a food source. Here we model odor-supported place cells by using a simple feed-forward network and analyze the impact of olfactory cues on place cell formation and spatial navigation. The obtained place cells are used to solve a goal navigation task by a novel mechanism based on self-marking by odor patches combined with a Q-learning algorithm. We also analyze the impact of place cell remapping on goal-directed behavior when switching between two environments. We emphasize the importance of olfactory cues in place cell formation and show that the use of environmental and self-generated olfactory cues, together with a mixed navigation strategy, improves goal-directed navigation.
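The navigation part of this work relies on Q-learning. A minimal tabular sketch of the standard one-step Q-learning update follows, assuming that the index of the most active place cell can be treated as a discrete state. The 1-D corridor, the reward of 1 at the goal, the epsilon-greedy exploration and all parameters are hypothetical, and the paper's odor self-marking mechanism is not modelled.

```python
import numpy as np


def q_update(Q, s, a, r, s_next, done, alpha=0.5, gamma=0.9):
    """One-step Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])


def train_corridor(n_states=5, episodes=200, epsilon=0.2, seed=0):
    """Hypothetical stand-in for place-cell states: a 1-D corridor, goal at the right end.

    Actions: 0 = step left, 1 = step right. Reward 1 on reaching the goal, otherwise 0.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, 2))
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            if rng.random() < epsilon:
                a = int(rng.integers(2))  # explore
            else:
                best = np.flatnonzero(Q[s] == Q[s].max())
                a = int(rng.choice(best))  # greedy, ties broken at random
            s_next = max(0, s - 1) if a == 0 else s + 1
            done = s_next == goal
            q_update(Q, s, a, 1.0 if done else 0.0, s_next, done)
            s = s_next
    return Q
```

After training, the greedy policy `np.argmax(Q[s])` steps right in every non-goal state, and the learned values decay by a factor of gamma per step away from the goal.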

Stirling Online Research Repository (RIOXX)

Enlighten

GRO.publications (Univ. Göttingen)

Stirling Online Research Repository

Vytautas Magnus University Institutional Repository (VMU ePub)