A two step algorithm for learning from unspecific reinforcement

Authors: Barto A G; Biehl M; Bös S; Hertz J; Ion-Olimpiu Stamatescu; Kaelbling L P; Kinouchi O; Mlodinow L; Reimer Kühn; Stamatescu I-O; Sutton R S; Vallet F; Watkins C J C H

Publication date: 1 January 1999
Publisher: IOP Publishing

Abstract

We study a simple learning model based on the Hebb rule to cope with "delayed", unspecific reinforcement. In spite of the unspecific nature of the information feedback, convergence to asymptotically perfect generalization is observed, at a rate that depends, however, in a non-universal way on the learning parameters. Asymptotic convergence can be as fast as that of Hebbian learning, but may be slower. Moreover, for a certain range of parameter settings, it depends on the initial conditions whether the system reaches the regime of asymptotically perfect generalization or instead approaches a stationary state of poor generalization.

Comment: 13 pages LaTeX, 4 figures, note on biologically motivated stochastic variant of the algorithm added

Last time updated on 18/02/2019