    Beyond Hebb: Exclusive-OR and Biological Learning

    A learning algorithm for multilayer neural networks based on biologically plausible mechanisms is studied. Motivated by findings in experimental neurobiology, we consider synaptic averaging in the induction of plasticity changes, which happen on a slower time scale than firing dynamics. This mechanism is shown to enable learning of the exclusive-OR (XOR) problem without the aid of error back-propagation, as well as to increase robustness of learning in the presence of noise. Comment: 4 pages RevTeX, 2 figures PostScript, revised version
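
    As a rough sketch of the mechanism this abstract describes, the following Python fragment trains a small threshold network on XOR with a reward-modulated Hebbian rule in which fast co-activity terms are accumulated into slowly averaged traces before they drive weight changes. The architecture, the reward modulation, and all constants are illustrative assumptions rather than the paper's algorithm, and convergence is not guaranteed.

        import numpy as np

        rng = np.random.default_rng(0)
        X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # XOR inputs
        y = np.array([0., 1., 1., 0.])                          # XOR targets

        W1 = rng.normal(0, 1, (2, 2))        # input -> hidden weights
        W2 = rng.normal(0, 1, 2)             # hidden -> output weights
        trace1, trace2 = np.zeros_like(W1), np.zeros_like(W2)
        eta, tau = 0.1, 0.05                 # learning rate, averaging constant

        for step in range(20000):
            i = rng.integers(4)
            h = (X[i] @ W1 > 0).astype(float)      # fast firing dynamics
            out = float(h @ W2 > 0)
            r = 1.0 if out == y[i] else -1.0       # scalar reward signal
            # synaptic averaging: co-activity enters a slow trace first
            trace1 += tau * (r * np.outer(X[i], h) - trace1)
            trace2 += tau * (r * h * out - trace2)
            W1 += eta * trace1                     # plasticity follows the trace
            W2 += eta * trace2

    The point of the two time scales is that individual noisy co-activity events are smoothed out before they change the synapses, which is the robustness-to-noise ingredient the abstract highlights.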

    Is there an integrative center in the vertebrate brain-stem? A robotic evaluation of a model of the reticular formation viewed as an action selection device

    Neurobehavioral data from intact, decerebrate, and neonatal rats suggest that the reticular formation provides a brainstem substrate for action selection in the vertebrate central nervous system. In this article, Kilmer, McCulloch and Blum's (1969, 1997) landmark reticular formation model is described and re-evaluated, both in simulation and, for the first time, as a mobile robot controller. Particular model configurations are found to provide effective action selection mechanisms in a robot survival task using either simulated or physical robots. The model's competence is dependent on the organization of afferents from model sensory systems, and a genetic algorithm search identified a class of afferent configurations that have long survival times. The results support our proposal that the reticular formation evolved to provide effective arbitration between innate behaviors and, with the forebrain basal ganglia, may constitute the integrative, 'centrencephalic' core of vertebrate brain architecture. Additionally, the results demonstrate that the Kilmer et al. model provides an alternative form of robot controller to those usually considered in the adaptive behavior literature.
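
    The arbitration scheme lends itself to a compact sketch. The following Python fragment, a loose illustration of the Kilmer, McCulloch and Blum idea rather than their actual equations, lets many small modules, each seeing only a subset of the sensors, vote on a behavioural mode and commits only when a majority agrees; the names and the consensus rule are assumptions made for illustration.

        import numpy as np

        rng = np.random.default_rng(1)
        N_MODULES, N_SENSORS, N_ACTIONS = 12, 8, 4

        # each module has its own afferent configuration: a random
        # sensor-to-action weighting plus a mask of which sensors it sees
        afferents = rng.normal(size=(N_MODULES, N_ACTIONS, N_SENSORS))
        mask = rng.random((N_MODULES, N_SENSORS)) < 0.5

        def select_action(sensors):
            votes = np.zeros(N_ACTIONS, dtype=int)
            for m in range(N_MODULES):
                visible = sensors * mask[m]          # partial view of the world
                votes[np.argmax(afferents[m] @ visible)] += 1
            winner = int(np.argmax(votes))
            # commit only on majority consensus, otherwise signal no decision
            return winner if votes[winner] > N_MODULES // 2 else None

        print(select_action(rng.random(N_SENSORS)))

    In this picture, the genetic algorithm search described above would be tuning the afferent configuration (here the random weights and masks) for long survival times.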

    A two step algorithm for learning from unspecific reinforcement

    We study a simple learning model based on the Hebb rule to cope with "delayed", unspecific reinforcement. In spite of the unspecific nature of the information feedback, convergence to asymptotically perfect generalization is observed, with a rate depending, however, in a non-universal way on learning parameters. Asymptotic convergence can be as fast as that of Hebbian learning, but may be slower. Moreover, for a certain range of parameter settings, it depends on initial conditions whether the system can reach the regime of asymptotically perfect generalization, or rather approaches a stationary state of poor generalization. Comment: 13 pages LaTeX, 4 figures, note on biologically motivated stochastic variant of the algorithm added
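
    A hedged reading of the two-step idea is: (1) accumulate a tentative Hebbian association between inputs and the network's own outputs over a batch, then (2) scale that update by a single unspecific score for the whole batch. The Python sketch below implements that reading for a simple perceptron student-teacher setup; the batch size, the centred-reward rule, and all constants are illustrative assumptions, not the paper's exact algorithm.

        import numpy as np

        rng = np.random.default_rng(2)
        D, BATCH, ETA = 20, 10, 0.1
        teacher = rng.normal(size=D)       # unknown rule generating the labels
        w = rng.normal(size=D)             # student weights

        for step in range(5000):
            X = rng.choice([-1.0, 1.0], size=(BATCH, D))
            labels = np.sign(X @ teacher)
            preds = np.sign(X @ w)
            # step 1: tentative Hebbian term from inputs and own outputs only
            delta = (X * preds[:, None]).mean(axis=0)
            # step 2: modulate by the unspecific batch score (fraction correct)
            reward = (preds == labels).mean() - 0.5
            w += ETA * reward * delta

        overlap = (w @ teacher) / (np.linalg.norm(w) * np.linalg.norm(teacher))
        print(f"teacher overlap: {overlap:.3f}")

    The student never learns which individual examples were right, only the batch score, which is what "unspecific reinforcement" means here.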

    Active Learning in Persistent Surveillance UAV Missions

    The performance of many complex UAV decision-making problems can be extremely sensitive to small errors in the model parameters. One way of mitigating this sensitivity is to design algorithms that learn the model more effectively over the course of a mission. This paper addresses this problem by considering model uncertainty in a multi-agent Markov Decision Process (MDP) and using an active learning approach to quickly learn transition model parameters. We build on previous research that allowed UAVs to passively update model parameter estimates by incorporating new state transition observations. In this work, however, the UAVs choose to actively reduce the uncertainty in their model parameters by taking exploratory and informative actions. These actions result in faster adaptation and, by explicitly accounting for UAV fuel dynamics, also mitigate the risk of exploration. This paper compares the nominal, passive learning approach against two methods for incorporating active learning into the MDP framework: (1) all state transitions are rewarded equally, and (2) state transition rewards are weighted according to the expected resulting reduction in the variance of the model parameter. In both cases, agent behaviors emerge that enable faster convergence of the uncertain model parameters to their true values.
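
    The second method above, variance-weighted exploration rewards, can be sketched with a Dirichlet model of each transition distribution, since Dirichlet counts give both a point estimate and an uncertainty that shrinks with observations. The bonus below, the expected drop in summed Dirichlet variance from one more observation of a state-action pair, is an illustrative assumption about how such weighting could be computed, not the paper's exact formulation.

        import numpy as np

        N_S, N_A = 5, 2
        counts = np.ones((N_S, N_A, N_S))   # Dirichlet(1, ..., 1) prior per (s, a)

        def model_variance(s, a):
            # summed variance of the Dirichlet mean estimates for (s, a)
            alpha = counts[s, a]
            a0 = alpha.sum()
            return (alpha * (a0 - alpha) / (a0**2 * (a0 + 1))).sum()

        def expected_variance_reduction(s, a):
            p = counts[s, a] / counts[s, a].sum()
            before, after = model_variance(s, a), 0.0
            for s2 in range(N_S):
                counts[s, a, s2] += 1            # hypothetical extra observation
                after += p[s2] * model_variance(s, a)
                counts[s, a, s2] -= 1
            return before - after

        def exploratory_action(s):
            # favour the action whose outcome would teach the model the most
            return int(np.argmax([expected_variance_reduction(s, a)
                                  for a in range(N_A)]))

    In a full planner this bonus would be traded off against task reward (and, per the abstract, fuel constraints) rather than followed greedily.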

    Embodied imitation-enhanced reinforcement learning in multi-agent systems

    Imitation is an example of social learning in which an individual observes and copies another's actions. This paper presents a new method for using imitation to enhance the learning speed of individual agents that employ a well-known reinforcement learning algorithm, namely Q-learning. Compared with other research that combines imitation with reinforcement learning, our method uses imitation of purely observed behaviours to enhance learning, with no internal state access or sharing of experiences between agents. The paper evaluates our imitation-enhanced reinforcement learning approach both in simulation and with real robots in continuous space. Results from both settings show that the learning speed of the group is improved.
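
    Since the imitation signal is purely observational, one minimal way to realise it is a selection-time bonus for state-action pairs the agent has seen another robot perform, layered on an otherwise standard Q-learning update. The Python sketch below takes that reading; the bonus mechanism and the constants are assumptions for illustration, not the paper's method.

        import random
        from collections import defaultdict

        ALPHA, GAMMA, EPS, BONUS = 0.1, 0.95, 0.1, 0.5
        Q = defaultdict(float)
        observed = set()   # (state, action) pairs seen performed by another agent

        def choose(state, actions):
            if random.random() < EPS:
                return random.choice(actions)      # epsilon-greedy exploration
            # imitation enters only as a bias toward observed behaviours
            return max(actions, key=lambda a: Q[(state, a)]
                       + (BONUS if (state, a) in observed else 0.0))

        def update(s, a, r, s2, actions):
            best_next = max(Q[(s2, a2)] for a2 in actions)
            Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

    Nothing about the demonstrator's internal values or experience is shared; only its visible (state, action) trace populates `observed`, matching the abstract's constraint.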

    Algebraic Theory of Promise Constraint Satisfaction Problems, First Steps

    What makes a computational problem easy (e.g., in P, that is, solvable in polynomial time) or hard (e.g., NP-hard)? This fundamental question now has a satisfactory answer for a quite broad class of computational problems, so-called fixed-template constraint satisfaction problems (CSPs) -- it has turned out that their complexity is captured by a certain specific form of symmetry. This paper explains an extension of this theory to a much broader class of computational problems, the promise CSPs, which includes relaxed versions of CSPs such as the problem of finding a 137-coloring of a 3-colorable graph.
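
    The 137-coloring example is worth unpacking: the input graph is promised to be 3-colorable, and the relaxed task is to output any proper 137-coloring. Verifying a candidate is easy even though finding one is believed hard; the Python fragment below just pins down that verification side as a concrete anchor (the graph and names are illustrative).

        def is_proper_coloring(edges, coloring, k):
            """At most k colors, and no edge has both endpoints the same color."""
            return (all(0 <= c < k for c in coloring.values())
                    and all(coloring[u] != coloring[v] for u, v in edges))

        # a triangle is 3-colorable; the same assignment is also a 137-coloring
        triangle = [(0, 1), (1, 2), (0, 2)]
        print(is_proper_coloring(triangle, {0: 0, 1: 1, 2: 2}, 137))  # True

    The promise (3-colorability) is never checked by a solver; inputs violating it may be answered arbitrarily, which is exactly what distinguishes promise CSPs from ordinary CSPs.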

    Human Demonstrations for Fast and Safe Exploration in Reinforcement Learning

    Reinforcement learning is a promising framework for controlling complex vehicles with a high level of autonomy, since it does not need a dynamic model of the vehicle and is able to adapt to changing conditions. When learning from scratch, the performance of a reinforcement learning controller may initially be poor and, for real-life applications, unsafe. In this paper, the effects of using human demonstrations on the performance of reinforcement learning are investigated, using a combination of offline and online least squares policy iteration. It is found that using the human as an efficient explorer improves learning time and performance for a benchmark reinforcement learning problem. The benefit of the human demonstration is larger for problems where the human can exploit their understanding of the problem to efficiently explore the state space. Applied to a simplified quadrotor slung load drop-off problem, the use of human demonstrations reduces the number of crashes during learning. As such, this paper contributes to safer and faster learning for model-free, adaptive control problems.
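
    The offline half of the combination above can be sketched as least-squares temporal-difference Q evaluation (LSTD-Q) run over recorded human demonstration transitions, yielding an initial value function before any online interaction. The feature map, regularisation, and data layout below are assumptions for illustration.

        import numpy as np

        N_FEAT, GAMMA = 8, 0.95

        def lstdq(transitions, phi, policy, reg=1e-3):
            """Solve A w = b so that Q(s, a) ~= phi(s, a) @ w on the demo data.

            transitions: iterable of (s, a, r, s_next) demonstration tuples
            phi: feature map returning a length-N_FEAT vector for (s, a)
            policy: the policy being evaluated, mapping s -> a
            """
            A = reg * np.eye(N_FEAT)       # regularised to stay invertible
            b = np.zeros(N_FEAT)
            for s, a, r, s_next in transitions:
                f = phi(s, a)
                f_next = phi(s_next, policy(s_next))
                A += np.outer(f, f - GAMMA * f_next)
                b += r * f
            return np.linalg.solve(A, b)

    Policy iteration then alternates: make the policy greedy with respect to the current weights, re-run lstdq, and repeat, first offline on the demonstrations and then online as new transitions arrive.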

    Major liver resection, systemic fibrinolytic activity, and the impact of tranexamic acid

    Background: Hyperfibrinolysis may occur due to systemic inflammation or hepatic injury during liver resection. Tranexamic acid (TXA) is an antifibrinolytic agent that decreases bleeding in various settings, but has not been well studied in patients undergoing liver resection. Methods: In this prospective, phase II trial, 18 patients undergoing major liver resection were sequentially assigned to one of three cohorts: (i) control (no TXA); (ii) TXA Dose I: 1 g bolus followed by 1 g infusion over 8 h; (iii) TXA Dose II: 1 g bolus followed by 10 mg/kg/h until the end of surgery. Serial blood samples were collected for thromboelastography (TEG), coagulation components, and TXA concentration. Results: No abnormalities in hemostatic function were identified on TEG. Plasmin-antiplasmin (PAP) complex levels peaked at 1106 μg/L (normal 0-512 μg/L) following parenchymal transection, then decreased to baseline by the morning after surgery. TXA reached stable, therapeutic concentrations early under both dosing regimens. There were no differences between patients based on TXA administration. Conclusions: There is no thromboelastographic evidence of hyperfibrinolysis in patients undergoing major liver resection. TXA does not influence the change in systemic fibrinolysis; it may reduce bleeding through a different mechanism of action.