8 research outputs found

SMART (Stochastic Model Acquisition with ReinforcemenT) learning agents: A preliminary report

Author: C. Boutilier
G.J. Tesauro
G.L. Drescher
L. Dehaspe
L.P. Kaelbling
R.E. Fikes
R.S. Sutton
S.H. Muggleton
T. Oates
W. Shen
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2005

We present a framework for building agents that learn using SMART, a system that combines stochastic model acquisition with reinforcement learning to enable an agent to model its environment through experience and subsequently form action selection policies using the acquired model. We extend an existing algorithm for automatic creation of stochastic STRIPS operators [9] as a preliminary method of environment modelling. We then define how future states are generated from these operators and an initial state, and finally show how the agent can use the generated states to form a policy with a standard reinforcement learning algorithm. The potential of SMART is exemplified using the well-known predator-prey scenario. Results of applying SMART to this environment and directions for future work are discussed.

City Research Online

Crossref

Linear Least-Squares algorithms for temporal difference learning

Author: A. G. Barto
Andrew G. Barto
C. J. C. H. Watkins
C. W. Anderson
G.C. Goodwin
G.J. Tesauro
H. Robbins
J.G. Kemeny
J.N. Tsitsiklis
L. Ljung
P. Dayan
P.J. Werbos
R.S. Sutton
Steven J. Bradtke
T. Söderström
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/1996

Crossref

Neural networks for computer virus recognition

Author: Kephart J.O.
Sorkin Gregory B.
Tesauro G.J.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/08/1996

We have developed a neural network for generic detection of a particular class of computer viruses: the so-called boot sector viruses, which infect the boot sector of a floppy disk or a hard drive. This is an important and relatively tractable subproblem of generic virus detection. Only about 5% of all known viruses are boot sector viruses, yet they account for nearly 90% of all virus incidents. We have successfully deployed our neural network as a commercial product, distributing it to millions of PC users worldwide as part of the IBM AntiVirus software package. We faced several challenges in taking our neural network from a research idea to a commercial product. These included designing an appropriate input representation scheme; dealing with the scarcity of available training data; finding an appropriate trade-off point between false positives and false negatives to conform to user expectations; and making the software conform to the strict constraints on memory and speed of computation needed to run on PCs. The article discusses our methods for handling these challenges.

LSE Research Online

Reinforcement Learning with Echo State Networks

Author: B. Bakker
D.P. Bertsekas
G. Tesauro
G.J. Gordon
J.L. Elman
M.R. Glickman
R. Sutton
S.J. Russell
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2006

Crossref

Zusammenfassungen. Römische Historische Mitteilungen 60

Author: C.J.C. Burges
D.W. Aha
G. Salton
G.J. Tesauro
I.H. Witten
J.R. Quinlan
L. Breiman
R. Perdisci
W.B. Frakes
Publication venue: oeaw
Publication date: 01/01/2010

Crossref

Elektronisches Publikationsportal der Österreichischen Akademie der Wissenschaften

Architecture of a morphological malware detector

Author: A. Walenstein
E. Filiol
G.J. Tesauro
Guillaume Bonfante
Jean-Yves Marion
M. Christodorescu
Matthieu Kaczmarek
Ph. Beaucamps
Publication venue: Springer Science and Business Media LLC

Crossref

Convergence and divergence in standard and averaging reinforcement learning

Author: C.J.C.H. Watkins
G.J. Tesauro
J.A. Boyan
J.N. Tsitsiklis
J.S. Albus
L. Baird
L.P. Kaelbling
R. Bellman
R.S. Sutton
S.P. Singh
T. Jaakkola
T.J. Perkins
Publication venue: Springer-Verlag
Publication date: 01/01/2004

Although tabular reinforcement learning (RL) methods have been proved to converge to an optimal policy, the combination of particular conventional reinforcement learning techniques with function approximators can lead to divergence. In this paper we show why off-policy RL methods combined with linear function approximators can lead to divergence. Furthermore, we analyze two different types of updates: standard and averaging RL updates. Although averaging RL will not diverge, we show that it can converge to wrong value functions. In our experiments we compare standard to averaging value iteration (VI) with CMACs, and the results show that for small values of the discount factor averaging VI works better, whereas for large values of the discount factor standard VI performs better, although it does not always converge.

CiteSeerX

Crossref

Utrecht University Repository

Prioritized sweeping: Reinforcement learning with less data and less time

Author: A.D. Christiansen
A.G. Barto
A.L. Samuel
A.P. Sage
A.W. Moore
Andrew W. Moore
C. Stanfill
C.J.C.H. Watkins
Christopher G. Atkeson
D. Chapman
D. Michie
D.A. Berry
D.E. Knuth
D.P. Bertsekas
G.J. Tesauro
J. Peng
L.J. Lin
L.P. Kaelbling
M. Sato
N.J. Nilsson
P. Dayan
R.E. Bellman
R.E. Korf
R.S. Sutton
S. Mahadevan
S.B. Thrun
Publication venue: Springer Science and Business Media LLC

Crossref