Reinforcement Learning: A Survey
This paper surveys the field of reinforcement learning from a
computer-science perspective. It is written to be accessible to researchers
familiar with machine learning. Both the historical basis of the field and a
broad selection of current work are summarized. Reinforcement learning is the
problem faced by an agent that learns behavior through trial-and-error
interactions with a dynamic environment. The work described here has a
resemblance to work in psychology, but differs considerably in the details and
in the use of the word ``reinforcement.'' The paper discusses central issues of
reinforcement learning, including trading off exploration and exploitation,
establishing the foundations of the field via Markov decision theory, learning
from delayed reinforcement, constructing empirical models to accelerate
learning, making use of generalization and hierarchy, and coping with hidden
state. It concludes with a survey of some implemented systems and an assessment
of the practical utility of current methods for reinforcement learning.
Comment: See http://www.jair.org/ for any accompanying file
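The exploration/exploitation trade-off and learning from delayed reinforcement that the survey discusses can be sketched with epsilon-greedy Q-learning. This is a generic illustration, not taken from the paper: the two-state MDP, reward structure, and hyperparameters below are all invented for the example.

```python
# Sketch: epsilon-greedy Q-learning on a toy 2-state, 2-action MDP.
# Everything here (dynamics, rewards, hyperparameters) is illustrative.
import random

random.seed(0)

N_STATES, N_ACTIONS = 2, 2
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # assumed hyperparameters

def step(state, action):
    """Toy dynamics: action 1 taken in state 1 pays off; nothing else does."""
    reward = 1.0 if (state == 1 and action == 1) else 0.0
    next_state = 1 if action == 1 else 0
    return reward, next_state

state = 0
for _ in range(5000):
    # explore with probability epsilon, otherwise exploit current estimates
    if random.random() < epsilon:
        action = random.randrange(N_ACTIONS)
    else:
        action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
    reward, next_state = step(state, action)
    # temporal-difference update: move toward reward + discounted future value,
    # which propagates the delayed reward back to earlier states
    Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
    state = next_state
```

After training, the greedy policy prefers action 1 in both states, even though the reward only ever arrives in state 1: the discounted bootstrap term carries the delayed reinforcement back.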
Visualizing Convolutional Networks for MRI-based Diagnosis of Alzheimer's Disease
Visualizing and interpreting convolutional neural networks (CNNs) is an
important task to increase trust in automatic medical decision making systems.
In this study, we train a 3D CNN to detect Alzheimer's disease based on
structural MRI scans of the brain. Then, we apply four different gradient-based
and occlusion-based visualization methods that explain the network's
classification decisions by highlighting relevant areas in the input image. We
compare the methods qualitatively and quantitatively. We find that all four
methods focus on brain regions known to be involved in Alzheimer's disease,
such as inferior and middle temporal gyrus. While the occlusion-based methods
focus more on specific regions, the gradient-based methods pick up distributed
relevance patterns. Additionally, we find that the distribution of relevance
varies across patients, with some having a stronger focus on the temporal lobe,
whereas for others more cortical areas are relevant. In summary, we show that
applying different visualization methods is important to understand the
decisions of a CNN, a step that is crucial to increase clinical impact and
trust in computer-based decision support systems.
Comment: MLCN 201
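The occlusion-based family of methods mentioned above has a simple core idea: mask one region at a time and record how much the class score drops. A minimal 2D sketch is below; the study applies this to 3D MRI volumes with a trained CNN, so the toy scoring function and all sizes here are stand-ins.

```python
# Minimal 2D sketch of occlusion-based relevance mapping. The study uses a
# 3D CNN on MRI scans; here a toy scoring function stands in for the network
# and the image/patch sizes are illustrative.
import numpy as np

def score(image):
    """Stand-in for the CNN's class score: responds only to a fixed region."""
    return float(image[8:16, 8:16].mean())

def occlusion_map(image, patch=4, baseline=0.0):
    """Occlude each patch in turn and record the resulting drop in the score."""
    base = score(image)
    relevance = np.zeros_like(image)
    for i in range(0, image.shape[0], patch):
        for j in range(0, image.shape[1], patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = baseline  # mask this patch
            relevance[i:i + patch, j:j + patch] = base - score(occluded)
    return relevance

img = np.ones((24, 24))
rel = occlusion_map(img)
```

The relevance map is nonzero only where occlusion changes the score, which is why these methods tend to highlight compact, specific regions, as the abstract notes.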
Learning Symbolic Models of Stochastic Domains
In this article, we work towards the goal of developing agents that can learn
to act in complex worlds. We develop a probabilistic, relational planning rule
representation that compactly models noisy, nondeterministic action effects,
and show how such rules can be effectively learned. Through experiments in
simple planning domains and a 3D simulated blocks world with realistic physics,
we demonstrate that this learning algorithm allows agents to effectively model
world dynamics.
Evolving Symbolic Controllers
The idea of symbolic controllers is to bridge the gap between the top-down manual design of the controller architecture, as advocated in Brooks' subsumption architecture, and the bottom-up designer-free approach that is now standard within the Evolutionary Robotics community. The designer provides a set of elementary behaviors, and evolution is given the goal of assembling them to solve complex tasks. Two experiments are presented, demonstrating the efficiency and the recursiveness of this approach. In particular, the sensitivity with respect to the proposed elementary behaviors and the robustness w.r.t. generalization of the resulting controllers are studied in detail.
Competition in Social Networks: Emergence of a Scale-free Leadership Structure and Collective Efficiency
Using the minority game as a model for competition dynamics, we investigate
the effects of inter-agent communications on the global evolution of the
dynamics of a society characterized by competition for limited resources. The
agents communicate across a social network with small-world character that
forms the static substrate of a second network, the influence network, which is
dynamically coupled to the evolution of the game. The influence network is a
directed network, defined by the inter-agent communication links on the
substrate along which communicated information is acted upon. We show that the
influence network spontaneously develops hubs with a broad distribution of
in-degrees, defining a robust leadership structure that is scale-free.
Furthermore, in realistic parameter ranges, facilitated by information exchange
on the network, agents can generate a high degree of cooperation making the
collective almost maximally efficient.
Comment: 4 pages, 2 postscript figures included
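The competition dynamic underlying this paper is the minority game, which can be stated in a few lines of code. The sketch below is the bare game without the communication/influence networks the paper adds on top; the agent count, memory length, and strategy count are illustrative.

```python
# Bare-bones minority game (no inter-agent communication; the paper layers
# its influence network on top of this dynamic). All sizes are illustrative.
import random

random.seed(1)
N, M, S = 101, 3, 2          # odd number of agents, memory bits, strategies each
history = random.randrange(2 ** M)

# a strategy maps each of the 2^M possible histories to an action in {0, 1}
agents = [{"strategies": [[random.randrange(2) for _ in range(2 ** M)]
                          for _ in range(S)],
           "scores": [0] * S} for _ in range(N)]

attendance = []
for _ in range(2000):
    actions = []
    for agent in agents:
        best = max(range(S), key=lambda s: agent["scores"][s])
        actions.append(agent["strategies"][best][history])
    ones = sum(actions)
    minority = 0 if ones > N // 2 else 1   # the side chosen by fewer agents wins
    for agent in agents:
        for s in range(S):                  # reward strategies that predicted it
            if agent["strategies"][s][history] == minority:
                agent["scores"][s] += 1
    attendance.append(ones)
    history = ((history << 1) | minority) % (2 ** M)  # shift outcome into memory

# the variance of the attendance measures collective (in)efficiency
mean = sum(attendance) / len(attendance)
variance = sum((a - mean) ** 2 for a in attendance) / len(attendance)
```

Smaller attendance fluctuations mean fewer wasted resources; the paper's result is that communication over the influence network drives this measure close to its optimum.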
Regression with Linear Factored Functions
Many applications that use empirically estimated functions face a curse of
dimensionality, because the integrals over most function classes must be
approximated by sampling. This paper introduces a novel regression algorithm
that learns linear factored functions (LFF). This class of functions has
structural properties that allow certain integrals to be solved analytically
and point-wise products to be computed. Applications like belief propagation and
reinforcement learning can exploit these properties to break the curse and
speed up computation. We derive a regularized greedy optimization scheme that
learns factored basis functions during training. The novel regression algorithm
performs competitively with Gaussian processes on benchmark tasks, and the
learned LFF functions are very compact, with 4-9 factored basis functions on
average.
Comment: Under review as conference paper at ECML/PKDD 201
Towards a Universal Theory of Artificial Intelligence based on Algorithmic Probability and Sequential Decision Theory
Decision theory formally solves the problem of rational agents in uncertain
worlds if the true environmental probability distribution is known.
Solomonoff's theory of universal induction formally solves the problem of
sequence prediction for unknown distributions. We unify both theories and give
strong arguments that the resulting universal AIXI model behaves optimally in any
computable environment. The major drawback of the AIXI model is that it is
uncomputable. To overcome this problem, we construct a modified algorithm
AIXI^tl, which is still superior to any other time t and space l bounded agent.
The computation time of AIXI^tl is of the order t x 2^l.
Comment: 8 two-column pages, latex2e, 1 figure, submitted to ijcai
A two step algorithm for learning from unspecific reinforcement
We study a simple learning model based on the Hebb rule to cope with
"delayed", unspecific reinforcement. In spite of the unspecific nature of the
information-feedback, convergence to asymptotically perfect generalization is
observed, with a rate depending, however, in a non-universal way on learning
parameters. Asymptotic convergence can be as fast as that of Hebbian learning,
but may be slower. Moreover, for a certain range of parameter settings, it
depends on initial conditions whether the system can reach the regime of
asymptotically perfect generalization, or rather approaches a stationary state
of poor generalization.
Comment: 13 pages LaTeX, 4 figures, note on biologically motivated stochastic
variant of the algorithm added
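The Hebbian baseline whose convergence rate the abstract compares against can be sketched in a teacher-student perceptron setup. This is only the plain Hebb-rule reference point, not the paper's two-step algorithm with delayed, unspecific reinforcement; the dimension and sample count are illustrative.

```python
# Plain Hebbian learning in a teacher-student perceptron setup (the baseline
# the paper compares against; its two-step unspecific-reinforcement mechanism
# is not reproduced here). Dimension and sample count are illustrative.
import math, random

random.seed(2)
N = 100                                    # input dimension
teacher = [random.gauss(0, 1) for _ in range(N)]

def sign(x):
    return 1 if x >= 0 else -1

student = [0.0] * N
for _ in range(2000):
    x = [random.gauss(0, 1) for _ in range(N)]
    label = sign(sum(t * xi for t, xi in zip(teacher, x)))
    # Hebb rule: strengthen each weight in proportion to label * input
    for i in range(N):
        student[i] += label * x[i] / N

# generalization improves as the student vector aligns with the teacher
overlap = sum(s * t for s, t in zip(student, teacher))
norm = math.sqrt(sum(s * s for s in student)) * math.sqrt(sum(t * t for t in teacher))
alignment = overlap / norm
```

Because each example nudges the student toward the teacher direction while orthogonal noise averages out, the normalized overlap approaches 1 as the number of examples grows, which is the "asymptotically perfect generalization" regime the abstract refers to.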
Beyond Hebb: Exclusive-OR and Biological Learning
A learning algorithm for multilayer neural networks based on biologically
plausible mechanisms is studied. Motivated by findings in experimental
neurobiology, we consider synaptic averaging in the induction of plasticity
changes, which happen on a slower time scale than firing dynamics. This
mechanism is shown to enable learning of the exclusive-OR (XOR) problem without
the aid of error back-propagation, as well as to increase robustness of
learning in the presence of noise.
Comment: 4 pages RevTeX, 2 figures PostScript, revised version
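Why XOR needs more than a single Hebbian layer: it is not linearly separable, so some hidden representation is required. The sketch below shows one generic backprop-free route, training only the output weights of a fixed hidden layer with the classical perceptron rule; it does not reproduce the paper's synaptic-averaging mechanism, and the hidden units are invented for the example.

```python
# Illustrative only: one backprop-free way to learn XOR. A fixed (untrained)
# hidden layer makes the problem linearly separable, and only the output
# weights are learned with the classical perceptron rule. This is NOT the
# paper's synaptic-averaging mechanism.
def sign(x):
    return 1 if x >= 0 else -1

def hidden(x1, x2):
    """Fixed threshold units: AND-like, OR-like, and a bias."""
    h1 = 1 if x1 + x2 > 1.5 else 0   # AND
    h2 = 1 if x1 + x2 > 0.5 else 0   # OR
    return [h1, h2, 1]

data = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]
w = [0.0, 0.0, 0.0]
for _ in range(20):
    for (x1, x2), target in data:
        h = hidden(x1, x2)
        if sign(sum(wi * hi for wi, hi in zip(w, h))) != target:
            # perceptron rule applied to the output weights only
            for i in range(3):
                w[i] += target * h[i]

predictions = [sign(sum(wi * hi for wi, hi in zip(w, hidden(x1, x2))))
               for (x1, x2), _ in data]
```

In the [AND, OR] feature space, XOR is linearly separable (roughly "OR but not AND"), so the perceptron rule converges in a few epochs.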