Reinforcement Learning: A Survey
This paper surveys the field of reinforcement learning from a
computer-science perspective. It is written to be accessible to researchers
familiar with machine learning. Both the historical basis of the field and a
broad selection of current work are summarized. Reinforcement learning is the
problem faced by an agent that learns behavior through trial-and-error
interactions with a dynamic environment. The work described here has a
resemblance to work in psychology, but differs considerably in the details and
in the use of the word ``reinforcement.'' The paper discusses central issues of
reinforcement learning, including trading off exploration and exploitation,
establishing the foundations of the field via Markov decision theory, learning
from delayed reinforcement, constructing empirical models to accelerate
learning, making use of generalization and hierarchy, and coping with hidden
state. It concludes with a survey of some implemented systems and an assessment
of the practical utility of current methods for reinforcement learning.
Comment: See http://www.jair.org/ for any accompanying file
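The exploration/exploitation trade-off and learning from delayed reinforcement that the survey discusses can be sketched with epsilon-greedy Q-learning. This is a generic illustration, not taken from the paper: the two-state MDP, reward structure, and hyperparameters below are all invented for the example.

```python
# Sketch: epsilon-greedy Q-learning on a toy 2-state, 2-action MDP.
# Everything here (dynamics, rewards, hyperparameters) is illustrative.
import random

random.seed(0)

N_STATES, N_ACTIONS = 2, 2
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # assumed hyperparameters

def step(state, action):
    """Toy dynamics: action 1 taken in state 1 pays off; nothing else does."""
    reward = 1.0 if (state == 1 and action == 1) else 0.0
    next_state = 1 if action == 1 else 0
    return reward, next_state

state = 0
for _ in range(5000):
    # explore with probability epsilon, otherwise exploit current estimates
    if random.random() < epsilon:
        action = random.randrange(N_ACTIONS)
    else:
        action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
    reward, next_state = step(state, action)
    # temporal-difference update: move toward reward + discounted future value,
    # which propagates the delayed reward back to earlier states
    Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
    state = next_state
```

After training, the greedy policy prefers action 1 in both states, even though the reward only ever arrives in state 1: the discounted bootstrap term carries the delayed reinforcement back.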
Visualizing Convolutional Networks for MRI-based Diagnosis of Alzheimer's Disease
Visualizing and interpreting convolutional neural networks (CNNs) is an
important task to increase trust in automatic medical decision making systems.
In this study, we train a 3D CNN to detect Alzheimer's disease based on
structural MRI scans of the brain. Then, we apply four different gradient-based
and occlusion-based visualization methods that explain the network's
classification decisions by highlighting relevant areas in the input image. We
compare the methods qualitatively and quantitatively. We find that all four
methods focus on brain regions known to be involved in Alzheimer's disease,
such as inferior and middle temporal gyrus. While the occlusion-based methods
focus more on specific regions, the gradient-based methods pick up distributed
relevance patterns. Additionally, we find that the distribution of relevance
varies across patients, with some having a stronger focus on the temporal lobe,
whereas for others more cortical areas are relevant. In summary, we show that
applying different visualization methods is important to understand the
decisions of a CNN, a step that is crucial to increase clinical impact and
trust in computer-based decision support systems.
Comment: MLCN 201
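The occlusion-based family of methods mentioned above has a simple core idea: mask one region at a time and record how much the class score drops. A minimal 2D sketch is below; the study applies this to 3D MRI volumes with a trained CNN, so the toy scoring function and all sizes here are stand-ins.

```python
# Minimal 2D sketch of occlusion-based relevance mapping. The study uses a
# 3D CNN on MRI scans; here a toy scoring function stands in for the network
# and the image/patch sizes are illustrative.
import numpy as np

def score(image):
    """Stand-in for the CNN's class score: responds only to a fixed region."""
    return float(image[8:16, 8:16].mean())

def occlusion_map(image, patch=4, baseline=0.0):
    """Occlude each patch in turn and record the resulting drop in the score."""
    base = score(image)
    relevance = np.zeros_like(image)
    for i in range(0, image.shape[0], patch):
        for j in range(0, image.shape[1], patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = baseline  # mask this patch
            relevance[i:i + patch, j:j + patch] = base - score(occluded)
    return relevance

img = np.ones((24, 24))
rel = occlusion_map(img)
```

The relevance map is nonzero only where occlusion changes the score, which is why these methods tend to highlight compact, specific regions, as the abstract notes.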
Learning Symbolic Models of Stochastic Domains
In this article, we work towards the goal of developing agents that can learn
to act in complex worlds. We develop a probabilistic, relational planning rule
representation that compactly models noisy, nondeterministic action effects,
and show how such rules can be effectively learned. Through experiments in
simple planning domains and a 3D simulated blocks world with realistic physics,
we demonstrate that this learning algorithm allows agents to effectively model
world dynamics.
Evolving Symbolic Controllers
The idea of symbolic controllers is to bridge the gap between the top-down manual design of the controller architecture, as advocated in Brooks' subsumption architecture, and the bottom-up designer-free approach that is now standard within the Evolutionary Robotics community. The designer provides a set of elementary behaviors, and evolution is given the goal of assembling them to solve complex tasks. Two experiments are presented, demonstrating the efficiency and the recursiveness of this approach. In particular, the sensitivity with respect to the proposed elementary behaviors and the robustness w.r.t. generalization of the resulting controllers are studied in detail.
Competition in Social Networks: Emergence of a Scale-free Leadership Structure and Collective Efficiency
Using the minority game as a model for competition dynamics, we investigate
the effects of inter-agent communications on the global evolution of the
dynamics of a society characterized by competition for limited resources. The
agents communicate across a social network with small-world character that
forms the static substrate of a second network, the influence network, which is
dynamically coupled to the evolution of the game. The influence network is a
directed network, defined by the inter-agent communication links on the
substrate along which communicated information is acted upon. We show that the
influence network spontaneously develops hubs with a broad distribution of
in-degrees, defining a robust leadership structure that is scale-free.
Furthermore, in realistic parameter ranges, facilitated by information exchange
on the network, agents can generate a high degree of cooperation making the
collective almost maximally efficient.
Comment: 4 pages, 2 postscript figures included
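The competition dynamic underlying this paper is the minority game, which can be stated in a few lines of code. The sketch below is the bare game without the communication/influence networks the paper adds on top; the agent count, memory length, and strategy count are illustrative.

```python
# Bare-bones minority game (no inter-agent communication; the paper layers
# its influence network on top of this dynamic). All sizes are illustrative.
import random

random.seed(1)
N, M, S = 101, 3, 2          # odd number of agents, memory bits, strategies each
history = random.randrange(2 ** M)

# a strategy maps each of the 2^M possible histories to an action in {0, 1}
agents = [{"strategies": [[random.randrange(2) for _ in range(2 ** M)]
                          for _ in range(S)],
           "scores": [0] * S} for _ in range(N)]

attendance = []
for _ in range(2000):
    actions = []
    for agent in agents:
        best = max(range(S), key=lambda s: agent["scores"][s])
        actions.append(agent["strategies"][best][history])
    ones = sum(actions)
    minority = 0 if ones > N // 2 else 1   # the side chosen by fewer agents wins
    for agent in agents:
        for s in range(S):                  # reward strategies that predicted it
            if agent["strategies"][s][history] == minority:
                agent["scores"][s] += 1
    attendance.append(ones)
    history = ((history << 1) | minority) % (2 ** M)  # shift outcome into memory

# the variance of the attendance measures collective (in)efficiency
mean = sum(attendance) / len(attendance)
variance = sum((a - mean) ** 2 for a in attendance) / len(attendance)
```

Smaller attendance fluctuations mean fewer wasted resources; the paper's result is that communication over the influence network drives this measure close to its optimum.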
Regression with Linear Factored Functions
Many applications that use empirically estimated functions face a curse of
dimensionality, because the integrals over most function classes must be
approximated by sampling. This paper introduces a novel regression algorithm
that learns linear factored functions (LFF). This class of functions has
structural properties that allow certain integrals to be solved analytically
and point-wise products to be computed. Applications like belief propagation and
reinforcement learning can exploit these properties to break the curse and
speed up computation. We derive a regularized greedy optimization scheme that
learns factored basis functions during training. The novel regression algorithm
performs competitively with Gaussian processes on benchmark tasks, and the
learned LFF functions are very compact, with 4-9 factored basis functions on
average.
Comment: Under review as conference paper at ECML/PKDD 201
Towards a Universal Theory of Artificial Intelligence based on Algorithmic Probability and Sequential Decision Theory
Decision theory formally solves the problem of rational agents in uncertain
worlds if the true environmental probability distribution is known.
Solomonoff's theory of universal induction formally solves the problem of
sequence prediction for unknown distributions. We unify both theories and give
strong arguments that the resulting universal AIXI model behaves optimally in any
computable environment. The major drawback of the AIXI model is that it is
uncomputable. To overcome this problem, we construct a modified algorithm
AIXI^tl, which is still superior to any other time t and space l bounded agent.
The computation time of AIXI^tl is of the order t x 2^l.
Comment: 8 two-column pages, latex2e, 1 figure, submitted to ijcai
A two step algorithm for learning from unspecific reinforcement
We study a simple learning model based on the Hebb rule to cope with
"delayed", unspecific reinforcement. In spite of the unspecific nature of the
information-feedback, convergence to asymptotically perfect generalization is
observed, with a rate depending, however, in a non-universal way on learning
parameters. Asymptotic convergence can be as fast as that of Hebbian learning,
but may be slower. Moreover, for a certain range of parameter settings, it
depends on initial conditions whether the system can reach the regime of
asymptotically perfect generalization, or rather approaches a stationary state
of poor generalization.
Comment: 13 pages LaTeX, 4 figures, note on biologically motivated stochastic
variant of the algorithm added
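The Hebbian baseline whose convergence rate the abstract compares against can be sketched in a teacher-student perceptron setup. This is only the plain Hebb-rule reference point, not the paper's two-step algorithm with delayed, unspecific reinforcement; the dimension and sample count are illustrative.

```python
# Plain Hebbian learning in a teacher-student perceptron setup (the baseline
# the paper compares against; its two-step unspecific-reinforcement mechanism
# is not reproduced here). Dimension and sample count are illustrative.
import math, random

random.seed(2)
N = 100                                    # input dimension
teacher = [random.gauss(0, 1) for _ in range(N)]

def sign(x):
    return 1 if x >= 0 else -1

student = [0.0] * N
for _ in range(2000):
    x = [random.gauss(0, 1) for _ in range(N)]
    label = sign(sum(t * xi for t, xi in zip(teacher, x)))
    # Hebb rule: strengthen each weight in proportion to label * input
    for i in range(N):
        student[i] += label * x[i] / N

# generalization improves as the student vector aligns with the teacher
overlap = sum(s * t for s, t in zip(student, teacher))
norm = math.sqrt(sum(s * s for s in student)) * math.sqrt(sum(t * t for t in teacher))
alignment = overlap / norm
```

Because each example nudges the student toward the teacher direction while orthogonal noise averages out, the normalized overlap approaches 1 as the number of examples grows, which is the "asymptotically perfect generalization" regime the abstract refers to.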
Beyond Hebb: Exclusive-OR and Biological Learning
A learning algorithm for multilayer neural networks based on biologically
plausible mechanisms is studied. Motivated by findings in experimental
neurobiology, we consider synaptic averaging in the induction of plasticity
changes, which happen on a slower time scale than firing dynamics. This
mechanism is shown to enable learning of the exclusive-OR (XOR) problem without
the aid of error back-propagation, as well as to increase robustness of
learning in the presence of noise.
Comment: 4 pages RevTeX, 2 figures PostScript, revised version
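Why XOR needs more than a single Hebbian layer: it is not linearly separable, so some hidden representation is required. The sketch below shows one generic backprop-free route, training only the output weights of a fixed hidden layer with the classical perceptron rule; it does not reproduce the paper's synaptic-averaging mechanism, and the hidden units are invented for the example.

```python
# Illustrative only: one backprop-free way to learn XOR. A fixed (untrained)
# hidden layer makes the problem linearly separable, and only the output
# weights are learned with the classical perceptron rule. This is NOT the
# paper's synaptic-averaging mechanism.
def sign(x):
    return 1 if x >= 0 else -1

def hidden(x1, x2):
    """Fixed threshold units: AND-like, OR-like, and a bias."""
    h1 = 1 if x1 + x2 > 1.5 else 0   # AND
    h2 = 1 if x1 + x2 > 0.5 else 0   # OR
    return [h1, h2, 1]

data = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]
w = [0.0, 0.0, 0.0]
for _ in range(20):
    for (x1, x2), target in data:
        h = hidden(x1, x2)
        if sign(sum(wi * hi for wi, hi in zip(w, h))) != target:
            # perceptron rule applied to the output weights only
            for i in range(3):
                w[i] += target * h[i]

predictions = [sign(sum(wi * hi for wi, hi in zip(w, hidden(x1, x2))))
               for (x1, x2), _ in data]
```

In the [AND, OR] feature space, XOR is linearly separable (roughly "OR but not AND"), so the perceptron rule converges in a few epochs.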