Reinforcement Learning: A Survey

Author: Kaelbling L. P.
Littman M. L.
Moore A. W.
Publication date: 01/01/1996

This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning. Comment: See http://www.jair.org/ for any accompanying file

arXiv.org e-Print Archive

CiteSeerX

Learning Symbolic Models of Stochastic Domains

Author: Kaelbling L. P.
Pasula H. M.
Zettlemoyer L. S.
Publication venue: 'AI Access Foundation'
Publication date: 10/10/2011

In this article, we work towards the goal of developing agents that can learn to act in complex worlds. We develop a probabilistic, relational planning rule representation that compactly models noisy, nondeterministic action effects, and show how such rules can be effectively learned. Through experiments in simple planning domains and a 3D simulated blocks world with realistic physics, we demonstrate that this learning algorithm allows agents to effectively model world dynamics.

arXiv.org e-Print Archive

Crossref

Learning probabilistic relational planning rules

Author: Hanna M. Pasula
Leslie Pack Kaelbling
Luke S. Zettlemoyer

To learn to behave in highly complex domains, agents must represent and learn compact models of the world dynamics. In this paper, we present an algorithm for learning probabilistic STRIPS-like planning operators from examples. We demonstrate the effective learning of rule-based operators for a wide range of traditional planning domains.

CiteSeerX

Competition in Social Networks: Emergence of a Scale-free Leadership Structure and Collective Efficiency

Author: B. W. Arthur
D. J. Watts
G. Korniss
Kevin E. Bassler
L. P. Kaelbling
M. Anghel
S. A. Kauffman
Zoltán Toroczkai
Publication venue: 'American Physical Society (APS)'
Publication date: 30/07/2003

Using the minority game as a model for competition dynamics, we investigate the effects of inter-agent communications on the global evolution of the dynamics of a society characterized by competition for limited resources. The agents communicate across a social network with small-world character that forms the static substrate of a second network, the influence network, which is dynamically coupled to the evolution of the game. The influence network is a directed network, defined by the inter-agent communication links on the substrate along which communicated information is acted upon. We show that the influence network spontaneously develops hubs with a broad distribution of in-degrees, defining a robust leadership structure that is scale-free. Furthermore, in realistic parameter ranges, facilitated by information exchange on the network, agents can generate a high degree of cooperation, making the collective almost maximally efficient. Comment: 4 pages, 2 postscript figures include

arXiv.org e-Print Archive

Crossref

Fermionic Molecular Dynamics for nuclear dynamics and thermodynamics

Author: D.A. Berry
F.P. Kelly
K.E. Avrachenkov
L.P. Kaelbling
M. Hutter
P. Samuelson
P.R. Kumar
R.H. Strotz
R.S. Sutton
S. Frederick
S. Kakade
S. Mahadevan
S.J. Russell
Publication date: 01/01/2006

A new Fermionic Molecular Dynamics (FMD) model based on a Skyrme functional is proposed in this paper. After introducing the basic formalism, some first applications to nuclear structure and nuclear thermodynamics are presented. Comment: 5 pages, Proceedings of the French-Japanese Symposium, September 2008. To be published in Int. J. of Mod. Phys.

arXiv.org e-Print Archive

HAL - Normandie Université

CiteSeerX

HAL-IN2P3

Crossref

The Australian National University

HAL-CEA

Self-Modification of Policy and Utility Function in Rational Agents

Author: B Hibbard
D Dewey
D Silver
J Schmidhuber
L Orseau
LP Kaelbling
M Hutter
M Ring
N Bostrom
R Sutton
RV Yampolskiy
S Legg
V Mnih
Publication date: 10/05/2016

Any agent that is part of the environment it interacts with and has versatile actuators (such as arms and fingers) will in principle have the ability to self-modify, for example by changing its own source code. As we continue to create more and more intelligent agents, chances increase that they will learn about this ability. The question is: will they want to use it? For example, highly intelligent systems may find ways to change their goals to something more easily achievable, thereby 'escaping' the control of their designers. In an important paper, Omohundro (2008) argued that goal preservation is a fundamental drive of any intelligent system, since a goal is more likely to be achieved if future versions of the agent strive towards the same goal. In this paper, we formalise this argument in general reinforcement learning, and explore situations where it fails. Our conclusion is that the self-modification possibility is harmless if and only if the value function of the agent anticipates the consequences of self-modifications and uses the current utility function when evaluating the future. Comment: Artificial General Intelligence (AGI) 201

arXiv.org e-Print Archive

Crossref

The Australian National University

Learning Users’ Interests in a Market-Based Recommender System

Author: J. Herlocker
L.P. Kaelbling
M. Montaner
P. Resnick
T. Mitchell
Y.Z. Wei
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2004

Recommender systems are widely used to cope with the problem of information overload and, consequently, many recommendation methods have been developed. However, no one technique is best for all users in all situations. To combat this, we have previously developed a market-based recommender system that allows multiple agents (each representing a different recommendation method or system) to compete with one another to present their best recommendations to the user. Our marketplace thus coordinates multiple recommender agents and ensures only the best recommendations are presented. To do this effectively, however, each agent needs to learn the users’ interests and adapt its recommending behaviour accordingly. To this end, in this paper, we develop a reinforcement learning and Boltzmann exploration strategy that the recommender agents can use for these tasks. We then demonstrate that this strategy helps the agents to effectively obtain information about the users’ interests which, in turn, speeds up the market convergence and enables the system to rapidly highlight the best recommendations.

Crossref

Southampton (e-Prints Soton)

Spiral - Imperial College Digital Repository

The Complexity of Graph-Based Reductions for Reachability in Markov Decision Processes

Author: AL Strehl
C Baier
C Courcoubetis
C Dehnert
Krishnendu Chatterjee
L Valiant
LP Kaelbling
M Kwiatkowska
M Steinmetz
ML Puterman
N Fijalkow
PR D’Argenio
S Fortune
SJ Russell
T Brázdil
T Eilam-Tzoreff
Publication date: 01/01/2018

We study the never-worse relation (NWR) for Markov decision processes with an infinite-horizon reachability objective. A state q is never worse than a state p if the maximal probability of reaching the target set of states from p is at most the same value from q, regardless of the probabilities labelling the transitions. Extremal-probability states, end components, and essential states are all special cases of the equivalence relation induced by the NWR. Using the NWR, states in the same equivalence class can be collapsed. Then, actions leading to suboptimal states can be removed. We show the natural decision problem associated to computing the NWR is coNP-complete. Finally, we extend a previously known incomplete polynomial-time iterative algorithm to under-approximate the NWR.

arXiv.org e-Print Archive

Crossref

Institutional Repository Universiteit Antwerpen

DI-fusion

Evolving Symbolic Controllers

Author: D. Floreano
H.-P. Schwefel
J.R. Millan
L. P. Kaelbling
M. Keijzer
R. A. Brooks
Publication venue: Springer Verlag
Publication date: 01/01/2003

The idea of symbolic controllers tries to bridge the gap between the top-down manual design of the controller architecture, as advocated in Brooks' subsumption architecture, and the bottom-up designer-free approach that is now standard within the Evolutionary Robotics community. The designer provides a set of elementary behaviors, and evolution is given the goal of assembling them to solve complex tasks. Two experiments are presented, demonstrating the efficiency and showing the recursiveness of this approach. In particular, the sensitivity with respect to the proposed elementary behaviors and the robustness with respect to generalization of the resulting controllers are studied in detail.

HAL-CentraleSupelec

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Polytechnique

HAL-Rennes 1

A two step algorithm for learning from unspecific reinforcement

Author: Barto A G
Biehl M
Bös S
Hertz J
Ion-Olimpiu Stamatescu
Kaelbling L P
Kinouchi O
Mlodinow L
Reimer Kühn
Stamatescu I-O
Sutton R S
Vallet F
Watkins C J C H
Publication venue: 'IOP Publishing'
Publication date: 01/01/1999

We study a simple learning model based on the Hebb rule to cope with "delayed", unspecific reinforcement. In spite of the unspecific nature of the information-feedback, convergence to asymptotically perfect generalization is observed, with a rate depending, however, in a non-universal way on learning parameters. Asymptotic convergence can be as fast as that of Hebbian learning, but may be slower. Moreover, for a certain range of parameter settings, it depends on initial conditions whether the system can reach the regime of asymptotically perfect generalization, or rather approaches a stationary state of poor generalization. Comment: 13 pages LaTeX, 4 figures, note on biologically motivated stochastic variant of the algorithm adde

arXiv.org e-Print Archive

CiteSeerX

Crossref