295 research outputs found

Competition in Social Networks: Emergence of a Scale-free Leadership Structure and Collective Efficiency

Author: B. W. Arthur
D. J. Watts
G. Korniss
Kevin E. Bassler
L. P. Kaelbling
M. Anghel
S. A. Kauffman
Z. Toroczkai
Zoltán Toroczkai
Publication venue: 'American Physical Society (APS)'
Publication date: 30/07/2003

Using the minority game as a model for competition dynamics, we investigate the effects of inter-agent communications on the global evolution of the dynamics of a society characterized by competition for limited resources. The agents communicate across a social network with small-world character that forms the static substrate of a second network, the influence network, which is dynamically coupled to the evolution of the game. The influence network is a directed network, defined by the inter-agent communication links on the substrate along which communicated information is acted upon. We show that the influence network spontaneously develops hubs with a broad distribution of in-degrees, defining a robust leadership structure that is scale-free. Furthermore, in realistic parameter ranges, facilitated by information exchange on the network, agents can generate a high degree of cooperation making the collective almost maximally efficient. Comment: 4 pages, 2 postscript figures include
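As a rough illustration of the underlying game only, the following sketch plays the standard minority game without any communication between agents. It does not model the small-world substrate or the influence network described in the abstract. The function name `play_minority_game`, its parameters, and their default values are assumptions made for this example.

```python
import random
from itertools import product


def play_minority_game(n_agents=101, memory=3, n_strategies=2, rounds=200, seed=0):
    """Play a basic minority game and return the number of agents choosing 1 each round."""
    if n_agents % 2 == 0:
        raise ValueError("n_agents must be odd so that a strict minority always exists")
    rng = random.Random(seed)
    histories = list(product((0, 1), repeat=memory))
    # Each strategy is a lookup table mapping the recent history to an action (0 or 1).
    strategies = [
        [{h: rng.randint(0, 1) for h in histories} for _ in range(n_strategies)]
        for _ in range(n_agents)
    ]
    scores = [[0] * n_strategies for _ in range(n_agents)]
    history = tuple(rng.randint(0, 1) for _ in range(memory))
    attendance = []
    for _ in range(rounds):
        # Every agent plays its best-scoring strategy (ties go to the lowest index).
        actions = []
        for a in range(n_agents):
            best = max(range(n_strategies), key=lambda s: scores[a][s])
            actions.append(strategies[a][best][history])
        ones = sum(actions)
        minority = 1 if ones < n_agents - ones else 0
        # Reward every strategy that would have chosen the minority side.
        for a in range(n_agents):
            for s in range(n_strategies):
                if strategies[a][s][history] == minority:
                    scores[a][s] += 1
        attendance.append(ones)
        history = history[1:] + (minority,)
    return attendance
```

The closer the attendance stays to half of the agents, the fewer resources are wasted, which is the sense of collective efficiency the abstract refers to.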

arXiv.org e-Print Archive

Crossref

Learning Users’ Interests in a Market-Based Recommender System

Author: J. Herlocker
L.P. Kaelbling
M. Montaner
P. Resnick
T. Mitchell
Y.Z. Wei
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2004

Recommender systems are widely used to cope with the problem of information overload and, consequently, many recommendation methods have been developed. However, no one technique is best for all users in all situations. To combat this, we have previously developed a market-based recommender system that allows multiple agents (each representing a different recommendation method or system) to compete with one another to present their best recommendations to the user. Our marketplace thus coordinates multiple recommender agents and ensures only the best recommendations are presented. To do this effectively, however, each agent needs to learn the users’ interests and adapt its recommending behaviour accordingly. To this end, in this paper, we develop a reinforcement learning and Boltzmann exploration strategy that the recommender agents can use for these tasks. We then demonstrate that this strategy helps the agents to effectively obtain information about the users’ interests which, in turn, speeds up the market convergence and enables the system to rapidly highlight the best recommendations
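For readers unfamiliar with Boltzmann exploration, the following minimal sketch shows the general technique: actions are sampled with probability proportional to the exponential of their estimated value divided by a temperature. It is not the paper's implementation; the function names and the `temperature` parameter are assumptions made for this example.

```python
import math
import random


def boltzmann_probabilities(values, temperature):
    """Return softmax (Boltzmann) selection probabilities for estimated action values."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(values)
    weights = [math.exp((v - top) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def choose_action(values, temperature, rng=random):
    """Sample an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probabilities(values, temperature)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]
```

A high temperature makes the choice nearly uniform (more exploration), while a low temperature concentrates probability on the highest-valued action (more exploitation).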

Crossref

Southampton (e-Prints Soton)

Spiral - Imperial College Digital Repository

Self-Modification of Policy and Utility Function in Rational Agents

Author: B Hibbard
D Dewey
D Silver
J Schmidhuber
L Orseau
LP Kaelbling
M Hutter
M Ring
N Bostrom
R Sutton
RV Yampolskiy
S Legg
V Mnih
Publication date: 10/05/2016

Any agent that is part of the environment it interacts with and has versatile actuators (such as arms and fingers) will in principle have the ability to self-modify, for example by changing its own source code. As we continue to create more and more intelligent agents, chances increase that they will learn about this ability. The question is: will they want to use it? For example, highly intelligent systems may find ways to change their goals to something more easily achievable, thereby 'escaping' the control of their designers. In an important paper, Omohundro (2008) argued that goal preservation is a fundamental drive of any intelligent system, since a goal is more likely to be achieved if future versions of the agent strive towards the same goal. In this paper, we formalise this argument in general reinforcement learning, and explore situations where it fails. Our conclusion is that the self-modification possibility is harmless if and only if the value function of the agent anticipates the consequences of self-modifications and uses the current utility function when evaluating the future. Comment: Artificial General Intelligence (AGI) 201

arXiv.org e-Print Archive

Crossref

The Australian National University

Towards a Universal Theory of Artificial Intelligence based on Algorithmic Probability and Sequential Decision Theory

Author: A. N. Kolmogorov
D. P. Bertsekas
G. J. Chaitin
J. Schmidhuber
L. A. Levin
L. P. Kaelbling
M. Feder
P. Gács
R. Bellman
R. J. Solomonoff
R. Sutton
S. J. Russell
Publication date: 01/01/2000

Decision theory formally solves the problem of rational agents in uncertain worlds if the true environmental probability distribution is known. Solomonoff's theory of universal induction formally solves the problem of sequence prediction for unknown distributions. We unify both theories and give strong arguments that the resulting universal AIXI model behaves optimally in any computable environment. The major drawback of the AIXI model is that it is uncomputable. To overcome this problem, we construct a modified algorithm AIXI^tl, which is still superior to any other time t and space l bounded agent. The computation time of AIXI^tl is of the order t x 2^l. Comment: 8 two-column pages, latex2e, 1 figure, submitted to ijca

arXiv.org e-Print Archive

CiteSeerX

Crossref

The Australian National University

A two step algorithm for learning from unspecific reinforcement

Author: Barto A G
Biehl M
Bös S
Hertz J
Ion-Olimpiu Stamatescu
Kaelbling L P
Kinouchi O
Mlodinow L
Reimer Kühn
Stamatescu I-O
Sutton R S
Vallet F
Watkins C J C H
Publication venue: 'IOP Publishing'
Publication date: 01/01/1999

We study a simple learning model based on the Hebb rule to cope with "delayed", unspecific reinforcement. In spite of the unspecific nature of the information-feedback, convergence to asymptotically perfect generalization is observed, with a rate depending, however, in a non-universal way on learning parameters. Asymptotic convergence can be as fast as that of Hebbian learning, but may be slower. Moreover, for a certain range of parameter settings, it depends on initial conditions whether the system can reach the regime of asymptotically perfect generalization, or rather approaches a stationary state of poor generalization. Comment: 13 pages LaTeX, 4 figures, note on biologically motivated stochastic variant of the algorithm adde

arXiv.org e-Print Archive

CiteSeerX

Crossref

Perceptual Context in Cognitive Hierarchies

Author: Charles D. Gilbert
David H. Hubel
George Konidaris
GL Drescher
Irving Biederman
J Hawkins
J Johnson
J Pearl
JS Albus
Leslie Pack Kaelbling
M Minsky
NJ Nilsson
P Cavanagh
RA Brooks
S Beer
TG Dietterich
V Lepetit
VF Turchin
WR Ashby
Publication date: 07/01/2018

Cognition depends not only on bottom-up sensor feature abstraction, but also on contextual information being passed top-down. Context is higher-level information that helps to predict belief states at lower levels. The main contribution of this paper is to provide a formalisation of perceptual context and its integration into a new process model for cognitive hierarchies. Several simple instantiations of a cognitive hierarchy are used to illustrate the role of context. Notably, we demonstrate the use of context in a novel approach to visually track the pose of rigid objects with just a 2D camera.

arXiv.org e-Print Archive

Crossref

Bayesian optimization for materials design

Author: A Booker
A Forrester
AB Gelman
AIJ Forrester
B Ankenman
BE Stuckman
CE Rasmussen
D Huang
David Ginsbourger
Diana M. Negoescu
DR Jones
HJ Kushner
J Bect
J Knowles
J Mockus
J Villemonteix
J Xie
LP Kaelbling
Noel Cressie
PI Frazier
R Waeber
RA Howard
RS Sutton
Sethuraman Sankaran
TJ Santner
W Scott
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 03/06/2015

We introduce Bayesian optimization, a technique developed for optimizing time-consuming engineering simulations and for fitting machine learning models on large datasets. Bayesian optimization guides the choice of experiments during materials design and discovery to find good material designs in as few experiments as possible. We focus on the case when materials designs are parameterized by a low-dimensional vector. Bayesian optimization is built on a statistical technique called Gaussian process regression, which allows predicting the performance of a new design based on previously tested designs. After providing a detailed introduction to Gaussian process regression, we introduce two Bayesian optimization methods: expected improvement, for design problems with noise-free evaluations; and the knowledge-gradient method, which generalizes expected improvement and may be used in design problems with noisy evaluations. Both methods are derived using a value-of-information analysis, and enjoy one-step Bayes-optimality
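The expected improvement criterion mentioned above has a well-known closed form when the model's prediction for a candidate design is Gaussian. The sketch below evaluates that textbook formula for a maximization problem with noise-free evaluations. The function and argument names are assumptions made for this example, and the Gaussian process that would supply `mean` and `std` is not shown.

```python
import math


def expected_improvement(mean, std, best_so_far):
    """Expected improvement of a candidate whose predicted value is N(mean, std**2).

    Assumes maximization and noise-free evaluations of previously tested designs.
    """
    if std <= 0.0:
        # With no predictive uncertainty the improvement is deterministic.
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    return (mean - best_so_far) * cdf + std * pdf
```

In a Bayesian optimization loop, the next design to test would typically be the candidate with the largest value of this quantity.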

arXiv.org e-Print Archive

Crossref

Active Learning in Persistent Surveillance UAV Missions

Author: Astrom K. J.
Barto A.
Bertuccelli L.F.
Howard R. A.
Iyengar G.
Kaelbling L. P.
Moore A. W.
Nilim A.
Puterman M. L.
Russell S. J.
Tan M.
Watkins C.
Publication venue: 'American Institute of Aeronautics and Astronautics (AIAA)'
Publication date: 01/04/2009

The performance of many complex UAV decision-making problems can be extremely sensitive to small errors in the model parameters. One way of mitigating this sensitivity is by designing algorithms that more effectively learn the model throughout the course of a mission. This paper addresses this important problem by considering model uncertainty in a multi-agent Markov Decision Process (MDP) and using an active learning approach to quickly learn transition model parameters. We build on previous research that allowed UAVs to passively update model parameter estimates by incorporating new state transition observations. In this work, however, the UAVs choose to actively reduce the uncertainty in their model parameters by taking exploratory and informative actions. These actions result in faster adaptation and, by explicitly accounting for UAV fuel dynamics, also mitigate the risk of the exploration. This paper compares the nominal, passive learning approach against two methods for incorporating active learning into the MDP framework: (1) all state transitions are rewarded equally, and (2) state transition rewards are weighted according to the expected resulting reduction in the variance of the model parameter. In both cases, agent behaviors emerge that enable faster convergence of the uncertain model parameters to their true values.
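To make the idea of weighting by expected variance reduction concrete, the sketch below assumes a single uncertain transition probability with a Beta(a, b) posterior and computes how much one further observation is expected to shrink its variance. The abstract does not state which prior the authors use, so the Beta model and both function names are assumptions made for this example.

```python
def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    n = a + b
    return a * b / (n * n * (n + 1.0))


def expected_variance_reduction(a, b):
    """Expected drop in posterior variance after one more Bernoulli observation."""
    n = a + b
    p_success = a / n  # posterior predictive probability of observing the transition
    expected_after = (
        p_success * beta_variance(a + 1, b)
        + (1.0 - p_success) * beta_variance(a, b + 1)
    )
    return beta_variance(a, b) - expected_after
```

Under this assumed model, transitions that have been observed only a few times yield a larger expected reduction, so an exploration reward proportional to it would steer agents toward poorly known parts of the model.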

DSpace@MIT

Crossref

Toward Automatic Verification of Multiagent Systems for Training Simulations

Author: J. Klatt
J.M. Kim
L.P. Kaelbling
M. Tambe
W.K. Hastings
W.R. Gilks
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Advances in multiagent systems have led to their successful application in experiential training simulations, where students learn by interacting with agents who represent people, groups, structures, etc. These multiagent simulations must model the training scenario so that the students’ success is correlated with the degree to which they follow the intended pedagogy. As these simulations increase in size and richness, it becomes harder to guarantee that the agents accurately encode the pedagogy. Testing with human subjects provides the most accurate feedback, but it can explore only a limited subspace of simulation paths. In this paper, we present a mechanism for using human data to verify the degree to which the simulation encodes the intended pedagogy. Starting with an analysis of data from a deployed multiagent training simulation, we then present an automated mechanism for using the human data to generate a distribution appropriate for sampling simulation paths. By generalizing from a small set of human data, the automated approach can systematically explore a much larger space of possible training paths and verify the degree to which a multiagent training simulation adheres to its intended pedagogy.

CiteSeerX

Crossref

Information theoretic approach to interactive learning

Author: Atkinson A. C. Bogacka B. Zhigljavsky A. A. (Editors)
Balcan M.-F.
Box G.
Dasgupta S.
Engel A.
Fedorov V. V.
Pack-Kaelbling L.
S. Still
Schmidhuber J.
Shannon C. E.
Still S. Bialek W.
Still S. Crutchfield J. P. Ellison C.
Still S. Precup D.
Sutton R. S.
Tishby N.
Vapnik V.
Publication venue: 'IOP Publishing'
Publication date: 30/01/2009

The principles of statistical mechanics and information theory play an important role in learning and have inspired both theory and the design of numerous machine learning algorithms. The new aspect in this paper is a focus on integrating feedback from the learner. A quantitative approach to interactive learning and adaptive behavior is proposed, integrating model- and decision-making into one theoretical framework. This paper follows simple principles by requiring that the observer's world model and action policy should result in maximal predictive power at minimal complexity. Classes of optimal action policies and of optimal models are derived from an objective function that reflects this trade-off between prediction and complexity. The resulting optimal models then summarize, at different levels of abstraction, the process's causal organization in the presence of the learner's actions. A fundamental consequence of the proposed principle is that the learner's optimal action policies balance exploration and control as an emerging property. Interestingly, the explorative component is present in the absence of policy randomness, i.e. in the optimal deterministic behavior. This is a direct result of requiring maximal predictive power in the presence of feedback. Comment: 6 page

arXiv.org e-Print Archive

CiteSeerX

Crossref

EDP Sciences OAI-PMH repository (1.2.0)