295 research outputs found
Competition in Social Networks: Emergence of a Scale-free Leadership Structure and Collective Efficiency
Using the minority game as a model for competition dynamics, we investigate
the effects of inter-agent communications on the global evolution of the
dynamics of a society characterized by competition for limited resources. The
agents communicate across a social network with small-world character that
forms the static substrate of a second network, the influence network, which is
dynamically coupled to the evolution of the game. The influence network is a
directed network, defined by the inter-agent communication links on the
substrate along which communicated information is acted upon. We show that the
influence network spontaneously develops hubs with a broad distribution of
in-degrees, defining a robust leadership structure that is scale-free.
Furthermore, in realistic parameter ranges, facilitated by information exchange
on the network, agents can generate a high degree of cooperation making the
collective almost maximally efficient. Comment: 4 pages, 2 postscript figures include
Learning Users’ Interests in a Market-Based Recommender System
Recommender systems are widely used to cope with the problem of information overload and, consequently, many recommendation methods have been developed. However, no one technique is best for all users in all situations. To combat this, we have previously developed a market-based recommender system that allows multiple agents (each representing a different recommendation method or system) to compete with one another to present their best recommendations to the user. Our marketplace thus coordinates multiple recommender agents and ensures only the best recommendations are presented. To do this effectively, however, each agent needs to learn the users’ interests and adapt its recommending behaviour accordingly. To this end, in this paper, we develop a reinforcement learning and Boltzmann exploration strategy that the recommender agents can use for these tasks. We then demonstrate that this strategy helps the agents to effectively obtain information about the users’ interests which, in turn, speeds up the market convergence and enables the system to rapidly highlight the best recommendations.
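As a rough illustration of the Boltzmann exploration mentioned in this abstract, the sketch below picks an action with probability proportional to exp(Q/T) and then nudges the chosen action's value estimate toward the observed reward. It is a generic textbook form, not the paper's actual strategy: the function names, the `temperature` and `learning_rate` parameters, and the incremental update rule are all assumptions made for illustration.

```python
import math
import random


def boltzmann_probabilities(q_values, temperature):
    """Return softmax (Boltzmann) selection probabilities for value estimates."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_choice(q_values, temperature, rng=random):
    """Sample an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def update_q(q_values, action, reward, learning_rate=0.1):
    """Hypothetical incremental update: move the estimate toward the reward."""
    q_values[action] += learning_rate * (reward - q_values[action])
    return q_values
```

A high temperature makes the choice nearly uniform (more exploration), while a low temperature concentrates probability on the action with the highest current estimate (more exploitation).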
Self-Modification of Policy and Utility Function in Rational Agents
Any agent that is part of the environment it interacts with and has versatile
actuators (such as arms and fingers) will in principle have the ability to
self-modify -- for example by changing its own source code. As we continue to
create more and more intelligent agents, chances increase that they will learn
about this ability. The question is: will they want to use it? For example,
highly intelligent systems may find ways to change their goals to something
more easily achievable, thereby 'escaping' the control of their designers. In
an important paper, Omohundro (2008) argued that goal preservation is a
fundamental drive of any intelligent system, since a goal is more likely to be
achieved if future versions of the agent strive towards the same goal. In this
paper, we formalise this argument in general reinforcement learning, and
explore situations where it fails. Our conclusion is that the self-modification
possibility is harmless if and only if the value function of the agent
anticipates the consequences of self-modifications and uses the current utility
function when evaluating the future. Comment: Artificial General Intelligence (AGI) 201
Towards a Universal Theory of Artificial Intelligence based on Algorithmic Probability and Sequential Decision Theory
Decision theory formally solves the problem of rational agents in uncertain
worlds if the true environmental probability distribution is known.
Solomonoff's theory of universal induction formally solves the problem of
sequence prediction for unknown distribution. We unify both theories and give
strong arguments that the resulting universal AIXI model behaves optimally in any
computable environment. The major drawback of the AIXI model is that it is
uncomputable. To overcome this problem, we construct a modified algorithm
AIXI^tl, which is still superior to any other time t and space l bounded agent.
The computation time of AIXI^tl is of the order t x 2^l. Comment: 8 two-column pages, latex2e, 1 figure, submitted to ijca
A two step algorithm for learning from unspecific reinforcement
We study a simple learning model based on the Hebb rule to cope with
"delayed", unspecific reinforcement. In spite of the unspecific nature of the
information-feedback, convergence to asymptotically perfect generalization is
observed, with a rate depending, however, in a non-universal way on learning
parameters. Asymptotic convergence can be as fast as that of Hebbian learning,
but may be slower. Moreover, for a certain range of parameter settings, it
depends on initial conditions whether the system can reach the regime of
asymptotically perfect generalization, or rather approaches a stationary state
of poor generalization. Comment: 13 pages LaTeX, 4 figures, note on biologically motivated stochastic
variant of the algorithm adde
Perceptual Context in Cognitive Hierarchies
Cognition does not only depend on bottom-up sensor feature abstraction, but
also relies on contextual information being passed top-down. Context is higher
level information that helps to predict belief states at lower levels. The main
contribution of this paper is to provide a formalisation of perceptual context
and its integration into a new process model for cognitive hierarchies. Several
simple instantiations of a cognitive hierarchy are used to illustrate the role
of context. Notably, we demonstrate the use of context in a novel approach to
visually track the pose of rigid objects with just a 2D camera.
Bayesian optimization for materials design
We introduce Bayesian optimization, a technique developed for optimizing
time-consuming engineering simulations and for fitting machine learning models
on large datasets. Bayesian optimization guides the choice of experiments
during materials design and discovery to find good material designs in as few
experiments as possible. We focus on the case when materials designs are
parameterized by a low-dimensional vector. Bayesian optimization is built on a
statistical technique called Gaussian process regression, which allows
predicting the performance of a new design based on previously tested designs.
After providing a detailed introduction to Gaussian process regression, we
introduce two Bayesian optimization methods: expected improvement, for design
problems with noise-free evaluations; and the knowledge-gradient method, which
generalizes expected improvement and may be used in design problems with noisy
evaluations. Both methods are derived using a value-of-information analysis,
and enjoy one-step Bayes-optimality.
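For readers unfamiliar with expected improvement, the sketch below evaluates its standard closed form for a maximization problem with noise-free evaluations. It assumes the Gaussian process posterior at a candidate design has already been summarized by a mean and standard deviation; the function names and the maximization convention are illustrative assumptions, not taken from the paper.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, std, best_so_far):
    """Expected improvement over best_so_far for a Gaussian posterior.

    mean, std: assumed posterior mean and standard deviation at the candidate.
    best_so_far: best objective value observed so far (maximization).
    """
    if std < 0:
        raise ValueError("std must be non-negative")
    gap = mean - best_so_far
    if std == 0:
        # No posterior uncertainty: the improvement is deterministic.
        return max(gap, 0.0)
    z = gap / std
    return gap * normal_cdf(z) + std * normal_pdf(z)
```

In a design loop, one would typically compute this quantity for each candidate design and run the next experiment at the candidate with the largest value.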
Active Learning in Persistent Surveillance UAV Missions
The performance of many complex UAV decision-making problems can be extremely sensitive to small errors in the model parameters. One way of mitigating this sensitivity is by designing algorithms that more effectively learn the model throughout the course of a mission. This paper addresses this important problem by considering model uncertainty in a multi-agent Markov Decision Process (MDP) and using an active learning approach to quickly learn transition model parameters. We build on previous research that allowed UAVs to passively update model parameter estimates by incorporating new state transition observations. In this work, however, the UAVs choose to actively reduce the uncertainty in their model parameters by taking exploratory and informative actions. These actions result in a faster adaptation and, by explicitly accounting for UAV fuel dynamics, also mitigate the risk of the exploration. This paper compares the nominal, passive learning approach against two methods for incorporating active learning into the MDP framework: (1) all state transitions are rewarded equally, and (2) state transition rewards are weighted according to the expected resulting reduction in the variance of the model parameter. In both cases, agent behaviors emerge that enable faster convergence of the uncertain model parameters to their true values.
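To make the second weighting scheme concrete, the sketch below computes the expected reduction in posterior variance from observing one more transition. It assumes, purely for illustration, that the uncertain transition probability carries a Beta(alpha, beta) posterior updated by success and failure counts; the abstract does not specify the parameter model, and the function names are hypothetical.

```python
def beta_variance(alpha, beta):
    """Variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return (alpha * beta) / (total * total * (total + 1.0))


def expected_variance_reduction(alpha, beta):
    """Expected drop in posterior variance after one more observation.

    Assumption: the next observation is a success with the current posterior
    mean probability alpha / (alpha + beta), which increments alpha;
    otherwise beta is incremented.
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    p_success = alpha / (alpha + beta)
    expected_after = (p_success * beta_variance(alpha + 1.0, beta)
                      + (1.0 - p_success) * beta_variance(alpha, beta + 1.0))
    return beta_variance(alpha, beta) - expected_after
```

Under this assumption the bonus is largest for transitions that have rarely been observed and shrinks as counts accumulate, so an exploration reward of this kind fades out on its own as the model is learned.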
Toward Automatic Verification of Multiagent Systems for Training Simulations
Advances in multiagent systems have led to their successful application in experiential training simulations, where students learn by interacting with agents who represent people, groups, structures, etc. These multiagent simulations must model the training scenario so that the students’ success is correlated with the degree to which they follow the intended pedagogy. As these simulations increase in size and richness, it becomes harder to guarantee that the agents accurately encode the pedagogy. Testing with human subjects provides the most accurate feedback, but it can explore only a limited subspace of simulation paths. In this paper, we present a mechanism for using human data to verify the degree to which the simulation encodes the intended pedagogy. Starting with an analysis of data from a deployed multiagent training simulation, we then present an automated mechanism for using the human data to generate a distribution appropriate for sampling simulation paths. By generalizing from a small set of human data, the automated approach can systematically explore a much larger space of possible training paths and verify the degree to which a multiagent training simulation adheres to its intended pedagogy.
Information theoretic approach to interactive learning
The principles of statistical mechanics and information theory play an
important role in learning and have inspired both theory and the design of
numerous machine learning algorithms. The new aspect in this paper is a focus
on integrating feedback from the learner. A quantitative approach to
interactive learning and adaptive behavior is proposed, integrating model- and
decision-making into one theoretical framework. This paper follows simple
principles by requiring that the observer's world model and action policy
should result in maximal predictive power at minimal complexity. Classes of
optimal action policies and of optimal models are derived from an objective
function that reflects this trade-off between prediction and complexity. The
resulting optimal models then summarize, at different levels of abstraction,
the process's causal organization in the presence of the learner's actions. A
fundamental consequence of the proposed principle is that the learner's optimal
action policies balance exploration and control as an emerging property.
Interestingly, the explorative component is present in the absence of policy
randomness, i.e. in the optimal deterministic behavior. This is a direct result
of requiring maximal predictive power in the presence of feedback. Comment: 6 page