Higher coordination with less control - A result of information maximization in the sensorimotor loop
This work presents a novel learning method in the context of embodied
artificial intelligence and self-organization, which has as few assumptions and
restrictions as possible about the world and the underlying model. The learning
rule is derived from the principle of maximizing the predictive information in
the sensorimotor loop. It is evaluated on robot chains of varying length with
individually controlled, non-communicating segments. The comparison of the
results shows that maximizing the predictive information per wheel leads to a
higher coordinated behavior of the physically connected robots compared to a
maximization per robot. Another focus of this paper is the analysis of the
effect of the robot chain length on the overall behavior of the robots. It is
shown that longer chains with less capable controllers outperform shorter
chains with more complex controllers. The reason is found and discussed in the
information-geometric interpretation of the learning process.
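The predictive-information objective this abstract describes can be illustrated with a small estimator. The following is a minimal sketch, assuming a discretized one-step sensor stream; the function name and the empirical plug-in estimate are illustrative stand-ins, not the paper's actual online learning rule over the sensorimotor loop:

```python
import math
from collections import Counter

def predictive_information(stream):
    """Estimate one-step predictive information I(s_t; s_{t+1}) of a
    discrete sensor stream from empirical pair frequencies.
    (Hypothetical helper: the paper maximizes this quantity online
    in the sensorimotor loop, not from a fixed recorded log.)"""
    pairs = list(zip(stream, stream[1:]))
    n = len(pairs)
    p_joint = Counter(pairs)             # counts of (past, future) pairs
    p_past = Counter(s for s, _ in pairs)
    p_future = Counter(s for _, s in pairs)
    mi = 0.0
    for (a, b), c in p_joint.items():
        # plug-in mutual information: p(a,b) * log2(p(a,b) / (p(a) p(b)))
        mi += (c / n) * math.log2(c * n / (p_past[a] * p_future[b]))
    return mi

print(predictive_information([0, 1] * 50))  # close to 1 bit
print(predictive_information([0] * 100))    # 0.0: constant, nothing to predict
```

For an alternating stream the past determines the future, so the estimate approaches one bit per step; a constant stream carries no predictive information at all.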
POWERPLAY: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem
Most of computer science focuses on automatically solving given computational
problems. I focus on automatically inventing or discovering problems in a way
inspired by the playful behavior of animals and humans, to train a more and
more general problem solver from scratch in an unsupervised fashion. Consider
the infinite set of all computable descriptions of tasks with possibly
computable solutions. The novel algorithmic framework POWERPLAY (2011)
continually searches the space of possible pairs of new tasks and modifications
of the current problem solver, until it finds a more powerful problem solver
that provably solves all previously learned tasks plus the new one, while the
unmodified predecessor does not. Wow-effects are achieved by continually making
previously learned skills more efficient such that they require less time and
space. New skills may (partially) re-use previously learned skills. POWERPLAY's
search orders candidate pairs of tasks and solver modifications by their
conditional computational (time & space) complexity, given the stored
experience so far. The new task and its corresponding task-solving skill are
those first found and validated. The computational costs of validating new
tasks need not grow with task repertoire size. POWERPLAY's ongoing search for
novelty keeps breaking the generalization abilities of its present solver. This
is related to Goedel's sequence of increasingly powerful formal theories based
on adding formerly unprovable statements to the axioms without affecting
previously provable theorems. The continually increasing repertoire of problem
solving procedures can be exploited by a parallel search for solutions to
additional externally posed tasks. POWERPLAY may be viewed as a greedy but
practical implementation of basic principles of creativity. A first
experimental analysis can be found in separate papers [53,54].
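The acceptance criterion at the heart of the POWERPLAY loop can be sketched in a few lines. This is a toy illustration, not the published framework: `solves` stands in for a bounded validation/proof procedure, and the complexity-ordered candidate stream is assumed to be given:

```python
import itertools

def powerplay_step(solves, candidate_pairs, repertoire, solver):
    """One POWERPLAY-style acceptance step (toy sketch).
    candidate_pairs: iterable of (task, new_solver), assumed ordered by
    conditional computational complexity, cheapest first.
    Accepts the first pair whose new_solver solves the new task plus
    every task in the repertoire, while the current solver fails it."""
    for task, new_solver in candidate_pairs:
        if solves(solver, task):
            continue  # not novel: the unmodified predecessor already solves it
        if solves(new_solver, task) and all(
            solves(new_solver, t) for t in repertoire
        ):
            repertoire.append(task)  # the repertoire only ever grows
            return task, new_solver
    return None

# Toy instantiation: tasks are integers, a "solver" is a capability
# threshold, and candidates are enumerated in increasing order.
solves = lambda solver, task: task <= solver
repertoire = [1, 2, 3]
pairs = ((t, t) for t in itertools.count(1))
accepted = powerplay_step(solves, pairs, repertoire, solver=3)
print(accepted)  # -> (4, 4): the simplest still-unsolvable task
```

The key invariant the sketch preserves is the one the abstract states: the new solver provably handles all previously learned tasks plus the new one, while the unmodified predecessor does not.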
Spatial representation for navigation in animats
This article considers the problem of spatial representation for animat navigation systems. It is proposed that the global navigation task, or "wayfinding," is best supported by multiple interacting subsystems, each of which builds its own partial representation of relevant world knowledge. Evidence from the study of animal navigation is reviewed to demonstrate that similar principles underlie the wayfinding behavior of animals, including humans. A simulated wayfinding system is described that embodies and illustrates several of the themes identified with animat navigation. This system constructs a network of partial models of the quantitative spatial relations between groups of salient landmarks. Navigation tasks are solved by propagating egocentric view information through this network, using a simple but effective heuristic to arbitrate between multiple solutions.
An Intrinsically-Motivated Approach for Learning Highly Exploring and Fast Mixing Policies
What is a good exploration strategy for an agent that interacts with an
environment in the absence of external rewards? Ideally, we would like to get a
policy driving towards a uniform state-action visitation (highly exploring) in
a minimum number of steps (fast mixing), in order to ease efficient learning of
any goal-conditioned policy later on. Unfortunately, it is remarkably arduous
to directly learn an optimal policy of this nature. In this paper, we propose a
novel surrogate objective for learning highly exploring and fast mixing
policies, which focuses on maximizing a lower bound to the entropy of the
steady-state distribution induced by the policy. In particular, we introduce
three novel lower bounds that lead to as many optimization problems, trading
off theoretical guarantees against computational complexity. Then, we
present a model-based reinforcement learning algorithm, IDEAL, to learn
an optimal policy according to the introduced objective. Finally, we provide an
empirical evaluation of this algorithm on a set of hard-exploration tasks.
Comment: In the 34th AAAI Conference on Artificial Intelligence (AAAI 2020).
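The quantity that IDEAL's surrogate objective bounds, the entropy of the policy-induced steady-state distribution, can be computed exactly for a toy chain. This sketch (the function name and power-iteration routine are illustrative, not from the paper) contrasts a uniformly mixing policy with a poorly exploring one:

```python
import math

def steady_state_entropy(P, iters=2000):
    """Entropy (in bits) of the steady-state distribution of a Markov
    chain with row-stochastic transition matrix P, i.e. the chain a
    fixed policy induces on the environment. Toy power-iteration
    sketch; the paper maximizes lower bounds on this quantity instead
    of computing it directly."""
    n = len(P)
    d = [1.0 / n] * n
    for _ in range(iters):
        d = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]
    return -sum(p * math.log2(p) for p in d if p > 0.0)

# A policy that mixes uniformly over 3 states (highly exploring)...
uniform = [[1 / 3] * 3 for _ in range(3)]
# ...versus one that lingers in state 0 (poorly exploring):
sticky = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1], [0.8, 0.1, 0.1]]
print(steady_state_entropy(uniform))  # log2(3), about 1.585 bits
print(steady_state_entropy(sticky))   # noticeably lower
```

Uniform state visitation attains the maximum entropy log2(|S|), which is exactly why the paper takes steady-state entropy as the target for reward-free exploration.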
Old tricks, new dogs : ethology and interactive creatures
Thesis (Ph.D.), Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1997. Includes bibliographical references (p. 135-140). By Bruce Mitchell Blumberg.
Neuroevolution in Games: State of the Art and Open Challenges
This paper surveys research on applying neuroevolution (NE) to games. In
neuroevolution, artificial neural networks are trained through evolutionary
algorithms, taking inspiration from the way biological brains evolved. We
analyse the application of NE in games along five different axes, which are the
role NE is chosen to play in a game, the different types of neural networks
used, the way these networks are evolved, how the fitness is determined and
what type of input the network receives. The article also highlights important
open research challenges in the field.
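The most direct form of neuroevolution the survey covers, evolving the weights of a fixed-topology network, can be sketched as a (1+1) evolution strategy on a toy XOR task. All names here are illustrative; NE work in games typically uses much richer encodings (e.g. NEAT's evolving topologies) and game-derived fitness functions:

```python
import math
import random

def forward(w, x):
    """Fixed 2-2-1 tanh network; w is a flat genome of 9 weights:
    w[0:4] hidden weights, w[4:6] hidden biases, w[6:9] output layer."""
    h = [math.tanh(w[2 * i] * x[0] + w[2 * i + 1] * x[1] + w[4 + i])
         for i in range(2)]
    return math.tanh(w[6] * h[0] + w[7] * h[1] + w[8])

# XOR with +/-1 encoding as the fitness environment.
XOR = [((-1, -1), -1), ((-1, 1), 1), ((1, -1), 1), ((1, 1), -1)]

def fitness(w):
    return -sum((forward(w, x) - y) ** 2 for x, y in XOR)

def evolve(generations=3000, sigma=0.3, seed=0):
    """(1+1)-ES: mutate the genome with Gaussian noise and keep the
    child only if it is at least as fit, so fitness never decreases."""
    rng = random.Random(seed)
    best = [rng.gauss(0, 1) for _ in range(9)]
    best_f = fitness(best)
    for _ in range(generations):
        child = [wi + rng.gauss(0, sigma) for wi in best]
        f = fitness(child)
        if f >= best_f:
            best, best_f = child, f
    return best, best_f

weights, score = evolve()
print(score)  # approaches 0 (perfect) as XOR is learned
```

This is the "direct encoding, weight evolution" corner of the survey's design space; the other axes (role in the game, network type, evolution method, fitness, input) each replace one piece of this sketch.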
Perceptual abstraction and attention
This is a report on the preliminary achievements of WP4 of the IM-CleVeR project on abstraction for cumulative learning, in particular directed to: (1) producing algorithms to develop abstraction features under top-down action influence; (2) algorithms for supporting detection of change in motion pictures; (3) developing attention and vergence control on the basis of locally computed rewards; (4) searching abstract representations suitable for the LCAS framework; (5) developing predictors based on information theory to support novelty detection. The report is organized around these 5 tasks that are part of WP4. We provide a synthetic description of the work done for each task by the partners