POWERPLAY: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem
Most of computer science focuses on automatically solving given computational
problems. I focus on automatically inventing or discovering problems in a way
inspired by the playful behavior of animals and humans, to train a more and
more general problem solver from scratch in an unsupervised fashion. Consider
the infinite set of all computable descriptions of tasks with possibly
computable solutions. The novel algorithmic framework POWERPLAY (2011)
continually searches the space of possible pairs of new tasks and modifications
of the current problem solver, until it finds a more powerful problem solver
that provably solves all previously learned tasks plus the new one, while the
unmodified predecessor does not. Wow-effects are achieved by continually making
previously learned skills more efficient such that they require less time and
space. New skills may (partially) re-use previously learned skills. POWERPLAY's
search orders candidate pairs of tasks and solver modifications by their
conditional computational (time & space) complexity, given the stored
experience so far. The new task and its corresponding task-solving skill are
those first found and validated. The computational costs of validating new
tasks need not grow with task repertoire size. POWERPLAY's ongoing search for
novelty keeps breaking the generalization abilities of its present solver. This
is related to Goedel's sequence of increasingly powerful formal theories based
on adding formerly unprovable statements to the axioms without affecting
previously provable theorems. The continually increasing repertoire of problem
solving procedures can be exploited by a parallel search for solutions to
additional externally posed tasks. POWERPLAY may be viewed as a greedy but
practical implementation of basic principles of creativity. A first
experimental analysis can be found in separate papers [53,54].
Comment: 21 pages, additional connections to previous work, references to first experiments with POWERPLAY.
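To make the loop above concrete, here is a minimal Python sketch of a POWERPLAY-style step on a toy domain. Everything here is an illustrative assumption: tasks are integers, a "solver" is simply the set of tasks it solves, and a "modification" adds one task to that set; this stands in for the programs and program modifications of the actual framework, not the authors' implementation.

```python
# Toy sketch of one POWERPLAY step. Tasks are integers; a "solver" is the
# set of tasks it can solve; a "modification" adds one capability. These
# are illustrative assumptions, not the framework's real data structures.
from itertools import count

def enumerate_pairs(repertoire):
    """Yield candidate (task, modification) pairs in increasing order of
    task size, a stand-in for conditional computational complexity."""
    for task in count():
        if task not in repertoire:
            yield task, (lambda s, t=task: s | {t})

def powerplay_step(solver, repertoire):
    """Accept the first pair whose modified solver solves the new task
    plus all previously learned tasks, while the unmodified predecessor
    does not solve the new task."""
    for task, modify in enumerate_pairs(repertoire):
        if task in solver:          # predecessor already solves it: skip
            continue
        candidate = modify(solver)
        if task in candidate and all(t in candidate for t in repertoire):
            return candidate, repertoire + [task]

solver, repertoire = set(), []
for _ in range(5):
    solver, repertoire = powerplay_step(solver, repertoire)
print(repertoire)   # [0, 1, 2, 3, 4]
```

Note that this naive sketch re-validates the entire repertoire at every step; the abstract above points out that in the actual framework the cost of validating new tasks need not grow with repertoire size.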
Progressive growing of self-organized hierarchical representations for exploration
Designing agents that can autonomously discover and learn a diversity of
structures and skills in unknown, changing environments is key for lifelong
machine learning. A central challenge is how to incrementally learn
representations in order to progressively build a map of the discovered
structures and reuse it for further exploration. To address this challenge, we
identify and target several key functionalities. First, we aim to build lasting
representations and avoid catastrophic forgetting throughout the exploration
process. Second, we aim to learn a diversity of representations that allow the
agent to discover a "diversity of diversity" of structures (and associated
skills) in complex high-dimensional environments. Third, we target
representations that can structure the agent's discoveries in a coarse-to-fine
manner. Finally, we target the reuse of such representations to drive
exploration toward an "interesting" type of diversity, for instance by
leveraging human guidance.
Current approaches to state representation learning generally rely on
monolithic architectures, which do not enable all of these functionalities.
Therefore, we present a novel technique to progressively construct a Hierarchy
of Observation Latent Models for Exploration Stratification, called HOLMES.
This technique couples the use of a dynamic modular model architecture for
representation learning with intrinsically-motivated goal exploration processes
(IMGEPs). The paper shows results in the domain of the automated discovery of
diverse self-organized patterns, using as a testbed the experimental framework
of Reinke et al. (2019).
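As a rough illustration of the coupling described above, here is a minimal Python sketch of a dynamically growing hierarchy of latent models driving goal exploration. The module class, the routing rule, and the split threshold are all hypothetical simplifications for illustration, not the HOLMES architecture itself.

```python
# Illustrative sketch: a dynamic modular hierarchy of latent models coupled
# with an intrinsically motivated goal exploration loop. All names, the 1-D
# "latent model", and the split criterion are assumptions for illustration.
import random

class LatentModule:
    """One node of the hierarchy: a toy latent model over the outcomes
    routed to it. Splitting a saturated node refines it coarse-to-fine."""
    def __init__(self, depth=0):
        self.depth = depth
        self.children = []   # sub-modules created when this node saturates
        self.outcomes = []   # outcomes of rollouts routed to this node

    def error(self, outcome):
        # Toy "reconstruction error": distance to the mean stored outcome.
        if not self.outcomes:
            return 0.0
        return abs(outcome - sum(self.outcomes) / len(self.outcomes))

    def route(self, outcome):
        """Descend coarse-to-fine to the leaf that best explains the outcome."""
        node = self
        while node.children:
            node = min(node.children, key=lambda c: c.error(outcome))
        return node

def explore(env_rollout, root, n_iters=500, split_threshold=50):
    for _ in range(n_iters):
        goal = random.uniform(0.0, 1.0)           # sample a goal (toy goal space)
        outcome = env_rollout(goal)               # rollout toward the goal
        leaf = root.route(outcome)                # route the outcome to a leaf
        leaf.outcomes.append(outcome)             # only this module is updated,
                                                  # so earlier modules are never
                                                  # overwritten (no forgetting)
        if len(leaf.outcomes) > split_threshold:  # saturated: grow the hierarchy
            leaf.children = [LatentModule(leaf.depth + 1) for _ in range(2)]

root = LatentModule()
explore(lambda g: min(1.0, max(0.0, g + random.gauss(0, 0.1))), root)
```

The design point the sketch tries to capture is that new structure is absorbed by growing new modules rather than by rewriting a single monolithic model, which is how the modular architecture avoids catastrophic forgetting while still stratifying discoveries coarse-to-fine.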
Computational and Robotic Models of Early Language Development: A Review
We review computational and robotics models of early language learning and
development. We first explain why and how these models are used to understand
better how children learn language. We argue that they provide concrete
theories of language learning as a complex dynamic system, complementing
traditional methods in psychology and linguistics. We review different modeling
formalisms, grounded in techniques from machine learning and artificial
intelligence such as Bayesian and neural network approaches. We then discuss
their role in understanding several key mechanisms of language development:
cross-situational statistical learning, embodiment, situated social
interaction, intrinsically motivated learning, and cultural evolution. We
conclude by discussing future challenges for research, including modeling of
large-scale empirical data about language acquisition in real-world
environments.
Keywords: Early language learning, Computational and robotic models, machine
learning, development, embodiment, social interaction, intrinsic motivation,
self-organization, dynamical systems, complexity.
Comment: to appear in International Handbook on Language Development, ed. J. Horst and J. von Koss Torkildsen, Routledge.
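As a concrete example of one mechanism discussed in the review above, here is a minimal Python sketch of cross-situational statistical learning as word-referent co-occurrence counting. The toy situations and the winner-take-all scoring are assumptions for illustration, not a model from the review.

```python
# Toy sketch of cross-situational statistical learning: a learner hears
# words in ambiguous scenes and gradually resolves word-referent mappings
# from co-occurrence statistics. Data and scoring are illustrative only.
from collections import Counter, defaultdict

# Each "situation" pairs heard words with visible referents (ambiguous:
# no single scene reveals which word names which referent).
situations = [
    ({"ball", "dog"}, {"BALL", "DOG"}),
    ({"ball", "cup"}, {"BALL", "CUP"}),
    ({"dog", "cup"},  {"DOG", "CUP"}),
]

cooc = defaultdict(Counter)
for words, referents in situations:
    for w in words:
        for r in referents:
            cooc[w][r] += 1          # count every word-referent co-occurrence

for word, counts in sorted(cooc.items()):
    referent, _ = counts.most_common(1)[0]
    print(f"{word} -> {referent}")   # e.g. ball -> BALL
```

Although no single situation disambiguates any word, the correct mapping emerges across situations, which is the core intuition behind cross-situational learning.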
Linear combination of one-step predictive information with an external reward in an episodic policy gradient setting: a critical analysis
One of the main challenges in the field of embodied artificial intelligence
is the open-ended autonomous learning of complex behaviours. Our approach is to
use task-independent, information-driven intrinsic motivation(s) to support
task-dependent learning. The work presented here is a preliminary step in which
we investigate the predictive information (the mutual information of the past
and future of the sensor stream) as an intrinsic drive, ideally supporting any
kind of task acquisition. Previous experiments have shown that the predictive
information (PI) is a good candidate to support autonomous, open-ended learning
of complex behaviours, because a maximisation of the PI corresponds to an
exploration of morphology- and environment-dependent behavioural regularities.
The idea is that these regularities can then be exploited in order to solve any
given task. Three different experiments are presented and their results lead to
the conclusion that the linear combination of the one-step PI with an external
reward function is not generally recommended in an episodic policy gradient
setting. Only for hard tasks can a significant speed-up be achieved, and then
at the cost of a loss in asymptotic performance.
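To make the studied combination concrete, here is a minimal Python sketch that adds a one-step predictive-information bonus, the mutual information I(s_t; s_{t+1}) of the sensor stream, linearly to the external return of an episode. The histogram-based estimator and the weight lam are illustrative assumptions, not the authors' estimator.

```python
# Illustrative sketch: linear combination of an external return with a
# one-step predictive information (PI) bonus, I(s_t; s_{t+1}) of the
# sensor stream. The binning estimator and lam are assumptions only.
import numpy as np

def one_step_pi(states, n_bins=8):
    """Estimate I(s_t; s_{t+1}) for a 1-D sensor trace via 2-D histograms."""
    edges = np.linspace(states.min(), states.max(), n_bins)
    s = np.digitize(states, edges)
    joint, _, _ = np.histogram2d(s[:-1], s[1:], bins=n_bins)
    p_joint = joint / joint.sum()
    p_past = p_joint.sum(axis=1, keepdims=True)    # marginal over s_t
    p_future = p_joint.sum(axis=0, keepdims=True)  # marginal over s_{t+1}
    nz = p_joint > 0                               # avoid log(0)
    return float((p_joint[nz] *
                  np.log(p_joint[nz] / (p_past @ p_future)[nz])).sum())

def combined_return(external_rewards, sensor_trace, lam=0.1):
    """Episodic objective = external return + lam * one-step PI bonus."""
    return float(np.sum(external_rewards)) + lam * one_step_pi(sensor_trace)

# Example: score one toy episode.
rng = np.random.default_rng(0)
trace = np.cumsum(rng.normal(size=200))    # toy 1-D sensor stream
rewards = rng.normal(size=200)             # toy external rewards
print(combined_return(rewards, trace))
```

The weight lam is exactly the knob the abstract's conclusion concerns: the paper finds that no single setting of such a linear trade-off works well in general in an episodic policy gradient setting.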