ATNoSFERES revisited
ATNoSFERES is a Pittsburgh-style Learning Classifier System (LCS) in which
the rules are represented as edges of an Augmented Transition Network.
Genotypes are strings of tokens of a stack-based language, whose execution
builds the labeled graph. The original ATNoSFERES, using a bitstring to
represent the language tokens, has been favorably compared in previous work to
several Michigan-style LCS architectures in the context of non-Markov
problems. Several modifications of ATNoSFERES are proposed here. The most
important one, conceptually, is a representational change: each token is now
represented by an integer, hence the genotype is a string of integers. Several
other modifications of the underlying grammar language are also proposed. The
resulting ATNoSFERES-II is validated on several standard animat non-Markov
problems, on which it outperforms all previously published results in the LCS
literature. The reasons for these improvements are carefully analyzed, and some
hypotheses are proposed about the underlying mechanisms in order to explain
these good results.
Importance mixing: Improving sample reuse in evolutionary policy search methods
Deep neuroevolution methods, that is, evolutionary policy search methods based
on deep neural networks, have recently emerged as competitors to deep
reinforcement learning algorithms due to their better parallelization
capabilities. However, these methods still suffer from far worse sample
efficiency. In this paper we
investigate whether a mechanism known as "importance mixing" can significantly
improve their sample efficiency. We provide a didactic presentation of
importance mixing and we explain how it can be extended to reuse more samples.
Then, from an empirical comparison based on a simple benchmark, we show that,
although importance mixing does provide better sample efficiency, it is still
far from the sample efficiency of deep reinforcement learning, though it is
more stable.
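As an illustration of the mechanism named in this abstract, here is a minimal sketch of importance mixing in its commonly described two-step rejection form. It assumes a diagonal Gaussian search distribution; the names `importance_mixing`, `density_ratio`, and the parameter `alpha` (a minimal refresh rate) are illustrative choices, not taken from the paper, and the extension to reuse more samples mentioned in the abstract is not covered.

```python
import math
import random


def log_density(z, mean, std):
    """Log-density of a diagonal Gaussian at point z."""
    return sum(
        -0.5 * ((x - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2 * math.pi)
        for x, m, s in zip(z, mean, std)
    )


def density_ratio(z, numerator, denominator):
    """p_numerator(z) / p_denominator(z), clamped to avoid overflow."""
    log_ratio = log_density(z, *numerator) - log_density(z, *denominator)
    return math.exp(min(log_ratio, 50.0))


def importance_mixing(old_samples, old, new, alpha, rng):
    """Build a population for the `new` distribution, reusing old samples.

    `old` and `new` are (mean, std) pairs (an assumed representation).
    Returns (reused, fresh); only `fresh` would need new evaluations.
    """
    n = len(old_samples)
    # Step 1: keep each old sample with probability min(1, (1 - alpha) * p_new / p_old).
    reused = [
        z for z in old_samples
        if rng.random() < min(1.0, (1.0 - alpha) * density_ratio(z, new, old))
    ]
    # Step 2: fill up with samples from the new distribution, each accepted
    # with probability max(alpha, 1 - p_old / p_new).
    fresh = []
    while len(reused) + len(fresh) < n:
        z = [rng.gauss(m, s) for m, s in zip(*new)]
        if rng.random() < max(alpha, 1.0 - density_ratio(z, old, new)):
            fresh.append(z)
    return reused, fresh


rng = random.Random(0)
old = ([0.0, 0.0], [1.0, 1.0])    # hypothetical previous-generation parameters
new = ([0.2, -0.1], [0.9, 0.9])   # hypothetical updated parameters
population = [[rng.gauss(m, s) for m, s in zip(*old)] for _ in range(20)]
reused, fresh = importance_mixing(population, old, new, alpha=0.01, rng=rng)
print(len(reused), "reused,", len(fresh), "newly sampled")
```

The closer the new distribution is to the old one, the more samples are reused, which is where any gain in sample efficiency would come from.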
Identification of Invariant Sensorimotor Structures as a Prerequisite for the Discovery of Objects
Perceiving the surrounding environment in terms of objects is useful for any
general purpose intelligent agent. In this paper, we investigate a fundamental
mechanism making object perception possible, namely the identification of
spatio-temporally invariant structures in the sensorimotor experience of an
agent. We take inspiration from the Sensorimotor Contingencies Theory to define
a computational model of this mechanism through a sensorimotor, unsupervised
and predictive approach. Our model is based on processing the unsupervised
interaction of an artificial agent with its environment. We show how
spatio-temporally invariant structures in the environment induce regularities
in the sensorimotor experience of an agent, and how this agent, while building
a predictive model of its sensorimotor experience, can capture them as densely
connected subgraphs in a graph of sensory states connected by motor commands.
Our approach is focused on elementary mechanisms, and is illustrated with a set
of simple experiments in which an agent interacts with an environment. We show
how the agent can build an internal model of moving but spatio-temporally
invariant structures by performing a Spectral Clustering of the graph modeling
its overall sensorimotor experiences. We systematically examine properties of
the model, shedding light more globally on the specificities of the paradigm
with respect to methods based on the supervised processing of collections of
static images.
Comment: 24 pages, 10 figures, published in Frontiers Robotics and A
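To make the idea of "densely connected subgraphs" concrete, the following is a minimal sketch of two-way spectral partitioning using the Fiedler vector of the graph Laplacian, which is the simplest instance of spectral clustering. The abstract does not specify the paper's actual procedure; the toy graph, the function name `fiedler_partition`, and the use of shifted power iteration are all assumptions made for illustration.

```python
def fiedler_partition(adjacency, iterations=200):
    """Split an undirected graph in two using the sign of its Fiedler vector.

    The Fiedler vector is approximated by power iteration on (shift*I - L),
    where L = D - A is the graph Laplacian, while projecting out the
    constant vector (the trivial eigenvector of L).
    """
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    shift = 2 * max(degrees)  # upper bound on the largest eigenvalue of L
    v = [float(i) for i in range(n)]  # deterministic starting vector
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant component
        w = [
            (shift - degrees[i]) * v[i]
            + sum(adjacency[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return [0 if x < 0 else 1 for x in v]


# Hypothetical graph of six sensory states: two triangles (dense subgraphs)
# joined by a single edge between states 2 and 3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adjacency[a][b] = adjacency[b][a] = 1

labels = fiedler_partition(adjacency)
print(labels)
```

In this toy case the two triangles end up in different groups, mirroring how dense regions of a sensorimotor graph could be separated from one another.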
A Hitchhiker's Guide to Statistical Comparisons of Reinforcement Learning Algorithms
Consistently checking the statistical significance of experimental results is
the first mandatory step towards reproducible science. This paper presents a
hitchhiker's guide to rigorous comparisons of reinforcement learning
algorithms. After introducing the concepts of statistical testing, we review
the relevant statistical tests and compare them empirically in terms of false
positive rate and statistical power as a function of the sample size (number of
seeds) and effect size. We further investigate the robustness of these tests to
violations of the most common assumptions (normal distributions, same
distributions, equal variances). Besides simulations, we compare empirical
distributions obtained by running Soft Actor-Critic and Twin Delayed Deep
Deterministic Policy Gradient on Half-Cheetah. We conclude by providing
guidelines and code to perform rigorous comparisons of RL algorithm
performances.
Comment: 8 pages + supplementary materia
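As a concrete example of comparing two algorithms across seeds, here is a minimal sketch of a two-sided permutation test on the difference of means. It is a generic nonparametric test chosen for illustration; the abstract does not list which tests the paper reviews or recommends, and the performance numbers below are made up.

```python
import random


def permutation_test(a, b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference of means.

    Returns an estimated p-value for the null hypothesis that both
    samples come from the same distribution.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        first, second = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(first) / len(first) - sum(second) / len(second))
        if diff >= observed:
            count += 1
    # Add-one correction so the estimate is never exactly zero.
    return (count + 1) / (n_permutations + 1)


# Hypothetical final performances, one value per random seed.
algo_a = [10.0, 11.0, 12.0, 13.0, 14.0]
algo_b = [0.0, 1.0, 2.0, 3.0, 4.0]
p_value = permutation_test(algo_a, algo_b)
print("p-value:", p_value)
```

With only a handful of seeds, the smallest p-value such a test can reach is limited by the number of distinct ways to split the pooled values, which is one reason sample size matters in these comparisons.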
Gated networks: an inventory
Gated networks are networks that contain gating connections, in which the
outputs of at least two neurons are multiplied. Initially, gated networks were
used to learn relationships between two input sources, such as pixels from two
images. More recently, they have been applied to learning activity recognition
or multi-modal representations. The aims of this paper are threefold: 1) to
explain the basic computations in gated networks to the non-expert, while
adopting a standpoint that insists on their symmetric nature; 2) to serve as a
quick reference guide to the recent literature, by providing an inventory of
applications of these networks, as well as recent extensions to the basic
architecture; and 3) to suggest future research directions and applications.
Comment: Unpublished manuscript, 17 page
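The multiplicative interaction described in this abstract can be sketched in a few lines. The example below assumes a factored parameterization, in which each input is projected onto a set of factors, the factor activations are multiplied elementwise, and the products are mapped to the output. This is one common way to write a gating connection and is an assumption here; the weight names `Wx`, `Wy`, `Wz` and all values are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def gated_output(x, y, Wx, Wy, Wz):
    """Output of a factored gating connection between inputs x and y."""
    fx = matvec(Wx, x)                        # project x onto the factors
    fy = matvec(Wy, y)                        # project y onto the factors
    products = [a * b for a, b in zip(fx, fy)]  # multiplicative gating
    return matvec(Wz, products)               # map factor products to outputs


# Hypothetical weights: two inputs of size 2, two factors, one output unit.
Wx = [[1, 0], [0, 1]]
Wy = [[1, 1], [1, -1]]
Wz = [[1, 1]]
x = [2, 3]
y = [1, 2]

z = gated_output(x, y, Wx, Wy, Wz)
# Swapping the roles of the two inputs (with their weights) gives the same result.
z_swapped = gated_output(y, x, Wy, Wx, Wz)
print(z, z_swapped)
```

The swap at the end illustrates the symmetric nature mentioned in the abstract: neither input is privileged as "the gate" in this formulation.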
CLIC: Curriculum Learning and Imitation for object Control in non-rewarding environments
In this paper we study a new reinforcement learning setting where the
environment is non-rewarding, contains several possibly related objects of
various controllability, and where an apt agent Bob acts independently, with
non-observable intentions. We argue that this setting defines a realistic
scenario and we present a generic discrete-state discrete-action model of such
environments. To learn in this environment, we propose an unsupervised
reinforcement learning agent called CLIC for Curriculum Learning and Imitation
for Control. CLIC learns to control individual objects in its environment, and
imitates Bob's interactions with these objects. It selects objects to focus on
when training and imitating by maximizing its learning progress. We show that
CLIC is an effective baseline in our new setting. It can effectively observe
Bob to gain control of objects faster, even if Bob is not explicitly teaching.
It can also follow Bob when he acts as a mentor and provides ordered
demonstrations. Finally, when Bob controls objects that the agent cannot, or in
the presence of a hierarchy between objects in the environment, we show that
CLIC ignores non-reproducible and already mastered interactions with objects,
resulting in a greater benefit from imitation.
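The selection of objects by learning progress can be illustrated with a small sketch. The abstract does not give CLIC's actual progress measure or selection rule, so the version below is an assumption: progress is the absolute change in mean competence between two consecutive windows, and selection is a greedy argmax. The object names and competence histories are hypothetical.

```python
def learning_progress(history, window=5):
    """Absolute change in mean competence between the two latest windows."""
    if len(history) < 2 * window:
        return 0.0
    recent = history[-window:]
    previous = history[-2 * window:-window]
    return abs(sum(recent) / window - sum(previous) / window)


def select_object(histories, window=5):
    """Pick the object whose competence is currently changing the most."""
    return max(histories, key=lambda name: learning_progress(histories[name], window))


# Hypothetical competence histories (success rates over successive evaluations).
histories = {
    "light": [1.0] * 10,   # already mastered: no progress
    "door": [0.0, 0.0, 0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.8, 0.8],  # improving
    "cloud": [0.0] * 10,   # not controllable: no progress
}

chosen = select_object(histories)
print(chosen)
```

Under this measure, both mastered and uncontrollable objects yield zero progress, which matches the abstract's description of ignoring already mastered and non-reproducible interactions.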