10 research outputs found
Intrinsically Motivated Learning of Visual Motion Perception and Smooth Pursuit
We extend the framework of efficient coding, which has been used to model the
development of sensory processing in isolation, to model the development of the
perception/action cycle. Our extension combines sparse coding and reinforcement
learning so that sensory processing and behavior co-develop to optimize a
shared intrinsic motivational signal: the fidelity of the neural encoding of
the sensory input under resource constraints. Applying this framework to a
model system consisting of an active eye behaving in a time-varying
environment, we find that this generic principle leads to the simultaneous
development of both smooth pursuit behavior and model neurons whose properties
are similar to those of primary visual cortical neurons selective for different
directions of visual motion. We suggest that this general principle may form
the basis for a unified and integrated explanation of many perception/action
loops.
Comment: 6 pages, 5 figures
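The shared motivational signal described in this abstract can be illustrated with a small sketch: the intrinsic reward is the (negative) reconstruction error of a sparse code of the sensory input, so that both the sensory model and the behavior learner are rewarded for making the input easier to encode. The dictionary, sparsity level, and input dimensions below are hypothetical stand-ins, not the paper's trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a learned sparse-coding dictionary:
# columns are basis functions ("model neurons").
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)

def sparse_code(x, k=8):
    """Greedily keep the k strongest responses (a crude resource constraint)."""
    a = D.T @ x
    weak = np.argsort(np.abs(a))[:-k]  # indices of all but the k largest
    a[weak] = 0.0
    return a

def intrinsic_reward(x):
    """Encoding fidelity under the resource constraint: the better the
    sparse code reconstructs the input, the higher the reward shared by
    the sensory model and the behavior (e.g. pursuit) learner."""
    a = sparse_code(x)
    return -np.sum((x - D @ a) ** 2)

x = rng.standard_normal(64)
r = intrinsic_reward(x)  # higher (less negative) means better encoded
```

In the paper's setup the same scalar drives both dictionary learning and the reinforcement learner controlling eye movements; here only the reward computation is sketched.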
A neural network model of curiosity-driven categorization
Infants are curious learners who drive their own cognitive development by imposing structure on their learning environments as they explore. Understanding the mechanisms underlying this curiosity is therefore critical to our understanding of development. However, very few studies have examined the role of curiosity in infants' learning, and in particular in their categorization; what structure infants impose on their own environment, and how this affects learning, is therefore unclear. The results of studies in which the learning environment is structured a priori are contradictory: while some suggest that complexity optimizes learning, others suggest that minimal complexity is optimal, and still others report a Goldilocks effect by which intermediate difficulty is best. We used an autoencoder network to capture empirical data in which 10-month-old infants' categorization was supported by maximal complexity [1]. When we allowed the same model to choose stimulus sequences based on a "curiosity" metric that took into account the model's internal states as well as stimulus features, categorization was better than with selection based solely on stimulus characteristics. The sequences of stimuli chosen by the model in the curiosity condition showed a Goldilocks effect with intermediate complexity. This study provides the first computational investigation of curiosity-based categorization, and points to the importance of characterizing development as emerging from the relationship between the learner and its environment.
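The kind of curiosity-driven selection described above can be sketched roughly as follows. The linear map standing in for the autoencoder, the stimulus set, and the target-error level are all hypothetical, and the Goldilocks-style score is an assumption for illustration rather than the paper's exact metric.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stimulus set: feature vectors of varying complexity (hypothetical).
stimuli = rng.standard_normal((12, 16))

# A simple linear map stands in for the autoencoder's current internal state.
W = 0.1 * rng.standard_normal((16, 16))

def reconstruction_error(x):
    return float(np.sum((x - W @ x) ** 2))

def curiosity_score(x, target=0.5):
    """Goldilocks-style metric: prefer stimuli of intermediate difficulty,
    i.e. normalized reconstruction error closest to a target level."""
    errors = [reconstruction_error(s) for s in stimuli]
    lo, hi = min(errors), max(errors)
    e = (reconstruction_error(x) - lo) / (hi - lo + 1e-12)  # normalize to [0, 1]
    return -abs(e - target)

# The model picks its next stimulus itself rather than following a fixed sequence.
next_idx = max(range(len(stimuli)), key=lambda i: curiosity_score(stimuli[i]))
```

The key property, shared with the study, is that the choice depends on the learner's internal state (here `W`) and not only on fixed stimulus characteristics.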
Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey
Building autonomous machines that can explore open-ended environments,
discover possible interactions and build repertoires of skills is a general
objective of artificial intelligence. Developmental approaches argue that this
can only be achieved by autonomous and intrinsically motivated learning
agents that can learn to represent, generate, select and solve their own
problems. In recent years, the convergence of developmental approaches with
deep reinforcement learning (RL) methods has been leading to the emergence of a
new field: developmental reinforcement learning. Developmental RL is
concerned with the use of deep RL algorithms to tackle a developmental problem
-- the intrinsically motivated acquisition of open-ended repertoires of
skills. The self-generation of goals requires the learning
of compact goal encodings as well as their associated goal-achievement
functions. This raises new challenges compared to standard RL algorithms
originally designed to tackle pre-defined sets of goals using external reward
signals. The present paper introduces developmental RL and proposes a
computational framework based on goal-conditioned RL to tackle the
intrinsically motivated skills acquisition problem. It proceeds to present a
typology of the various goal representations used in the literature, before
reviewing existing methods to learn to represent and prioritize goals in
autonomous systems. We finally close the paper by discussing some open
challenges in the quest for intrinsically motivated skills acquisition.
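A minimal sketch of the autotelic loop described above, an agent that samples its own goals and evaluates them with an internal goal-achievement function, might look like this. The 2-D point environment, the goal-sampling rule, and the naive policy are hypothetical stand-ins for a learned goal encoder and a trained goal-conditioned policy.

```python
import numpy as np

rng = np.random.default_rng(2)

def achieved(state, goal, tol=0.5):
    """Goal-achievement function: an internal signal, not an external reward."""
    return float(np.linalg.norm(state - goal) < tol)

def sample_goal(visited_states):
    """Self-generation of goals: here, goals are re-sampled near states the
    agent has already encountered (a simple stand-in for a learned goal
    encoding, in the spirit of hindsight-style goal sampling)."""
    base = visited_states[rng.integers(len(visited_states))]
    return base + rng.normal(0.0, 0.2, 2)

# One rollout of the autotelic loop in a 2-D point environment.
state = np.zeros(2)
visited = [state.copy()]
goal = sample_goal(visited)
for _ in range(50):
    action = np.clip(goal - state, -0.2, 0.2)   # naive goal-directed policy
    state = state + action
    visited.append(state.copy())
    if achieved(state, goal):
        goal = sample_goal(visited)             # the agent sets its own next goal
```

The contrast with standard RL is visible in the structure: no external reward enters the loop; goals and their achievement criteria are generated and evaluated by the agent itself.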
Robust active binocular vision through intrinsically motivated learning
The efficient coding hypothesis posits that sensory systems of animals strive to encode sensory signals efficiently by taking into account the redundancies in them. This principle has been very successful in explaining response properties of visual sensory neurons as adaptations to the statistics of natural images. Recently, we have begun to extend the efficient coding hypothesis to active perception through a form of intrinsically motivated learning: a sensory model learns an efficient code for the sensory signals while a reinforcement learner generates movements of the sense organs to improve the encoding of the signals. To this end, it receives an intrinsically generated reinforcement signal indicating how well the sensory model encodes the data. This approach has been tested in the context of binocular vision, leading to the autonomous development of disparity tuning and vergence control. Here we systematically investigate the robustness of the new approach in the context of a binocular vision system implemented on a robot. Robustness is an important aspect that reflects the ability of the system to deal with unmodeled disturbances or events, such as insults to the system that displace the stereo cameras. To demonstrate the robustness of our method and its ability to self-calibrate, we introduce various perturbations and test if and how the system recovers from them. We find that (1) the system can fully recover from a perturbation that can be compensated through the system's motor degrees of freedom, (2) performance degrades gracefully if the system cannot use its motor degrees of freedom to compensate for the perturbation, and (3) recovery from a perturbation is improved if both the sensory encoding and the behavior policy can adapt to the perturbation. Overall, this work demonstrates that our intrinsically motivated learning approach for efficient coding in active perception gives rise to a self-calibrating perceptual system of high robustness.
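One way to picture the self-calibration result is a single control variable (a vergence-like angle) driven purely by the intrinsic encoding-fidelity signal. The Gaussian fidelity profile, the hill-climbing rule, and all parameters below are hypothetical; the actual system uses sparse coding and reinforcement learning rather than gradient ascent.

```python
import numpy as np

def encoding_fidelity(vergence, target):
    """Stand-in for the sparse coder's reconstruction quality: encoding is
    best when the vergence angle matches the true fixation geometry
    (a hypothetical Gaussian profile, not the paper's model)."""
    return np.exp(-(vergence - target) ** 2)

def recalibrate(vergence, target, steps=200, lr=0.5, eps=0.05):
    """Hill-climb the intrinsic reward with finite differences. After a
    perturbation (e.g. a displaced camera shifts `target`), the same
    signal drives recovery without any external supervision."""
    for _ in range(steps):
        grad = (encoding_fidelity(vergence + eps, target)
                - encoding_fidelity(vergence - eps, target)) / (2 * eps)
        vergence += lr * grad
    return vergence

v = recalibrate(0.0, target=2.0)  # drifts back toward the new optimum
```

This captures the abstract's point (1): when the perturbation can be compensated by a motor degree of freedom, maximizing the intrinsic signal alone is sufficient to recover.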
Final report key contents: main results accomplished by the EU-Funded project IM-CLeVeR - Intrinsically Motivated Cumulative Learning Versatile Robots
This document presents the main scientific and technological achievements of the project IM-CLeVeR. It is organised as follows:
1. Project executive summary: a brief overview of the project vision, objectives and keywords.
2. Beneficiaries of the project and contacts: list of Teams (partners) of the project, Team Leaders and contacts.
3. Project context and objectives: the vision of the project and its overall objectives.
4. Overview of work performed and main results achieved: a one-page overview of the main results of the project.
5. Overview of main results per partner: a bullet-point list of main results per partner.
6. Main achievements in detail, per partner: a thorough explanation of the main results per partner (including collaboration work), with references to the main publications supporting them.
Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey
Building autonomous machines that can explore open-ended environments, discover possible interactions and autonomously build repertoires of skills is a general objective of artificial intelligence. Developmental approaches argue that this can only be achieved by autonomous and intrinsically motivated learning agents that can generate, select and learn to solve their own problems. In recent years, we have seen a convergence of developmental approaches, and developmental robotics in particular, with deep reinforcement learning (RL) methods, forming the new domain of developmental machine learning. Within this new domain, we review here a set of methods where deep RL algorithms are trained to tackle the developmental robotics problem of the autonomous acquisition of open-ended repertoires of skills. Intrinsically motivated goal-conditioned RL algorithms train agents to learn to represent, generate and pursue their own goals. The self-generation of goals requires the learning of compact goal encodings as well as their associated goal-achievement functions, which results in new challenges compared to traditional RL algorithms designed to tackle pre-defined sets of goals using external reward signals. This paper proposes a typology of these methods at the intersection of deep RL and developmental approaches, surveys recent approaches and discusses future avenues.
Efficient and Stable Online Learning for Developmental Robots
Recent progress in robotics and cognitive science has inspired a new generation of more versatile robots, so-called developmental robots. Many learning approaches for these robots are inspired by developmental processes and learning mechanisms observed in children. It is widely accepted that developmental robots must autonomously develop, acquire their skills, and cope with unforeseen challenges in unbounded environments through lifelong learning. Continuous online adaptation and intrinsically motivated learning are thus essential capabilities for these robots. However, the high sample complexity of online learning and intrinsic motivation methods impedes the efficiency and practical feasibility of these methods for lifelong learning. Consequently, the majority of previous work has been demonstrated only in simulation. This thesis devises new methods and learning schemes to mitigate this problem and to permit direct online training on physical robots. A novel intrinsic motivation method is developed to drive the robot's exploration to efficiently select what to learn. This method combines new knowledge-based and competence-based signals to increase sample efficiency and to enable lifelong learning. While developmental robots typically acquire their skills through self-exploration, their autonomous development could be accelerated by additionally learning from humans. Yet there is hardly any research that integrates intrinsic motivation with learning from a teacher. The thesis therefore establishes a new learning scheme to integrate intrinsic motivation with learning from observation. The underlying exploration mechanism in the proposed learning schemes relies on Goal Babbling as a goal-directed method for learning direct inverse robot models online, from scratch, and in a "learning while behaving" fashion. Online learning of multiple solutions for redundant robots with this framework was missing.
This thesis devises an incremental online associative network to enable simultaneous exploration and solution consolidation, and establishes a new technique to stabilize the learning system. The proposed methods and learning schemes are demonstrated for acquiring reaching skills. Their efficiency, stability, and applicability are benchmarked in simulation and demonstrated on a physical 7-DoF Baxter robot arm.
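The combination of knowledge-based and competence-based signals described in this abstract can be sketched as follows. The particular signals (latest prediction error and recent learning progress), the window sizes, and the weighting are assumptions for illustration; the thesis develops its own combination.

```python
import numpy as np

def knowledge_signal(prediction_errors):
    """Knowledge-based component: current surprise (latest prediction error)."""
    return prediction_errors[-1]

def competence_signal(goal_errors):
    """Competence-based component: learning progress, i.e. how much the
    error toward a goal has recently decreased."""
    recent, past = np.mean(goal_errors[-3:]), np.mean(goal_errors[:3])
    return max(past - recent, 0.0)

def interest(prediction_errors, goal_errors, w=0.5):
    """Combined intrinsic-motivation signal used to select what to learn
    (the linear weighting is an assumption)."""
    return (w * knowledge_signal(prediction_errors)
            + (1 - w) * competence_signal(goal_errors))

# Two candidate goals: one already mastered, one where progress is being made.
mastered = interest([0.05, 0.04, 0.05], [0.1, 0.1, 0.1, 0.1, 0.1, 0.1])
learning = interest([0.40, 0.35, 0.30], [0.9, 0.8, 0.7, 0.5, 0.3, 0.2])
```

Selecting the goal with the higher interest value steers exploration toward regions where the learner is still improving, which is the sample-efficiency argument made above.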
On the relationship between neuronal codes and mental models
The superordinate aim of my work towards this thesis
was a better understanding of the relationship between mental models and the underlying principles that lead to the self-organization of neuronal circuitry. The thesis consists of four individual publications, which approach this goal from differing perspectives. While the formation of sparse coding representations in neuronal substrate has been investigated extensively, many research questions on how sparse coding may be exploited for higher cognitive processing are still open. The first two studies, included as chapter 2 and chapter 3, asked to what extent representations obtained with sparse coding match mental models. We identified the following selectivities in sparse coding representations: with stereo images as input, the representation was selective for the disparity of image structures, which can be used to infer the distance of structures from the observer. Furthermore, it was selective for the predominant orientation in textures, which can be used to infer the orientation of surfaces. With optic flow from egomotion as input, the representation was selective for the direction of egomotion in six degrees of freedom. Due to the direct relation between these selectivities and physical properties, representations obtained with sparse coding can serve as early sensory models of the environment. The cognitive processes behind spatial knowledge rest on mental models that represent the environment. In the third study, included as chapter 4, we presented a topological model for wayfinding. It describes a dual population code, in which the first population code encodes places by means of place fields, and the second population code encodes motion instructions based on links between place fields. We did not focus on an implementation in biological substrate or on an exact fit to physiological findings; rather, the model is a biologically plausible, parsimonious method for wayfinding, which may be close to an intermediate step of emergent skills in an evolutionary navigational hierarchy. Our automated testing of visual performance in mice, included as chapter 5, is an example of behavioral testing in the perception-action cycle. The goal of this study was to quantify the optokinetic reflex. Due to the rich behavioral repertoire of mice, quantification required many elaborate steps of computational analysis. Animals and humans are embodied living systems, and are therefore composed of strongly enmeshed modules or entities, which are in turn enmeshed with the environment. To study living systems as a whole, it is necessary to test hypotheses, for example on the nature of mental models, in the perception-action cycle. In summary, the studies included in this thesis extend our view of the character of early sensory representations as mental models, as well as of high-level mental models for spatial navigation. Additionally, the thesis contains an example of the evaluation of hypotheses in the perception-action cycle.
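The dual population code of the wayfinding model can be sketched as follows. The place-field centers, widths, and link set are hypothetical toy values, not the model's parameters; the sketch only shows the two codes and how a motion instruction is read out from a link.

```python
import numpy as np

# First population code: places encoded by Gaussian place fields
# (centers and width are hypothetical toy values).
centers = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]])
sigma = 0.5

def place_activity(pos):
    """Population activity of the place-field code at position `pos`."""
    d2 = np.sum((centers - pos) ** 2, axis=1)
    return np.exp(-d2 / (2 * sigma ** 2))

# Second population code: motion instructions defined on links between
# place fields (here simply the heading from one field center to the next).
links = {(0, 1), (1, 2), (2, 3)}

def motion_instruction(i, j):
    assert (i, j) in links
    v = centers[j] - centers[i]
    return v / np.linalg.norm(v)

# Wayfinding: read out the most active place field, then follow its link.
pos = np.array([0.1, -0.1])
current = int(np.argmax(place_activity(pos)))
heading = motion_instruction(current, current + 1)
```

Chaining link readouts in this way yields a route through the place-field graph, which is the parsimonious navigation behavior the model is meant to capture.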