10 research outputs found
Intrinsically Motivated Learning of Visual Motion Perception and Smooth Pursuit
We extend the framework of efficient coding, which has been used to model the
development of sensory processing in isolation, to model the development of the
perception/action cycle. Our extension combines sparse coding and reinforcement
learning so that sensory processing and behavior co-develop to optimize a
shared intrinsic motivational signal: the fidelity of the neural encoding of
the sensory input under resource constraints. Applying this framework to a
model system consisting of an active eye behaving in a time-varying
environment, we find that this generic principle leads to the simultaneous
development of both smooth pursuit behavior and model neurons whose properties
are similar to those of primary visual cortical neurons selective for different
directions of visual motion. We suggest that this general principle may form
the basis for a unified and integrated explanation of many perception/action
loops.
Comment: 6 pages, 5 figures
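The shared motivational signal described in this abstract can be illustrated with a small sketch: the intrinsic reward is the (negative) reconstruction error of a sparse code of the sensory input, so that both the sensory model and the behavior learner are rewarded for making the input easier to encode. The dictionary, sparsity level, and input dimensions below are hypothetical stand-ins, not the paper's trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a learned sparse-coding dictionary:
# columns are basis functions ("model neurons").
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)

def sparse_code(x, k=8):
    """Greedily keep the k strongest responses (a crude resource constraint)."""
    a = D.T @ x
    weak = np.argsort(np.abs(a))[:-k]  # indices of all but the k largest
    a[weak] = 0.0
    return a

def intrinsic_reward(x):
    """Encoding fidelity under the resource constraint: the better the
    sparse code reconstructs the input, the higher the reward shared by
    the sensory model and the behavior (e.g. pursuit) learner."""
    a = sparse_code(x)
    return -np.sum((x - D @ a) ** 2)

x = rng.standard_normal(64)
r = intrinsic_reward(x)  # higher (less negative) means better encoded
```

In the paper's setup the same scalar drives both dictionary learning and the reinforcement learner controlling eye movements; here only the reward computation is sketched.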
A neural network model of curiosity-driven categorization
Infants are curious learners who drive their own cognitive development by imposing structure on their learning environments as they explore. Understanding the mechanisms underlying this curiosity is therefore critical to our understanding of development. However, very few studies have examined the role of curiosity in infants' learning, and in particular in their categorization; what structure infants impose on their own environment, and how this affects learning, is therefore unclear. The results of studies in which the learning environment is structured a priori are contradictory: while some suggest that complexity optimizes learning, others suggest that minimal complexity is optimal, and still others report a Goldilocks effect by which intermediate difficulty is best. We used an autoencoder network to capture empirical data in which 10-month-old infants' categorization was supported by maximal complexity [1]. When we allowed the same model to choose stimulus sequences based on a "curiosity" metric that took into account the model's internal states as well as stimulus features, categorization was better than with selection based solely on stimulus characteristics. The sequences of stimuli chosen by the model in the curiosity condition showed a Goldilocks effect with intermediate complexity. This study provides the first computational investigation of curiosity-based categorization, and points to the importance of characterizing development as emerging from the relationship between the learner and its environment.
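The kind of curiosity-driven selection described above can be sketched roughly as follows. The linear map standing in for the autoencoder, the stimulus set, and the target-error level are all hypothetical, and the Goldilocks-style score is an assumption for illustration rather than the paper's exact metric.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stimulus set: feature vectors of varying complexity (hypothetical).
stimuli = rng.standard_normal((12, 16))

# A simple linear map stands in for the autoencoder's current internal state.
W = 0.1 * rng.standard_normal((16, 16))

def reconstruction_error(x):
    return float(np.sum((x - W @ x) ** 2))

def curiosity_score(x, target=0.5):
    """Goldilocks-style metric: prefer stimuli of intermediate difficulty,
    i.e. normalized reconstruction error closest to a target level."""
    errors = [reconstruction_error(s) for s in stimuli]
    lo, hi = min(errors), max(errors)
    e = (reconstruction_error(x) - lo) / (hi - lo + 1e-12)  # normalize to [0, 1]
    return -abs(e - target)

# The model picks its next stimulus itself rather than following a fixed sequence.
next_idx = max(range(len(stimuli)), key=lambda i: curiosity_score(stimuli[i]))
```

The key property, shared with the study, is that the choice depends on the learner's internal state (here `W`) and not only on fixed stimulus characteristics.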
Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey
Building autonomous machines that can explore open-ended environments,
discover possible interactions and build repertoires of skills is a general
objective of artificial intelligence. Developmental approaches argue that this
can only be achieved by autonomous and intrinsically motivated learning
agents that can learn to represent, generate, select and solve their own
problems. In recent years, the convergence of developmental approaches with
deep reinforcement learning (RL) methods has been leading to the emergence of a
new field: developmental reinforcement learning. Developmental RL is
concerned with the use of deep RL algorithms to tackle a developmental problem
-- the intrinsically motivated acquisition of open-ended repertoires of
skills. The self-generation of goals requires the learning
of compact goal encodings as well as their associated goal-achievement
functions. This raises new challenges compared to standard RL algorithms
originally designed to tackle pre-defined sets of goals using external reward
signals. The present paper introduces developmental RL and proposes a
computational framework based on goal-conditioned RL to tackle the
intrinsically motivated skills acquisition problem. It proceeds to present a
typology of the various goal representations used in the literature, before
reviewing existing methods to learn to represent and prioritize goals in
autonomous systems. We finally close the paper by discussing some open
challenges in the quest for intrinsically motivated skills acquisition.
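A minimal sketch of the autotelic loop described above, an agent that samples its own goals and evaluates them with an internal goal-achievement function, might look like this. The 2-D point environment, the goal-sampling rule, and the naive policy are hypothetical stand-ins for a learned goal encoder and a trained goal-conditioned policy.

```python
import numpy as np

rng = np.random.default_rng(2)

def achieved(state, goal, tol=0.5):
    """Goal-achievement function: an internal signal, not an external reward."""
    return float(np.linalg.norm(state - goal) < tol)

def sample_goal(visited_states):
    """Self-generation of goals: here, goals are re-sampled near states the
    agent has already encountered (a simple stand-in for a learned goal
    encoding, in the spirit of hindsight-style goal sampling)."""
    base = visited_states[rng.integers(len(visited_states))]
    return base + rng.normal(0.0, 0.2, 2)

# One rollout of the autotelic loop in a 2-D point environment.
state = np.zeros(2)
visited = [state.copy()]
goal = sample_goal(visited)
for _ in range(50):
    action = np.clip(goal - state, -0.2, 0.2)   # naive goal-directed policy
    state = state + action
    visited.append(state.copy())
    if achieved(state, goal):
        goal = sample_goal(visited)             # the agent sets its own next goal
```

The contrast with standard RL is visible in the structure: no external reward enters the loop; goals and their achievement criteria are generated and evaluated by the agent itself.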
Robust active binocular vision through intrinsically motivated learning
The efficient coding hypothesis posits that sensory systems of animals strive to encode sensory signals efficiently by taking into account the redundancies in them. This principle has been very successful in explaining response properties of visual sensory neurons as adaptations to the statistics of natural images. Recently, we have begun to extend the efficient coding hypothesis to active perception through a form of intrinsically motivated learning: a sensory model learns an efficient code for the sensory signals while a reinforcement learner generates movements of the sense organs to improve the encoding of the signals. To this end, it receives an intrinsically generated reinforcement signal indicating how well the sensory model encodes the data. This approach has been tested in the context of binocular vision, leading to the autonomous development of disparity tuning and vergence control. Here we systematically investigate the robustness of the new approach in the context of a binocular vision system implemented on a robot. Robustness is an important aspect that reflects the ability of the system to deal with unmodeled disturbances or events, such as insults to the system that displace the stereo cameras. To demonstrate the robustness of our method and its ability to self-calibrate, we introduce various perturbations and test if and how the system recovers from them. We find that (1) the system can fully recover from a perturbation that can be compensated through the system's motor degrees of freedom, (2) performance degrades gracefully if the system cannot use its motor degrees of freedom to compensate for the perturbation, and (3) recovery from a perturbation is improved if both the sensory encoding and the behavior policy can adapt to the perturbation. Overall, this work demonstrates that our intrinsically motivated learning approach for efficient coding in active perception gives rise to a self-calibrating perceptual system of high robustness.
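One way to picture the self-calibration result is a single control variable (a vergence-like angle) driven purely by the intrinsic encoding-fidelity signal. The Gaussian fidelity profile, the hill-climbing rule, and all parameters below are hypothetical; the actual system uses sparse coding and reinforcement learning rather than gradient ascent.

```python
import numpy as np

def encoding_fidelity(vergence, target):
    """Stand-in for the sparse coder's reconstruction quality: encoding is
    best when the vergence angle matches the true fixation geometry
    (a hypothetical Gaussian profile, not the paper's model)."""
    return np.exp(-(vergence - target) ** 2)

def recalibrate(vergence, target, steps=200, lr=0.5, eps=0.05):
    """Hill-climb the intrinsic reward with finite differences. After a
    perturbation (e.g. a displaced camera shifts `target`), the same
    signal drives recovery without any external supervision."""
    for _ in range(steps):
        grad = (encoding_fidelity(vergence + eps, target)
                - encoding_fidelity(vergence - eps, target)) / (2 * eps)
        vergence += lr * grad
    return vergence

v = recalibrate(0.0, target=2.0)  # drifts back toward the new optimum
```

This captures the abstract's point (1): when the perturbation can be compensated by a motor degree of freedom, maximizing the intrinsic signal alone is sufficient to recover.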
Final report key contents: main results accomplished by the EU-Funded project IM-CLeVeR - Intrinsically Motivated Cumulative Learning Versatile Robots
This document presents the main scientific and technological achievements of the project IM-CLeVeR. It is organised as follows:
1. Project executive summary: a brief overview of the project vision, objectives and keywords.
2. Beneficiaries of the project and contacts: list of Teams (partners) of the project, Team Leaders and contacts.
3. Project context and objectives: the vision of the project and its overall objectives.
4. Overview of work performed and main results achieved: a one-page overview of the main results of the project.
5. Overview of main results per partner: a bullet-point list of main results per partner.
6. Main achievements in detail, per partner: a thorough explanation of the main results per partner (including collaboration work), with references to the main publications supporting them.
Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey
Building autonomous machines that can explore open-ended environments, discover possible interactions and autonomously build repertoires of skills is a general objective of artificial intelligence. Developmental approaches argue that this can only be achieved by autonomous and intrinsically motivated learning agents that can generate, select and learn to solve their own problems. In recent years, we have seen a convergence of developmental approaches, and developmental robotics in particular, with deep reinforcement learning (RL) methods, forming the new domain of developmental machine learning. Within this new domain, we review here a set of methods where deep RL algorithms are trained to tackle the developmental robotics problem of the autonomous acquisition of open-ended repertoires of skills. Intrinsically motivated goal-conditioned RL algorithms train agents to learn to represent, generate and pursue their own goals. The self-generation of goals requires the learning of compact goal encodings as well as their associated goal-achievement functions, which results in new challenges compared to traditional RL algorithms designed to tackle pre-defined sets of goals using external reward signals. This paper proposes a typology of these methods at the intersection of deep RL and developmental approaches, surveys recent approaches and discusses future avenues.
Efficient and Stable Online Learning for Developmental Robots
Recent progress in robotics and cognitive science has inspired a new generation of more versatile robots, so-called developmental robots. Many learning approaches for these robots are inspired by developmental processes and learning mechanisms observed in children. It is widely accepted that developmental robots must autonomously develop, acquire their skills, and cope with unforeseen challenges in unbounded environments through lifelong learning. Continuous online adaptation and intrinsically motivated learning are thus essential capabilities for these robots. However, the high sample complexity of online learning and intrinsic motivation methods impedes the efficiency and practical feasibility of these methods for lifelong learning. Consequently, the majority of previous work has been demonstrated only in simulation. This thesis devises new methods and learning schemes to mitigate this problem and to permit direct online training on physical robots. A novel intrinsic motivation method is developed to drive the robot's exploration to efficiently select what to learn. This method combines new knowledge-based and competence-based signals to increase sample efficiency and to enable lifelong learning. While developmental robots typically acquire their skills through self-exploration, their autonomous development could be accelerated by additionally learning from humans. Yet there is hardly any research that integrates intrinsic motivation with learning from a teacher. The thesis therefore establishes a new learning scheme to integrate intrinsic motivation with learning from observation. The underlying exploration mechanism in the proposed learning schemes relies on Goal Babbling as a goal-directed method for learning direct inverse robot models online, from scratch, and in a "learning while behaving" fashion. Online learning of multiple solutions for redundant robots with this framework was missing.
This thesis devises an incremental online associative network to enable simultaneous exploration and solution consolidation, and establishes a new technique to stabilize the learning system. The proposed methods and learning schemes are demonstrated for acquiring reaching skills. Their efficiency, stability, and applicability are benchmarked in simulation and demonstrated on a physical 7-DoF Baxter robot arm.
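The combination of knowledge-based and competence-based signals described in this abstract can be sketched as follows. The particular signals (latest prediction error and recent learning progress), the window sizes, and the weighting are assumptions for illustration; the thesis develops its own combination.

```python
import numpy as np

def knowledge_signal(prediction_errors):
    """Knowledge-based component: current surprise (latest prediction error)."""
    return prediction_errors[-1]

def competence_signal(goal_errors):
    """Competence-based component: learning progress, i.e. how much the
    error toward a goal has recently decreased."""
    recent, past = np.mean(goal_errors[-3:]), np.mean(goal_errors[:3])
    return max(past - recent, 0.0)

def interest(prediction_errors, goal_errors, w=0.5):
    """Combined intrinsic-motivation signal used to select what to learn
    (the linear weighting is an assumption)."""
    return (w * knowledge_signal(prediction_errors)
            + (1 - w) * competence_signal(goal_errors))

# Two candidate goals: one already mastered, one where progress is being made.
mastered = interest([0.05, 0.04, 0.05], [0.1, 0.1, 0.1, 0.1, 0.1, 0.1])
learning = interest([0.40, 0.35, 0.30], [0.9, 0.8, 0.7, 0.5, 0.3, 0.2])
```

Selecting the goal with the higher interest value steers exploration toward regions where the learner is still improving, which is the sample-efficiency argument made above.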
On the relationship between neuronal codes and mental models
The superordinate aim of my work towards this thesis
was a better understanding of the relationship between mental models and the underlying principles that lead to the self-organization of neuronal circuitry. The thesis consists of four individual publications, which approach this goal from differing perspectives. While the formation of sparse coding representations in neuronal substrate has been investigated extensively, many research questions on how sparse coding may be exploited for higher cognitive processing are still open. The first two studies, included as chapter 2 and chapter 3, asked to what extent representations obtained with sparse coding match mental models. We identified the following selectivities in sparse coding representations: with stereo images as input, the representation was selective for the disparity of image structures, which can be used to infer the distance of structures from the observer. Furthermore, it was selective for the predominant orientation in textures, which can be used to infer the orientation of surfaces. With optic flow from egomotion as input, the representation was selective for the direction of egomotion in six degrees of freedom. Due to the direct relation between these selectivities and physical properties, representations obtained with sparse coding can serve as early sensory models of the environment. The cognitive processes behind spatial knowledge rest on mental models that represent the environment. In the third study, included as chapter 4, we presented a topological model for wayfinding. It describes a dual population code, in which the first population code encodes places by means of place fields, and the second population code encodes motion instructions based on links between place fields. We did not focus on an implementation in biological substrate or on an exact fit to physiological findings; rather, the model is a biologically plausible, parsimonious method for wayfinding, which may be close to an intermediate step of emergent skills in an evolutionary navigational hierarchy. Our automated testing of visual performance in mice, included as chapter 5, is an example of behavioral testing in the perception-action cycle. The goal of this study was to quantify the optokinetic reflex. Due to the rich behavioral repertoire of mice, quantification required many elaborate steps of computational analysis. Animals and humans are embodied living systems, and are therefore composed of strongly enmeshed modules or entities, which are in turn enmeshed with the environment. To study living systems as a whole, it is necessary to test hypotheses, for example on the nature of mental models, in the perception-action cycle. In summary, the studies included in this thesis extend our view of the character of early sensory representations as mental models, as well as of high-level mental models for spatial navigation. Additionally, the thesis contains an example of the evaluation of hypotheses in the perception-action cycle.
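The dual population code of the wayfinding model can be sketched as follows. The place-field centers, widths, and link set are hypothetical toy values, not the model's parameters; the sketch only shows the two codes and how a motion instruction is read out from a link.

```python
import numpy as np

# First population code: places encoded by Gaussian place fields
# (centers and width are hypothetical toy values).
centers = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]])
sigma = 0.5

def place_activity(pos):
    """Population activity of the place-field code at position `pos`."""
    d2 = np.sum((centers - pos) ** 2, axis=1)
    return np.exp(-d2 / (2 * sigma ** 2))

# Second population code: motion instructions defined on links between
# place fields (here simply the heading from one field center to the next).
links = {(0, 1), (1, 2), (2, 3)}

def motion_instruction(i, j):
    assert (i, j) in links
    v = centers[j] - centers[i]
    return v / np.linalg.norm(v)

# Wayfinding: read out the most active place field, then follow its link.
pos = np.array([0.1, -0.1])
current = int(np.argmax(place_activity(pos)))
heading = motion_instruction(current, current + 1)
```

Chaining link readouts in this way yields a route through the place-field graph, which is the parsimonious navigation behavior the model is meant to capture.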