
    Intrinsically Motivated Learning of Visual Motion Perception and Smooth Pursuit

    We extend the framework of efficient coding, which has been used to model the development of sensory processing in isolation, to model the development of the perception/action cycle. Our extension combines sparse coding and reinforcement learning so that sensory processing and behavior co-develop to optimize a shared intrinsic motivational signal: the fidelity of the neural encoding of the sensory input under resource constraints. Applying this framework to a model system consisting of an active eye behaving in a time-varying environment, we find that this generic principle leads to the simultaneous development of both smooth pursuit behavior and model neurons whose properties are similar to those of primary visual cortical neurons selective for different directions of visual motion. We suggest that this general principle may form the basis for a unified and integrated explanation of many perception/action loops. Comment: 6 pages, 5 figures
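    The co-development principle in this abstract, behavior rewarded by how well a sparse code reconstructs the sensory input, can be sketched in toy form. Everything below (the 1-D "retina", the three eye-velocity actions, the update rules) is an illustrative assumption, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (all names and dynamics are illustrative assumptions): a 1-D
# "retina" whose input is cleanest when the eye's velocity matches a moving
# target, so pursuit emerges purely from rewarding coding fidelity.
N_BASIS, DIM, N_ACTIONS = 8, 16, 3   # actions 0/1/2 = eye velocity -1/0/+1

basis = rng.normal(size=(N_BASIS, DIM))          # sparse-coding dictionary
basis /= np.linalg.norm(basis, axis=1, keepdims=True)
q = np.zeros(N_ACTIONS)                          # bandit-style action values

def encode(x, k=2):
    """Greedy sparse code: keep the k best-matching dictionary atoms."""
    coeffs = basis @ x
    idx = np.argsort(-np.abs(coeffs))[:k]
    recon = coeffs[idx] @ basis[idx]
    return idx, coeffs[idx], recon

def observe(action, target_vel=1):
    """Retinal slip (and hence noise) shrinks as eye velocity matches the target."""
    slip = abs((action - 1) - target_vel)
    signal = np.sin(np.linspace(0, np.pi, DIM))
    return signal + 0.5 * slip * rng.normal(size=DIM)

for step in range(2000):
    a = int(rng.integers(N_ACTIONS)) if rng.random() < 0.1 else int(np.argmax(q))
    x = observe(a)
    idx, c, recon = encode(x)
    reward = -np.sum((x - recon) ** 2)            # intrinsic reward: coding fidelity
    q[a] += 0.05 * (reward - q[a])
    basis[idx] += 0.01 * np.outer(c, x - recon)   # Hebbian-style dictionary update
    basis[idx] /= np.linalg.norm(basis[idx], axis=1, keepdims=True)
```

    Action 2 (matching the target's velocity) yields the cleanest input and therefore the best encoding, so the pursuit-like action should end up with the highest value, while the dictionary simultaneously learns the input pattern.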

    A neural network model of curiosity-driven categorization

    Infants are curious learners who drive their own cognitive development by imposing structure on their learning environments as they explore. Understanding the mechanisms underlying this curiosity is therefore critical to our understanding of development. However, very few studies have examined the role of curiosity in infants’ learning, and in particular, their categorization; what structure infants impose on their own environment and how this affects learning is therefore unclear. The results of studies in which the learning environment is structured a priori are contradictory: while some suggest that complexity optimizes learning, others suggest that minimal complexity is optimal, and still others report a Goldilocks effect by which intermediate difficulty is best. We used an autoencoder network to capture empirical data in which 10-month-old infants’ categorization was supported by maximal complexity [1]. When we allowed the same model to choose stimulus sequences based on a “curiosity” metric which took into account the model’s internal states as well as stimulus features, categorization was better than selection based solely on stimulus characteristics. The sequences of stimuli chosen by the model in the curiosity condition showed a Goldilocks effect with intermediate complexity. This study provides the first computational investigation of curiosity-based categorization, and points to the importance of characterizing development as emerging from the relationship between the learner and its environment.
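    As a rough illustration of curiosity-driven stimulus selection, the sketch below trains a linear autoencoder that picks its own training stimuli. The curiosity score here (preferring stimuli of intermediate reconstruction error) is an assumed stand-in for the paper's metric, and all sizes and names are invented:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative stand-in for the paper's model (all sizes, names and the
# curiosity score are assumptions): a linear autoencoder that selects its own
# training stimuli, preferring those of intermediate reconstruction error.
DIM, HID = 12, 4
stimuli = rng.normal(size=(20, DIM))        # candidate stimulus pool
W = rng.normal(scale=0.1, size=(HID, DIM))  # tied-weight linear autoencoder

def recon_error(x):
    return float(np.sum((x - W.T @ (W @ x)) ** 2))

def curiosity(x, target=4.0):
    """Goldilocks-style score: highest when the error is neither tiny nor huge."""
    return -abs(recon_error(x) - target)

def pool_error():
    return float(np.mean([recon_error(x) for x in stimuli]))

e_before = pool_error()

for step in range(300):
    i = max(range(len(stimuli)), key=lambda j: curiosity(stimuli[j]))
    x = stimuli[i]                          # model-chosen stimulus
    h = W @ x
    W += 0.01 * np.outer(h, x - W.T @ h)    # Oja-style subspace rule

e_after = pool_error()
```

    Because the model keeps choosing stimuli whose difficulty sits near the target level, learning concentrates on an intermediate-complexity curriculum while the mean reconstruction error over the pool still decreases.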

    Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey

    Building autonomous machines that can explore open-ended environments, discover possible interactions and build repertoires of skills is a general objective of artificial intelligence. Developmental approaches argue that this can only be achieved by autotelic agents: intrinsically motivated learning agents that can learn to represent, generate, select and solve their own problems. In recent years, the convergence of developmental approaches with deep reinforcement learning (RL) methods has been leading to the emergence of a new field: developmental reinforcement learning. Developmental RL is concerned with the use of deep RL algorithms to tackle a developmental problem -- the intrinsically motivated acquisition of open-ended repertoires of skills. The self-generation of goals requires the learning of compact goal encodings as well as their associated goal-achievement functions. This raises new challenges compared to standard RL algorithms originally designed to tackle pre-defined sets of goals using external reward signals. The present paper introduces developmental RL and proposes a computational framework based on goal-conditioned RL to tackle the intrinsically motivated skills acquisition problem. It proceeds to present a typology of the various goal representations used in the literature, before reviewing existing methods to learn to represent and prioritize goals in autonomous systems. We finally close the paper by discussing some open challenges in the quest for intrinsically motivated skills acquisition.
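    A minimal autotelic-agent sketch of the idea, a goal-conditioned learner that samples its own goals and computes its own goal-achievement reward, is given below. The corridor environment, the hindsight-style relabelling rule and all constants are illustrative assumptions, not a method from the survey:

```python
import numpy as np

rng = np.random.default_rng(2)

# Tabular goal-conditioned Q-learner on a toy 1-D corridor: goals are
# self-generated and the reward is computed internally, with no external task.
N = 5                                    # corridor states 0..4, start at 0
Q = np.zeros((N, N, 2))                  # Q[state, goal, action]; 0 = left, 1 = right

def step(s, a):
    return max(0, min(N - 1, s + (1 if a == 1 else -1)))

for episode in range(500):
    g = int(rng.integers(N))             # self-generated goal
    s = 0
    for t in range(20):
        noisy = Q[s, g] + rng.normal(scale=1e-3, size=2)  # random tie-breaking
        a = int(rng.integers(2)) if rng.random() < 0.2 else int(np.argmax(noisy))
        s2 = step(s, a)
        r = 1.0 if s2 == g else 0.0      # internally computed goal achievement
        target = r if r else 0.9 * np.max(Q[s2, g])
        Q[s, g, a] += 0.2 * (target - Q[s, g, a])
        # hindsight-style relabelling: the state actually reached is a goal too
        Q[s, s2, a] += 0.2 * (1.0 - Q[s, s2, a])
        s = s2
        if r:
            break

def reaches(g, limit=10):
    """Follow the learned greedy policy toward goal g from state 0."""
    s = 0
    for _ in range(limit):
        if s == g:
            return True
        s = step(s, int(np.argmax(Q[s, g])))
    return s == g
```

    After training, the greedy goal-conditioned policy should reach every state it can set as a goal, which is the sense in which the agent has acquired a small repertoire of skills without any external reward signal.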

    Robust active binocular vision through intrinsically motivated learning

    The efficient coding hypothesis posits that sensory systems of animals strive to encode sensory signals efficiently by taking into account the redundancies in them. This principle has been very successful in explaining response properties of visual sensory neurons as adaptations to the statistics of natural images. Recently, we have begun to extend the efficient coding hypothesis to active perception through a form of intrinsically motivated learning: a sensory model learns an efficient code for the sensory signals while a reinforcement learner generates movements of the sense organs to improve the encoding of the signals. To this end, it receives an intrinsically generated reinforcement signal indicating how well the sensory model encodes the data. This approach has been tested in the context of binocular vision, leading to the autonomous development of disparity tuning and vergence control. Here we systematically investigate the robustness of the new approach in the context of a binocular vision system implemented on a robot. Robustness is an important aspect that reflects the ability of the system to deal with unmodeled disturbances or events, such as insults to the system that displace the stereo cameras. To demonstrate the robustness of our method and its ability to self-calibrate, we introduce various perturbations and test if and how the system recovers from them. We find that (1) the system can fully recover from a perturbation that can be compensated through the system's motor degrees of freedom, (2) performance degrades gracefully if the system cannot use its motor degrees of freedom to compensate for the perturbation, and (3) recovery from a perturbation is improved if both the sensory encoding and the behavior policy can adapt to the perturbation. Overall, this work demonstrates that our intrinsically motivated learning approach for efficient coding in active perception gives rise to a self-calibrating perceptual system of high robustness.
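    The self-calibration claim can be illustrated with a deliberately stripped-down sketch: a scalar vergence command is adapted to minimize a noisy intrinsic error that stands in for the sparse-coding reconstruction error, and the same loop recovers after the "cameras" are knocked out of alignment. All names and dynamics are assumptions, not the authors' robot implementation:

```python
import numpy as np

rng = np.random.default_rng(3)

# Self-calibration sketch: the intrinsic error is lowest when the two views
# align (zero disparity), so minimizing it drives vergence toward alignment.
offset = 0.0                          # unmodeled camera misalignment
vergence = float(rng.normal())        # current vergence command

def intrinsic_error(v):
    disparity = v + offset            # residual misalignment between the eyes
    return disparity ** 2 + 0.01 * rng.normal() ** 2   # noisy coding-error proxy

def adapt(v, steps=200, lr=0.1, eps=0.05):
    """Stochastic finite-difference descent on the intrinsic error signal."""
    for _ in range(steps):
        g = (intrinsic_error(v + eps) - intrinsic_error(v - eps)) / (2 * eps)
        v -= lr * g
    return v

vergence = adapt(vergence)            # initial development
baseline = abs(vergence + offset)     # residual disparity after learning

offset = 1.5                          # perturbation: cameras displaced
vergence = adapt(vergence)            # the system re-adapts without supervision
recovered = abs(vergence + offset)
```

    Because the perturbation can be compensated through the motor degree of freedom, the residual disparity after re-adaptation returns to roughly its pre-perturbation level, mirroring finding (1) of the abstract.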

    Final report key contents: main results accomplished by the EU-Funded project IM-CLeVeR - Intrinsically Motivated Cumulative Learning Versatile Robots

    This document presents the main scientific and technological achievements of the project IM-CLeVeR. The document is organised as follows: 1. Project executive summary: a brief overview of the project vision, objectives and keywords. 2. Beneficiaries of the project and contacts: a list of the Teams (partners) of the project, Team Leaders and contacts. 3. Project context and objectives: the vision of the project and its overall objectives. 4. Overview of work performed and main results achieved: a one-page overview of the main results of the project. 5. Overview of main results per partner: a bullet-point list of main results per partner. 6. Main achievements in detail, per partner: a thorough explanation of the main results per partner (including collaborative work), with references to the main publications supporting them.

    Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey

    Building autonomous machines that can explore open-ended environments, discover possible interactions and autonomously build repertoires of skills is a general objective of artificial intelligence. Developmental approaches argue that this can only be achieved by autonomous and intrinsically motivated learning agents that can generate, select and learn to solve their own problems. In recent years, we have seen a convergence of developmental approaches, and developmental robotics in particular, with deep reinforcement learning (RL) methods, forming the new domain of developmental machine learning. Within this new domain, we review here a set of methods where deep RL algorithms are trained to tackle the developmental robotics problem of the autonomous acquisition of open-ended repertoires of skills. Intrinsically motivated goal-conditioned RL algorithms train agents to learn to represent, generate and pursue their own goals. The self-generation of goals requires the learning of compact goal encodings as well as their associated goal-achievement functions, which results in new challenges compared to traditional RL algorithms designed to tackle pre-defined sets of goals using external reward signals. This paper proposes a typology of these methods at the intersection of deep RL and developmental approaches, surveys recent approaches and discusses future avenues.

    Efficient and Stable Online Learning for Developmental Robots

    Recent progress in robotics and cognitive science has inspired a new generation of more versatile robots, so-called developmental robots. Many learning approaches for these robots are inspired by developmental processes and learning mechanisms observed in children. It is widely accepted that developmental robots must autonomously develop, acquire their skills, and cope with unforeseen challenges in unbounded environments through lifelong learning. Continuous online adaptation and intrinsically motivated learning are thus essential capabilities for these robots. However, the high sample-complexity of online learning and intrinsic motivation methods impedes the efficiency and practical feasibility of these methods for lifelong learning. Consequently, the majority of previous work has been demonstrated only in simulation. This thesis devises new methods and learning schemes to mitigate this problem and to permit direct online training on physical robots. A novel intrinsic motivation method is developed to drive the robot’s exploration to efficiently select what to learn. This method combines new knowledge-based and competence-based signals to increase sample-efficiency and to enable lifelong learning. While developmental robots typically acquire their skills through self-exploration, their autonomous development could be accelerated by additionally learning from humans. Yet there is hardly any research to integrate intrinsic motivation with learning from a teacher. The thesis therefore establishes a new learning scheme to integrate intrinsic motivation with learning from observation. The underlying exploration mechanism in the proposed learning schemes relies on Goal Babbling as a goal-directed method for learning direct inverse robot models online, from scratch, and in a learning while behaving fashion. Online learning of multiple solutions for redundant robots with this framework was missing. 
This thesis devises an incremental online associative network to enable simultaneous exploration and solution consolidation and establishes a new technique to stabilize the learning system. The proposed methods and learning schemes are demonstrated for acquiring reaching skills. Their efficiency, stability, and applicability are benchmarked in simulation and demonstrated on a physical 7-DoF Baxter robot arm.
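    The Goal Babbling mechanism underlying this thesis can be sketched for a 2-link planar arm: the inverse model is learned online, from scratch, by perturbing its own current guesses while aiming at goals, rather than by random motor babbling. Link lengths, goals and the nearest-neighbour inverse model below are illustrative choices, not the thesis' implementation:

```python
import numpy as np

rng = np.random.default_rng(4)

# Goal Babbling sketch: goal-directed exploration of a 2-link arm's inverse
# model, learning "while behaving" from the outcomes of its own commands.
L1, L2 = 1.0, 1.0

def forward(q):
    """Forward kinematics: joint angles -> end-effector position."""
    return np.array([L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1]),
                     L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])])

memory_x, memory_q = [], []           # incrementally grown inverse-model memory

def inverse(goal):
    """Current inverse estimate: the command whose outcome landed nearest the goal."""
    if not memory_x:
        return np.zeros(2)
    d = np.linalg.norm(np.array(memory_x) - goal, axis=1)
    return memory_q[int(np.argmin(d))]

def error_on_goals(gs):
    return float(np.mean([np.linalg.norm(forward(inverse(g)) - g) for g in gs]))

goals = [np.array([r * np.cos(a), r * np.sin(a)])
         for r in (1.0, 1.5) for a in np.linspace(0.2, 1.2, 5)]
initial_error = error_on_goals(goals)

for step in range(3000):
    g = goals[int(rng.integers(len(goals)))]        # goal-directed, not motor babbling
    q = inverse(g) + rng.normal(scale=0.3, size=2)  # exploratory perturbation
    memory_x.append(forward(q))                     # learn from what was reached
    memory_q.append(q)

final_error = error_on_goals(goals)
```

    Each trial bootstraps on the best command found so far for the nearest stored outcome, so reaching accuracy on the goal set improves steadily without any prior model of the arm.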

    On the relationship between neuronal codes and mental models

    The superordinate aim of my work towards this thesis was a better understanding of the relationship between mental models and the underlying principles that lead to the self-organization of neuronal circuitry. The thesis consists of four individual publications, which approach this goal from differing perspectives. While the formation of sparse coding representations in neuronal substrate has been investigated extensively, many research questions on how sparse coding may be exploited for higher cognitive processing are still open. The first two studies, included as chapter 2 and chapter 3, asked to what extent representations obtained with sparse coding match mental models. We identified the following selectivities in sparse coding representations: with stereo images as input, the representation was selective for the disparity of image structures, which can be used to infer the distance of structures to the observer. Furthermore, it was selective for the predominant orientation in textures, which can be used to infer the orientation of surfaces. With optic flow from egomotion as input, the representation was selective for the direction of egomotion in 6 degrees of freedom. Due to the direct relation between selectivity and physical properties, these representations, obtained with sparse coding, can serve as early sensory models of the environment. The cognitive processes behind spatial knowledge rest on mental models that represent the environment. We presented a topological model for wayfinding in the third study, included as chapter 4. It describes a dual population code, where the first population code encodes places by means of place fields, and the second population code encodes motion instructions based on links between place fields. We did not focus on an implementation in biological substrate or on an exact fit to physiological findings.
The model is a biologically plausible, parsimonious method for wayfinding, which may be close to an intermediate step of emergent skills in an evolutionary navigational hierarchy. Our automated testing for visual performance in mice, included in chapter 5, is an example of behavioral testing in the perception-action cycle. The goal of this study was to quantify the optokinetic reflex. Due to the rich behavioral repertoire of mice, quantification required many elaborate steps of computational analyses. Animals and humans are embodied living systems, and therefore composed of strongly enmeshed modules or entities, which are also enmeshed with the environment. In order to study living systems as a whole, it is necessary to test hypotheses, for example on the nature of mental models, in the perception-action cycle. In summary, the studies included in this thesis extend our view on the character of early sensory representations as mental models, as well as on high-level mental models for spatial navigation. Additionally, it contains an example of the evaluation of hypotheses in the perception-action cycle.
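    The dual population code for wayfinding can be sketched as follows: one population codes places via Gaussian place fields, a second codes the motion instruction attached to each link between fields, and wayfinding reduces to following links. The layout, field width and graph search below are invented for illustration, not the thesis' model:

```python
import numpy as np
from heapq import heappush, heappop

# Toy dual population code: place fields plus per-link motion instructions.
centers = {name: np.array(p, dtype=float) for name, p in
           [("A", (0, 0)), ("B", (1, 0)), ("C", (1, 1)), ("D", (2, 1))]}
links = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")]
SIGMA = 0.5                                   # place-field width

def place_activity(pos):
    """First population: Gaussian place-field responses to the current position."""
    return {n: float(np.exp(-np.sum((pos - c) ** 2) / (2 * SIGMA ** 2)))
            for n, c in centers.items()}

def motion_instruction(a, b):
    """Second population: the unit movement vector attached to link a -> b."""
    v = centers[b] - centers[a]
    return v / np.linalg.norm(v)

def route(start, goal):
    """Wayfinding by following links between place fields (Dijkstra on link lengths)."""
    graph = {n: [] for n in centers}
    for a, b in links:
        w = float(np.linalg.norm(centers[a] - centers[b]))
        graph[a].append((b, w))
        graph[b].append((a, w))
    best, heap = {start: (0.0, None)}, [(0.0, start)]
    while heap:
        d, n = heappop(heap)
        for m, w in graph[n]:
            if m not in best or d + w < best[m][0]:
                best[m] = (d + w, n)
                heappush(heap, (d + w, m))
    path, n = [], goal
    while n is not None:
        path.append(n)
        n = best[n][1]
    return path[::-1]

path = route("A", "D")
instructions = [motion_instruction(a, b) for a, b in zip(path, path[1:])]
```

    The route from A to D takes the diagonal link through C rather than the longer detour via B, and the second population supplies one movement vector per traversed link, which is all a downstream motor system would need.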