
    A modular neural-network model of the basal ganglia's role in learning and selecting motor behaviours

    This work presents a modular neural-network model, based on reinforcement-learning actor-critic methods, that aims to capture some of the most relevant known aspects of the role the basal ganglia play in learning and selecting motor behaviours related to different goals. In particular, simulations with the model show that the basal ganglia select "chunks" of behaviour whose "details" are specified by direct sensory-motor pathways, and that emergent modularity can help to deal with multiple behavioural tasks. A "top-down" approach is adopted: the starting point is the adaptive interaction of a (simulated) organism with the environment and its capacity to learn; an attempt is then made to implement these functions with neural architectures and mechanisms that have a neuroanatomical and neurophysiological empirical foundation.
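
    For orientation, the sketch below shows the tabular actor-critic update at the core of such models. This is the generic textbook method, not the authors' specific modular architecture; the state/action counts and learning rates are illustrative assumptions.

        import numpy as np

        n_states, n_actions = 25, 4      # illustrative sizes
        alpha_v, alpha_p, gamma = 0.1, 0.05, 0.95

        V = np.zeros(n_states)                    # critic: state-value estimates
        prefs = np.zeros((n_states, n_actions))   # actor: action preferences

        def softmax(x):
            e = np.exp(x - x.max())
            return e / e.sum()

        def actor_critic_step(s, a, r, s_next, done):
            """One actor-critic update; delta plays the role often
            ascribed to the dopaminergic reward-prediction-error signal."""
            target = r if done else r + gamma * V[s_next]
            delta = target - V[s]           # TD error
            V[s] += alpha_v * delta         # critic update
            prefs[s, a] += alpha_p * delta  # actor: reinforce action by TD error
            return delta

        def choose_action(s):
            return np.random.choice(n_actions, p=softmax(prefs[s]))

    The split into a value-learning critic and a policy-learning actor is what makes the mapping onto basal-ganglia circuitry natural in this family of models.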

    A planning modular neural-network robot for asynchronous multi-goal navigation tasks

    This paper focuses on two planning neural-network controllers, a "forward planner" and a "bidirectional planner". These have been developed within the framework of Sutton's Dyna-PI architectures (planning within reinforcement learning) and have already been presented in previous papers. The novelty of this paper is that the architecture of these planners is made modular in some of its components in order to deal with catastrophic interference. The controllers are tested on a simulated robot engaged in an asynchronous multi-goal path-planning problem that should exacerbate the interference problems. The results show that: (a) the modular planners can cope with multi-goal problems, allowing generalisation while avoiding interference; (b) when dealing with multi-goal problems the planners keep the advantages shown previously for one-goal problems over plain reinforcement learning; (c) the superiority of the bidirectional planner over the forward planner is confirmed for the multi-goal task.
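
    As background, here is a minimal Dyna-style loop interleaving learning from real transitions with planning over a learned model. For brevity this is the tabular Dyna-Q variant rather than Dyna-PI (which uses an actor-critic instead of Q-values), and it omits the paper's modular neural implementation; all parameters are illustrative.

        import random

        gamma, alpha, n_planning = 0.95, 0.1, 20
        Q = {}      # (state, action) -> value estimate
        model = {}  # (state, action) -> (reward, next_state): learned world model

        def q(s, a):
            return Q.get((s, a), 0.0)

        def dyna_update(s, a, r, s_next, actions):
            # 1. Direct reinforcement learning from the real transition.
            best_next = max(q(s_next, a2) for a2 in actions)
            Q[(s, a)] = q(s, a) + alpha * (r + gamma * best_next - q(s, a))
            # 2. Update the learned model of the environment.
            model[(s, a)] = (r, s_next)
            # 3. Planning: replay simulated transitions drawn from the model.
            for _ in range(n_planning):
                (ps, pa), (pr, pnext) = random.choice(list(model.items()))
                best = max(q(pnext, a2) for a2 in actions)
                Q[(ps, pa)] = q(ps, pa) + alpha * (pr + gamma * best - q(ps, pa))

    The planning replays are what let a Dyna agent propagate value information without further real experience; the paper's contribution is to make the networks implementing this loop modular so that multiple goals do not overwrite each other.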

    Planning with neural networks and reinforcement learning

    This thesis presents the design, implementation and investigation of some predictive-planning controllers built with neural networks and inspired by Dyna-PI architectures (Sutton, 1990). Dyna-PI architectures are planning systems based on actor-critic reinforcement learning methods and a model of the environment. The controllers are tested with a simulated robot that solves a stochastic path-finding landmark navigation task. A critical review of ideas and models proposed by the literature on problem solving, planning, reinforcement learning, and neural networks precedes the presentation of the controllers. The review isolates ideas relevant to the design of planners based on neural networks. A "neural forward planner" is implemented that, unlike the Dyna-PI architectures, is taskable in a strong sense. This planner is capable of building a "partial policy" focused around efficient start-goal paths, and of deciding to re-plan if "unexpected" states are encountered. Planning iteratively generates "chains of predictions" starting from the current state and using the model of the environment. This model consists of neural networks trained to predict the next input when an action is executed. A "neural bidirectional planner" that generates trajectories backward from the goal and forward from the current state is also implemented. This planner exploits the knowledge (image) of the goal, further focuses planning around efficient start-goal paths, and produces a quicker updating of evaluations. In several experiments the generalisation capacity of neural networks proves important for learning, but it also causes problems of interference. To deal with these problems a modular neural architecture is implemented, which uses a mixture-of-experts network for the critic and a simple hierarchical modular network for the actor. The research also implements a simple form of neural abstract planning named "coarse planning", and investigates its strengths in terms of exploration and evaluations' updating. Some experiments with coarse planning and with other controllers suggest that discounted reinforcement learning may have problems dealing with long-lasting tasks.
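
    A minimal sketch of the mixture-of-experts idea used for the critic: a gating network softly assigns each input to specialised experts, which is how such architectures limit interference across tasks. This is the generic gated ensemble, not the thesis's exact architecture; the sizes and linear experts are illustrative assumptions.

        import numpy as np

        n_experts, n_in = 3, 10  # illustrative sizes
        rng = np.random.default_rng(0)
        W_experts = rng.normal(0, 0.1, (n_experts, n_in))  # one linear expert each
        W_gate = rng.normal(0, 0.1, (n_experts, n_in))     # gating-network weights

        def softmax(x):
            e = np.exp(x - x.max())
            return e / e.sum()

        def moe_value(x):
            """Mixture-of-experts output: the gate produces responsibilities
            (summing to 1) so different experts can specialise on different
            goals or regions of the state space, limiting interference."""
            expert_out = W_experts @ x   # each expert's value estimate
            gate = softmax(W_gate @ x)   # soft assignment of input to experts
            return gate @ expert_out, gate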

    Dynamics of dopamine signaling and network activity in the striatum during learning and motivated pursuit of goals

    Thesis (Ph.D. in Neuroscience), Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, February 2013. Includes bibliographical references (p. 118-126). Learning to direct behaviors towards goals is a central function of all vertebrate nervous systems. Initial learning often involves an exploratory phase, in which actions are flexible and highly variable. With repeated successful experience, behaviors may be guided by cues in the environment that reliably predict the desired outcome, and eventually behaviors can be executed as crystallized action sequences, or "habits", which are relatively inflexible. Parallel circuits through the basal ganglia and their inputs from midbrain dopamine neurons are believed to make critical contributions to these phases of learning and behavioral execution. To explore the neural mechanisms underlying goal-directed learning and behavior, I have employed electrophysiological and electrochemical techniques to measure neural activity and dopamine release in networks of the striatum, the principal input nucleus of the basal ganglia, as rats learned to pursue rewards in mazes. The electrophysiological recordings revealed training-dependent dynamics in striatal local field potentials and coordinated neural firing that may differentially support both network rigidity and flexibility during pursuit of goals. Electrochemical measurements of real-time dopamine signaling during maze running revealed prolonged signaling changes that may contribute to motivating or guiding behavior. Pathological over- or under-expression of these network states may contribute to symptoms experienced in a range of basal ganglia disorders, from Parkinson's disease to drug addiction. By Mark W. Howe.

    A computational model of co-learning in the basal ganglia and cortex: reinforcement learning and the development of representations

    Throughout its lifetime, the brain develops abstract representations of its environment that allow the individual to maximize benefit. How these representations are developed while trying to acquire rewards remains a mystery. It is reasonable to assume that these representations arise in the cortex and that the basal ganglia play an important role in reward maximization. In particular, dopaminergic neurons appear to code a reward-prediction-error signal. This thesis studies the problem by constructing, using machine-learning tools, a computational model that incorporates a number of relevant neurophysiological findings. After an introduction to the machine-learning framework and some of its algorithms, an overview of learning in psychology and neuroscience, and a review of models of learning in the basal ganglia, the thesis comprises three papers. The first paper shows that it is possible to learn a better representation of the inputs while learning to maximize reward. The second paper addresses the important and still unresolved problem of the representation of time in the brain. It shows that a time representation can be acquired automatically in an artificial neural network acting as a working memory. The representation learned by the model closely resembles the activity of cortical neurons in similar tasks. Moreover, the model shows that the reward-prediction-error signal can accelerate the development of the temporal representation. Finally, it shows that if such a learned representation exists in the cortex, it could provide the information the basal ganglia need to explain the dopaminergic signal. The third paper evaluates the explanatory and predictive power of the model under different task conditions, such as the presence or absence of a stimulus while waiting for the reward (classical versus trace conditioning). Beyond making interesting predictions relevant to the timing literature, the paper reveals some shortcomings of the model that will need to be resolved. In summary, this thesis extends current models of reinforcement learning in the basal ganglia and the dopaminergic system to the concurrent development of representations in the cortex and to the interactions between these two structures.
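
    A minimal sketch of the reward-prediction-error signal that such models identify with phasic dopamine: the standard temporal-difference formulation with a linear value function, not this thesis's full cortex/basal-ganglia model. The feature size and learning parameters are illustrative assumptions.

        import numpy as np

        gamma, alpha = 0.98, 0.05
        n_features = 50             # illustrative stimulus-representation size
        w = np.zeros(n_features)    # linear value-function weights

        def td_error(x, r, x_next):
            """delta = r + gamma * V(x') - V(x): the quantity whose time
            course resembles phasic dopaminergic firing in conditioning."""
            v, v_next = w @ x, w @ x_next
            return r + gamma * v_next - v

        def td_update(x, r, x_next):
            global w
            delta = td_error(x, r, x_next)
            w += alpha * delta * x  # learn value predictions from delta
            return delta

    In the thesis's framing, the stimulus representation x is itself learned in a cortex-like network (including a representation of elapsed time), and delta is the dopaminergic teaching signal the basal ganglia use.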