
    A neural network model of adaptively timed reinforcement learning and hippocampal dynamics

    This paper describes a neural model of how adaptively timed reinforcement learning occurs. The adaptive timing circuit is proposed to reside in the hippocampus and to involve the convergence of dentate granule cells onto CA3 pyramidal cells, as well as NMDA receptors. This circuit forms part of a model neural system for the coordinated control of recognition learning, reinforcement learning, and motor learning, whose properties clarify how an animal can learn to acquire a delayed reward. Behavioral and neural data are summarized in support of each processing stage of the system. The relevant anatomical sites are in the thalamus, neocortex, hippocampus, hypothalamus, amygdala, and cerebellum. Cerebellar influences on motor learning are distinguished from hippocampal influences on the adaptive timing of reinforcement learning. The model simulates how damage to the hippocampal formation disrupts adaptive timing, eliminates attentional blocking, and causes symptoms of medial temporal amnesia. It suggests how normal acquisition of subcortical emotional conditioning can occur after cortical ablation, even though cortical ablation retards the extinction of emotional conditioning. The model simulates how increasing the duration of an unconditioned stimulus increases the amplitude of emotional conditioning but does not change adaptive timing, and how increasing the intensity of a conditioned stimulus "speeds up the clock" whereas increasing the intensity of an unconditioned stimulus does not. Computer simulations of the model fit parametric conditioning data, including a Weber law property and an inverted U property. Both primary and secondary adaptively timed conditioning are simulated, as are data concerning conditioning with multiple interstimulus intervals (ISIs), gradually or abruptly changing ISIs, partial reinforcement, and multiple stimuli that lead to time-averaging of responses. Neurobiologically testable predictions are made to facilitate further tests of the model.
    Funding: Air Force Office of Scientific Research (90-0175, 90-0128); Defense Advanced Research Projects Agency (90-0083); National Science Foundation (IRI-87-16960); Office of Naval Research (N00014-91-J-4100)
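    One way to see how a spectrum of timed units can produce the Weber law property mentioned above is the idealized calculation below. It is a sketch under assumptions that are not taken from the abstract: each unit responds to CS onset with an alpha function g(t; τ) that peaks at its own delay τ, unit delays are spread uniformly on a log scale, and learning sets each unit's weight w(τ) in proportion to its activity at the US time s (the ISI).

```latex
g(t;\tau) = \frac{t}{\tau}\, e^{\,1 - t/\tau}, \qquad w(\tau) \propto g(s;\tau)

R(t) = \int_{0}^{\infty} g(s;\tau)\, g(t;\tau)\, d\log\tau
     = e^{2}\, s\, t \int_{0}^{\infty} \tau^{-2}\, e^{-(s+t)/\tau}\, \frac{d\tau}{\tau}
     = e^{2}\, s\, t \int_{0}^{\infty} x\, e^{-(s+t)x}\, dx \qquad (x = 1/\tau)
     = \frac{e^{2}\, s\, t}{(s+t)^{2}}

\frac{R(t)}{R(s)} = \frac{4\,(t/s)}{\left(1 + t/s\right)^{2}}
```

    Under these assumptions the normalized population response depends on time only through t/s: it peaks exactly at t = s, and its width grows in proportion to the ISI, which is the scale invariance a Weber law property describes. This is an idealization, not the model's own equations.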

    Metabotropic Glutamate Receptor Activation in Cerebellar Purkinje Cells as Substrate for Adaptive Timing of the Classically Conditioned Eye Blink Response

    To understand how the cerebellum adaptively times the classically conditioned nictitating membrane response (NMR), a model of the metabotropic glutamate receptor (mGluR) second messenger system in cerebellar Purkinje cells is constructed. In the model, slow responses, generated postsynaptically by mGluR-mediated phosphoinositide hydrolysis and calcium release from intracellular stores, bridge the interstimulus interval (ISI) between the onset of parallel fiber activity associated with the conditioned stimulus (CS) and climbing fiber activity associated with unconditioned stimulus (US) onset. Temporal correlation of metabotropic responses and climbing fiber signals produces persistent phosphorylation of both AMPA receptors and Ca2+-dependent K+ channels. Phosphorylation of the AMPA receptors is responsible for their long-term depression (LTD). Phosphorylation of the Ca2+-dependent K+ channels leads to a reduction in baseline membrane potential and a reduction of Purkinje cell population firing during the CS-US interval. The decrease in Purkinje cell firing disinhibits cerebellar nuclear cells, which then produce an excitatory response corresponding to the learned movement. Purkinje cell learning times the response, while nuclear cell learning can calibrate it. The model reproduces key features of the conditioned rabbit NMR: the Purkinje cell population response is properly timed; delay conditioning occurs for ISIs of up to four seconds, while trace conditioning occurs only at shorter ISIs; mixed training at two different ISIs produces a double-peaked response; and ISIs of 200-400 ms produce maximal responding. Biochemical similarities between timed cerebellar learning and photoreceptor transduction, and circuit similarities between the timed cerebellar circuit and a timed dentate-CA3 hippocampal circuit, are noted.
    Funding: Office of Naval Research (N00014-92-J-4015, N00014-92-J-1309, N00014-95-1-0409); Air Force Office of Scientific Research (F49620-92-J-0225); National Science Foundation (IRI-90-24877)
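    The idea that a slow postsynaptic response "bridges the ISI" can be illustrated with a toy eligibility trace. The sketch below is an assumption-laden caricature, not the mGluR model itself: the second messenger cascade is replaced by two hypothetical first-order stages (time constants tau1 and tau2 are invented), parallel fiber drive lasts for cs_duration, and the climbing fiber signal at US onset is taken to be 1, so LTD is simply proportional to the trace value at time = ISI.

```python
def slow_trace(isi, cs_duration, tau1=0.1, tau2=0.3, dt=0.001):
    """Value of a hypothetical two-stage slow response at US onset (time = isi).

    cs_duration: how long parallel fiber drive lasts after CS onset (seconds).
      Use cs_duration >= isi for delay conditioning (CS overlaps the US)
      and a short cs_duration for trace conditioning.
    tau1, tau2: invented time constants (seconds) of the two cascade stages.
    """
    y1 = y2 = 0.0
    steps = int(round(isi / dt))
    for k in range(steps):
        t = k * dt
        drive = 1.0 if t < cs_duration else 0.0   # parallel fiber input
        y1 += dt * (drive - y1) / tau1            # first slow stage (Euler step)
        y2 += dt * (y1 - y2) / tau2               # second, slower stage
    return y2


def ltd_amount(isi, cs_duration, rate=1.0):
    """LTD at US onset: climbing fiber signal (assumed 1) times the slow trace."""
    return rate * slow_trace(isi, cs_duration)


if __name__ == "__main__":
    for isi in (0.3, 1.0, 4.0):
        print(isi,
              round(ltd_amount(isi, cs_duration=isi), 3),   # delay conditioning
              round(ltd_amount(isi, cs_duration=0.1), 3))   # trace conditioning
```

    With sustained CS drive the trace is still high at a 4 s ISI, whereas a brief CS leaves almost nothing to pair with the climbing fiber signal at long ISIs. That qualitatively mirrors the delay versus trace contrast in the abstract; this toy does not reproduce the inverted-U dependence on ISI.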

    Timing in trace conditioning of the nictitating membrane response of the rabbit (Oryctolagus cuniculus): scalar, nonscalar, and adaptive features

    Using interstimulus intervals (ISIs) of 125, 250, and 500 msec in trace conditioning of the rabbit nictitating membrane response, the offset times and durations of conditioned responses (CRs) were collected along with onset and peak latencies. All measures were proportional to the ISI, but only onset and peak latencies conformed to the criterion for scalar timing. Regarding the CR’s possible protective overlap of the unconditioned stimulus (US), CR duration increased with ISI, while the alignment of the CR peak with the US declined. Implications for models of timing and CR adaptiveness are discussed.
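    The abstract does not say which scalar timing criterion was applied. One common operationalization, assumed here, is that the coefficient of variation (SD / mean) of a timing measure stays roughly constant across ISIs, so that the spread of the measure grows in proportion to its mean. The helper below checks that; the tolerance rel_tol is an invented parameter.

```python
from statistics import mean


def coefficients_of_variation(means, sds):
    """SD / mean for each ISI condition."""
    return [sd / m for m, sd in zip(means, sds)]


def conforms_to_scalar_timing(means, sds, rel_tol=0.15):
    """True if every CV lies within rel_tol (relative) of the average CV.

    Hypothetical criterion: constant CV across ISIs.
    """
    cvs = coefficients_of_variation(means, sds)
    avg_cv = mean(cvs)
    return all(abs(cv - avg_cv) <= rel_tol * avg_cv for cv in cvs)


if __name__ == "__main__":
    # Made-up illustrative values for ISIs of 125, 250, 500 ms (not the study's data).
    latency_means = [75.0, 150.0, 300.0]
    print(conforms_to_scalar_timing(latency_means, [15.0, 30.0, 60.0]))  # SD scales with mean
    print(conforms_to_scalar_timing(latency_means, [20.0, 20.0, 20.0]))  # SD fixed
```

    A measure can increase proportionally with the ISI and still fail this test if its variability does not scale with it, which is one way to read the distinction the abstract draws between "proportional" and "scalar".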

    A Neural Model of Timed Response Learning in the Cerebellum

    A spectral timing model is developed to explain how the cerebellum learns adaptively timed responses during the rabbit's conditioned nictitating membrane response (NMR). The model posits two learning sites that respectively enable conditioned excitation and timed disinhibition of the response. Long-term potentiation of mossy fiber pathways projecting to interpositus nucleus cells allows conditioned excitation of the response's adaptive gain. Long-term depression of parallel fiber-Purkinje cell synapses in the cerebellar cortex allows learning of an adaptively timed reduction in Purkinje cell inhibition of the same nuclear cells. A spectrum of partially timed responses summates to generate an accurately timed population response. In agreement with physiological data, model Purkinje cell activity decreases in the interval following the onset of the conditioned stimulus, and nuclear cell responses match conditioned response (CR) topography. The model reproduces key behavioral features of the NMR: CR peak amplitude occurs at unconditioned stimulus (US) onset; a discrete CR peak shift occurs with a change in the interstimulus interval (ISI) between conditioned stimulus (CS) and US; mixed training at two different ISIs produces a double-peaked CR; CR acquisition and rate of responding depend unimodally on the ISI; CR onset latency decreases during training; and ablation of cerebellar cortex results in maladaptively timed, small-amplitude CRs.
    Funding: National Science Foundation (IRI-90-24877); Office of Naval Research (N00014-92-J-1309); Air Force Office of Scientific Research (F49620-92-J-0225)
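    The claim that "a spectrum of partially timed responses summates to generate an accurately timed population response" can be sketched as follows. Everything here is an assumption for illustration: each unit's response is an alpha function peaking at its own delay tau, the taus are log-spaced, and one-shot learning sets each weight to the unit's activity at US onset. The sketch also abstracts away from the sign of the real mechanism (LTD producing a timed reduction in Purkinje inhibition) and simply sums positive contributions.

```python
import math


def unit_response(t, tau):
    """Hypothetical alpha-function response of one spectral unit to CS onset.

    Rises from 0, peaks at value 1 when t == tau, then decays.
    """
    if t <= 0.0:
        return 0.0
    return (t / tau) * math.exp(1.0 - t / tau)


def spectral_population(isi, n_units=80, tau_min=0.02, tau_max=20.0, dt=0.001, t_max=2.0):
    """Weighted population response after learning at US onset (time = isi)."""
    # Unit peak times spread evenly on a log scale (an assumption).
    taus = [tau_min * (tau_max / tau_min) ** (i / (n_units - 1)) for i in range(n_units)]
    # Each unit's learned weight is proportional to its activity when the US arrives.
    weights = [unit_response(isi, tau) for tau in taus]
    times = [k * dt for k in range(int(round(t_max / dt)) + 1)]
    population = [sum(w * unit_response(t, tau) for w, tau in zip(weights, taus))
                  for t in times]
    return times, population


def peak_time(isi, **kwargs):
    """Time at which the summed, weighted population response is largest."""
    times, population = spectral_population(isi, **kwargs)
    best = max(range(len(population)), key=population.__getitem__)
    return times[best]


if __name__ == "__main__":
    for isi in (0.25, 0.5):
        print(isi, peak_time(isi))
```

    No single unit is precisely tuned, yet the weighted sum peaks close to the trained ISI. Because the units' widths grow with their delays, the population curve for a longer ISI is a stretched copy of the one for a shorter ISI.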

    A Neural Model of Biased Oscillations in Aplysia Head-Waving Behavior

    A long-term bias in the exploratory head-waving behavior of Aplysia can be induced using bright lights as an aversive stimulus: coupling onset of the lights with head movements to one side results in a bias away from that side (Cook & Carew, 1986). This bias has been interpreted as a form of operant conditioning, and has previously been simulated with a neural network model based on associative synaptic facilitation (Raymond, Baxter, Buonomano, & Byrne, 1992). In this article we simulate the head-waving behavior using a recurrent gated dipole, a nonlinear dynamical neural model that has previously been used to explain various data, including oscillatory behavior in biological pacemakers. Within the recurrent gated dipole, two channels operate antagonistically to generate oscillations, which drive the side-to-side head waving. The frequency of oscillations depends on transmitter mobilization dynamics, which exhibit both short- and long-term adaptation. We assume that light onset results in a nonspecific increase in arousal to both channels of the dipole. Repeated pairing of arousal increments with activation of one channel (the "reinforced" channel) of the dipole leads to a bias in transmitter dynamics, which causes the oscillation to last a shorter time on the reinforced channel than on the non-reinforced channel. Our model provides a parsimonious explanation of the observed behavior, and it avoids some of the unexpected results obtained with the Raymond et al. model. In addition, our model makes predictions concerning the rate of onset and extinction of the biases, and it suggests new lines of experimentation to test the nature of the head-waving behavior.
    Funding: Office of Naval Research (N00014-92-J-4015, N00014-91-J-4100, N0014-92-J-1309); Air Force Office of Scientific Research (F49620-92-J-0499); A.P. Sloan Foundation (BR-3122)
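    The key mechanism (a bias in transmitter dynamics that shortens the time spent on the reinforced channel) can be illustrated with a much simpler oscillator than the recurrent gated dipole. In the hypothetical sketch below, two channels receive equal input, each channel's habituative transmitter z recovers toward 1 and is depleted only while its channel is active, and the inactive channel takes over once its transmitter exceeds the active one's by a hysteresis margin. The "reinforcement bias" is modeled, purely as an assumption, by a larger depletion rate on channel 0.

```python
def simulate_dipole(depletion=(2.0, 1.0), recovery=1.0, threshold=0.1,
                    dt=0.01, steps=5000):
    """Return lists of dwell times on channel 0 and channel 1.

    depletion[i]: hypothetical transmitter depletion rate while channel i is
      active; channel 0 plays the "reinforced" channel with a larger rate.
    recovery: rate at which each transmitter recovers toward 1.
    threshold: hysteresis margin for switching. With equal inputs to both
      channels, comparing gated signals reduces to comparing transmitter levels.
    """
    z = [1.0, 1.0]        # habituative transmitter levels
    active = 0
    dwell = [[], []]
    start = 0
    for k in range(steps):
        for i in (0, 1):
            dz = recovery * (1.0 - z[i])
            if i == active:
                dz -= depletion[i] * z[i]   # activity depletes the gate
            z[i] += dt * dz                 # Euler step
        other = 1 - active
        if z[other] - z[active] > threshold:
            dwell[active].append((k + 1 - start) * dt)
            start = k + 1
            active = other
    return dwell


if __name__ == "__main__":
    d = simulate_dipole()
    # Skip the first dwells, which are shaped by the initial transient.
    print(sum(d[0][2:]) / len(d[0][2:]), sum(d[1][2:]) / len(d[1][2:]))
```

    Faster depletion on the biased channel closes the gap between the two transmitter levels sooner, so each visit to that channel is shorter, qualitatively matching the bias described in the abstract. The actual model's equations, arousal input, and long-term adaptation are not reproduced here.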

    Learning obstacle avoidance with an operant behavioral model

    Artificial intelligence researchers have been attracted by the idea of having robots learn how to accomplish a task rather than being told explicitly how to do it. Reinforcement learning has been proposed as an appealing framework for controlling mobile agents. Robot learning research and research on biological systems face many similar problems in achieving the flexibility needed to perform a variety of tasks. In this work, the control of a vehicle in an avoidance task by a previously developed model of operant learning (a form of animal learning) is studied. A simulated environment is used in which a mobile robot with proximity sensors must minimize the punishment it receives for colliding with obstacles. The results were compared with those of the Q-learning algorithm, and the proposed model performed better. In this way, a new artificial intelligence agent inspired by research in neurobiology, psychology, and ethology is proposed.
    Affiliation: Gutnisky, D. A. Universidad de Buenos Aires. Facultad de Ingeniería. Instituto de Ingeniería Biomédica; Argentina
    Affiliation: Zanutto, Bonifacio Silvano. Consejo Nacional de Investigaciones Científicas y Técnicas. Instituto de Biología y Medicina Experimental. Fundación de Instituto de Biología y Medicina Experimental. Instituto de Biología y Medicina Experimental; Argentina. Universidad de Buenos Aires. Facultad de Ingeniería. Instituto de Ingeniería Biomédica; Argentina
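    For readers unfamiliar with the Q-learning baseline mentioned above, here is a minimal tabular sketch on a deliberately tiny, hypothetical avoidance task. It is not the operant model and not the paper's simulated environment. Three invented sensor states (clear, obstacle on the left, obstacle on the right) are paired with three actions; colliding is punished with -1, moving forward on a clear path earns +1, and the next sensor reading is drawn at random.

```python
import random

ACTIONS = ("forward", "turn_left", "turn_right")
# Hypothetical sensor states: 0 = clear, 1 = obstacle on the left, 2 = obstacle on the right.


def reward(state, action):
    """Invented reward scheme: punish collisions, reward progress on a clear path."""
    if state == 0:
        return 1.0 if action == 0 else 0.0
    if state == 1:
        return 0.0 if action == 2 else -1.0   # only turning right avoids a left obstacle
    return 0.0 if action == 1 else -1.0       # only turning left avoids a right obstacle


def q_learning(steps=20000, alpha=0.1, gamma=0.5, epsilon=0.1, seed=0):
    """Standard tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(3)]
    state = rng.randrange(3)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(ACTIONS))                      # explore
        else:
            action = max(range(len(ACTIONS)), key=lambda a: q[state][a])  # exploit
        r = reward(state, action)
        next_state = rng.randrange(3)   # hypothetical: next reading is random
        target = r + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Best action index for each sensor state."""
    return [max(range(len(ACTIONS)), key=lambda a: row[a]) for row in q]


if __name__ == "__main__":
    print([ACTIONS[a] for a in greedy_policy(q_learning())])
```

    After training, the greedy policy drives forward when the path is clear and turns away from whichever side reports an obstacle. A real comparison would need the paper's environment, sensor model, and punishment schedule.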

    The hippocampus and cerebellum in adaptively timed learning, recognition, and movement

    The concepts of declarative memory and procedural memory have been used to distinguish two basic types of learning. A neural network model suggests how such memory processes work together as recognition learning, reinforcement learning, and sensory-motor learning take place during adaptive behaviors. To coordinate these processes, the hippocampal formation and cerebellum each contain circuits that learn to adaptively time their outputs. Within the model, hippocampal timing helps to maintain attention on motivationally salient goal objects during variable task-related delays, and cerebellar timing controls the release of conditioned responses. This property is part of the model's description of how cognitive-emotional interactions focus attention on motivationally valued cues, and of how this process breaks down as a result of hippocampal ablation. The model suggests that the hippocampal mechanisms that help to rapidly draw attention to salient cues could release motor commands prematurely if the cerebellum did not adaptively time their release. The model hippocampal system modulates cortical recognition learning without actually encoding the representational information that the cortex encodes. These properties avoid the difficulties faced by several models that propose a direct hippocampal role in recognition learning. Learning within the model hippocampal system controls adaptive timing and spatial orientation. These model properties thereby clarify how hippocampal ablations cause amnesic symptoms and difficulties with tasks that combine task delays, novelty detection, and attention toward goal objects amid distractions. When the model's recognition, reinforcement, sensory-motor, and timing processes work together, they suggest how the brain can accomplish conditioning of multiple sensory events to delayed rewards, as during serial compound conditioning.
    Funding: Air Force Office of Scientific Research (F49620-92-J-0225, F49620-86-C-0037, 90-0128); Advanced Research Projects Agency (ONR N00014-92-J-4015); Office of Naval Research (N00014-91-J-4100, N00014-92-J-1309, N00014-92-J-1904); National Institute of Mental Health (MH-42900)

    Intelligent control based on fuzzy logic and neural net theory

    In the conception and design of intelligent systems, one promising direction uses fuzzy logic and neural network theory to enhance such systems' capability to learn from experience and to adapt to changes in an environment of uncertainty and imprecision. Here, an intelligent control scheme that integrates these multidisciplinary techniques is explored. A self-learning system is proposed as an intelligent controller for dynamical processes; it employs a control policy that evolves and improves automatically. One key component of the intelligent system is a fuzzy logic-based system that emulates human decision-making behavior. It is shown that the system can solve a fairly difficult control learning problem. Simulation results demonstrate improved learning performance compared with previously described systems that employ bang-bang control. The proposed system is relatively insensitive to variations in the parameters of the system environment.
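    To make the contrast with bang-bang control concrete, the sketch below shows a generic zero-order Sugeno-style fuzzy controller. It is a swapped-in illustration, not the self-learning system described above: the three triangular membership functions, the rule outputs, and the saturation range are all hypothetical, and no learning is included. Each rule fires to the degree its membership function matches the error, and the output is the firing-strength-weighted average of the rule outputs.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical rule base: IF error is NEGATIVE THEN -1; ZERO THEN 0; POSITIVE THEN +1.
RULES = [
    ((-2.0, -1.0, 0.0), -1.0),   # NEGATIVE error
    ((-1.0, 0.0, 1.0), 0.0),     # ZERO error
    ((0.0, 1.0, 2.0), 1.0),      # POSITIVE error
]


def fuzzy_control(error):
    """Zero-order Sugeno inference with weighted-average defuzzification."""
    e = max(-1.0, min(1.0, error))   # saturate the input to the rule base's range
    strengths = [triangular(e, *mf) for mf, _ in RULES]
    total = sum(strengths)           # always > 0 for e in [-1, 1]
    return sum(s * out for s, (_, out) in zip(strengths, RULES)) / total


if __name__ == "__main__":
    for err in (-3.0, -0.5, 0.0, 0.25, 0.5, 3.0):
        print(err, fuzzy_control(err))
```

    A bang-bang controller would output only the extremes (for example, -1 or +1) for any nonzero error, whereas the fuzzy rules blend into a graded response for small errors. In a self-learning scheme of the kind described, parameters such as the membership functions or rule outputs would be tuned from experience rather than fixed by hand.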