18 research outputs found

    Closing the loop between neural network simulators and the OpenAI Gym

    Full text link
    Since the enormous breakthroughs in machine learning over the last decade, functional neural network models are of growing interest for many researchers in the field of computational neuroscience. One major branch of research is concerned with biologically plausible implementations of reinforcement learning, with a variety of different models developed over the recent years. However, most studies in this area are conducted with custom simulation scripts and manually implemented tasks. This makes it hard for other researchers to reproduce and build upon previous work and nearly impossible to compare the performance of different learning architectures. In this work, we present a novel approach to solve this problem, connecting benchmark tools from the field of machine learning and state-of-the-art neural network simulators from computational neuroscience. This toolchain enables researchers in both fields to make use of well-tested high-performance simulation software supporting biologically plausible neuron, synapse and network models and allows them to evaluate and compare their approach on the basis of standardized environments of varying complexity. We demonstrate the functionality of the toolchain by implementing a neuronal actor-critic architecture for reinforcement learning in the NEST simulator and successfully training it on two different environments from the OpenAI Gym

    Population-coding and Dynamic-neurons improved Spiking Actor Network for Reinforcement Learning

    Full text link
    With the Deep Neural Networks (DNNs) as a powerful function approximator, Deep Reinforcement Learning (DRL) has been excellently demonstrated on robotic control tasks. Compared to DNNs with vanilla artificial neurons, the biologically plausible Spiking Neural Network (SNN) contains a diverse population of spiking neurons, making it naturally powerful on state representation with spatial and temporal information. Based on a hybrid learning framework, where a spike actor-network infers actions from states and a deep critic network evaluates the actor, we propose a Population-coding and Dynamic-neurons improved Spiking Actor Network (PDSAN) for efficient state representation from two different scales: input coding and neuronal coding. For input coding, we apply population coding with dynamically receptive fields to directly encode each input state component. For neuronal coding, we propose different types of dynamic-neurons (containing 1st-order and 2nd-order neuronal dynamics) to describe much more complex neuronal dynamics. Finally, the PDSAN is trained in conjunction with deep critic networks using the Twin Delayed Deep Deterministic policy gradient algorithm (TD3-PDSAN). Extensive experimental results show that our TD3-PDSAN model achieves better performance than state-of-the-art models on four OpenAI gym benchmark tasks. It is an important attempt to improve RL with SNN towards the effective computation satisfying biological plausibility.Comment: 27 pages, 11 figures, accepted by Journal of Neural Network

    A Compositionality Machine Realized by a Hierarchic Architecture of Synfire Chains

    Get PDF
    The composition of complex behavior is thought to rely on the concurrent and sequential activation of simpler action components, or primitives. Systems of synfire chains have previously been proposed to account for either the simultaneous or the sequential aspects of compositionality; however, the compatibility of the two aspects has so far not been addressed. Moreover, the simultaneous activation of primitives has up until now only been investigated in the context of reactive computations, i.e., the perception of stimuli. In this study we demonstrate how a hierarchical organization of synfire chains is capable of generating both aspects of compositionality for proactive computations such as the generation of complex and ongoing action. To this end, we develop a network model consisting of two layers of synfire chains. Using simple drawing strokes as a visualization of abstract primitives, we map the feed-forward activity of the upper level synfire chains to motion in two-dimensional space. Our model is capable of producing drawing strokes that are combinations of primitive strokes by binding together the corresponding chains. Moreover, when the lower layer of the network is constructed in a closed-loop fashion, drawing strokes are generated sequentially. The generated pattern can be random or deterministic, depending on the connection pattern between the lower level chains. We propose quantitative measures for simultaneity and sequentiality, revealing a wide parameter range in which both aspects are fulfilled. Finally, we investigate the spiking activity of our model to propose candidate signatures of synfire chain computation in measurements of neural activity during action execution

    Reinforcement Learning on Slow Features of High-Dimensional Input Streams

    Get PDF
    Humans and animals are able to learn complex behaviors based on a massive stream of sensory information from different modalities. Early animal studies have identified learning mechanisms that are based on reward and punishment such that animals tend to avoid actions that lead to punishment whereas rewarded actions are reinforced. However, most algorithms for reward-based learning are only applicable if the dimensionality of the state-space is sufficiently small or its structure is sufficiently simple. Therefore, the question arises how the problem of learning on high-dimensional data is solved in the brain. In this article, we propose a biologically plausible generic two-stage learning system that can directly be applied to raw high-dimensional input streams. The system is composed of a hierarchical slow feature analysis (SFA) network for preprocessing and a simple neural network on top that is trained based on rewards. We demonstrate by computer simulations that this generic architecture is able to learn quite demanding reinforcement learning tasks on high-dimensional visual input streams in a time that is comparable to the time needed when an explicit highly informative low-dimensional state-space representation is given instead of the high-dimensional visual input. The learning speed of the proposed architecture in a task similar to the Morris water maze task is comparable to that found in experimental studies with rats. This study thus supports the hypothesis that slowness learning is one important unsupervised learning principle utilized in the brain to form efficient state representations for behavioral learning

    Rare neural correlations implement robotic conditioning with delayed rewards and disturbances

    Get PDF
    Neural conditioning associates cues and actions with following rewards. The environments in which robots operate, however, are pervaded by a variety of disturbing stimuli and uncertain timing. In particular, variable reward delays make it difficult to reconstruct which previous actions are responsible for following rewards. Such an uncertainty is handled by biological neural networks, but represents a challenge for computational models, suggesting the lack of a satisfactory theory for robotic neural conditioning. The present study demonstrates the use of rare neural correlations in making correct associations between rewards and previous cues or actions. Rare correlations are functional in selecting sparse synapses to be eligible for later weight updates if a reward occurs. The repetition of this process singles out the associating and reward-triggering pathways, and thereby copes with distal rewards. The neural network displays macro-level classical and operant conditioning, which is demonstrated in an interactive real-life human-robot interaction. The proposed mechanism models realistic conditioning in humans and animals and implements similar behaviors in neuro-robotic platforms

    Synchronicity:The Role of Midbrain Dopamine in Whole-Brain Coordination

    Get PDF
    Midbrain dopamine seems to play an outsized role in motivated behavior and learning. Widely associated with mediating reward-related behavior, decision making, and learning, dopamine continues to generate controversies in the field. While many studies and theories focus on what dopamine cells encode, the question of how the midbrain derives the information it encodes is poorly understood and comparatively less addressed. Recent anatomical studies suggest greater diversity and complexity of afferent inputs than previously appreciated, requiring rethinking of prior models. Here, we elaborate a hypothesis that construes midbrain dopamine as implementing a Bayesian selector in which individual dopamine cells sample afferent activity across distributed brain substrates, comprising evidence to be evaluated on the extent to which stimuli in the on-going sensorimotor stream organizes distributed, parallel processing, reflecting implicit value. To effectively generate a temporally resolved phasic signal, a population of dopamine cells must exhibit synchronous activity. We argue that synchronous activity across a population of dopamine cells signals consensus across distributed afferent substrates, invigorating responding to recognized opportunities and facilitating further learning. In framing our hypothesis, we shift from the question of how value is computed to the broader question of how the brain achieves coordination across distributed, parallel processing. We posit the midbrain is part of an “axis of agency” in which the prefrontal cortex (PFC), basal ganglia (BGS), and midbrain form an axis mediating control, coordination, and consensus, respectively

    Dopamine, affordance and active inference.

    Get PDF
    The role of dopamine in behaviour and decision-making is often cast in terms of reinforcement learning and optimal decision theory. Here, we present an alternative view that frames the physiology of dopamine in terms of Bayes-optimal behaviour. In this account, dopamine controls the precision or salience of (external or internal) cues that engender action. In other words, dopamine balances bottom-up sensory information and top-down prior beliefs when making hierarchical inferences (predictions) about cues that have affordance. In this paper, we focus on the consequences of changing tonic levels of dopamine firing using simulations of cued sequential movements. Crucially, the predictions driving movements are based upon a hierarchical generative model that infers the context in which movements are made. This means that we can confuse agents by changing the context (order) in which cues are presented. These simulations provide a (Bayes-optimal) model of contextual uncertainty and set switching that can be quantified in terms of behavioural and electrophysiological responses. Furthermore, one can simulate dopaminergic lesions (by changing the precision of prediction errors) to produce pathological behaviours that are reminiscent of those seen in neurological disorders such as Parkinson's disease. We use these simulations to demonstrate how a single functional role for dopamine at the synaptic level can manifest in different ways at the behavioural level

    Spike-Based Reinforcement Learning in Continuous State and Action Space: When Policy Gradient Methods Fail

    Get PDF
    Changes of synaptic connections between neurons are thought to be the physiological basis of learning. These changes can be gated by neuromodulators that encode the presence of reward. We study a family of reward-modulated synaptic learning rules for spiking neurons on a learning task in continuous space inspired by the Morris Water maze. The synaptic update rule modifies the release probability of synaptic transmission and depends on the timing of presynaptic spike arrival, postsynaptic action potentials, as well as the membrane potential of the postsynaptic neuron. The family of learning rules includes an optimal rule derived from policy gradient methods as well as reward modulated Hebbian learning. The synaptic update rule is implemented in a population of spiking neurons using a network architecture that combines feedforward input with lateral connections. Actions are represented by a population of hypothetical action cells with strong mexican-hat connectivity and are read out at theta frequency. We show that in this architecture, a standard policy gradient rule fails to solve the Morris watermaze task, whereas a variant with a Hebbian bias can learn the task within 20 trials, consistent with experiments. This result does not depend on implementation details such as the size of the neuronal populations. Our theoretical approach shows how learning new behaviors can be linked to reward-modulated plasticity at the level of single synapses and makes predictions about the voltage and spike-timing dependence of synaptic plasticity and the influence of neuromodulators such as dopamine. It is an important step towards connecting formal theories of reinforcement learning with neuronal and synaptic properties
    corecore