690 research outputs found

    VIME: Variational Information Maximizing Exploration

    Scalable and effective exploration remains a key challenge in reinforcement learning (RL). While there are methods with optimality guarantees in the setting of discrete state and action spaces, these methods cannot be applied in high-dimensional deep RL scenarios. As such, most contemporary RL relies on simple heuristics such as epsilon-greedy exploration or adding Gaussian noise to the controls. This paper introduces Variational Information Maximizing Exploration (VIME), an exploration strategy based on maximization of information gain about the agent's belief of environment dynamics. We propose a practical implementation, using variational inference in Bayesian neural networks, which efficiently handles continuous state and action spaces. VIME modifies the MDP reward function, and can be applied with several different underlying RL algorithms. We demonstrate that VIME achieves significantly better performance compared to heuristic exploration methods across a variety of continuous control tasks and algorithms, including tasks with very sparse rewards. Comment: Published in Advances in Neural Information Processing Systems 29 (NIPS), pages 1109-111

    The Role of Conditional Independence in the Evolution of Intelligent Systems

    Systems are typically made from simple components regardless of their complexity. While the function of each part is easily understood, higher order functions are emergent properties and are notoriously difficult to explain. In networked systems, both digital and biological, each component receives inputs, performs a simple computation, and creates an output. When these components have multiple outputs, we intuitively assume that the outputs are causally dependent on the inputs but are themselves independent of each other given the state of their shared input. However, this intuition can be violated for components with probabilistic logic, as these typically cannot be decomposed into separate logic gates with one output each. This violation of conditional independence on the past system state is equivalent to instantaneous interaction: the idea is that some information between the outputs is not coming from the inputs and thus must have been created instantaneously. Here we compare evolved artificial neural systems with and without instantaneous interaction across several task environments. We show that systems without instantaneous interactions evolve faster, to higher final levels of performance, and require fewer logic components to create a densely connected cognitive machinery. Comment: Original Abstract submitted to the GECCO conference 2017 Berli

    Significance of neural noise


    The impact of aging on human brain network target controllability

    Understanding how a few distributed areas can steer large-scale brain activity is a fundamental question that has practical implications, which range from inducing specific patterns of behavior to counteracting disease. Recent endeavors based on network controllability provided fresh insights into the potential ability of single regions to influence whole brain dynamics through the underlying structural connectome. However, controlling the entire brain activity is often unfeasible and might not always be necessary. The question of whether single areas can control specific target subsystems remains crucial, albeit still poorly explored. Furthermore, the structure of the brain network exhibits progressive changes across the lifespan, but little is known about the possible consequences for the controllability properties. To address these questions, we adopted a novel target controllability approach that quantifies the centrality of brain nodes in controlling specific target anatomo-functional systems. We then studied such target control centrality in human connectomes obtained from healthy individuals aged from 5 to 85. The main results showed that the sensorimotor system has a high influencing capacity, but it is difficult for other areas to influence it. Furthermore, we reported that target control centrality varies with age and that temporal-parietal regions, whose cortical thinning is crucial in dementia-related diseases, exhibit lower values in older people. By simulating targeted attacks, such as those occurring in focal stroke, we showed that the ipsilesional hemisphere is the most affected one regardless of the damaged area. Notably, such degradation in target control centrality was more evident in younger people, thus supporting early-vulnerability hypotheses after stroke.

    The value of what’s to come: Neural mechanisms coupling prediction error and the utility of anticipation

    Having something to look forward to is a keystone of well-being. Anticipation of future reward, such as an upcoming vacation, can often be more gratifying than the experience itself. Theories suggest the utility of anticipation underpins various behaviors, ranging from beneficial information-seeking to harmful addiction. However, how neural systems compute anticipatory utility remains unclear. We analyzed the brain activity of human participants as they performed a task involving choosing whether to receive information predictive of future pleasant outcomes. Using a computational model, we show that three brain regions orchestrate anticipatory utility. Specifically, ventromedial prefrontal cortex tracks the value of anticipatory utility, dopaminergic midbrain correlates with information that enhances anticipation, while sustained hippocampal activity mediates a functional coupling between these regions. Our findings suggest a previously unidentified neural underpinning for anticipation’s influence over decision-making and unify a range of phenomena associated with risk and time-delay preference.

    Tätigkeitsbericht (Activity Report) 2014-2016


    Tätigkeitsbericht (Activity Report) 2017-2019/20
