    Phasic dopamine as a prediction error of intrinsic and extrinsic reinforcement driving both action acquisition and reward maximization: A simulated robotic study

    An important issue in recent neuroscientific research is to understand the functional role of the phasic release of dopamine in the striatum, and in particular its relation to reinforcement learning. The literature is split between two alternative hypotheses: one considers phasic dopamine as a reward prediction error, similar to the computational TD error, whose function is to guide an animal to maximize future rewards; the other holds that phasic dopamine is a sensory prediction error signal that lets the animal discover and acquire novel actions. In this paper we propose an original hypothesis that integrates these two contrasting positions: in our view, phasic dopamine represents a TD-like reinforcement prediction error learning signal determined both by unexpected changes in the environment (temporary, intrinsic reinforcements) and by biological rewards (permanent, extrinsic reinforcements). Accordingly, dopamine plays the functional role of driving both the discovery and acquisition of novel actions and the maximization of future rewards. To validate our hypothesis we perform a series of experiments with a simulated robotic system that has to learn different skills in order to obtain rewards. We compare different versions of the system in which we vary the composition of the learning signal. The results show that only the system reinforced by both extrinsic and intrinsic reinforcements is able to reach high performance in sufficiently complex conditions.
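    A minimal sketch of the hypothesized learning signal, assuming a tabular TD setting (the function names, discount factor, and learning rate below are illustrative choices, not the paper's implementation):

        GAMMA = 0.99   # discount factor (assumed value)
        ALPHA = 0.1    # learning rate (assumed value)

        def dopamine_like_signal(value, s, s_next, r_extrinsic, r_intrinsic):
            """TD-like reinforcement prediction error in which the reinforcement
            is the sum of permanent extrinsic reward and temporary intrinsic
            reinforcement generated by unexpected events."""
            r = r_extrinsic + r_intrinsic
            return r + GAMMA * value[s_next] - value[s]

        def td_update(value, s, s_next, r_extrinsic, r_intrinsic):
            delta = dopamine_like_signal(value, s, s_next, r_extrinsic, r_intrinsic)
            value[s] += ALPHA * delta   # one signal drives both action
            return delta                # acquisition and reward maximization

    Because the intrinsic term fades once the triggering events become predictable, it mainly supports early action discovery, while the extrinsic term keeps supporting reward maximization.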

    Cumulative learning through intrinsic reinforcements

    Building artificial agents able to autonomously learn new skills and to easily adapt to different and complex environments is an important goal for robotics and machine learning. We propose that providing reinforcement learning artificial agents with a learning signal that resembles the characteristics of the phasic activations of dopaminergic neurons would be an advancement in the development of more autonomous and versatile systems. In particular, we suggest that the particular composition of such a signal, determined by both extrinsic and intrinsic reinforcements, is suitable to improve the implementation of cumulative learning in artificial agents. To validate our hypothesis we performed experiments with a simulated robotic system that has to learn different skills to obtain extrinsic rewards. We compare different versions of the system, varying the composition of the learning signal, and show that the only system able to reach high performance in the task is the one that implements the learning signal suggested by our hypothesis.

    Autonomous learning of multiple skills through intrinsic motivations: A study with computational embodied models

    The experimental works presented in this thesis have been carried out and published together with my supervisor Marco Mirolli and with Gianluca Baldassarre. In particular, chapter 3 is adapted from (Mirolli et al. 2013, Santucci et al. 2014b, 2010, 2012a); chapter 4 is adapted from (Santucci et al. 2013b,a); chapter 5 is adapted from (Santucci et al. 2014a); and chapter 6 is adapted from (Santucci et al. 2016). Moreover, (Santucci et al. 2012b) focused on the same topics as the research described in chapter 4. Since the experiments presented in that paper constitute preliminary results obtained in a simplified experimental scenario, they have not been included in this thesis; however, the insights provided by that work have been used in the research presented in chapter 4.

    Developing artificial agents able to autonomously discover new goals, select them, and learn the related skills is an important challenge for robotics. This becomes even more crucial if we want robots to interact with real environments, where they have to face many unpredictable problems and where it is not clear which skills will be the most suitable to solve them. The ability to learn and store multiple skills in order to use them when required is one of the main characteristics of biological agents: forming ample repertoires of actions widens an agent's ability to adapt to different environments and improves its chances of survival and reproduction. Moreover, humans and other mammals explore the environment and learn new skills not only on the basis of reward-related stimuli but also on the basis of novel or unexpected neutral stimuli. The mechanisms underlying this kind of learning have been studied under the heading of "Intrinsic Motivations" (IMs), and in recent decades the concept of IMs has been used in developmental and autonomous robotics to foster an artificial curiosity that can improve the autonomy and versatility of artificial agents.

    In the research presented in this thesis I focus on the development of open-ended learning robots able to autonomously discover interesting events in the environment and autonomously learn the skills necessary to reproduce those events. In particular, this research focuses on the role that IMs can play in fostering those processes and in improving the autonomy and versatility of artificial agents. Taking inspiration from recent and past research in this field, I tackle some of the interesting open challenges related to IMs and to the implementation of intrinsically motivated robots.

    I first focus on the neurophysiology underlying IM learning signals, and in particular on the relations between IMs and phasic dopamine (DA). With the support of a first computational model, I propose a new hypothesis that addresses the dispute over the nature and the functions of phasic DA activations: reconciling two contrasting theories in the literature and taking into account the different experimental data, I suggest that phasic DA can be considered as a reinforcement prediction error learning signal determined both by unexpected changes in the environment (temporary, intrinsic reinforcements) and by biological rewards (permanent, extrinsic reinforcements). The results obtained with my computational model support the presented hypothesis, showing how such a learning signal can serve two important functions: driving both the discovery and acquisition of novel actions and the maximisation of rewards. Moreover, those results provide a first example of the power of IMs to guide artificial agents in the cumulative learning of complex behaviours that would not be learnt simply by providing a direct reward for the final tasks.

    In a second work, I investigate the issues related to the implementation of IM signals in robots. Since the literature still lacks a specific analysis of which IM signal is best suited to drive skill acquisition, I compare different typologies of IMs in a robotic setup, as well as the different mechanisms used to implement them. The results provide two important contributions: 1) they show that IM signals based on the competence of the system provide better guidance for skill acquisition than signals based on the knowledge of the agent; 2) they identify a proper mechanism to generate a competence-based IM signal, showing that the stronger the link between the IM signal and the competence of the system, the better the performance.

    With the aim of widening the autonomy and versatility of artificial agents, in a third work I focus on improving the control architecture of the robot. I build a new 3-level architecture that allows the system to select the goals to pursue, to search for the best way to achieve them, and to acquire the related skills. I implement this architecture in a simulated iCub robot and test it in a 3D experimental scenario where the agent has to learn, on the basis of IMs, a reaching task in which it is not clear which arm of the robot is the most suitable to reach the different targets. The performance of the system is compared to that of my previous 2-level architecture, where tasks and computational resources are associated at design time. The better performance of the system endowed with the new 3-level architecture highlights the importance of developing robots with different levels of autonomy, covering both the high level of goal selection and the low level of motor control.

    Finally, I focus on a crucial issue for autonomous robotics: the development of a system that is able not only to select its own goals, but also to discover them through interaction with the environment. In the last work I present GRAIL, a Goal-discovering Robotic Architecture for Intrinsically-motivated Learning. Building on the insights provided by my previous research, GRAIL is a 4-level hierarchical architecture that for the first time assembles in a single system the different features necessary for the development of truly autonomous robots. GRAIL is able to autonomously 1) discover new goals, 2) create and store representations of the events associated with those goals, 3) select the goal to pursue, 4) select the computational resources to learn to achieve the desired goal, and 5) self-generate its own learning signals on the basis of the achievement of the selected goals. I implement GRAIL in a simulated iCub and test it in three different 3D experimental setups, comparing its performance to that of my previous systems, showing its capacity to generate new goals in unknown scenarios, and testing its ability to cope with stochastic environments. The experiments highlight on the one hand the importance of an appropriate hierarchical architecture for supporting the development of autonomous robots, and on the other hand how IMs (together with goals) can play a crucial role in the autonomous learning of multiple skills.
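    As an illustration of the loop such an architecture runs, the sketch below mimics the five capabilities listed above in heavily simplified form; the class names, the stub environment, and the lowest-competence goal-selection rule are assumptions made for illustration, not the thesis implementation:

        import random

        class Expert:
            """Stub skill learner: its success probability grows with reinforced practice."""
            def __init__(self):
                self.skill = 0.1
            def try_to_achieve(self, goal, env):
                return random.random() < self.skill
            def learn(self, reinforcement):
                self.skill = min(1.0, self.skill + 0.05 * reinforcement)

        class ToyEnv:
            """Stub environment that occasionally exposes an interesting event."""
            def observe_event(self):
                return random.choice(["light_on", "ball_moved", None, None, None])

        class GrailLikeAgent:
            def __init__(self):
                self.goals = []        # stored representations of discovered events
                self.experts = {}      # computational resources, one per goal
                self.competence = {}   # running estimate of goal achievement

            def step(self, env):
                event = env.observe_event()
                if event is not None and event not in self.goals:
                    self.goals.append(event)       # 1) discover and 2) store a new goal
                    self.competence[event] = 0.0
                if not self.goals:
                    return
                # 3) select a goal, here greedily by lowest competence (a simplification)
                goal = min(self.goals, key=lambda g: self.competence[g])
                # 4) select the computational resources assigned to that goal
                expert = self.experts.setdefault(goal, Expert())
                achieved = expert.try_to_achieve(goal, env)
                # 5) self-generated learning signal: achievement of the selected goal
                reinforcement = 1.0 if achieved else 0.0
                expert.learn(reinforcement)
                self.competence[goal] += 0.1 * (reinforcement - self.competence[goal])

        env, agent = ToyEnv(), GrailLikeAgent()
        for _ in range(1000):
            agent.step(env)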

    Biological cumulative learning through intrinsic motivations: a simulated robotic study on development of visually-guided reaching

    This work aims to model the ability of biological organisms to achieve cumulative learning, i.e. to learn increasingly more complex skills on the basis of simpler ones. In particular, we studied how a simulated kinematic robotic system composed of an arm and an eye can learn to reach for an object on the basis of the ability to systematically look at the object, which, in our set-up, represented a prerequisite for the reaching task. We designed the system following several biological constraints and investigated which kind of sub-task reinforcement might facilitate the development of the final skill. We found that performance in the reaching task was optimized when the reinforcement signal included not only the extrinsic reinforcement provided by touching the object but also an intrinsic reinforcement given by the error in the prediction of fovea activation. We discuss how these results might explain biological data regarding the neural basis of action discovery and reinforcement learning, in particular with respect to the neuromodulator dopamine.
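    A sketch of how such an intrinsic reinforcement can be computed, assuming a simple online linear predictor of fovea activation (the class name, learning rate, and feature encoding are illustrative assumptions):

        import numpy as np

        class FoveaPredictor:
            """Learns to predict fovea activation from the current sensory context;
            its absolute prediction error serves as the intrinsic reinforcement."""
            def __init__(self, n_features, lr=0.05):   # lr is an assumed value
                self.w = np.zeros(n_features)
                self.lr = lr

            def intrinsic_reinforcement(self, features, fovea_activation):
                prediction = self.w @ features
                error = fovea_activation - prediction
                self.w += self.lr * error * features   # online delta-rule update
                return abs(error)   # surprise: large while looking is still unreliable

    As the looking skill becomes reliable the predictor improves and this reinforcement fades, which is what makes it a temporary signal suited to cumulative learning.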

    Which is the best intrinsic motivation signal for learning multiple skills?

    Humans and other biological agents are able to autonomously learn and cache different skills in the absence of any biological pressure or any assigned task. In this respect, Intrinsic Motivations (i.e., motivations not connected to reward-related stimuli) play a cardinal role in animal learning, and can be considered a fundamental tool for developing more autonomous and more adaptive artificial agents. In this work, we provide an exhaustive analysis of a scarcely investigated problem: which kind of IM reinforcement signal is the most suitable for driving the acquisition of multiple skills in the shortest time? To this purpose we implemented an artificial agent with a hierarchical architecture that allows it to learn and cache different skills. We tested the system in a setup with continuous states and actions, in particular with a kinematic robotic arm that has to learn different reaching tasks. We compare the results of different versions of the system driven by several different intrinsic motivation signals. The results show (a) that intrinsic reinforcements purely based on the knowledge of the system are not appropriate to guide the acquisition of multiple skills, and (b) that the stronger the link between the IM signal and the competence of the system, the better the performance.
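    The contrast between the two families of signals can be stated compactly; the function names below are illustrative, not the paper's notation:

        def knowledge_based_im(predicted_outcome, observed_outcome):
            """Prediction error of a world model: rewards surprise even when the
            agent's ability to reproduce the outcome is not improving."""
            return abs(observed_outcome - predicted_outcome)

        def competence_based_im(competence_before, competence_after):
            """Competence improvement: rewards measured progress in actually
            achieving the skill, the family found most effective here."""
            return max(0.0, competence_after - competence_before)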

    A bio-inspired learning signal for the cumulative learning of different skills

    Building artificial agents able to autonomously learn new skills and to easily adapt to different and complex environments is an important goal for robotics and machine learning. We propose that providing artificial agents with a learning signal that resembles the characteristics of the phasic activations of dopaminergic neurons would be an advancement in the development of more autonomous and versatile systems. In particular, we suggest that the particular composition of such a signal, determined by both intrinsic and extrinsic reinforcements, is suitable to improve the implementation of cumulative learning. To validate our hypothesis we performed experiments with a simulated robotic system that has to learn different skills to obtain rewards. We compared different versions of the system, varying the composition of the learning signal, and show that only the system that implements our hypothesis is able to reach high performance in the task.

    Intrinsic motivation signals for driving the acquisition of multiple tasks: A simulated robotic study

    Intrinsic Motivations (i.e., motivations not connected to reward-related stimuli) drive humans and other biological agents to autonomously learn different skills in the absence of any biological pressure or any assigned task. In this paper we investigate which learning signal is best suited to drive the training of different tasks in a modular architecture controlling a simulated kinematic robotic arm that has to reach for different objects. We compare the performance of the system while varying the Intrinsic Motivation signal, and show how a Task Predictor whose learning process is strictly connected to the competence of the system in the tasks is able to generate the most suitable signal for the autonomous learning of multiple skills.
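    A minimal sketch of such a task predictor, assuming a scalar success-probability estimate per task (the names and update rule are illustrative, not the paper's implementation):

        class TaskPredictor:
            """Predicts the probability of succeeding at each task; its prediction
            error tracks changes in competence and serves as the IM signal."""
            def __init__(self, n_tasks, lr=0.1):   # lr is an assumed value
                self.p_success = [0.0] * n_tasks
                self.lr = lr

            def im_signal(self, task, succeeded):
                outcome = 1.0 if succeeded else 0.0
                error = outcome - self.p_success[task]
                self.p_success[task] += self.lr * error   # track current competence
                # The signal is high while competence is changing and fades both at
                # chance-level failure and at full mastery.
                return abs(error)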