
    Discounting of reward sequences: a test of competing formal models of hyperbolic discounting

    Humans are known to discount future rewards hyperbolically in time. Nevertheless, a formal recursive model of hyperbolic discounting was elusive until recently, when the hyperbolically discounted temporal difference (HDTD) model was introduced. Before that, models of learning (especially reinforcement learning) relied on exponential discounting, which generally provides poorer fits to behavioral data. Recently, it has been shown that hyperbolic discounting can also be approximated by a summed distribution of exponentially discounted values, instantiated in the μAgents model. The HDTD model and the μAgents model differ in one key respect, namely how they treat sequences of rewards. The μAgents model is a particular implementation of a Parallel discounting model, which values a sequence as the sum of the values of its individual rewards, whereas the HDTD model contains a non-linear interaction. To discriminate among these models, we observed how subjects discounted a sequence of three rewards, and then we tested how well each candidate model fit the subject data. The results show that the Parallel model generally provides a better fit to the human data.
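The discounting ideas named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code or either model's implementation: the function names, the default parameters (`k=1.0`, `gamma=0.9`), and the choice of discount factors spread uniformly over (0, 1) are all assumptions made for illustration. The sketch shows the standard hyperbolic form A / (1 + kD), the exponential form A·γ^D, a Parallel-style valuation that sums independently discounted rewards, and the fact that averaging γ^D over uniformly distributed γ gives 1 / (1 + D).

```python
def hyperbolic(amount, delay, k=1.0):
    """Hyperbolic discounting: value falls as 1 / (1 + k * delay)."""
    return amount / (1.0 + k * delay)


def exponential(amount, delay, gamma=0.9):
    """Exponential discounting: value falls as gamma ** delay."""
    return amount * gamma ** delay


def parallel_value(rewards, discount=hyperbolic):
    """Parallel-style valuation (illustrative): discount each
    (amount, delay) pair independently and sum the results."""
    return sum(discount(amount, delay) for amount, delay in rewards)


def averaged_exponentials(amount, delay, n_agents=10_000):
    """Average exponentially discounted values over many hypothetical
    agents whose discount factors are spread uniformly over (0, 1).
    The integral of gamma ** delay over [0, 1] is 1 / (1 + delay),
    so the average approaches the hyperbolic curve with k = 1."""
    gammas = [(i + 0.5) / n_agents for i in range(n_agents)]
    return amount * sum(g ** delay for g in gammas) / n_agents


if __name__ == "__main__":
    # A sequence of three equal rewards at delays 1, 2, and 3.
    sequence = [(10.0, 1), (10.0, 2), (10.0, 3)]
    print("parallel (hyperbolic): ", parallel_value(sequence))
    print("parallel (exponential):", parallel_value(sequence, exponential))
    print("averaged exponentials: ", averaged_exponentials(1.0, 3))
    print("hyperbolic, k = 1:     ", hyperbolic(1.0, 3))
```

Under these assumptions, a Parallel-style value is additive across the rewards in a sequence, so any non-linear interaction between rewards, such as the one the abstract attributes to the HDTD model, would appear as a departure from this sum.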

    Learning obstacle avoidance with an operant behavioral model

    Artificial intelligence researchers have been attracted by the idea of having robots learn how to accomplish a task, rather than being told explicitly. Reinforcement learning has been proposed as an appealing framework for controlling mobile agents. Robot learning research and research on biological systems face many similar problems in achieving high flexibility across a variety of tasks. In this work, the control of a vehicle in an avoidance task by a previously developed operant learning model (a form of animal learning) is studied. An environment is simulated in which a mobile robot with proximity sensors has to minimize the punishment for colliding with obstacles. The results were compared with the Q-Learning algorithm, and the proposed model performed better. In this way, a new artificial intelligence agent inspired by neurobiology, psychology, and ethology research is proposed.

    Affiliation: Gutnisky, D. A. Universidad de Buenos Aires. Facultad de Ingeniería. Instituto de Ingeniería Biomédica; Argentina

    Affiliation: Zanutto, Bonifacio Silvano. Consejo Nacional de Investigaciones Científicas y Técnicas. Instituto de Biología y Medicina Experimental. Fundación de Instituto de Biología y Medicina Experimental. Instituto de Biología y Medicina Experimental; Argentina. Universidad de Buenos Aires. Facultad de Ingeniería. Instituto de Ingeniería Biomédica; Argentin