Discounting of reward sequences: a test of competing formal models of hyperbolic discounting

Author: Alexander William
Brown Joshua W
Zarr Noah
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2014

Humans are known to discount future rewards hyperbolically in time. Nevertheless, a formal recursive model of hyperbolic discounting has been elusive until recently, with the introduction of the hyperbolically discounted temporal difference (HDTD) model. Prior to that, models of learning (especially reinforcement learning) have relied on exponential discounting, which generally provides poorer fits to behavioral data. Recently, it has been shown that hyperbolic discounting can also be approximated by a summed distribution of exponentially discounted values, instantiated in the μAgents model. The HDTD model and the μAgents model differ in one key respect, namely how they treat sequences of rewards. The μAgents model is a particular implementation of a Parallel discounting model, which values sequences based on the summed value of the individual rewards, whereas the HDTD model contains a non-linear interaction. To discriminate among these models, we observed how subjects discounted a sequence of three rewards, and then we tested how well each candidate model fit the subject data. The results show that the Parallel model generally provides a better fit to the human data.
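
As an illustration of the abstract's two central ideas, the following is a minimal sketch, not the authors' implementation. It assumes the standard hyperbolic form 1/(1 + kt), a uniform spread of exponential discount factors over (0, 1) as the "summed distribution" (whose average equals 1/(1 + t) exactly in the continuous limit), and a Parallel valuation that simply adds the individually discounted rewards. All function and parameter names are hypothetical.

```python
def hyperbolic(t, k=1.0):
    """Hyperbolic discount factor 1 / (1 + k*t)."""
    return 1.0 / (1.0 + k * t)


def exponential(t, gamma=0.9):
    """Exponential discount factor gamma**t, as in standard reinforcement learning."""
    return gamma ** t


def summed_exponentials(t, n_agents=1000):
    """Average of n_agents exponential discounters (assumed setup).

    Discount factors are placed on a midpoint grid over (0, 1). In the
    continuous limit the integral of gamma**t over [0, 1] is 1 / (1 + t),
    i.e. a hyperbolic curve with k = 1.
    """
    gammas = [(i + 0.5) / n_agents for i in range(n_agents)]
    return sum(g ** t for g in gammas) / n_agents


def parallel_sequence_value(rewards, delays, k=1.0):
    """Parallel valuation: sum of individually discounted rewards."""
    return sum(r * hyperbolic(d, k) for r, d in zip(rewards, delays))


if __name__ == "__main__":
    for t in range(0, 6):
        print(t, round(hyperbolic(t), 4), round(summed_exponentials(t), 4))
    # A sequence of three equal rewards at delays 0, 1 and 2 (arbitrary units).
    print(parallel_sequence_value([10, 10, 10], [0, 1, 2]))
```

The HDTD model's non-linear interaction between rewards is not reproduced here, since the abstract does not state its form.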

Ghent University Academic Bibliography

Directory of Open Access Journals

PubMed Central

Frontiers - Publisher Connector

Geometry of Policy Improvement

Author: JN Tsitsiklis
M Hutter
N Ay
RS Sutton
S Kakade
SM Ross
Publication date: 06/04/2017

We investigate the geometry of optimal memoryless time independent decision making in relation to the amount of information that the acting agent has about the state of the system. We show that the expected long term reward, discounted or per time step, is maximized by policies that randomize among at most k actions whenever at most k world states are consistent with the agent's observation. Moreover, we show that the expected reward per time step can be studied in terms of the expected discounted reward. Our main tool is a geometric version of the policy improvement lemma, which identifies a polyhedral cone of policy changes in which the state value function increases for all states. Comment: 8 pages

arXiv.org e-Print Archive

Crossref

Encoding of Marginal Utility across Time in the Human Brain

Author: Bossaerts P
Curran HV
Dolan RJ
Friston KJ
Pine A
Roiser JP
Seymour B
Publication venue: 'Society for Neuroscience'
Publication date: 01/01/2009

Marginal utility theory prescribes the relationship between the objective property of the magnitude of rewards and their subjective value. Despite its pervasive influence, however, there is remarkably little direct empirical evidence for such a theory of value, let alone of its neurobiological basis. We show that human preferences in an intertemporal choice task are best described by a model that integrates marginally diminishing utility with temporal discounting. Using functional magnetic resonance imaging, we show that activity in the dorsal striatum encodes both the marginal utility of rewards, over and above that which can be described by their magnitude alone, and the discounting associated with increasing time. In addition, our data show that dorsal striatum may be involved in integrating subjective valuation systems inherent to time and magnitude, thereby providing an overall metric of value used to guide choice behavior. Furthermore, during choice, we show that anterior cingulate activity correlates with the degree of difficulty associated with dissonance between value and time. Our data support an integrative architecture for decision making, revealing the neural representation of distinct subcomponents of value that may contribute to impulsivity and decisiveness.

PubMed Central

Caltech Authors

MPG.PuRe

CUED - Cambridge University Engineering Department

A general theory of intertemporal decision-making and the perception of time

Author: Marton Tanya
Mihalas Stefan
Namboodiri Vijay Mohan K
Shuler Marshall G Hussain
Publication venue: 'Frontiers Media SA'
Publication date: 09/11/2013

Animals and humans make decisions based on their expected outcomes. Since relevant outcomes are often delayed, perceiving delays and choosing between earlier versus later rewards (intertemporal decision-making) is an essential component of animal behavior. The myriad observations made in experiments studying intertemporal decision-making and time perception have not yet been rationalized within a single theory. Here we present a theory, Training-Integrated Maximized Estimation of Reinforcement Rate (TIMERR), that explains a wide variety of behavioral observations made in intertemporal decision-making and the perception of time. Our theory postulates that animals make intertemporal choices to optimize expected reward rates over a limited temporal window; this window includes a past integration interval (over which experienced reward rate is estimated) and the expected delay to future reward. Using this theory, we derive a mathematical expression for the subjective representation of time. A unique contribution of our work is in finding that the past integration interval directly determines the steepness of temporal discounting and the nonlinearity of time perception. In so doing, our theory provides a single framework to understand both intertemporal decision-making and time perception. Comment: 37 pages, 4 main figures, 3 supplementary figures
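
The postulate in this abstract can be sketched numerically. The following is one reading of it, not the paper's own derivation: it assumes the agent compares options by the reward rate over a window made of a past integration interval plus the delay to the future reward. The abstract gives no formula, so the symbols `a_est` (reward rate experienced over the past interval) and `T_ime` (length of that interval) and all function names are hypothetical labels. The subjective value is obtained here by asking which immediate reward would yield the same rate.

```python
def reward_rate(r, t, a_est, T_ime):
    """Reward rate over the window (T_ime + t): past earnings plus future reward r."""
    return (a_est * T_ime + r) / (T_ime + t)


def subjective_value(r, t, a_est, T_ime):
    """Immediate reward that gives the same rate as reward r at delay t.

    Solving (a_est*T_ime + SV) / T_ime = (a_est*T_ime + r) / (T_ime + t)
    for SV gives the hyperbolic-like form below.
    """
    return (r - a_est * t) / (1.0 + t / T_ime)


def choose(options, a_est, T_ime):
    """Pick the (reward, delay) option with the highest windowed reward rate."""
    return max(options, key=lambda o: reward_rate(o[0], o[1], a_est, T_ime))


if __name__ == "__main__":
    options = [(5.0, 1.0), (10.0, 10.0)]  # (reward, delay), arbitrary units
    print(choose(options, a_est=0.0, T_ime=2.0))    # short past interval
    print(choose(options, a_est=0.0, T_ime=100.0))  # long past interval
```

Under these assumptions a short past integration interval produces steep discounting (the smaller, sooner option wins) and a long one produces shallow discounting, which matches the abstract's claim that the interval determines the steepness of temporal discounting.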

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Frontiers - Publisher Connector