Discounting of reward sequences: a test of competing formal models of hyperbolic discounting
Humans are known to discount future rewards hyperbolically in time. Nevertheless, a formal recursive model of hyperbolic discounting has been elusive until recently, with the introduction of the hyperbolically discounted temporal difference (HDTD) model. Prior to that, models of learning (especially reinforcement learning) have relied on exponential discounting, which generally provides poorer fits to behavioral data. Recently, it has been shown that hyperbolic discounting can also be approximated by a summed distribution of exponentially discounted values, instantiated in the μAgents model. The HDTD model and the μAgents model differ in one key respect, namely how they treat sequences of rewards. The μAgents model is a particular implementation of a Parallel discounting model, which values sequences based on the summed value of the individual rewards, whereas the HDTD model contains a non-linear interaction. To discriminate among these models, we observed how subjects discounted a sequence of three rewards, and then we tested how well each candidate model fit the subject data. The results show that the Parallel model generally provides a better fit to the human data.
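The relationship between the discounting schemes named in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the default parameters `k` and `gamma`, and the choice of a uniform spread of discount factors on (0, 1) are assumptions made for illustration. Under that uniform spread, the average of the exponential curves gamma**delay equals 1 / (1 + delay), which is a hyperbolic curve with k = 1. The Parallel valuation of a sequence is sketched as a plain sum of individually discounted rewards.

```python
import numpy as np

def hyperbolic(amount, delay, k=1.0):
    # Hyperbolic discounting: value falls off as 1 / (1 + k * delay).
    return amount / (1.0 + k * delay)

def exponential(amount, delay, gamma=0.9):
    # Exponential discounting with a single per-step discount factor.
    return amount * gamma ** delay

def mixture_of_exponentials(amount, delay, n_agents=10_000):
    # Average many exponential discounters whose discount factors are
    # spread uniformly over (0, 1) (bin midpoints). This spread is an
    # illustrative assumption; the average approximates 1 / (1 + delay).
    gammas = (np.arange(n_agents) + 0.5) / n_agents
    return amount * float(np.mean(gammas ** delay))

def parallel_sequence_value(amounts, delays, k=1.0):
    # Parallel model: a sequence is worth the sum of the values of its
    # individually discounted rewards, with no interaction term.
    return sum(hyperbolic(a, d, k) for a, d in zip(amounts, delays))

if __name__ == "__main__":
    for d in (0, 1, 3, 10):
        print(d, hyperbolic(1.0, d), mixture_of_exponentials(1.0, d))
    print(parallel_sequence_value([1.0, 1.0, 1.0], [0, 1, 3]))
```

The HDTD model's non-linear interaction between rewards in a sequence is not reproduced here, since the abstract does not specify its form.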
Geometry of Policy Improvement
We investigate the geometry of optimal memoryless time independent decision
making in relation to the amount of information that the acting agent has about
the state of the system. We show that the expected long term reward, discounted
or per time step, is maximized by policies that randomize among at most k
actions whenever at most k world states are consistent with the agent's
observation. Moreover, we show that the expected reward per time step can be
studied in terms of the expected discounted reward. Our main tool is a
geometric version of the policy improvement lemma, which identifies a
polyhedral cone of policy changes in which the state value function increases
for all states. Comment: 8 pages
Encoding of Marginal Utility across Time in the Human Brain
Marginal utility theory prescribes the relationship between the objective property of the magnitude of rewards and their subjective value. Despite its pervasive influence, however, there is remarkably little direct empirical evidence for such a theory of value, let alone of its neurobiological basis. We show that human preferences in an intertemporal choice task are best described by a model that integrates marginally diminishing utility with temporal discounting. Using functional magnetic resonance imaging, we show that activity in the dorsal striatum encodes both the marginal utility of rewards, over and above that which can be described by their magnitude alone, and the discounting associated with increasing time. In addition, our data show that dorsal striatum may be involved in integrating subjective valuation systems inherent to time and magnitude, thereby providing an overall metric of value used to guide choice behavior. Furthermore, during choice, we show that anterior cingulate activity correlates with the degree of difficulty associated with dissonance between value and time. Our data support an integrative architecture for decision making, revealing the neural representation of distinct subcomponents of value that may contribute to impulsivity and decisiveness.
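One way to picture a model that integrates marginally diminishing utility with temporal discounting is to multiply a concave utility of magnitude by a discount factor of delay. The sketch below is an assumption-laden illustration, not the model fitted in the paper: the exponential utility form, the hyperbolic discount form, and the parameter names `r` and `k` with their default values are all hypothetical choices.

```python
import math

def utility(magnitude, r=0.01):
    # Concave utility (assumed form): each extra unit of reward adds
    # less value than the one before; r sets the curvature.
    return (1.0 - math.exp(-r * magnitude)) / r

def discount(delay, k=0.05):
    # Hyperbolic discount factor (assumed form); equals 1 at zero delay.
    return 1.0 / (1.0 + k * delay)

def subjective_value(magnitude, delay, r=0.01, k=0.05):
    # Integrated value: discounted utility of the reward.
    return discount(delay, k) * utility(magnitude, r)

if __name__ == "__main__":
    # Doubling the magnitude less than doubles the value.
    print(utility(100), utility(200))
    # The same reward is worth less when delayed.
    print(subjective_value(100, 0), subjective_value(100, 10))
```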
A general theory of intertemporal decision-making and the perception of time
Animals and humans make decisions based on their expected outcomes. Since
relevant outcomes are often delayed, perceiving delays and choosing between
earlier versus later rewards (intertemporal decision-making) is an essential
component of animal behavior. The myriad observations made in experiments
studying intertemporal decision-making and time perception have not yet been
rationalized within a single theory. Here we present a theory,
Training-Integrated Maximized Estimation of Reinforcement Rate
(TIMERR), that explains a wide variety of behavioral observations made in
intertemporal decision-making and the perception of time. Our theory postulates
that animals make intertemporal choices to optimize expected reward rates over
a limited temporal window; this window includes a past integration interval
(over which experienced reward rate is estimated) and the expected delay to
future reward. Using this theory, we derive a mathematical expression for the
subjective representation of time. A unique contribution of our work is in
finding that the past integration interval directly determines the steepness of
temporal discounting and the nonlinearity of time perception. In so doing, our
theory provides a single framework to understand both intertemporal
decision-making and time perception. Comment: 37 pages, 4 main figures, 3 supplementary figures
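The decision rule described in this abstract, choosing the option that maximizes the expected reward rate over a window made of a past integration interval plus the delay to the future reward, can be sketched as follows. This is a reading of the abstract rather than the authors' implementation: the names `reward_rate`, `choose`, `past_rate`, and `t_ime` are hypothetical, and reward accumulated over the past interval is assumed to equal the estimated past rate times the interval length.

```python
def reward_rate(reward, delay, past_rate, t_ime):
    # Rate over the whole window: reward estimated over the past
    # integration interval plus the future reward, divided by the
    # length of the past interval plus the delay.
    return (past_rate * t_ime + reward) / (t_ime + delay)

def choose(options, past_rate, t_ime):
    # options: iterable of (reward, delay) pairs; pick the pair with
    # the highest windowed reward rate.
    return max(options, key=lambda o: reward_rate(o[0], o[1], past_rate, t_ime))

if __name__ == "__main__":
    options = [(1.0, 1.0), (3.0, 10.0)]  # small-soon vs. large-late
    print(choose(options, past_rate=0.0, t_ime=1.0))    # short window
    print(choose(options, past_rate=0.0, t_ime=100.0))  # long window
```

In this sketch a short past integration interval favors the small, soon reward and a long one favors the large, late reward, which matches the abstract's claim that the interval determines the steepness of temporal discounting.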