12,624 research outputs found

    Encoding of Marginal Utility across Time in the Human Brain

    Marginal utility theory prescribes the relationship between the objective property of the magnitude of rewards and their subjective value. Despite its pervasive influence, however, there is remarkably little direct empirical evidence for such a theory of value, let alone of its neurobiological basis. We show that human preferences in an intertemporal choice task are best described by a model that integrates marginally diminishing utility with temporal discounting. Using functional magnetic resonance imaging, we show that activity in the dorsal striatum encodes both the marginal utility of rewards, over and above that which can be described by their magnitude alone, and the discounting associated with increasing time. In addition, our data show that dorsal striatum may be involved in integrating subjective valuation systems inherent to time and magnitude, thereby providing an overall metric of value used to guide choice behavior. Furthermore, during choice, we show that anterior cingulate activity correlates with the degree of difficulty associated with dissonance between value and time. Our data support an integrative architecture for decision making, revealing the neural representation of distinct subcomponents of value that may contribute to impulsivity and decisiveness

    Rethinking the Discount Factor in Reinforcement Learning: A Decision Theoretic Approach

    Reinforcement learning (RL) agents have traditionally been tasked with maximizing the value function of a Markov decision process (MDP), either in continuous settings, with fixed discount factor γ < 1, or in episodic settings, with γ = 1. While this has proven effective for specific tasks with well-defined objectives (e.g., games), it has never been established that fixed discounting is suitable for general purpose use (e.g., as a model of human preferences). This paper characterizes rationality in sequential decision making using a set of seven axioms and arrives at a form of discounting that generalizes traditional fixed discounting. In particular, our framework admits a state-action dependent "discount" factor that is not constrained to be less than 1, so long as there is eventual long run discounting. Although this broadens the range of possible preference structures in continuous settings, we show that there exists a unique "optimizing MDP" with fixed γ < 1 whose optimal value function matches the true utility of the optimal policy, and we quantify the difference between value and utility for suboptimal policies. Our work can be seen as providing a normative justification for (a slight generalization of) Martha White's RL task formalism (2017) and other recent departures from the traditional RL, and is relevant to task specification in RL, inverse RL and preference-based RL. Comment: 8 pages + 1 page supplement. In proceedings of AAAI 2019. Slides, poster and bibtex available at https://silviupitis.com/#rethinking-the-discount-factor-in-reinforcement-learning-a-decision-theoretic-approac

    Impulsivity and self-control during intertemporal decision making linked to the neural dynamics of reward value representation

    A characteristic marker of impulsive decision making is the discounting of delayed rewards, demonstrated via choice preferences and choice-related brain activity. However, delay discounting may also arise from how subjective reward value is dynamically represented in the brain when anticipating an upcoming chosen reward. In the current study, brain activity was continuously monitored as human participants freely selected an immediate or delayed primary liquid reward and then waited for the specified delay before consuming it. The ventromedial prefrontal cortex (vmPFC) exhibited a characteristic pattern of activity dynamics during the delay period, as well as modulation during choice, that is consistent with the time-discounted coding of subjective value. The ventral striatum (VS) exhibited a similar activity pattern, but preferentially in impulsive individuals. A contrasting profile of delay-related and choice activation was observed in the anterior PFC (aPFC), but selectively in patient individuals. Functional connectivity analyses indicated that both vmPFC and aPFC exerted modulatory, but opposite, influences on VS activation. These results link behavioral impulsivity and self-control to dynamically evolving neural representations of future reward value, not just during choice, but also during postchoice delay periods

    Knowledge-aware Complementary Product Representation Learning

    Learning product representations that reflect complementary relationships plays a central role in e-commerce recommender systems. In the absence of the product relationship graph, which existing methods rely on, there is a need to detect complementary relationships directly from noisy and sparse customer purchase activities. Furthermore, unlike simple relationships such as similarity, complementariness is asymmetric and non-transitive. Standard usage of representation learning emphasizes only one set of embeddings, which is problematic for modelling such properties of complementariness. We propose using knowledge-aware learning with dual product embedding to solve the above challenges. We encode contextual knowledge into product representation by multi-task learning, to alleviate the sparsity issue. By explicitly modelling with user bias terms, we separate the noise of customer-specific preferences from the complementariness. Furthermore, we adopt the dual embedding framework to capture the intrinsic properties of complementariness and provide a geometric interpretation motivated by the classic separating hyperplane theory. Finally, we propose a Bayesian network structure that unifies all the components, which also includes several popular models as special cases. The proposed method compares favourably to state-of-the-art methods in downstream classification and recommendation tasks. We also develop an implementation that scales efficiently to a dataset with millions of items and customers

    Recursive Smooth Ambiguity Preferences

    This paper axiomatizes an intertemporal version of the Smooth Ambiguity decision model developed in Klibanoff, Marinacci, and Mukerji (2005). A key feature of the model is that it achieves a separation between ambiguity, identified as a characteristic of the decision maker's subjective beliefs, and ambiguity attitude, a characteristic of the decision maker's tastes. In applications one may thus specify/vary these two characteristics independently of each other, thereby facilitating richer comparative statics and modeling flexibility than possible under other models which accommodate ambiguity sensitive preferences. Another key feature is that the preferences are dynamically consistent and have a recursive representation. Therefore techniques of dynamic programming can be applied when using this model. Keywords: Ambiguity, Uncertainty, Knightian Uncertainty, Ambiguity Aversion, Uncertainty Aversion, Ellsberg Paradox, Dynamic Decision Making, Dynamic Programming under Ambiguity, Smooth Ambiguity.