95 research outputs found

    General time consistent discounting

    Modeling inter-temporal choice is a key problem in both computer science and economic theory. The discounted utility model of Samuelson is currently the most popular model for measuring the global utility of a time-series of local utilities. The model is limited by not allowing the discount function to change with the age of the agent. This is despite the fact that many agents, in particular humans, are best modelled with age-dependent discount functions. It is well known that discounting can lead to time-inconsistent behaviour where agents change their preferences over time. In this paper we generalise the discounted utility model to allow age-dependent discount functions. We then extend previous work in time-inconsistency to our new setting, including a complete characterisation of time-(in)consistent discount functions, the existence of sub-game perfect equilibrium policies where the discount function is time-inconsistent and a continuity result showing that “nearly” time-consistent discount rates lead to “nearly” time-consistent behaviour.

    Time consistent discounting

    A possibly immortal agent tries to maximise its summed discounted rewards over time, where discounting is used to avoid infinite utilities and encourage the agent to value current rewards more than future ones. Some commonly used discount functions lead to time-inconsistent behavior where the agent changes its plan over time. These inconsistencies can lead to very poor behavior. We generalise the usual discounted utility model to one where the discount function changes with the age of the agent. We then give a simple characterisation of time-(in)consistent discount functions and show the existence of a rational policy for an agent that knows its discount function is time-inconsistent.
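
    The plan changes described in this abstract can be illustrated with a minimal sketch. It is a standard textbook preference-reversal example, not the paper's construction: the discount functions, the parameters `gamma` and `kappa`, the two reward options, and the helper names `preferred` and `reverses` are all assumptions chosen for illustration, and the discount depends only on the delay rather than on the agent's age.

```python
def exponential(delay, gamma=0.9):
    """Geometric discount gamma**delay (gamma is an assumed value)."""
    return gamma ** delay


def hyperbolic(delay, kappa=1.0):
    """Hyperbolic discount 1 / (1 + kappa * delay) (kappa is an assumed value)."""
    return 1.0 / (1.0 + kappa * delay)


def preferred(discount, now, options):
    """Return the (time, reward) option with the largest discounted value at `now`."""
    return max(options, key=lambda opt: discount(opt[0] - now) * opt[1])


def reverses(discount, options, horizon):
    """True if the preferred option differs between some pair of decision times 0..horizon."""
    choices = {preferred(discount, t, options) for t in range(horizon + 1)}
    return len(choices) > 1


# Hypothetical choice: a smaller reward at time 10 or a larger reward at time 12.
options = [(10, 1.0), (12, 1.5)]

for name, discount in [("exponential", exponential), ("hyperbolic", hyperbolic)]:
    print(name, "changes its plan:", reverses(discount, options, horizon=10))
```

    In this sketch the hyperbolic agent prefers the later, larger reward when both are far away and switches to the earlier one once it is imminent, while the exponential agent keeps the same ranking at every decision time.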

    Equivalent Representations of Non-Exponential Discounting Models

    I characterize the entire class of consumption rules for finite-horizon models in which consumption is proportional to lifetime wealth. Any such rule can be obtained from a preference model with CRRA period utility. In a steady state with constant interest rates, a proportional consumption rule can be derived from a model with time-consistent preferences or from a model with possibly time-inconsistent preferences in which a household continually reoptimizes future utility discounted relative to the present instant. These two preference models coincide only in the special case where the discount function is exponential. More generally, there will be two distinct yet observationally equivalent preference models. Hyperbolic-like discounting may arise because that is a simpler way for the brain to process a standard exponential discount function after accounting for mortality risk.

    Efficiency in the cake-eating problem with quasi-geometric discounting

    This paper shows that any equilibrium allocation in the cake-eating problem with quasi-geometric discounting is not Pareto efficient. However, efficiency can be established by introducing a planner who controls the initial endowment and makes transfers over time. It is shown that any Pareto efficient allocation can be supported by a perfect equilibrium with transfers. Keywords: Pareto efficiency
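
    For readers unfamiliar with the term, quasi-geometric discounting usually refers to the weights 1, βδ, βδ², ... placed on current and future periods. The sketch below only computes those weights under that common definition. The function name and the parameter values are assumptions for illustration, and nothing here reproduces the paper's equilibrium analysis.

```python
def quasi_geometric_weights(beta, delta, periods):
    """Weights 1, beta*delta, beta*delta**2, ... over `periods` periods.

    beta is the extra discount applied to every future period;
    delta is the ordinary per-period discount factor.
    """
    return [1.0 if k == 0 else beta * delta ** k for k in range(periods)]


# Assumed values: beta = 0.5, delta = 0.9.
print(quasi_geometric_weights(0.5, 0.9, 4))

# With beta = 1 the weights reduce to ordinary geometric discounting.
print(quasi_geometric_weights(1.0, 0.9, 4))
```

    With β < 1 the drop between the current period and the next is steeper than the drop between any two later periods, which is the source of the time inconsistency in such models.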

    Optimistic Agents are Asymptotically Optimal

    We use optimism to introduce generic asymptotically optimal reinforcement learning agents. They achieve, with an arbitrary finite or compact class of environments, asymptotically optimal behavior. Furthermore, in the finite deterministic case we provide finite error bounds. Comment: 13 LaTeX pages

    Time-inconsistency and Welfare Program Participation: Evidence from the NLSY

    We empirically implement a dynamic structural model of labor supply and welfare program participation for agents with potentially time-inconsistent preferences. Using panel data on the choices of single women with children from the NLSY 1979, we provide estimates of the degree of time-inconsistency, and of its influence on the welfare take-up decision. With these estimates, we conduct counterfactual experiments to quantify the utility loss stemming from the inability to commit to future decisions, and the potential utility gains from commitment mechanisms such as welfare time limits and work requirements. Keywords: Time inconsistent preferences, Welfare reform, Labor supply

    Bad Universal Priors and Notions of Optimality

    A big open question of algorithmic information theory is the choice of the universal Turing machine (UTM). For Kolmogorov complexity and Solomonoff induction we have invariance theorems: the choice of the UTM changes bounds only by a constant. For the universally intelligent agent AIXI (Hutter, 2005) no invariance theorem is known. Our results are entirely negative: we discuss cases in which unlucky or adversarial choices of the UTM cause AIXI to misbehave drastically. We show that Legg-Hutter intelligence and thus balanced Pareto optimality is entirely subjective, and that every policy is Pareto optimal in the class of all computable environments. This undermines all existing optimality properties for AIXI. While it may still serve as a gold standard for AI, our results imply that AIXI is a relative theory, dependent on the choice of the UTM. Comment: COLT 201

    The Sample-Complexity of General Reinforcement Learning

    We present a new algorithm for general reinforcement learning where the true environment is known to belong to a finite class of N arbitrary models. The algorithm is shown to be near-optimal for all but O(N log^2 N) time-steps with high probability. Infinite classes are also considered where we show that compactness is a key criterion for determining the existence of uniform sample-complexity bounds. A matching lower bound is given for the finite case. Comment: 16 pages