315 research outputs found

    Stochastic dynamic programming with non-linear discounting

    In this paper, we study a Markov decision process with a non-linear discount function and with a Borel state space. We define a recursive discounted utility, which resembles non-additive utility functions considered in a number of models in economics. Non-additivity here follows from non-linearity of the discount function. Our study is complementary to the work of Jaƛkiewicz, Matkowski and Nowak (Math. Oper. Res. 38 (2013), 108-121), where non-linear discounting is also used in the stochastic setting, but the expectation of utilities aggregated on the space of all histories of the process is applied, leading to a non-stationary dynamic programming model. Our aim is to prove that in the recursive discounted utility case the Bellman equation has a solution and there exists an optimal stationary policy for the problem in the infinite time horizon. Our approach includes two cases: (a) when the one-stage utility is bounded on both sides by a weight function multiplied by some positive and negative constants, and (b) when the one-stage utility is unbounded from below.
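    To make the recursive structure concrete, the Bellman equation with a non-linear discount function can be sketched as follows (notation assumed here, not taken from the paper):

```latex
V(x) \;=\; \sup_{a \in A(x)} \left\{ u(x,a) + \delta\!\left( \int_X V(y)\, q(dy \mid x,a) \right) \right\}
```

    where $u$ is the one-stage utility, $q$ the transition law, and $\delta$ the discount function; the linear choice $\delta(t) = \beta t$ with $\beta \in (0,1)$ recovers the standard discounted model.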

    Non-linear strategies in a linear quadratic differential game

    We study non-linear Markov perfect equilibria in a two-agent linear quadratic differential game. In contrast to the literature stemming from Tsutsui and Mino (1990), we do not associate endogenous subsets of the state space with candidate solutions. Instead, we address the problem of value functions that are unbounded below over infinite horizons by using the `catching up optimality' criterion. We present sufficiency conditions for existence based on results in Dockner, Jorgenson, Long and Sorger (2000). Applying these to our model yields the familiar linear solution as well as a condition under which a continuum of non-linear solutions exists. As this condition is relaxed when agents are more patient, and allows more efficient steady states, it resembles a Folk Theorem for differential games. The model presented here is one of atmospheric pollution; the results apply to differential games more generally.
    Keywords: differential game, non-linear strategies, catching up optimal, Folk Theorem
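    The `catching up' criterion mentioned above can be stated in assumed notation (not the authors' own) as follows: a strategy profile $u^*$ is catching-up optimal for player $i$ if no admissible deviation $u_i$ eventually overtakes it in cumulated payoff,

```latex
\liminf_{T \to \infty} \int_0^T e^{-rt}
  \Big[ \pi_i\big(x^*(t), u_i^*(t)\big) - \pi_i\big(x(t), u_i(t)\big) \Big]\, dt \;\ge\; 0 ,
```

    where $\pi_i$ is player $i$'s instantaneous payoff and $x^*, x$ are the state trajectories under the candidate strategy and the deviation. This sidesteps the issue of value functions that are unbounded below over the infinite horizon.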

    Intergenerational equity and the discount rate for cost-benefit analysis

    For two independent principles of intergenerational equity, the implied discount rate equals the growth rate of real per-capita income, say 2%, thus falling right into the range suggested by the U.S. Office of Management and Budget. To prove this, we develop a simple tool to evaluate small policy changes affecting several generations, by reducing the dynamic problem to a static one. A necessary condition is time-invariance, which is satisfied by any common solution concept in an overlapping generations model with exogenous growth. This tool is applied to derive the discount rate for cost-benefit analysis under two different utilitarian welfare functions: classical and relative. It is only with relative utilitarianism that the discount rate is well-defined for a heterogeneous society, is corroborated by an independent principle equating values of hu
, and equals the growth rate of real per-capita income.
    Keywords: social welfare function, social welfare functional, overlapping generations, exogenous growth, policy reform, intergenerational equity, intergenerational fairness, cost-benefit analysis, discount rate, social discount rate, utilitarianism, relative utilitarianism, welfarism.
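    A hypothetical numeric illustration (not taken from the paper): if the social discount rate equals the per-capita growth rate g, a benefit accruing t years ahead is discounted by the factor (1 + g)^t.

```python
# Hypothetical illustration, not from the paper: discounting a future
# benefit at the growth rate g of real per-capita income.

def present_value(benefit: float, g: float, t: float) -> float:
    """Present value of a benefit accruing t years ahead, discounted at rate g."""
    return benefit / (1.0 + g) ** t

# With g = 2%, a benefit of 100 accruing to a generation 35 years ahead
# is worth roughly half as much today, since 1.02**35 is close to 2.
pv = present_value(100.0, 0.02, 35)
print(round(pv, 2))
```

    The point of the abstract is that this rate is pinned down by equity principles rather than chosen by fiat; the computation above only shows its quantitative bite.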

    Certified Reinforcement Learning with Logic Guidance

    This paper proposes the first model-free Reinforcement Learning (RL) framework to synthesise policies for unknown continuous-state Markov Decision Processes (MDPs) such that a given linear temporal property is satisfied. We convert the given property into a Limit Deterministic BĂŒchi Automaton (LDBA), namely a finite-state machine expressing the property. Exploiting the structure of the LDBA, we shape a synchronous reward function on the fly, so that an RL algorithm can synthesise a policy resulting in traces that probabilistically satisfy the linear temporal property. This probability (certificate) is also calculated in parallel with policy learning when the state space of the MDP is finite: as such, the RL algorithm produces a policy that is certified with respect to the property. Under the assumption of a finite state space, theoretical guarantees are provided on the convergence of the RL algorithm to an optimal policy maximising the above probability. We also show that our method produces "best available" control policies when the logical property cannot be satisfied. In the general case of a continuous state space, we propose a neural network architecture for RL and we empirically show that the algorithm finds satisfying policies, if such policies exist. The performance of the proposed framework is evaluated via a set of numerical examples and benchmarks, where we observe an improvement of one order of magnitude in the number of iterations required for policy synthesis, compared to existing approaches whenever available.
    Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782
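    A minimal, illustrative sketch of the core idea (not the authors' implementation): run Q-learning on the product of an MDP with an automaton, paying reward only when the automaton makes progress toward acceptance. Here the MDP, the two-state "LDBA" for the property "eventually goal", and all parameters are invented for illustration.

```python
import random

N_STATES = 3          # line world: 0 - 1 - 2, where state 2 is labelled "goal"
ACTIONS = (-1, +1)    # move left / right, clipped to the grid

def automaton_step(q, s):
    """Hypothetical two-state automaton: q0 -> q1 (accepting sink) once "goal" is seen."""
    return 1 if (q == 1 or s == N_STATES - 1) else 0

def step(s, a):
    """Deterministic MDP transition."""
    return min(max(s + a, 0), N_STATES - 1)

random.seed(0)
Q = {}  # Q-table over product states (s, q) and actions

for _ in range(500):                       # training episodes
    s, q = 0, 0
    for _ in range(10):
        if random.random() < 0.2:          # epsilon-greedy exploration
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q.get(((s, q), x), 0.0))
        s2 = step(s, a)
        q2 = automaton_step(q, s2)
        r = 1.0 if (q == 0 and q2 == 1) else 0.0   # reward only on acceptance
        best_next = max(Q.get(((s2, q2), b), 0.0) for b in ACTIONS)
        old = Q.get(((s, q), a), 0.0)
        Q[((s, q), a)] = old + 0.5 * (r + 0.9 * best_next - old)
        s, q = s2, q2
        if q == 1:
            break

# Greedy rollout: the learned policy should satisfy "eventually goal".
s, q = 0, 0
trace = [s]
for _ in range(5):
    a = max(ACTIONS, key=lambda x: Q.get(((s, q), x), 0.0))
    s = step(s, a)
    q = automaton_step(q, s)
    trace.append(s)
    if q == 1:
        break
print(trace)
```

    The paper's setting is far more general (continuous states, full LDBAs, convergence certificates), but the shaped reward synchronised with automaton progress is the mechanism this toy reproduces.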

    Global Warming and Extreme Events: Rethinking the Timing and Intensity of Environmental Policy

    The possibility of low-probability extreme events has reignited the debate over the optimal intensity and timing of climate policy. In this paper we contribute to the literature by assessing the implications of low-probability extreme events for environmental policy in a continuous-time real options model with "tail risk". In a nutshell, our results indicate the importance of tail risk and call for foresighted, pre-emptive climate policies.
    Keywords: climate policy, extreme events, real options, LĂ©vy process
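    A simple way to formalise "tail risk" in such a continuous-time setting (symbols assumed here, not the authors') is to let the relevant stock or damage index follow a jump-diffusion, a basic LĂ©vy process:

```latex
dX_t = \mu\, dt + \sigma\, dW_t + dJ_t, \qquad J_t = \sum_{k=1}^{N_t} Y_k ,
```

    where $W$ is a Brownian motion, $N$ a Poisson process of rate $\lambda$ counting extreme events, and the jump sizes $Y_k$ are drawn from a heavy-tailed distribution. The real-options value of delaying versus pre-empting policy is then evaluated under this process instead of a pure diffusion.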

    Optimal dividend distribution under Markov-regime switching

    We investigate the problem of optimal dividend distribution for a company in the presence of regime shifts. We consider a company whose cumulative net revenues evolve as a Brownian motion with positive drift that is modulated by a finite state Markov chain, and model the discount rate as a deterministic function of the current state of the chain. In this setting the objective of the company is to maximize the expected cumulative discounted dividend payments until the moment of bankruptcy, which is taken to be the first time that the cash reserves (the cumulative net revenues minus cumulative dividend payments) are zero. We show that, if the drift is positive in each state, it is optimal to adopt a barrier strategy at certain positive regime-dependent levels, and provide an explicit characterization of the value function as the fixed point of a contraction. In the case that the drift is small and negative in one state, the optimal strategy takes a different form, which we explicitly identify if there are two regimes. We also provide a numerical illustration of the sensitivities of the optimal barriers and the influence of regime switching.
    Comment: 25 pages, 2 figures
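    The barrier strategy described above can be simulated in a few lines. All parameters below (drifts, volatilities, barrier levels, switching probability) are hypothetical placeholders, not values from the paper; the sketch only illustrates the mechanics of a regime-dependent barrier.

```python
import math
import random

# Illustrative sketch with made-up parameters: cash reserves follow a
# Markov-modulated Brownian motion; any reserves above the current regime's
# barrier are paid out immediately as dividends; bankruptcy at zero reserves.

random.seed(42)

MU      = {0: 0.06, 1: 0.02}   # drift per regime (both positive, as in case (a))
SIGMA   = {0: 0.15, 1: 0.25}   # volatility per regime
BARRIER = {0: 1.0,  1: 1.6}    # regime-dependent dividend barriers (assumed)
SWITCH_P = 0.01                # per-step probability of a regime switch
DT = 0.01                      # Euler time step

x, regime, dividends = 0.8, 0, 0.0
for _ in range(100_000):
    x += MU[regime] * DT + SIGMA[regime] * math.sqrt(DT) * random.gauss(0, 1)
    if x > BARRIER[regime]:            # pay out the overflow as a dividend
        dividends += x - BARRIER[regime]
        x = BARRIER[regime]
    if x <= 0.0:                       # bankruptcy: simulation stops
        break
    if random.random() < SWITCH_P:
        regime = 1 - regime

print(f"total dividends paid: {dividends:.3f}")
```

    The paper characterises the *optimal* barriers via a fixed point of a contraction; the simulation above takes the barriers as given and only shows how the reflected reserves and cumulative dividends evolve.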
    • 

    corecore