
    Dynamically optimal treatment allocation using Reinforcement Learning

    Devising guidance on how to assign individuals to treatment is an important goal in empirical research. In practice, individuals often arrive sequentially, and the planner faces various constraints, such as a limited budget or capacity, borrowing constraints, or the need to place people in a queue. For instance, a governmental body may receive a budget outlay at the beginning of a year and need to decide how best to allocate resources within the year to individuals who arrive sequentially. In this and other examples involving inter-temporal trade-offs, previous work on devising optimal policy rules in a static context is either not applicable or sub-optimal. Here we show how one can use offline observational data to estimate an optimal policy rule that maximizes expected welfare in this dynamic context. We allow the class of policy rules to be restricted for legal, ethical, or incentive-compatibility reasons. The problem is equivalent to one of optimal control under a constrained policy class, and we exploit recent developments in Reinforcement Learning (RL) to propose an algorithm to solve it. The algorithm is easy to implement, and it can be sped up by having multiple RL agents learn in parallel processes. We also characterize the statistical regret from using our estimated policy rule by casting the evolution of the value function under each policy in the form of a Partial Differential Equation (PDE) and using the theory of viscosity solutions to PDEs. We find that the policy regret decays at an $n^{-1/2}$ rate in most examples; this is the same rate as in the static case.
    Comment: 67 pages
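The inter-temporal trade-off described above can be illustrated with a toy budget-constrained allocation problem. This is a minimal sketch under loudly stated assumptions, not the authors' method: the paper estimates a policy from offline data with RL under a restricted policy class, whereas here the distribution of treatment benefits (`BENEFITS`, hypothetical values) is assumed known and the optimal rule is computed exactly by backward induction. The names `optimal_value` and `greedy_value` are hypothetical. The comparison shows why a static rule ("treat anyone who benefits while budget remains") can be sub-optimal: it spends the budget on early arrivals instead of saving it for high-benefit ones.

```python
from functools import lru_cache

# Hypothetical toy model (assumption): each arriving individual's treatment
# benefit is one of these values, each equally likely. In the paper this
# would be learned from offline observational data instead.
BENEFITS = [0.0, 1.0, 2.0, 5.0]


def optimal_value(T: int, B: int) -> float:
    """Expected welfare of the optimal dynamic rule with T arrivals and B treatment slots."""

    @lru_cache(maxsize=None)
    def V(t: int, b: int) -> float:
        # No arrivals left or no budget left: no further welfare.
        if t == T or b == 0:
            return 0.0
        total = 0.0
        for x in BENEFITS:
            skip = V(t + 1, b)           # keep the slot for a later arrival
            treat = x + V(t + 1, b - 1)  # use a slot now
            total += max(skip, treat)
        return total / len(BENEFITS)

    return V(0, B)


def greedy_value(T: int, B: int) -> float:
    """Expected welfare of the static rule 'treat anyone with positive benefit while budget lasts'."""

    @lru_cache(maxsize=None)
    def G(t: int, b: int) -> float:
        if t == T or b == 0:
            return 0.0
        total = 0.0
        for x in BENEFITS:
            if x > 0:
                total += x + G(t + 1, b - 1)
            else:
                total += G(t + 1, b)
        return total / len(BENEFITS)

    return G(0, B)


if __name__ == "__main__":
    # With 10 arrivals and only 2 slots, waiting for high-benefit arrivals pays off.
    print("dynamic optimum:", optimal_value(10, 2))
    print("static greedy:  ", greedy_value(10, 2))
```

Backward induction is used here only because the toy model is small and fully known; with unknown dynamics, continuous covariates, and a restricted policy class, an approach such as the RL algorithm described in the abstract is needed instead.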

    Deep Reinforcement Learning for Infinite Horizon Mean Field Problems in Continuous Spaces

    We present the development and analysis of a reinforcement learning (RL) algorithm designed to solve continuous-space mean field game (MFG) and mean field control (MFC) problems in a unified manner. The proposed approach pairs the actor-critic (AC) paradigm with a representation of the mean field distribution via a parameterized score function, which can be efficiently updated in an online fashion, and uses Langevin dynamics to obtain samples from the resulting distribution. The AC agent and the score function are updated iteratively and converge either to the MFG equilibrium or to the MFC optimum for a given mean field problem, depending on the choice of learning rates. A straightforward modification of the algorithm allows us to solve mixed mean field control games (MFCGs). The performance of our algorithm is evaluated on linear-quadratic benchmarks in the asymptotic infinite-horizon framework.
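The sampling step mentioned above (drawing samples from a distribution represented by its score function via Langevin dynamics) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper uses a parameterized score function that is updated online alongside the actor-critic agent, whereas here a fixed closed-form Gaussian score (`gaussian_score`, hypothetical) stands in for it, and the update is the standard unadjusted Langevin algorithm with a constant step size. The names `gaussian_score` and `langevin_sample` and all parameter values are hypothetical.

```python
import numpy as np


def gaussian_score(x, mu=2.0, sigma=1.0):
    """Score (gradient of the log-density) of N(mu, sigma^2).

    Hypothetical stand-in for a learned, parameterized score function.
    """
    return -(x - mu) / sigma**2


def langevin_sample(score, n_particles=5000, n_steps=1000, step=0.01, seed=0):
    """Unadjusted Langevin algorithm: x <- x + step * score(x) + sqrt(2 * step) * noise."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_particles)  # all particles start at 0
    for _ in range(n_steps):
        noise = rng.standard_normal(n_particles)
        x = x + step * score(x) + np.sqrt(2.0 * step) * noise
    return x


samples = langevin_sample(gaussian_score)
# Approximately mu = 2.0 and sigma = 1.0, up to Monte Carlo and discretization error.
print(samples.mean(), samples.std())
```

Only the score (not the normalized density) is needed to run the chain, which is what makes a score-based representation of the mean field distribution convenient to update online. A constant step size introduces a small discretization bias in the sampled variance.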

    Adaptive Optimal Control via Continuous-Time Q-Learning for Unknown Nonlinear Affine Systems
