Dynamically optimal treatment allocation using Reinforcement Learning
Devising guidance on how to assign individuals to treatment is an important
goal in empirical research. In practice, individuals often arrive sequentially,
and the planner faces various constraints, such as a limited budget or capacity,
borrowing constraints, or the need to place people in a queue. For instance, a
governmental body may receive a budget outlay at the beginning of a year, and
it may need to decide how best to allocate resources within the year to
individuals who arrive sequentially. In this and other examples involving
inter-temporal trade-offs, previous work on devising optimal policy rules in a
static context is either not applicable or sub-optimal. Here we show how one
can use offline observational data to estimate an optimal policy rule that
maximizes expected welfare in this dynamic context. We allow the class of
policy rules to be restricted for legal, ethical or incentive compatibility
reasons. The problem is equivalent to one of optimal control under a
constrained policy class, and we exploit recent developments in Reinforcement
Learning (RL) to propose an algorithm that solves it. The algorithm is easily
implementable, with speedups achieved by running multiple RL agents in
parallel processes. We also characterize the statistical regret from using our
estimated policy rule by casting the evolution of the value function under each
policy in a Partial Differential Equation (PDE) form and using the theory of
viscosity solutions to PDEs. We find that the policy regret decays at an
$n^{-1/2}$ rate in most examples; this is the same rate as in the static case.
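To make the approach concrete, here is a minimal sketch in Python of the kind of method the abstract describes: a treatment rule restricted to a simple logistic class is estimated by episodic REINFORCE on stylized simulated arrivals under a budget constraint, with gradients averaged over several independent agents. The environment, reward model, and all names are hypothetical illustrations, not the paper's actual setup or code.

```python
import numpy as np

# Hypothetical sketch: estimate a budget-constrained treatment rule from a
# restricted (logistic) policy class via episodic REINFORCE. The arrival
# process and reward model below are illustrative stand-ins.
rng = np.random.default_rng(0)

def rollout(theta, horizon=100, budget=30):
    """Simulate one episode of sequential arrivals; return total welfare
    and a (crude, baseline-free) REINFORCE gradient estimate."""
    remaining = budget
    total_reward = 0.0
    score_sum = np.zeros_like(theta)
    for _ in range(horizon):
        x = rng.normal()                                # individual's covariate
        feats = np.array([1.0, x, remaining / budget])  # state features
        p = 1.0 / (1.0 + np.exp(-feats @ theta))        # treatment probability
        treat = rng.random() < p
        score_sum += (float(treat) - p) * feats         # d log pi / d theta
        if treat and remaining > 0:
            remaining -= 1
            total_reward += 0.5 * x + rng.normal(scale=0.1)  # stylized effect
    return total_reward, total_reward * score_sum

def train(n_iters=300, n_agents=4, lr=0.02):
    """Average gradients over independent agents; real speedups would come
    from running these rollouts in parallel processes."""
    theta = np.zeros(3)
    for _ in range(n_iters):
        grads = [rollout(theta)[1] for _ in range(n_agents)]
        theta += lr * np.mean(grads, axis=0)
    return theta

print("estimated policy parameters:", train())
```

In the paper's setting the rollouts would be driven by offline observational data rather than a hand-coded simulator, and the restriction to a parametric class is what allows legal, ethical, or incentive-compatibility constraints to be imposed on the learned rule.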
Deep Reinforcement Learning for Infinite Horizon Mean Field Problems in Continuous Spaces
We present the development and analysis of a reinforcement learning (RL)
algorithm designed to solve continuous-space mean field game (MFG) and mean
field control (MFC) problems in a unified manner. The proposed approach pairs
the actor-critic (AC) paradigm with a representation of the mean field
distribution via a parameterized score function, which can be efficiently
updated in an online fashion, and uses Langevin dynamics to obtain samples from
the resulting distribution. The AC agent and the score function are updated
iteratively to converge to either the MFG equilibrium or the MFC optimum for a
given mean field problem, depending on the choice of learning rates. A
straightforward modification of the algorithm allows us to solve mixed mean
field control games (MFCGs). The performance of our algorithm is evaluated
using linear-quadratic benchmarks in the asymptotic infinite horizon framework
…
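As an illustration of the sampling component, the following is a minimal sketch assuming a known Gaussian target, whose closed-form score stands in for the learned, parameterized score of the mean field distribution: unadjusted Langevin dynamics draws approximate samples using only that score function. All names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def score(x, mu=0.0, sigma=1.0):
    # Score (gradient of the log-density) of N(mu, sigma^2). In the algorithm
    # this would be a learned, parameterized score of the mean field
    # distribution; the Gaussian score here is an illustrative stand-in.
    return -(x - mu) / sigma**2

def langevin_sample(n_particles=1000, n_steps=2000, step=1e-2):
    # Unadjusted Langevin dynamics:
    #   x_{k+1} = x_k + step * score(x_k) + sqrt(2 * step) * noise,
    # whose stationary distribution approximates the target for small step.
    x = rng.normal(size=n_particles)  # arbitrary initialization
    for _ in range(n_steps):
        x += step * score(x) + np.sqrt(2.0 * step) * rng.normal(size=n_particles)
    return x

samples = langevin_sample()
print("empirical mean/std:", samples.mean(), samples.std())
```

Because each update needs only the current score, the particle population can be refreshed as the score parameters change, which is the property that lets the mean field representation be maintained online alongside the actor-critic updates.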