Dynamically optimal treatment allocation using Reinforcement Learning
Devising guidance on how to assign individuals to treatment is an important
goal in empirical research. In practice, individuals often arrive sequentially,
and the planner faces various constraints, such as a limited budget or capacity,
borrowing constraints, or the need to place people in a queue. For instance, a
governmental body may receive a budget outlay at the beginning of a year, and
it may need to decide how best to allocate resources within the year to
individuals who arrive sequentially. In this and other examples involving
inter-temporal trade-offs, previous work on devising optimal policy rules in a
static context is either not applicable or sub-optimal. Here we show how one
can use offline observational data to estimate an optimal policy rule that
maximizes expected welfare in this dynamic context. We allow the class of
policy rules to be restricted for legal, ethical or incentive compatibility
reasons. The problem is equivalent to one of optimal control under a
constrained policy class, and we exploit recent developments in Reinforcement
Learning (RL) to propose an algorithm that solves it. The algorithm is easily
implementable, with speedups achieved by running multiple RL agents in
parallel processes. We also characterize the statistical regret from using our
estimated policy rule by casting the evolution of the value function under each
policy in a Partial Differential Equation (PDE) form and using the theory of
viscosity solutions to PDEs. We find that the policy regret decays at an
$n^{-1/2}$ rate in most examples; this is the same rate as in the static case.
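To make the approach concrete, here is a minimal sketch in Python of the kind of method the abstract describes: a treatment rule restricted to a simple logistic class is estimated by episodic REINFORCE on stylized simulated arrivals under a budget constraint, with gradients averaged over several independent agents. The environment, reward model, and all names are hypothetical illustrations, not the paper's actual setup or code.

```python
import numpy as np

# Hypothetical sketch: estimate a budget-constrained treatment rule from a
# restricted (logistic) policy class via episodic REINFORCE. The arrival
# process and reward model below are illustrative stand-ins.
rng = np.random.default_rng(0)

def rollout(theta, horizon=100, budget=30):
    """Simulate one episode of sequential arrivals; return total welfare
    and a (crude, baseline-free) REINFORCE gradient estimate."""
    remaining = budget
    total_reward = 0.0
    score_sum = np.zeros_like(theta)
    for _ in range(horizon):
        x = rng.normal()                                # individual's covariate
        feats = np.array([1.0, x, remaining / budget])  # state features
        p = 1.0 / (1.0 + np.exp(-feats @ theta))        # treatment probability
        treat = rng.random() < p
        score_sum += (float(treat) - p) * feats         # d log pi / d theta
        if treat and remaining > 0:
            remaining -= 1
            total_reward += 0.5 * x + rng.normal(scale=0.1)  # stylized effect
    return total_reward, total_reward * score_sum

def train(n_iters=300, n_agents=4, lr=0.02):
    """Average gradients over independent agents; real speedups would come
    from running these rollouts in parallel processes."""
    theta = np.zeros(3)
    for _ in range(n_iters):
        grads = [rollout(theta)[1] for _ in range(n_agents)]
        theta += lr * np.mean(grads, axis=0)
    return theta

print("estimated policy parameters:", train())
```

In the paper's setting the rollouts would be driven by offline observational data rather than a hand-coded simulator, and the restriction to a parametric class is what allows legal, ethical, or incentive-compatibility constraints to be imposed on the learned rule.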
Deep Reinforcement Learning for Infinite Horizon Mean Field Problems in Continuous Spaces
We present the development and analysis of a reinforcement learning (RL)
algorithm designed to solve continuous-space mean field game (MFG) and mean
field control (MFC) problems in a unified manner. The proposed approach pairs
the actor-critic (AC) paradigm with a representation of the mean field
distribution via a parameterized score function, which can be efficiently
updated in an online fashion, and uses Langevin dynamics to obtain samples from
the resulting distribution. The AC agent and the score function are updated
iteratively to converge to either the MFG equilibrium or the MFC optimum for a
given mean field problem, depending on the choice of learning rates. A
straightforward modification of the algorithm allows us to solve mixed mean
field control games (MFCGs). The performance of our algorithm is evaluated
using linear-quadratic benchmarks in the asymptotic infinite horizon framework
…
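As an illustration of the sampling component, the following is a minimal sketch assuming a known Gaussian target, whose closed-form score stands in for the learned, parameterized score of the mean field distribution: unadjusted Langevin dynamics draws approximate samples using only that score function. All names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def score(x, mu=0.0, sigma=1.0):
    # Score (gradient of the log-density) of N(mu, sigma^2). In the algorithm
    # this would be a learned, parameterized score of the mean field
    # distribution; the Gaussian score here is an illustrative stand-in.
    return -(x - mu) / sigma**2

def langevin_sample(n_particles=1000, n_steps=2000, step=1e-2):
    # Unadjusted Langevin dynamics:
    #   x_{k+1} = x_k + step * score(x_k) + sqrt(2 * step) * noise,
    # whose stationary distribution approximates the target for small step.
    x = rng.normal(size=n_particles)  # arbitrary initialization
    for _ in range(n_steps):
        x += step * score(x) + np.sqrt(2.0 * step) * rng.normal(size=n_particles)
    return x

samples = langevin_sample()
print("empirical mean/std:", samples.mean(), samples.std())
```

Because each update needs only the current score, the particle population can be refreshed as the score parameters change, which is the property that lets the mean field representation be maintained online alongside the actor-critic updates.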