Dynamically optimal treatment allocation using Reinforcement Learning
Devising guidance on how to assign individuals to treatment is an important
goal in empirical research. In practice, individuals often arrive sequentially,
and the planner faces various constraints, such as a limited budget or capacity,
borrowing constraints, or the need to place people in a queue. For instance, a
governmental body may receive a budget outlay at the beginning of a year, and
it may need to decide how best to allocate resources within the year to
individuals who arrive sequentially. In this and other examples involving
inter-temporal trade-offs, previous work on devising optimal policy rules in a
static context is either not applicable, or sub-optimal. Here we show how one
can use offline observational data to estimate an optimal policy rule that
maximizes expected welfare in this dynamic context. We allow the class of
policy rules to be restricted for legal, ethical or incentive compatibility
reasons. The problem is equivalent to one of optimal control under a
constrained policy class, and we exploit recent developments in Reinforcement
Learning (RL) to propose an algorithm to solve this. The algorithm is easily
implementable with speedups achieved through multiple RL agents learning in
parallel processes. We also characterize the statistical regret from using our
estimated policy rule by casting the evolution of the value function under each
policy in a Partial Differential Equation (PDE) form and using the theory of
viscosity solutions to PDEs. We find that in most examples the policy regret
decays at the same rate as in the static case.
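The sequential-allocation setting described above lends itself to standard dynamic-programming machinery. Below is a minimal, hypothetical sketch, not the paper's algorithm: tabular Q-learning on a toy MDP where the state is the current period and the remaining budget, and the action is whether to treat the arriving individual. All dynamics, rewards, and parameter values are invented for illustration.

```python
import random

# Hypothetical toy: a planner sees T individuals sequentially with budget B.
# State = (period t, remaining budget b); action 1 treats the current
# individual at the cost of one budget unit, yielding a noisy welfare gain.

T, B = 5, 3                  # horizon and initial budget (illustrative)
ALPHA, GAMMA, EPS = 0.05, 0.9, 0.1

Q = {(t, b, a): 0.0 for t in range(T) for b in range(B + 1) for a in (0, 1)}

def step(b, a, rng):
    """Simulated response: treating costs one budget unit, yields a noisy gain."""
    if a == 1:
        return rng.gauss(1.0, 0.1), b - 1
    return 0.0, b

def feasible(b):
    return (0, 1) if b > 0 else (0,)

def train(episodes=20000, seed=0):
    rng = random.Random(seed)
    for _ in range(episodes):
        b = B
        for t in range(T):
            acts = feasible(b)
            if rng.random() < EPS:
                a = rng.choice(acts)          # explore
            else:
                a = max(acts, key=lambda x: Q[(t, b, x)])  # exploit
            r, b2 = step(b, a, rng)
            nxt = 0.0 if t == T - 1 else max(Q[(t + 1, b2, x)] for x in feasible(b2))
            Q[(t, b, a)] += ALPHA * (r + GAMMA * nxt - Q[(t, b, a)])
            b = b2

train()
```

With discounting, treating earlier is strictly preferred here, so the learned Q-values at the initial state favor treatment. The paper's setting is far richer (continuous time, constrained policy classes), and its speedups come from running several such learners in parallel.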
Inverse stochastic optimal controls
We study an inverse problem of the stochastic optimal control of general
diffusions whose performance index contains a quadratic penalty term in the
control process. Under mild conditions on the drift, the volatility, and the cost
functions of the state, and under the assumption that the optimal control
belongs to the interior of the control set, we show that our inverse problem is
well-posed using a stochastic maximum principle. Then, with the well-posedness,
we reduce the inverse problem to a root-finding problem for the expectation
of a random variable involving the value function, which has a unique
solution. Based on this result, we propose a numerical method for the inverse
problem that replaces this expectation with the arithmetic mean over observed
optimal control processes and the corresponding state processes. The recent
progress of numerical analyses of Hamilton-Jacobi-Bellman equations enables the
proposed method to be implementable for multi-dimensional cases. In particular,
with the help of the kernel-based collocation method for
Hamilton-Jacobi-Bellman equations, our method for the inverse problems still
works well even when an explicit form of the value function is unavailable.
Several numerical experiments show that the proposed method recovers the
unknown weight parameter with high accuracy.
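The reduction to root finding can be illustrated on a toy scalar LQ problem, which is far simpler than the paper's general diffusion setting. For min ∫ (q x² + λ u²) dt with dx = u dt, the HJB equation gives the optimal feedback u* = -sqrt(q/λ) x, so an unknown weight λ can be recovered by bisecting a sample-average analogue of an expectation condition. Every name and constant below is illustrative.

```python
import math
import random

# Toy inverse problem: observe noisy optimal control/state pairs (x, u) with
# u = -sqrt(q/lam_true) * x, then recover lam_true by root finding on a
# sample mean that replaces the expectation E[x (u + sqrt(q/lam) x)] = 0.

def simulate(lam_true, q=1.0, n=4000, seed=1):
    rng = random.Random(seed)
    theta = math.sqrt(q / lam_true)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        u = -theta * x + rng.gauss(0.0, 0.05)   # observed optimal control
        data.append((x, u))
    return data

def g(lam, data, q=1.0):
    # Sample-average version of the expectation condition; zero at the true lam.
    th = math.sqrt(q / lam)
    return sum(x * (u + th * x) for x, u in data) / len(data)

def recover(data, lo=0.1, hi=10.0, iters=60):
    # g is strictly decreasing in lam, so bisect for its root.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid, data) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

data = simulate(lam_true=2.0)
lam_hat = recover(data)
```

In this toy the value function is known in closed form; the paper's contribution is precisely that a kernel-based collocation solver for the HJB equation can stand in for it when no explicit form is available.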
A decomposition technique for pursuit evasion games with many pursuers
Here we present a decomposition technique for a class of differential games.
The technique consists of a decomposition of the target set which, for
geometrical reasons, induces a decomposition in the dimensionality of the
problem. Using some elements of the theory of Hamilton-Jacobi equations, we
find a relation between the regularity of the solution and the possibility of
decomposing the problem. We use this technique to solve a pursuit-evasion game
with multiple agents.
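The flavor of the decomposition can be conveyed with a classical constant-speed pursuit toy, which is not the paper's general setting: against several faster pursuers, a candidate value (capture time) is the minimum of the single-pursuer values, so a high-dimensional game splits into low-dimensional subproblems. The dynamics and the equality below hold only under suitable conditions; everything here is a hedged illustration.

```python
import math

# Illustrative decomposition for simple pursuit: each pursuer moves with
# speed vp, the evader with speed ve < vp.  Against one pursuer, with the
# evader fleeing radially, capture time is distance / (vp - ve).  The
# multi-pursuer candidate value is the minimum over the subproblems.

def single_pursuer_value(p, e, vp, ve):
    """Capture time for one pursuer chasing a slower, radially fleeing evader."""
    assert vp > ve, "capture requires a speed advantage"
    dist = math.hypot(p[0] - e[0], p[1] - e[1])
    return dist / (vp - ve)

def multi_pursuer_value(pursuers, e, vp, ve):
    # Decomposition step: solve each low-dimensional single-pursuer game
    # and combine by taking the minimum capture time.
    return min(single_pursuer_value(p, e, vp, ve) for p in pursuers)
```

For example, with pursuers at (3, 0) and (0, 4), the evader at the origin, and a unit speed advantage, the decomposed value is 3: the nearer pursuer determines the outcome.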
Game-Theoretic Safety Assurance for Human-Centered Robotic Systems
In order for autonomous systems like robots, drones, and self-driving cars to be reliably introduced into our society, they must have the ability to actively account for safety during their operation. While safety analysis has traditionally been conducted offline for controlled environments like cages on factory floors, the much higher complexity of open, human-populated spaces like our homes, cities, and roads makes it unviable to rely on common design-time assumptions, since these may be violated once the system is deployed. Instead, the next generation of robotic technologies will need to reason about safety online, constructing high-confidence assurances informed by ongoing observations of the environment and other agents, even though models of these are necessarily fallible.

This dissertation aims to lay down the foundations needed for autonomous systems to ensure their own safety in complex, changing, and uncertain environments by explicitly reasoning about the gap between their models and the real world. It first introduces a suite of novel robust optimal control formulations and algorithmic tools that permit tractable safety analysis in time-varying, multi-agent systems, as well as safe real-time robotic navigation in partially unknown environments; these approaches are demonstrated on large-scale unmanned air traffic simulation and physical quadrotor platforms. It then draws on Bayesian machine learning methods to translate model-based guarantees into high-confidence assurances, monitoring the reliability of predictive models in light of changing evidence about the physical system and surrounding agents. This principle is first applied to a general safety framework allowing the use of learning-based control (e.g. reinforcement learning) for safety-critical robotic systems such as drones, and then combined with insights from cognitive science and dynamic game theory to enable safe human-centered navigation and interaction; these techniques are showcased on physical quadrotors, flying in unmodeled wind and among human pedestrians, and on simulated highway driving. The dissertation ends with a discussion of challenges and opportunities ahead, including the bridging of safety analysis and reinforcement learning and the need to "close the loop" around learning and adaptation in order to deploy increasingly advanced autonomous systems with confidence.
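The robust safety analysis referred to above is, in spirit, a dynamic game between the controller and an adversarial disturbance. The following discrete-time, 1-D grid caricature of such a safety value computation is a hypothetical sketch for intuition only, not an algorithm from the dissertation; the dynamics, bounds, and grid are all made up.

```python
# Caricature of a safety (reachability-style) value computation: V(x) is the
# worst-case minimum over time of the signed distance to the unsafe set,
# computed by dynamic programming with the control maximizing safety and the
# disturbance minimizing it.  Toy dynamics: x_next = x + u + d.

XS = [i * 0.1 for i in range(-15, 16)]           # state grid on [-1.5, 1.5]
US, DS = (-0.2, 0.0, 0.2), (-0.05, 0.0, 0.05)    # control / disturbance sets

def signed_dist(x):
    return 1.0 - abs(x)          # positive inside the safe set |x| <= 1

def nearest(x):
    # Snap a continuous successor state back onto the grid.
    return min(XS, key=lambda g: abs(g - x))

def safety_values(steps=50):
    V = {x: signed_dist(x) for x in XS}
    for _ in range(steps):
        Vn = {}
        for x in XS:
            # max over controls of the worst case over disturbances
            best = max(min(V[nearest(x + u + d)] for d in DS) for u in US)
            Vn[x] = min(signed_dist(x), best)
        V = Vn
    return V

V = safety_values()
```

States near the origin keep a positive value (the controller's authority exceeds the disturbance's), while states already outside |x| ≤ 1 are negative. The dissertation's methods tackle the continuous, high-dimensional, multi-agent versions of exactly this kind of computation.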
Exploration noise for learning linear-quadratic mean field games
The goal of this paper is to demonstrate that common noise may serve as an
exploration noise for learning the solution of a mean field game. This concept
is here exemplified through a toy linear-quadratic model, for which a suitable
form of common noise has already been proven to restore existence and
uniqueness. We here go one step further and prove that the same form of common
noise may force the convergence of the learning algorithm called `fictitious
play', and this without any further potential or monotone structure. Several
numerical examples are provided in order to support our theoretical analysis
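The mechanics of fictitious play can be sketched in a stylized static mean-field toy, which omits both the dynamics and the common noise that are the paper's actual subject: each agent best-responds to the running average of past population actions, and the average converges to the mean-field equilibrium. All parameters and the cost function are invented for illustration.

```python
# Stylized fictitious play: an agent facing population average action m
# minimizes (u - (A + KAPPA * m))**2 + u**2, whose best response is
# u = (A + KAPPA * m) / 2.  The mean-field equilibrium solves
# m* = (A + KAPPA * m*) / 2, i.e. m* = A / (2 - KAPPA).

A, KAPPA = 1.0, 0.5

def best_response(m):
    # argmin_u (u - (A + KAPPA * m))**2 + u**2
    return 0.5 * (A + KAPPA * m)

def fictitious_play(iters=500):
    avg = 0.0                    # running average of past best responses
    for k in range(1, iters + 1):
        br = best_response(avg)
        avg += (br - avg) / k    # incremental update of the average
    return avg

m_star = A / (2 - KAPPA)         # equilibrium, = 2/3 here
m_hat = fictitious_play()
```

In the paper's linear-quadratic model the object being averaged is the flow of population states rather than a scalar action, and the common noise plays the role that contraction plays here: it is what makes the iteration converge without potential or monotone structure.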