Dynamically optimal treatment allocation using Reinforcement Learning
Devising guidance on how to assign individuals to treatment is an important
goal in empirical research. In practice, individuals often arrive sequentially,
and the planner faces various constraints, such as a limited budget or capacity,
borrowing constraints, or the need to place people in a queue. For instance, a
governmental body may receive a budget outlay at the beginning of a year, and
it may need to decide how best to allocate resources within the year to
individuals who arrive sequentially. In this and other examples involving
inter-temporal trade-offs, previous work on devising optimal policy rules in a
static context is either not applicable, or sub-optimal. Here we show how one
can use offline observational data to estimate an optimal policy rule that
maximizes expected welfare in this dynamic context. We allow the class of
policy rules to be restricted for legal, ethical or incentive compatibility
reasons. The problem is equivalent to one of optimal control under a
constrained policy class, and we exploit recent developments in Reinforcement
Learning (RL) to propose an algorithm to solve this. The algorithm is easily
implementable with speedups achieved through multiple RL agents learning in
parallel processes. We also characterize the statistical regret from using our
estimated policy rule by casting the evolution of the value function under each
policy in a Partial Differential Equation (PDE) form and using the theory of
viscosity solutions to PDEs. We find that in most examples the policy regret
decays at the same rate as in the static case.
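The sequential-allocation setting described above lends itself to standard dynamic-programming machinery. Below is a minimal, hypothetical sketch, not the paper's algorithm: tabular Q-learning on a toy MDP where the state is the current period and the remaining budget, and the action is whether to treat the arriving individual. All dynamics, rewards, and parameter values are invented for illustration.

```python
import random

# Hypothetical toy: a planner sees T individuals sequentially with budget B.
# State = (period t, remaining budget b); action 1 treats the current
# individual at the cost of one budget unit, yielding a noisy welfare gain.

T, B = 5, 3                  # horizon and initial budget (illustrative)
ALPHA, GAMMA, EPS = 0.05, 0.9, 0.1

Q = {(t, b, a): 0.0 for t in range(T) for b in range(B + 1) for a in (0, 1)}

def step(b, a, rng):
    """Simulated response: treating costs one budget unit, yields a noisy gain."""
    if a == 1:
        return rng.gauss(1.0, 0.1), b - 1
    return 0.0, b

def feasible(b):
    return (0, 1) if b > 0 else (0,)

def train(episodes=20000, seed=0):
    rng = random.Random(seed)
    for _ in range(episodes):
        b = B
        for t in range(T):
            acts = feasible(b)
            if rng.random() < EPS:
                a = rng.choice(acts)          # explore
            else:
                a = max(acts, key=lambda x: Q[(t, b, x)])  # exploit
            r, b2 = step(b, a, rng)
            nxt = 0.0 if t == T - 1 else max(Q[(t + 1, b2, x)] for x in feasible(b2))
            Q[(t, b, a)] += ALPHA * (r + GAMMA * nxt - Q[(t, b, a)])
            b = b2

train()
```

With discounting, treating earlier is strictly preferred here, so the learned Q-values at the initial state favor treatment. The paper's setting is far richer (continuous time, constrained policy classes), and its speedups come from running several such learners in parallel.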
Inverse stochastic optimal controls
We study an inverse problem of the stochastic optimal control of general
diffusions whose performance index contains a quadratic penalty term in the
control process. Under mild conditions on the drift, the volatility, and the cost
functions of the state, and under the assumption that the optimal control
belongs to the interior of the control set, we show that our inverse problem is
well-posed using a stochastic maximum principle. Then, with the well-posedness,
we reduce the inverse problem to a root-finding problem for the expectation
of a random variable involving the value function, which has a unique
solution. Based on this result, we propose a numerical method for the inverse
problem that replaces this expectation with the arithmetic mean over observed
optimal control processes and the corresponding state processes. The recent
progress of numerical analyses of Hamilton-Jacobi-Bellman equations enables the
proposed method to be implementable for multi-dimensional cases. In particular,
with the help of the kernel-based collocation method for
Hamilton-Jacobi-Bellman equations, our method for the inverse problems still
works well even when an explicit form of the value function is unavailable.
Several numerical experiments show that the proposed method recovers the
unknown weight parameter with high accuracy.
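The reduction to root finding can be illustrated on a toy scalar LQ problem, which is far simpler than the paper's general diffusion setting. For min ∫ (q x² + λ u²) dt with dx = u dt, the HJB equation gives the optimal feedback u* = -sqrt(q/λ) x, so an unknown weight λ can be recovered by bisecting a sample-average analogue of an expectation condition. Every name and constant below is illustrative.

```python
import math
import random

# Toy inverse problem: observe noisy optimal control/state pairs (x, u) with
# u = -sqrt(q/lam_true) * x, then recover lam_true by root finding on a
# sample mean that replaces the expectation E[x (u + sqrt(q/lam) x)] = 0.

def simulate(lam_true, q=1.0, n=4000, seed=1):
    rng = random.Random(seed)
    theta = math.sqrt(q / lam_true)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        u = -theta * x + rng.gauss(0.0, 0.05)   # observed optimal control
        data.append((x, u))
    return data

def g(lam, data, q=1.0):
    # Sample-average version of the expectation condition; zero at the true lam.
    th = math.sqrt(q / lam)
    return sum(x * (u + th * x) for x, u in data) / len(data)

def recover(data, lo=0.1, hi=10.0, iters=60):
    # g is strictly decreasing in lam, so bisect for its root.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid, data) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

data = simulate(lam_true=2.0)
lam_hat = recover(data)
```

In this toy the value function is known in closed form; the paper's contribution is precisely that a kernel-based collocation solver for the HJB equation can stand in for it when no explicit form is available.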
A decomposition technique for pursuit evasion games with many pursuers
Here we present a decomposition technique for a class of differential games.
The technique consists of a decomposition of the target set which, for
geometrical reasons, induces a decomposition in the dimensionality of the
problem. Using some elements of the theory of Hamilton-Jacobi equations, we
find a relation between the regularity of the solution and the possibility of
decomposing the problem. We use this technique to solve a pursuit-evasion game
with multiple agents.
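The flavor of the decomposition can be conveyed with a classical constant-speed pursuit toy, which is not the paper's general setting: against several faster pursuers, a candidate value (capture time) is the minimum of the single-pursuer values, so a high-dimensional game splits into low-dimensional subproblems. The dynamics and the equality below hold only under suitable conditions; everything here is a hedged illustration.

```python
import math

# Illustrative decomposition for simple pursuit: each pursuer moves with
# speed vp, the evader with speed ve < vp.  Against one pursuer, with the
# evader fleeing radially, capture time is distance / (vp - ve).  The
# multi-pursuer candidate value is the minimum over the subproblems.

def single_pursuer_value(p, e, vp, ve):
    """Capture time for one pursuer chasing a slower, radially fleeing evader."""
    assert vp > ve, "capture requires a speed advantage"
    dist = math.hypot(p[0] - e[0], p[1] - e[1])
    return dist / (vp - ve)

def multi_pursuer_value(pursuers, e, vp, ve):
    # Decomposition step: solve each low-dimensional single-pursuer game
    # and combine by taking the minimum capture time.
    return min(single_pursuer_value(p, e, vp, ve) for p in pursuers)
```

For example, with pursuers at (3, 0) and (0, 4), the evader at the origin, and a unit speed advantage, the decomposed value is 3: the nearer pursuer determines the outcome.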
Game-Theoretic Safety Assurance for Human-Centered Robotic Systems
In order for autonomous systems like robots, drones, and self-driving cars to be reliably introduced into our society, they must have the ability to actively account for safety during their operation. While safety analysis has traditionally been conducted offline for controlled environments like cages on factory floors, the much higher complexity of open, human-populated spaces like our homes, cities, and roads makes it unviable to rely on common design-time assumptions, since these may be violated once the system is deployed. Instead, the next generation of robotic technologies will need to reason about safety online, constructing high-confidence assurances informed by ongoing observations of the environment and other agents, even though models of these are necessarily fallible.

This dissertation aims to lay down the foundations needed for autonomous systems to ensure their own safety in complex, changing, and uncertain environments by explicitly reasoning about the gap between their models and the real world. It first introduces a suite of novel robust optimal control formulations and algorithmic tools that permit tractable safety analysis in time-varying, multi-agent systems, as well as safe real-time robotic navigation in partially unknown environments; these approaches are demonstrated on large-scale unmanned air traffic simulation and physical quadrotor platforms. It then draws on Bayesian machine learning methods to translate model-based guarantees into high-confidence assurances, monitoring the reliability of predictive models in light of changing evidence about the physical system and surrounding agents. This principle is first applied to a general safety framework allowing the use of learning-based control (e.g. reinforcement learning) for safety-critical robotic systems such as drones, and then combined with insights from cognitive science and dynamic game theory to enable safe human-centered navigation and interaction; these techniques are showcased on physical quadrotors, flying in unmodeled wind and among human pedestrians, and on simulated highway driving. The dissertation ends with a discussion of challenges and opportunities ahead, including the bridging of safety analysis and reinforcement learning and the need to "close the loop" around learning and adaptation in order to deploy increasingly advanced autonomous systems with confidence.
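The robust safety analysis referred to above is, in spirit, a dynamic game between the controller and an adversarial disturbance. The following discrete-time, 1-D grid caricature of such a safety value computation is a hypothetical sketch for intuition only, not an algorithm from the dissertation; the dynamics, bounds, and grid are all made up.

```python
# Caricature of a safety (reachability-style) value computation: V(x) is the
# worst-case minimum over time of the signed distance to the unsafe set,
# computed by dynamic programming with the control maximizing safety and the
# disturbance minimizing it.  Toy dynamics: x_next = x + u + d.

XS = [i * 0.1 for i in range(-15, 16)]           # state grid on [-1.5, 1.5]
US, DS = (-0.2, 0.0, 0.2), (-0.05, 0.0, 0.05)    # control / disturbance sets

def signed_dist(x):
    return 1.0 - abs(x)          # positive inside the safe set |x| <= 1

def nearest(x):
    # Snap a continuous successor state back onto the grid.
    return min(XS, key=lambda g: abs(g - x))

def safety_values(steps=50):
    V = {x: signed_dist(x) for x in XS}
    for _ in range(steps):
        Vn = {}
        for x in XS:
            # max over controls of the worst case over disturbances
            best = max(min(V[nearest(x + u + d)] for d in DS) for u in US)
            Vn[x] = min(signed_dist(x), best)
        V = Vn
    return V

V = safety_values()
```

States near the origin keep a positive value (the controller's authority exceeds the disturbance's), while states already outside |x| ≤ 1 are negative. The dissertation's methods tackle the continuous, high-dimensional, multi-agent versions of exactly this kind of computation.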
Exploration noise for learning linear-quadratic mean field games
The goal of this paper is to demonstrate that common noise may serve as an
exploration noise for learning the solution of a mean field game. This concept
is here exemplified through a toy linear-quadratic model, for which a suitable
form of common noise has already been proven to restore existence and
uniqueness. We here go one step further and prove that the same form of common
noise may force the convergence of the learning algorithm called `fictitious
play', and this without any further potential or monotone structure. Several
numerical examples are provided in order to support our theoretical analysis
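The mechanics of fictitious play can be sketched in a stylized static mean-field toy, which omits both the dynamics and the common noise that are the paper's actual subject: each agent best-responds to the running average of past population actions, and the average converges to the mean-field equilibrium. All parameters and the cost function are invented for illustration.

```python
# Stylized fictitious play: an agent facing population average action m
# minimizes (u - (A + KAPPA * m))**2 + u**2, whose best response is
# u = (A + KAPPA * m) / 2.  The mean-field equilibrium solves
# m* = (A + KAPPA * m*) / 2, i.e. m* = A / (2 - KAPPA).

A, KAPPA = 1.0, 0.5

def best_response(m):
    # argmin_u (u - (A + KAPPA * m))**2 + u**2
    return 0.5 * (A + KAPPA * m)

def fictitious_play(iters=500):
    avg = 0.0                    # running average of past best responses
    for k in range(1, iters + 1):
        br = best_response(avg)
        avg += (br - avg) / k    # incremental update of the average
    return avg

m_star = A / (2 - KAPPA)         # equilibrium, = 2/3 here
m_hat = fictitious_play()
```

In the paper's linear-quadratic model the object being averaged is the flow of population states rather than a scalar action, and the common noise plays the role that contraction plays here: it is what makes the iteration converge without potential or monotone structure.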