
29,040 research outputs found

Optimal treatment allocations in space and time for on-line control of an emerging infectious disease

Author: Agarwal A.
Anderson R. M.
Bertsekas D. P.
Borth D. M.
Chapelle O.
Chesterton G. K.
Choi A. L.
Cox D. R.
Deardon R.
Estrada E.
Field K.
Gelman A.
Ghavamzadeh M.
Huang C.‐Y.
Kushner H. J.
Law A. M.
Little R. J.
Lusher D.
Mahadevan S.
May B. C.
Murphy S. A.
Nahum‐Shani I.
Newton M. A.
Orellana L.
Osband I.
Palmer J. M.
Poupart P.
Ross S.
Russo D.
Sen A.
Spall J. C.
Subcommittee on Fisheries, Wildlife, and Oceans
Sutton R.
Sutton R. S.
West M.
Yin G.
Publication venue: eScholarship, University of California
Publication date: 01/01/2018

A key component in controlling the spread of an epidemic is deciding where, when and to whom to apply an intervention. We develop a framework for using data to inform these decisions in real time. We formalize a treatment allocation strategy as a sequence of functions, one per treatment period, that map up-to-date information on the spread of an infectious disease to a subset of locations where treatment should be allocated. An optimal allocation strategy optimizes some cumulative outcome, e.g. the number of uninfected locations, the geographic footprint of the disease or the cost of the epidemic. Estimation of an optimal allocation strategy for an emerging infectious disease is challenging because spatial proximity induces interference between locations, because the number of possible allocations is exponential in the number of locations, and because disease dynamics and intervention effectiveness are unknown at outbreak. We derive a Bayesian on-line estimator of the optimal allocation strategy that combines simulation–optimization with Thompson sampling. The proposed estimator performs favourably in simulation experiments. This work is motivated by and illustrated using data on the spread of white nose syndrome, a highly fatal infectious disease devastating bat populations in North America.
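The posterior-sampling idea behind Thompson sampling can be illustrated with a toy sketch. It is not the authors' estimator: there is no disease model, no spatial interference and no simulation–optimization step. It assumes a hypothetical setting in which each location has an unknown probability `benefit_probs[i]` that treating it yields a benefit, benefits add up across locations, and a fixed `budget` of locations is treated per period. Under those assumptions, optimizing over the exponentially many subsets under a single posterior draw reduces to picking the top-`budget` draws (a combinatorial Thompson sampling rule). The function and parameter names are illustrative only.

```python
import random


def allocate_by_thompson(benefit_probs, budget, n_periods, seed=0):
    """Toy combinatorial Thompson sampling: treat `budget` locations per period.

    benefit_probs[i] is a hypothetical probability that treating location i
    yields a benefit. It is unknown to the allocator and is used only to
    simulate feedback. Returns how often each location was treated.
    """
    rng = random.Random(seed)
    n = len(benefit_probs)
    alpha = [1.0] * n  # Beta(1, 1) prior on each location's benefit
    beta = [1.0] * n
    times_treated = [0] * n
    for _ in range(n_periods):
        # Posterior sampling: one draw per location from its Beta posterior.
        draws = [rng.betavariate(alpha[i], beta[i]) for i in range(n)]
        # Optimize under the sampled model: with additive benefits the best
        # subset of size `budget` is the set of the `budget` largest draws.
        chosen = sorted(range(n), key=lambda i: draws[i], reverse=True)[:budget]
        for i in chosen:
            outcome = 1 if rng.random() < benefit_probs[i] else 0
            alpha[i] += outcome      # count of observed benefits
            beta[i] += 1 - outcome   # count of observed non-benefits
            times_treated[i] += 1
    return times_treated


counts = allocate_by_thompson([0.1, 0.2, 0.8, 0.3, 0.9], budget=2, n_periods=1000)
print(counts)
```

Because each period acts on one posterior draw rather than on the posterior mean, locations whose benefit is still uncertain are occasionally treated, which is how Thompson sampling balances exploration against exploitation.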

Crossref

eScholarship - University of California

PAC-Bayesian Analysis of the Exploration-Exploitation Trade-off

Author: Auer Peter
Cesa-Bianchi Nicolò
Laviolette François
Peters Jan
Seldin Yevgeny
Shawe-Taylor John
Publication date: 01/01/2011

We develop a coherent framework for integrative simultaneous analysis of the exploration-exploitation and model order selection trade-offs. We improve over our preceding results on the same subject (Seldin et al., 2011) by combining PAC-Bayesian analysis with a Bernstein-type inequality for martingales. Such a combination is also of independent interest for studies of multiple simultaneously evolving martingales. Comment: On-line Trading of Exploration and Exploitation 2 - ICML-2011 workshop. http://explo.cs.ucl.ac.uk/workshop

arXiv.org e-Print Archive

MPG.PuRe

An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning

Author: Mahmood A. Rupam
Sutton Richard S.
White Martha
Publication date: 20/04/2015

In this paper we introduce the idea of improving the performance of parametric temporal-difference (TD) learning algorithms by selectively emphasizing or de-emphasizing their updates on different time steps. In particular, we show that varying the emphasis of linear TD($\lambda$)'s updates in a particular way causes its expected update to become stable under off-policy training. The only prior model-free TD methods to achieve this with per-step computation linear in the number of function approximation parameters are the gradient-TD family of methods, including TDC, GTD($\lambda$), and GQ($\lambda$). Compared to these methods, our _emphatic TD($\lambda$)_ is simpler and easier to use; it has only one learned parameter vector and one step-size parameter. Our treatment includes general state-dependent discounting and bootstrapping functions, and a way of specifying varying degrees of interest in accurately valuing different states. Comment: 29 pages. This is a significant revision based on the first set of reviews. The most important change was to signal early that the main result is about stability, not convergence.
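The emphatic TD($\lambda$) update summarized above can be sketched for linear function approximation. The recursions in the comments are the emphatic TD($\lambda$) recursions as we understand them (follow-on trace $F_t$, emphasis $M_t$, eligibility trace $e_t$, importance-sampling ratio $\rho_t$). As simplifying assumptions, the discount $\gamma$, bootstrapping parameter $\lambda$ and interest $i$ are held constant rather than state dependent, an episode end is modelled by giving the next state a discount of zero, and the class and method names are hypothetical. Treat it as a sketch, not a reference implementation.

```python
import numpy as np


class LinearEmphaticTD:
    """Sketch of linear emphatic TD(lambda) with constant gamma, lambda and interest."""

    def __init__(self, n_features, alpha=0.01, gamma=0.9, lam=0.0, interest=1.0):
        self.theta = np.zeros(n_features)  # the one learned parameter vector
        self.e = np.zeros(n_features)      # eligibility trace e_t
        self.F = 0.0                       # follow-on trace F_t
        self.rho_prev = 0.0                # rho_{t-1}; zero before the first step
        self.gamma_t = 0.0                 # discount of the current state (0 at a start state)
        self.alpha, self.gamma, self.lam, self.interest = alpha, gamma, lam, interest

    def update(self, phi, reward, phi_next, rho, done=False):
        """Process one transition S_t -> S_{t+1}.

        rho is the importance-sampling ratio pi(A_t|S_t) / mu(A_t|S_t).
        """
        # Follow-on trace: F_t = rho_{t-1} * gamma_t * F_{t-1} + i(S_t)
        self.F = self.rho_prev * self.gamma_t * self.F + self.interest
        # Emphasis: M_t = lambda * i(S_t) + (1 - lambda) * F_t
        M = self.lam * self.interest + (1.0 - self.lam) * self.F
        # Eligibility trace: e_t = rho_t * (gamma_t * lambda * e_{t-1} + M_t * phi_t)
        self.e = rho * (self.gamma_t * self.lam * self.e + M * phi)
        # A terminal transition is modelled as the next state having discount 0.
        gamma_next = 0.0 if done else self.gamma
        delta = reward + gamma_next * (self.theta @ phi_next) - self.theta @ phi
        # Single step-size parameter alpha: theta <- theta + alpha * delta_t * e_t
        self.theta = self.theta + self.alpha * delta * self.e
        self.rho_prev, self.gamma_t = rho, gamma_next
        return delta


# On-policy check: one self-looping state, reward 1, gamma 0.5.
agent = LinearEmphaticTD(n_features=1, alpha=0.05, gamma=0.5)
phi = np.array([1.0])
for _ in range(2000):
    agent.update(phi, reward=1.0, phi_next=phi, rho=1.0)
print(agent.theta)  # theta[0] should approach 1 / (1 - gamma) = 2
```

Only `theta` is learned and `alpha` is the only step size; the traces `F` and `e` are bookkeeping, which matches the abstract's point that the method has one learned parameter vector and one step-size parameter.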

arXiv.org e-Print Archive

CiteSeerX

Automating Vehicles by Deep Reinforcement Learning using Task Separation with Hill Climbing

Author: A Liniger
B Paden
C Urmson
CW Anderson
D Dolgov
D Wierstra
DQ Mayne
E Frazzoli
HT Siegelmann
J Xu
P Falcone
R Tedrake
T Schouwenaars
Publication date: 02/08/2018

Within the context of autonomous driving, a model-based reinforcement learning algorithm is proposed for the design of neural-network-parameterized controllers. Classical model-based control methods, which include sampling- and lattice-based algorithms and model predictive control, suffer from a trade-off between model complexity and the computational burden of solving expensive optimization or search problems online at every short sampling time. To circumvent this trade-off, a two-step procedure is motivated: first, a controller is learned during offline training based on an arbitrarily complicated mathematical system model; then, the trained controller is evaluated online by fast feedforward computation. The contribution of this paper is a simple gradient-free and model-based algorithm for deep reinforcement learning using task separation with hill climbing (TSHC). In particular, the paper advocates (i) simultaneous training on separate deterministic tasks with the purpose of encoding many motion primitives in a neural network, and (ii) the use of maximally sparse rewards in combination with virtual velocity constraints (VVCs) in setpoint proximity. Comment: 10 pages, 6 figures, 1 table
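The gradient-free search at the heart of TSHC can be illustrated with plain random-perturbation hill climbing over a controller's parameter vector, scoring each candidate by its return summed over several separate deterministic tasks. This is a generic sketch, not the paper's algorithm: the linear "controller", the two toy tasks, the dense quadratic reward (the paper instead advocates maximally sparse rewards with VVCs) and all names are hypothetical.

```python
import numpy as np


def hill_climb(objective, theta0, n_iters=500, step=0.1, seed=0):
    """Random-perturbation hill climbing: keep a Gaussian-perturbed candidate only
    if it scores at least as well as the current parameters (no gradients needed)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    best = objective(theta)
    for _ in range(n_iters):
        candidate = theta + step * rng.standard_normal(theta.shape)
        value = objective(candidate)
        if value >= best:  # accept non-worsening moves
            theta, best = candidate, value
    return theta, best


# Hypothetical toy tasks: (state features, desired control output) pairs.
TASKS = [(np.array([1.0, 0.0]), 0.5), (np.array([0.0, 1.0]), -0.3)]


def total_return(theta):
    """Sum of per-task returns; each return is minus the squared tracking error."""
    return -sum((theta @ x - target) ** 2 for x, target in TASKS)


theta, best = hill_climb(total_return, np.zeros(2))
print(theta, best)  # theta should move toward [0.5, -0.3]
```

Summing the returns of all tasks before the accept/reject test means a perturbation is kept only if it does not hurt performance on the task set as a whole; how the paper itself aggregates task returns may differ.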

arXiv.org e-Print Archive

Crossref