Q-learning: flexible learning about useful utilities

Author: Dean Nema
Moodie Erica E.M.
Sun Yue Ru
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/11/2014

Dynamic treatment regimes are fast becoming an important part of medicine, with the corresponding change in emphasis from treatment of the disease to treatment of the individual patient. Because few trials evaluate personally tailored treatment sequences, inferring optimal treatment regimes from observational data has become increasingly important. Q-learning is a popular method for estimating the optimal treatment regime, originally in randomized trials but more recently also in observational data. Previous applications of Q-learning have largely been restricted to continuous utility end-points with linear relationships. This paper is the first attempt both to extend the framework to discrete utilities and to move the modelling of covariates from linear models to more flexible modelling using the generalized additive model (GAM) framework. Results on simulated data show that GAM-adapted Q-learning typically outperforms Q-learning with linear models and other frequently used methods based on propensity scores in terms of coverage and bias/MSE. This represents a promising step toward a more fully general Q-learning approach to estimating optimal dynamic treatment regimes.

Enlighten

iqLearn: Interactive Q-Learning in R

Author: Laber Eric B.
Linn Kristin A.
Stefanski Leonard A.
Publication venue: 'Foundation for Open Access Statistics'
Publication date: 01/02/2015

Chronic illness treatment strategies must adapt to the evolving health status of the patient receiving treatment. Data-driven dynamic treatment regimes can offer guidance for clinicians and intervention scientists on how to treat patients over time in order to bring about the most favorable clinical outcome on average. Methods for estimating optimal dynamic treatment regimes, such as Q-learning, typically require modeling nonsmooth, nonmonotone transformations of data. Thus, building well-fitting models can be challenging and in some cases may result in a poor estimate of the optimal treatment regime. Interactive Q-learning (IQ-learning) is an alternative to Q-learning that only requires modeling smooth, monotone transformations of the data. The R package iqLearn provides functions for implementing both the IQ-learning and Q-learning algorithms. We demonstrate how to estimate a two-stage optimal treatment policy with iqLearn using a generated data set, bmiData, which mimics a two-stage randomized body mass index reduction trial with binary treatments at each stage.
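
To make the two-stage setting above concrete, here is a minimal Python sketch of standard Q-learning by backward induction with linear working models. It is not the iqLearn API and not the authors' code. The function names, the single scalar history per stage, and the -1/+1 treatment coding are all assumptions made for illustration. The `np.maximum` step is the nonsmooth transformation that the abstract says Q-learning has to model.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares; returns the coefficient vector."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def design(h, a):
    """Design matrix [1, h, a, a*h] for a scalar history h and a
    binary treatment a coded -1/+1 (an assumed coding)."""
    return np.column_stack([np.ones_like(h), h, a, a * h])


def two_stage_q_learning(h1, a1, h2, a2, y):
    """Hypothetical helper: fit stage-2 then stage-1 Q-functions."""
    # Stage 2: regress the final outcome on stage-2 history and treatment.
    beta2 = fit_linear(design(h2, a2), y)
    # Pseudo-outcome: fitted outcome under the better stage-2 treatment.
    q2_plus = design(h2, np.ones_like(h2)) @ beta2
    q2_minus = design(h2, -np.ones_like(h2)) @ beta2
    y_tilde = np.maximum(q2_plus, q2_minus)
    # Stage 1: regress the pseudo-outcome on stage-1 history and treatment.
    beta1 = fit_linear(design(h1, a1), y_tilde)
    return beta1, beta2


def optimal_treatment(beta, h):
    """Choose the treatment in {-1, +1} with the larger fitted Q-value.
    With Q(h, a) = b0 + b1*h + a*(b2 + b3*h), that is sign(b2 + b3*h)."""
    return np.where(beta[2] + beta[3] * h >= 0, 1, -1)
```

The estimated rule at each stage depends only on the treatment main effect and the treatment-by-history interaction, which is why misspecifying the stage-1 model for the maximized pseudo-outcome can degrade the estimated regime.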

Directory of Open Access Journals

PubMed Central

Journal of Statistical Software

Quality adjusted Q-learning and conditional structural mean models for optimizing dynamic treatment regimes

Author: Johnson Geoffrey
Publication date: 09/09/2016

The focus of this work is to investigate a form of Q-learning using estimating equations for quality adjusted survival time, and to generalize these methods to quality adjust other outcomes. We use the m-out-of-n bootstrap and threshold utility analysis to show how the patient-specific optimal regime varies according to treatment characteristics (e.g. cost, side effects). The methodologies investigated are demonstrated by constructing optimal treatment regimes for the treatment of children's neuroblastoma. We also propose a new method for optimizing dynamic treatment regimes using conditional structural mean models. The inverse-probability-of-treatment weighted (IPTW) or g-computation estimator is used at each stage to estimate what we call the 'preliminary' optimal treatment regime, given patient information up to the current stage and prior treatment assignment. Essentially this tailors the optimal treatment assignment at the current stage, and provides an optimal strategy for the remaining stages given the information currently available. We compare this method for optimizing a dynamic treatment regime to Q-learning. Additionally, we propose a two-step prescriptive variable selection procedure that supports the tailored optimization of dynamic treatment regimes using conditional structural mean models by eliminating from consideration any suboptimal treatment regimes and sifting out the covariates that prescribe the optimal treatment regimes. The methods described herein are meant to advance the field of dynamic treatment regimes (DTRs), a field that has a substantial impact on public health. The treatment policies that come from DTRs, whether determined for the population as a whole or tailored for specific subgroups, can be used to guide and shape health policies that will ultimately lead to greater public health and safety.

D-Scholarship@Pitt

Extending Dynamic Treatment Regimes to Incorporate Longitudinal Data Observed Between Decision Times

Author: Li Mengbing
Publication venue: University of North Carolina at Chapel Hill
Publication date: 01/01/2017

Personalized medicine refers to the medical scheme that tailors treatment to individuals based on individual characteristics, predicted risks, and expected outcomes. Two important components of personalized medicine are the estimation of individualized treatment rules (ITRs) and the design of adaptive clinical trials. Dynamic treatment regimes (DTRs) are sequential treatment rules for individual patients that adapt as their disease progresses. Much research on estimating optimal DTRs has been carried out in the past decade, and machine learning methods have been employed in the estimation. When estimating optimal DTRs, we usually face sparsity in asynchronously collected data, to which standard statistical methods for longitudinal data may not be applicable. In this thesis, we first review two major existing machine learning methods, Q-learning and outcome weighted learning, that are applicable to estimating ITRs with longitudinal data. We then propose a new learning method that deals with asynchronous sparse longitudinal data when the treatment option is binary. This method uses a counting process to generate new features, and then uses a Q-learning-like approach to estimate parameters in the decision function. We also discuss advantages and limitations of the proposed method, as well as possible directions for future research.

Bachelor of Science in Public Health

Carolina Digital Repository

Optimal treatment allocations in space and time for on-line control of an emerging infectious disease

Author: Agarwal A.
Anderson R. M.
Bertsekas D. P.
Borth D. M.
Chapelle O.
Chesterton G. K.
Choi A. L.
Cox D. R.
Deardon R.
Estrada E.
Field K.
Gelman A.
Ghavamzadeh M.
Huang C.‐Y.
Kushner H. J.
Law A. M.
Little R. J.
Lusher D.
Mahadevan S.
May B. C.
Murphy S. A.
Nahum‐Shani I.
Newton M. A.
Orellana L.
Osband I.
Palmer J. M.
Poupart P.
Ross S.
Russo D.
Sen A.
Spall J. C.
Subcommittee on Fisheries Wildlife, and Oceans
Sutton R.
Sutton R. S.
West M.
Yin G.
Publication venue: eScholarship, University of California
Publication date: 01/01/2018

A key component in controlling the spread of an epidemic is deciding where, when and to whom to apply an intervention. We develop a framework for using data to inform these decisions in real time. We formalize a treatment allocation strategy as a sequence of functions, one per treatment period, that map up-to-date information on the spread of an infectious disease to a subset of locations where treatment should be allocated. An optimal allocation strategy optimizes some cumulative outcome, e.g. the number of uninfected locations, the geographic footprint of the disease or the cost of the epidemic. Estimation of an optimal allocation strategy for an emerging infectious disease is challenging because spatial proximity induces interference between locations, the number of possible allocations is exponential in the number of locations, and because disease dynamics and intervention effectiveness are unknown at outbreak. We derive a Bayesian on-line estimator of the optimal allocation strategy that combines simulation–optimization with Thompson sampling. The estimator proposed performs favourably in simulation experiments. This work is motivated by and illustrated using data on the spread of white nose syndrome, which is a highly fatal infectious disease devastating bat populations in North America.

Crossref

eScholarship - University of California