Causal Effect Random Forest Of Interaction Trees For Learning Individualized Treatment Regimes In Observational Studies: With Applications To Education Study Data
Learning individualized treatment regimes (ITRs) from observational data is of great interest in many fields, because treatment recommendations based on individual characteristics may improve individual treatment benefits at a reduced cost. It has long been observed that different individuals may respond to a given treatment with significant heterogeneity. An ITR can be defined as a mapping from individual characteristics to a treatment assignment. The optimal ITR is the treatment assignment that maximizes the expected individual treatment effect. Rooted in personalized medicine, many studies and applications of ITRs are in medical fields and clinical practice. Heterogeneous responses are also well documented in educational interventions. However, unlike efficacy studies in medicine, educational interventions are often not randomized, and study results often suffer greatly from self-selection bias. Besides the intervention itself, the efficacy and effectiveness of interventions usually interact with a wide range of confounders.
In this study, we propose a novel algorithm that extends the random forest of interaction trees to the Causal Effect Random Forest of Interaction Trees (CERFIT) for learning individualized treatment effects and regimes. We first consider a binary treatment setting. Each interaction tree recursively partitions the data into two subgroups with the greatest heterogeneity of treatment effect. By integrating the propensity score into the tree-growing process, subgroups from the proposed CERFIT not only have maximized treatment effect differences but also similar baseline covariates. This allows individualized treatment effects to be estimated from observational data. In addition, we propose using residuals from linear models instead of the original responses in the algorithm. Doing so greatly improves the numerical stability of the algorithm, which leads to improved prediction accuracy. We then consider the learning problem under non-binary treatment settings. For multiple treatments, by recursively partitioning the data into two subgroups with the greatest treatment effect heterogeneity with respect to two randomly selected treatment groups, the algorithm transforms the multiple-treatment ITR learning problem into a binary task. Similarly, continuous treatment can be handled by recursively partitioning the data into subgroups with the greatest homogeneity in the association between the response and the treatment within a child node. For all treatment settings, CERFIT provides a variable importance ranking in terms of treatment effects. Extensive simulation studies assessing estimation accuracy and variable importance ranking are presented. CERFIT performs competitively against all competing methods in the simulation studies.
The methods are also illustrated through an assessment of a voluntary education intervention for the binary treatment setting and through learning the optimal ITR among multiple interventions for the non-binary treatment setting, using data from a large public university.
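The binary-treatment splitting step described in the abstract above can be illustrated with a minimal Python sketch. This is not the CERFIT implementation: it omits the propensity score adjustment, the residualized responses, and the random forest ensemble, and it scores a split simply by the absolute difference in estimated treatment effect between the two child nodes. The function names and the toy data are hypothetical and chosen only for illustration.

```python
from statistics import mean

def treatment_effect(rows):
    """Difference in mean response between treated (t=1) and control (t=0) rows."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    if not treated or not control:
        return None  # the effect is undefined unless both arms are present
    return mean(treated) - mean(control)

def best_interaction_split(rows):
    """Search thresholds on a single covariate x.

    rows: list of (x, t, y) tuples. Returns (threshold, gap), where gap is
    the largest absolute difference in treatment effect between the child
    nodes x <= threshold and x > threshold. Returns (None, 0.0) if no
    valid split exists.
    """
    best = (None, 0.0)
    for threshold in sorted({x for x, _, _ in rows}):
        left = [r for r in rows if r[0] <= threshold]
        right = [r for r in rows if r[0] > threshold]
        eff_left, eff_right = treatment_effect(left), treatment_effect(right)
        if eff_left is None or eff_right is None:
            continue  # skip splits that leave a child without both arms
        gap = abs(eff_left - eff_right)
        if gap > best[1]:
            best = (threshold, gap)
    return best

# Hypothetical toy data: the treatment raises the response only when x > 2.
rows = [
    (1, 0, 1.0), (1, 1, 1.0),
    (2, 0, 1.0), (2, 1, 1.0),
    (3, 0, 1.0), (3, 1, 3.0),
    (4, 0, 1.0), (4, 1, 3.0),
]
threshold, gap = best_interaction_split(rows)
print(threshold, gap)
```

On this toy data the search separates the observations with no treatment effect from those with an effect, which is the kind of subgroup an interaction tree is meant to find. A full tree would apply the same search recursively within each child node and over many covariates.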
Modern approaches for evaluating treatment effect heterogeneity from clinical trials and observational data
In this paper we review recent advances in statistical methods for the
evaluation of the heterogeneity of treatment effects (HTE), including subgroup
identification and estimation of individualized treatment regimens, from
randomized clinical trials and observational studies. We identify several types
of approaches using the features introduced in Lipkovich, Dmitrienko and
D'Agostino (2017) that distinguish the recommended principled methods from
basic methods for HTE evaluation that typically rely on rules of thumb and
general guidelines (the methods are often referred to as common practices). We
discuss the advantages and disadvantages of various principled methods as well
as common measures for evaluating their performance. We use simulated data and
a case study based on a historical clinical trial to illustrate several new
approaches to HTE evaluation.
Causal Inference under Data Restrictions
This dissertation focuses on modern causal inference under uncertainty and
data restrictions, with applications to neoadjuvant clinical trials,
distributed data networks, and robust individualized decision making.
In the first project, we propose a method under the principal stratification
framework to identify and estimate the average treatment effects on a binary
outcome, conditional on the counterfactual status of a post-treatment
intermediate response. Under mild assumptions, the treatment effect of interest
can be identified. We extend the approach to address censored outcome data. The
proposed method is applied to a neoadjuvant clinical trial and its performance
is evaluated via simulation studies.
In the second project, we propose a tree-based model averaging approach to
improve the estimation accuracy of conditional average treatment effects at a
target site by leveraging models derived from other potentially heterogeneous
sites, without them sharing subject-level data. The performance of this
approach is demonstrated by a study of the causal effects of oxygen therapy on
hospital survival rates and backed up by comprehensive simulations.
In the third project, we propose a robust individualized decision learning
framework with sensitive variables to improve the worst-case outcomes of
individuals caused by sensitive variables that are unavailable at the time of
decision. Unlike most existing work that uses mean-optimal objectives, we
propose a robust learning framework by finding a newly defined quantile- or
infimum-optimal decision rule. From a causal perspective, we also generalize
the classic notion of (average) fairness to conditional fairness for individual
subjects. The reliable performance of the proposed method is demonstrated
through synthetic experiments and three real-data applications.
Comment: PhD dissertation, University of Pittsburgh. The contents are mostly
based on arXiv:2211.06569, arXiv:2103.06261 and arXiv:2103.04175 with
extended discussion.
Machine Learning Methods for Personalized Medicine Using Electronic Health Records
This dissertation focuses on methods for estimating personalized treatment using machine learning algorithms that leverage information from electronic health records (EHRs). Current guidelines for medical decision making largely rely on data from randomized controlled trials (RCTs) studying average treatment effects. However, because RCTs are usually conducted under specific inclusion/exclusion criteria, they may be inadequate for making individualized treatment decisions in real-world settings. Large-scale EHRs provide opportunities to fulfill the goals of personalized medicine and to learn individualized treatment rules (ITRs) that depend on patient-specific characteristics from real-world patient data. On the other hand, since patients' EHRs document treatment prescriptions in the real world, transferring information in EHRs to RCTs, if done appropriately, could potentially improve the performance of ITRs in terms of precision and generalizability. Furthermore, EHR data usually include text notes or similar structures, so topic modeling techniques can be adapted to engineer features.
In the first part of this work, we address challenges with EHRs and propose a machine learning approach based on matching techniques (referred to as M-learning) to estimate optimal ITRs from EHRs. This new learning method uses matching instead of the inverse probability weighting common in existing methods for estimating ITRs, in order to assess individuals' responses to alternative treatments more accurately and to alleviate confounding. Matching-based value functions are proposed to compare matched pairs under a unified framework, where various types of outcomes for measuring treatment response (including continuous, ordinal, and discrete outcomes) can easily be accommodated. We establish the Fisher consistency and convergence rate of M-learning. Through extensive simulation studies, we show that M-learning outperforms existing methods when propensity scores are misspecified or when unmeasured confounders are present in certain scenarios. At the end of this part, we apply M-learning to estimate optimal personalized second-line treatments for type 2 diabetes patients to achieve better glycemic control or reduce major complications, using EHRs from New York Presbyterian Hospital (NYPH).
In the second part, we propose a new domain adaptation method to learn ITRs by incorporating information from EHRs. Unless we assume no unmeasured confounding in EHRs, we cannot directly learn the optimal ITR from the combined EHR and RCT data. Instead, we first pre-train "super" features from EHRs that summarize physicians' treatment decisions and patients' observed benefits in the real world, which are likely to be informative of the optimal ITRs. We then augment the feature space of the RCT and learn the optimal ITRs, stratifying by these features, using RCT patients only. We adopt Q-learning and a modified matched-learning algorithm for estimation. We present theoretical justifications and conduct simulation studies to demonstrate the performance of our proposed method. Finally, we apply our method to transfer information learned from EHRs of type 2 diabetes (T2D) patients to improve learning individualized insulin therapies from an RCT.
In the last part of this work, we report M-learning proposed in the first part to learn ITRs using interpretable features extracted from EHR documentation of medications and ICD diagnosis codes. We use a latent Dirichlet allocation (LDA) model to extract latent topics and weights as features for learning ITRs. Our method reduces confounding in observational studies by matching treated and untreated individuals, and it improves treatment optimization by augmenting the feature space with clinically meaningful LDA-based features. We apply the method to extract LDA-based features from EHR data collected at the NYPH clinical data warehouse in studying optimal second-line treatment for T2D patients. We use cross-validation to show that ITRs outperform uniform treatment strategies (i.e., assigning insulin or another class of oral organic compounds to all individuals), and that including topic modeling features leads to a greater reduction in post-treatment complications.
Clinical Pathways in Stroke Rehabilitation
This open access book focuses on practical clinical problems that are frequently encountered in stroke rehabilitation. Consequences of diseases, e.g. impairments and activity limitations, are addressed in rehabilitation with the overall goal of reducing disability and promoting participation. Based on the best available external evidence, clinical pathways are described for stroke rehabilitation, bridging the gap between clinical evidence and clinical decision-making. The clinical pathways answer the questions of which rehabilitation treatment options are beneficial for overcoming specific impairment constellations and activity limitations and are well accepted by stroke survivors, as well as when and in which settings to provide rehabilitation over the course of recovery post stroke. Each chapter starts with a description of the clinical problem encountered. This is followed by a systematic but concise review of the evidence (RCTs, systematic reviews and meta-analyses) that is relevant for clinical decision-making, and by comments on assessment, therapy (training, technology, medication), and the use of technical aids as appropriate. Based on these summaries, clinical algorithms / pathways are provided and the main clinical-decision situations are portrayed. The book is invaluable for all neurorehabilitation team members, clinicians, nurses, and therapists in neurology, physical medicine and rehabilitation, and related fields. It is a World Federation for NeuroRehabilitation (WFNR) educational initiative, bridging the gap between the rapidly expanding clinical research in stroke rehabilitation and clinical practice across societies and continents. It can be used both for clinical decision-making for individuals and as clinical background knowledge for stroke rehabilitation service development initiatives.
Provides evidence-based clinical practice guidelines for stroke rehabilitation. Discusses clinical problems and evidence, comments on assessment, therapy and technical aids. Written by experienced experts with a background in clinical practice.
The role of key pharmacodynamic and pharmacokinetic parameters in drug response prediction of pediatric tumors in the precision oncology study INFORM
The first results of the German pediatric precision oncology program INdividualized Therapy FOr Relapsed Malignancies in Childhood (INFORM) showed the significance of high evidence levels for successfully matched targeted therapy based solely on molecular diagnostics. Yet, only a small number of patients (8%, 42/519) (1) actually present with a high-evidence target, highlighting an unmet need to improve drug response predictions and clinical treatment recommendations. Therefore, the aim of this thesis is to integrate pharmacodynamic (PD) parameters from Drug Sensitivity Profiling (DSP) with pharmacokinetic (PK) parameters and thereby improve drug response prediction in high-risk pediatric patients.
To achieve this aim, a literature review was conducted, and nine PK parameters focused on the pediatric population were collected for the drugs from the DSP drug library in the INFORM study. In addition, a database of primary patient tumor (PPT) samples (n=68) and a database of positive control cell (PCC) line models (n=7) were generated. The PCC models harbor a specific molecular alteration (e.g., BRAF V600E, NTRK fusion) with a clinically proven drug-target relationship. Among the 68 PPT samples, five samples (PPT subgroup I) harbored a very high priority (INFORM priority score 1) alteration with a clinically proven drug-target relationship. Both the PPT samples and PCC models underwent DSP using a library of 79 clinically relevant oncology drugs. Hit selection was based on PD parameters derived from dose-response curves and on integrated PD-PK parameters. These parameters were evaluated for their predictive value in the PCC models and the PPT subgroup I samples. Subsequently, the parameter with the best predictive value was investigated in the PPT samples without a defined drug-target relationship.
A PK database of 74 drugs and nine PK parameters for each drug, focusing on the pediatric population, was successfully created and published for the scientific community. When investigating the predictive power of PD parameters, the drug sensitivity score (DSS) z-score showed the best predictive power in identifying the matching drug in the PPT subgroup I samples based on the molecular background. However, the DSS z-score could not capture the patient's clinical history. Conversely, the integrated PD-PK parameter, the DSS Cmax z-score, could effectively capture the patient's clinical history in the PPT subgroup I samples. In the PPT samples without a defined drug target match and no clinical treatment history, the DSS Cmax z-score provided additional insights for 77% (n=53/68) of the patient samples that were not detected by NGS molecular analysis.
In summary, a previously unavailable and comprehensive pediatric PK database was generated and published to serve the scientific community. The PK parameter Cmax was identified and successfully integrated with the DSS, introducing a novel DSP metric for drug response prediction. The groundwork established by testing and describing the DSS Cmax z-score in this thesis serves as a foundation for further investigation in larger datasets with clinical outcomes. This could refine the prediction of drug response for pediatric high-risk patients and improve their treatment selection without relying on time-consuming and costly techniques.
The Use of Routinely Collected Data in Clinical Trial Research
RCTs are the gold standard for assessing the effects of medical interventions, but they also pose many challenges, including the often high cost of conducting them and a potential lack of generalizability of their findings. The recent increase in the availability of so-called routinely collected data (RCD) sources has led to great interest in using them to support RCTs and so increase the efficiency of conducting clinical trials. We define all RCTs augmented by RCD in any form as RCD-RCTs. A major subset of RCD-RCTs are performed at the point of care using electronic health records (EHRs) and are referred to as point-of-care research (POC-R). RCD-RCTs offer several advantages over traditional trials in patient recruitment, data collection, and beyond. Using highly standardized EHR and registry data makes it possible to assess patient characteristics for trial eligibility and to examine treatment effects through routinely collected endpoints or by linkage to other data sources such as mortality registries. Thus, RCD can be used to augment traditional RCTs by providing a sampling framework for patient recruitment and by directly measuring patient-relevant outcomes. The result of these efforts is the generation of real-world evidence (RWE).
Nevertheless, the use of RCD in clinical research brings novel methodological challenges, and the frequently discussed issues of data quality need to be considered for RCD-RCTs. Some of the limitations surrounding RCD use in RCTs relate to data quality, data availability, ethical and informed consent challenges, and lack of endpoint adjudication, all of which may lead to uncertainty about the validity of their results.
The purpose of this thesis is to help fill the aforementioned research gaps in RCD-RCTs, encompassing tasks such as assessing their current application in clinical research and evaluating the methodological and technical challenges in performing them. Furthermore, it aims to assess the reporting quality of published reports on RCD-RCTs.
A Bundled Approach to Integrative Care for Peripherally Inserted Extracorporeal Membrane Oxygenation Cannula Insertion Site
Purpose and Rationale: This project is designed to translate the collective knowledge and evidence-based interventions surrounding the reduction of central line-associated bloodstream infections (CLABSIs) and use these evidence-based practices to create an extracorporeal membrane oxygenation (ECMO) cannula site bundle. The desired outcome of creating an ECMO cannula-site bundle is increased cannula site integrity, decreased frequency of dressing changes, and minimized risk of cannula-site infections.
Synthesis of Evidence: Currently, there are no clinical practice guidelines for ECMO cannula site care to guide bedside practice. The ELSO Infectious Disease Taskforce recommends implementation of a CLABSI bundle for ECMO site care, but many of the products used for CLABSI prevention are not intended for large ECMO cannula sites.
Practice Change and Implementation Strategies: The DNP students will create an ECMO cannula site bundle inspired by the available CLABSI evidence, as there are currently no ECMO-specific products available.
Evaluation: To evaluate project success, the DNP students will investigate the number of dressing changes required for ECMO insertion sites due to saturated and/or non-intact dressings, comparing pre- and post-implementation cohorts.
Conclusion and Implications for Practice: This project would serve as a means to standardize approaches to dressing the ECMO cannula insertion site, and could potentially serve as a clinical practice guideline for cannula site care.