Targeted Maximum Likelihood Estimation: A Gentle Introduction
This paper provides a concise introduction to targeted maximum likelihood estimation (TMLE) of causal effect parameters. The interested analyst should gain sufficient understanding of TMLE from this introductory tutorial to be able to apply the method in practice. A program written in R is provided. This program implements a basic version of TMLE that can be used to estimate the effect of a binary point treatment on a continuous or binary outcome
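As a rough illustration of what a basic TMLE for a binary point treatment and binary outcome involves, here is a minimal Python sketch. It is not the R program the paper refers to. The toy data, the single binary covariate W, and all function and variable names are assumptions made for this example. The initial outcome regression deliberately ignores W, and the logistic fluctuation step uses the treatment mechanism to remove the resulting bias.

```python
import math

def expit(x):
    # Numerically stable inverse logit.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def logit(p):
    return math.log(p / (1.0 - p))

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

# Hypothetical toy data: (W, A, number of units, number of units with Y = 1).
cells = [(0, 1, 20, 10), (0, 0, 80, 24), (1, 1, 80, 64), (1, 0, 20, 10)]
data = []
for w, a, n_units, events in cells:
    data += [(w, a, 1)] * events + [(w, a, 0)] * (n_units - events)

# Initial outcome regression Q(A): deliberately misspecified, it ignores W.
q_init = {a: mean(y for _, ai, y in data if ai == a) for a in (0, 1)}
# Treatment mechanism g(W) = P(A = 1 | W), estimated within strata of W.
g = {w: mean(ai for wi, ai, _ in data if wi == w) for w in (0, 1)}

def clever(a, w):
    # Clever covariate H(A, W) for the additive treatment effect.
    return a / g[w] - (1 - a) / (1.0 - g[w])

# Fluctuation step: fit epsilon in logit Q*(A, W) = logit Q(A) + eps * H(A, W)
# by Newton-Raphson on the one-dimensional logistic log-likelihood.
eps = 0.0
for _ in range(100):
    score = info = 0.0
    for w, a, y in data:
        h = clever(a, w)
        p = expit(logit(q_init[a]) + eps * h)
        score += h * (y - p)
        info += h * h * p * (1.0 - p)
    step = score / info
    eps += step
    if abs(step) < 1e-12:
        break

def q_star(a, w):
    # Updated (targeted) outcome regression.
    return expit(logit(q_init[a]) + eps * clever(a, w))

naive_ate = q_init[1] - q_init[0]
tmle_ate = mean(q_star(1, w) - q_star(0, w) for w, _, _ in data)
print(round(naive_ate, 3), round(tmle_ate, 3))  # prints: 0.4 0.25
```

In this toy data set the W-adjusted risk difference is 0.25, so the targeting step recovers it even though the initial outcome regression is wrong. A practical analysis would estimate both Q and g with regression or machine learning and would also report influence-curve-based standard errors, which this sketch omits.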
tmle: An R Package for Targeted Maximum Likelihood Estimation
Targeted maximum likelihood estimation (TMLE) presents an approach for construction of an efficient double-robust semi-parametric substitution estimator of a target feature of the data generating distribution, such as a statistical association measure or a causal effect parameter. tmle is a recently developed R package that implements TMLE for estimation of the effect of a binary treatment at a single point in time on an outcome of interest, controlling for user supplied covariates: the additive treatment effect, the relative risk, the odds ratio. The package allows outcome data with missingness, and experimental units that contribute repeated records of the point-treatment data structure, thereby allowing this package to analyze longitudinal data structures. The TMLE of the direct effect of the binary treatment, controlling for a binary intermediate variable on the pathway from treatment to the outcome, is also implemented. Estimation of the parameters of a marginal structural model for binary treatments is also provided. Relevant factors of the likelihood may be modeled or fit by user-specified commands, or fit data-adaptively internally. Effect estimates, variances, p-values, and 95% confidence intervals are provided by the software
Readings in Targeted Maximum Likelihood Estimation
This is a compilation of current and past work on targeted maximum likelihood estimation. It features the original targeted maximum likelihood learning paper as well as chapters on super (machine) learning using cross validation, randomized controlled trials, realistic individualized treatment rules in observational studies, biomarker discovery, case-control studies, and time-to-event outcomes with censored data, among others. We hope this collection is helpful to the interested reader and stimulates additional research in this important area
Collaborative Targeted Maximum Likelihood Estimation
Collaborative double robust targeted maximum likelihood estimators represent a fundamental further advance over standard targeted maximum likelihood estimators of causal inference and variable importance parameters. The targeted maximum likelihood approach involves fluctuating an initial density estimate, (Q), in order to make a bias/variance tradeoff targeted towards a specific parameter in a semi-parametric model. The fluctuation involves estimation of a nuisance parameter portion of the likelihood, g. TMLE and other double robust estimators have been shown to be consistent and asymptotically normally distributed (CAN) under regularity conditions, when either one of these two factors of the likelihood of the data is correctly specified.
In this article we provide a template for applying collaborative targeted maximum likelihood estimation (C-TMLE) to the estimation of pathwise differentiable parameters in semi-parametric models. The procedure creates a sequence of candidate targeted maximum likelihood estimators based on an initial estimate for Q coupled with a succession of increasingly non-parametric estimates for g. In a departure from current state of the art nuisance parameter estimation, C-TMLE estimates of g are constructed based on a loss function for the relevant factor Q_0, instead of a loss function for the nuisance parameter itself. Likelihood-based cross-validation is used to select the best estimator among all candidate TMLE estimators in this sequence. A penalized-likelihood loss function for Q_0 is suggested when the parameter of interest is borderline-identifiable.
We present theoretical results for collaborative double robustness, demonstrating that the collaborative targeted maximum likelihood estimator is CAN even when Q and g are both mis-specified, provided that g solves a specified score equation implied by the difference between Q and the true Q_0.
This marks an improvement over the current definition of double robustness in the estimating equation literature.
We also establish an asymptotic linearity theorem for the C-DR-TMLE of the target parameter, showing that the C-DR-TMLE is more adaptive to the truth, and, as a consequence, can even be super efficient if the first stage density estimator does an excellent job itself with respect to the target parameter.
This research provides a template for targeted efficient and robust loss-based learning of a particular target feature of the probability distribution of the data within large (infinite dimensional) semi-parametric models, while still providing statistical inference in terms of confidence intervals and p-values. This research also breaks with a taboo (e.g., in the propensity score literature in the field of causal inference) on using the relevant part of likelihood to fine-tune the fitting of the nuisance parameter/censoring mechanism/treatment mechanism
Targeted Minimum Loss Based Estimation of an Intervention Specific Mean Outcome
Targeted minimum loss based estimation (TMLE) provides a template for the construction of semiparametric locally efficient double robust substitution estimators of the target parameter of the data generating distribution in a semiparametric censored data or causal inference model based on a sample of independent and identically distributed copies from this data generating distribution (van der Laan and Rubin (2006), van der Laan (2008), van der Laan and Rose (2011)). TMLE requires 1) writing the target parameter as a particular mapping from a typically infinite dimensional parameter of the probability distribution of the unit data structure into the parameter space, 2) computing the canonical gradient/efficient influence curve of the pathwise derivative of the target parameter mapping, 3) specifying a loss function for this parameter that is possibly indexed by unknown nuisance parameters, 4) a least favorable parametric submodel/path through an initial/current estimator of the parameter chosen so that the linear span of the generalized loss-based score at zero fluctuation includes the efficient influence curve, and 5) an updating algorithm involving the iterative minimization of the loss-specific empirical risk over the fluctuation parameters of the least favorable parametric submodel/path. By the generalized loss-based score condition 4) on the submodel and loss function, it follows that the resulting estimator of the infinite dimensional parameter solves the efficient influence curve (i.e., efficient score) equation, providing the basis for the double robustness and asymptotic efficiency of the corresponding substitution estimator of the target parameter obtained by plugging in the updated estimator of the infinite dimensional parameter in the target parameter mapping.
To enhance the finite sample performance of the TMLE of the target parameter, it is of interest to choose the parameter and the nuisance parameter of the loss function as low dimensional as possible. Inspired by this goal, we present a particular closed form TMLE of an intervention specific mean outcome based on general longitudinal data structures. This TMLE provides an alternative to the closed form TMLE presented in van der Laan and Gruber (2010) and Stitelman and van der Laan (2011) based on the log-likelihood loss function. The theoretical properties of the TMLE are also practically demonstrated with a small scale simulation study. The proposed TMLE builds upon a previously proposed estimator by Bang and Robins (2005) by integrating some of its key and innovative ideas into the TMLE framework.
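To make the five ingredients listed in this abstract concrete, the following block writes them out for the simplest special case. This is an assumed illustration, not the longitudinal construction of the paper: a single binary treatment A, baseline covariates W, and an outcome Y taking values in [0, 1], with the mean outcome under treatment as the target. The notation \bar{Q}, g and H is introduced here for the example.

```latex
% Assumed point-treatment setting: O = (W, A, Y),
% \bar{Q}(A, W) = E[Y \mid A, W], g(W) = P(A = 1 \mid W), H(A, W) = A / g(W).
\begin{align*}
\Psi(P) &= E_W\bigl[\bar{Q}(1, W)\bigr]
  && \text{(1) target parameter mapping} \\
D^*(P)(O) &= H(A, W)\bigl(Y - \bar{Q}(A, W)\bigr) + \bar{Q}(1, W) - \Psi(P)
  && \text{(2) efficient influence curve} \\
L(\bar{Q})(O) &= -\bigl[Y \log \bar{Q}(A, W) + (1 - Y) \log\bigl(1 - \bar{Q}(A, W)\bigr)\bigr]
  && \text{(3) loss function} \\
\operatorname{logit} \bar{Q}_\varepsilon(A, W) &= \operatorname{logit} \bar{Q}(A, W) + \varepsilon\, H(A, W)
  && \text{(4) least favorable submodel} \\
\hat{\varepsilon} &= \arg\min_{\varepsilon} \sum_{i=1}^{n} L(\bar{Q}_\varepsilon)(O_i)
  && \text{(5) updating step}
\end{align*}
% Score of the submodel at zero fluctuation:
% -\frac{d}{d\varepsilon} L(\bar{Q}_\varepsilon)(O) \Big|_{\varepsilon = 0}
%   = H(A, W)\bigl(Y - \bar{Q}(A, W)\bigr),
% which is the Y-component of D^*. The W-component is solved by using the
% empirical distribution of W in the plug-in estimator.
```

In this special case a single update suffices, and the plug-in estimator is the sample average of the updated \bar{Q}_{\hat{\varepsilon}}(1, W_i).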
Intrinsic Frontolimbic Connectivity and Mood Symptoms in Young Adult Cannabis Users.
Objective: The endocannabinoid system and cannabis exposure have been implicated in emotional processing. The current study examined whether regular cannabis users demonstrated abnormal intrinsic (a.k.a. resting state) frontolimbic connectivity compared to non-users. A secondary aim examined the relationship between cannabis group connectivity differences and self-reported mood and affect symptoms. Method: Participants included 79 cannabis-using and 80 non-using control emerging adults (ages 18-30), balanced for gender, reading ability, and age. Standard multiple regressions were used to test whether cannabis group status was associated with frontolimbic connectivity after controlling for site, past month alcohol and nicotine use, and days of abstinence from cannabis. Results: After controlling for research site, past month alcohol and nicotine use, and days of abstinence from cannabis, cannabis users demonstrated significantly greater connectivity between left rACC and the following: right rACC (p = 0.001; corrected p = 0.05; f² = 0.55), left amygdala (p = 0.03; corrected p = 0.47; f² = 0.17), and left insula (p = 0.03; corrected p = 0.47; f² = 0.16). Among cannabis users, greater bilateral rACC connectivity was significantly associated with greater subthreshold depressive symptoms (p = 0.02). Conclusions: Cannabis-using young adults demonstrated greater connectivity within frontolimbic regions compared to controls. In cannabis users, greater bilateral rACC intrinsic connectivity was associated with greater levels of subthreshold depression symptoms. Current findings suggest that regular cannabis use during adolescence is associated with abnormal frontolimbic connectivity, especially in cognitive control and emotion regulation regions.
Integrating Cooperative Learning and Structured Learning: Effective Approaches to Teaching Social Skills
Evaluating treatment effectiveness under model misspecification : a comparison of targeted maximum likelihood estimation with bias-corrected matching
Statistical approaches for estimating treatment effectiveness commonly model the endpoint, or the propensity score, using parametric regressions such as generalised linear models. Misspecification of these models can lead to biased parameter estimates. We compare two approaches that combine the propensity score and the endpoint regression, and can make weaker modelling assumptions, by using machine learning approaches to estimate the regression function and the propensity score. Targeted maximum likelihood estimation is a double-robust method designed to reduce bias in the estimate of the parameter of interest. Bias-corrected matching reduces bias due to covariate imbalance between matched pairs by using regression predictions. We illustrate the methods in an evaluation of different types of hip prosthesis on the health-related quality of life of patients with osteoarthritis. We undertake a simulation study, grounded in the case study, to compare the relative bias, efficiency and confidence interval coverage of the methods. We consider data generating processes with non-linear functional form relationships, normal and non-normal endpoints. We find that across the circumstances considered, bias-corrected matching generally reported less bias, but higher variance than targeted maximum likelihood estimation. When either targeted maximum likelihood estimation or bias-corrected matching incorporated machine learning, bias was much reduced, compared to using misspecified parametric models
Targeted Maximum Likelihood Estimation for Dynamic and Static Longitudinal Marginal Structural Working Models
This paper describes a targeted maximum likelihood estimator (TMLE) for the parameters of longitudinal static and dynamic marginal structural models. We consider a longitudinal data structure consisting of baseline covariates, time-dependent intervention nodes, intermediate time-dependent covariates, and a possibly time-dependent outcome. The intervention nodes at each time point can include a binary treatment as well as a right-censoring indicator. Given a class of dynamic or static interventions, a marginal structural model is used to model the mean of the intervention-specific counterfactual outcome as a function of the intervention, time point, and possibly a subset of baseline covariates. Because the true shape of this function is rarely known, the marginal structural model is used as a working model. The causal quantity of interest is defined as the projection of the true function onto this working model. Iterated conditional expectation double robust estimators for marginal structural model parameters were previously proposed by Robins (2000, 2002) and Bang and Robins (2005). Here we build on this work and present a pooled TMLE for the parameters of marginal structural working models. We compare this pooled estimator to a stratified TMLE (Schnitzer et al. 2014) that is based on estimating the intervention-specific mean separately for each intervention of interest. The performance of the pooled TMLE is compared to the performance of the stratified TMLE and the performance of inverse probability weighted (IPW) estimators using simulations. Concepts are illustrated using an example in which the aim is to estimate the causal effect of delayed switch following immunological failure of first line antiretroviral therapy among HIV-infected patients. Data from the International Epidemiological Databases to Evaluate AIDS, Southern Africa are analyzed to investigate this question using both TML and IPW estimators. 
Our results demonstrate practical advantages of the pooled TMLE over an IPW estimator for working marginal structural models for survival, as well as cases in which the pooled TMLE is superior to its stratified counterpart.
The Relative Performance of Targeted Maximum Likelihood Estimators
There is an active debate in the literature on censored data about the relative performance of model based maximum likelihood estimators, IPCW-estimators, and a variety of double robust semiparametric efficient estimators. Kang and Schafer (2007) demonstrate the fragility of double robust and IPCW-estimators in a simulation study with positivity violations. They focus on a simple missing data problem with covariates where one desires to estimate the mean of an outcome that is subject to missingness. Responses by Robins et al. (2007), Tsiatis and Davidian (2007), Tan (2007a) and Ridgeway and McCaffrey (2007) further explore the challenges faced by double robust estimators and offer suggestions for improving their stability. In this article, we join the debate by presenting targeted maximum likelihood estimators (TMLEs). We demonstrate that TMLEs that guarantee that the parametric submodel employed by the TMLE-procedure respects the global bounds on the continuous outcomes, are especially suitable for dealing with positivity violations because in addition to being double robust and semiparametric efficient, they are substitution estimators. We demonstrate the practical performance of TMLEs relative to other estimators in the simulations designed by Kang and Schafer (2007) and in modified simulations with even greater estimation challenges
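The point about respecting global bounds on a continuous outcome can be sketched in a few lines of Python. The sketch assumes one common way of achieving this: rescale the outcome from known bounds [a, b] to [0, 1], fluctuate on the logit scale, and map back. The bounds, the prediction, the clever-covariate value, and the fluctuation size below are hypothetical numbers chosen only to show the contrast with a linear fluctuation.

```python
import math

def expit(x):
    # Numerically stable inverse logit.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def logit(p):
    return math.log(p / (1.0 - p))

def linear_update(q, h, eps):
    # Linear fluctuation: nothing keeps the result inside the outcome bounds.
    return q + eps * h

def logistic_update(q, h, eps, a, b):
    # Rescale the prediction to (0, 1), fluctuate on the logit scale,
    # then map back to the original scale; the result stays in (a, b).
    q_unit = (q - a) / (b - a)
    q_star_unit = expit(logit(q_unit) + eps * h)
    return a + (b - a) * q_star_unit

# Hypothetical values: outcome known to lie in [0, 100], an initial
# prediction of 90, and a large clever covariate (1 / 0.04 = 25) of the
# kind that arises when the treatment probability is close to zero.
a, b = 0.0, 100.0
q, h, eps = 90.0, 25.0, 0.8

lin = linear_update(q, h, eps)
log_ = logistic_update(q, h, eps, a, b)
print(lin > b, a < log_ < b)  # prints: True True
```

Because the updated predictions cannot leave [a, b], the resulting plug-in estimate of a mean outcome cannot leave that range either, which is the stability property the abstract attributes to substitution estimators.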
