70 research outputs found

The Causal Roadmap and simulation studies to inform the Statistical Analysis Plan for real-data applications

Author: Balzer Laura
Nance Nerissa
Publication date: 07/09/2023

The Causal Roadmap outlines a systematic approach to our research endeavors: define the quantity of interest, evaluate needed assumptions, conduct statistical estimation, and carefully interpret results. At the estimation step, it is essential that the estimation algorithm be chosen thoughtfully for its theoretical properties and expected performance. Simulations can help researchers gain a better understanding of an estimator's statistical performance under conditions unique to the real-data application. This in turn can inform the rigorous pre-specification of a Statistical Analysis Plan (SAP), not only stating the estimand (e.g., G-computation formula), the estimator (e.g., targeted minimum loss-based estimation [TMLE]), and adjustment variables, but also the implementation of the estimator, including nuisance parameter estimation and approach for variance estimation. Doing so helps ensure valid inference (e.g., 95% confidence intervals with appropriate coverage). Failing to pre-specify estimation can lead to data dredging and inflated Type-I error rates.

arXiv.org e-Print Archive

ROBUST ESTIMATION OF THE AVERAGE TREATMENT EFFECT IN ALZHEIMER'S DISEASE CLINICAL TRIALS

Author: Colantuoni Elizabeth
McDermont Aidan
Rosenblum Michael
Publication venue: Collection of Biostatistics Research Archive
Publication date: 26/03/2018

The primary analysis of Alzheimer's disease clinical trials often involves a mixed-model repeated measure (MMRM) approach. We consider another estimator of the average treatment effect, called targeted minimum loss based estimation (TMLE). This estimator is more robust to violations of assumptions about missing data than MMRM. We compare TMLE with MMRM by analyzing data from a completed Alzheimer's disease trial and by simulation studies. The simulations involved different missing data distributions, where loss to follow-up at a given visit could depend on baseline variables, treatment assignment, and the outcome measured at previous visits. The TMLE generally has improved robustness in our simulated settings, i.e., less bias and mean squared error, and better confidence interval coverage probability. The robustness is due to the TMLE correctly modeling the dropout distribution. We illustrate the tradeoffs between these estimators and give recommendations for how to use these estimators in practice.

Collection Of Biostatistics Research Archive

A generalization of moderated statistics to data adaptive semiparametric estimation in high-dimensional biology

Author: Hejazi Nima S.
Hubbard Alan E.
van der Laan Mark J.
Publication date: 28/05/2020

The widespread availability of high-dimensional biological data has made the simultaneous screening of numerous biological characteristics a central statistical problem in computational biology. While the dimensionality of such datasets continues to increase, the problem of teasing out the effects of biomarkers in studies measuring baseline confounders while avoiding model misspecification remains only partially addressed. Efficient estimators constructed from data adaptive estimates of the data-generating distribution provide an avenue for avoiding model misspecification; however, in the context of high-dimensional problems requiring simultaneous estimation of numerous parameters, standard variance estimators have proven unstable, resulting in unreliable Type-I error control under standard multiple testing corrections. We present the formulation of a general approach for applying empirical Bayes shrinkage approaches to asymptotically linear estimators of parameters defined in the nonparametric model. The proposal applies existing shrinkage estimators to the estimated variance of the influence function, allowing for increased inferential stability in high-dimensional settings. A methodology for nonparametric variable importance analysis for use with high-dimensional biological datasets with modest sample sizes is introduced and the proposed technique is demonstrated to be robust in small samples even when relying on data adaptive estimators that eschew parametric forms. Use of the proposed variance moderation strategy in constructing stabilized variable importance measures of biomarkers is demonstrated by application to an observational study of occupational exposure. The result is a data adaptive approach for robustly uncovering stable associations in high-dimensional data with limited sample sizes.

arXiv.org e-Print Archive

Second-Order Inference for the Mean of a Variable Missing at Random

Author: Carone Marco
Díaz Iván
van der Laan Mark J.
Publication date: 26/05/2015

We present a second-order estimator of the mean of a variable subject to missingness, under the missing at random assumption. The estimator improves upon existing methods by using an approximate second-order expansion of the parameter functional, in addition to the first-order expansion employed by standard doubly robust methods. This results in weaker assumptions about the convergence rates necessary to establish consistency, local efficiency, and asymptotic linearity. The general estimation strategy is developed under the targeted minimum loss-based estimation (TMLE) framework. We present a simulation comparing the sensitivity of the first and second order estimators to the convergence rate of the initial estimators of the outcome regression and missingness score. In our simulation, the second-order TMLE improved the coverage probability of a confidence interval by up to 85%. In addition, we present a first-order estimator inspired by a second-order expansion of the parameter functional. This estimator only requires one-dimensional smoothing, whereas implementation of the second-order TMLE generally requires kernel smoothing on the covariate space. The first-order estimator proposed is expected to have improved finite sample performance compared to existing first-order estimators. In our simulations, the proposed first-order estimator improved the coverage probability by up to 90%. We provide an illustration of our methods using a publicly available dataset to determine the effect of an anticoagulant on health outcomes of patients undergoing percutaneous coronary intervention. We provide R code implementing the proposed estimator.

arXiv.org e-Print Archive

Collection Of Biostatistics Research Archive

Targeted learning in observational studies with multi-valued treatments: An evaluation of antipsychotic drug treatment safety

Author: Cristea-Platon Tudor
Diaz Jordi
Horvitz-Lennon Marcela
Huijskens Thomas
Normand Sharon-Lise
Poulos Jason
Tyagi Pooja
Yan Jiaju
Zelevinsky Katya
Publication date: 28/11/2023

We investigate estimation of causal effects of multiple competing (multi-valued) treatments in the absence of randomization. Our work is motivated by an intention-to-treat study of the relative cardiometabolic risk of assignment to one of six commonly prescribed antipsychotic drugs in a cohort of nearly 39,000 adults with serious mental illnesses. Doubly-robust estimators, such as targeted minimum loss-based estimation (TMLE), require correct specification of either the treatment model or outcome model to ensure consistent estimation; however, common TMLE implementations estimate treatment probabilities using multiple binomial regressions rather than multinomial regression. We implement a TMLE estimator that uses multinomial treatment assignment and ensemble machine learning to estimate average treatment effects. Our multinomial implementation improves coverage, but does not necessarily reduce bias, relative to the binomial implementation in simulation experiments with varying treatment propensity overlap and event rates. Evaluating the causal effects of the antipsychotics on three-year diabetes risk or death, we find a safety benefit of moving from a second-generation drug considered among the safest of the second-generation drugs to an infrequently prescribed first-generation drug known for having low cardiometabolic risk.

arXiv.org e-Print Archive

Adaptive Pair-Matching in the SEARCH Trial and Estimation of the Intervention Effect

Author: Balzer Laura
Petersen Maya L.
van der Laan Mark J.
Publication venue: Collection of Biostatistics Research Archive
Publication date: 31/01/2014

In randomized trials, pair-matching is an intuitive design strategy to protect study validity and to potentially increase study power. In a common design, candidate units are identified, and their baseline characteristics used to create the best n/2 matched pairs. Within the resulting pairs, the intervention is randomized, and the outcomes measured at the end of follow-up. We consider this design to be adaptive, because the construction of the matched pairs depends on the baseline covariates of all candidate units. As a consequence, the observed data cannot be considered as n/2 independent, identically distributed (i.i.d.) pairs of units, as current practice assumes. Instead, the observed data consist of n dependent units. This paper explores the consequences of adaptive pair-matching in randomized trials for estimation of the average treatment effect, given the baseline covariates of the n study units. We contrast the unadjusted estimator with targeted minimum loss-based estimation (TMLE) and show substantial efficiency gains from matching and further gains with adjustment. This work is motivated by the Sustainable East Africa Research in Community Health (SEARCH) study, a community randomized trial to evaluate the impact of immediate and streamlined antiretroviral therapy on HIV incidence in rural East Africa.

Collection Of Biostatistics Research Archive