The Causal Roadmap and simulation studies to inform the Statistical Analysis Plan for real-data applications
The Causal Roadmap outlines a systematic approach to our research endeavors:
define the quantity of interest, evaluate the needed assumptions, conduct
statistical estimation, and carefully interpret the results. At the estimation step, it is
essential that the estimation algorithm be chosen thoughtfully for its
theoretical properties and expected performance. Simulations can help
researchers gain a better understanding of an estimator's statistical
performance under conditions unique to the real-data application. This in turn
can inform the rigorous pre-specification of a Statistical Analysis Plan (SAP),
not only stating the estimand (e.g., G-computation formula), the estimator
(e.g., targeted minimum loss-based estimation [TMLE]), and adjustment
variables, but also the implementation of the estimator -- including nuisance
parameter estimation and approach for variance estimation. Doing so helps
ensure valid inference (e.g., 95% confidence intervals with appropriate
coverage). Failing to pre-specify estimation can lead to data dredging and
inflated Type-I error rates.
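The kind of simulation described above can be sketched as follows. This is a minimal illustration, not the authors' simulation: it assumes a hypothetical two-arm randomized trial with Gaussian outcomes, uses a simple difference in means in place of TMLE, and checks how often a 95% Wald interval covers the true effect. The function name `simulate_coverage` and all parameter values are assumptions made for the example.

```python
import math
import random
import statistics


def simulate_coverage(n=200, true_effect=1.0, n_sims=500, seed=0):
    """Estimate the coverage of a 95% Wald interval for a difference in means."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_sims):
        # Hypothetical data-generating process: 1:1 randomization,
        # unit-variance Gaussian outcomes, constant additive effect.
        treated = [true_effect + rng.gauss(0, 1) for _ in range(n // 2)]
        control = [rng.gauss(0, 1) for _ in range(n // 2)]
        estimate = statistics.fmean(treated) - statistics.fmean(control)
        # Standard error from the two arm-specific sample variances.
        se = math.sqrt(
            statistics.variance(treated) / len(treated)
            + statistics.variance(control) / len(control)
        )
        lower, upper = estimate - 1.96 * se, estimate + 1.96 * se
        covered += lower <= true_effect <= upper
    return covered / n_sims


coverage = simulate_coverage()
print(f"Estimated coverage of the 95% interval: {coverage:.3f}")
```

In a pre-specification exercise, the data-generating process would be tailored to the real-data application, and the estimator and variance estimator swapped for the candidates under consideration.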
ROBUST ESTIMATION OF THE AVERAGE TREATMENT EFFECT IN ALZHEIMER'S DISEASE CLINICAL TRIALS
The primary analysis of Alzheimer's disease clinical trials often involves a mixed-model repeated measure (MMRM) approach. We consider another estimator of the average treatment effect, called targeted minimum loss-based estimation (TMLE). This estimator is more robust to violations of assumptions about missing data than MMRM.
We compare TMLE versus MMRM by analyzing data from a completed Alzheimer's disease trial and by simulation studies. The simulations involved different missing data distributions, where loss to follow-up at a given visit could depend on baseline variables, treatment assignment, and the outcome measured at previous visits. The TMLE generally has improved robustness in our simulated settings, i.e., less bias and mean squared error, and better confidence interval coverage probability. The robustness is due to the TMLE correctly modeling the dropout distribution. We illustrate the tradeoffs between these estimators and give recommendations for how to use these estimators in practice.
A generalization of moderated statistics to data adaptive semiparametric estimation in high-dimensional biology
The widespread availability of high-dimensional biological data has made the
simultaneous screening of numerous biological characteristics a central
statistical problem in computational biology. While the dimensionality of such
datasets continues to increase, the problem of teasing out the effects of
biomarkers in studies measuring baseline confounders while avoiding model
misspecification remains only partially addressed. Efficient estimators
constructed from data adaptive estimates of the data-generating distribution
provide an avenue for avoiding model misspecification; however, in the context
of high-dimensional problems requiring simultaneous estimation of numerous
parameters, standard variance estimators have proven unstable, resulting in
unreliable Type-I error control under standard multiple testing corrections. We
present the formulation of a general approach for applying empirical Bayes
shrinkage approaches to asymptotically linear estimators of parameters defined
in the nonparametric model. The proposal applies existing shrinkage estimators
to the estimated variance of the influence function, allowing for increased
inferential stability in high-dimensional settings. A methodology for
nonparametric variable importance analysis for use with high-dimensional
biological datasets with modest sample sizes is introduced and the proposed
technique is demonstrated to be robust in small samples even when relying on
data adaptive estimators that eschew parametric forms. Use of the proposed
variance moderation strategy in constructing stabilized variable importance
measures of biomarkers is demonstrated by application to an observational study
of occupational exposure. The result is a data adaptive approach for robustly
uncovering stable associations in high-dimensional data with limited sample
sizes.
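The idea of moderating variance estimates can be sketched roughly as below. This is an assumption-laden illustration, not the proposal's implementation: it shrinks each biomarker's estimated influence-function variance toward the average variance across biomarkers, using the familiar weighted form with a prior degrees-of-freedom weight. The names `moderated_variances`, `moderated_t`, and the hand-picked `prior_df` are hypothetical; empirical Bayes procedures estimate the prior quantities from the data rather than fixing them.

```python
import math
import statistics


def moderated_variances(sample_vars, df, prior_df):
    """Shrink per-biomarker variances toward their common mean.

    sample_vars: estimated influence-function variances, one per biomarker.
    df: degrees of freedom behind each sample variance.
    prior_df: weight given to the pooled (prior) variance; assumed fixed here.
    """
    prior_var = statistics.fmean(sample_vars)
    return [
        (prior_df * prior_var + df * s2) / (prior_df + df)
        for s2 in sample_vars
    ]


def moderated_t(estimate, shrunk_var, n):
    """Test statistic using the shrunken variance in the standard error."""
    return estimate / math.sqrt(shrunk_var / n)


raw = [0.1, 1.0, 4.0]
shrunk = moderated_variances(raw, df=9, prior_df=4)
print(shrunk)  # small variances are pulled up, large ones pulled down
```

Pulling unusually small variance estimates upward is what guards against spuriously large test statistics when sample sizes are modest.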
Second-Order Inference for the Mean of a Variable Missing at Random
We present a second-order estimator of the mean of a variable subject to
missingness, under the missing at random assumption. The estimator improves
upon existing methods by using an approximate second-order expansion of the
parameter functional, in addition to the first-order expansion employed by
standard doubly robust methods. This results in weaker assumptions about the
convergence rates necessary to establish consistency, local efficiency, and
asymptotic linearity. The general estimation strategy is developed under the
targeted minimum loss-based estimation (TMLE) framework. We present a
simulation comparing the sensitivity of the first- and second-order estimators
to the convergence rate of the initial estimators of the outcome regression and
missingness score. In our simulation, the second-order TMLE improved the
coverage probability of a confidence interval by up to 85%. In addition, we
present a first-order estimator inspired by a second-order expansion of the
parameter functional. This estimator only requires one-dimensional smoothing,
whereas implementation of the second-order TMLE generally requires kernel
smoothing on the covariate space. The first-order estimator proposed is
expected to have improved finite sample performance compared to existing
first-order estimators. In our simulations, the proposed first-order estimator
improved the coverage probability by up to 90%. We provide an illustration of
our methods using a publicly available dataset to determine the effect of an
anticoagulant on health outcomes of patients undergoing percutaneous coronary
intervention. We provide R code implementing the proposed estimator.
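For orientation, a standard first-order doubly robust estimator of the kind the abstract compares against can be sketched as follows. This is not the second-order TMLE, the proposed first-order estimator, or the authors' R code. It is the textbook augmented inverse-probability-weighted (AIPW) estimator of a mean under missing at random, written in Python under strong simplifying assumptions: a single binary covariate `w`, and nuisance parameters (outcome regression and missingness score) estimated by stratum averages. All function names and the simulated data are hypothetical.

```python
import random
import statistics


def simulate_mar_data(n=5000, seed=0):
    """Hypothetical data: binary W, Y = 1 + 2W + noise, missingness depends only on W."""
    rng = random.Random(seed)
    w, delta, y = [], [], []
    for _ in range(n):
        wi = int(rng.random() < 0.5)
        observed = int(rng.random() < (0.4 if wi else 0.9))
        w.append(wi)
        delta.append(observed)
        y.append(1 + 2 * wi + rng.gauss(0, 1) if observed else None)
    return w, delta, y


def aipw_mean(w, delta, y):
    """AIPW estimate of E[Y] with stratum-mean nuisance estimates."""
    q, g = {}, {}
    for level in set(w):
        observed_y = [yi for wi, di, yi in zip(w, delta, y) if wi == level and di == 1]
        q[level] = statistics.fmean(observed_y)  # outcome regression E[Y | observed, W]
        g[level] = statistics.fmean([di for wi, di in zip(w, delta) if wi == level])  # P(observed | W)
    terms = [
        q[wi] + ((yi - q[wi]) / g[wi] if di == 1 else 0.0)
        for wi, di, yi in zip(w, delta, y)
    ]
    return statistics.fmean(terms)


def complete_case_mean(delta, y):
    """Naive mean of the observed outcomes, ignoring the missingness mechanism."""
    return statistics.fmean([yi for di, yi in zip(delta, y) if di == 1])


w, delta, y = simulate_mar_data()
print("AIPW:", aipw_mean(w, delta, y))
print("Complete case:", complete_case_mean(delta, y))
```

By construction the true mean in this simulation is 2. The complete-case mean is biased downward because outcomes are observed less often in the stratum with larger values, while the doubly robust estimate corrects for this.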
Targeted learning in observational studies with multi-valued treatments: An evaluation of antipsychotic drug treatment safety
We investigate estimation of causal effects of multiple competing
(multi-valued) treatments in the absence of randomization. Our work is
motivated by an intention-to-treat study of the relative cardiometabolic risk
of assignment to one of six commonly prescribed antipsychotic drugs in a cohort
of nearly 39,000 adults with serious mental illnesses. Doubly-robust
estimators, such as targeted minimum loss-based estimation (TMLE), require
correct specification of either the treatment model or outcome model to ensure
consistent estimation; however, common TMLE implementations estimate treatment
probabilities using multiple binomial regressions rather than multinomial
regression. We implement a TMLE estimator that uses multinomial treatment
assignment and ensemble machine learning to estimate average treatment effects.
Our multinomial implementation improves coverage, but does not necessarily
reduce bias, relative to the binomial implementation in simulation experiments
with varying treatment propensity overlap and event rates. Evaluating the
causal effects of the antipsychotics on three-year diabetes risk or death, we
find a safety benefit of moving from a second-generation drug considered among
the safest of the second-generation drugs to an infrequently prescribed
first-generation drug known for having low cardiometabolic risk.
Adaptive Pair-Matching in the SEARCH Trial and Estimation of the Intervention Effect
In randomized trials, pair-matching is an intuitive design strategy to protect study validity and to potentially increase study power. In a common design, candidate units are identified, and their baseline characteristics used to create the best n/2 matched pairs. Within the resulting pairs, the intervention is randomized, and the outcomes measured at the end of follow-up. We consider this design to be adaptive, because the construction of the matched pairs depends on the baseline covariates of all candidate units. As a consequence, the observed data cannot be considered as n/2 independent, identically distributed (i.i.d.) pairs of units, as current practice assumes. Instead, the observed data consist of n dependent units. This paper explores the consequences of adaptive pair-matching in randomized trials for estimation of the average treatment effect, given the baseline covariates of the n study units. We contrast the unadjusted estimator with targeted minimum loss-based estimation (TMLE) and show substantial efficiency gains from matching and further gains with adjustment. This work is motivated by the Sustainable East Africa Research in Community Health (SEARCH) study, a community randomized trial to evaluate the impact of immediate and streamlined antiretroviral therapy on HIV incidence in rural East Africa.
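The design step described above (form matched pairs from all candidates' baseline covariates, then randomize within pairs) can be sketched as below. This is a simplified illustration, not the SEARCH matching algorithm: it assumes a single scalar baseline covariate and pairs adjacent units after sorting. The function name `pair_match_and_randomize` and the example covariate values are hypothetical.

```python
import random


def pair_match_and_randomize(covariates, seed=0):
    """Pair units with similar baseline covariate values, then randomize within pairs.

    Returns the list of index pairs and a dict mapping unit index to
    treatment assignment (1 = intervention, 0 = control).
    """
    rng = random.Random(seed)
    # Sort unit indices by covariate value and pair neighbours.
    order = sorted(range(len(covariates)), key=lambda i: covariates[i])
    pairs = [(order[k], order[k + 1]) for k in range(0, len(order) - 1, 2)]
    assignment = {}
    for a, b in pairs:
        treated = rng.choice((a, b))  # one unit per pair gets the intervention
        assignment[a] = int(a == treated)
        assignment[b] = int(b == treated)
    return pairs, assignment


baseline = [0.9, 0.1, 0.5, 0.15, 0.95, 0.55]
pairs, assignment = pair_match_and_randomize(baseline)
print(pairs)
print(assignment)
```

Because the pairing depends on the covariates of every candidate unit, changing one unit's covariate can change which pairs are formed, which is the source of the dependence the abstract highlights.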