46 research outputs found
Robust Inference for Mediated Effects in Partially Linear Models
We consider mediated effects of an exposure, X, on an outcome, Y, via a
mediator, M, under no-unmeasured-confounding assumptions, in the setting where
models for the conditional expectations of the mediator and outcome are
partially linear. We propose G-estimators for the direct and indirect effects
and demonstrate consistency and asymptotic normality for indirect effects when
models for the conditional means of M, or of X and Y, are correctly specified, and
for direct effects when models for the conditional means of Y, or of X and M, are
correct. This marks an improvement, in this particular setting, over previous
`triple' robust methods, which do not assume partially linear mean models.
Testing of the no-mediation hypothesis is inherently problematic due to the
composite nature of the test (either X has no effect on M, or M has no effect on Y),
leading to low power when both effect sizes are small. We use Generalized
Method of Moments (GMM) results to construct a new score testing framework,
which includes as special cases the no-mediation and the no-direct-effect
hypotheses. The proposed tests rely on an orthogonal estimation strategy for
estimating nuisance parameters. Simulations show that the GMM-based tests
perform better in terms of power and small-sample performance compared with
traditional tests in the partially linear setting, with drastic improvement
under model misspecification. The new methods are illustrated in a mediation
analysis of data from the COPERS trial, a randomized trial investigating the
effect of a non-pharmacological intervention for patients suffering from chronic
pain. An accompanying R package implementing these methods can be found at
github.com/ohines/plmed
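For concreteness, one common partially linear specification consistent with the description above (the notation β, γ, θ, f, g here is illustrative, not necessarily the paper's) is

```latex
E[M \mid X, Z] = \beta X + f(Z), \qquad
E[Y \mid X, M, Z] = \theta X + \gamma M + g(Z),
```

where Z denotes baseline confounders and f, g are left unspecified. Under such a model, the direct effect is θ and the indirect effect through M is the product βγ.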
Data-adaptive doubly robust instrumental variable methods for treatment effect heterogeneity
We consider the estimation of the average treatment effect in the treated as a function of baseline covariates,
where there is a valid (conditional) instrument.
We describe two doubly-robust (DR) estimators: a g-estimator and a targeted minimum loss-based estimator
(TMLE). These estimators can be viewed as generalisations of the two-stage least squares (TSLS) method to semiparametric
models that make weaker assumptions. We exploit recent theoretical results and use data-adaptive estimation
of the nuisance parameters for the g-estimator.
A simulation study is used to compare standard TSLS with the two DR estimators’ finite-sample performance
when using (1) parametric or (2) data-adaptive estimation of the nuisance parameters.
Data-adaptive DR estimators have lower bias and improved coverage, when compared to incorrectly specified
parametric DR estimators and TSLS. When the parametric model for the treatment effect curve is correctly specified,
the g-estimator outperforms all others, but when this model is misspecified, TMLE performs best, while TSLS can
result in large biases and zero coverage.
The methods are also applied to the COPERS (COping with persistent Pain, Effectiveness Research in Self-management)
trial to make inferences about the causal effect of treatment actually received, and the extent to which
this is modified by depression at baseline.
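As a point of reference for the comparator method above, here is a minimal two-stage least squares (TSLS) sketch on simulated data; all variable names, effect sizes and the sample size are assumptions for the illustration, not taken from the trial.

```python
# Illustrative two-stage least squares (TSLS) on simulated data.
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
z = rng.binomial(1, 0.5, n).astype(float)        # instrument, e.g. randomised offer
u = rng.normal(size=n)                           # unmeasured confounder
a = ((0.8 * z + 0.5 * u + rng.normal(size=n)) > 0.5).astype(float)  # treatment received
y = 1.0 * a + u + rng.normal(size=n)             # true causal effect of a on y is 1.0

# Stage 1: regress treatment on the instrument;
# Stage 2: regress the outcome on the stage-1 fitted values.
Z = np.column_stack([np.ones(n), z])
a_hat = Z @ np.linalg.lstsq(Z, a, rcond=None)[0]
beta = np.linalg.lstsq(np.column_stack([np.ones(n), a_hat]), y, rcond=None)[0]
print(round(beta[1], 2))   # close to 1.0; naive OLS of y on a is confounded by u
```

Because the confounder u raises both treatment uptake and the outcome, ordinary least squares of y on a would be biased upward, while the instrument-based estimate recovers the effect.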
A Machine-Learning Approach for Estimating Subgroup- and Individual-Level Treatment Effects: An Illustration Using the 65 Trial.
HIGHLIGHTS: This article examines a causal machine-learning approach, causal forests (CF), for exploring the heterogeneity of treatment effects without prespecifying a specific functional form. The CF approach is considered in the reanalysis of the 65 Trial and was found to provide estimates of subgroup effects similar to those from a fixed parametric model. The CF approach also provides estimates of individual-level treatment effects suggesting that, for most patients in the 65 Trial, the intervention is expected to reduce 90-day mortality, but with wide levels of statistical uncertainty. The study illustrates how individual-level treatment effect estimates can be analyzed to generate hypotheses for further research about those patients who are likely to benefit most from an intervention.
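The causal-forest algorithm itself is more involved (honest splitting, subsample aggregation); as a simpler illustration of the general idea of estimating individual-level treatment effects, here is a "T-learner" sketch with off-the-shelf random forests. The data, names and numbers are invented for the sketch and are not the 65 Trial.

```python
# Not the causal-forest method of the article: a simpler "T-learner" sketch
# of individual-level treatment effect estimation on simulated data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 4_000
x = rng.normal(size=(n, 2))                      # baseline covariates
t = rng.binomial(1, 0.5, n)                      # randomised treatment indicator
tau = 1.0 + x[:, 0]                              # true effect varies with x[:, 0]
y = x[:, 1] + tau * t + rng.normal(size=n)

# Fit a separate outcome model in each arm, then contrast the predictions.
m1 = RandomForestRegressor(n_estimators=200, random_state=0).fit(x[t == 1], y[t == 1])
m0 = RandomForestRegressor(n_estimators=200, random_state=0).fit(x[t == 0], y[t == 0])
ite_hat = m1.predict(x) - m0.predict(x)          # individual-level effect estimates
print(np.corrcoef(ite_hat, tau)[0, 1])           # tracks the true effects
```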
Dynamic updating of clinical survival prediction models in a changing environment
Background: Over time, the performance of clinical prediction models may deteriorate due to changes in clinical management, data quality, disease risk and/or patient mix. Such prediction models must be updated in order to remain useful. In this study, we investigate dynamic model updating of clinical survival prediction models. In contrast to discrete or one-time updating, dynamic updating refers to a repeated process for updating a prediction model with new data. We aim to extend previous research, which focused largely on binary outcome prediction models, by concentrating on time-to-event outcomes. We were motivated by the rapidly changing environment seen during the COVID-19 pandemic, where mortality rates changed over time and new treatments and vaccines were introduced.
Methods: We illustrate three methods for dynamic model updating: Bayesian dynamic updating, recalibration, and full refitting. We use a simulation study to compare performance in a range of scenarios including changing mortality rates, predictors with low prevalence and the introduction of a new treatment. Next, the updating strategies were applied to a model for predicting 70-day COVID-19-related mortality using patient data from QResearch, an electronic health records database from general practices in the UK.
Results: In simulated scenarios with mortality rates changing over time, all updating methods resulted in better calibration than not updating. Moreover, dynamic updating outperformed ad hoc updating. In the simulation scenario with a new predictor and a small updating dataset, Bayesian updating improved the C-index over not updating and refitting. In the motivating example with a rare outcome, no single updating method offered the best performance.
Conclusions: We found that a dynamic updating process outperformed one-time discrete updating in the simulations. Bayesian updating offered good performance overall, even in scenarios with new predictors and few events. Intercept recalibration was effective in scenarios with smaller sample size and changing baseline hazard. Refitting performance depended on sample size and produced abrupt changes in hazard ratio estimates between periods.
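As a sketch of one of the updating strategies, here is intercept-only recalibration for a binary-outcome logistic model. The article's models are time-to-event, so this binary analogue is illustrative only; the Newton iteration and all simulated quantities are assumptions of the sketch.

```python
# A logistic analogue of intercept-only recalibration: re-estimate just the
# intercept, holding the old model's linear predictor fixed as an offset.
import numpy as np

def recalibrate_intercept(lp, y, iters=25):
    """Newton's method for the intercept shift delta in logit(p) = lp + delta."""
    delta = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(lp + delta)))
        delta += np.sum(y - p) / np.sum(p * (1.0 - p))  # score / information
    return delta

rng = np.random.default_rng(2)
lp = rng.normal(-2.0, 1.0, 5_000)                        # old model's log-odds
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(lp + 0.7))))   # risk has drifted upward
print(round(recalibrate_intercept(lp, y), 2))            # recovers a shift near 0.7
```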
Multilevel models for cost-effectiveness analyses that use cluster randomised trial data: An approach to model choice.
Multilevel models provide a flexible modelling framework for cost-effectiveness analyses that use cluster randomised trial data. However, there is a lack of guidance on how to choose the most appropriate multilevel models. This paper illustrates an approach for deciding what level of model complexity is warranted; in particular, how best to accommodate complex variance-covariance structures, right-skewed costs and missing data. Our proposed models differ according to whether or not they allow individual-level variances and correlations to differ across treatment arms or clusters, and by the assumed cost distribution (Normal, Gamma, Inverse Gaussian). The models are fitted by Markov chain Monte Carlo methods. Our approach to model choice is based on four main criteria: the characteristics of the data, model pre-specification informed by the previous literature, diagnostic plots and assessment of model appropriateness. This is illustrated by re-analysing a previous cost-effectiveness analysis that uses data from a cluster randomised trial. We found that the most useful criterion for model choice was the deviance information criterion, which distinguishes amongst models with alternative variance-covariance structures, as well as between those with different cost distributions. This strategy for model choice can help cost-effectiveness analyses provide reliable inferences for policy-making when using cluster trials, including those with missing data.
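The deviance information criterion used for model choice above can be computed directly from MCMC output; a minimal sketch of the standard decomposition DIC = D(θ̄) + 2·pD, with pD = mean deviance minus deviance at the posterior mean (the example numbers are invented):

```python
# Deviance information criterion (DIC) from posterior log-likelihood draws.
import numpy as np

def dic(loglik_draws, loglik_at_posterior_mean):
    d_bar = -2.0 * np.mean(loglik_draws)       # posterior mean deviance
    d_hat = -2.0 * loglik_at_posterior_mean    # deviance at the posterior mean
    p_d = d_bar - d_hat                        # effective number of parameters
    return d_hat + 2.0 * p_d                   # lower DIC = preferred model

# e.g. two posterior draws with log-likelihoods -10 and -12,
# and log-likelihood -10 at the posterior mean parameters:
print(dic(np.array([-10.0, -12.0]), -10.0))    # 24.0
```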
Causal graphs for the analysis of genetic cohort data.
The increasing availability of genetic cohort data has led to many genome-wide association studies (GWAS) successfully identifying genetic associations with an ever-expanding list of phenotypic traits. Association, however, does not imply causation, and therefore methods have been developed to study the issue of causality. Under additional assumptions, Mendelian randomization (MR) studies have proved popular in identifying causal effects between two phenotypes, often using GWAS summary statistics. Given the widespread use of these methods, it is more important than ever to understand, and communicate, the causal assumptions upon which they are based, so that methods are transparent and findings are clinically relevant. Causal graphs can be used to represent causal assumptions graphically and provide insights into the limitations associated with different analysis methods. Here we review GWAS and MR from a causal perspective, to build up intuition for causal diagrams in genetic problems. We also examine issues of confounding by ancestry and comment on approaches for dealing with such confounding, as well as discussing approaches for dealing with selection biases arising from study design.
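The simplest summary-statistic MR estimator mentioned above is the Wald ratio, which divides the variant's effect on the outcome by its effect on the exposure. The numbers below are invented for illustration, not drawn from any study.

```python
# Wald ratio MR estimator from GWAS summary statistics (illustrative numbers).
import numpy as np

beta_gx, se_gx = 0.10, 0.01   # variant's effect on the exposure (GWAS 1)
beta_gy, se_gy = 0.05, 0.01   # variant's effect on the outcome (GWAS 2)

wald = beta_gy / beta_gx      # implied causal effect of exposure on outcome
# First-order (delta-method) standard error, treating the two GWAS as independent:
se = abs(wald) * np.sqrt((se_gy / beta_gy) ** 2 + (se_gx / beta_gx) ** 2)
print(wald, round(se, 3))
```

This ratio is only a causal estimate under the usual MR assumptions (relevance, no confounding of the variant, and no effect of the variant on the outcome except through the exposure), which is exactly where the causal graphs discussed above help.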
Increased mortality in community-tested cases of SARS-CoV-2 lineage B.1.1.7.
SARS-CoV-2 lineage B.1.1.7, a variant that was first detected in the UK in September 2020, has spread to multiple countries worldwide. Several studies have established that B.1.1.7 is more transmissible than pre-existing variants, but have not identified whether it leads to any change in disease severity. Here we analyse a dataset that links 2,245,263 positive SARS-CoV-2 community tests and 17,452 deaths associated with COVID-19 in England from 1 November 2020 to 14 February 2021. For 1,146,534 (51%) of these tests, the presence or absence of B.1.1.7 can be identified because mutations in this lineage prevent PCR amplification of the spike (S) gene target (known as S gene target failure, SGTF). On the basis of 4,945 deaths with known SGTF status, we estimate that the hazard of death associated with SGTF is 55% (95% confidence interval, 39-72%) higher than in cases without SGTF after adjustment for age, sex, ethnicity, deprivation, residence in a care home, the local authority of residence and test date. This corresponds to the absolute risk of death for a 55-69-year-old man increasing from 0.6% to 0.9% (95% confidence interval, 0.8-1.0%) within 28 days of a positive test in the community. Correcting for misclassification of SGTF and missingness in SGTF status, we estimate that the hazard of death associated with B.1.1.7 is 61% (42-82%) higher than with pre-existing variants. Our analysis suggests that B.1.1.7 is not only more transmissible than pre-existing SARS-CoV-2 variants, but may also cause more severe illness.
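The abstract's hazard-to-absolute-risk conversion can be reproduced under the proportional-hazards assumption, where survival scales as S1(t) = S0(t) ** HR:

```python
# Under proportional hazards, risk1 = 1 - (1 - risk0) ** HR.
risk0 = 0.006     # 0.6% baseline 28-day risk of death
hr = 1.55         # hazard 55% higher with SGTF
risk1 = 1.0 - (1.0 - risk0) ** hr
print(round(100 * risk1, 1))   # 0.9 (%), matching the reported 0.9%
```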
Increased hazard of death in community-tested cases of SARS-CoV-2 Variant of Concern 202012/01.
VOC 202012/01, a SARS-CoV-2 variant first detected in the United Kingdom in September 2020, has spread to multiple countries worldwide. Several studies have established that this novel variant is more transmissible than preexisting variants of SARS-CoV-2, but have not identified whether the new variant leads to any change in disease severity. We analyse a large database of SARS-CoV-2 community test results and COVID-19 deaths for England, representing approximately 47% of all SARS-CoV-2 community tests and 7% of COVID-19 deaths in England from 1 September 2020 to 22 January 2021. Fortuitously, these SARS-CoV-2 tests can identify VOC 202012/01 because mutations in this lineage prevent PCR amplification of the spike gene target (S gene target failure, SGTF). We estimate that the hazard of death among SGTF cases is 30% (95% CI 9-56%) higher than among non-SGTF cases after adjustment for age, sex, ethnicity, deprivation level, care home residence, local authority of residence and date of test. In absolute terms, this increased hazard of death corresponds to the risk of death for a male aged 55-69 increasing from 0.56% to 0.73% (95% CI 0.60-0.86%) over the 28 days following a positive SARS-CoV-2 test in the community. Correcting for misclassification of SGTF, we estimate a 35% (12-64%) higher hazard of death associated with VOC 202012/01. Our analysis suggests that VOC 202012/01 is not only more transmissible than preexisting SARS-CoV-2 variants but may also cause more severe illness.