
    Robust Inference for Mediated Effects in Partially Linear Models

    We consider mediated effects of an exposure X on an outcome Y, via a mediator M, under no-unmeasured-confounding assumptions, in the setting where models for the conditional expectations of the mediator and outcome are partially linear. We propose G-estimators for the direct and indirect effect and demonstrate consistency and asymptotic normality for indirect effects when models for the conditional means of M, or of X and Y, are correctly specified, and for direct effects when models for the conditional means of Y, or of X and M, are correct. This marks an improvement, in this particular setting, over previous 'triple' robust methods, which do not assume partially linear mean models. Testing the no-mediation hypothesis is inherently problematic due to the composite nature of the test (either X has no effect on M, or M has no effect on Y), leading to low power when both effect sizes are small. We use Generalized Method of Moments (GMM) results to construct a new score-testing framework, which includes the no-mediation and no-direct-effect hypotheses as special cases. The proposed tests rely on an orthogonal strategy for estimating nuisance parameters. Simulations show that the GMM-based tests perform better in terms of power and small-sample behaviour than traditional tests in the partially linear setting, with drastic improvement under model misspecification. The new methods are illustrated in a mediation analysis of data from the COPERS trial, a randomized trial investigating the effect of a non-pharmacological intervention for patients suffering from chronic pain. An accompanying R package implementing these methods is available at github.com/ohines/plmed.
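
    As a point of reference for the composite-null issue described above, the sketch below computes the classical product-of-coefficients ("Sobel") estimate of an indirect effect with its delta-method standard error. This is a simplified stand-in, not the paper's G-estimator or GMM score test (those are implemented in the plmed R package); all variable names and coefficients are illustrative.

```python
# Product-of-coefficients (Sobel) baseline for the indirect effect.
# NOT the paper's G-estimation / GMM score test; data are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
Z = rng.normal(size=(n, 2))                              # baseline confounders
X = rng.binomial(1, 0.5, size=n)                         # exposure
M = 0.3 * X + Z @ np.array([0.2, -0.1]) + rng.normal(size=n)          # mediator
Y = 0.4 * M + 0.2 * X + Z @ np.array([0.1, 0.3]) + rng.normal(size=n) # outcome

# Mediator model: coefficient alpha on X; outcome model: coefficient beta on M
fit_m = sm.OLS(M, sm.add_constant(np.column_stack([X, Z]))).fit()
fit_y = sm.OLS(Y, sm.add_constant(np.column_stack([M, X, Z]))).fit()
alpha, se_a = fit_m.params[1], fit_m.bse[1]
beta, se_b = fit_y.params[1], fit_y.bse[1]

indirect = alpha * beta
# Delta-method (Sobel) SE; it degenerates when alpha = beta = 0, which is
# exactly the composite-null problem motivating the GMM score test above.
se_ind = np.sqrt(alpha**2 * se_b**2 + beta**2 * se_a**2)
print(f"indirect effect {indirect:.3f} (SE {se_ind:.3f})")
```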

    Data-adaptive doubly robust instrumental variable methods for treatment effect heterogeneity

    We consider the estimation of the average treatment effect in the treated as a function of baseline covariates, where there is a valid (conditional) instrument. We describe two doubly-robust (DR) estimators: a g-estimator and a targeted minimum loss-based estimator (TMLE). These estimators can be viewed as generalisations of the two-stage least squares (TSLS) method to semiparametric models that make weaker assumptions. We exploit recent theoretical results and use data-adaptive estimation of the nuisance parameters for the g-estimator. A simulation study is used to compare standard TSLS with the two DR estimators' finite-sample performance when using (1) parametric or (2) data-adaptive estimation of the nuisance parameters. Data-adaptive DR estimators have lower bias and improved coverage when compared with incorrectly specified parametric DR estimators and TSLS. When the parametric model for the treatment effect curve is correctly specified, the g-estimator outperforms all others, but when this model is misspecified, TMLE performs best, while TSLS can result in large biases and zero coverage. The methods are also applied to the COPERS (COping with persistent Pain, Effectiveness Research in Self-management) trial to make inferences about the causal effect of treatment actually received, and the extent to which this is modified by depression at baseline.
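
    For orientation, here is a minimal sketch of the standard two-stage least squares comparator mentioned above, on simulated data. The paper's doubly-robust g-estimator and TMLE are not reproduced here, and all names and coefficients are illustrative.

```python
# Minimal TSLS: instrument Z, treatment received A, outcome Y.
# Data simulated with an unmeasured confounder U; names illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1000
V = rng.normal(size=n)                        # baseline covariate
Z = rng.binomial(1, 0.5, size=n)              # instrument (e.g. randomisation)
U = rng.normal(size=n)                        # unmeasured confounder
A = (0.8 * Z + 0.5 * U + rng.normal(size=n) > 0.5).astype(float)
Y = 1.0 * A + 0.3 * V + U + rng.normal(size=n)

# Stage 1: predict treatment from instrument and covariates
X1 = sm.add_constant(np.column_stack([Z, V]))
A_hat = sm.OLS(A, X1).fit().predict(X1)

# Stage 2: regress outcome on predicted treatment and covariates.
# Note: naive second-stage standard errors are invalid and would need
# the usual TSLS correction; only the point estimate is shown here.
X2 = sm.add_constant(np.column_stack([A_hat, V]))
fit = sm.OLS(Y, X2).fit()
print(f"TSLS effect estimate: {fit.params[1]:.3f}")
```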

    A Machine-Learning Approach for Estimating Subgroup- and Individual-Level Treatment Effects: An Illustration Using the 65 Trial.

    HIGHLIGHTS: This article examines a causal machine-learning approach, causal forests (CF), for exploring the heterogeneity of treatment effects without prespecifying a functional form. The CF approach is applied in a reanalysis of the 65 Trial and was found to provide estimates of subgroup effects similar to those from a fixed parametric model. The CF approach also provides estimates of individual-level treatment effects, which suggest that for most patients in the 65 Trial the intervention is expected to reduce 90-day mortality, though with wide levels of statistical uncertainty. The study illustrates how individual-level treatment effect estimates can be analysed to generate hypotheses for further research about which patients are likely to benefit most from an intervention.
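
    The sketch below illustrates individual-level treatment effect estimation with a simple "T-learner" built from off-the-shelf random forests. This is deliberately a simpler meta-learner than the causal forests used in the article, shown only to make the idea concrete; the data and names are simulated.

```python
# T-learner: fit separate outcome models per arm, difference predictions.
# A simpler alternative to causal forests; data simulated for illustration.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
n = 2000
X = rng.normal(size=(n, 5))                   # baseline covariates
T = rng.binomial(1, 0.5, size=n)              # randomised treatment
tau = 0.5 * (X[:, 0] > 0)                     # true heterogeneous effect
Y = X[:, 1] + tau * T + rng.normal(size=n)

m1 = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[T == 1], Y[T == 1])
m0 = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[T == 0], Y[T == 0])
cate = m1.predict(X) - m0.predict(X)          # individual-level effect estimates
print(f"mean estimated effect: {cate.mean():.3f}")
```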

    Dynamic updating of clinical survival prediction models in a changing environment

    Background: Over time, the performance of clinical prediction models may deteriorate due to changes in clinical management, data quality, disease risk and/or patient mix. Such prediction models must be updated in order to remain useful. In this study, we investigate dynamic model updating of clinical survival prediction models. In contrast to discrete or one-time updating, dynamic updating refers to a repeated process for updating a prediction model with new data. We aim to extend previous research, which focused largely on binary outcome prediction models, by concentrating on time-to-event outcomes. We were motivated by the rapidly changing environment seen during the COVID-19 pandemic, where mortality rates changed over time and new treatments and vaccines were introduced.
    Methods: We illustrate three methods for dynamic model updating: Bayesian dynamic updating, recalibration, and full refitting. We use a simulation study to compare performance in a range of scenarios including changing mortality rates, predictors with low prevalence and the introduction of a new treatment. Next, the updating strategies were applied to a model for predicting 70-day COVID-19-related mortality using patient data from QResearch, an electronic health records database from general practices in the UK.
    Results: In simulated scenarios with mortality rates changing over time, all updating methods resulted in better calibration than not updating. Moreover, dynamic updating outperformed ad hoc updating. In the simulation scenario with a new predictor and a small updating dataset, Bayesian updating improved the C-index over not updating and refitting. In the motivating example with a rare outcome, no single updating method offered the best performance.
    Conclusions: We found that a dynamic updating process outperformed one-time discrete updating in the simulations. Bayesian updating offered good performance overall, even in scenarios with new predictors and few events. Intercept recalibration was effective in scenarios with smaller sample size and a changing baseline hazard. Refitting performance depended on sample size and produced abrupt changes in hazard ratio estimates between periods.
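
    As a rough illustration of the recalibration idea, the sketch below updates only the intercept of an existing model on new data, using a binary-outcome analogue of the survival setting studied in the paper; the old model's linear predictor enters as an offset, and all names and numbers are simulated.

```python
# Intercept recalibration: keep the old coefficients fixed (via an offset)
# and re-estimate only the intercept on the updating data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 1000
X = rng.normal(size=(n, 3))
lp_old = X @ np.array([0.8, -0.5, 0.3]) - 1.0   # old model's linear predictor

# New data in which baseline risk has drifted upwards by 0.7 on the logit scale
p_new = 1 / (1 + np.exp(-(lp_old + 0.7)))
y_new = rng.binomial(1, p_new)

# Intercept-only logistic regression with lp_old supplied as an offset
ones = np.ones((n, 1))
recal = sm.GLM(y_new, ones, family=sm.families.Binomial(), offset=lp_old).fit()
print(f"estimated intercept update: {recal.params[0]:.2f}")  # approx. 0.7
```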

    Multilevel models for cost-effectiveness analyses that use cluster randomised trial data: An approach to model choice.

    Multilevel models provide a flexible modelling framework for cost-effectiveness analyses that use cluster randomised trial data. However, there is a lack of guidance on how to choose the most appropriate multilevel models. This paper illustrates an approach for deciding what level of model complexity is warranted; in particular, how best to accommodate complex variance-covariance structures, right-skewed costs and missing data. Our proposed models differ according to whether or not they allow individual-level variances and correlations to differ across treatment arms or clusters, and by the assumed cost distribution (Normal, Gamma, Inverse Gaussian). The models are fitted by Markov chain Monte Carlo methods. Our approach to model choice is based on four main criteria: the characteristics of the data, model pre-specification informed by the previous literature, diagnostic plots and assessment of model appropriateness. This is illustrated by re-analysing a previous cost-effectiveness analysis that uses data from a cluster randomised trial. We found that the most useful criterion for model choice was the deviance information criterion, which distinguishes amongst models with alternative variance-covariance structures, as well as between those with different cost distributions. This strategy for model choice can help cost-effectiveness analyses provide reliable inferences for policy-making when using cluster trials, including those with missing data.
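
    Here is a minimal sketch of the kind of multilevel cost model described above, assuming the pymc Python package and a Normal cost distribution with cluster random intercepts. The paper's models (arm- or cluster-specific variances, Gamma and Inverse Gaussian cost distributions, DIC-based comparison) are considerably richer, and all names and numbers below are simulated.

```python
# Two-level cost model fitted by MCMC: cluster random intercepts,
# Normal costs, cluster-randomised treatment arm. Illustrative only.
import numpy as np
import pymc as pm

rng = np.random.default_rng(4)
n_clusters, per_cluster = 20, 30
cluster = np.repeat(np.arange(n_clusters), per_cluster)
arm = np.repeat(rng.binomial(1, 0.5, n_clusters), per_cluster)
u_true = rng.normal(0, 0.5, n_clusters)
cost = 10 + 2 * arm + u_true[cluster] + rng.normal(0, 1, cluster.size)

with pm.Model():
    mu0 = pm.Normal("mu0", 10, 10)               # baseline mean cost
    delta = pm.Normal("delta", 0, 5)             # incremental cost of treatment
    sigma_u = pm.HalfNormal("sigma_u", 2)        # between-cluster SD
    sigma_e = pm.HalfNormal("sigma_e", 2)        # within-cluster SD
    u = pm.Normal("u", 0, sigma_u, shape=n_clusters)
    pm.Normal("cost", mu0 + delta * arm + u[cluster], sigma_e, observed=cost)
    idata = pm.sample(1000, tune=1000, chains=2)
```

    One design note: competing variance-covariance structures would each be coded as their own model and compared on an information criterion, as the paper does with DIC (modern toolkits more often report WAIC or LOO).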

    Causal graphs for the analysis of genetic cohort data.

    The increasing availability of genetic cohort data has led to many genome-wide association studies (GWAS) successfully identifying genetic associations with an ever-expanding list of phenotypic traits. Association, however, does not imply causation, and therefore methods have been developed to study the issue of causality. Under additional assumptions, Mendelian randomization (MR) studies have proved popular in identifying causal effects between two phenotypes, often using GWAS summary statistics. Given the widespread use of these methods, it is more important than ever to understand, and communicate, the causal assumptions upon which they are based, so that methods are transparent and findings are clinically relevant. Causal graphs can be used to represent causal assumptions graphically and provide insights into the limitations associated with different analysis methods. Here we review GWAS and MR from a causal perspective, to build up intuition for causal diagrams in genetic problems. We also examine issues of confounding by ancestry and comment on approaches for dealing with such confounding, as well as approaches for dealing with selection biases arising from study design.
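
    As a concrete example of an MR estimator built from GWAS summary statistics, the sketch below computes the single-variant Wald ratio with a first-order delta-method standard error; the numbers are invented for illustration.

```python
# Single-variant Wald ratio for two-sample MR from summary statistics:
# causal effect of exposure on outcome estimated as beta_GY / beta_GX.
beta_gx, se_gx = 0.12, 0.010   # variant-exposure association (illustrative)
beta_gy, se_gy = 0.03, 0.008   # variant-outcome association (illustrative)

wald = beta_gy / beta_gx
# First-order delta-method SE, ignoring the se_gx term, as is common
# when the instrument-exposure association is strong
se_wald = se_gy / abs(beta_gx)
print(f"Wald ratio estimate: {wald:.3f} (SE {se_wald:.3f})")
```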

    Increased mortality in community-tested cases of SARS-CoV-2 lineage B.1.1.7.

    SARS-CoV-2 lineage B.1.1.7, a variant that was first detected in the UK in September 2020 [1], has spread to multiple countries worldwide. Several studies have established that B.1.1.7 is more transmissible than pre-existing variants, but have not identified whether it leads to any change in disease severity [2]. Here we analyse a dataset that links 2,245,263 positive SARS-CoV-2 community tests and 17,452 deaths associated with COVID-19 in England from 1 November 2020 to 14 February 2021. For 1,146,534 (51%) of these tests, the presence or absence of B.1.1.7 can be identified because mutations in this lineage prevent PCR amplification of the spike (S) gene target (known as S gene target failure, SGTF [1]). On the basis of 4,945 deaths with known SGTF status, we estimate that the hazard of death associated with SGTF is 55% (95% confidence interval, 39-72%) higher than in cases without SGTF after adjustment for age, sex, ethnicity, deprivation, residence in a care home, the local authority of residence and test date. This corresponds to the absolute risk of death for a 55-69-year-old man increasing from 0.6% to 0.9% (95% confidence interval, 0.8-1.0%) within 28 days of a positive test in the community. Correcting for misclassification of SGTF and missingness in SGTF status, we estimate that the hazard of death associated with B.1.1.7 is 61% (42-82%) higher than with pre-existing variants. Our analysis suggests that B.1.1.7 is not only more transmissible than pre-existing SARS-CoV-2 variants, but may also cause more severe illness.
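
    For readers wanting a concrete starting point, the sketch below fits a covariate-adjusted Cox proportional hazards model of the kind underlying the reported hazard ratios, using the lifelines Python package on simulated data; the paper's full adjustment set, misclassification correction and missing-data handling are omitted.

```python
# Covariate-adjusted Cox model for SGTF status on simulated data;
# variable names illustrative, not the paper's dataset.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(5)
n = 500
sgtf = rng.binomial(1, 0.5, n)
age = rng.uniform(40, 90, n)
# Exponential event times with hazard ~55% higher for SGTF cases
hazard = 0.01 * np.exp(np.log(1.55) * sgtf + 0.03 * (age - 60))
t = rng.exponential(1 / hazard)

df = pd.DataFrame({
    "time": np.minimum(t, 28),          # administrative censoring at 28 days
    "died": (t <= 28).astype(int),
    "sgtf": sgtf,
    "age": age,
})
cph = CoxPHFitter().fit(df, duration_col="time", event_col="died")
print(cph.hazard_ratios_["sgtf"])       # adjusted HR, close to 1.55
```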

    Increased hazard of death in community-tested cases of SARS-CoV-2 Variant of Concern 202012/01.

    VOC 202012/01, a SARS-CoV-2 variant first detected in the United Kingdom in September 2020, has spread to multiple countries worldwide. Several studies have established that this novel variant is more transmissible than preexisting variants of SARS-CoV-2, but have not identified whether the new variant leads to any change in disease severity. We analyse a large database of SARS-CoV-2 community test results and COVID-19 deaths for England, representing approximately 47% of all SARS-CoV-2 community tests and 7% of COVID-19 deaths in England from 1 September 2020 to 22 January 2021. Fortuitously, these SARS-CoV-2 tests can identify VOC 202012/01 because mutations in this lineage prevent PCR amplification of the spike gene target (S gene target failure, SGTF). We estimate that the hazard of death among SGTF cases is 30% (95% CI 9-56%) higher than among non-SGTF cases after adjustment for age, sex, ethnicity, deprivation level, care home residence, local authority of residence and date of test. In absolute terms, this increased hazard of death corresponds to the risk of death for a male aged 55-69 increasing from 0.56% to 0.73% (95% CI 0.60-0.86%) over the 28 days following a positive SARS-CoV-2 test in the community. Correcting for misclassification of SGTF, we estimate a 35% (12-64%) higher hazard of death associated with VOC 202012/01. Our analysis suggests that VOC 202012/01 is not only more transmissible than preexisting SARS-CoV-2 variants but may also cause more severe illness.
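
    As a quick arithmetic check, under proportional hazards the survival curves relate as S_new(t) = S_old(t)^HR, which reproduces the absolute risks quoted above:

```python
# Converting the reported hazard ratio into the quoted absolute risks,
# using 1 - S_new(t) = 1 - (1 - risk_old) ** HR under proportional hazards.
baseline_risk = 0.0056           # 0.56% 28-day risk, males aged 55-69
hr = 1.30                        # hazard ratio for SGTF cases
risk_sgtf = 1 - (1 - baseline_risk) ** hr
print(f"{risk_sgtf:.2%}")        # ~0.73%, matching the abstract
```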