The Metropolis algorithm: A useful tool for epidemiologists
The Metropolis algorithm is a Markov chain Monte Carlo (MCMC) algorithm used
to simulate from parameter distributions of interest, such as generalized
linear model parameters. The "Metropolis step" is a keystone concept that
underlies classical and modern MCMC methods and facilitates simple analysis of
complex statistical models. Beyond Bayesian analysis, MCMC is useful for
generating uncertainty intervals, even under the common scenario in causal
inference in which the target parameter is not directly estimated by a single,
fitted statistical model. We demonstrate, with a worked example, pseudo-code,
and R code, the basic mechanics of the Metropolis algorithm. We use the
Metropolis algorithm to estimate the odds ratio and risk difference contrasting
the risk of childhood leukemia among those exposed to high versus low level
magnetic fields. This approach can be used for inference from Bayesian and
frequentist paradigms and, in small samples, offers advantages over
large-sample methods like the bootstrap. (26 pages, 3 figures)
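The mechanics described in the abstract can be sketched in a few lines. The example below is a minimal random-walk Metropolis sampler, not the paper's worked example or R code: it targets the posterior of a log-odds under a flat prior, and the counts (12 events among 40 subjects) are hypothetical stand-ins rather than the actual leukemia data.

```python
import math
import random

random.seed(1)

def log_posterior(theta, events, n):
    # Binomial log-likelihood on the log-odds scale, flat prior on theta
    p = 1.0 / (1.0 + math.exp(-theta))
    return events * math.log(p) + (n - events) * math.log(1.0 - p)

def metropolis(events, n, n_iter=20000, step=0.2, theta0=0.0):
    """Random-walk Metropolis: propose theta' ~ Normal(theta, step^2) and
    accept with probability min(1, posterior ratio)."""
    theta = theta0
    current = log_posterior(theta, events, n)
    draws = []
    for _ in range(n_iter):
        proposal = theta + random.gauss(0.0, step)
        candidate = log_posterior(proposal, events, n)
        # The Metropolis step: compare a uniform draw to the density ratio
        if math.log(random.random()) < candidate - current:
            theta, current = proposal, candidate
        draws.append(theta)
    return draws

# Hypothetical data: 12 events among 40 subjects; discard burn-in draws
draws = metropolis(events=12, n=40)[5000:]
post_mean = sum(draws) / len(draws)   # posterior mean log-odds, near log(12/28)
```

Because the acceptance rule only uses a ratio of posterior densities, the normalizing constant never needs to be computed, which is what makes the step so broadly useful.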
All your data are always missing: incorporating bias due to measurement error into the potential outcomes framework
Epidemiologists often use the potential outcomes framework to cast causal inference as a missing data problem. Here, we demonstrate how bias due to measurement error can be described in terms of potential outcomes and considered in concert with bias from other sources. In addition, we illustrate how acknowledging the uncertainty that arises due to measurement error increases the amount of missing information in causal inference. We use a simple example to show that estimating the average treatment effect requires the investigator to perform a series of hidden imputations based on strong assumptions.
Parametric assumptions equate to hidden observations: comparing the efficiency of nonparametric and parametric models for estimating time to AIDS or death in a cohort of HIV-positive women
Abstract
Background
When conducting a survival analysis, researchers might consider two broad classes of models: nonparametric models and parametric models. While nonparametric models are more flexible because they make few assumptions regarding the shape of the data distribution, parametric models are more efficient. Here we sought to make concrete the difference in efficiency between these two model types using effective sample size.
Methods
We compared cumulative risk of AIDS or death estimated using four survival models – nonparametric, generalized gamma, Weibull, and exponential – and data from 1164 HIV patients who were alive and AIDS-free in 1995. We added pseudo-observations to the sample until the spread of the 95% confidence limits for the nonparametric model became less than that for the parametric models.
Results
We found the 3-parameter generalized gamma to be a good fit to the nonparametric risk curve, but the 1-parameter exponential both underestimated and overestimated the risk at different times. Using the two-year risk as an example, we had to add 354, 593, and 3960 observations for the nonparametric model to be as efficient as the generalized gamma, Weibull, and exponential models, respectively.
Conclusions
These added observations represent the hidden observations underlying the efficiency gained through parametric model form assumptions. If the model is correctly specified, the efficiency gain may be justified, as appeared to be the case for the generalized gamma model. Otherwise, precision will be improved, but at the cost of specification bias, as was the case for the exponential model.
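The efficiency gap behind these "hidden observations" can be reproduced by simulation. The sketch below is not the cohort analysis: it assumes uncensored exponential survival times with hypothetical parameters, and compares the sampling variance of the nonparametric risk estimate (an empirical proportion) with that of a correctly specified parametric (exponential) estimate at two years.

```python
import math
import random

random.seed(2)

def risk_estimates(n, rate=0.1, t=2.0, n_sims=2000):
    """Simulate uncensored exponential survival times and estimate the
    risk F(t) = 1 - exp(-rate * t) nonparametrically (empirical proportion)
    and parametrically (exponential MLE plugged into the CDF)."""
    nonpar, par = [], []
    for _ in range(n_sims):
        times = [random.expovariate(rate) for _ in range(n)]
        nonpar.append(sum(x <= t for x in times) / n)   # empirical F(t)
        rate_hat = n / sum(times)                       # MLE of the rate
        par.append(1.0 - math.exp(-rate_hat * t))
    return nonpar, par

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

nonpar, par = risk_estimates(n=200)
# Relative efficiency: roughly how many times fewer observations the
# (correctly specified) parametric model needs for the same precision
rel_eff = variance(nonpar) / variance(par)
```

When the exponential form is correct, as here by construction, the parametric estimate's variance is several times smaller; misspecification would instead trade that precision for bias.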
Remdesivir and COVID-19
The Panel on Antiretroviral Guidelines for Adults and Adolescents with HIV and the American Association for the Study of Liver Diseases guidelines for hepatitis C virus treatment suggest that combination therapy for severe acute respiratory syndrome coronavirus 2 infection will outperform single drugs.
Transportability without positivity: a synthesis of statistical and simulation modeling
When estimating an effect of an action with a randomized or observational
study, that study is often not a random sample of the desired target
population. Instead, estimates from that study can be transported to the target
population. However, transportability methods generally rely on a positivity
assumption, such that all relevant covariate patterns in the target population
are also observed in the study sample. Strict eligibility criteria,
particularly in the context of randomized trials, may lead to violations of
this assumption. Two common approaches to address positivity violations are
restricting the target population and restricting the relevant covariate set.
As neither of these restrictions are ideal, we instead propose a synthesis of
statistical and simulation models to address positivity violations. We propose
corresponding g-computation and inverse probability weighting estimators. The
restriction and synthesis approaches to addressing positivity violations are
contrasted with a simulation experiment and an illustrative example in the
context of sexually transmitted infection testing uptake. In both cases, the
proposed synthesis approach accurately addressed the original research question
when paired with a thoughtfully selected simulation model. Neither restriction approach was able to accurately address the motivating question.
As public health decisions must often be made with imperfect target population
information, model synthesis is a viable approach given a combination of
empirical data and external information based on the best available knowledge.
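For context on the estimators named above, the following is a minimal sketch of ordinary g-computation transport in the simple case where positivity does hold (a single binary covariate W observed at both levels in the study sample). It is not the proposed statistical and simulation model synthesis, and every name and parameter value is hypothetical.

```python
import random

random.seed(3)

# Hypothetical populations: a binary covariate W shifts the outcome, and
# the study sample and target population differ only in Pr(W = 1).
def simulate(n, p_w):
    data = []
    for _ in range(n):
        w = 1 if random.random() < p_w else 0
        y = 1.0 + 2.0 * w + random.gauss(0.0, 1.0)   # outcome model
        data.append((w, y))
    return data

study = simulate(5000, p_w=0.3)    # study over-represents W = 0
target = simulate(5000, p_w=0.7)   # target population of interest

def cond_mean(data, w):
    ys = [y for wi, y in data if wi == w]
    return sum(ys) / len(ys)

# G-computation transport: estimate E[Y | W = w] in the study, then
# standardize over the target population's covariate distribution
mu0, mu1 = cond_mean(study, 0), cond_mean(study, 1)
p1_target = sum(w for w, _ in target) / len(target)
transported = (1 - p1_target) * mu0 + p1_target * mu1
```

The positivity problem the abstract addresses arises exactly when one of the `cond_mean` calls has no study observations to average over; the synthesis approach fills that gap with an external simulation model rather than by restriction.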
An Illustration of Inverse Probability Weighting to Estimate Policy-Relevant Causal Effects
Traditional epidemiologic approaches allow us to compare counterfactual outcomes under 2 exposure distributions, usually 100% exposed and 100% unexposed. However, to estimate the population health effect of a proposed intervention, one may wish to compare factual outcomes under the observed exposure distribution to counterfactual outcomes under the exposure distribution produced by an intervention. Here, we used inverse probability weights to compare the 5-year mortality risk under observed antiretroviral therapy treatment plans to the 5-year mortality risk that would have been observed under an intervention in which all patients initiated therapy immediately upon entry into care among patients positive for human immunodeficiency virus in the US Centers for AIDS Research Network of Integrated Clinical Systems multisite cohort study between 1998 and 2013. Therapy-naïve patients (n = 14,700) were followed from entry into care until death, loss to follow-up, or censoring at 5 years or on December 31, 2013. The 5-year cumulative incidence of mortality was 11.65% under observed treatment plans and 10.10% under the intervention, yielding a risk difference of −1.57% (95% confidence interval: −3.08, −0.06). Comparing outcomes under the intervention with outcomes under observed treatment plans provides meaningful information about the potential consequences of new US guidelines to treat all patients with human immunodeficiency virus regardless of CD4 cell count under actual clinical conditions.
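The weighting idea can be illustrated with simulated data. The toy version below is not the cohort analysis: it uses one binary confounder W, known treatment probabilities as the weights (in practice the propensities would be estimated), and hypothetical risks. It contrasts the factual risk under observed treatment with the weighted estimate of risk under an "everyone treated" plan.

```python
import random

random.seed(4)

# Hypothetical cohort: confounder W raises both treatment uptake and risk,
# while treatment A lowers risk by 5 percentage points.
n = 20000
data = []
for _ in range(n):
    w = 1 if random.random() < 0.5 else 0
    a = 1 if random.random() < (0.8 if w else 0.3) else 0
    y = 1 if random.random() < (0.10 + 0.10 * w - 0.05 * a) else 0
    data.append((w, a, y))

# Factual risk under the observed treatment plan: a plain mean of Y
observed_risk = sum(y for _, _, y in data) / n

# Counterfactual risk if everyone initiated treatment: reweight the
# treated by 1 / Pr(A = 1 | W), using the true propensities for simplicity
def propensity(w):
    return 0.8 if w else 0.3

num = sum(y / propensity(w) for w, a, y in data if a == 1)
den = sum(1.0 / propensity(w) for w, a, y in data if a == 1)
intervention_risk = num / den              # Hajek-style weighted mean

risk_difference = intervention_risk - observed_risk
```

The comparison mirrors the abstract's contrast: the factual risk is taken as observed, and only the counterfactual "treat all" risk requires weighting.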
Occupational Radon Exposure and Lung Cancer Mortality: Estimating Intervention Effects Using the Parametric g-Formula
Traditional regression analysis techniques used to estimate associations between occupational radon exposure and lung cancer focus on estimating the effect of cumulative radon exposure on lung cancer, while public health interventions are typically based on regulating radon concentration rather than workers’ cumulative exposure. Moreover, estimating the direct effect of cumulative occupational exposure on lung cancer may be difficult in situations vulnerable to the healthy worker survivor bias.
Accounting for Misclassified Outcomes in Binary Regression Models Using Multiple Imputation With Internal Validation Data
Outcome misclassification is widespread in epidemiology, but methods to account for it are rarely used. We describe the use of multiple imputation to reduce bias when validation data are available for a subgroup of study participants. This approach is illustrated using data from 308 participants in the multicenter Herpetic Eye Disease Study between 1992 and 1998 (48% female; 85% white; median age, 49 years). The odds ratio comparing the acyclovir group with the placebo group on the gold-standard outcome (physician-diagnosed herpes simplex virus recurrence) was 0.62 (95% confidence interval (CI): 0.35, 1.09). We masked ourselves to physician diagnosis except for a 30% validation subgroup used to compare methods. Multiple imputation (odds ratio (OR) = 0.60; 95% CI: 0.24, 1.51) was compared with naive analysis using self-reported outcomes (OR = 0.90; 95% CI: 0.47, 1.73), analysis restricted to the validation subgroup (OR = 0.57; 95% CI: 0.20, 1.59), and direct maximum likelihood (OR = 0.62; 95% CI: 0.26, 1.53). In simulations, multiple imputation and direct maximum likelihood had greater statistical power than did analysis restricted to the validation subgroup, yet all 3 provided unbiased estimates of the odds ratio. The multiple-imputation approach was extended to estimate risk ratios using log-binomial regression. Multiple imputation has advantages regarding flexibility and ease of implementation for epidemiologists familiar with missing data methods.
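A minimal sketch of the imputation idea follows. All numbers are hypothetical (80% sensitivity, 90% specificity, a 30% validation subgroup, a true OR of 0.5), and a saturated cell-proportion model of Pr(Y = 1 | A, Y*) stands in for a fitted imputation regression; it is not the study's analysis.

```python
import math
import random

random.seed(5)

# Hypothetical trial: treatment A halves the odds of the true outcome Y;
# the self-reported outcome Ystar has 80% sensitivity, 90% specificity.
n = 4000
rows = []
for _ in range(n):
    a = random.random() < 0.5
    p_y = 0.30 if not a else 0.15 / 0.85          # baseline odds halved
    y = random.random() < p_y
    ystar = random.random() < (0.80 if y else 0.10)
    validated = random.random() < 0.30            # 30% validation subgroup
    rows.append((a, y, ystar, validated))

def odds_ratio(pairs):
    # 2x2 odds ratio for (a, y) pairs; 0.5 added per cell to avoid zeros
    cells = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.5}
    for a, y in pairs:
        cells[(int(a), int(y))] += 1
    return (cells[(1, 1)] * cells[(0, 0)]) / (cells[(1, 0)] * cells[(0, 1)])

# Imputation model: Pr(Y = 1 | A, Ystar), estimated in the validation rows
valid_rows = [r for r in rows if r[3]]
p_true = {}
for a in (0, 1):
    for s in (0, 1):
        match = [y for ai, y, si, _ in valid_rows if ai == a and si == s]
        p_true[(a, s)] = (sum(match) + 0.5) / (len(match) + 1.0)

log_ors = []
for _ in range(20):                               # M = 20 imputations
    pairs = []
    for a, y, ystar, validated in rows:
        if validated:
            pairs.append((a, y))                  # keep gold-standard Y
        else:                                     # draw Y from the model
            pairs.append((a, random.random() < p_true[(a, ystar)]))
    log_ors.append(math.log(odds_ratio(pairs)))
mi_log_or = sum(log_ors) / len(log_ors)           # pooled point estimate
```

As in the abstract, the validated rows keep their gold-standard outcome and only the non-validated rows are imputed; Rubin's rules would combine the between- and within-imputation variances for the interval estimate.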