4,283 research outputs found

    Adjusting for Confounding by Neighborhood Using a Proportional Odds Model and Complex Survey Data

    In social epidemiology, an individual's neighborhood is considered to be an important determinant of health behaviors, mediators, and outcomes. Consequently, when investigating health disparities, researchers may wish to adjust for confounding by unmeasured neighborhood factors, such as local availability of health facilities or cultural predispositions. With a simple random sample and a binary outcome, a conditional logistic regression analysis that treats individuals within a neighborhood as a matched set is a natural method to use. The authors present a generalization of this method for ordinal outcomes and complex sampling designs. The method is based on a proportional odds model and is very simple to program using standard software such as SAS PROC SURVEYLOGISTIC (SAS Institute Inc., Cary, North Carolina). The authors applied the method to analyze racial/ethnic differences in dental preventative care, using 2008 Florida Behavioral Risk Factor Surveillance System survey data. The ordinal outcome represented time since last dental cleaning, and the authors adjusted for individual-level confounding by gender, age, education, and health insurance coverage. The authors compared results with and without additional adjustment for confounding by neighborhood, operationalized as zip code. The authors found that adjustment for confounding by neighborhood greatly affected the results in this example.

    A new approach to hierarchical data analysis: Targeted maximum likelihood estimation for the causal effect of a cluster-level exposure

    We often seek to estimate the impact of an exposure naturally occurring or randomly assigned at the cluster-level. For example, the literature on neighborhood determinants of health continues to grow. Likewise, community randomized trials are applied to learn about real-world implementation, sustainability, and population effects of interventions with proven individual-level efficacy. In these settings, individual-level outcomes are correlated due to shared cluster-level factors, including the exposure, as well as social or biological interactions between individuals. To flexibly and efficiently estimate the effect of a cluster-level exposure, we present two targeted maximum likelihood estimators (TMLEs). The first TMLE is developed under a non-parametric causal model, which allows for arbitrary interactions between individuals within a cluster. These interactions include direct transmission of the outcome (i.e. contagion) and influence of one individual's covariates on another's outcome (i.e. covariate interference). The second TMLE is developed under a causal sub-model assuming the cluster-level and individual-specific covariates are sufficient to control for confounding. Simulations compare the alternative estimators and illustrate the potential gains from pairing individual-level risk factors and outcomes during estimation, while avoiding unwarranted assumptions. Our results suggest that estimation under the sub-model can result in bias and misleading inference in an observational setting. Incorporating working assumptions during estimation is more robust than assuming they hold in the underlying causal model. We illustrate our approach with an application to HIV prevention and treatment

    Bayesian nonparametric models for spatially indexed data of mixed type

    We develop Bayesian nonparametric models for spatially indexed data of mixed type. Our work is motivated by challenges that occur in environmental epidemiology, where the usual presence of several confounding variables that exhibit complex interactions and high correlations makes it difficult to estimate and understand the effects of risk factors on health outcomes of interest. The modeling approach we adopt assumes that responses and confounding variables are manifestations of continuous latent variables, and uses multivariate Gaussians to jointly model these. Responses and confounding variables are not treated equally as relevant parameters of the distributions of the responses only are modeled in terms of explanatory variables or risk factors. Spatial dependence is introduced by allowing the weights of the nonparametric process priors to be location specific, obtained as probit transformations of Gaussian Markov random fields. Confounding variables and spatial configuration have a similar role in the model, in that they only influence, along with the responses, the allocation probabilities of the areas into the mixture components, thereby allowing for flexible adjustment of the effects of observed confounders, while allowing for the possibility of residual spatial structure, possibly occurring due to unmeasured or undiscovered spatially varying factors. Aspects of the model are illustrated in simulation studies and an application to a real data set

    A Primer on Causality in Data Science

    Many questions in Data Science are fundamentally causal in that our objective is to learn the effect of some exposure, randomized or not, on an outcome of interest. Even studies that are seemingly non-causal, such as those with the goal of prediction or prevalence estimation, have causal elements, including differential censoring or measurement. As a result, we, as Data Scientists, need to consider the underlying causal mechanisms that gave rise to the data, rather than simply the pattern or association observed in those data. In this work, we review the 'Causal Roadmap' of Petersen and van der Laan (2014) to provide an introduction to some key concepts in causal inference. Similar to other causal frameworks, the steps of the Roadmap include clearly stating the scientific question, defining the causal model, translating the scientific question into a causal parameter, assessing the assumptions needed to express the causal parameter as a statistical estimand, implementing statistical estimators including parametric and semi-parametric methods, and interpreting our findings. We believe that using such a framework in Data Science will help to ensure that our statistical analyses are guided by the scientific question driving our research, while avoiding over-interpreting our results. We focus on the effect of an exposure occurring at a single time point and highlight the use of targeted maximum likelihood estimation (TMLE) with Super Learner. Comment: 26 pages (with references); 4 figures

    Evaluating Risks from Antibacterial Medication Therapy

    Abstract: Evaluating Risks from Antibacterial Medication Therapy Using an Observational Primary Care Database. Sharon B. Meropol, Joshua P. Metlay. Virtually everyone in the U.S. is exposed to antibacterial drugs at some point in their lives. It is important to understand the benefits and risks related to these medications with nearly universal public exposure. Most information on antibacterial drug-associated adverse events comes from spontaneous reports. Without an unexposed control group, it is impossible to know the real risks for treated vs. untreated patients. We used an electronic medical record database to select a cohort of office visits for non-bacterial acute respiratory tract infections (excluding patients with pneumonia, sinusitis, or acute exacerbations of chronic bronchitis), and compared outcomes of antibacterial drug-exposed vs. -unexposed patients. By limiting our assessment to visits with acute nonspecific respiratory infections, we promoted comparability between exposed and unexposed patients. To further control for confounding by indication and practice, we explored methods to promote further comparability between exposure groups. Our rare outcome presented an additional analytic challenge. Antibacterial drug prescribing for acute nonspecific respiratory infections decreased over the study period, but, in contrast to the U.S., broad-spectrum antibacterial prescribing remained low. Conditional fixed effects linear regression provided stable estimates of exposure effects on rare outcomes; results were similar to those using more traditional methods for binary outcomes. Patients with acute nonspecific respiratory infections treated with antibacterial drugs were not at increased risk of severe adverse events compared to untreated patients. Patients with acute nonspecific respiratory infections exposed to antibacterials had a small decreased risk of pneumonia hospitalizations vs. unexposed patients.
This very small measurable benefit of antibacterial drug therapy for acute nonspecific respiratory infections at the patient level must be weighed against the public health risk of emerging antibacterial resistance. Our data provide valuable point estimates of risks and benefits that can be used to inform future decision analysis and guideline recommendations for patients with acute nonspecific respiratory infections. Ultimately, improved point-of-care diagnostic testing may help direct antibacterial drugs to the subset of patients most likely to derive benefit
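
The abstract above mentions fixed effects linear regression as a way to control for confounding by practice. As a rough illustration only, the sketch below fits a linear probability model after subtracting practice-level means (the within transformation), using synthetic data. The simulation parameters, the function names, and the true effect of 0.03 are all hypothetical and are not taken from the study; the authors' actual model specification is not given in the abstract.

```python
import random


def simulate_visits(n_practices=200, visits_per_practice=200, seed=0):
    """Synthetic visits: each practice has an unmeasured factor u that
    raises both the prescribing probability and the outcome risk."""
    rng = random.Random(seed)
    rows = []
    for practice in range(n_practices):
        u = rng.random()  # unmeasured practice-level confounder (hypothetical)
        for _ in range(visits_per_practice):
            a = 1 if rng.random() < 0.2 + 0.6 * u else 0
            # Assumed true exposure effect on the risk scale: 0.03
            y = 1 if rng.random() < 0.02 + 0.10 * u + 0.03 * a else 0
            rows.append((practice, a, y))
    return rows


def slope(xs, ys):
    """Ordinary least squares slope of ys on xs (single regressor)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def pooled_slope(rows):
    """Naive estimate that ignores practice membership."""
    return slope([a for _, a, _ in rows], [y for _, _, y in rows])


def within_slope(rows):
    """Fixed-effects estimate: demean exposure and outcome within practice."""
    sums = {}
    for practice, a, y in rows:
        s = sums.setdefault(practice, [0, 0, 0])
        s[0] += 1
        s[1] += a
        s[2] += y
    xs = [a - sums[p][1] / sums[p][0] for p, a, _ in rows]
    ys = [y - sums[p][2] / sums[p][0] for p, _, y in rows]
    return slope(xs, ys)


if __name__ == "__main__":
    data = simulate_visits(seed=1)
    print("pooled estimate:", round(pooled_slope(data), 4))
    print("within-practice estimate:", round(within_slope(data), 4))
```

In this synthetic setup the pooled slope absorbs the practice-level confounder, while the within-practice slope compares exposed and unexposed visits inside the same practice.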

    Comparing the estimates of effect obtained from statistical causal inference methods: An example using bovine respiratory disease in feedlot cattle

    The causal effect of an exposure on an outcome of interest in an observational study cannot be estimated directly if the confounding variables are not controlled. Many approaches are available for estimating the causal effect of an exposure. In this manuscript, we demonstrate the advantages associated with using inverse probability weighting (IPW) and doubly robust estimation of the odds ratio in terms of reduced bias. The IPW approach can be used to adjust for confounding variables and provide unbiased estimates of the exposure’s causal effect. For cluster-structured data, as is common in animal populations, the inverse conditional probability weighting (ICPW) approach can provide a robust estimation of the causal effect. Doubly robust estimation can provide a robust method even when the specification of the model form is uncertain. In this paper, the use of the IPW, ICPW, and doubly robust approaches is illustrated with a subset of data with complete covariates from the Australian-based National Bovine Respiratory Disease Initiative as well as simulated data. We evaluate the causal effect of prior bovine viral diarrhea exposure on bovine respiratory disease in feedlot cattle. The results show that the IPW, ICPW, and doubly robust approaches would provide a more accurate estimation of the exposure effect than the traditional outcome regression model, and doubly robust approaches are the most preferable overall.
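
To make the IPW idea in the abstract above concrete, here is a minimal sketch on synthetic data with a single binary confounder. It estimates a risk difference rather than the odds ratio discussed in the abstract, it estimates the propensity score as the exposed proportion within each confounder stratum, and every name and simulation parameter is a hypothetical choice for illustration, not the authors' implementation or data.

```python
import random


def simulate(n, seed=0):
    """Synthetic rows (l, a, y): confounder l raises both the chance of
    exposure a and the risk of outcome y. Assumed true risk difference: 0.10."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l = 1 if rng.random() < 0.5 else 0
        a = 1 if rng.random() < (0.8 if l else 0.2) else 0
        y = 1 if rng.random() < 0.1 + 0.1 * a + 0.3 * l else 0
        rows.append((l, a, y))
    return rows


def crude_risk_difference(rows):
    """Unadjusted comparison of exposed vs. unexposed risk."""
    exposed = [y for _, a, y in rows if a == 1]
    unexposed = [y for _, a, y in rows if a == 0]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)


def ipw_risk_difference(rows):
    """Inverse probability weighted risk difference (normalized weights)."""
    # Step 1: propensity score P(a = 1 | l), estimated per stratum of l.
    counts = {}
    for l, a, _ in rows:
        c = counts.setdefault(l, [0, 0])
        c[0] += 1
        c[1] += a
    propensity = {l: c[1] / c[0] for l, c in counts.items()}

    # Step 2: weight each row by the inverse probability of the exposure
    # it actually received, then take weighted outcome means per arm.
    num = {0: 0.0, 1: 0.0}
    den = {0: 0.0, 1: 0.0}
    for l, a, y in rows:
        p = propensity[l]
        w = 1.0 / p if a == 1 else 1.0 / (1.0 - p)
        num[a] += w * y
        den[a] += w
    return num[1] / den[1] - num[0] / den[0]


if __name__ == "__main__":
    data = simulate(20000, seed=1)
    print("crude risk difference:", round(crude_risk_difference(data), 3))
    print("IPW risk difference:", round(ipw_risk_difference(data), 3))
```

Weighting creates a pseudo-population in which exposure is independent of the measured confounder, so the weighted contrast should sit near the assumed true value while the crude contrast does not. A doubly robust estimator would additionally combine these weights with an outcome model; that step is not shown here.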