25 research outputs found

    A practical illustration of the importance of realistic individualized treatment rules in causal inference

    The effect of vigorous physical activity on mortality in the elderly is difficult to estimate using conventional approaches to causal inference that define this effect by comparing the mortality risks corresponding to hypothetical scenarios in which all subjects in the target population engage in a given level of vigorous physical activity. A causal effect defined on the basis of such a static treatment intervention can only be identified from observed data if all subjects in the target population have a positive probability of selecting each of the candidate treatment options, an assumption that is highly unrealistic in this case since subjects with serious health problems will not be able to engage in higher levels of vigorous physical activity. This problem can be addressed by focusing instead on causal effects that are defined on the basis of realistic individualized treatment rules and intention-to-treat rules that explicitly take into account the set of treatment options that are available to each subject. We present a data analysis to illustrate that estimators of static causal effects in fact tend to overestimate the beneficial impact of high levels of vigorous physical activity while corresponding estimators based on realistic individualized treatment rules and intention-to-treat rules can yield unbiased estimates. We emphasize that the problems encountered in estimating static causal effects are not restricted to the IPTW estimator, but are also observed with the G-computation estimator, the DR-IPTW estimator, and the targeted MLE. Our analyses based on realistic individualized treatment rules and intention-to-treat rules suggest that high levels of vigorous physical activity may confer reductions in mortality risk on the order of 15-30%, although in most cases the evidence for such an effect does not quite reach the 0.05 level of significance. Comment: Published at http://dx.doi.org/10.1214/07-EJS105 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org).
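The idea of a rule that respects each subject's available treatment options can be sketched as follows. This is only one possible reading, not the authors' definition: it assumes that an activity level counts as "available" when its estimated probability for that subject is at least a threshold `alpha`, and that the rule falls back to the available level closest to the target. The function name, the threshold, and the fallback choice are all hypothetical.

```python
def realistic_rule(target_level, level_probs, alpha=0.05):
    """Pick an activity level for one subject.

    level_probs maps each activity level to that subject's estimated
    probability of choosing it (assumed to come from some fitted model).
    Levels with probability below alpha are treated as unavailable.
    """
    feasible = [lvl for lvl, p in level_probs.items() if p >= alpha]
    if not feasible:
        raise ValueError("no activity level meets the threshold")
    if target_level in feasible:
        return target_level
    # Hypothetical fallback: closest feasible level, preferring the lower one on ties.
    return min(feasible, key=lambda lvl: (abs(lvl - target_level), lvl))


# A subject with serious health problems is unlikely to reach level 2.
frail = {0: 0.60, 1: 0.38, 2: 0.02}
healthy = {0: 0.20, 1: 0.30, 2: 0.50}
print(realistic_rule(2, frail), realistic_rule(2, healthy))
```

Under a static intervention both subjects would be assigned level 2; under this sketch the frail subject is assigned a level that is plausible for them.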

    Estimating the Effect of Vigorous Physical Activity on Mortality in the Elderly Based on Realistic Individualized Treatment and Intention-to-Treat Rules

    The effect of vigorous physical activity on mortality in the elderly is difficult to estimate using conventional approaches to causal inference that define this effect by comparing the mortality risks corresponding to hypothetical scenarios in which all subjects in the target population engage in a given level of vigorous physical activity. A causal effect defined on the basis of such a static treatment intervention can only be identified from observed data if all subjects in the target population have a positive probability of selecting each of the candidate treatment options, an assumption that is highly unrealistic in this case since subjects with serious health problems will not be able to engage in higher levels of vigorous physical activity. This problem can be addressed by focusing instead on causal effects that are defined on the basis of realistic individualized treatment rules and intention-to-treat rules that explicitly take into account the set of treatment options that are available to each subject. We present a data analysis to illustrate that estimators of static causal effects in fact tend to overestimate the beneficial impact of high levels of vigorous physical activity while corresponding estimators based on realistic individualized treatment rules and intention-to-treat rules can yield unbiased estimates. We emphasize that the problems encountered in estimating static causal effects are not restricted to the IPTW estimator, but are also observed with the G-computation estimator, the DR-IPTW estimator, and the targeted MLE. Our analyses based on realistic individualized treatment rules and intention-to-treat rules suggest that high levels of vigorous physical activity may confer reductions in mortality risk on the order of 15-30%, although in most cases the evidence for such an effect does not quite reach the 0.05 level of significance

    Analyzing Sequentially Randomized Trials Based on Causal Effect Models for Realistic Individualized Treatment Rules

    In this paper, we argue that causal effect models for realistic individualized treatment rules represent an attractive tool for analyzing sequentially randomized trials. Unlike a number of methods proposed previously, this approach does not rely on the assumption that intermediate outcomes are discrete or that models for the distributions of these intermediate outcomes given the observed past are correctly specified. In addition, it generalizes the methodology for performing pairwise comparisons between individualized treatment rules by allowing the user to posit a marginal structural model for all candidate treatment rules simultaneously. If only a small number of candidate treatment rules are under consideration, a non-parametric marginal structural model can be used to conveniently carry out all of the pairwise comparisons of interest in a single step. An appropriately chosen marginal structural model becomes particularly useful, however, as the number of candidate treatment rules increases, in which case an approach based on individual pairwise comparisons would be likely to suffer from too much sampling variability to provide an informative answer. In addition, such causal effect models represent an interesting alternative to methods previously proposed for selecting an optimal individualized treatment rule in that they give the user a sense of how the optimal outcome is estimated to change in the neighborhood of the identified optimum. We discuss an inverse-probability-of-treatment-weighted (IPTW) estimator for these causal effect models that is straightforward to implement using standard statistical software and develop an approach for constructing valid asymptotic confidence intervals based on the influence curve of this estimator. The methodology is illustrated in two simulation studies that are intended to mimic an HIV/AIDS trial.

    Supervised Detection of Conserved Motifs in DNA Sequences with cosmo

    A number of computational methods have been proposed for identifying transcription factor binding sites from a set of unaligned sequences that are thought to share the motif in question. We here introduce an algorithm, called cosmo, that allows this search to be supervised by specifying a set of constraints that the position weight matrix of the unknown motif must satisfy. Such constraints may be formulated, for example, on the basis of prior knowledge about the structure of the transcription factor in question. The algorithm is based on the same two-component multinomial mixture model used by MEME, with stronger reliance, however, on the likelihood principle instead of more ad-hoc criteria like the E-value. The intensity parameter in the ZOOPS and TCM models, for instance, is estimated based on a profile-likelihood approach, and the width of the unknown motif is selected based on BIC. These changes allow cosmo to outperform MEME even in the absence of any constraints, as evidenced by 2- to 3-fold greater sensitivity in some simulation studies. Additional improvements in performance can be achieved by selecting the model type (OOPS, ZOOPS, or TCM) data-adaptively or by supplying correctly specified constraints, especially if the motif appears only as a weak signal in the data. The algorithm can data-adaptively choose between working in a given constrained model or in the completely unconstrained model, guarding against the risk of supplying mis-specified constraints. Simulation studies suggest that this approach can offer 3 to 3.5 times greater sensitivity than MEME. The algorithm has been implemented in the form of a stand-alone C program as well as a web application that can be accessed at http://cosmoweb.berkeley.edu. An R package is available through Bioconductor (http://bioconductor.org)

    Data Adaptive Estimation of the Treatment Specific Mean

    An important problem in epidemiology and medical research is the estimation of the causal effect of a treatment action at a single point in time on the mean of an outcome, possibly within strata of the target population defined by a subset of the baseline covariates. Current approaches to this problem are based on marginal structural models, i.e., parametric models for the marginal distribution of counterfactual outcomes as a function of treatment and effect modifiers. The various estimators developed in this context furthermore each depend on a high-dimensional nuisance parameter whose estimation currently also relies on parametric models. Since misspecification of any of these models can lead to severely biased estimates of causal effects, the dependence of current methods on such parametric models represents a major limitation. In this article we introduce estimators that allow the marginal structural model as well as the parametric model for the relevant nuisance parameter to be selected data-adaptively. Our methodology is based on the unified loss-based estimation approach recently developed by van der Laan and Dudoit (2003) that in particular extends loss-based estimation to missing data problems. We study the practical performance of our proposed estimators in an extensive simulation study and also apply them to data derived from an epidemiologic study to assess the causal effect of forced expiratory volume on mortality in the elderly. All of the estimators presented in this article are made publicly available in the R package cvDSA.

    The Causal Effect of Recent Leisure-Time Physical Activity on All-Cause Mortality Among the Elderly

    We analyze data collected as part of a prospective cohort study of elderly people living in and around Sonoma, CA, in order to estimate, for each round of interviews, the causal effect of leisure-time physical activity (LTPA) over the past year on the risk of mortality in the following two years. For each round of interviews, this effect is estimated separately for subpopulations defined based on past exercise habits, age, and whether subjects have had cardiac events in the past. This decomposition of the original longitudinal data structure into a series of point-treatment data structures corresponds to an application of history-adjusted marginal structural models as introduced by van der Laan et al. (2005). We propose five different estimators of the parameter of interest, based on various combinations of the usual G-computation, inverse-weighting, and double robust approaches for the two layers of missingness corresponding to the treatment mechanism and right-censoring by drop-out. The models for all nuisance parameters required by these different estimators are selected data-adaptively. For most subpopulations, our analyses suggest that high leisure-time physical activity reduces the subsequent two-year mortality risk by about 50%. Among populations of elderly people aged 75 years or older, these effect estimates are generally significant at the 0.05 level. Notably, our analyses also identify one subpopulation that is estimated to experience an increase in mortality risk when exercising at a higher level, namely subjects aged 75 years or older with previous cardiac events and no history of habitual exercise (RR: 2.33, 95% CI: 0.76-4.35)

    Influenza Vaccination and Mortality: Differentiating Vaccine Effects From Bias

    It is widely believed that influenza (flu) vaccination of the elderly reduces all-cause mortality, yet randomized trials for assessing vaccine effectiveness are not feasible and the observational research has been controversial. Efforts to differentiate vaccine effectiveness from selection bias have been problematic. The authors examined mortality before, during, and after 9 flu seasons in relation to time-varying vaccination status in an elderly California population in which 115,823 deaths occurred from 1996 to 2005, including 20,484 deaths during laboratory-defined flu seasons. Vaccine coverage averaged 63%; excess mortality when the flu virus was circulating averaged 7.8%. In analyses that omitted weeks when flu circulated, the odds ratio measuring the vaccination-mortality association increased monotonically from 0.34 early in November to 0.56 in January, 0.67 in April, and 0.76 in August. This reflects the trajectory of selection effects in the absence of flu. In analyses that included weeks with flu and adjustment for selection effects, flu season multiplied the odds ratio by 0.954. The corresponding vaccine effectiveness estimate was 4.6% (95% confidence interval: 0.7, 8.3). To differentiate vaccine effects from selection bias, the authors used logistic regression with a novel case-centered specification that may be useful in other population-based studies when the exposure-outcome association varies markedly over time
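The reported 4.6% is consistent with the 0.954 multiplier if vaccine effectiveness is taken as one minus the odds ratio, expressed as a percentage. The abstract does not state this formula; it is the conventional definition and is assumed here, and the function name is hypothetical.

```python
def vaccine_effectiveness(odds_ratio):
    """Vaccine effectiveness in percent, assuming VE = (1 - OR) * 100."""
    return (1.0 - odds_ratio) * 100.0


# Odds-ratio multiplier attributed to flu season in the abstract.
print(round(vaccine_effectiveness(0.954), 1))
```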

    Sequence logos for DNA sequence alignments

    An alignment of DNA or amino acid sequences is commonly represented in the form of a position weight matrix (PWM), a J × W matrix in which position (j, w) gives the probability of observing nucleotide j in position w of an alignment of length W. Here J denotes the number of letters in the alphabet from which the sequences were derived. An important summar
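The PWM definition given in this abstract can be illustrated with a minimal sketch that estimates each entry as the observed letter frequency in an ungapped alignment. The function name, the DNA alphabet ordering, and the toy alignment are assumptions made for illustration; no pseudocounts or gap handling are included.

```python
ALPHABET = "ACGT"  # J = 4 letters, in an assumed row order


def position_weight_matrix(alignment, alphabet=ALPHABET):
    """Return a J x W nested list; entry [j][w] is the observed frequency
    of letter alphabet[j] at position w of the alignment."""
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n = len(alignment)
    return [
        [sum(1 for seq in alignment if seq[w] == letter) / n for w in range(width)]
        for letter in alphabet
    ]


# Toy alignment of four sequences of length W = 3.
pwm = position_weight_matrix(["ACG", "ACT", "AGG", "TCG"])
for letter, row in zip(ALPHABET, pwm):
    print(letter, row)
```

Each column of the resulting matrix sums to one, since every position holds exactly one letter of the alphabet in every sequence.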

    Data-adaptive selection of the truncation level for Inverse-Probability-of-Treatment-Weighted estimators

    Inverse-Probability-of-Treatment-Weighted (IPTW) estimators are becoming a popular analysis tool in causal inference. It is well known that these estimators suffer from high variability if some treatment probabilities are estimated to be close to zero. While it is a common recommendation for such situations to truncate the weights in order to reduce the mean squared error of the estimator, the current literature gives little guidance on how to select an appropriate truncation level. In this article, we develop a closed-form estimate for the mean squared error of a truncated IPTW estimator that can be used to select this truncation level data-adaptively. While the resulting estimator requires an estimate of an additional nuisance parameter, we show that its consistency does not rely on a consistent estimate of that nuisance parameter. For the case of a binary treatment variable, we present an approach for obtaining an estimate of this nuisance parameter that does not require the user to specify an appropriate parametric model. We illustrate the practical performance of the proposed estimator in a number of simulation studies that show consistent gains in efficiency relative to more ad-hoc truncation approaches currently in use, with typical gains lying in the range from 1 to 15%. In fact, the estimator is seen to perform on par with an infeasible benchmark estimator that relies on knowledge of the true data-generating distribution. In an applied data analysis, the proposed methodology is estimated to achieve a 7% gain in efficiency relative to the non-truncated IPTW estimator, with truncation resulting in a non-significant finding becoming statistically significant. The methodology presented here has been implemented in an R package called tIPTW that can be downloaded at http://www.stat.berkeley.edu/~laan/Software/
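To make the role of truncation concrete, the following is a minimal sketch of an IPTW estimate of a treatment-specific mean with a fixed, user-chosen lower bound on the treatment probabilities. It is not the data-adaptive selection procedure from the article or the tIPTW package: the closed-form mean squared error estimate is not given in the abstract and is not reproduced here. The function name, the bound `lower`, and the toy data are hypothetical.

```python
def truncated_iptw_mean(treatment, outcome, propensity, target=1, lower=0.05):
    """IPTW estimate of the mean outcome had everyone received `target`.

    propensity[i] is the estimated probability that subject i receives
    treatment 1. Probabilities of the target treatment are bounded below
    by `lower`, which caps each weight at 1 / lower.
    """
    n = len(outcome)
    total = 0.0
    for a, y, g in zip(treatment, outcome, propensity):
        if a != target:
            continue  # only subjects who received the target contribute
        p = g if target == 1 else 1.0 - g
        total += y / max(p, lower)
    return total / n


# Toy data: the third subject has a treatment probability close to zero.
treatment = [1, 0, 1, 0]
outcome = [2.0, 1.0, 4.0, 3.0]
propensity = [0.5, 0.5, 0.01, 0.5]

print(truncated_iptw_mean(treatment, outcome, propensity, lower=0.0))
print(truncated_iptw_mean(treatment, outcome, propensity, lower=0.1))
```

In the toy data a single subject with a near-zero treatment probability dominates the untruncated estimate, while bounding the probability at 0.1 limits that subject's weight to 10. Truncation trades this variance reduction against bias, which is why the choice of level matters.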