
    All your data are always missing: incorporating bias due to measurement error into the potential outcomes framework

    Epidemiologists often use the potential outcomes framework to cast causal inference as a missing data problem. Here, we demonstrate how bias due to measurement error can be described in terms of potential outcomes and considered in concert with bias from other sources. In addition, we illustrate how acknowledging the uncertainty that arises due to measurement error increases the amount of missing information in causal inference. We use a simple example to show that estimating the average treatment effect requires the investigator to perform a series of hidden imputations based on strong assumptions.
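
    As a quick illustration of the point (not taken from the paper), the sketch below simulates potential outcomes, reveals only one of them per unit, and then adds hypothetical non-differential misclassification of the recorded treatment; the naive contrast computed from the error-prone treatment is attenuated relative to the true average treatment effect. All quantities and error rates are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Potential outcomes: each unit has both Y(0) and Y(1), but only one is ever observed.
y0 = rng.normal(0.0, 1.0, n)
y1 = y0 + 2.0                      # true average treatment effect = 2

a_true = rng.binomial(1, 0.5, n)   # randomized treatment
y_obs = np.where(a_true == 1, y1, y0)

# Hypothetical non-differential misclassification: 10% of recorded treatment
# values are flipped in the data set that the analyst actually sees.
flip = rng.binomial(1, 0.10, n).astype(bool)
a_recorded = np.where(flip, 1 - a_true, a_true)

ate_correct = y_obs[a_true == 1].mean() - y_obs[a_true == 0].mean()
ate_naive = y_obs[a_recorded == 1].mean() - y_obs[a_recorded == 0].mean()

print(f"ATE using the correctly measured treatment: {ate_correct:.2f}")  # ~2.0
print(f"ATE ignoring the measurement error:         {ate_naive:.2f}")    # attenuated toward 0
```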

    Contrastive Counterfactual Learning for Causality-aware Interpretable Recommender Systems

    There has been a recent surge in the study of generating recommendations within the framework of causal inference, with the recommendation treated as a treatment. This approach enhances our understanding of how recommendations influence user behaviour and allows for the identification of the factors that contribute to this impact. Many researchers in the field of causal inference for recommender systems have focused on using propensity scores, which can reduce bias but may also introduce additional variance. Other studies have proposed the use of unbiased data from randomized controlled trials, though this approach requires assumptions that may be difficult to satisfy in practice. In this paper, we first explore the causality-aware interpretation of recommendations and show that the underlying exposure mechanism can bias the maximum likelihood estimation (MLE) of observational feedback. Given that confounders may be inaccessible for measurement, we propose using contrastive self-supervised learning (SSL) to reduce exposure bias, specifically through the use of inverse propensity scores and the expansion of the positive sample set. Based on theoretical findings, we introduce a new contrastive counterfactual learning method (CCL) that integrates three novel positive sampling strategies based on estimated exposure probability or random counterfactual samples. Through extensive experiments on two real-world datasets, we demonstrate that our CCL method outperforms state-of-the-art methods.
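
    For orientation only, the snippet below sketches a generic inverse-propensity-scored (IPS) loss for implicit feedback; it is not the authors' CCL method, and the function name, clipping rule, and toy numbers are illustrative assumptions.

```python
import numpy as np

def ips_weighted_log_loss(y_click, y_score, exposure_prob, clip=0.05):
    """Inverse-propensity-scored log loss for implicit feedback.

    y_click       : observed clicks (1) / non-clicks (0) on exposed items
    y_score       : model-predicted relevance probabilities
    exposure_prob : estimated propensity that each item was exposed to the user
    clip          : lower bound on propensities to control variance
    """
    w = 1.0 / np.clip(exposure_prob, clip, 1.0)   # IPS weights
    eps = 1e-12
    ll = y_click * np.log(y_score + eps) + (1 - y_click) * np.log(1 - y_score + eps)
    return -np.mean(w * ll)

# Toy usage with made-up numbers
clicks = np.array([1, 0, 1, 0])
scores = np.array([0.8, 0.3, 0.6, 0.2])
props  = np.array([0.9, 0.4, 0.1, 0.5])   # popular items are exposed more often
print(ips_weighted_log_loss(clicks, scores, props))
```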

    A new three-step method for using inverse propensity weighting with latent class analysis

    Bias-adjusted three-step latent class analysis (LCA) is a widely popular approach for relating covariates to class membership. However, if the causal effect of a treatment on class membership is of interest and only observational data are available, causal inference techniques such as inverse propensity weighting (IPW) need to be used. In this article, we extend bias-adjusted three-step LCA to incorporate IPW. This approach separates the estimation of the measurement model from the estimation of the treatment effect, using IPW only for the latter step. Compared to previous methods, this solves several conceptual issues and more easily facilitates model selection and the use of multiple imputation. The new approach, implemented in the software Latent GOLD, is evaluated in a simulation study and its use is illustrated using data from prostate cancer patients.
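
    To fix ideas about the weighting step, here is a minimal sketch of generic IPW estimation of a treatment effect on class membership, assuming class assignments from the first two steps are already in hand; it does not reproduce the bias-adjusted estimation implemented in Latent GOLD, and the simulated data and model choices are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: X = baseline covariates, A = treatment, C = assigned latent class (taken
# as given here, as if produced by steps 1-2 of the three-step procedure).
rng = np.random.default_rng(1)
n = 5_000
X = rng.normal(size=(n, 2))
A = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1]))))
C = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * A + 0.3 * X[:, 0]))))   # class 1 vs class 0

# Illustrative step 3: inverse propensity weighting of the treatment effect on class membership
ps = LogisticRegression().fit(X, A).predict_proba(X)[:, 1]
w = np.where(A == 1, 1 / ps, 1 / (1 - ps))

p1 = np.average(C[A == 1], weights=w[A == 1])   # weighted P(class 1 | treated)
p0 = np.average(C[A == 0], weights=w[A == 0])   # weighted P(class 1 | untreated)
print(f"IPW-estimated effect of treatment on P(class 1): {p1 - p0:.3f}")
```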

    On Defense of the Hazard Ratio

    There has been debate on whether the hazard function should be used for causal inference in time-to-event studies. The main criticism is that there is selection bias because the risk sets beyond the first event time consist of subsets of survivors who are no longer balanced in the risk factors, even in the absence of unmeasured confounding, measurement error, and model misspecification. In this short communication, we use the potential outcomes framework and the single-world intervention graph to show that there is indeed no selection bias when estimating the average treatment effect, and that the hazard ratio over time can provide a useful interpretation in practical settings.
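
    For reference, the potential-outcome hazard and the hazard ratio over time discussed here are conventionally defined as follows (the notation is ours, not necessarily the paper's), with T^a the potential event time under treatment level a:

```latex
\[
  \lambda^{a}(t) \;=\; \lim_{\Delta t \to 0}
  \frac{\Pr\!\left(t \le T^{a} < t + \Delta t \,\middle|\, T^{a} \ge t\right)}{\Delta t},
  \qquad
  \mathrm{HR}(t) \;=\; \frac{\lambda^{1}(t)}{\lambda^{0}(t)}.
\]
```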

    Mendelian randomisation for mediation analysis: current methods and challenges for implementation

    Mediation analysis seeks to explain the pathway(s) through which an exposure affects an outcome. Traditional, non-instrumental variable methods for mediation analysis suffer from a number of methodological difficulties, including bias due to confounding between the exposure, mediator, and outcome, and bias due to measurement error. Mendelian randomisation (MR) can be used to improve causal inference for mediation analysis. We describe two approaches that can be used for mediation analysis with MR: multivariable MR (MVMR) and two-step MR. We outline the approaches and provide code to demonstrate how they can be used in mediation analysis. We review issues that can affect the analyses, including confounding, measurement error, weak instrument bias, interactions between exposures and mediators, and the analysis of multiple mediators. The description of the methods is supplemented by simulated and real-data examples. Although MR relies on large sample sizes and strong assumptions, such as having strong instruments and no horizontally pleiotropic pathways, our simulations demonstrate that these methods are unaffected by confounding of the exposure-outcome or mediator-outcome relationships and by non-differential measurement error of the exposure or mediator. Both MVMR and two-step MR can be implemented with individual-level or summary-data MR. MR mediation methods rely on different assumptions than non-instrumental variable mediation methods; where these assumptions are more plausible, MR can be used to improve causal inference in mediation analysis. Supplementary information: the online version contains supplementary material available at 10.1007/s10654-021-00757-1.
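
    As a rough sketch of the two-step MR "product of coefficients" logic with summary data (not the authors' code; all numbers and variable names are invented for illustration):

```python
# Two-step MR "product of coefficients" sketch using summary-data Wald ratios.
# All numbers below are illustrative, not from the paper.

# Step 1: genetic instrument G_X for the exposure X -> effect of X on the mediator M
beta_GX_on_X = 0.30     # SNP-exposure association
beta_GX_on_M = 0.12     # SNP-mediator association
beta_X_on_M = beta_GX_on_M / beta_GX_on_X          # Wald ratio

# Step 2: genetic instrument G_M for the mediator M -> effect of M on the outcome Y
beta_GM_on_M = 0.25
beta_GM_on_Y = 0.05
beta_M_on_Y = beta_GM_on_Y / beta_GM_on_M          # Wald ratio

indirect_effect = beta_X_on_M * beta_M_on_Y        # exposure -> mediator -> outcome
total_effect = 0.15                                # e.g. from univariable MR of X on Y
proportion_mediated = indirect_effect / total_effect

print(f"indirect effect     = {indirect_effect:.3f}")
print(f"proportion mediated = {proportion_mediated:.2%}")
```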

    Causal Inference with Measurement Error

    Causal inference methods have been widely used in the biomedical and social sciences, among many other fields. Under different assumptions, various methods have been proposed to conduct causal inference with interpretable results. The validity of most, if not all, existing methods relies on a crucial condition: all the variables need to be precisely measured. This condition, however, is commonly violated; in many applications, the collected data are not precisely measured and are subject to measurement error. Ignoring measurement error can lead to severely biased results and misleading conclusions, so its effects should be carefully addressed in order to obtain reliable inference results. Outside the context of causal inference, research on measurement error problems has been extensive and a large body of methods has been developed. In the paradigm of causal inference, however, research on measurement error remains limited, although an increasing, but still scarce, literature has emerged, and the area deserves in-depth investigation. Motivated by this, the thesis focuses on causal inference with measurement error: we investigate the impact of measurement error and propose methods that correct for its effects in several useful settings. The thesis consists of nine chapters.

    As a preliminary, Chapter 1 introduces causal inference, measurement error, and other data features such as missing data, gives an overview of existing methods for causal inference with measurement error, and describes the problems that are investigated in depth in the subsequent chapters. Chapter 2 considers estimation of the causal odds ratio, the causal risk ratio, and the causal risk difference in the presence of measurement error in confounders, which may be time-varying. By adapting two correction methods for measurement error effects developed in the noncausal context, we propose valid methods that consistently estimate these causal effect measures in settings with error-prone confounders, and we develop a linear-combination-based method to construct estimators with improved asymptotic efficiency. Chapter 3 focuses on inverse-probability-of-treatment weighted (IPTW) estimation of causal parameters under marginal structural models with error-contaminated, time-varying confounders. To account for bias due to imprecise measurements, we develop several correction methods covering both the so-called stabilized and unstabilized weighting strategies.

    In Chapter 4, measurement error in outcomes is of concern. For inverse probability weighting (IPW) estimation, we study the impact of measurement error for both continuous and binary outcome variables and reveal the consequences of the naive analysis that ignores it: when a continuous outcome is mismeasured under an additive measurement error model, the naive analysis may still yield a consistent estimator, whereas for a binary outcome we derive the asymptotic bias in closed form. We further develop consistent estimation procedures for practical scenarios where either validation data or replicates are available; with validation data, we propose an efficient method. To protect against model misspecification, we also develop a doubly robust estimator that is consistent even if one of the treatment and outcome models is misspecified. In Chapter 5, the problem of interest is measurement error arising from more than one source. We study IPW estimation for settings with mismeasured covariates and misclassified outcomes and develop two estimation methods that correct for measurement error and misclassification effects simultaneously while accommodating different forms of the treatment model, covering commonly assumed logistic regression models as well as general treatment assignment mechanisms.

    Chapters 2-5 emphasize addressing measurement error effects on causal inference, but applications may be further challenged by additional data features; for instance, missing values frequently occur in the data collection process in addition to measurement error. In Chapter 6, we investigate the setting in which both missingness and misclassification may be present in a binary outcome variable. Focusing on IPW estimation, we derive the asymptotic biases of three types of naive analysis that ignore missingness, misclassification, or both, and we develop valid estimation methods that correct for both features simultaneously, including a doubly robust correction method. The doubly robust estimators of Chapter 6 offer a viable way to address model misspecification and are easy to apply in practice, but they are not without weaknesses: when both the treatment model and the outcome model are misspecified, such estimators are not necessarily consistent. Driven by this consideration, in Chapter 7 we propose new estimation methods that correct for misclassification and/or missingness in outcomes. Rather than relying on a single treatment model and a single outcome model, the new methods consider a set of treatment models and a set of outcome models; enlarging the associated model classes yields consistent estimators that enjoy so-called multiple robustness, a property discussed in the missing data literature.

    To expedite application of this work, we implement the methods proposed in Chapter 4 in an R package for general users; the details are presented in Chapter 8. The thesis concludes with a discussion in Chapter 9.
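
    To make the double robustness property concrete, here is a minimal augmented IPW (AIPW) sketch for the average treatment effect with a correctly measured outcome; it is a textbook-style illustration under simple working models, not the measurement-error or missingness corrections developed in the thesis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

def aipw_ate(X, A, Y):
    """Augmented IPW (doubly robust) estimate of the average treatment effect.

    Consistent if either the propensity model or the outcome model is
    correctly specified (here both are simple parametric working models).
    """
    ps = LogisticRegression().fit(X, A).predict_proba(X)[:, 1]
    mu1 = LinearRegression().fit(X[A == 1], Y[A == 1]).predict(X)
    mu0 = LinearRegression().fit(X[A == 0], Y[A == 0]).predict(X)

    psi1 = mu1 + A * (Y - mu1) / ps
    psi0 = mu0 + (1 - A) * (Y - mu0) / (1 - ps)
    return np.mean(psi1 - psi0)

# Toy example with a true ATE of 1.0 and confounding through X
rng = np.random.default_rng(2)
n = 20_000
X = rng.normal(size=(n, 1))
A = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
Y = 1.0 * A + 2.0 * X[:, 0] + rng.normal(size=n)
print(f"AIPW estimate of the ATE: {aipw_ate(X, A, Y):.2f}")   # close to 1.0
```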