10 research outputs found

    Fair Inference On Outcomes

    In this paper, we consider the problem of fair statistical inference involving outcome variables. Examples include classification and regression problems, and estimating treatment effects in randomized trials or observational data. The issue of fairness arises in such problems where some covariates or treatments are "sensitive," in the sense of having the potential to create discrimination. We argue that the presence of discrimination can be formalized in a sensible way as the presence of an effect of a sensitive covariate on the outcome along certain causal pathways, a view which generalizes (Pearl, 2009). A fair outcome model can then be learned by solving a constrained optimization problem. We discuss a number of complications that arise in classical statistical inference due to this view and provide workarounds based on recent work in causal and semi-parametric inference.

    Graphical models for mediation analysis

    Mediation analysis seeks to infer how much of the effect of an exposure on an outcome can be attributed to specific pathways via intermediate variables or mediators. This requires identification of so-called path-specific effects. These express how a change in exposure affects those intermediate variables (along certain pathways), and how the resulting changes in those variables in turn affect the outcome (along subsequent pathways). However, unlike identification of total effects, adjustment for confounding is insufficient for identification of path-specific effects because their magnitude is also determined by the extent to which individuals who experience large exposure effects on the mediator tend to experience relatively small or large mediator effects on the outcome. This chapter therefore provides an accessible review of identification strategies under general nonparametric structural equation models (with possibly unmeasured variables), which rule out certain such dependencies. In particular, it is shown which path-specific effects can be identified under such models, and how this can be done.

    Mediation analysis of time-to-event endpoints accounting for repeatedly measured mediators subject to time-varying confounding.

    In this article, we present statistical methods to assess to what extent the effect of a randomised treatment (versus control) on a time-to-event endpoint might be explained by the effect of treatment on a mediator of interest, a variable that is measured longitudinally at planned visits throughout the trial. In particular, we show how to identify and infer the path-specific effect of treatment on the event time via the repeatedly measured mediator levels. The considered proposal addresses complications due to patients dying before the mediator is assessed, due to the mediator being repeatedly measured, and due to posttreatment confounding of the effect of the mediator by other mediators. We illustrate the method by an application to data from the LEADER cardiovascular outcomes trial.

    Methods of analysis for survival outcomes with time-updated mediators, with application to longitudinal disease registry data.

    Mediation analysis is a useful tool to illuminate the mechanisms through which an exposure affects an outcome, but statistical challenges exist with time-to-event outcomes and longitudinal observational data. Natural direct and indirect effects cannot be identified when there are exposure-induced confounders of the mediator-outcome relationship. Previous measurements of a repeatedly measured mediator may themselves confound the relationship between the mediator and the outcome. To overcome these obstacles, two recent methods have been proposed, one based on path-specific effects and one based on an additive hazards model and the concept of exposure splitting. We investigate these techniques, focusing on their application to observational datasets. We apply both methods to an analysis of the UK Cystic Fibrosis Registry dataset to identify how much of the relationship between onset of cystic fibrosis-related diabetes and subsequent survival acts through pulmonary function. Statistical properties of the methods are investigated using simulation. Both methods produce unbiased estimates of indirect and direct effects in scenarios consistent with their stated assumptions but, if the data are measured infrequently, estimates may be biased. Findings are used to highlight considerations in the interpretation of the observational data analysis.

    Causal Inference Methods For Bias Correction In Data Analyses

    Many problems in the empirical sciences and rational decision making require causal, rather than associative, reasoning. The field of causal inference is concerned with establishing and quantifying cause-effect relationships to inform interventions, even in the absence of direct experimentation or randomization. With the proliferation of massive datasets, it is crucial that we develop principled approaches to drawing actionable conclusions from imperfect information. Inferring valid causal conclusions is impeded by the fact that data are unstructured and filled with different sources of bias. The types of bias that we consider in this thesis include: confounding bias induced by common causes of observed exposures and outcomes, bias in estimation induced by high dimensional data and the curse of dimensionality, discriminatory bias encoded in data that reflect historical patterns of discrimination and inequality, and missing data bias where instantiations of variables are systematically missing. The focus of this thesis is on the development of novel causal and statistical methodologies to better understand and resolve these pressing challenges. We draw on methodological insights from both machine learning/artificial intelligence and statistical theory. Specifically, we use ideas from graphical modeling to encode our assumptions about the underlying data generating mechanisms in a clear and succinct manner. Further, we use ideas from nonparametric and semiparametric theories to enable the use of flexible machine learning models in the estimation of causal effects that are identified as functions of observed data. There are four main contributions to this thesis. First, we bridge the gap between identification and semiparametric estimation of causal effects that are identified in causal graphical models with unmeasured confounders. Second, we use semiparametric inference theory for marginal structural models to give the first general approach to causal sufficient dimension reduction of a high dimensional treatment. Third, we address conceptual, methodological, and practical gaps in assessing and overcoming disparities in automated decision making using causal inference and constrained optimization. Fourth, we use graphical representations of missing data mechanisms and provide a complete characterization of identification of the underlying joint distribution where some variables are systematically missing and others are unmeasured.