Heterogeneous Treatment and Spillover Effects under Clustered Network Interference
The bulk of causal inference studies rules out the presence of interference
between units. However, in many real-world settings units are interconnected by
social, physical or virtual ties and the effect of a treatment can spill from
one unit to other connected individuals in the network. In these settings,
interference should be taken into account to avoid biased estimates of the
treatment effect, but it can also be leveraged to save resources and provide
the intervention to a lower percentage of the population where the treatment is
more effective and where the effect can spill over to other susceptible
individuals. In fact, different people might respond differently not only to
the treatment received but also to the treatment received by their network
contacts. Understanding the heterogeneity of treatment and spillover effects
can help policy-makers in the scale-up phase of the intervention, it can guide
the design of targeting strategies with the ultimate goal of making the
interventions more cost-effective, and it might even allow generalizing the
level of treatment spillover effects in other populations. In this paper, we
develop a machine learning method that makes use of tree-based algorithms and
a Horvitz-Thompson estimator to assess the heterogeneity of treatment and
spillover effects with respect to individual, neighborhood and network
characteristics in the context of clustered network interference. We illustrate
how the proposed binary tree methodology performs in a Monte Carlo simulation
study. Additionally, we provide an application on a randomized experiment aimed
at assessing the heterogeneous effects of information sessions on the uptake of
a new weather insurance policy in rural China.
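
The Horvitz-Thompson idea mentioned in the abstract above can be sketched in a few lines. This is only a minimal illustration under simplifying assumptions, not the authors' method: it treats a single binary exposure with known assignment probabilities and ignores the network spillover conditions and the tree-based partitioning the paper describes. The function names `horvitz_thompson_mean` and `ht_effect` are hypothetical.

```python
def horvitz_thompson_mean(outcomes, in_condition, probs):
    """Estimate a mean potential outcome by inverse-probability weighting.

    outcomes:     observed outcome for each unit
    in_condition: True if the unit was observed under the condition of interest
    probs:        known probability that each unit ends up in that condition
    """
    n = len(outcomes)
    total = 0.0
    for y, observed, p in zip(outcomes, in_condition, probs):
        if observed:
            # Each observed unit stands in for 1/p units of the population.
            total += y / p
    return total / n


def ht_effect(outcomes, treated, p_treated):
    """Difference of two Horvitz-Thompson means (treated minus control)."""
    mean_treated = horvitz_thompson_mean(outcomes, treated, p_treated)
    mean_control = horvitz_thompson_mean(
        outcomes,
        [not t for t in treated],
        [1.0 - p for p in p_treated],
    )
    return mean_treated - mean_control


# Made-up data: four units, each treated with probability 0.5.
example_effect = ht_effect(
    outcomes=[2.0, 4.0, 6.0, 8.0],
    treated=[True, False, True, False],
    p_treated=[0.5, 0.5, 0.5, 0.5],
)
```

Under interference, the "condition" would presumably be a joint exposure (own treatment plus neighbors' treatment), with `probs` set to the probability of that joint exposure under the experimental design.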
Robust and Heterogenous Odds Ratio: Estimating Price Sensitivity for Unbought Items
Problem definition: Mining for heterogeneous responses to an intervention is
a crucial step for data-driven operations, for instance to personalize
treatment or pricing. We investigate how to estimate price sensitivity from
transaction-level data. In causal inference terms, we estimate heterogeneous
treatment effects when (a) the response to treatment (here, whether a customer
buys a product) is binary, and (b) treatment assignments are partially observed
(here, full information is only available for purchased items).
Methodology/Results: We propose a recursive partitioning procedure to estimate
heterogeneous odds ratio, a widely used measure of treatment effect in medicine
and social sciences. We integrate an adversarial imputation step to allow for
robust inference even in the presence of partially observed treatment assignments.
We validate our methodology on synthetic data and apply it to three case
studies from political science, medicine, and revenue management. Managerial
Implications: Our robust heterogeneous odds ratio estimation method is a simple
and intuitive tool to quantify heterogeneity in patients or customers and
personalize interventions, while lifting a central limitation in many revenue
management data.
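
For readers unfamiliar with the odds ratio used as the effect measure above, the following is a minimal sketch of how it is computed per customer segment from fully observed counts. It assumes complete 2x2 tables and does not reproduce the paper's recursive partitioning or adversarial imputation step. The function names and the counts are made up for illustration.

```python
def odds_ratio(treated_yes, treated_no, control_yes, control_no):
    """Odds ratio from the four cells of a 2x2 treatment-by-outcome table."""
    if min(treated_yes, treated_no, control_yes, control_no) <= 0:
        raise ValueError("every cell must be positive for a finite odds ratio")
    return (treated_yes / treated_no) / (control_yes / control_no)


def odds_ratio_by_segment(tables):
    """Apply odds_ratio to each segment's 2x2 table."""
    return {segment: odds_ratio(*cells) for segment, cells in tables.items()}


# Hypothetical counts per segment, in the order:
# (bought at discount, not bought at discount,
#  bought at full price, not bought at full price)
example_tables = {
    "new_customers": (30, 70, 10, 90),
    "returning_customers": (40, 60, 40, 60),
}
segment_odds_ratios = odds_ratio_by_segment(example_tables)
```

An odds ratio of 1 means the discount does not change the odds of purchase in that segment; values further from 1 indicate stronger price sensitivity.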
The Finite Sample Performance of Estimators for Mediation Analysis Under Sequential Conditional Independence
Using a comprehensive simulation study based on empirical data, this article investigates the finite sample properties of different classes of parametric and semiparametric estimators of (natural) direct and indirect causal effects used in mediation analysis under sequential conditional independence assumptions. The estimators are based on regression, inverse probability weighting, and combinations thereof. Our simulation design uses a large population of Swiss jobseekers and considers variations of several features of the data-generating process (DGP) and the implementation of the estimators that are of practical relevance. We find that no estimator performs uniformly best (in terms of root mean squared error) in all simulations. Overall, so-called “g-computation” dominates. However, differences between estimators are often (but not always) minor in the various setups and the relative performance of the methods often (but not always) varies with the features of the DGP.
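
As a rough illustration of what a g-computation style estimator of natural direct and indirect effects does, the sketch below applies the plug-in mediation formula to records of a binary treatment `d`, a discrete mediator `m`, and an outcome `y`. It assumes no covariates at all (as if treatment and mediator were unconfounded), which is far simpler than the covariate-adjusted estimators the article compares. The function name and the data are hypothetical.

```python
from collections import defaultdict


def mediation_effects(records):
    """Plug-in natural direct and indirect effects from (d, m, y) records.

    direct   = sum_m (E[Y|D=1,M=m] - E[Y|D=0,M=m]) * P(M=m|D=0)
    indirect = sum_m  E[Y|D=1,M=m] * (P(M=m|D=1) - P(M=m|D=0))
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    n_by_d = defaultdict(int)
    for d, m, y in records:
        sums[(d, m)] += y
        counts[(d, m)] += 1
        n_by_d[d] += 1

    mediator_values = sorted({m for (_, m) in counts})
    for d in (0, 1):
        for m in mediator_values:
            if counts.get((d, m), 0) == 0:
                raise ValueError(f"no observations with d={d}, m={m}")

    def mean_y(d, m):
        return sums[(d, m)] / counts[(d, m)]

    def prob_m(m, d):
        return counts[(d, m)] / n_by_d[d]

    direct = sum(
        (mean_y(1, m) - mean_y(0, m)) * prob_m(m, 0) for m in mediator_values
    )
    indirect = sum(
        mean_y(1, m) * (prob_m(m, 1) - prob_m(m, 0)) for m in mediator_values
    )
    return direct, indirect


# Made-up records: (treatment, mediator, outcome)
example_records = (
    [(0, 0, 1.0)] * 3 + [(0, 1, 3.0)] + [(1, 0, 2.0)] + [(1, 1, 4.0)] * 3
)
example_direct, example_indirect = mediation_effects(example_records)
```

In this toy data the two effects add up to the raw difference in mean outcomes between treated and untreated units, which is the usual decomposition of the total effect.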
The Economic Effect of Gaining a New Qualification Later in Life
Pursuing educational qualifications later in life is an increasingly common
phenomenon within OECD countries since technological change and automation
continue to drive the evolution of skills needed in many professions. We focus
on the causal impacts to economic returns of degrees completed later in life,
where motivations and capabilities to acquire additional education may be
distinct from education in early years. We find that completing an additional
degree leads to more than $3000 (AUD, 2019) extra income per year compared to
those who do not complete additional study. For outcomes, treatment and
controls we use the extremely rich and nationally representative longitudinal
data from the Household Income and Labour Dynamics Australia survey (HILDA). To
take full advantage of the complexity and richness of this data we use a
Machine Learning (ML) based methodology for causal effect estimation. We are
also able to use ML to discover sources of heterogeneity in the effects of
gaining additional qualifications. For example, those younger than 45 years of
age when obtaining additional qualifications tend to reap more benefits (as
much as $50 per week more) than others. Comment: 63 pages, 16 figures
Nonparametric Treatment Effect Identification in School Choice
We study identification and estimation of treatment effects in common school
choice settings, under unrestricted heterogeneity in individual potential
outcomes. We propose two notions of identification, corresponding to design-
and sampling-based uncertainty, respectively. We characterize the set of causal
estimands that are identified for a large variety of school choice mechanisms,
including ones that feature both random and non-random tie-breaking; we discuss
their policy implications. We also study the asymptotic behavior of
nonparametric estimators for these causal estimands. Lastly, we connect our
approach to the propensity score approach proposed in Abdulkadiroglu, Angrist,
Narita, and Pathak (2017a, forthcoming), and derive the implicit estimands of
the latter approach, under fully heterogeneous treatment effects. Comment: Presented at SOLE 202
Hypothesis testing and causal inference with heterogeneous medical data
Learning from data which associations hold and are likely to hold in the future is a fundamental part of scientific discovery. With increasingly heterogeneous data collection practices, exemplified by passively collected electronic health records or high-dimensional genetic data with only few observed samples, biases and spurious correlations are prevalent. These are called spurious because they do not contribute to the effect being studied. In this context, the modelling assumptions of existing statistical tests and causal inference methods are often found inadequate and their practical utility diminished even though these models are increasingly used as decision-support tools in practice. This thesis investigates how modern computational techniques may broaden the fields of hypothesis testing and causal inference to handle the subtleties of large heterogeneous data sets, as well as simultaneously improve the robustness and theoretical understanding of machine learning algorithms using insights from causality and statistics.
The first part of this thesis is concerned with hypothesis testing. We develop a framework for hypothesis testing on set-valued data, a representation that faithfully describes many real-world phenomena including patient biomarker trajectories in the hospital. Using similar techniques, we next develop a two-sample test for making inference on selection-biased data, in the sense that not all individuals are equally likely to be included in the study, a fact that biases tests if not accounted for and if the desideratum is to obtain conclusions that are generally applicable. We conclude this section with an investigation of conditional independence in high-dimensional data, such as found in gene expression data, and propose a test using generative adversarial networks.
The second part of this thesis is concerned with causal inference and discovery, with a special focus on the influence of unobserved confounders that distort the observed associations between variables and yet may not be ruled out or adjusted for using data alone. We start by demonstrating that unobserved confounders may substantially bias the generalization performance of machine learning algorithms trained with conventional learning paradigms such as empirical risk minimization. Acknowledging this spurious effect, we develop a new learning principle inspired by causal insights that provably generalizes to test data sampled from a larger set of distributions different from the training distribution. In the last chapter we consider the influence of unobserved confounders for causal discovery. We show that, under some assumptions on the type and influence of unobserved confounding, one may develop provably consistent causal discovery algorithms, formulated as a solution to a continuous optimization program.