Designing and Deploying Online Field Experiments
Online experiments are widely used to compare specific design alternatives,
but they can also be used to produce generalizable knowledge and inform
strategic decision making. Doing so often requires sophisticated experimental
designs, iterative refinement, and careful logging and analysis. Few tools
exist that support these needs. We thus introduce a language for online field
experiments called PlanOut. PlanOut separates experimental design from
application code, allowing the experimenter to concisely describe experimental
designs, whether common "A/B tests" and factorial designs, or more complex
designs involving conditional logic or multiple experimental units. These
latter designs are often useful for understanding causal mechanisms involved in
user behaviors. We demonstrate how experiments from the literature can be
implemented in PlanOut, and describe two large field experiments conducted on
Facebook with PlanOut. For common scenarios in which experiments are run
iteratively and in parallel, we introduce a namespaced management system that
encourages sound experimental practice.
Comment: Proceedings of the 23rd International Conference on World Wide Web,
283-29
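To illustrate the kind of randomization such a system depends on, here is a minimal sketch of deterministic, hash-based assignment for a 2x2 factorial design, plus a toy namespace that splits users into segments owned by experiments. Everything here is an assumption made for illustration: the salt format, the use of SHA-1, the 15-hex-digit truncation, the factor names (`factor_a`, `factor_b`) and the helpers `hash_to_bucket`, `assign_factorial` and `namespace_lookup` are hypothetical and are not presented as PlanOut's own API.

```python
import hashlib

def hash_to_bucket(salt: str, unit_id: str, num_buckets: int) -> int:
    """Deterministically map a (salt, unit) pair to an integer in [0, num_buckets)."""
    digest = hashlib.sha1(f"{salt}.{unit_id}".encode("utf-8")).hexdigest()
    # Interpret the first 15 hex digits as an integer, then reduce modulo num_buckets.
    return int(digest[:15], 16) % num_buckets

def assign_factorial(experiment_salt: str, user_id: str) -> dict:
    """Hypothetical 2x2 factorial design; each factor is hashed with its own salt
    so the two factors are assigned independently of each other."""
    factor_a_levels = ["control", "variant"]
    factor_b_levels = ["short_text", "long_text"]
    a = factor_a_levels[hash_to_bucket(f"{experiment_salt}.factor_a", user_id, 2)]
    b = factor_b_levels[hash_to_bucket(f"{experiment_salt}.factor_b", user_id, 2)]
    return {"factor_a": a, "factor_b": b}

NUM_SEGMENTS = 100
# Hypothetical namespace: one experiment owns half the segments; the rest stay unallocated.
SEGMENT_OWNER = {segment: "exp_1" for segment in range(50)}

def namespace_lookup(namespace_salt: str, user_id: str):
    """Return the experiment that owns this user's segment, or None (use default parameters)."""
    segment = hash_to_bucket(namespace_salt, user_id, NUM_SEGMENTS)
    return SEGMENT_OWNER.get(segment)
```

Because assignment is a pure function of the salt and the unit ID, the same user always receives the same parameters without any stored state. Giving each factor its own salt keeps the factors independent. Allocating segments inside a namespace lets a later experiment reuse the unallocated segments without overlapping the first one.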
Using functional annotation to characterize genome-wide association results
Genome-wide association studies (GWAS) have successfully identified thousands of variants robustly associated with hundreds of complex traits, but the biological mechanisms driving these results remain elusive. Functional annotation, describing the roles of known genes and regulatory elements, provides additional information about associated variants. This dissertation explores the potential of these annotations to explain the biology behind observed GWAS results.
The first project develops a random-effects approach to genetic fine mapping of trait-associated loci. Functional annotation and estimates of the enrichment of genetic effects in each annotation category are integrated with linkage disequilibrium (LD) within each locus and GWAS summary statistics to prioritize variants with plausible functionality. Applied to simulated and real data, this method performs well across a wider range of scenarios than previous approaches. The second project focuses on estimating enrichment by annotation category. I derive the distribution of GWAS summary statistics as a function of annotations and LD structure and perform maximum likelihood estimation of enrichment coefficients in two simulated scenarios. The resulting estimates are less variable than those of previous methods, but the asymptotic theory of standard errors is often not applicable because the likelihood function is non-convex. In the third project, I investigate the problem of selecting an optimal set of tissue-specific annotations with the greatest relevance to a trait of interest. I consider three selection criteria defined in terms of the mutual information between functional annotations and GWAS summary statistics. These algorithms correctly identify enriched categories in simulated data. In an application to a GWAS of BMI, however, the penalty for redundant features outweighs their modest relationships with the outcome, so the selected feature sets are empty; this reflects the weaker overall association and the high similarity among tissue-specific regulatory features.
All three projects require few prior hypotheses about the mechanism of genetic effects. These data-driven approaches have the potential to illuminate unanticipated biological relationships, but they are also limited by the high dimensionality of the data relative to the moderate strength of the signals under investigation. These approaches expand the set of tools available to researchers for drawing biological insights from GWAS results.
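The third project scores annotations by their mutual information with GWAS summary statistics while penalizing redundancy. The dissertation's three criteria are not reproduced here. As a stand-in, the sketch below uses a plug-in mutual information estimate and a greedy max-relevance, min-redundancy (mRMR) rule, which is a different, swapped-in technique. The z-score threshold, the `min_score` stopping rule and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter

def discretize(z_scores, threshold=1.96):
    """Map GWAS z-scores to 1 (|z| above threshold) or 0; the threshold is an assumption."""
    return [1 if abs(z) > threshold else 0 for z in z_scores]

def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y), in nats, for two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log(pxy / ((px[x] / n) * (py[y] / n)))
    return mi

def greedy_mrmr(features, target, min_score=0.0):
    """Greedy max-relevance, min-redundancy selection with a stopping rule.

    features: dict mapping annotation name -> per-variant 0/1 indicators.
    target: per-variant discretized association signal.
    A feature is added only if relevance minus mean redundancy with the
    already-selected features exceeds min_score.
    """
    selected = []
    remaining = dict(features)
    while remaining:
        best_name, best_score = None, -math.inf
        for name, values in remaining.items():
            relevance = mutual_information(values, target)
            redundancy = (
                sum(mutual_information(values, features[s]) for s in selected) / len(selected)
                if selected else 0.0
            )
            score = relevance - redundancy
            if score > best_score:
                best_name, best_score = name, score
        if best_score <= min_score:
            break
        selected.append(best_name)
        del remaining[best_name]
    return selected
```

In this sketch a near-duplicate of an already-selected annotation scores close to zero, so it is rejected. With a positive `min_score`, weak relevance alone can also produce an empty selection.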
Leveraging text data for causal inference using electronic health records
Text is a ubiquitous component of medical data, containing valuable
information about patient characteristics and care that are often missing from
structured chart data. Despite this richness, it is rarely used in clinical
research, owing partly to its complexity. Using a large database of patient
records and treatment histories accompanied by extensive notes by attendant
physicians and nurses, we show how text data can be used to support causal
inference with electronic health data in all stages, from conception and design
to analysis and interpretation, with minimal additional effort. We focus on
studies using matching for causal inference. We augment a classic matching
analysis by incorporating text in three ways: by using text to supplement a
multiple imputation procedure, we improve the fidelity of imputed values to
handle missing data; by incorporating text in the matching stage, we strengthen
the plausibility of the matching procedure; and by conditioning on text, we can
estimate easily interpretable text-based heterogeneous treatment effects that
may be stronger than those found across categories of structured covariates.
Using these techniques, we hope to expand the scope of secondary analysis of
clinical data to domains where quantitative data is of poor quality or
nonexistent, but where text is available, such as in developing countries.
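To show how text can enter the matching stage, here is a minimal sketch of 1:1 nearest-neighbor matching (with replacement) on structured covariates augmented by binary keyword indicators from clinical notes, followed by an average-treatment-effect-on-the-treated estimate. This is an illustrative stand-in, not the authors' procedure: the record layout, the keyword vocabulary, the `text_weight` parameter and the function names are hypothetical. The imputation and heterogeneous-effect steps are omitted.

```python
import math

def text_features(note, vocabulary):
    """Binary bag-of-words indicators for a fixed (hypothetical) keyword vocabulary."""
    tokens = set(note.lower().split())
    return [1.0 if word in tokens else 0.0 for word in vocabulary]

def featurize(record, vocabulary, text_weight=1.0):
    """Concatenate structured covariates with weighted text indicators."""
    structured = list(record["covariates"])
    text = [text_weight * v for v in text_features(record["note"], vocabulary)]
    return structured + text

def match_and_estimate_att(records, vocabulary, text_weight=1.0):
    """Match each treated record to its nearest control (Euclidean distance,
    with replacement) and average the treated-minus-control outcome differences."""
    treated = [r for r in records if r["treated"]]
    controls = [r for r in records if not r["treated"]]
    control_feats = [featurize(c, vocabulary, text_weight) for c in controls]
    diffs = []
    for t in treated:
        ft = featurize(t, vocabulary, text_weight)
        best = min(range(len(controls)), key=lambda j: math.dist(ft, control_feats[j]))
        diffs.append(t["outcome"] - controls[best]["outcome"])
    return sum(diffs) / len(diffs)
```

Setting `text_weight=0.0` recovers matching on structured covariates alone. When patients look identical on the chart, the note indicators become the only thing that separates candidate controls.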
Robust and Heterogeneous Odds Ratio: Estimating Price Sensitivity for Unbought Items
Problem definition: Mining for heterogeneous responses to an intervention is
a crucial step for data-driven operations, for instance to personalize
treatment or pricing. We investigate how to estimate price sensitivity from
transaction-level data. In causal inference terms, we estimate heterogeneous
treatment effects when (a) the response to treatment (here, whether a customer
buys a product) is binary, and (b) treatment assignments are partially observed
(here, full information is only available for purchased items).
Methodology/Results: We propose a recursive partitioning procedure to estimate
heterogeneous odds ratios, a widely used measure of treatment effect in medicine
and the social sciences. We integrate an adversarial imputation step to allow for
robust inference even in the presence of partially observed treatment assignments.
We validate our methodology on synthetic data and apply it to three case
studies from political science, medicine, and revenue management.
Managerial Implications: Our robust heterogeneous odds ratio estimation method is a simple
and intuitive tool to quantify heterogeneity in patients or customers and
personalize interventions, while lifting a central limitation of many revenue
management datasets.
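For readers unfamiliar with the estimand, here is a minimal sketch of an odds ratio computed from a 2x2 table of (treated, bought) outcomes, with a 0.5 continuity correction. It is paired with one greedy step of recursive partitioning that picks the split threshold maximizing the gap in log odds ratio between the two child groups. The split criterion, the continuity correction, the record fields (`treated`, `bought`, a numeric feature) and the function names are assumptions. The paper's adversarial imputation of partially observed treatments is not modeled here.

```python
import math

def odds_ratio(rows):
    """rows: iterable of (treated, bought) booleans.
    Returns ((a + .5)(d + .5)) / ((b + .5)(c + .5)) for the 2x2 table
    a = treated & bought, b = treated & not bought,
    c = untreated & bought, d = untreated & not bought."""
    a = b = c = d = 0
    for treated, bought in rows:
        if treated and bought:
            a += 1
        elif treated:
            b += 1
        elif bought:
            c += 1
        else:
            d += 1
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))

def best_split(data, feature, thresholds, min_leaf=1):
    """One partitioning step: return (threshold, gap) maximizing
    |log OR(left) - log OR(right)|, or None if no threshold is valid."""
    best = None
    for t in thresholds:
        left = [(r["treated"], r["bought"]) for r in data if r[feature] <= t]
        right = [(r["treated"], r["bought"]) for r in data if r[feature] > t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        gap = abs(math.log(odds_ratio(left)) - math.log(odds_ratio(right)))
        if best is None or gap > best[1]:
            best = (t, gap)
    return best
```

Applying `best_split` recursively to each child, with a minimum leaf size, yields a tree whose leaves carry their own odds ratio estimates.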