Adaptive Identification of Populations with Treatment Benefit in Clinical Trials: Machine Learning Challenges and Solutions
We study the problem of adaptively identifying patient subpopulations that
benefit from a given treatment during a confirmatory clinical trial. This type
of adaptive clinical trial has been thoroughly studied in biostatistics, but
has been allowed only limited adaptivity so far. Here, we aim to relax
classical restrictions on such designs and investigate how to incorporate ideas
from the recent machine learning literature on adaptive and online
experimentation to make trials more flexible and efficient. We find that the
unique characteristics of the subpopulation selection problem -- most
importantly that (i) one is usually interested in finding subpopulations with
any treatment benefit (and not necessarily the single subgroup with largest
effect) given a limited budget and that (ii) effectiveness only has to be
demonstrated across the subpopulation on average -- give rise to interesting
challenges and new desiderata when designing algorithmic solutions. Building on
these findings, we propose AdaGGI and AdaGCPI, two meta-algorithms for
subpopulation construction. We empirically investigate their performance across
a range of simulation scenarios and derive insights into their (dis)advantages
across different settings. Comment: To appear in the Proceedings of the 40th International Conference on
Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 202
Statistical Challenges in Online Controlled Experiments: A Review of A/B Testing Methodology
The rise of internet-based services and products in the late 1990s brought
about an unprecedented opportunity for online businesses to engage in
large-scale data-driven decision making. Over the past two decades, organizations
such as Airbnb, Alibaba, Amazon, Baidu, Booking, Alphabet's Google, LinkedIn,
Lyft, Meta's Facebook, Microsoft, Netflix, Twitter, Uber, and Yandex have
invested tremendous resources in online controlled experiments (OCEs) to assess
the impact of innovation on their customers and businesses. Running OCEs at
scale has presented a host of challenges requiring solutions from many domains.
In this paper we review challenges that require new statistical methodologies
to address them. In particular, we discuss the practice and culture of online
experimentation, as well as its statistics literature, placing the current
methodologies within their relevant statistical lineages and providing
illustrative examples of OCE applications. Our goal is to raise academic
statisticians' awareness of these new research opportunities to increase
collaboration between academia and the online industry.
Celebrating 70: An Interview with Don Berry
Donald (Don) Arthur Berry, born May 26, 1940 in Southbridge, Massachusetts,
earned his A.B. degree in mathematics from Dartmouth College and his M.A. and
Ph.D. in statistics from Yale University. He served first on the faculty at the
University of Minnesota and subsequently held endowed chair positions at Duke
University and The University of Texas M.D. Anderson Center. At the time of the
interview he served as Head of the Division of Quantitative Sciences, and
Chairman and Professor of the Department of Biostatistics at UT M.D. Anderson
Center. Comment: Published at http://dx.doi.org/10.1214/11-STS366 in
Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
Sequential stopping for high-throughput experiments
In high-throughput experiments, the sample size is typically chosen informally. Most formal sample-size calculations depend critically on prior knowledge. We propose a sequential strategy that, by updating knowledge as new data become available, depends less critically on prior assumptions. Experiments are stopped or continued based on the potential benefit of obtaining additional data. The underlying decision-theoretic framework guarantees that the design proceeds in a coherent fashion. We propose intuitively appealing, easy-to-implement utility functions. As in most sequential design problems, an exact solution is computationally prohibitive, so we propose a simulation-based approximation that uses decision boundaries. We apply the method to RNA-seq, microarray, and reverse-phase protein array studies and show its potential advantages. The approach has been added to the Bioconductor package gaga.