The Optimal Discovery Procedure: A New Approach to Simultaneous Significance Testing
Significance testing is one of the main objectives of statistics. The Neyman-Pearson lemma provides a simple rule for optimally testing a single hypothesis when the null and alternative distributions are known. This result has played a major role in the development of significance testing strategies that are used in practice. Most of the work extending single testing strategies to multiple tests has focused on formulating and estimating new types of significance measures, such as the false discovery rate. These methods tend to be based on p-values that are calculated from each test individually, ignoring information from the other tests. As shrinkage estimation borrows strength across point estimates to improve their overall performance, I show here that borrowing strength across multiple significance tests can improve their performance as well. The optimal discovery procedure (ODP) is introduced, which shows how to maximize the number of expected true positives for each fixed number of expected false positives. The optimality achieved by this procedure is shown to be closely related to optimality in terms of the false discovery rate. The ODP motivates a new approach to testing multiple hypotheses, especially when the tests are related. As a simple example, a new simultaneous procedure for testing several Normal means is defined; this is surprisingly demonstrated to outperform the optimal single test procedure, showing that an optimal method for single tests may no longer be optimal in the multiple test setting. Connections to other concepts in statistics are discussed, including Stein's paradox, shrinkage estimation, and Bayesian classification theory.
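The idea of borrowing strength across tests can be made concrete with a small sketch of an ODP-style statistic for the Normal-means example: each observation is scored by the sum of all tests' alternative densities over the sum of all their null densities. The data, effect size, and use of the true alternative means are assumptions for illustration; in practice the alternative means would be estimated.

```python
# Sketch of an ODP-style statistic for testing m Normal means mu_i = 0
# vs mu_i != 0. Setup is illustrative: 50 of 200 tests are true signals
# with mean 2; the true means stand in for estimated alternative means.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
m = 200
mu = np.where(np.arange(m) < 50, 2.0, 0.0)  # first 50 tests are signals
x = rng.normal(mu, 1.0)                     # one observation per test

def odp_statistic(x, alt_means, sigma=1.0):
    """Score each observation by summed alternative densities over
    summed null densities, pooling information across all tests."""
    # (n, m) matrix: density of each observation under each test's alternative
    alt = norm.pdf(x[:, None], loc=alt_means[None, :], scale=sigma).sum(axis=1)
    # every test shares the same null N(0, sigma^2)
    null = len(alt_means) * norm.pdf(x, loc=0.0, scale=sigma)
    return alt / null

s = odp_statistic(x, mu)
# Reject the hypotheses with the largest statistics; true signals
# should tend to score higher than nulls.
```

Note how a single test's statistic depends on every test's alternative density, which is exactly the "borrowing strength" the abstract describes.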
False discovery rate regression: an application to neural synchrony detection in primary visual cortex
Many approaches for multiple testing begin with the assumption that all tests
in a given study should be combined into a global false-discovery-rate
analysis. But this may be inappropriate for many of today's large-scale
screening problems, where auxiliary information about each test is often
available, and where a combined analysis can lead to poorly calibrated error
rates within different subsets of the experiment. To address this issue, we
introduce an approach called false-discovery-rate regression that directly uses
this auxiliary information to inform the outcome of each test. The method can
be motivated by a two-groups model in which covariates are allowed to influence
the local false discovery rate, or equivalently, the posterior probability that
a given observation is a signal. This poses many subtle issues at the interface
between inference and computation, and we investigate several variations of the
overall approach. Simulation evidence suggests that: (1) when covariate effects
are present, FDR regression improves power for a fixed false-discovery rate;
and (2) when covariate effects are absent, the method is robust, in the sense
that it does not lead to inflated error rates. We apply the method to neural
recordings from primary visual cortex. The goal is to detect pairs of neurons
that exhibit fine-time-scale interactions, in the sense that they fire together
more often than expected by chance. Our method detects roughly 50% more
synchronous pairs than a standard FDR-controlling analysis. The companion R
package FDRreg implements all methods described in the paper.
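The two-groups model behind this approach can be sketched in a few lines: the local false discovery rate is the posterior probability of the null, and FDR regression lets the prior probability of being a signal depend on covariates. Everything below is an illustrative stand-in (simulated data, a known alternative density, and an oracle prior in place of the fitted regression), not the paper's estimation procedure.

```python
# Two-groups sketch with a covariate-dependent mixing weight, the
# structure underlying FDR regression. Data and densities are made up;
# the "prior" here is the oracle value a regression would estimate.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 4000
covariate = rng.integers(0, 2, n)          # binary covariate (assumed)
pi1 = np.where(covariate == 1, 0.4, 0.05)  # signal rate varies with covariate
is_signal = rng.random(n) < pi1
z = rng.normal(np.where(is_signal, 3.0, 0.0), 1.0)

f0 = norm.pdf(z)           # null density N(0, 1)
f1 = norm.pdf(z, loc=3.0)  # alternative density (assumed known here)

def local_fdr(f0, f1, prior_signal):
    """Local FDR: posterior probability that the observation is null."""
    post_signal = prior_signal * f1 / (prior_signal * f1 + (1 - prior_signal) * f0)
    return 1.0 - post_signal

# Covariate-dependent prior (oracle stand-in for the fitted regression):
lfdr = local_fdr(f0, f1, pi1)
```

A global analysis would use a single prior for both covariate groups, which is why it can be poorly calibrated within subsets when the signal rate truly varies.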
Multiple testing procedures under confounding
While multiple testing procedures have been the focus of much statistical
research, an important facet of the problem is how to deal with possible
confounding. Procedures have been developed by authors in genetics and
statistics. In this chapter, we relate these proposals. We propose two new
multiple testing approaches within this framework. The first combines
sensitivity analysis methods with false discovery rate estimation procedures.
The second involves construction of shrinkage estimators that utilize the
mixture model for multiple testing. The procedures are illustrated with
applications to a gene expression profiling experiment in prostate cancer.
Published in the IMS Collections
(http://www.imstat.org/publications/imscollections.htm) by the Institute of
Mathematical Statistics (http://www.imstat.org), at
http://dx.doi.org/10.1214/193940307000000176.
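One ingredient this chapter combines with sensitivity analysis is false-discovery-rate estimation. A common sketch of that ingredient, in the style of Storey's pi0 estimator, is shown below; the tuning parameter lambda and the simulated p-values are illustrative, not the chapter's specific procedure.

```python
# Sketch of FDR estimation: null p-values are uniform, so the share of
# p-values above a cutoff lambda, rescaled, estimates the null
# proportion pi0, which then scales the estimated FDR at any threshold.
import numpy as np

rng = np.random.default_rng(3)
# 900 null p-values (uniform) plus 100 signal p-values pushed toward 0
p = np.concatenate([rng.random(900), rng.beta(0.1, 5.0, 100)])

def estimate_pi0(p, lam=0.5):
    """Estimate the proportion of true nulls from p-values above lam."""
    return np.mean(p > lam) / (1.0 - lam)

def estimate_fdr(p, threshold, lam=0.5):
    """Estimated FDR when rejecting every p-value <= threshold."""
    pi0 = estimate_pi0(p, lam)
    n_reject = max(int(np.sum(p <= threshold)), 1)
    return pi0 * len(p) * threshold / n_reject

pi0_hat = estimate_pi0(p)  # true value in this simulation is 0.9
```

Confounding complicates this picture because it distorts the null distribution of the p-values themselves, which is what motivates pairing FDR estimation with sensitivity analysis.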
False Discovery Rate Controlled Heterogeneous Treatment Effect Detection for Online Controlled Experiments
Online controlled experiments (a.k.a. A/B testing) have been used as the
mantra for data-driven decision making on feature changing and product shipping
in many Internet companies. However, it is still a great challenge to
systematically measure how every code or feature change impacts millions of
users with great heterogeneity (e.g. countries, ages, devices). The most
commonly used A/B testing framework in many companies is based on Average
Treatment Effect (ATE), which cannot detect the heterogeneity of treatment
effect on users with different characteristics. In this paper, we propose
statistical methods that can systematically and accurately identify
Heterogeneous Treatment Effect (HTE) of any user cohort of interest (e.g.
mobile device type, country), and determine which factors (e.g. age, gender) of
users contribute to the heterogeneity of the treatment effect in an A/B test.
By applying these methods on both simulation data and real-world
experimentation data, we show that they work robustly with a controlled low
False Discovery Rate (FDR) and, at the same time, provide useful insights
about the heterogeneity of identified user groups. We have deployed a toolkit
based on these methods, and have used it to measure the Heterogeneous Treatment
Effect of many A/B tests at Snap.
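The generic shape of FDR-controlled cohort testing can be sketched with per-cohort treatment-effect tests followed by a Benjamini-Hochberg correction. The cohort names, true lifts, and use of simple t-tests are illustrative assumptions, not the paper's specific HTE method.

```python
# Sketch: test the treatment effect in each user cohort separately,
# then control the FDR across cohorts with Benjamini-Hochberg.
# Cohorts and lifts are made up for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
cohorts = {"ios": 0.5, "android": 0.0, "web": 0.0, "tablet": 0.4}  # true lifts

pvals = {}
for name, lift in cohorts.items():
    control = rng.normal(0.0, 1.0, 500)
    treatment = rng.normal(lift, 1.0, 500)
    pvals[name] = stats.ttest_ind(treatment, control).pvalue

def benjamini_hochberg(p, alpha=0.05):
    """Return a boolean array of which hypotheses BH rejects at level alpha."""
    p = np.asarray(p)
    m = len(p)
    order = np.argsort(p)
    thresh = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest k with p_(k) <= alpha*k/m
        reject[order[: k + 1]] = True
    return reject

names = list(pvals)
rejected = dict(zip(names, benjamini_hochberg([pvals[n] for n in names])))
```

An ATE-only analysis would pool all four cohorts and could miss the iOS and tablet effects entirely; testing per cohort surfaces them, while the BH step keeps the FDR across cohorts controlled.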