Covariate-assisted ranking and screening for large-scale two-sample inference
Two-sample multiple testing has a wide range of applications. The conventional practice first reduces the original observations to a vector of p-values and then chooses a cutoff to adjust for multiplicity. However, this data reduction step could cause significant loss of information and thus lead to suboptimal testing procedures. We introduce a new framework for two-sample multiple testing by incorporating a carefully constructed auxiliary variable in inference to improve the power. A data-driven multiple-testing procedure is developed by employing a covariate-assisted ranking and screening (CARS) approach that optimally combines the information from both the primary and the auxiliary variables. The proposed CARS procedure is shown to be asymptotically valid and optimal for false discovery rate control. The procedure is implemented in the R package CARS. Numerical results confirm the effectiveness of CARS in false discovery rate control and show that it achieves substantial power gain over existing methods. CARS is also illustrated through an application to the analysis of a satellite imaging data set for supernova detection.
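As an illustration of the "conventional practice" the abstract contrasts itself with, the sketch below applies the Benjamini-Hochberg step-up rule to a vector of p-values. This is an assumed, commonly used example of a p-value cutoff procedure, not the CARS method and not the API of the CARS package; the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices rejected by the Benjamini-Hochberg step-up rule.

    Illustrative baseline only: it uses nothing but the p-values, which is
    the data reduction step the abstract argues can lose information.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose p-value falls under its step-up threshold
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    return sorted(order[:k])


if __name__ == "__main__":
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(p, alpha=0.05))
```

The rule sees only the ranked p-values, so any auxiliary variable of the kind CARS exploits has no way to influence the ordering.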
Weighted False Discovery Rate Control in Large-Scale Multiple Testing
The use of weights provides an effective strategy to incorporate prior domain
knowledge in large-scale inference. This paper studies weighted multiple
testing in a decision-theoretic framework. We develop oracle and data-driven
procedures that aim to maximize the expected number of true positives subject
to a constraint on the weighted false discovery rate. The asymptotic validity
and optimality of the proposed methods are established. The results demonstrate
that incorporating informative domain knowledge enhances the interpretability
of results and precision of inference. Simulation studies show that the
proposed method controls the error rate at the nominal level, and the gain in
power over existing methods is substantial in many settings. An application to
a genome-wide association study is discussed.
Optimal Screening and Discovery of Sparse Signals with Applications to Multistage High-throughput Studies
A common feature in large-scale scientific studies is that signals are sparse and it is desirable to significantly narrow down the focus to a much smaller subset in a sequential manner. In this paper, we consider two related data screening problems: one is to find the smallest subset such that it virtually contains all signals and another is to find the largest subset such that it essentially contains only signals. These screening problems are closely connected to but distinct from the more conventional signal detection or multiple testing problems. We develop data-driven screening procedures which control the error rates with near optimality properties and study how to design the experiments efficiently to achieve the goals in data screening. A class of new phase diagrams is developed to characterize the fundamental limitations in simultaneous inference. An application to multistage high-throughput studies is given to illustrate the merits of the proposed screening methods.
ZAP: z-value Adaptive Procedures for False Discovery Rate Control with Side Information
Adaptive multiple testing with covariates is an important research direction
that has gained major attention in recent years. It has been widely recognized
that leveraging side information provided by auxiliary covariates can improve
the power of false discovery rate (FDR) procedures. Currently, most such
procedures are devised with p-values as their main statistics. However, for
two-sided hypotheses, the usual data processing step that transforms the
primary statistics, known as z-values, into p-values not only leads to a
loss of information carried by the main statistics, but can also undermine the
ability of the covariates to assist with the FDR inference. We develop a
z-value based covariate-adaptive (ZAP) methodology that operates on the
intact structural information encoded jointly by the z-values and covariates.
It seeks to emulate the oracle z-value procedure via a working model, and its
rejection regions significantly depart from those of the p-value adaptive
testing approaches. The key strength of ZAP is that the FDR control is
guaranteed with minimal assumptions, even when the working model is
misspecified. We demonstrate the state-of-the-art performance of ZAP using both
simulated and real data, which shows that the efficiency gain can be
substantial in comparison with p-value based methods. Our methodology is
implemented in the package
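The information loss described in this abstract can be seen in a small sketch: for a two-sided test, z-values of opposite sign map to the same p-value, so the direction of the effect is discarded. The code assumes the z-values are standard normal under the null; the function name `two_sided_p_value` is hypothetical and is not taken from the ZAP software.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value P(|Z| >= |z|) for Z ~ N(0, 1).

    Assumption for illustration: the null distribution of the z-value is
    standard normal.
    """
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    for z in (-2.5, 2.5):
        # Both signs yield the same p-value; the sign is lost.
        print(z, two_sided_p_value(z))
```

Any procedure that starts from these p-values cannot distinguish a strongly negative z-value from a strongly positive one, even when a covariate would suggest which direction is more plausible.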
A Locally Adaptive Shrinkage Approach to False Selection Rate Control in High-Dimensional Classification
The uncertainty quantification and error control of classifiers are crucial
in many high-consequence decision-making scenarios. We propose a selective
classification framework that provides an indecision option for any
observations that cannot be classified with confidence. The false selection
rate (FSR), defined as the expected fraction of erroneous classifications among
all definitive classifications, provides a useful error rate notion that trades
off a fraction of indecisions for fewer classification errors. We develop a new
class of locally adaptive shrinkage and selection (LASS) rules for FSR control
in the context of high-dimensional linear discriminant analysis (LDA). LASS is
easy to analyze and has robust performance across sparse and dense regimes.
Theoretical guarantees on FSR control are established without strong
assumptions on sparsity as required by existing theories in high-dimensional
LDA. The empirical performances of LASS are investigated using both simulated
and real data.
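The FSR definition above can be made concrete with a minimal sketch that computes the realized false selection proportion for one data set; the FSR is the expectation of this quantity. This is not the LASS rule itself. The function name, the use of `None` to encode an indecision, and the convention of returning 0 when no definitive classification is made are all assumptions for illustration.

```python
def false_selection_proportion(true_labels, decisions):
    """Fraction of erroneous classifications among definitive ones.

    `decisions` holds a predicted label, or None for an indecision
    (an encoding assumed for this sketch).
    """
    # Keep only the observations that received a definitive classification.
    definitive = [(t, d) for t, d in zip(true_labels, decisions) if d is not None]
    errors = sum(1 for t, d in definitive if t != d)
    # Assumed convention: the proportion is 0 when nothing is selected.
    return errors / max(len(definitive), 1)


if __name__ == "__main__":
    truth = [0, 1, 1, 0, 1]
    decided = [0, 1, None, 1, None]  # two indecisions, one error
    print(false_selection_proportion(truth, decided))
```

Declaring more indecisions shrinks the set of definitive classifications, which is the trade-off between indecisions and classification errors that the abstract describes.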