Search CORE

233 research outputs found


Covariate-assisted ranking and screening for large-scale two-sample inference

Author: Cai T. Tony
Sun Wenguang
Wang Weinan
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Two-sample multiple testing has a wide range of applications. The conventional practice first reduces the original observations to a vector of p-values and then chooses a cutoff to adjust for multiplicity. However, this data reduction step could cause significant loss of information and thus lead to suboptimal testing procedures. We introduce a new framework for two-sample multiple testing by incorporating a carefully constructed auxiliary variable in inference to improve the power. A data-driven multiple-testing procedure is developed by employing a covariate-assisted ranking and screening (CARS) approach that optimally combines the information from both the primary and the auxiliary variables. The proposed CARS procedure is shown to be asymptotically valid and optimal for false discovery rate control. The procedure is implemented in the R package CARS. Numerical results confirm the effectiveness of CARS in false discovery rate control and show that it achieves substantial power gain over existing methods. CARS is also illustrated through an application to the analysis of a satellite imaging data set for supernova detection.

eScholarship - University of California
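The "conventional practice" that the CARS abstract argues against (reduce each test to a p-value, then pick one cutoff to adjust for multiplicity) is commonly carried out with the Benjamini–Hochberg step-up rule. The sketch below shows that baseline only; it is not the CARS procedure, and the function name `bh_reject` is hypothetical.

```python
def bh_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up rule: return sorted indices of rejected hypotheses."""
    m = len(p_values)
    # Rank hypotheses by p-value, smallest first.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= k * alpha / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max smallest p-values; everything else is retained.
    return sorted(order[:k_max])


p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(bh_reject(p, alpha=0.05))  # [0, 1]
```

Only the p-values enter this rule, so any auxiliary information in the raw two-sample data is discarded before the cutoff is chosen, which is the loss the abstract describes.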

Weighted False Discovery Rate Control in Large-Scale Multiple Testing

Author: Basu Pallavi
Cai T. Tony
Das Kiranmoy
Sun Wenguang
Publication date: 09/05/2017

The use of weights provides an effective strategy to incorporate prior domain knowledge in large-scale inference. This paper studies weighted multiple testing in a decision-theoretic framework. We develop oracle and data-driven procedures that aim to maximize the expected number of true positives subject to a constraint on the weighted false discovery rate. The asymptotic validity and optimality of the proposed methods are established. The results demonstrate that incorporating informative domain knowledge enhances the interpretability of results and the precision of inference. Simulation studies show that the proposed method controls the error rate at the nominal level, and the gain in power over existing methods is substantial in many settings. An application to genome-wide association studies is discussed.

arXiv.org e-Print Archive

ScholarlyCommons@Penn

FigShare
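To make the weighted error rate concrete, the sketch below computes a realized weighted false discovery proportion: each rejection contributes its weight, and the false ones are those whose hypothesis is actually null. This assumes the common form (weighted false rejections divided by total weighted rejections, with 0/0 treated as 0); the abstract does not state the paper's exact definition, for example whether it uses an expectation of the ratio or a ratio of expectations. The function name `weighted_fdp` is hypothetical.

```python
def weighted_fdp(weights, is_null, rejected):
    """Realized weighted false discovery proportion (assumed definition).

    weights  -- nonnegative weight attached to each hypothesis
    is_null  -- True if hypothesis i is actually null
    rejected -- True if the procedure rejects hypothesis i
    """
    # Total weight carried by all rejections.
    total_weight = sum(w for w, r in zip(weights, rejected) if r)
    # Weight carried by rejections of true nulls (false discoveries).
    false_weight = sum(w for w, n, r in zip(weights, is_null, rejected) if r and n)
    # Assumed convention: no rejections means no false discoveries.
    return false_weight / total_weight if total_weight > 0 else 0.0


weights = [3.0, 1.0, 1.0, 0.5]
is_null = [False, True, False, True]
rejected = [True, True, True, False]
print(weighted_fdp(weights, is_null, rejected))        # 0.2 (1.0 out of 5.0)
print(weighted_fdp([1.0] * 4, is_null, rejected))      # about 0.333, the unweighted FDP
```

With equal weights the quantity reduces to the ordinary false discovery proportion; unequal weights let a false rejection of a low-priority hypothesis count for less.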

Optimal Screening and Discovery of Sparse Signals with Applications to Multistage High-throughput Studies

Author: Cai Tony
Sun Wenguang
Publication venue: ScholarlyCommons
Publication date: 01/01/2017

A common feature in large-scale scientific studies is that signals are sparse and it is desirable to significantly narrow down the focus to a much smaller subset in a sequential manner. In this paper, we consider two related data screening problems: one is to find the smallest subset that contains virtually all signals, and the other is to find the largest subset that contains essentially only signals. These screening problems are closely connected to but distinct from the more conventional signal detection or multiple testing problems. We develop data-driven screening procedures that control the error rates with near-optimality properties and study how to design the experiments efficiently to achieve the goals in data screening. A class of new phase diagrams is developed to characterize the fundamental limitations in simultaneous inference. An application to multistage high-throughput studies is given to illustrate the merits of the proposed screening methods.

ScholarlyCommons@Penn
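The two screening goals can be phrased through two realized error proportions: the share of true signals left out of the selected subset (kept near zero while making the subset small) and the share of selected items that are not signals (kept near zero while making the subset large). A minimal sketch, assuming both are simple proportions with 0/0 treated as 0; the abstract does not give the paper's exact error-rate definitions, and `screening_errors` is a hypothetical name.

```python
def screening_errors(selected, is_signal):
    """Realized error proportions for a screened subset (assumed definitions).

    selected  -- True if observation i is kept by the screening step
    is_signal -- True if observation i is a true signal
    Returns (fraction of signals missed, fraction of kept items that are noise).
    """
    n_signal = sum(is_signal)
    n_kept = sum(selected)
    # Signals that the screening step dropped.
    missed = sum(1 for s, t in zip(selected, is_signal) if t and not s)
    # Non-signals that the screening step kept.
    noise_kept = sum(1 for s, t in zip(selected, is_signal) if s and not t)
    # Assumed convention: 0/0 is treated as 0.
    miss_rate = missed / n_signal if n_signal else 0.0
    false_rate = noise_kept / n_kept if n_kept else 0.0
    return miss_rate, false_rate


selected = [True, True, True, False, False, True]
is_signal = [True, True, False, False, True, False]
print(screening_errors(selected, is_signal))  # one of 3 signals missed, 2 of 4 kept are noise
```

The first screening problem would seek the smallest `selected` set with a small miss rate; the second would seek the largest set with a small false rate.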

ZAP: Z-value Adaptive Procedures for False Discovery Rate Control with Side Information

Author: Leung Dennis
Sun Wenguang
Publication date: 01/10/2022

Adaptive multiple testing with covariates is an important research direction that has gained major attention in recent years. It has been widely recognized that leveraging side information provided by auxiliary covariates can improve the power of false discovery rate (FDR) procedures. Currently, most such procedures are devised with p-values as their main statistics. However, for two-sided hypotheses, the usual data processing step that transforms the primary statistics, known as z-values, into p-values not only leads to a loss of information carried by the main statistics, but can also undermine the ability of the covariates to assist with the FDR inference. We develop a z-value based covariate-adaptive (ZAP) methodology that operates on the intact structural information encoded jointly by the z-values and covariates. It seeks to emulate the oracle z-value procedure via a working model, and its rejection regions significantly depart from those of the p-value adaptive testing approaches. The key strength of ZAP is that the FDR control is guaranteed with minimal assumptions, even when the working model is misspecified. We demonstrate the state-of-the-art performance of ZAP using both simulated and real data, which shows that the efficiency gain can be substantial in comparison with p-value based methods. Our methodology is implemented in the R package zap.

arXiv.org e-Print Archive
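The information loss that the ZAP abstract attributes to the z-to-p transformation is easy to see for two-sided tests: the usual two-sided p-value depends only on |z|, so the sign of z, which a covariate might help predict, is thrown away. A minimal sketch, assuming standard-normal null z-values; it illustrates the motivating point only and is not the ZAP procedure or the API of the `zap` package. The helper name `two_sided_p` is hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard-normal z-statistic: 2 * (1 - Phi(|z|))."""
    # For x >= 0, 2 * (1 - Phi(x)) equals erfc(x / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Opposite-signed z-values collapse to the same p-value.
for z in (2.5, -2.5):
    print(z, round(two_sided_p(z), 4))  # both print 0.0124
```

Any procedure that sees only these p-values cannot distinguish the two observations, even if a covariate indicates that effects in this region tend to be positive.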

A Locally Adaptive Shrinkage Approach to False Selection Rate Control in High-Dimensional Classification

Author: Gang Bowen
Shi Yuantao
Sun Wenguang
Publication date: 09/10/2022

The uncertainty quantification and error control of classifiers are crucial in many high-consequence decision-making scenarios. We propose a selective classification framework that provides an indecision option for observations that cannot be classified with confidence. The false selection rate (FSR), defined as the expected fraction of erroneous classifications among all definitive classifications, provides a useful error rate notion that trades off a fraction of indecisions for fewer classification errors. We develop a new class of locally adaptive shrinkage and selection (LASS) rules for FSR control in the context of high-dimensional linear discriminant analysis (LDA). LASS is easy to analyze and has robust performance across sparse and dense regimes. Theoretical guarantees on FSR control are established without the strong sparsity assumptions required by existing theories in high-dimensional LDA. The empirical performance of LASS is investigated using both simulated and real data.

arXiv.org e-Print Archive
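The FSR defined in the LASS abstract is the expectation of a simple per-dataset quantity: among observations that receive a definitive label, the fraction labeled wrongly, with indecisions excluded. The sketch below computes that realized proportion, encoding indecision as `None` and assuming 0/0 is treated as 0. It is not the LASS rule, and `false_selection_proportion` is a hypothetical name.

```python
def false_selection_proportion(predicted, actual):
    """Fraction of wrong labels among definitive (non-abstaining) classifications.

    predicted -- class label for each observation, or None for indecision
    actual    -- true class label for each observation
    """
    # Keep only observations that received a definitive classification.
    decided = [(p, a) for p, a in zip(predicted, actual) if p is not None]
    if not decided:
        return 0.0  # assumed convention: no selections, no errors
    errors = sum(1 for p, a in decided if p != a)
    return errors / len(decided)


predicted = [1, 2, None, 1, None, 2]
actual = [1, 1, 2, 1, 1, 2]
print(false_selection_proportion(predicted, actual))  # 0.25 (1 error in 4 decisions)
```

Moving more uncertain observations to `None` shrinks the denominator but can remove errors from the numerator, which is the trade-off between indecisions and classification errors that the FSR is meant to capture.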