Effects of dependence in high-dimensional multiple testing problems

Author: A Dobra
A Reiner
A Subramanian
A Wille
AB Owen
B Efron
B Jones
C Genovese
D Yekutieli
EL Korn
G Marsaglia
G Wright
GP Wagner
J Schafer
J Storey
J Whittaker
JD Storey
Kyung In Kim
M Langaas
MA Black
Mark A van de Wiel
P Westfall
R Ihaka
S Dudoit
SH Jung
SL Lauritzen
V Tusher
X Qiu
Y Benjamini
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract

Background: We consider the effects of dependence among variables of high-dimensional data in multiple hypothesis testing problems, in particular on False Discovery Rate (FDR) control procedures. Recent simulation studies consider only simple correlation structures among variables, which hardly reflect the features of real data. Our aim is to systematically study the effects of several network features, such as sparsity and correlation strength, by imposing dependence structures among variables using random correlation matrices.

Results: We study the robustness against dependence of several FDR procedures that are popular in microarray studies, such as Benjamini-Hochberg FDR, Storey's q-value, SAM and resampling-based FDR procedures. False Non-discovery Rates and estimates of the number of null hypotheses are computed from those methods and compared. Our simulation study shows that methods such as SAM and the q-value do not adequately control the FDR at the claimed level under dependence conditions. On the other hand, the adaptive Benjamini-Hochberg procedure seems to be the most robust while remaining conservative. Finally, the estimates of the number of true null hypotheses under various dependence conditions are variable.

Conclusion: We discuss a new method for efficient guided simulation of dependent data, which satisfy imposed network constraints as conditional independence structures. Our simulation set-up allows for a structural study of the effect of dependencies on multiple testing criteria and is useful for testing a potentially new method for π0 or FDR estimation in a dependency context.
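As a point of reference for the procedures compared in this abstract, the following is a minimal sketch of the standard (non-adaptive) Benjamini-Hochberg step-up rule. It is not the authors' simulation code; the function name `benjamini_hochberg`, the default `alpha`, and the example p-values are assumptions made for illustration only.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Standard Benjamini-Hochberg step-up rule: find the largest rank k such
    that the k-th smallest p-value is <= alpha * k / m, then reject the k
    hypotheses with the smallest p-values.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank whose p-value falls under its step-up threshold.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank

    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject


# Hypothetical p-values, chosen only to illustrate the rule.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejections = benjamini_hochberg(example_p, alpha=0.05)
```

The rule itself uses only the p-values, not their joint distribution, so any sensitivity to dependence among test statistics has to be assessed separately, for instance by simulation as in the study summarised above.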

Directory of Open Access Journals

PubMed Central

Recommended from our members

Covariate-assisted ranking and screening for large-scale two-sample inference

Author: Cai T. Tony
Sun Wenguang
Wang Weinan
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Two-sample multiple testing has a wide range of applications. The conventional practice first reduces the original observations to a vector of p-values and then chooses a cutoff to adjust for multiplicity. However, this data reduction step could cause significant loss of information and thus lead to suboptimal testing procedures. We introduce a new framework for two-sample multiple testing by incorporating a carefully constructed auxiliary variable in inference to improve the power. A data-driven multiple-testing procedure is developed by employing a covariate-assisted ranking and screening (CARS) approach that optimally combines the information from both the primary and the auxiliary variables. The proposed CARS procedure is shown to be asymptotically valid and optimal for false discovery rate control. The procedure is implemented in the R package CARS. Numerical results confirm the effectiveness of CARS in false discovery rate control and show that it achieves substantial power gain over existing methods. CARS is also illustrated through an application to the analysis of a satellite imaging data set for supernova detection.

eScholarship - University of California

Cramér-type moderate deviations for Studentized two-sample U-statistics with applications

Author: Chang Jinyuan
Shao Qi-Man
Zhou Wen-Xin
Publication venue: Institute of Mathematical Statistics
Publication date: 28/09/2016

Two-sample U-statistics are widely used in a broad range of applications, including those in the fields of biostatistics and econometrics. In this paper, we establish sharp Cramér-type moderate deviation theorems for Studentized two-sample U-statistics in a general framework, including the two-sample t-statistic and Studentized Mann-Whitney test statistic as prototypical examples. In particular, a refined moderate deviation theorem with second-order accuracy is established for the two-sample t-statistic. These results extend the applicability of the existing statistical methodologies from the one-sample t-statistic to more general nonlinear statistics. Applications to two-sample large-scale multiple testing problems with false discovery rate control and the regularized bootstrap method are also discussed. Comment: Published at http://dx.doi.org/10.1214/15-AOS1375 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

eScholarship - University of California

Simulation-Based Hypothesis Testing of High Dimensional Means Under Covariance Heterogeneity

Author: Chang Jinyuan
Zheng Chao
Zhou Wen
Zhou Wen-Xin
Publication venue: Wiley
Publication date: 24/02/2017

In this paper, we study the problem of testing the mean vectors of high dimensional data in both one-sample and two-sample cases. The proposed testing procedures employ maximum-type statistics and parametric bootstrap techniques to compute the critical values. Unlike existing tests that rely heavily on structural conditions on the unknown covariance matrices, the proposed tests allow general covariance structures of the data and therefore enjoy a wide scope of applicability in practice. To enhance the power of the tests against sparse alternatives, we further propose two-step procedures with a preliminary feature screening step. Theoretical properties of the proposed tests are investigated. Through extensive numerical experiments on synthetic datasets and a human acute lymphoblastic leukemia gene expression dataset, we illustrate the performance of the new tests and how they may assist in detecting disease-associated gene-sets. The proposed methods have been implemented in an R package HDtest and are available on CRAN. Comment: 34 pages, 10 figures; Accepted for biometric
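The general idea of a maximum-type statistic calibrated by simulation can be sketched as follows for the one-sample case. This is an illustrative assumption-laden sketch, not the HDtest implementation: the statistic is left unstudentized, there is no screening step, and the name `max_type_test` and its parameters are hypothetical. Draws from N(0, Σ̂), where Σ̂ is the sample covariance with 1/n normalisation, are generated as Gaussian-weighted sums of the centred observations, which avoids forming Σ̂ explicitly.

```python
import math
import random


def max_type_test(samples, n_boot=500, seed=0):
    """Test H0: mean vector = 0 with the statistic sqrt(n) * max_j |mean_j|.

    `samples` is a list of n observations, each a list of p floats.
    Returns (observed_statistic, p_value).
    """
    rng = random.Random(seed)
    n = len(samples)
    p = len(samples[0])

    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]

    observed = math.sqrt(n) * max(abs(m) for m in means)

    exceed = 0
    for _ in range(n_boot):
        # Given the data, n^{-1/2} * sum_i w_i * (X_i - mean) with
        # w_i ~ N(0, 1) is an exact draw from N(0, Sigma_hat).
        weights = [rng.gauss(0.0, 1.0) for _ in range(n)]
        draw = [
            sum(w * row[j] for w, row in zip(weights, centered)) / math.sqrt(n)
            for j in range(p)
        ]
        if max(abs(d) for d in draw) >= observed:
            exceed += 1

    # Add-one correction keeps the simulated p-value strictly positive.
    p_value = (exceed + 1) / (n_boot + 1)
    return observed, p_value
```

Because the reference distribution is simulated from the estimated covariance, no structural assumption such as sparsity or banding of the covariance matrix is needed in this sketch, which mirrors the motivation stated in the abstract.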

arXiv.org e-Print Archive

Southampton (e-Prints Soton)

eScholarship - University of California

Lancaster E-Prints

University of Melbourne Institutional Repository

Multi-Entity Dependence Learning with Rich Context via Conditional Variational Auto-encoder

Author: Chen Di
Gomes Carla P.
Tang Luming
Xue Yexiang
Publication date: 17/09/2017

Multi-Entity Dependence Learning (MEDL) explores conditional correlations among multiple entities. The availability of rich contextual information requires a nimble learning scheme that tightly integrates with deep neural networks and has the ability to capture correlation structures among exponentially many outcomes. We propose MEDL_CVAE, which encodes a conditional multivariate distribution as a generating process. As a result, the variational lower bound of the joint likelihood can be optimized via a conditional variational auto-encoder and trained end-to-end on GPUs. Our MEDL_CVAE was motivated by two real-world applications in computational sustainability: one studies the spatial correlation among multiple bird species using the eBird data and the other models multi-dimensional landscape composition and human footprint in the Amazon rainforest with satellite images. We show that MEDL_CVAE captures rich dependency structures, scales better than previous methods, and further improves on the joint likelihood taking advantage of very large datasets that are beyond the capacity of previous methods. Comment: The first two authors contribute equall

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications