    ‘SGoFicance Trace’: Assessing Significance in High Dimensional Testing Problems

    Recently, an exact binomial test called SGoF (Sequential Goodness-of-Fit) has been introduced as a new method for handling high dimensional testing problems. SGoF looks for statistical significance by comparing the number of null hypotheses individually rejected at level γ = 0.05 with the number expected under the intersection null, and then declares a number of effects accordingly. SGoF detects an increasing proportion of true effects as the number of tests grows, unlike other methods, for which the opposite is true. It is worth mentioning that the choice γ = 0.05 is not essential to the SGoF procedure, and more power may be reached at other values of γ depending on the situation. In this paper we enhance the possibilities of SGoF by letting γ vary over the whole interval (0,1). In this way, we introduce the ‘SGoFicance Trace’ (from SGoF's significance trace), a graphical complement to SGoF that can help to make decisions in multiple-testing problems. An R script for computing the SGoFicance Trace is available from the web site http://webs.uvigo.es/acraaj/SGoFicance.htm
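
The core SGoF calculation can be sketched in a few lines. The sketch below is a minimal Python rendering of the idea, not the authors' R script: it counts the p-values at or below γ, tests that count against the Binomial(n, γ) law implied by the intersection null, and reports a declared-effect count. The declaration rule used here (excess of the observed count over the largest count still compatible with the null at level α) is an assumption of this sketch, not the authors' specification.

```python
from math import comb

def binom_sf(k, n, p):
    # P(X > k) for X ~ Binomial(n, p), computed exactly from the pmf
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1, n + 1))

def sgof_sketch(pvalues, gamma=0.05, alpha=0.05):
    """Sketch of the SGoF idea (assumed declaration rule, see lead-in)."""
    n = len(pvalues)
    r = sum(p <= gamma for p in pvalues)       # observed rejections at level gamma
    p_meta = binom_sf(r - 1, n, gamma)         # P(X >= r) under the intersection null
    # smallest count s that is still significant at level alpha
    s = 0
    while binom_sf(s - 1, n, gamma) > alpha:
        s += 1
    n_effects = max(r - s + 1, 0)              # excess over the critical count
    return p_meta, n_effects
```

For example, with 10 very small p-values among 20 tests, the meta-test p-value is far below α and a positive number of effects is declared.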

    Evaluating Statistical Methods Using Plasmode Data Sets in the Age of Massive Public Databases: An Illustration Using False Discovery Rates

    Plasmode is a term coined several years ago to describe data sets that are derived from real data but for which some truth is known. Omic techniques, most especially microarray and genome-wide association studies, have catalyzed a new zeitgeist of data sharing that is making data and data sets publicly available on an unprecedented scale. Coupling such data resources with a science of plasmode use would allow statistical methodologists to vet proposed techniques empirically (as opposed to only theoretically) and with data that are by definition realistic and representative. We illustrate the technique of empirical statistics by considering a common task in the analysis of high dimensional data: the simultaneous testing of hundreds or thousands of hypotheses to determine which, if any, show statistical significance warranting follow-on research. The now-common practice of multiple testing in high dimensional experiment (HDE) settings has generated new methods for detecting statistically significant results. Although such methods have heretofore been subject to comparative performance analysis using simulated data, simulating data that realistically reflect data from an actual HDE remains a challenge. We describe a simulation procedure using actual data from an HDE where some truth regarding parameters of interest is known. We use the procedure to compare estimates of the proportion of true null hypotheses, the false discovery rate (FDR), and a local version of FDR obtained from 15 different statistical methods.
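
As a point of reference for the FDR comparisons described above, the classical Benjamini–Hochberg step-up procedure can be sketched as follows. This is the standard textbook FDR-controlling procedure, not necessarily one of the 15 methods the paper compares.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up: reject the k smallest p-values,
    where k is the largest rank with p_(k) <= k * q / n."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices by ascending p
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / n:
            k = rank
    rejected = set(order[:k])
    return [i in rejected for i in range(n)]            # True = discovery
```

Note the step-up character: a p-value can be rejected even if it exceeds its own threshold, as long as some larger-ranked p-value meets its threshold.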

    A robustness study of parametric and non-parametric tests in model-based multifactor dimensionality reduction for epistasis detection

    Background: Applying a statistical method implies identifying underlying (model) assumptions and checking their validity in the particular context. One of these contexts is association modeling for epistasis detection. Here, depending on the technique used, violation of model assumptions may result in increased type I error, power loss, or biased parameter estimates. Remedial measures for violated underlying conditions or assumptions include data transformation or selecting a more relaxed modeling or testing strategy. Model-Based Multifactor Dimensionality Reduction (MB-MDR) for epistasis detection relies on association testing between a trait and a factor consisting of multilocus genotype information. For quantitative traits, the framework is essentially Analysis of Variance (ANOVA), which decomposes the variability in the trait amongst the different factors. In this study, we assess, through simulations, the cumulative effect of deviations from normality and homoscedasticity on the overall performance of quantitative MB-MDR to detect 2-locus epistasis signals in the absence of main effects. Methodology: Our simulation study focuses on pure epistasis models with varying degrees of genetic influence on a quantitative trait. Conditional on a multilocus genotype, we consider quantitative trait distributions that are normal, chi-square or Student's t with constant or non-constant phenotypic variances. All data are analyzed with MB-MDR using the built-in Student's t-test for association, as well as a novel MB-MDR implementation based on Welch's t-test. Traits are either left untransformed or are transformed into new traits via logarithmic, standardization or rank-based transformations, prior to MB-MDR modeling. Results: Our simulation results show that MB-MDR controls type I error and false positive rates irrespective of the association test considered.
Empirically based MB-MDR power estimates for Welch's t-tests are generally lower than those for Student's t-tests. Trait transformations involving ranks tend to increase power compared to the other data transformations considered. Conclusions: When performing MB-MDR screening for gene-gene interactions with quantitative traits, we recommend first rank-transforming traits to normality and then applying MB-MDR modeling with Student's t-tests as internal tests for association.
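
The recommended rank transformation to normality can be sketched with a rank-based inverse-normal (Blom-type) transform: each observation's rank is mapped to the corresponding standard-normal quantile. The offset c = 3/8 and the naive handling of ties are assumptions of this sketch, not details taken from the paper.

```python
from statistics import NormalDist

def rank_inverse_normal(values, c=0.375):
    """Rank-based inverse-normal transform (Blom offset c = 3/8).
    Ties are broken arbitrarily by sort order in this sketch."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # indices by ascending value
    nd = NormalDist()
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # map rank to a standard-normal quantile
        out[i] = nd.inv_cdf((rank - c) / (n - 2 * c + 1))
    return out
```

The transformed trait has an approximately standard-normal shape regardless of the original distribution, which is what makes the subsequent Student's t-test well behaved.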

    Estimating the number of true null hypotheses from a histogram of p values

    In an earlier article, an intuitively appealing method for estimating the number of true null hypotheses in a multiple test situation was proposed. That article presented an iterative algorithm that relies on a histogram of observed p values to obtain the estimator. We characterize the limit of that iterative algorithm and show that the estimator can be computed directly, without iteration. We compare the performance of the histogram-based estimator with other procedures for estimating the number of true null hypotheses from a collection of observed p values and find that the histogram-based estimator performs well in settings similar to those encountered in microarray data analysis. We demonstrate the approach using p values from a large microarray experiment aimed at uncovering molecular mechanisms of barley resistance to a fungal pathogen.
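
The iterative histogram algorithm referred to above can be sketched as follows, assuming a Mosig-type iteration: repeatedly find the leftmost histogram bin whose count does not exceed the uniform expectation under the current estimate, then re-estimate the number of true nulls from that bin onward. The bin count and convergence tolerance are choices of this sketch, not values from the article.

```python
def estimate_n_true_nulls(pvalues, bins=20, tol=1e-9):
    """Iterative histogram-based estimate of the number of true nulls
    (sketch of a Mosig-type algorithm; see lead-in for assumptions)."""
    n = len(pvalues)
    counts = [0] * bins
    for p in pvalues:
        counts[min(int(p * bins), bins - 1)] += 1   # bin the p-values on (0, 1]
    n0 = float(n)
    while True:
        expected = n0 / bins                        # uniform count per bin under the nulls
        i = 0
        while i < bins and counts[i] > expected:    # leftmost bin at or below expectation
            i += 1
        # true nulls are uniform, so extrapolate the tail bins to the whole interval
        new_n0 = sum(counts[i:]) * bins / (bins - i) if i < bins else n0
        if abs(new_n0 - n0) < tol:
            return new_n0
        n0 = new_n0
```

With signal p-values piled into the leftmost bins and null p-values spread uniformly, the estimate settles on roughly the uniform component of the histogram.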