
    False discovery rate: setting the probability of false claim of detection

    When testing multiple hypotheses in a survey (e.g., many different source locations, template waveforms, and so on), the final result consists of a set of confidence intervals, each at a desired confidence level. But the probability that at least one of these intervals does not cover the true value increases with the number of trials. With a sufficiently large array of confidence intervals, one can be practically certain that at least one is missing the true value. In particular, the probability of a false claim of detection becomes non-negligible. To compensate for this, one should increase the confidence level, at the price of reduced detection power. False discovery rate control is a relatively new statistical procedure that bounds the number of mistakes made when performing multiple hypothesis tests. We review this method, discussing example applications to the field of gravitational wave surveys.
    Comment: 7 pages, 3 tables, 3 figures. Prepared for the Proceedings of GWDAW 9 (http://lappc-in39.in2p3.fr/GWDAW9). A new section was added with a numerical example, along with two tables and a figure related to the new section. Many smaller revisions to improve readability.
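
The Benjamini-Hochberg step-up procedure is the standard way to bound the false discovery rate across many such tests; a minimal sketch of it (illustrative only, not the paper's own implementation; function and variable names are assumptions):

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Sort the m p-values, find the largest rank k with
    p_(k) <= k * q / m, and reject the k smallest p-values.
    Returns a boolean rejection mask in the original order,
    controlling the FDR at level q (independent tests)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Largest rank whose sorted p-value clears its step-up threshold.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject
```

For example, with p-values (0.01, 0.02, 0.03, 0.5) and q = 0.05, the thresholds are (0.0125, 0.025, 0.0375, 0.05), so the first three hypotheses are rejected while a single Bonferroni cut at 0.0125 would reject only two.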

    Pruning of genetic programming trees using permutation tests

    We present a novel approach based on statistical permutation tests for pruning redundant subtrees from genetic programming (GP) trees, which allows us to explore the extent of effective redundancy. We observe that over a range of regression problems, median tree sizes are reduced by around 20%, largely independent of test function, and that while some large subtrees are removed, the median pruned subtree comprises just three nodes; most take the form of an exact algebraic simplification. Our statistically based pruning technique has allowed us to explore the hypothesis that a given subtree can be replaced with a constant if this substitution results in no statistical change to the behavior of the parent tree, what we term approximate simplification. In the event, we infer that more than 95% of the accepted pruning proposals are the result of algebraic simplifications, which provides some practical insight into the scope for removing redundancies in GP trees.
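
The pruning criterion rests on a standard permutation test; a minimal two-sample sketch of the underlying idea (illustrative only: the statistic, sample layout, and names are assumptions, and the paper's exact test on tree outputs may differ):

```python
import random

def permutation_test(x, y, n_perm=10000, seed=0):
    """Two-sample permutation test on the difference in means.

    Repeatedly shuffle the pooled observations into two groups of
    the original sizes and count how often the permuted statistic
    is at least as extreme as the observed one. Returns the
    (add-one smoothed) two-sided p-value estimate."""
    rng = random.Random(seed)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = list(x) + list(y)
    n = len(x)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = abs(sum(pooled[:n]) / n
                   - sum(pooled[n:]) / (len(pooled) - n))
        if stat >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)
```

In the pruning setting, x and y would be the parent tree's outputs before and after replacing a candidate subtree with a constant; a large p-value means the replacement produced no detectable behavioral change, so the prune is accepted.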

    Evidence for gene–gene epistatic interactions among susceptibility loci for systemic lupus erythematosus

    Objective: Several confirmed genetic susceptibility loci for lupus have been described. To date, no clear evidence for genetic epistasis in lupus has been established. The aim of this study was to test for gene–gene interactions in a number of known lupus susceptibility loci. Methods: Eighteen single-nucleotide polymorphisms tagging independent and confirmed lupus susceptibility loci were genotyped in a set of 4,248 patients with lupus and 3,818 normal healthy control subjects of European descent. Epistasis was tested by a 2-step approach using both parametric and nonparametric methods. The false discovery rate (FDR) method was used to correct for multiple testing. Results: We detected and confirmed gene–gene interactions between the HLA region and CTLA4, IRF5, and ITGAM, and between PDCD1 and IL21, in patients with lupus. The most significant interaction detected by parametric analysis was between rs3131379 in the HLA region and rs231775 in CTLA4 (interaction odds ratio 1.19, Z = 3.95, P = 7.8 × 10^−5 [FDR ≤ 0.05], P for multifactor dimensionality reduction = 5.9 × 10^−45). Importantly, our data suggest that in patients with lupus, the presence of the HLA lupus risk alleles in rs1270942 and rs3131379 increases the odds of also carrying the lupus risk allele in IRF5 (rs2070197) by 17% and 16%, respectively (P = 0.0028 and P = 0.0047, respectively). Conclusion: We provide evidence for gene–gene epistasis in systemic lupus erythematosus. These findings support a role for genetic interaction contributing to the complexity of lupus heritability.

    Evaluation Method, Dataset Size or Dataset Content: How to Evaluate Algorithms for Image Matching?

    Most vision papers have to include some evaluation work in order to demonstrate that the proposed algorithm is an improvement on existing ones. Generally, these evaluation results are presented in tabular or graphical form. Neither of these is ideal because there is no indication as to whether any performance differences are statistically significant. Moreover, the size and nature of the dataset used for evaluation will obviously have a bearing on the results, and neither of these factors is usually discussed. This paper evaluates the effectiveness of commonly used performance characterization metrics for image feature detection and description for matching problems, and explores the use of statistical tests such as McNemar's test and ANOVA as better alternatives.
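
McNemar's test compares two algorithms evaluated on the same cases using only the discordant pairs; a minimal exact-binomial sketch (illustrative, not the paper's code; the cell labels b and c follow the usual 2×2 convention):

```python
from math import comb

def mcnemar_exact(b, c):
    """Exact (binomial) McNemar test for paired classifiers.

    b: cases where algorithm A succeeds and B fails;
    c: cases where B succeeds and A fails.
    Under the null hypothesis of equal performance,
    b ~ Binomial(b + c, 0.5). Returns the two-sided p-value
    (doubled smaller tail, capped at 1)."""
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(2 * tail, 1.0)
```

For instance, if two matchers disagree on 11 images, with one winning 10 of those disagreements, the exact two-sided p-value is 24/2048 ≈ 0.0117, evidence of a real difference that a bare accuracy table would not convey.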

    A constrained polynomial regression procedure for estimating the local False Discovery Rate

    Background: In the context of genomic association studies, for which a large number of statistical tests are performed simultaneously, the local False Discovery Rate (lFDR), which quantifies the evidence of a specific gene association with a clinical or biological variable of interest, is a relevant criterion for taking into account the multiple testing problem. The lFDR not only allows an inference to be made for each gene through its specific value, but also an estimate of Benjamini-Hochberg's False Discovery Rate (FDR) for subsets of genes. Results: In the framework of estimating procedures without any distributional assumption under the alternative hypothesis, a new and efficient procedure for estimating the lFDR is described. The results of a simulation study indicated good performance of the proposed estimator in comparison to four published ones. The five different procedures were applied to real datasets. Conclusion: A novel and efficient procedure for estimating the lFDR was developed and evaluated.

    Computation of significance scores of unweighted Gene Set Enrichment Analyses

    Background: Gene Set Enrichment Analysis (GSEA) is a computational method for the statistical evaluation of sorted lists of genes or proteins. Originally GSEA was developed for interpreting microarray gene expression data, but it can be applied to any sorted list of genes. Given the gene list and an arbitrary biological category, GSEA evaluates whether the genes of the considered category are randomly distributed or accumulated at the top or bottom of the list. Usually, significance scores (p-values) of GSEA are computed by nonparametric permutation tests, a time-consuming procedure that yields only estimates of the p-values. Results: We present a novel dynamic programming algorithm for calculating exact significance values of unweighted Gene Set Enrichment Analyses. Our algorithm avoids typical problems of nonparametric permutation tests, such as varying findings in different runs caused by the random sampling procedure. Another advantage of the presented dynamic programming algorithm is its runtime and memory efficiency. To test our algorithm, we applied it not only to simulated data sets, but additionally evaluated expression profiles of squamous cell lung cancer tissue and autologous unaffected tissue.
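
The unweighted statistic whose null distribution such methods evaluate is the classic running-sum enrichment score; a minimal sketch of the score itself (illustrative only, not the authors' dynamic programming code; names are assumptions):

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted GSEA running-sum enrichment score.

    Walk down the ranked gene list, stepping up by 1/n_hit for
    each gene in the set and down by 1/n_miss otherwise, and
    return the maximum deviation of the running sum from zero.
    A score near +1 (-1) means set members pile up at the top
    (bottom) of the list."""
    in_set = [g in gene_set for g in ranked_genes]
    n_hit = sum(in_set)
    n_miss = len(ranked_genes) - n_hit
    up, down = 1.0 / n_hit, 1.0 / n_miss
    running, best = 0.0, 0.0
    for hit in in_set:
        running += up if hit else -down
        if abs(running) > abs(best):
            best = running
    return best
```

Because each step size depends only on the hit/miss counts, the score takes finitely many values under permutations of the list, which is what makes an exact (rather than sampled) null distribution tractable.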

    Parallel multiplicity and error discovery rate (EDR) in microarray experiments

    Background: In microarray gene expression profiling experiments, differentially expressed genes (DEGs) are detected from among tens of thousands of genes on an array using statistical tests. It is important to control the number of false positives, or errors, present in the resultant DEG list. To date, more than 20 different multiple test methods have been reported that compute overall Type I error rates in microarray experiments. However, these methods share the following dilemma: they have low power in cases where only a small number of DEGs exist among a large number of total genes on the array. Results: This study contrasts the parallel multiplicity of objectively related tests against the traditional simultaneousness of subjectively related tests and proposes a new assessment called the Error Discovery Rate (EDR) for evaluating multiple test comparisons in microarray experiments. Parallel multiple tests use only the negative genes that parallel the positive genes to control the error rate, while simultaneous multiple tests use the total unchanged gene number for error estimates. Here, we demonstrate that the EDR method exhibits improved specificity and sensitivity over other methods in testing expression data sets with sequence digital expression confirmation, in examining simulation data, as well as for three experimental data sets that vary in the proportion of DEGs. The EDR method overcomes a common problem of previous multiple test procedures, namely that the Type I error detection power is low when the total gene number used is large but the DEG number is small. Conclusions: Microarrays are extensively used to address many research questions. However, there is potential to improve the sensitivity and specificity of microarray data analysis by developing improved multiple test comparisons. This study proposes a new view of multiplicity in microarray experiments, and the EDR provides an alternative multiple test method for Type I error control in microarray experiments.