
386 research outputs found

Nonparametric relevance-shifted multiple testing procedures for the analysis of high-dimensional multivariate data with small sample sizes

Author: AI Fleishman
C Frömke
C Li
Cornelia Frömke
D Hauschke
DC Polacek
DJ Schaid
E Witt
J Khan
JF Chich
L Guo
LA Hothorn
Ludwig A Hothorn
N Zimmermann
NF Cariello
OG Troyanskaya
PH Westfall
PH Westfall
S Dudoit
S Dudoit
S Holm
S Kropf
S Kropf
S Lange
Siegfried Kropf
T Speed
VR Iyer
Y Benjamini
Y Ge
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: In many research areas it is necessary to find differences between treatment groups with respect to several variables. For example, studies of microarray data seek, for each variable, a significant deviation of the difference in location parameters from zero, or of their ratio from one. However, in some studies a significant deviation of the difference in locations from zero (or from 1 in terms of the ratio) is biologically meaningless. A relevant difference or ratio is sought in such cases. Results: This article addresses the use of relevance-shifted tests on ratios for a multivariate parallel two-sample group design. Two empirical procedures are proposed which embed the relevance-shifted test on ratios. As both procedures test a hypothesis for each variable, the resulting multiple testing problem has to be considered. Hence, the procedures include a multiplicity correction. Both procedures are extensions of available procedures for point null hypotheses achieving exact control of the familywise error rate. Whereas the shift of the null hypothesis alone would give straightforward solutions, the problems that motivate the empirical considerations discussed here arise from the fact that the shift is considered in both directions and the whole parameter space between these two limits has to be accepted as the null hypothesis. Conclusion: The first procedure uses a permutation algorithm and is appropriate for designs with a moderately large number of observations. However, many experiments have limited sample sizes. In that case the second procedure might be more appropriate, where multiplicity is corrected according to a concept of data-driven ordering of hypotheses.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Institutionelles Repositorium der Leibniz Universität Hannover

Server für wissenschaftliche Schriften der Hochschule Hannover

Classes of Multiple Decision Functions Strongly Controlling FWER and FDR

Author: B Efron
B Efron
CR Genovese
E Roquain
EA Peña
Edsel A. Peña
G Blanchard
G Blanchard
G Kang
H Finner
J Scott
J Storey
JD Habiger
JD Habiger
JJ Goeman
JL Doob
Joshua D. Habiger
K Roeder
M Bogdan
M Guindani
P Müller
PH Westfall
PH Westfall
S Dudoit
S Holm
SK Sarkar
SK Sarkar
SK Sarkar
W Hoeffding
W Sun
W Wu
Wensong Wu
Y Benjamini
Y Benjamini
Z Šidák
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 15/07/2010

This paper provides two general classes of multiple decision functions: each member of the first class strongly controls the family-wise error rate (FWER), while each member of the second class strongly controls the false discovery rate (FDR). These classes offer the possibility that a multiple decision function that is optimal with respect to a pre-specified criterion, such as the missed discovery rate (MDR), could be found within them. Such multiple decision functions can be utilized in multiple testing, in particular, though not exclusively, in the analysis of high-dimensional microarray data sets. Comment: 19 pages
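For readers unfamiliar with the two error rates named in this abstract, the following is a minimal sketch of two textbook procedures: Holm's step-down method (FWER control) and the Benjamini-Hochberg step-up method (FDR control under independence). These are not the decision-function classes proposed in the paper; the function names `holm_reject` and `bh_reject` and the example p-values are illustrative assumptions.

```python
def holm_reject(pvalues, alpha=0.05):
    """Holm step-down procedure (textbook version, controls the FWER)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the (rank+1)-th smallest p-value with alpha / (m - rank).
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # step-down: stop at the first non-rejection
    return reject


def bh_reject(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (textbook version, controls the FDR)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        # Remember the largest rank whose p-value is below its threshold.
        if pvalues[i] <= alpha * rank / m:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


# Hypothetical p-values, in arbitrary order.
pvals = [0.5, 0.001, 0.03, 0.01, 0.02]
holm = holm_reject(pvals)
bh = bh_reject(pvals)
```

On the same input, the FDR-controlling procedure rejects at least as many hypotheses as the FWER-controlling one, which is the usual trade-off between the two criteria.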

arXiv.org e-Print Archive

Crossref

Parallel multiplicity and error discovery rate (EDR) in microarray experiments

Author: A Farcomeni
AA Fodor
AJ Hackstadt
B Efron
B Efron
B Wu
Clay J Carter
G van Belle
GD Gey
H Hsueh
H Jiang
H Parikh
J Krützfeldt
JD Storey
JD Storey
JD Storey
JR Monaghan
L Aubert
MLT Lee
N Jain
P Broberg
S Dudoit
S Dudoit
S Holm
S Pounds
S Pounds
S Scheid
SB Pounds
SE Eckenrode
SH Jung
SJ Wang
VG Tusher
Wayne Wenzhong Xu
WW Xu
Y Benjamini
Y Benjamini
Y Hochberg
Y Hong
Y Zhao
YH Yang
Z Sidak
Z Wu
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: In microarray gene expression profiling experiments, differentially expressed genes (DEGs) are detected from among tens of thousands of genes on an array using statistical tests. It is important to control the number of false positives or errors that are present in the resultant DEG list. To date, more than 20 different multiple test methods have been reported that compute overall Type I error rates in microarray experiments. However, these methods share the following dilemma: they have low power in cases where only a small number of DEGs exist among a large number of total genes on the array. Results: This study contrasts parallel multiplicity of objectively related tests against the traditional simultaneousness of subjectively related tests and proposes a new assessment called the Error Discovery Rate (EDR) for evaluating multiple test comparisons in microarray experiments. Parallel multiple tests use only the negative genes that parallel the positive genes to control the error rate, while simultaneous multiple tests use the total unchanged gene number for error estimates. Here, we demonstrate that the EDR method exhibits improved performance over other methods in specificity and sensitivity in testing expression data sets with sequence digital expression confirmation, in examining simulation data, as well as for three experimental data sets that vary in the proportion of DEGs. The EDR method overcomes a common problem of previous multiple test procedures, namely that the Type I error rate detection power is low when the total gene number used is large but the DEG number is small. Conclusions: Microarrays are extensively used to address many research questions. However, there is potential to improve the sensitivity and specificity of microarray data analysis by developing improved multiple test comparisons. This study proposes a new view of multiplicity in microarray experiments, and the EDR provides an alternative multiple test method for Type I error control in microarray experiments.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Colored Motifs Reveal Computational Building Blocks in the C. elegans Brain

Author: AL Barabasi
AL Barabási
Arend Hintze
Christoph Adami
D Chase
D Hall
E Niebur
EL White
J Karbowski
J Richmond
J White
JG White
Jifeng Qian
JJ Rice
JJ Tyson
JW Lichtman
LH Hartwell
LR Varshney
M Newman
M Reigl
MEJ Newman
O Sporns
Olaf Sporns
P Westfall
PJ Ingram
R Milo
R Milo
RJ Prill
S Dudoit
S Song
S Wernicke
S Wernicke
S Wuchty
SS Shen-Orr
TB Achacoso
U Alon
W Callebaut
WP Lee
Y Yoshimura
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 16/12/2010

Background: Complex networks can often be decomposed into less complex sub-networks whose structures can give hints about the functional organization of the network as a whole. However, these structural motifs can only tell one part of the functional story because in this analysis each node and edge is treated on an equal footing. In real networks, two motifs that are topologically identical but whose nodes perform very different functions will play very different roles in the network. Methodology/Principal Findings: Here, we combine structural information derived from the topology of the neuronal network of the nematode C. elegans with information about the biological function of these nodes, thus coloring nodes by function. We discover that particular colorations of motifs are significantly more abundant in the worm brain than expected by chance, and have particular computational functions that emphasize the feed-forward structure of information processing in the network, while evading feedback loops. Interneurons are strongly over-represented among the common motifs, supporting the notion that these motifs process and transduce the information from the sensor neurons towards the muscles. Some of the most common motifs identified in the search for significant colored motifs play a crucial role in the system of neurons controlling the worm's locomotion. Conclusions/Significance: The analysis of complex networks in terms of colored motifs combines two independent data sets to generate insight about these networks that cannot be obtained with either data set alone. The method is general and should allow a decomposition of any complex network into its functional (rather than topological) motifs as long as both wiring and functional information are available.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

Caltech Authors

Dalarna University College Electronic Archive

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Integrated analysis of the heterogeneous microarray data

Author: B Damdinsuren
B Stott
C Kendziorski
DR Rhodes
DR Rhodes
E Wiercinska
EA Bard-Chapeau
EA Bard-Chapeau
H Choi
J Hu
JK Choi
JK Choi
M Kerr
M Kerr
M Lee
MA Newton
R Boopathy
R Shen
R Shibata
S Dudoit
S Dudoit
S González
S Teglund
Sung Gon Yi
T Ideker
T Park
T Park
Taesung Park
VG Tusher
W Gao
W Pan
XX Tang
Y Benjamini
Y Midorikawa
YW Chen
Publication venue: BioMed Central
Publication date: 01/01/2011

Crossref

Springer - Publisher Connector

PubMed Central

A global statistical test for improved detection of gene activity

Author: B Lewin
D Martin
David JG Bakewell
DJ Bakewell
Ernst Wit
F Al-Shahrour
R Breitling
RB West
RJ Akhurst
S Dudoit
S Dudoit
X Cui
Y Hochberg
Publication venue: 'Springer Science and Business Media LLC'

Crossref

A comprehensive evaluation of SAM, the SAM R-package and a simple modification to improve its performance

Author: AJ Rice
B Efron
B Efron
CM Kendziorski
GK Smyth
JG Thomas
M Newton
MA Newton
MK Kerr
O Larsson
P Delmar
S Dudoit
S Zhang
Shunpu Zhang
TR Golub
VG Tusher
W Huber
W Pan
W Pan
X Guo
Y Xie
Y Zhao
Publication venue: BioMed Central
Publication date: 01/06/2007

Background: The Significance Analysis of Microarrays (SAM) is a popular method for detecting significantly expressed genes and controlling the false discovery rate (FDR). Recently, it has been reported in the literature that the FDR is not well controlled by SAM. Due to the vast application of SAM in microarray data analysis, it is of great importance to have an extensive evaluation of SAM and its associated R-package (sam2.20). Results: Our study has identified several discrepancies between SAM and sam2.20. One major difference is that SAM and sam2.20 use different methods for estimating FDR. Such discrepancies may cause confusion among researchers who are using SAM or are developing SAM-like methods. We have also shown that SAM provides no meaningful estimates of FDR and that this problem has been corrected in sam2.20 by using a different formula for estimating FDR. However, we have found that, even with the improvement sam2.20 has made over SAM, sam2.20 may still produce erroneous and even conflicting results under certain situations. Using an example, we show that the problem of sam2.20 is caused by its use of asymmetric cutoffs, which are due to the large variability of null scores at both ends of the order statistics. An obvious approach without the complication of the order statistics is the conventional symmetric cutoff method. For this reason, we have carried out extensive simulations to compare the performance of sam2.20 and the symmetric cutoff method. Finally, a simple modification is proposed to improve the FDR estimation of sam2.20 and the symmetric cutoff method. Conclusion: Our study shows that the most serious drawback of SAM is its poor estimation of FDR. Although this drawback has been corrected in sam2.20, the control of FDR by sam2.20 is still not satisfactory. The comparison between sam2.20 and the symmetric cutoff method reveals that the performance of sam2.20 relative to the symmetric cutoff method depends on the ratio of induced to repressed genes in a microarray data set, and is also affected by the ratio of DE to EE genes and the distributions of induced and repressed genes. Numerical simulations show that the symmetric cutoff method has the biggest advantage over sam2.20 when there are equal numbers of induced and repressed genes (i.e., the ratio of induced to repressed genes is 1). As the ratio of induced to repressed genes moves away from 1, the advantage of the symmetric cutoff method over sam2.20 gradually diminishes, until eventually sam2.20 becomes significantly better than the symmetric cutoff method when the differentially expressed (DE) genes are either all induced or all repressed. Simulation results also show that our proposed simple modification provides improved control of FDR for both sam2.20 and the symmetric cutoff method.
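To make the idea of a symmetric cutoff concrete, the sketch below computes a generic permutation-based plug-in FDR estimate for the rule "call a gene when |score| >= cutoff". It assumes that null scores from label permutations are already available and that all genes are treated as null when counting expected false calls. It is not the formula used by SAM or sam2.20, nor the modification proposed in the paper; the function name `symmetric_cutoff_fdr` and all numbers are hypothetical.

```python
def symmetric_cutoff_fdr(observed, null_scores, cutoff):
    """Plug-in FDR estimate for calling genes with |score| >= cutoff.

    observed:    list of per-gene test scores from the real labelling
    null_scores: list of lists, one list of per-gene scores per permutation
    cutoff:      non-negative threshold applied symmetrically to both tails
    """
    called = sum(1 for d in observed if abs(d) >= cutoff)
    if called == 0:
        return 0.0  # nothing called, so no false discoveries are claimed
    # Number of null scores beyond the cutoff in each permutation.
    false_per_perm = [
        sum(1 for d in perm if abs(d) >= cutoff) for perm in null_scores
    ]
    expected_false = sum(false_per_perm) / len(false_per_perm)
    return min(1.0, expected_false / called)


# Hypothetical scores for five genes and two permutations.
observed = [3.0, -2.5, 0.5, 0.1, -0.2]
null_scores = [
    [2.1, 0.1, -0.3, 0.2, 0.0],
    [0.4, -0.1, 0.3, -0.2, 0.1],
]
fdr = symmetric_cutoff_fdr(observed, null_scores, cutoff=2.0)
```

Because the same threshold is applied to both tails, the estimate does not depend on how extreme order statistics behave at either end separately, which is the property the abstract contrasts with asymmetric cutoffs.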

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Hepatic microRNA expression is associated with the response to interferon treatment of chronic hepatitis C

Author: AM Wertheimer
Atsushi Tajima
AY Gong
B Pulendran
CL Jopling
D Yu
E Sonkoly
Hidenori Toyoda
IM Pedersen
J Ji
JJ Feld
JL Dienstag
Katsuyuki Hayashi
Kunitada Shimotohno
L Chen
LG Guidotti
M Lagos-Quintana
M Sarasin-Filipowicz
MA Lindsay
Masahiko Kuroda
Masami Tanaka
MW Fried
N Akuta
N Enomoto
P Ferenci
PD Zamore
PW Hsu
PY Chen
RS Pillai
S Dudoit
S Dudoit
T Asselah
TW Nilsen
V Ambros
X Peng
Y Murakami
Y Tanaka
Yoshiki Murakami
ZM Younossi
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: HCV infection frequently induces chronic liver diseases. The current standard treatment for chronic hepatitis (CH) C combines pegylated interferon (IFN) and ribavirin, and is less than ideal due to undesirable effects. MicroRNAs (miRNAs) are endogenous small non-coding RNAs that control gene expression by degrading or suppressing the translation of target mRNAs. In this study we administered the standard combination treatment to CHC patients. We then examined their miRNA expression profiles in order to identify the miRNAs that were associated with each patient's drug response. Methods: 99 CHC patients with no anti-viral therapy history were enrolled. The expression levels of 470 mature miRNAs found in their biopsy specimens, obtained prior to the combination therapy, were quantified using microarray analysis. The miRNA expression pattern was classified based on the final virological response to the combination therapy. Monte Carlo Cross Validation (MCCV) was used to validate the outcome of the prediction based on the miRNA expression profile. Results: We found that the expression levels of 9 miRNAs were significantly different in the sustained virological response (SVR) and non-responder (NR) groups. MCCV revealed an accuracy, sensitivity, and specificity of 70.5%, 76.5% and 63.3% in SVR and non-SVR and 70.0%, 67.5%, and 73.7% in relapse (R) and NR, respectively. Conclusions: The hepatic miRNA expression pattern that exists in CHC patients before combination therapy is associated with their therapeutic outcome. This information can be utilized as a novel biomarker to predict drug response and can also be applied to developing novel anti-viral therapy for CHC patients.
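As an illustration of the validation scheme named in this abstract, the following is a minimal sketch of Monte Carlo cross-validation: the data are repeatedly split at random into training and test sets, a model is fitted on the training part, and test accuracy is averaged over splits. The abstract does not state the classifier, split ratio, or number of repetitions, so the nearest-centroid classifier, the 30% test fraction, the 100 splits, and the toy one-feature data below are all assumptions.

```python
import random


def mccv_accuracy(xs, ys, fit, predict, n_splits=100, test_fraction=0.3, seed=0):
    """Average test accuracy over repeated random train/test splits."""
    rng = random.Random(seed)
    n = len(xs)
    n_test = max(1, int(round(n * test_fraction)))
    accuracies = []
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random split on every repetition
        test, train = idx[:n_test], idx[n_test:]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(predict(model, xs[i]) == ys[i] for i in test)
        accuracies.append(correct / n_test)
    return sum(accuracies) / n_splits


def fit_centroids(xs, ys):
    """Hypothetical one-feature nearest-centroid model: class label -> mean."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_centroid(model, x):
    """Predict the class whose centroid is closest to x."""
    return min(model, key=lambda y: abs(x - model[y]))


# Toy, well-separated expression values for two response groups.
xs = [0.0, 0.1, 0.2, 0.3, 0.4, 5.0, 5.1, 5.2, 5.3, 5.4]
ys = ["NR"] * 5 + ["SVR"] * 5
acc = mccv_accuracy(xs, ys, fit_centroids, predict_centroid)
```

Unlike k-fold cross-validation, the random splits may overlap, so the number of repetitions can be chosen independently of the sample size.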

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Kyoto University Research Information Repository

The choice of null distributions for detecting gene-gene interactions in genome-wide association studies

Author: A Niu
B Efron
B Med
C Greene
C Greene
C Herold
C Yang
C Yang
Can Yang
D Balding
D Evans
E Eichler
H Cordell
Hong Xue
J Marchini
J Moore
J Moore
K Kira
L Wiskott
M Nelson
M Park
M Ritchie
PC Phillips
Qiang Yang
R Culverhouse
R Klein
R Tibshirani
S Dudoit
S Dudoit
S Purcell
T Hastie
T Hastie
T Wu
T Zheng
W Li
Weichuan Yu
WTCCC
X Chen
X Wan
X Wan
Xiang Wan
Y Benjamini
Y Zhang
Zengyou He
Publication venue: BioMed Central
Publication date: 01/01/2011

Crossref

Springer - Publisher Connector

PubMed Central

Effect of various normalization methods on Applied Biosystems expression array system data

Author: A Hartemink
BM Bolstad
CA Heid
Catalin C Barbacioru
David N Keys
EF Petricoin 3rd
Frances Chan
GK Smyth
JL Hackett
Karen A Poulter
L Guo
R Canales
Raymond R Samaha
Roger D Canales
S Dudoit
T Patterson
UE Gibson
V Tusher
W Huber
WS Cleveland
Y Benjamini
Y Wang
YH Yang
YH Yang
Yongming A Sun
Yulei Wang
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: DNA microarray technology provides a powerful tool for characterizing gene expression on a genome scale. While the technology has been widely used in discovery-based medical and basic biological research, its direct application in clinical practice and regulatory decision-making has been questioned. A few key issues, including the reproducibility, reliability, compatibility and standardization of microarray analysis and results, must be critically addressed before any routine usage of microarrays in clinical laboratory and regulated areas can occur. In this study we investigate some of these issues for the Applied Biosystems Human Genome Survey Microarrays. RESULTS: We analyzed the gene expression profiles of two samples: brain and universal human reference (UHR), a mixture of RNAs from 10 cancer cell lines, using the Applied Biosystems Human Genome Survey Microarrays. Five technical replicates in three different sites were performed on the same total RNA samples according to the manufacturer's standard protocols. Five different methods (quantile, median, scale, VSN and cyclic loess) were used to normalize AB microarray data within each site. 1,000 genes spanning a wide dynamic range in gene expression levels were selected for real-time PCR validation. Using the TaqMan® assays data set as the reference set, the performance of the five normalization methods was evaluated focusing on the following criteria: (1) Sensitivity and reproducibility in detection of expression; (2) Fold change correlation with real-time PCR data; (3) Sensitivity and specificity in detection of differential expression; (4) Reproducibility of differentially expressed gene lists. CONCLUSION: Our results showed a high level of concordance between these normalization methods. This is true regardless of whether signal, detection, variation, fold change measurements or reproducibility were interrogated. Furthermore, we used TaqMan® assays as a reference to generate TPR and FDR plots for the various normalization methods across the assay range. Little impact is observed on the TP and FP rates in detection of differentially expressed genes. Additionally, the various normalization methods had little effect on the statistical approaches analyzed, which indicates a certain robustness of the analysis methods currently in use in the field, particularly when used in conjunction with the Applied Biosystems Gene Expression System.
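Of the five normalization methods listed in this abstract, quantile normalization is the simplest to show in a few lines. The sketch below is a generic version, assuming each array is a list of intensities of equal length with no missing values and that ties are broken by position; it is not taken from the study's pipeline, and the function name `quantile_normalize` and the example values are illustrative.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length arrays (one list per array).

    Each value is replaced by the mean, across arrays, of the values that
    share its rank, so all arrays end up with the same distribution.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the r-th smallest value across all arrays.
    rank_means = [
        sum(sc[r] for sc in sorted_cols) / len(columns) for r in range(n)
    ]
    result = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]  # ties are broken by original position
        result.append(out)
    return result


# Two hypothetical arrays with three genes each.
normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
```

After normalization every array contains the same set of values, only in a different gene order, which is why the method removes array-wide intensity differences while preserving within-array ranks.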

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central