Elsevier - Publisher Connector

Consensus and meta-analysis regulatory networks for combining multiple microarray gene expression datasets

Author: Allan Tucker
Emma Steele
Publication venue: Elsevier BV
Publication date: 01/12/2008

Microarray data is a key source of experimental data for modelling gene regulatory interactions from expression levels. With the rapid increase of publicly available microarray data comes the opportunity to produce regulatory network models based on multiple datasets. Such models are potentially more robust, carry greater confidence, and rely less on any single dataset. However, combining datasets directly can be difficult because experiments are often conducted on different microarray platforms and in different laboratories, leading to inherent biases in the data that are not always removed through pre-processing such as normalisation. In this paper we compare two frameworks for combining microarray datasets to model regulatory networks: pre- and post-learning aggregation. In pre-learning approaches, such as simple scale-normalisation followed by concatenation of the datasets, a model is learnt from a combined dataset, whilst in post-learning aggregation individual models are learnt from each dataset and the models are then combined. We present two novel approaches for post-learning aggregation, each based on aggregating high-level features of Bayesian network models that have been generated from different microarray expression datasets. Meta-analysis Bayesian networks are based on combining statistical confidences attached to network edges, whilst Consensus Bayesian networks identify consistent network features across all datasets. We apply both approaches to multiple datasets from synthetic and real (Escherichia coli and yeast) networks and demonstrate that both methods can improve on networks learnt from a single dataset or an aggregated dataset formed using a standard scale-normalisation.
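
As a rough illustration of post-learning aggregation, the Python sketch below combines per-dataset edge confidences in two ways. The abstract does not say how the confidences are combined, so the averaging rule, the 0.5 threshold, the treatment of missing edges, and the names `meta_analysis_edges` and `consensus_edges` are all assumptions made for illustration, not the authors' method.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (parent gene, child gene)


def meta_analysis_edges(
    confidences: List[Dict[Edge, float]], threshold: float = 0.5
) -> Set[Edge]:
    """Keep edges whose mean confidence across datasets reaches the threshold.

    Assumption: an edge missing from a dataset counts as confidence 0 there.
    """
    all_edges = set().union(*(c.keys() for c in confidences))
    n = len(confidences)
    return {
        e for e in all_edges
        if sum(c.get(e, 0.0) for c in confidences) / n >= threshold
    }


def consensus_edges(
    confidences: List[Dict[Edge, float]], threshold: float = 0.5
) -> Set[Edge]:
    """Keep only edges that reach the threshold in every dataset."""
    all_edges = set().union(*(c.keys() for c in confidences))
    return {
        e for e in all_edges
        if all(c.get(e, 0.0) >= threshold for c in confidences)
    }


# Hypothetical edge confidences learnt from two datasets.
dataset_1 = {("geneA", "geneB"): 0.9, ("geneB", "geneC"): 0.4}
dataset_2 = {("geneA", "geneB"): 0.7, ("geneB", "geneC"): 0.8}

meta = meta_analysis_edges([dataset_1, dataset_2])
consensus = consensus_edges([dataset_1, dataset_2])
```

In this toy input the averaged rule keeps both edges, while the consensus rule drops the edge that is weakly supported in the first dataset. This shows how a consensus criterion can be stricter than a pooled one.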

Brunel University Research Archive

Classes of Multiple Decision Functions Strongly Controlling FWER and FDR

Author: Edsel A. Peña
Joshua D. Habiger
Wensong Wu
Publication venue: Springer Science and Business Media LLC
Publication date: 15/07/2010

This paper provides two general classes of multiple decision functions where each member of the first class strongly controls the family-wise error rate (FWER), while each member of the second class strongly controls the false discovery rate (FDR). These classes offer the possibility that an optimal multiple decision function with respect to a pre-specified criterion, such as the missed discovery rate (MDR), could be found within these classes. Such multiple decision functions can be utilized in multiple testing, specifically, but not limited to, the analysis of high-dimensional microarray data sets.
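
To make the two error rates concrete, the Python sketch below implements two standard textbook procedures: Holm's step-down method, which controls the FWER, and the Benjamini-Hochberg step-up method, which controls the FDR for independent p-values. They are shown only as familiar reference points. They are not the classes of decision functions constructed in the paper, and the function names are chosen here for illustration.

```python
from typing import List


def holm_reject(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Holm step-down procedure: controls the FWER at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the (rank+1)-th smallest p-value with alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # step-down: stop at the first non-rejection
    return reject


def bh_reject(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure: controls the FDR at level alpha
    when the p-values are independent."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose p-value is below its threshold
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


p = [0.001, 0.02, 0.03, 0.2]
holm = holm_reject(p)
bh = bh_reject(p)
```

On these four p-values Holm rejects only the smallest one, whereas Benjamini-Hochberg rejects the three smallest. This reflects the usual trade-off: controlling the FDR is less stringent than controlling the FWER.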

arXiv.org e-Print Archive

EzArray: A web-based highly automated Affymetrix expression array data management and analysis system

Author: Wei Xu
Yuelin Zhu
Yuerong Zhu
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Though microarray experiments are very popular in life science research, managing and analyzing microarray data are still challenging tasks for many biologists. Most microarray programs require users to have sophisticated knowledge of mathematics and statistics as well as computer skills. With accumulating microarray data deposited in public databases, easy-to-use programs to re-analyze previously published microarray data are in high demand. Results: EzArray is a web-based Affymetrix expression array data management and analysis system for researchers who need to organize microarray data efficiently and get data analyzed instantly. EzArray organizes microarray data into projects that can be analyzed online with predefined or custom procedures. EzArray performs data preprocessing and detection of differentially expressed genes with statistical methods. All analysis procedures are optimized and highly automated so that even novice users with limited prior knowledge of microarray data analysis can complete initial analysis quickly. Since all input files, analysis parameters, and executed scripts can be downloaded, EzArray provides maximum reproducibility for each analysis. In addition, EzArray integrates with Gene Expression Omnibus (GEO) and allows instantaneous re-analysis of published array data. Conclusion: EzArray is a novel Affymetrix expression array data analysis and sharing system. EzArray provides easy-to-use tools for re-analyzing published microarray data and will help both novice and experienced users perform initial analysis of their microarray data from the location of data storage. We believe EzArray will be a useful system for facilities with microarray services and laboratories with multiple members involved in microarray data analysis. EzArray is freely available from http://www.ezarray.com/.

Springer - Publisher Connector

Public Library of Science (PLOS)

A Comprehensive and Universal Method for Assessing the Performance of Differential Gene Expression Analyses

Author: Igor M. Dozmorov
Joel M. Guthridge
Mikhail G. Dozmorov
Robert E. Hurst
Thomas Preiss
Publication venue: Public Library of Science
Publication date: 09/09/2010

The number of methods for pre-processing and analysis of gene expression data continues to increase, often making it difficult to select the most appropriate approach. We present a simple procedure for the comparative evaluation of a variety of methods for microarray data pre-processing and analysis. Our approach is based on the use of real microarray data in which controlled fold changes are introduced into 20% of the data to provide a metric for comparison with the unmodified data. The data modifications can be easily applied to raw data measured with any technological platform and retain all the complex structures and statistical characteristics of the real-world data. The power of the method is illustrated by its application to the quantitative comparison of different methods of normalization and analysis of microarray data. Our results demonstrate that the method of controlled modifications of real experimental data provides a simple tool for assessing the performance of data preprocessing and analysis methods.
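
The idea of introducing controlled fold changes can be sketched in a few lines of Python. The abstract gives only the 20% figure, so the details here are assumptions: genes are chosen at random, a single multiplicative fold change is applied to linear-scale intensities, and the function name `spike_fold_changes` is hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple


def spike_fold_changes(
    expression: Dict[str, List[float]],
    fraction: float = 0.2,
    fold: float = 2.0,
    seed: int = 0,
) -> Tuple[Dict[str, List[float]], Set[str]]:
    """Multiply a random fraction of genes by a known fold change.

    Returns a modified copy of the data and the set of modified genes,
    which serves as ground truth when scoring an analysis method.
    """
    rng = random.Random(seed)
    genes = sorted(expression)
    n_spiked = round(fraction * len(genes))
    spiked = set(rng.sample(genes, n_spiked))
    modified = {
        gene: [v * fold if gene in spiked else v for v in values]
        for gene, values in expression.items()
    }
    return modified, spiked


# Hypothetical raw intensities for ten genes measured on two arrays.
raw = {f"gene{i}": [1.0, 2.0] for i in range(10)}
modified, spiked = spike_fold_changes(raw)
```

A differential expression method can then be run on `raw` versus `modified`, and its calls compared against `spiked` to estimate how many of the known changes it recovers.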

arXiv.org e-Print Archive

"Pre-conditioning" for feature selection and regression in high-dimensional problems

Author: Bair Eric
Hastie Trevor
Paul Debashis
Tibshirani Robert
Publication venue: Institute of Mathematical Statistics
Publication date: 28/03/2007

We consider regression problems where the number of predictors greatly exceeds the number of observations. We propose a method for variable selection that first estimates the regression function, yielding a "pre-conditioned" response variable. The primary method used for this initial regression is supervised principal components. Then we apply a standard procedure such as forward stepwise selection or the LASSO to the pre-conditioned response variable. In a number of simulated and real data examples, this two-step procedure outperforms forward stepwise selection or the usual LASSO (applied directly to the raw outcome). We also show that under a certain Gaussian latent variable model, application of the LASSO to the pre-conditioned response variable is consistent as the number of predictors and observations increases. Moreover, when the observational noise is rather large, the suggested procedure can give a more accurate estimate than LASSO. We illustrate our method on some real problems, including survival analysis with microarray data.

CiteSeerX

Springer - Publisher Connector

CARMA: A platform for analyzing microarray datasets that incorporate replicate measures

Author: Brooks Heddwen L
Greer Kevin A
Hoying James B
McReynolds Matthew R
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The incorporation of statistical models that account for experimental variability provides a necessary framework for the interpretation of microarray data. A robust experimental design coupled with an analysis of variance (ANOVA) incorporating a model that accounts for known sources of experimental variability can significantly improve the determination of differences in gene expression and estimations of their significance. RESULTS: To realize the full benefits of performing analysis of variance on microarray data we have developed CARMA, a microarray analysis platform that reads data files generated by most microarray image processing software packages, performs ANOVA using a user-defined linear model, and produces easily interpretable graphical and numeric results. No pre-processing of the data is required, and user-specified parameters control most aspects of the analysis, including the statistical significance criterion. The software also performs location- and intensity-dependent lowess normalization, automatic outlier detection and removal, and accommodates missing data. CONCLUSION: CARMA provides a clear quantitative and statistical characterization of each measured gene that can be used to assess marginally acceptable measures and improve confidence in the interpretation of microarray results. Overall, applying CARMA to microarray datasets incorporating repeated measures effectively reduces the number of genes incorrectly identified as differentially expressed and results in a more robust and reliable analysis.
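
For readers unfamiliar with ANOVA on replicate measurements, the Python sketch below computes a one-way ANOVA F statistic for a single gene. It is a deliberately simplified stand-in: CARMA fits a user-defined linear model, whereas this example assumes a single factor (the experimental condition), and the function name `one_way_anova_f` is hypothetical.

```python
from typing import List, Tuple


def one_way_anova_f(groups: List[List[float]]) -> Tuple[float, int, int]:
    """Return (F statistic, between-group df, within-group df).

    Each inner list holds the replicate measurements of one gene under one
    condition. Raises ZeroDivisionError if there is no within-group variance.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)
    )
    # Variation of the replicates around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical log-expression values: three conditions, three replicates each.
f_stat, df_b, df_w = one_way_anova_f(
    [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
)
```

A large F statistic relative to the F distribution with `df_b` and `df_w` degrees of freedom indicates that between-condition differences exceed what replicate-to-replicate variability alone would explain.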