MAID : An effect size based model for microarray data integration across laboratories and platforms

Author: Borozan Ivan
Chen Limin
Edwards Aled M
Heathcote Jenny E
Katze Michael
McGilvray Ian D
Paeper Bryan
Zhang Zhaolei
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Gene expression profiling has the potential to unravel molecular mechanisms behind gene regulation and identify gene targets for therapeutic interventions. As microarray technology matures, the number of microarray studies has increased, resulting in many different datasets available for any given disease. The sensitivity and reliability of measurements of gene expression changes can be improved through a systematic integration of different microarray datasets that address the same or similar biological questions.

Results: Traditional effect size models cannot be used to integrate array data that directly compare treatment to control samples expressed as log ratios of gene expressions. Here we extend the traditional effect size model to integrate as many array datasets as possible. The extended effect size model (MAID) can integrate any array datatype generated with either single or two channel arrays using either direct or indirect designs across different laboratories and platforms. The model uses two standardized indices: the standard effect size score for experiments with two groups of data, and a new standardized index that measures the difference in gene expression between treatment and control groups for one sample data with replicate arrays. The statistical significance of treatment effect across studies for each gene is determined by appropriate permutation methods depending on the type of data integrated. We apply our method to three different expression datasets from two different laboratories generated using three different array platforms and two different experimental designs. Our results indicate that the proposed integration model produces an increase in statistical power for identifying differentially expressed genes when integrating data across experiments and when compared to other integration models. We also show that genes found to be significant using our data integration method are of direct biological relevance to the three experiments integrated.

Conclusion: High-throughput genomics data provide a rich and complex source of information that could play a key role in deciphering intricate molecular networks behind disease. Here we propose an extension of the traditional effect size model to allow the integration of as many array experiments as possible with the aim of increasing the statistical power for identifying differentially expressed genes.
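
The "standard effect size score for experiments with two groups of data" named in the abstract above can be illustrated with a minimal sketch. It assumes the common textbook form (a standardized mean difference with pooled standard deviation, combined across studies by inverse-variance weighting); the abstract does not give MAID's formulas, so this is not the authors' implementation, and it covers neither the new one-sample index nor the permutation tests. The function names `effect_size` and `combine_fixed_effect` are hypothetical.

```python
import math
from statistics import mean, variance


def effect_size(treatment, control):
    """Standardized mean difference (Cohen's d) for one gene in one study.

    Assumption: pooled-SD form with a large-sample variance approximation.
    """
    n_t, n_c = len(treatment), len(control)
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    d = (mean(treatment) - mean(control)) / math.sqrt(pooled_var)
    # Approximate sampling variance of d
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    return d, var_d


def combine_fixed_effect(effects):
    """Inverse-variance weighted mean of (d, var_d) pairs across studies."""
    weights = [1.0 / v for _, v in effects]
    d_bar = sum(w * d for w, (d, _) in zip(weights, effects)) / sum(weights)
    return d_bar, 1.0 / sum(weights)
```

Each study contributes one `(d, var_d)` pair per gene; studies with smaller variance receive more weight in the combined estimate.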

University of Toronto Research Repository

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The Reproducibility of Lists of Differentially Expressed Genes in Microarray Studies

Reproducibility is a fundamental requirement in scientific experiments and clinical contexts. Recent publications raise concerns about the reliability of microarray technology because of the apparent lack of agreement between lists of differentially expressed genes (DEGs). In this study we demonstrate that (1) such discordance may stem from ranking and selecting DEGs solely by statistical significance (P) derived from widely used simple t-tests; (2) when fold change (FC) is used as the ranking criterion, the lists become much more reproducible, especially when fewer genes are selected; and (3) the instability of short DEG lists based on P cutoffs is an expected mathematical consequence of the high variability of the t-values. We recommend the use of FC ranking plus a non-stringent P cutoff as a baseline practice in order to generate more reproducible DEG lists. The FC criterion enhances reproducibility while the P criterion balances sensitivity and specificity.
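
The recommended baseline (FC ranking plus a non-stringent P cutoff) could be sketched as follows. This is only an illustration under stated assumptions: the record layout (`id`, `log2fc`, `p`), the function name `select_degs`, and the default thresholds are hypothetical and not taken from the paper.

```python
def select_degs(genes, p_cutoff=0.05, top_n=100):
    """Filter by a loose P cutoff, then rank survivors by absolute fold change.

    `genes` is a list of dicts with keys "id", "log2fc" and "p" (assumed layout).
    """
    # Step 1: non-stringent significance filter
    passing = [g for g in genes if g["p"] < p_cutoff]
    # Step 2: rank by magnitude of log fold change, largest first
    passing.sort(key=lambda g: abs(g["log2fc"]), reverse=True)
    return [g["id"] for g in passing[:top_n]]
```

The P value acts only as a gate here; the order of the final list is determined entirely by fold change.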

Crossref

Nature Precedings

Application of Volcano Plots in Analyses of mRNA Differential Expressions with Microarrays

Author: Alvord W. G.
Auer P. L.
Chen Y.
Chen Z.
Cohen J.
Fechner G. T.
Guyon I.
Göhlmann H.
Lee J.
Li C.
Schwender H.
Smyth G. K.
Snedecor G. W.
Trevino V.
Vandesompele J.
Welsh B. L.
WENTIAN LI
Zhao C.
Publication venue: World Scientific Pub Co Pte Lt
Publication date: 28/08/2013

A volcano plot displays unstandardized signal (e.g. log-fold-change) against noise-adjusted/standardized signal (e.g. t-statistic or -log10(p-value) from the t test). We review the basic and interactive uses of the volcano plot, and its crucial role in understanding the regularized t-statistic. The joint filtering gene selection criterion based on regularized statistics has a curved discriminant line in the volcano plot, as compared to the two perpendicular lines for the "double filtering" criterion. This review attempts to provide a unifying framework for discussions on alternative measures of differential expression, improved methods for estimating variance, and visual display of a microarray analysis result. We also discuss the possibility of applying volcano plots to other fields beyond microarrays. Comment: 8 figures
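
As a rough illustration of the plot coordinates and the "double filtering" criterion described above, the sketch below computes (log fold change, -log10 p) pairs and applies two perpendicular thresholds. The function names and default cutoffs are assumptions made for this example; the curved discriminant line of the regularized joint criterion is not reproduced here.

```python
import math


def volcano_points(log2fc, pvalues):
    """Return (x, y) pairs: x = log2 fold change, y = -log10(p-value)."""
    return [(fc, -math.log10(p)) for fc, p in zip(log2fc, pvalues)]


def double_filter(points, fc_min=1.0, p_max=0.05):
    """Flag points beyond both cutoffs: |x| >= fc_min and y >= -log10(p_max)."""
    y_min = -math.log10(p_max)
    return [abs(x) >= fc_min and y >= y_min for x, y in points]
```

On the plot, the two cutoffs appear as a vertical pair of lines at `±fc_min` and a horizontal line at `-log10(p_max)`; selected genes fall in the upper-left and upper-right corners.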

arXiv.org e-Print Archive

Crossref

Estimating the proportion of differentially expressed genes in comparative DNA microarray experiments

Author: Cabrera Javier
Yu Ching-Ray
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2007

DNA microarray experiments, a well-established experimental technique, aim at understanding the function of genes in some biological processes. One of the most common experiments in functional genomics research is to compare two groups of microarray data to determine which genes are differentially expressed. In this paper, we propose a methodology to estimate the proportion of differentially expressed genes in such experiments. We study the performance of our method in a simulation study where we compare it to other standard methods. Finally we compare the methods in real data from two toxicology experiments with mice. Comment: Published at http://dx.doi.org/10.1214/074921707000000076 in the IMS Lecture Notes Monograph Series (http://www.imstat.org/publications/lecnotes.htm) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

Crossref

Transcription analysis of apple fruit development using cDNA microarrays

Author: Costa F.
Gianfranceschi L.
Molthoff J.W.
Schouten H.J.
Soglio V.
Weemen-Hendriks M.
Publication date: 01/01/2009

Knowledge of the molecular mechanisms underlying fruit quality traits is fundamental to devising efficient marker-assisted selection strategies and to improving apple breeding. In this study, cDNA microarray technology was used to identify genes whose expression changes during fruit development and maturation and which are thus potentially involved in fruit quality traits. The expression profile of 1,536 transcripts was analysed by microarray hybridisation. A total of 177 genes were found to be differentially expressed in at least one of the developmental stages considered. Gene ontology annotation was employed to describe gene function unambiguously, while cluster analysis allowed genes to be grouped according to their expression profile. An overview of the transcriptional changes and of the metabolic pathways involved in fruit development was obtained. As expected, August and September were the two months in which the largest number of differentially expressed genes was observed. In particular, 85 genes were found to be up-regulated in September. Even though most of the differentially expressed genes are involved in primary metabolism, several other interesting functions were detected and will be presented.

AIR Universita degli studi di Milano

Wageningen University & Research Publications

Predictive response-relevant clustering of expression data provides insights into disease processes

Author: Abe
Amanda K. Sampson
Anna F. Dominiczak
Bach
Bae
Benjamini
Bennett
Bishop
Breitling
Bunger
Clark
de Snoo
Delyth Graham
Doi
Dudoit
Golub
Gore
Graham
Graham Young
Hanczar
Harris
Hoffbrand
Hubert
Huffman
Irizarry
Jeffs
John D. McClure
Kearney
Keith J. Harris
Lee
Lee
Lisa E. M. Hopcroft
Mark A. Girolami
Martin W. McBride
McBride
Mohri
Park
Stein
Tessa L. Holyoake
Tibshirani
Vinh
Weinberger
Woon
Ziino
Zuber
Publication venue: Oxford University Press (OUP)
Publication date: 22/06/2010

This article describes and illustrates a novel method of microarray data analysis that couples model-based clustering and binary classification to form clusters of 'response-relevant' genes; that is, genes that are informative when discriminating between the different values of the response. Predictions are subsequently made using an appropriate statistical summary of each gene cluster, which we call the 'meta-covariate' representation of the cluster, in a probit regression model. We first illustrate this method by analysing a leukaemia expression dataset, before focusing closely on the meta-covariate analysis of a renal gene expression dataset in a rat model of salt-sensitive hypertension. We explore the biological insights provided by our analysis of these data. In particular, we identify a highly influential cluster of 13 genes, including three transcription factors (Arntl, Bhlhe41 and Npas2), that is implicated as being protective against hypertension in response to increased dietary sodium. Functional and canonical pathway analysis of this cluster using Ingenuity Pathway Analysis implicated transcriptional activation and circadian rhythm signalling, respectively. Although we illustrate our method using only expression data, the method is applicable to any high-dimensional dataset.

Crossref

PubMed Central

Enlighten

White Rose Research Online

CUED - Cambridge University Engineering Department

Diverse correlation structures in gene expression data and their utility in improving statistical inference

Author: Klebanov Lev
Yakovlev Andrei
Publication venue: Institute of Mathematical Statistics
Publication date: 13/12/2007

It is well known that correlations in microarray data represent a serious nuisance that degrades the performance of gene selection procedures. This paper is intended to demonstrate that the correlation structure of microarray data provides a rich source of useful information. We discuss distinct correlation substructures revealed in microarray gene expression data by an appropriate ordering of genes. These substructures include stochastic proportionality of expression signals in a large percentage of all gene pairs, negative correlations hidden in ordered gene triples, and a long sequence of weakly dependent random variables associated with ordered pairs of genes. The reported striking regularities are of general biological interest and they also have far-reaching implications for the theory and practice of statistical methods of microarray data analysis. We illustrate the latter point with a method for testing differential expression of nonoverlapping gene pairs. While designed for testing a different null hypothesis, this method provides an order of magnitude more accurate control of the type 1 error rate compared to conventional methods of individual gene expression profiling. In addition, this method is robust to technical noise. Quantitative inference of the correlation structure has the potential to extend the analysis of microarray data far beyond currently practiced methods. Comment: Published at http://dx.doi.org/10.1214/07-AOAS120 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

Crossref

The Functional Consequences of Variation in Transcription Factor Binding

Author: Cusanovich Darren A.
Gilad Yoav
Pavlovic Bryan
Pritchard Jonathan K.
Publication venue: Public Library of Science (PLoS)
Publication date: 18/10/2013

One goal of human genetics is to understand how the information for precise and dynamic gene expression programs is encoded in the genome. The interactions of transcription factors (TFs) with DNA regulatory elements clearly play an important role in determining gene expression outputs, yet the regulatory logic underlying functional transcription factor binding is poorly understood. Many studies have focused on characterizing the genomic locations of TF binding, yet it is unclear to what extent TF binding at any specific locus has functional consequences with respect to gene expression output. To evaluate the context of functional TF binding we knocked down 59 TFs and chromatin modifiers in one HapMap lymphoblastoid cell line. We then identified genes whose expression was affected by the knockdowns. We intersected the gene expression data with transcription factor binding data (based on ChIP-seq and DNase-seq) within 10 kb of the transcription start sites of expressed genes. This combination of data allowed us to infer functional TF binding. On average, 14.7% of genes bound by a factor were differentially expressed following the knockdown of that factor, suggesting that most interactions between TF and chromatin do not result in measurable changes in gene expression levels of putative target genes. We found that functional TF binding is enriched in regulatory elements that harbor a large number of TF binding sites, at sites with predicted higher binding affinity, and at sites that are enriched in genomic regions annotated as active enhancers. Comment: 30 pages, 6 figures (7 supplemental figures and 6 supplemental tables available upon request to [email protected]). Submitted to PLoS Genetics

arXiv.org e-Print Archive

Directory of Open Access Journals

FigShare