
4,251 research outputs found

Meta-analysis of gene expression microarrays with missing replicates

Author: A Boussioutas
A Ramasamy
Adam Kowalczyk
AP Dempster
AV Ivshina
C Desmedt
CH Ahn
Christopher Leckie
D Ghosh
D Petersen
DB Rubin
DR Rhodes
E Tahara
Fan Shi
G Marot
Gad Abraham
HY Dai
I Borozan
Izhak Haviv
J Mosley
JD Wren
JF Ji
JK Choi
JL Schafer
JR Stevens
K Shen
L Xu
L Xu
LJ van 't Veer
LV Hedges
M Ashburner
M Schmidt
N Crawford
P Lauren
P Warnat
R Breitling
R DerSimonian
R Edgar
S Loi
S Loi
SF Arnold
T Aittokallio
T Beissbarth
VG Tusher
WG Cochran
WG Stetler-Stevenson
Y Benjamini
Y Hippo
Y Wang
Y Yonemura
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: Many different microarray experiments are publicly available today. It is natural to ask whether different experiments for the same phenotypic conditions can be combined using meta-analysis, in order to increase the overall sample size. However, some genes are not measured in all experiments, hence they cannot be included or their statistical significance cannot be appropriately estimated in traditional meta-analysis. Nonetheless, these genes, which we refer to as incomplete genes, may also be informative and useful. Results: We propose a meta-analysis framework, called "Incomplete Gene Meta-analysis", which can include incomplete genes by imputing the significance of missing replicates, and computing a meta-score for every gene across all datasets. We demonstrate that the incomplete genes are worthy of being included and our method is able to appropriately estimate their significance in two groups of experiments. We first apply the Incomplete Gene Meta-analysis and several comparable methods to five breast cancer datasets with an identical set of probes. We simulate incomplete genes by randomly removing a subset of probes from each dataset and demonstrate that our method consistently outperforms two other methods in terms of their false discovery rate. We also apply the methods to three gastric cancer datasets for the purpose of discriminating diffuse and intestinal subtypes. Conclusions: Meta-analysis is an effective approach that identifies more robust sets of differentially expressed genes from multiple studies. The incomplete genes that mainly arise from the use of different platforms may also have statistical and biological importance but are ignored or not appropriately handled by previous studies. Our Incomplete Gene Meta-analysis is able to incorporate the incomplete genes by estimating their significance. The results on both breast and gastric cancer datasets suggest that the highly ranked genes and associated GO terms produced by our method are more significant and biologically meaningful according to the previous literature.
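
To make the problem of incomplete genes concrete, the following is a minimal sketch of a conventional baseline: combining per-study z-scores for one gene using only the studies that measured it. This is Stouffer's unweighted method, not the Incomplete Gene Meta-analysis itself, whose imputation step is not described in the abstract. The function name `stouffer_available` and the use of `None` for a missing study are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def stouffer_available(z_scores):
    """Combine per-study z-scores for one gene with Stouffer's method.

    Studies in which the gene was not measured are passed as None and
    skipped. Returns (combined_z, two_sided_p, n_studies_used), or None
    when no study measured the gene.
    """
    observed = [z for z in z_scores if z is not None]
    if not observed:
        return None
    # Unweighted Stouffer: the sum of k independent standard normal
    # scores has variance k, so dividing by sqrt(k) restores unit variance.
    combined_z = sum(observed) / math.sqrt(len(observed))
    two_sided_p = 2.0 * (1.0 - NormalDist().cdf(abs(combined_z)))
    return combined_z, two_sided_p, len(observed)


# Hypothetical gene measured in two of three studies.
result = stouffer_available([2.0, None, 2.0])
```

Because the divisor depends on how many studies measured the gene, a gene missing from some platforms is scored on less evidence than a complete gene. That is the gap the abstract says it addresses by imputing the significance of the missing replicates.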

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

University of Melbourne Institutional Repository

A meta-data based method for DNA microarray imputation

Author: Jörnsten Rebecka
Ouyang Ming
Wang Hui-Yu
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: DNA microarray experiments are conducted in logical sets, such as time course profiling after a treatment is applied to the samples, or comparisons of the samples under two or more conditions. Due to cost and design constraints of spotted cDNA microarray experiments, each logical set commonly includes only a small number of replicates per condition. Despite the vast improvement of the microarray technology in recent years, missing values are prevalent. Intuitively, imputation of missing values is best done using many replicates within the same logical set. In practice, there are few replicates and thus reliable imputation within logical sets is difficult. However, it is in the case of few replicates that the presence of missing values, and how they are imputed, can have the most profound impact on the outcome of downstream analyses (e.g. significance analysis and clustering). This study explores the feasibility of imputation across logical sets, using the vast amount of publicly available microarray data to improve imputation reliability in the small sample size setting. RESULTS: We download all cDNA microarray data of Saccharomyces cerevisiae, Arabidopsis thaliana, and Caenorhabditis elegans from the Stanford Microarray Database. Through cross-validation and simulation, we find that, for all three species, our proposed imputation using data from public databases is far superior to imputation within a logical set, sometimes to an astonishing degree. Furthermore, the imputation root mean square error for significant genes is generally considerably lower than that for non-significant ones. CONCLUSION: Since downstream analysis of significant genes, such as clustering and network analysis, can be very sensitive to small perturbations of estimated gene effects, it is highly recommended that researchers apply reliable data imputation prior to further analysis. Our method can also be applied to cDNA microarray experiments from other species, provided good reference data are available.
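
The evaluation idea mentioned in the abstract (hide known values, impute them, and measure the root mean square error) can be sketched as follows. The imputer shown is a simple row-mean fill within a single small set, chosen only as a stand-in; it is not the authors' meta-data based method. The names `row_mean_impute` and `masked_rmse` are hypothetical.

```python
import math


def row_mean_impute(matrix):
    """Fill None entries with the mean of the observed values in the same row (gene)."""
    imputed = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        # Fall back to 0.0 when a row has no observed values at all.
        fill = sum(observed) / len(observed) if observed else 0.0
        imputed.append([fill if v is None else v for v in row])
    return imputed


def masked_rmse(complete, masked_positions, impute):
    """Hide the given (row, column) entries, impute them, and return the RMSE."""
    masked = [list(row) for row in complete]
    for i, j in masked_positions:
        masked[i][j] = None
    imputed = impute(masked)
    squared_errors = [(imputed[i][j] - complete[i][j]) ** 2
                      for i, j in masked_positions]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical 2-gene x 3-replicate expression matrix.
expression = [[1.0, 2.0, 3.0],
              [4.0, 4.0, 4.0]]
rmse = masked_rmse(expression, [(0, 2), (1, 0)], row_mean_impute)
```

Any imputer with the same signature can be passed as `impute`, so an approach that borrows information from external reference data could be compared on the same masked positions.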

Crossref

Springer - Publisher Connector

PubMed Central

Application of a correlation correction factor in a microarray cross-platform reproducibility study

Author: Archer Kellie J.
Chaplin Michael D.
Dumur Catherine I.
Ferreira-Gonzalez Andrea
Garrett Carleton T.
Grant Geraldine
Guiseppi-Elie Anthony
Taylor G. Scott
Publication venue: VCU Scholars Compass
Publication date: 01/01/2007

Background: Recent research examining cross-platform correlation of gene expression intensities has yielded mixed results. In this study, we demonstrate use of a correction factor for estimating cross-platform correlations. Results: In this paper, three technical replicate microarrays were hybridized to each of three platforms. The three platforms were then analyzed to assess both intra- and cross-platform reproducibility. We present various methods for examining intra-platform reproducibility. We also examine cross-platform reproducibility using Pearson's correlation. Additionally, we previously developed a correction factor for Pearson's correlation which is applicable when X and Y are measured with error. Herein we demonstrate that correcting for measurement error by estimating the disattenuated correlation substantially improves cross-platform correlations. Conclusion: When estimating cross-platform correlation, it is essential to thoroughly evaluate intra-platform reproducibility as a first step. In addition, since measurement error is present in microarray gene expression data, methods to correct for attenuation are useful in decreasing the bias in cross-platform correlation estimates.
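
As an illustration of what disattenuation means, the sketch below applies the classical Spearman correction for attenuation, which divides the observed correlation by the square root of the product of the two reliabilities. The abstract does not state the exact form of the authors' correction factor, so this should be read as the textbook formula rather than their estimator. Using intra-platform replicate correlations as the reliabilities is an assumption, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def disattenuated_correlation(r_xy, reliability_x, reliability_y):
    """Spearman's correction: r_xy / sqrt(reliability_x * reliability_y).

    The reliabilities must lie in (0, 1]. Here they are assumed to come
    from intra-platform technical replicates, e.g. the correlation
    between two replicate arrays on the same platform.
    """
    for rel in (reliability_x, reliability_y):
        if not 0.0 < rel <= 1.0:
            raise ValueError("reliabilities must be in (0, 1]")
    return r_xy / math.sqrt(reliability_x * reliability_y)


# Hypothetical values: observed cross-platform r = 0.6,
# intra-platform reliabilities 0.8 and 0.9.
corrected = disattenuated_correlation(0.6, 0.8, 0.9)
```

With reliabilities below 1 the corrected value is always larger in magnitude than the observed one. This is consistent with the abstract's advice to assess intra-platform reproducibility before interpreting cross-platform correlations.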

Springer - Publisher Connector

PubMed Central

VCU Scholars Compass

Physico-chemical foundations underpinning microarray and next-generation sequencing experiments

Author: A. Buhot
A. E. Pozhitkov
A. Halperin
A. Harrison
A. Ott
Amend
B. M. Pettitt
Berger
Binder
Binder
Binder
Binder
Bolstad
Bullard
Burden
Burden
C. Gibas
C. J. Burden
Chou
Chou
Czypionka
D. P. Kreil
D. Tautz
E. Carlon
Fasold
Fasold
Fiche
Fuchs
H. Binder
Halperin
Harr
Harrison
He
Heim
Held
Hooyberghs
Huettel
Iltumur
Irizarry
Irizarry
Irizarry
Irving
J. Hooyberghs
Kane
L. J. Gamble
Lee
Lee
Letowski
Li
Liebich
Lockhart
Luebke
Marshall
Matveeva
Mueckstein
Mulders
Naiser
Naiser
P. A. Noble
Pingel
Pozhitkov
R. Levicky
Relogio
Rouillard
Tanaka
Trapp
Upton
Vainrub
Wodicka
Yu
Zhang
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2013

Hybridization of nucleic acids on solid surfaces is a key process involved in high-throughput technologies such as microarrays and, in some cases, next-generation sequencing (NGS). A physical understanding of the hybridization process helps to determine the accuracy of these technologies. The goal of a widespread research program is to develop reliable transformations between the raw signals reported by the technologies and individual molecular concentrations from an ensemble of nucleic acids. This research has inputs from many areas, from bioinformatics and biostatistics, to theoretical and experimental biochemistry and biophysics, to computer simulations. A group of leading researchers met in Ploen, Germany, in 2011 to discuss present knowledge and limitations of our physico-chemical understanding of high-throughput nucleic acid technologies. This meeting inspired us to write this summary, which provides an overview of state-of-the-art, physico-chemically founded approaches to modeling the hybridization of nucleic acids on solid surfaces. In addition, practical application of current knowledge is emphasized.

University of Essex Research Repository

Crossref

Hal - Université Grenoble Alpes

PubMed Central

Warwick Research Archives Portal Repository

The Australian National University

HAL-CEA

MPG.PuRe

Empirical comparison of cross-platform normalization methods for gene expression data

Author: A Platts
A Ramasamy
A Shabalin
Applied Biosystems
B Bolstad
C Metz
C Yauk
D Hekstra
D Petersen
D Rhodes
E Glaab
F Hong
Faramarz Valafar
G Elvidge
G Hardiman
G Held
G Smyth
H Jiang
H Parkinson
H Yasrebi
I Borozan
I Wick
J Larkin
J Storey
J Storey
J Tuszynski
Jason Rudy
JM Chambers
K Kugler
K Noguchi
L Gautier
L Shi
L Shi
LIK Lin
M Barnes
M Benito
M Mulligan
M Schena
P Tan
P Warnat
P Wirapati
R Development Core Team
R Gentleman
R Grützmann
R Kothapalli
R Lacson
R Martinez
S Assou
S Carter
S Davis
S Rogic
T Barrett
VE Velculescu
W Kuo
W Kuo
W Walker
X Chi
XQ Xia
Y Woo
Z Hu
Z Wang
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: Simultaneous measurement of gene expression on a genomic scale can be accomplished using microarray technology or by sequencing based methods. Researchers who perform high throughput gene expression assays often deposit their data in public databases, but heterogeneity of measurement platforms leads to challenges for the combination and comparison of data sets. Researchers wishing to perform cross platform normalization face two major obstacles. First, a choice must be made about which method or methods to employ. Nine are currently available, and no rigorous comparison exists. Second, software for the selected method must be obtained and incorporated into a data analysis workflow. Results: Using two publicly available cross-platform testing data sets, cross-platform normalization methods are compared based on inter-platform concordance and on the consistency of gene lists obtained with transformed data. Scatter and ROC-like plots are produced and new statistics based on those plots are introduced to measure the effectiveness of each method. Bootstrapping is employed to obtain distributions for those statistics. The consistency of platform effects across studies is explored theoretically and with respect to the testing data sets. Conclusions: Our comparisons indicate that four methods, DWD, EB, GQ, and XPN, are generally effective, while the remaining methods do not adequately correct for platform effects. Of the four successful methods, XPN generally shows the highest inter-platform concordance when treatment groups are equally sized, while DWD is most robust to differently sized treatment groups and consistently shows the smallest loss in gene detection. We provide an R package, CONOR, capable of performing the nine cross-platform normalization methods considered. The package can be downloaded at http://alborz.sdsu.edu/conor and is available from CRAN.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Mayday SeaSight: Combined Analysis of Deep Sequencing and Microarray Data

Author: A Barski
A Mortazavi
B Ren
C Trapnell
C Trapnell
DS Horner
F Battke
Florian Battke
GJ Porreca
H Ji
H Li
J Dietzsch
J Marioni
K Kadota
Kay Nieselt
L Hillier
M Droege
N Gehlenborg
R Breitling
R Irizarry
S Bennet
S Symons
V Tusher
Vincent Laudet
Z Wang
Publication venue: Public Library of Science
Publication date: 31/01/2011

Recently emerged deep sequencing technologies offer new high-throughput methods to quantify gene expression, epigenetic modifications and DNA-protein binding. From a computational point of view, the data are very different from those produced by the already established microarray technology, providing a new perspective on the samples under study and complementing microarray gene expression data. Software offering the integrated analysis of data from different technologies is of growing importance as new data emerge in systems biology studies. Mayday is an extensible platform for visual data exploration and interactive analysis and provides many methods for dissecting complex transcriptome datasets. We present Mayday SeaSight, an extension that allows the integration of data from different platforms such as deep sequencing and microarrays. It offers methods for computing expression values from mapped reads and raw microarray data, for background correction and normalization, and for linking microarray probes to genomic coordinates. It is now possible to use Mayday's wealth of methods to analyze sequencing data and to combine data from different technologies in one analysis.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Gene ARMADA: an integrated multi-analysis platform for microarray data implemented in MATLAB

Author: Chatziioannou Aristotelis
Kolisis Fragiskos N
Moulos Panagiotis
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract. Background: The microarray data analysis realm is ever growing through the development of various tools, open source and commercial. However, there is an absence of predefined, rational algorithmic analysis workflows or standardized batch processing that incorporate all steps, from raw data import to the derivation of lists of significantly differentially expressed genes. This absence obscures the analytical procedure and obstructs large-scale comparative processing of genomic microarray datasets. Moreover, the available solutions depend heavily on the programming skills of the user, whereas GUI-based solutions do not provide direct support for various raw image analysis formats or a versatile and flexible combination of signal processing methods. Results: We describe here Gene ARMADA (Automated Robust MicroArray Data Analysis), a MATLAB implemented platform with a Graphical User Interface. This suite integrates all steps of microarray data analysis including automated data import, noise correction and filtering, normalization, statistical selection of differentially expressed genes, clustering, classification and annotation. In its current version, Gene ARMADA fully supports two-colour cDNA and Affymetrix oligonucleotide arrays, plus custom arrays for which experimental details are given in tabular form (Excel spreadsheet, comma separated values, tab-delimited text formats). It also supports the analysis of already processed results through its versatile import editor. Besides being fully automated, Gene ARMADA incorporates numerous functionalities of the Statistics and Bioinformatics Toolboxes of MATLAB. In addition, it provides numerous visualization and exploration tools plus customizable export data formats for seamless integration by other analysis tools or MATLAB, for further processing. Gene ARMADA requires MATLAB 7.4 (R2007a) or higher and is also distributed as a stand-alone application with MATLAB Component Runtime. Conclusion: Gene ARMADA provides a highly adaptable, integrative, yet flexible tool which can be used for automated quality control, analysis, annotation and visualization of microarray data, constituting a starting point for further data interpretation and integration with numerous other tools.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Constructing a fish metabolic network model

Author: Brouwer Marius
Brown-Peterson Nancy
Li Shuzhao
Manning Charles S
Pozhitkov Alexander
Ryan Rachel A
Publication venue: BioMed Central
Publication date: 01/01/2010

We report the construction of a genome-wide fish metabolic network model, MetaFishNet, and its application to analyzing high throughput gene expression data. This model is a stepping stone to broader applications of fish systems biology, for example by guiding study design through comparison with human metabolism and the integration of multiple data types. MetaFishNet resources, including a pathway enrichment analysis tool, are accessible at http://metafishnet.appspot.com

Aquila Digital Community

Crossref

Springer - Publisher Connector

PubMed Central

MPG.PuRe