344 research outputs found
Recommended from our members
Bayesian models for pooling microarray studies with multiple sources of replications
BACKGROUND: Biologists often conduct multiple but different cDNA microarray studies that all target the same biological system or pathway. Within each study, replicate slides within repeated identical experiments are often produced. Pooling information across studies can help more accurately identify true target genes. Here, we introduce a method to integrate multiple independent studies efficiently. RESULTS: We introduce a Bayesian hierarchical model to pool cDNA microarray data across multiple independent studies to identify highly expressed genes. Each study has multiple sources of variation, i.e. replicate slides within repeated identical experiments. Our model produces the gene-specific posterior probability of differential expression, which provides a direct method for ranking genes, and provides Bayesian estimates of false discovery rates (FDR). In simulations combining two and five independent studies, with fixed FDR levels, we observed large increases in the number of discovered genes in pooled versus individual analyses. When the number of output genes is fixed (e.g., top 100), the pooled model found appreciably more truly differentially expressed genes than the individual studies. We were also able to identify more differentially expressed genes from pooling two independent studies in Bacillus subtilis than from each individual data set. Finally, we observed that in our simulation studies our Bayesian FDR estimates tracked the true FDRs very well. CONCLUSION: Our method provides a cohesive framework for combining multiple but not identical microarray studies with several sources of replication, with data produced from the same platform. We assume that each study contains only two conditions: an experimental and a control sample. We demonstrated our model's suitability for a small number of studies that have been either pre-scaled or have no outliers
Consensus and meta-analysis regulatory networks for combining multiple microarray gene expression datasets
Microarray data is a key source of experimental data for modelling gene regulatory interactions from expression levels. With the rapid increase of publicly available microarray data comes the opportunity to produce regulatory network models based on multiple datasets. Such models are potentially more robust with greater confidence, and place less reliance on a single dataset. However, combining datasets directly can be difficult as experiments are often conducted on different microarray platforms, and in different laboratories leading to inherent biases in the data that are not always removed through pre-processing such as normalisation. In this paper we compare two frameworks for combining microarray datasets to model regulatory networks: pre- and post-learning aggregation. In pre-learning approaches, such as using simple scale-normalisation prior to the concatenation of datasets, a model is learnt from a combined dataset, whilst in post-learning aggregation individual models are learnt from each dataset and the models are combined. We present two novel approaches for post-learning aggregation, each based on aggregating high-level features of Bayesian network models that have been generated from different microarray expression datasets. Meta-analysis Bayesian networks are based on combining statistical confidences attached to network edges whilst Consensus Bayesian networks identify consistent network features across all datasets. We apply both approaches to multiple datasets from synthetic and real (Escherichia coli and yeast) networks and demonstrate that both methods can improve on networks learnt from a single dataset or an aggregated dataset formed using a standard scale-normalisation
Recommended from our members
Covariate-assisted ranking and screening for large-scale two-sample inference
Two-sample multiple testing has a wide range of applications. The conventionalpractice first reduces the original observations to a vector of p-values and then chooses a cutoffto adjust for multiplicity. However, this data reduction step could cause significant loss ofinformation and thus lead to suboptimal testing procedures.We introduce a new framework fortwo-sample multiple testing by incorporating a carefully constructed auxiliary variable in inferenceto improve the power. A data-driven multiple-testing procedure is developed by employinga covariate-assisted ranking and screening (CARS) approach that optimally combines the informationfrom both the primary and the auxiliary variables. The proposed CARS procedureis shown to be asymptotically valid and optimal for false discovery rate control. The procedureis implemented in the R package CARS. Numerical results confirm the effectiveness of CARSin false discovery rate control and show that it achieves substantial power gain over existingmethods. CARS is also illustrated through an application to the analysis of a satellite imagingdata set for supernova detection
Mixture-model based estimation of gene expression variance from public database improves identification of differentially expressed genes in small sized microarray data
Motivation: The small number of samples in many microarray experiments is a challenge for the correct identification of differentially expressed gens (DEGs) by conventional statistical means. Information from public microarray databases can help more efficient identification of DEGs. To model various experimental conditions of a public microarray database, we applied Gaussian mixture model and extracted bi- or tri-modal distributions of gene expression. Prior variance of Baldi's Bayesian framework was estimate for the analysis of the small sample-sized datasets
Reassessing Design and Analysis of two-Colour Microarray Experiments Using Mixed Effects Models
Gene expression microarray studies have led to interesting experimental design
and statistical analysis challenges. The comparison of expression profiles across
populations is one of the most common objectives of microarray experiments. In
this manuscript we review some issues regarding design and statistical analysis for
two-colour microarray platforms using mixed linear models, with special attention
directed towards the different hierarchical levels of replication and the consequent
effect on the use of appropriate error terms for comparing experimental groups.
We examine the traditional analysis of variance (ANOVA) models proposed for
microarray data and their extensions to hierarchically replicated experiments. In
addition, we discuss a mixed model methodology for power and efficiency calculations
of different microarray experimental designs
Consistent Differential Expression Pattern (CDEP) on microarray to identify genes related to metastatic behavior
<p>Abstract</p> <p>Background</p> <p>To utilize the large volume of gene expression information generated from different microarray experiments, several meta-analysis techniques have been developed. Despite these efforts, there remain significant challenges to effectively increasing the statistical power and decreasing the Type I error rate while pooling the heterogeneous datasets from public resources. The objective of this study is to develop a novel meta-analysis approach, Consistent Differential Expression Pattern (CDEP), to identify genes with common differential expression patterns across different datasets.</p> <p>Results</p> <p>We combined False Discovery Rate (FDR) estimation and the non-parametric RankProd approach to estimate the Type I error rate in each microarray dataset of the meta-analysis. These Type I error rates from all datasets were then used to identify genes with common differential expression patterns. Our simulation study showed that CDEP achieved higher statistical power and maintained low Type I error rate when compared with two recently proposed meta-analysis approaches. We applied CDEP to analyze microarray data from different laboratories that compared transcription profiles between metastatic and primary cancer of different types. Many genes identified as differentially expressed consistently across different cancer types are in pathways related to metastatic behavior, such as ECM-receptor interaction, focal adhesion, and blood vessel development. We also identified novel genes such as <it>AMIGO2</it>, <it>Gem</it>, and <it>CXCL11 </it>that have not been shown to associate with, but may play roles in, metastasis.</p> <p>Conclusions</p> <p>CDEP is a flexible approach that borrows information from each dataset in a meta-analysis in order to identify genes being differentially expressed consistently. We have shown that CDEP can gain higher statistical power than other existing approaches under a variety of settings considered in the simulation study, suggesting its robustness and insensitivity to data variation commonly associated with microarray experiments.</p> <p><b>Availability</b>: CDEP is implemented in R and freely available at: <url>http://genomebioinfo.musc.edu/CDEP/</url></p> <p><b>Contact</b>: [email protected]</p
Cross-platform Comparison of Two Pancreatic Cancer Phenotypes
Model-based approaches for combining gene expression data from multiple high throughput platforms can be sensitive to technological artifacts when the number of samples in each platform is small. This paper proposes simple tools for quantifying concordance in a small study of pancreatic cancer cells lines with an emphasis on visualizations that uncover intra- and inter-platform variation. Using this approach, we identify several transcripts from the integrative analysis whose over-or under-expression in pancreatic cancer cell lines was validated by qPCR
- âŠ