Motivation: We present a method for directly inferring transcriptional modules (TMs) by integrating gene expression and transcription factor binding (ChIP-chip) data. Our model extends a hierarchical Dirichlet process mixture model to allow data fusion on a gene-by-gene basis. This encodes the intuition that co-expression and co-regulation are not necessarily equivalent and hence we do not expect all genes to group similarly in both datasets. In particular, it allows us to identify the subset of genes that share the same structure of transcriptional modules in both datasets. 



Results: We find that by working on a gene-by-gene basis, our model is able to extract clusters with greater functional coherence than existing methods. By combining gene expression and transcription factor binding (ChIP-chip) data in this way, we are better able to determine the groups of genes that are most likely to represent underlying TMs.



Availability: If interested in the code for the work presented in this article, please contact the authors.

Supplementary information: Supplementary data are available at Bioinformatics online.


CUED - Cambridge University Engineering Department

Discovering transcriptional modules by Bayesian data integration.

Savage, Richard S.

Ghahramani, Zoubin

Griffin, Jim E.

De la Cruz, Bernard J.

Wild, David L.

Warwick Research Archives Portal Repository

(2003a) Genome-wide discovery of transcriptional modules from DNA sequence and gene expression.

A Bayesian analysis of some nonparametric problems.

A Bayesian approach to modeling uncertainty in gene expression clusters.

A genome-wide transcriptional analysis of the mitotic cell cycle.

APPENDIX A A.1 THE ALGORITHM We can perform inference for this model using MCMC sampling, by extending the sampler in section 5.1 of (Teh

Automated discovery of functional generality of human gene expression programs.

Bayesian hierarchical model for transcriptional module discovery by jointly modeling gene expression and chip-chip data.

Bayesian infinite mixture model based clustering of gene expression profiles.

Bayesian mixture model based clustering of replicated microarray data.

Cell-cycle control of gene expression in budding and fission yeast.

Cluster analysis and display of genome-wide expression.

Clustering gene-expression data with repeated measurements.

Clustering microarray gene expression data using weighted Chinese restaurant process.

Combining sequence and time series expression data to learn transcriptional modules.

Computational discovery of gene modules and regulatory networks.

Context-specific infinite mixtures for clustering gene expression profiles across diverse microarray dataset.

Evaluating the accuracy of sampling-based approaches to calculating posterior moments. In Bernardo,J.M. et al. (eds) Bayesian Statistics 4.

Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient.

Genomic expression programs in the response of yeast cells to environmental changes.

Hierarchical Bayesian nonparametric models with applications. In Lid Hjort,N. et al. (eds), Bayesian Nonparametrics,

Hierarchical Dirichlet processes.

Improved criteria for clustering based on the posterior similarity matrix.

Integrated genomic and proteomic analyses of a systematically perturbed metabolic network.

Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes.

Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems.

Model-based clustering for expression data via a Dirichlet process mixture model. In

Modeling and visualizing uncertainty in gene expression clusters using Dirichlet process mixtures.

Module networks: Discovering regulatory modules and their condition specific regulators from gene expression data.

R/BHC: fast Bayesian hierarchical clustering for microarray data.

Revealing modular organization in the yeast transcriptional network.

The infinite Gaussian mixture model.

Transcriptional programs: modelling higher order structure in transcriptional control.

Transcriptional regulatory code of a eukaryotic genome.

Transcriptional regulatory networks in Saccharomyces cerevisiae.

Using GOstats to test gene lists for GO term association.


Crossref



Kent Academic Repository

Discovering Transcriptional Modules from Bayesian Data Fusion

http://wrap.warwick.ac.uk/3277/1/WRAP_Wild_Transcriptional_modules.pdf

