
Modeling and visualizing uncertainty in gene expression clusters using Dirichlet process mixtures

By Carl Edward Rasmussen, Bernard J. De la Cruz, Zoubin Ghahramani and David L. Wild


Although clustering has rapidly become one of the standard computational approaches to analyzing microarray gene expression data, little attention has been paid to uncertainty in the results obtained. Dirichlet process mixture (DPM) models provide a nonparametric Bayesian alternative to the bootstrap approach to modeling uncertainty in gene expression clustering. Most previously published applications of Bayesian model-based clustering methods have been to short time series data. In this paper, we present a case study applying nonparametric Bayesian clustering methods to high-dimensional, non-time-series gene expression data using full Gaussian covariances. We use the probability that two genes belong to the same cluster in a DPM model as a measure of the similarity of their expression profiles. Conversely, this probability can be used to define a dissimilarity measure which, for the purposes of visualization, can be input to one of the standard linkage algorithms used for hierarchical clustering. Applied to the Rosetta compendium of expression profiles, the approach yields biologically plausible results that extend previously published cluster analyses of these data.
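The similarity-to-dendrogram idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that cluster label vectors have already been drawn from a DPM posterior (the `samples` below are hypothetical toy values), it assumes the dissimilarity takes the form 1 minus the co-clustering probability (the abstract does not state the exact form), and it uses a naive average-linkage routine as a stand-in for "one of the standard linkage algorithms". All function names are illustrative.

```python
from itertools import combinations


def coclustering_probability(samples):
    """Estimate P(gene i and gene j share a cluster) as the fraction of
    sampled label vectors in which their labels agree."""
    n = len(samples[0])
    counts = [[0] * n for _ in range(n)]
    for labels in samples:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(samples) for c in row] for row in counts]


def dissimilarity(prob):
    """Assumed form: dissimilarity = 1 - co-clustering probability."""
    return [[1.0 - p for p in row] for row in prob]


def average_linkage(dist):
    """Naive average-linkage agglomeration.
    Returns a list of merges as (members_a, members_b, height)."""
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        def avg(a, b):
            # Mean pairwise dissimilarity between two groups of genes.
            return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))
        a, b = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        merges.append((a, b, avg(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical label vectors for 4 genes over 4 posterior samples.
# The last sample uses swapped label names on purpose: only label
# agreement between genes matters, not the label values themselves.
samples = [
    [0, 0, 1, 1],
    [0, 0, 1, 2],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
prob = coclustering_probability(samples)
dist = dissimilarity(prob)
merges = average_linkage(dist)
```

In this toy input, genes 0 and 1 share a cluster in every sample, so they merge first at height 0, while genes 2 and 3 agree in only half of the samples and merge later. Because the measure depends only on whether two labels agree, it is unaffected by the arbitrary renaming of clusters between samples.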

Topics: QA, QH426
Publisher: IEEE
Year: 2009

