R/BHC: fast Bayesian hierarchical clustering for microarray data

By Richard S. Savage, K. (Katherine) Heller, Yang Xu, Zoubin Ghahramani, William M. Truman, Murray Grant, Katherine J. Denby and David L. Wild


Background: Although clustering methods have rapidly become one of the standard computational approaches for analysing microarray gene expression data, little attention has been paid to the uncertainty in the results they produce.

Results: We present an R/Bioconductor port of a fast, novel algorithm for Bayesian agglomerative hierarchical clustering and demonstrate its use on gene expression microarray data. The method performs bottom-up hierarchical clustering. It uses a Dirichlet Process (infinite mixture) model to capture uncertainty in the data and Bayesian model selection to decide, at each step, which clusters to merge.

Conclusion: We present biologically plausible results on a well-studied data set: expression profiles of A. thaliana subjected to a variety of biotic and abiotic stresses. Our method avoids several limitations of traditional methods, such as having to decide in advance how many clusters there should be and how to choose a principled distance metric.
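The merge step described in the abstract can be illustrated with a small Python sketch. This is not R/BHC's code or API. It follows the general Bayesian hierarchical clustering recursion (d_k, pi_k, r_k) under some stated assumptions. The features are assumed to be binary, with independent Beta-Bernoulli likelihoods. R/BHC works on gene expression data, and its actual likelihood model may differ. The names `log_marginal`, `merge`, `bhc` and `cut` are hypothetical. At each step the sketch merges the pair whose merged hypothesis has the highest posterior r_k. Here r_k weighs "all points in one cluster" against "the points split as in the two subtrees".

```python
import math
from itertools import combinations


def log_marginal(rows, a=1.0, b=1.0):
    """log p(D | H1): all rows share one cluster whose per-feature Bernoulli
    rates have independent Beta(a, b) priors (an assumed likelihood)."""
    n = len(rows)
    total = 0.0
    for col in zip(*rows):
        m = sum(col)  # number of 1s observed for this feature
        total += (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + math.lgamma(a + m) + math.lgamma(b + n - m)
                  - math.lgamma(a + b + n))
    return total


def logaddexp(x, y):
    """Numerically stable log(exp(x) + exp(y))."""
    m = max(x, y)
    return m + math.log(math.exp(x - m) + math.exp(y - m))


class Node:
    def __init__(self, rows, log_d, log_p_tree, children=(), log_r=0.0):
        self.rows = rows              # data points under this subtree
        self.log_d = log_d            # log d_k (Dirichlet Process prior bookkeeping)
        self.log_p_tree = log_p_tree  # log p(D_k | T_k)
        self.children = children
        self.log_r = log_r            # log r_k, posterior of the merged hypothesis


def leaf(row, alpha):
    # Leaves: d_i = alpha and pi_i = 1, so p(D_i | T_i) = p(D_i | H1).
    return Node([row], math.log(alpha), log_marginal([row]))


def merge(left, right, alpha):
    rows = left.rows + right.rows
    log_ag = math.log(alpha) + math.lgamma(len(rows))     # log(alpha * Gamma(n_k))
    log_d = logaddexp(log_ag, left.log_d + right.log_d)   # d_k = alpha*Gamma(n_k) + d_i*d_j
    log_pi = log_ag - log_d                               # pi_k = alpha*Gamma(n_k) / d_k
    log_one_minus_pi = left.log_d + right.log_d - log_d   # 1 - pi_k = d_i*d_j / d_k
    log_h1 = log_pi + log_marginal(rows)
    log_h2 = log_one_minus_pi + left.log_p_tree + right.log_p_tree
    log_p_tree = logaddexp(log_h1, log_h2)
    return Node(rows, log_d, log_p_tree, (left, right), log_h1 - log_p_tree)


def bhc(data, alpha=1.0):
    """Greedy bottom-up clustering: repeatedly merge the pair with the highest r_k."""
    nodes = [leaf(row, alpha) for row in data]
    while len(nodes) > 1:
        candidates = [(merge(nodes[i], nodes[j], alpha), i, j)
                      for i, j in combinations(range(len(nodes)), 2)]
        merged, i, j = max(candidates, key=lambda c: c[0].log_r)
        nodes = [nd for k, nd in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]


def cut(node, threshold=0.5):
    """Flat clusters: keep a subtree whole when r_k >= threshold, otherwise split it."""
    if not node.children or math.exp(node.log_r) >= threshold:
        return [node.rows]
    return [c for child in node.children for c in cut(child, threshold)]
```

With this sketch, `cut(bhc(rows))` returns flat clusters without the number of clusters being fixed in advance. Cutting where r_k falls below 0.5 follows the original BHC formulation. The exhaustive pair search is O(n^3) merges and is only meant for small illustrative inputs.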

Topics: R1, QA
Publisher: BioMed Central Ltd.

Suggested articles


  1. (1998). Cluster Analysis and Display of Genome-wide Expression.
  2. (1999). Broad Patterns of Gene Expression Revealed by Clustering Analysis of Tumor and Normal Colon Tissues Probed by Oligonucleotide Arrays.
  3. (2002). A mixture model-based approach to the clustering of microarray expression data. Bioinformatics.
  4. (2001). Bootstrapping cluster analysis: assessing the reliability of conclusions from microarray experiments.
  5. (2000). Zhao H: Assessing reliability of gene clusters from gene expression data.
  6. (2000). Functional Discovery via a Compendium of Expression Profiles. Cell.
  7. (2003). Statistical significance for hierarchical clustering in genetic association and microarray expression studies. BMC Bioinformatics.
  8. (1975). Clustering Algorithms.
  9. (2001). Validating clustering for gene expression data. Bioinformatics.
  10. Information Theory, Inference and Learning Algorithms. Cambridge:
  11. (2003). Bayesian clustering of many GARCH models. SSRN eLibrary.
  12. (2005). Model-based clustering of multiple time series.
  13. (2007). Bayesian Unsupervised Classification by Dirichlet Process Mixtures of Gaussian Processes.
  14. (2004). Rannala B: The Bayesian revolution in genetics.
  15. (2003). Density Modeling and Clustering Using Dirichlet Diffusion Trees. In Bayesian Statistics, Volume 7. Edited by
  16. (2005). Bayesian coclustering of Anopheles gene expression time series: Study of immune defense response to multiple experimental challenges.
  17. (2006). A Quantitative Study of Gene Regulation Involved in the Immune Response of Anopheline Mosquitoes: An Application of Bayesian Hierarchical Clustering of Curves.
  18. (2007). Modeling and Visualizing Uncertainty in Gene Expression Clusters using Dirichlet Process Mixtures.
  19. (2005). Ghahramani Z: Bayesian Hierarchical Clustering.
  20. (2000). The Infinite Gaussian Mixture Model.
  21. (2007). Pseudomonas syringae pv. tomato hijacks the Arabidopsis abscisic acid signalling pathway to cause disease.
  22. (2004). A model-based background adjustment for oligonucleotide expression arrays.
  23. (2008). clValid, an R package for cluster validation.
  24. (1971). Objective criteria for the evaluation of clustering methods.
  25. (2003). Clustering gene-expression data with repeated measurements. Genome Biol.
  26. (2001). Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science.
  27. (2008). Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient. BMC Bioinformatics.
  28. (2007). Sidow A: Automated discovery of functional generality of human gene expression programs. PLoS Comput Biol.
  29. (2007). Using GOstats to test gene lists for GO term association. Bioinformatics.
  30. (2006). Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes. BMC Bioinformatics.
  31. (2007). A J domain virulence effector of Pseudomonas syringae remodels host chloroplasts and suppresses defenses. Current Biology.
