
Partial mixture model for tight clustering of gene expression time-course

By Yinyin Yuan, Chang-Tsun Li and Roland Wilson

Abstract

Background: Tight clustering arose recently from a desire to obtain tighter and potentially more informative clusters in gene expression studies. Scattered genes with relatively loose correlations should be excluded from the clusters. However, little work in the literature has been dedicated to this area of research. Meanwhile, maximum likelihood techniques have been used extensively for model parameter estimation, whereas the minimum distance estimator has been largely ignored.

Results: In this paper we show the inherent robustness of the minimum distance estimator, which makes it a powerful tool for parameter estimation in model-based time-course clustering. To apply minimum distance estimation, we formulate a partial mixture model that naturally incorporates replicate information and allows for scattered genes. We provide experimental results of simulated data fitting, in which the minimum distance estimator outperforms the maximum likelihood estimator. Both biological and statistical validations are conducted on a simulated dataset and two real gene expression datasets. Our proposed partial regression clustering algorithm scores highest in a Gene Ontology driven evaluation, in comparison with four other popular clustering algorithms.

Conclusion: For the first time, the partial mixture model is successfully extended to time-course data analysis. The robustness of our partial regression clustering algorithm demonstrates the suitability of combining the partial mixture model with the minimum distance estimator in this field. We show that tight clustering is not only capable of generating a deeper understanding of the dataset under study, in good accordance with established biological knowledge, but also yields interesting new hypotheses when the clustering results are interpreted. In particular, we provide biological evidence that, contrary to prevailing opinion, scattered genes can be relevant and are interesting subjects for study.
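The abstract does not say which minimum distance criterion is used. As an assumption, the sketch below uses the integrated squared error (L2E) criterion, a common minimum distance estimator for partial mixture components. It works on one-dimensional synthetic data rather than time-course profiles. It fits a single component w·N(μ, σ²) whose weight w may fall below 1, so points the component does not describe are simply left out. This is loosely analogous to excluding scattered genes from a tight cluster. The function names, grid ranges and data are hypothetical and for illustration only. For fixed (μ, σ) the optimal weight has a closed form, so the sketch profiles w out and grid-searches over (μ, σ).

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def l2e_partial_normal(data, mus, sigmas):
    """Hypothetical helper: fit one partial component w * N(mu, sigma^2)
    by minimising the L2E criterion
        integral(f^2) - (2/n) * sum_i f(x_i),   with f = w * N(mu, sigma^2),
    where integral(f^2) = w^2 / (2 * sigma * sqrt(pi)).

    For fixed (mu, sigma) the optimal weight is w = 2 * sigma * sqrt(pi) * S,
    with S = mean_i N(x_i; mu, sigma^2), and the profiled criterion is
    -2 * sigma * sqrt(pi) * S**2. A plain grid search is used for clarity.
    """
    n = len(data)
    best = None
    for mu in mus:
        for sigma in sigmas:
            s = sum(normal_pdf(x, mu, sigma) for x in data) / n
            crit = -2.0 * sigma * math.sqrt(math.pi) * s * s
            if best is None or crit < best[0]:
                w = 2.0 * sigma * math.sqrt(math.pi) * s
                best = (crit, mu, sigma, w)
    _, mu, sigma, w = best
    return mu, sigma, w


# Hypothetical data: 80 tightly clustered values plus 20 scattered points.
rng = random.Random(0)
core = [rng.gauss(0.0, 1.0) for _ in range(80)]
scattered = [rng.uniform(8.0, 12.0) for _ in range(20)]
data = core + scattered

mus = [-3.0 + 0.1 * i for i in range(161)]   # grid from -3.0 to 13.0
sigmas = [0.2 + 0.1 * j for j in range(49)]  # grid from 0.2 to 5.0

mu_hat, sigma_hat, w_hat = l2e_partial_normal(data, mus, sigmas)

# Maximum likelihood estimate of the mean of a single Gaussian:
# every point counts fully, so the scattered points pull it away from the core.
mle_mean = sum(data) / len(data)

print(f"L2E partial fit: mu={mu_hat:.2f}, sigma={sigma_hat:.2f}, w={w_hat:.2f}")
print(f"Gaussian MLE mean: {mle_mean:.2f}")
```

In this setup the L2E fit should sit on the tight core with a weight near the core's share of the data, while the maximum likelihood mean is dragged toward the scattered points. A real time-course version would replace the scalar normal density with a regression model over time points and replicates, and would use a proper optimiser instead of a grid.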

Topics: QR
Publisher: BioMed Central Ltd.
Year: 2008
OAI identifier: oai:wrap.warwick.ac.uk:532
