CLIC: clustering analysis of large microarray datasets with individual dimension-based clustering

By Taegyun Yun, Taeho Hwang, Kihoon Cha and Gwan-Su Yi

Abstract

Large microarray data sets have become common in recent years. However, most available clustering methods do not handle them easily because of high computational complexity and memory requirements. Furthermore, typical clustering methods construct oversimplified clusters that ignore subtle but meaningful changes in the expression patterns present in large microarray data sets. An efficient clustering method is therefore needed that identifies both absolute expression differences and expression profile patterns at different expression levels. This study presents CLIC, which meets these requirements for clustering analysis of large microarray data sets in particular, although it is not limited to them. CLIC is based on a novel concept: genes are first clustered in each individual dimension, and the ordinal labels of the clusters in each dimension are then used for further clustering across all dimensions. CLIC enables iterative sub-clustering into more homogeneous groups and identifies common expression patterns among genes that were separated into different groups because of large differences in their expression levels. In addition, the clustering computation is parallelized, the number of clusters is detected automatically, and functional enrichment is provided for each cluster and pattern. CLIC is freely available at http://gexp2.kaist.ac.kr/clic
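
The two-stage idea described in the abstract (cluster each dimension separately, then cluster on the resulting ordinal labels) can be illustrated with a minimal sketch. This is not the CLIC implementation: the abstract does not say how each dimension is clustered or how the number of clusters is detected, so the gap-based split, the fixed `k`, and the function names `ordinal_labels_1d` and `group_by_label_vector` are all assumptions made for illustration.

```python
from collections import defaultdict


def ordinal_labels_1d(values, k):
    """Assign ordinal labels 0..k-1 within ONE dimension.

    Assumption: the dimension is split at its k-1 widest gaps between
    consecutive sorted values. This is a stand-in for whatever
    per-dimension clustering CLIC actually uses.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    # Candidate cut positions, widest gap first; zero-width gaps are skipped
    # so that identical values never receive different labels.
    candidates = [
        j for j in range(1, len(order))
        if values[order[j]] - values[order[j - 1]] > 0
    ]
    candidates.sort(
        key=lambda j: values[order[j]] - values[order[j - 1]],
        reverse=True,
    )
    cuts = set(candidates[: max(k - 1, 0)])

    labels = [0] * len(values)
    current = 0
    for pos, idx in enumerate(order):
        if pos in cuts:
            current += 1  # a higher label means a higher expression level
        labels[idx] = current
    return labels


def group_by_label_vector(matrix, k=3):
    """Group genes (rows) whose ordinal labels agree in every dimension."""
    n_dims = len(matrix[0])
    per_dim = [
        ordinal_labels_1d([row[d] for row in matrix], k)
        for d in range(n_dims)
    ]
    clusters = defaultdict(list)
    for gene in range(len(matrix)):
        key = tuple(per_dim[d][gene] for d in range(n_dims))
        clusters[key].append(gene)
    return dict(clusters)


# Toy data: 4 genes (rows) measured under 2 conditions (columns).
toy = [[1.0, 10.0], [1.1, 10.2], [5.0, 10.1], [5.2, 1.0]]
result = group_by_label_vector(toy, k=2)
```

In this toy run, genes 0 and 1 share the same label in both conditions and end up together, while genes 2 and 3 each form their own group. Because every dimension is labelled independently of the others, the first stage is a natural place for the parallelization the abstract mentions, although the sketch runs it sequentially.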

Topics: Articles
Publisher: Oxford University Press
OAI identifier: oai:pubmedcentral.nih.gov:2896182
Provided by: PubMed Central
