Evaluation of clustering algorithms for gene expression data

Authors: A Ruepp; I Gat-Viks; J Quackenbush; JA Hartigan; JD Banfield; JT Taylor; L Kaufman; MC Abba; PJ Rousseeuw; R Shamir; S Chu; S Datta; S Dudoit; Somnath Datta; Susmita Datta; T Kohonen; WN Venables
Publication date: 1 January 2006
Publisher: BioMed Central

Abstract

BACKGROUND: Cluster analysis is an integral part of high-dimensional data analysis. In the context of large-scale gene expression data, a filtered set of genes is grouped according to expression profile using one of the numerous clustering algorithms in the statistics and machine learning literature. A closely related problem is that of selecting, from the long list of clustering algorithms currently available, one that is "optimal" in some sense.

RESULTS: In this paper, we propose two validation measures, each with two parts: one measuring the statistical consistency (stability) of the clusters produced and the other representing their biological functional congruence. Smaller values of these indices indicate better performance for a clustering algorithm. We illustrate this approach using two case studies with publicly available gene expression data sets: one involving SAGE data from breast cancer patients and the other involving time-course cDNA microarray data on yeast. Six well-known clustering algorithms (UPGMA, K-Means, Diana, Fanny, Model-Based and SOM) were evaluated.

CONCLUSION: No single clustering algorithm may be best suited for clustering genes into functional groups via expression profiles for all data sets. The validation measures introduced in this paper can aid in the selection of an optimal algorithm, for a given data set, from a collection of available clustering algorithms.
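
The abstract does not define the validation measures, so the following is only a rough sketch of what a stability-type index could look like, not the paper's actual definition. It assumes a leave-one-condition-out scheme: genes are clustered on the full data and again with one expression column removed, and the score is the average fraction of each gene's full-data cluster that is lost in the reduced clustering. The function names `upgma_labels` and `non_overlap` are hypothetical, and the small average-linkage routine stands in for any of the six algorithms. As with the indices described above, smaller values indicate better performance.

```python
from itertools import combinations
from math import dist


def upgma_labels(profiles, k):
    """Average-linkage (UPGMA-style) clustering of rows into k clusters.

    Returns one integer label per row. Illustrative O(n^3)-ish version.
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > k:
        # Find the pair of clusters with the smallest mean pairwise distance.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = sum(dist(profiles[i], profiles[j])
                    for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    labels = [0] * len(profiles)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def non_overlap(profiles, k, cluster=upgma_labels):
    """Assumed stability index in [0, 1]; 0 means perfectly stable.

    For each dropped condition (column) and each gene, measures the share
    of the gene's full-data cluster that is absent from its cluster in the
    reduced data, then averages over genes and conditions.
    """
    n_genes, n_conditions = len(profiles), len(profiles[0])
    full = cluster(profiles, k)
    total = 0.0
    for drop in range(n_conditions):
        reduced = [row[:drop] + row[drop + 1:] for row in profiles]
        part = cluster(reduced, k)
        for g in range(n_genes):
            full_set = {i for i in range(n_genes) if full[i] == full[g]}
            part_set = {i for i in range(n_genes) if part[i] == part[g]}
            total += 1 - len(full_set & part_set) / len(full_set)
    return total / (n_genes * n_conditions)


# Toy expression matrix: rows are genes, columns are conditions.
toy = [[0, 0, 0], [0.1, 0, 0.1], [0, 0.1, 0],
       [5, 5, 5], [5.1, 5, 5], [5, 5.1, 5.1]]
score = non_overlap(toy, k=2)
```

Passing a different `cluster` callable with the same `(profiles, k)` signature would let several algorithms be compared on one data set by their scores. The biological congruence part of the measures would additionally need functional annotations for the genes and is not sketched here.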
