DifFUZZY: A fuzzy clustering algorithm for complex data sets

By Ornella Cominetti, Anastasios Matzavinos, Sandhya Samarasinghe, Don Kulasiri, Sijia Liu, P. K. Maini and R. Erban

Abstract

Soft (fuzzy) clustering techniques are often used in the study of high-dimensional datasets, such as microarray and other high-throughput bioinformatics data. The most widely used method is the fuzzy C-means (FCM) algorithm, but it can present difficulties when dealing with some datasets. DifFUZZY, a fuzzy clustering algorithm that utilises concepts from diffusion processes in graphs, is developed; it is applicable to a larger class of clustering problems than other fuzzy clustering algorithms. Examples of synthetic and real datasets for which this method outperforms other frequently used algorithms are presented, including two benchmark biological datasets: a genetic expression dataset and a dataset that contains taxonomic measurements. The method is better than traditional fuzzy clustering algorithms at handling datasets that are ‘curved’ or elongated, or that contain clusters of different dispersion. The algorithm has been implemented in Matlab and C++ and is available at http://www.maths.ox.ac.uk/cmb/difFUZZY
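
For orientation, the baseline FCM method mentioned in the abstract can be sketched as follows. This is a minimal pure-Python illustration of standard fuzzy C-means, not DifFUZZY and not the authors' Matlab or C++ code; the function name `fuzzy_c_means`, the fuzzifier default `m = 2`, the random initialisation and the stopping tolerance are all assumptions made for the example.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means sketch (illustrative only, not DifFUZZY).

    points: sequence of equal-length numeric tuples.
    Returns (centres, memberships); memberships[i][c] is the degree to
    which point i belongs to cluster c, and each row sums to 1.
    """
    rng = random.Random(seed)
    dim = len(points[0])

    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-12 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centres = []
    for _ in range(n_iter):
        # Centres are membership-weighted means, with weights u^m.
        centres = []
        for c in range(n_clusters):
            weights = [u[i][c] ** m for i in range(len(points))]
            wsum = sum(weights)
            centres.append(
                [sum(w * p[d] for w, p in zip(weights, points)) / wsum
                 for d in range(dim)]
            )

        # Membership update: u_ic = 1 / sum_k (d_ic / d_ik)^(2 / (m - 1)),
        # where d_ic is the Euclidean distance from point i to centre c.
        new_u = []
        for p in points:
            # Floor the distances to avoid division by zero.
            dists = [max(math.dist(p, ctr), 1e-12) for ctr in centres]
            new_u.append(
                [1.0 / sum((dists[c] / dists[k]) ** (2.0 / (m - 1.0))
                           for k in range(n_clusters))
                 for c in range(n_clusters)]
            )

        # Stop once no membership changes by more than tol.
        change = max(abs(a - b)
                     for ra, rb in zip(new_u, u)
                     for a, b in zip(ra, rb))
        u = new_u
        if change < tol:
            break

    return centres, u


# Usage on a toy dataset (made-up points, two compact groups).
toy = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centres, memberships = fuzzy_c_means(toy, n_clusters=2)
```

Because each membership in this sketch depends only on Euclidean distance to a cluster centre, it tends to suit compact, similarly sized clusters, which is consistent with the abstract's remark that traditional fuzzy clustering struggles on curved or elongated data.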

Topics: Biology and other natural sciences
Publisher: Inderscience Publishers
Year: 2010
OAI identifier: oai:generic.eprints.org:1044/core69
