Search CORE

102,517 research outputs found

Clustering of gene expression data: performance and similarity analysis

Author: B Everitt
Chun-Hsi Huang
D Botstein
J Dopazo
J Hartigan
J Herrero
J Herrero
J Tamames
Joaquín Dopazo
Jun Ni
K Alsabti
KY Yeung
Longde Yin
MB Eisen
MF Ramoni
P Tamayo
PT Spellman
R Cho
RL Stears
Sneath
T Kanungo
T Kohonen
T Kohonen
T Kohonen
Publication venue: BioMed Central
Publication date: 01/01/2006
Field of study

BACKGROUND: DNA Microarray technology is an innovative methodology in experimental molecular biology, which has produced huge amounts of valuable data in the profile of gene expression. Many clustering algorithms have been proposed to analyze gene expression data, but little guidance is available to help choose among them. The evaluation of feasible and applicable clustering algorithms is becoming an important issue in today's bioinformatics research. RESULTS: In this paper we first experimentally study three major clustering algorithms: Hierarchical Clustering (HC), Self-Organizing Map (SOM), and Self Organizing Tree Algorithm (SOTA) using Yeast Saccharomyces cerevisiae gene expression data, and compare their performance. We then introduce Cluster Diff, a new data mining tool, to conduct the similarity analysis of clusters generated by different algorithms. The performance study shows that SOTA is more efficient than SOM while HC is the least efficient. The results of similarity analysis show that when given a target cluster, the Cluster Diff can efficiently determine the closest match from a set of clusters. Therefore, it is an effective approach for evaluating different clustering algorithms. CONCLUSION: HC methods allow a visual, convenient representation of genes. However, they are neither robust nor efficient. The SOM is more robust against noise. A disadvantage of SOM is that the number of clusters has to be fixed beforehand. The SOTA combines the advantages of both hierarchical and SOM clustering. It allows a visual representation of the clusters and their structure and is not sensitive to noises. The SOTA is also more flexible than the other two clustering methods. By using our data mining tool, Cluster Diff, it is possible to analyze the similarity of clusters generated by different algorithms and thereby enable comparisons of different clustering methods

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A statistical learning based approach for parameter fine-tuning of metaheuristics

Author: Calvet Liñán Laura
Juan Pérez Ángel Alejandro
Ries Jana
Serrat Piè Carles
Publication venue
Publication date: 01/01/2016
Field of study

Metaheuristics are approximation methods used to solve combinatorial optimization problems. Their performance usually depends on a set of parameters that need to be adjusted. The selection of appropriate parameter values causes a loss of efficiency, as it requires time, and advanced analytical and problem-specific skills. This paper provides an overview of the principal approaches to tackle the Parameter Setting Problem, focusing on the statistical procedures employed so far by the scientific community. In addition, a novel methodology is proposed, which is tested using an already existing algorithm for solving the Multi-Depot Vehicle Routing Problem.Peer ReviewedPostprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

The Oberta in open access

Comparison of different strategies of utilizing fuzzy clustering in structure identification

Author: Kılıç Kemal
Türkşen I. Burhan
Uncu Özge
Publication venue: 'Elsevier BV'
Publication date: 01/12/2007
Field of study

Fuzzy systems approximate highly nonlinear systems by means of fuzzy "if-then" rules. In the literature, various algorithms are proposed for mining. These algorithms commonly utilize fuzzy clustering in structure identification. Basically, there are three different approaches in which one can utilize fuzzy clustering; the �first one is based on input space clustering, the second one considers clustering realized in the output space, while the third one is concerned with clustering realized in the combined input-output space. In this study, we analyze these three approaches. We discuss each of the algorithms in great detail and o¤er a thorough comparative analysis. Finally, we compare the performances of these algorithms in a medical diagnosis classi�cation problem, namely Aachen Aphasia Test. The experiment and the results provide a valuable insight about the merits and the shortcomings of these three clustering approaches

Sabanci University Research Database