
Ensemble-based Clustering of Plasmodium falciparum genes

Abstract

Ensemble learning is a recent approach that extends clustering, the unsupervised data mining technique used to find natural groupings in a dataset. Here, we applied an ensemble-based clustering algorithm, Random Forests with Partitioning Around Medoids (PAM), to multiple time-series gene expression datasets of Plasmodium falciparum. The Random Forest algorithm is the most common ensemble learning approach built on decision trees. A Random Forest consists of a large number of classification trees (ranging from hundreds to thousands), each built from a bootstrap sample of the dataset. To select the optimal number of final clusters, we also applied the following internal cluster validity measures: the Silhouette Width index, the Connectivity Index and the Dunn Index. Our results show that ensemble-based clustering is indeed a good alternative for cluster analysis, with the promise of improved performance over traditional clustering algorithms.
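The three internal validity measures named above can be computed from any dissimilarity matrix and a vector of cluster labels. The following is a minimal pure-Python sketch of their standard textbook definitions, not the authors' implementation. It assumes a precomputed, symmetric dissimilarity matrix (for instance one derived from Random Forest proximities, which is an assumption about the pipeline) and one label per gene; the function names, the toy data and the neighbourhood size `L = 10` are illustrative choices.

```python
from math import inf


def silhouette_width(dist, labels):
    """Average silhouette width in [-1, 1]; higher is better.

    dist: symmetric n x n dissimilarity matrix (list of lists).
    labels: cluster label of each of the n items (at least 2 clusters).
    """
    n = len(labels)
    clusters = set(labels)
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            continue  # convention: a singleton cluster contributes s(i) = 0
        # a(i): mean dissimilarity to the other members of i's own cluster
        a = sum(dist[i][j] for j in own) / len(own)
        # b(i): smallest mean dissimilarity to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c)
            / sum(1 for j in range(n) if labels[j] == c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / n


def dunn_index(dist, labels):
    """Smallest between-cluster distance over largest cluster diameter; higher is better."""
    n = len(labels)
    min_between, max_within = inf, 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                max_within = max(max_within, dist[i][j])
            else:
                min_between = min(min_between, dist[i][j])
    return min_between / max_within if max_within > 0 else inf


def connectivity(dist, labels, L=10):
    """Penalty for items whose L nearest neighbours sit in other clusters; lower is better."""
    n = len(labels)
    total = 0.0
    for i in range(n):
        # Other items ordered by increasing dissimilarity to item i.
        neighbours = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        for rank, j in enumerate(neighbours[:L], start=1):
            if labels[j] != labels[i]:
                total += 1.0 / rank  # the r-th nearest neighbour costs 1/r
    return total


# Toy example: four 1-D "expression profiles" and two candidate 2-cluster labellings.
values = [0.0, 1.0, 10.0, 11.0]
D = [[abs(x - y) for y in values] for x in values]
good, bad = [0, 0, 1, 1], [0, 1, 0, 1]
for name, lab in [("good", good), ("bad", bad)]:
    print(name, round(silhouette_width(D, lab), 3), dunn_index(D, lab), round(connectivity(D, lab), 3))
```

To choose the number of clusters with measures like these, one would partition the data (for example with PAM) for each candidate k, score every partition, and prefer higher silhouette and Dunn values and lower connectivity. On the toy data, the well-separated labelling `good` scores higher on silhouette width and the Dunn index, and lower on connectivity, than `bad`.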
