Comparative Analysis of Data Mining Tools and Classification Techniques using WEKA in Medical Bioinformatics

Abstract

The availability of huge amounts of data has created a great need for data mining techniques that can generate useful knowledge. In the present study, we provide detailed information about data mining techniques, with a focus on classification as an important supervised learning technique. We also discuss the WEKA software as a tool of choice for performing classification analysis on different kinds of available data. A detailed methodology is provided to make the software accessible to a wide range of users. The main features of WEKA include 49 data preprocessing tools, 76 classification/regression algorithms, 8 clustering algorithms, 3 algorithms for finding association rules, and 15 attribute/subset evaluators plus 10 search algorithms for feature selection. WEKA extracts useful information from data and makes it possible to identify a suitable algorithm for generating an accurate predictive model. Moreover, medical bioinformatics analyses have been performed to illustrate the use of WEKA in the diagnosis of leukemia.

Keywords: Data mining, WEKA, Bioinformatics, Knowledge discovery, Gene expression
