Feature selection of imbalanced gene expression microarray data

Anaissi, A; Goyal, M; Kennedy, PJ

Authors: A Anaissi, M Goyal, PJ Kennedy
Publication date: 21 November 2011
Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Abstract

Gene expression data sets are highly complex: they contain a very large number of features but only a small number of observations, and only a small fraction of those features is relevant to an outcome of interest. For this kind of data, feature selection is a genuine prerequisite. This paper proposes a feature selection methodology for imbalanced leukaemia gene expression data, based on the random forest algorithm. It discusses the importance of feature selection for reducing the number of features, improving the quality of machine learning, and giving biologists a better understanding for diagnosis and prediction. Algorithms are presented to show the methodology and strategy for feature selection, taking care to avoid overfitting. Experiments are carried out on imbalanced leukaemia gene expression data, and a special measure is used to evaluate the quality of feature selection and the performance of classification. © 2011 IEEE
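The abstract does not name the special measure it uses, so the following is only a minimal sketch of why plain accuracy is a poor fit for imbalanced classes. It assumes two commonly used alternatives, balanced accuracy and the geometric mean of sensitivity and specificity (G-mean); neither is confirmed as the paper's choice. The function names and the toy labels are hypothetical.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def imbalance_aware_scores(y_true, y_pred, positive=1):
    """Return accuracy next to measures that weight both classes equally.

    Assumption: balanced accuracy and G-mean stand in for the unnamed
    measure in the abstract; they are illustrative, not the paper's method.
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    # Guard against empty classes so the ratios are always defined.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "g_mean": sqrt(sensitivity * specificity),
    }


# Hypothetical toy labels: 18 majority-class samples (0), 2 minority-class (1).
y_true = [0] * 18 + [1] * 2
# A trivial classifier that always predicts the majority class.
y_pred = [0] * 20

scores = imbalance_aware_scores(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

On these toy labels the trivial classifier reaches high accuracy while never detecting the minority class, which balanced accuracy and G-mean expose. This is the kind of failure an imbalance-aware measure is meant to catch.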

Available Versions

OPUS - University of Technology Sydney

oai:opus.lib.uts.edu.au:10453/...
