Gene encoder: a feature selection technique through unsupervised deep learning-based clustering for large gene expression data

A Ben-Dor; A Brazma; A Shah; AA Alizadeh; C Fraley; C Song; D Jiang; D Sathiaraj; EP Xing; G Gan; H Chen; H Salem; HM Alshamlan; J Oyelade; L Zhu; M Ghosh; MB Eisen; MH Law; MJ Rani; MK Pakhira; P Bihani; P Langley; P Pudil; PA Estévez; PY Chen; R Kohavi; S Mahajan; S Tiwari; SA Armstrong; SM Ayyad; T Kohonen; T Muhammad; TR Golub; W Mao; X Zhu; Z Halim

Publication date: 1 January 2020
Publisher: Springer Science and Business Media LLC

Abstract

© 2020, Springer-Verlag London Ltd., part of Springer Nature. Cancer is a severe condition in which uncontrolled cell division forms a tumor that can spread to other tissues of the body, so new medications and treatment methods are in demand. Classification of microarray data plays a vital role in this effort, and selecting the relevant genes is an important step in that classification. This work presents gene encoder, an unsupervised two-stage feature selection technique for classifying cancer samples. The first stage aggregates three filter methods: principal component analysis, correlation, and spectral-based feature selection. The second stage applies a genetic algorithm that evaluates each chromosome using autoencoder-based clustering. The resulting feature subset is then used for the classification task. Three classifiers (support vector machine, k-nearest neighbors, and random forest) are used to avoid dependence on any single classifier. Six benchmark gene expression datasets are used for performance evaluation, and the method is compared with four state-of-the-art related algorithms. Three sets of experiments are carried out: evaluating the selected features through sample-based clustering, tuning the parameters, and selecting the better-performing classifier. The comparison is based on accuracy, recall, false positive rate, precision, F-measure, and entropy. The results suggest better performance for the proposed method.
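The first stage described in the abstract (aggregating three unsupervised filter methods) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not say how each filter scores a gene or how the three results are combined, so everything specific here is an assumption: the PCA score is taken as variance-weighted squared loadings, the correlation score as one minus the mean absolute correlation with other genes, the spectral score as a Laplacian score on a fully connected heat-kernel graph, and the aggregation as a mean of ranks. All function names (`pca_scores`, `correlation_scores`, `laplacian_scores`, `select_genes`) are hypothetical. The second stage (a genetic algorithm with autoencoder-based clustering as fitness) is not shown.

```python
import numpy as np

def pca_scores(X, n_components=3):
    """Assumed PCA filter: variance-weighted squared loadings per gene."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes; s holds the singular values.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_components, len(s))
    return (s[:k, None] ** 2 * Vt[:k] ** 2).sum(axis=0)

def correlation_scores(X):
    """Assumed correlation filter: less redundant genes score higher."""
    C = np.nan_to_num(np.corrcoef(X, rowvar=False))  # constant genes give NaN
    np.fill_diagonal(C, 0.0)
    return 1.0 - np.abs(C).sum(axis=0) / (X.shape[1] - 1)

def laplacian_scores(X):
    """Assumed spectral filter: negated Laplacian score (higher is better)."""
    g = X @ X.T
    diag = np.diag(g)
    # Pairwise squared distances between samples via the Gram matrix.
    sq = np.clip(diag[:, None] + diag[None, :] - 2.0 * g, 0.0, None)
    pos = sq[sq > 0]
    sigma2 = np.median(pos) if pos.size else 1.0
    S = np.exp(-sq / sigma2)          # heat-kernel similarity between samples
    np.fill_diagonal(S, 0.0)
    d = S.sum(axis=1)                 # node degrees
    L = np.diag(d) - S                # graph Laplacian
    F = X - (d @ X) / d.sum()         # remove the degree-weighted mean per gene
    num = (F * (L @ F)).sum(axis=0)
    den = (F * (d[:, None] * F)).sum(axis=0)
    score = np.full(X.shape[1], np.inf)
    np.divide(num, den, out=score, where=den > 1e-12)
    return -score                     # low Laplacian score means locality is preserved

def to_ranks(v):
    """Rank values so that 0 is the worst score and len(v) - 1 the best."""
    return np.argsort(np.argsort(v))

def select_genes(X, k):
    """Assumed aggregation: average the three rank vectors, keep the top k genes."""
    ranks = [to_ranks(f(X)) for f in (pca_scores, correlation_scores, laplacian_scores)]
    mean_rank = np.mean(ranks, axis=0)
    return np.sort(np.argsort(mean_rank)[-k:])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 50))     # 30 samples, 50 genes (synthetic data)
    print(select_genes(X, k=10))
```

Rank averaging is used here only because it puts the three scores on a common scale without tuning weights; other aggregation rules, such as a union or intersection of per-filter top-k lists, would fit the abstract's description equally well.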
