4 research outputs found
Mapping microarray gene expression data into dissimilarity spaces for tumor classification
Microarray gene expression data sets usually contain a large number of genes, but a small
number of samples. In this article, we present a two-stage classification model by combining
feature selection with the dissimilarity-based representation paradigm. In the preprocessing
stage, the ReliefF algorithm is used to generate a subset with a number of top-ranked
genes; in the learning/classification stage, the samples represented by the previously
selected genes are mapped into a dissimilarity space, which is then used to construct
a classifier capable of separating the classes more easily than a feature-based model. The
ultimate aim of this paper is not to find the best subset of genes, but to analyze the performance
of the dissimilarity-based models by means of a comprehensive collection of experiments
for the classification of microarray gene expression data. To this end, we compare
the classification results of an artificial neural network, a support vector machine and
Fisher’s linear discriminant classifier built on the feature (gene) space with those on the
dissimilarity space when varying the number of genes selected by ReliefF, using eight different
microarray databases. The results show that the dissimilarity-based classifiers systematically
outperform the feature-based models. In addition, classification through the
proposed representation appears to be more robust (i.e. less sensitive to the number of
genes) than that with the conventional feature-based representation.
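
The mapping into a dissimilarity space described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that gene selection has already been done, that the dissimilarity measure is the Euclidean distance, and that every training sample serves as a prototype. The abstract specifies none of these choices, and the function name `to_dissimilarity_space` and the synthetic data are hypothetical.

```python
import numpy as np

def to_dissimilarity_space(X, prototypes):
    """Represent each row of X by its Euclidean distances to the prototypes.

    Assumption: Euclidean distance; the abstract does not name the measure.
    """
    X = np.asarray(X, dtype=float)
    P = np.asarray(prototypes, dtype=float)
    # Pairwise differences have shape (n_samples, n_prototypes, n_genes).
    diff = X[:, None, :] - P[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

# Synthetic stand-in for expression data restricted to the selected genes.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(6, 20))  # 6 samples, 20 selected genes
X_test = rng.normal(size=(2, 20))   # 2 unseen samples

# Assumption: all training samples act as prototypes.
D_train = to_dissimilarity_space(X_train, X_train)  # shape (6, 6)
D_test = to_dissimilarity_space(X_test, X_train)    # shape (2, 6)
```

Any classifier that accepts a feature matrix could then be trained on `D_train` instead of `X_train`. Under these assumptions the dimensionality of the new space equals the number of prototypes rather than the number of genes.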
Ensemble classification based microarray gene retrieval system
Data mining plays an important role in distinguishing normal from cancerous samples using microarray gene data. Because this classification bears directly on human lives, high sensitivity and specificity are mandatory. To address this challenge, this work presents a technique that classifies normal and cancerous samples by means of efficient feature selection and classification. Feature selection is performed with the Information Gain Ratio (IGR), and the selected features are passed to an ensemble classifier. The ensemble combines k-Nearest Neighbour (k-NN), Support Vector Machine (SVM) and Extreme Learning Machine (ELM) classifiers. The performance of the proposed approach is analysed on three datasets, namely Leukemia, Colon and Breast cancer, in terms of accuracy, sensitivity and specificity. The experimental results indicate that the proposed approach outperforms existing techniques.
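
The two ingredients named in this abstract, ranking by information gain ratio and combining several classifiers, can be sketched as follows. This is a hedged illustration rather than the authors' pipeline: it assumes that expression values have already been discretized and that the ensemble combines predictions by simple majority vote. The abstract states neither, and the function names and toy data are hypothetical.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def gain_ratio(feature, labels):
    """Information gain ratio of a discrete feature with respect to labels."""
    feature = np.asarray(feature)
    labels = np.asarray(labels)
    values, counts = np.unique(feature, return_counts=True)
    weights = counts / counts.sum()
    # Expected label entropy after splitting on the feature's values.
    conditional = sum(w * entropy(labels[feature == v])
                      for v, w in zip(values, weights))
    info_gain = entropy(labels) - conditional
    split_info = float(-(weights * np.log2(weights)).sum())
    return info_gain / split_info if split_info > 0 else 0.0

def majority_vote(predictions):
    """Combine label predictions (one row per classifier) by majority vote.

    Assumption: the abstract does not say how the ensemble is combined.
    """
    predictions = np.asarray(predictions)
    voted = []
    for column in predictions.T:  # one column per sample
        values, counts = np.unique(column, return_counts=True)
        voted.append(values[np.argmax(counts)])
    return np.array(voted)

# Toy data: gene_a separates the classes perfectly, gene_b not at all.
labels = np.array([0, 0, 1, 1])
gene_a = np.array([0, 0, 1, 1])
gene_b = np.array([0, 1, 0, 1])
score_a = gain_ratio(gene_a, labels)
score_b = gain_ratio(gene_b, labels)

# Hypothetical outputs of three classifiers (e.g. k-NN, SVM, ELM) on 3 samples.
votes = majority_vote([[0, 1, 1],
                       [0, 1, 0],
                       [1, 1, 0]])
```

In this toy case `gene_a` receives a gain ratio of 1 and `gene_b` a gain ratio of 0, so a ranking by gain ratio would keep `gene_a` first.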
An efficient multivariate feature ranking method for gene selection in high-dimensional microarray data
Classification of microarray data plays a significant role in the diagnosis and prediction of cancer. However, its high dimensionality (>tens of thousands) compared to the number of observations (<tens of hundreds) may lead to poor classification accuracy. In addition, only a fraction of genes is really important for the classification of a given cancer, so feature selection is essential in this field. Because of the time and memory burden of processing high-dimensional data, univariate feature ranking methods are widely used in gene selection. However, most of them are not very accurate because they consider only the relevance of features to the target and ignore the redundancy among features. In this study, we propose a novel multivariate feature ranking method to improve the quality of gene selection and, ultimately, the accuracy of microarray data classification. The method can be efficiently applied to high-dimensional microarray data. We embedded the formal definition of relevance into a Markov blanket (MB) to create a new feature ranking method. Using a few microarray datasets, we demonstrated the practicability of MB-based feature ranking, which achieves high accuracy and good efficiency. The method outperformed commonly used univariate ranking methods and also yielded better results than another multivariate feature ranking method, owing to its data efficiency.
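
The limitation of univariate ranking noted in this abstract (relevance without regard to redundancy) can be illustrated with a small sketch. This is not the Markov blanket method proposed in the paper, whose details the abstract does not give. It swaps in a simple greedy "relevance minus redundancy" criterion based on absolute Pearson correlation, and the function names and toy data are hypothetical.

```python
import numpy as np

def abs_corr(a, b):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    if np.std(a) == 0 or np.std(b) == 0:
        return 0.0
    return float(abs(np.corrcoef(a, b)[0, 1]))

def univariate_ranking(X, y):
    """Rank genes by relevance to the target only (most relevant first)."""
    relevance = [abs_corr(X[:, j], y) for j in range(X.shape[1])]
    return [int(j) for j in np.argsort(relevance)[::-1]]

def redundancy_aware_ranking(X, y):
    """Greedy ranking: relevance minus mean correlation with genes already picked.

    Stand-in criterion for illustration, not the paper's MB-based ranking.
    """
    n_genes = X.shape[1]
    relevance = np.array([abs_corr(X[:, j], y) for j in range(n_genes)])
    ranked = [int(np.argmax(relevance))]
    remaining = set(range(n_genes)) - set(ranked)
    while remaining:
        best = max(
            sorted(remaining),
            key=lambda j: relevance[j]
            - np.mean([abs_corr(X[:, j], X[:, k]) for k in ranked]),
        )
        ranked.append(best)
        remaining.remove(best)
    return ranked

# Toy data: gene 1 duplicates gene 0; gene 2 is less relevant but not redundant.
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X = np.column_stack([
    [0, 0, 0, 1, 1, 1, 1, 1],  # gene 0
    [0, 0, 0, 1, 1, 1, 1, 1],  # gene 1 (copy of gene 0)
    [0, 1, 0, 0, 1, 0, 1, 1],  # gene 2
]).astype(float)

uni = univariate_ranking(X, y)
multi = redundancy_aware_ranking(X, y)
```

On this toy data the univariate ranking places gene 2 last, behind the two duplicated genes, whereas the redundancy-aware ranking promotes gene 2 to second place because the duplicate adds no new information.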