Microarray gene expression data sets usually contain a large number of genes, but a small
number of samples. In this article, we present a two-stage classification model that combines
feature selection with the dissimilarity-based representation paradigm. In the preprocessing
stage, the ReliefF algorithm is used to generate a subset with a number of top-ranked
genes; in the learning/classification stage, the samples represented by the previously
selected genes are mapped into a dissimilarity space, which is then used to construct
a classifier capable of separating the classes more easily than a feature-based model. The
ultimate aim of this paper is not to find the best subset of genes, but to analyze the performance
of the dissimilarity-based models by means of a comprehensive collection of experiments
for the classification of microarray gene expression data. To this end, we compare
the classification results of an artificial neural network, a support vector machine, and
Fisher’s linear discriminant classifier built on the feature (gene) space with those on the
dissimilarity space when varying the number of genes selected by ReliefF, using eight different
microarray databases. The results show that the dissimilarity-based classifiers systematically
outperform the feature-based models. In addition, classification through the
proposed representation appears to be more robust (i.e. less sensitive to the number of
genes) than that with the conventional feature-based representation.
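
The mapping into a dissimilarity space described above can be illustrated with a minimal sketch. It assumes Euclidean distance as the dissimilarity measure and uses all training samples as the representation set; neither choice is stated in this abstract. The function name `dissimilarity_space`, the toy data, and the `selected` index array (a hypothetical stand-in for the top-ranked genes that ReliefF would return) are illustrative only.

```python
import numpy as np

def dissimilarity_space(X, prototypes):
    """Represent each row of X by its Euclidean distances to the prototypes.

    X:          (n_samples, n_genes) array
    prototypes: (n_prototypes, n_genes) array (assumed here: the training set)
    Returns an (n_samples, n_prototypes) array.
    """
    diff = X[:, None, :] - prototypes[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

rng = np.random.default_rng(0)
X_train = rng.normal(size=(6, 50))   # toy data: 6 samples, 50 genes
X_test = rng.normal(size=(2, 50))    # toy data: 2 unseen samples

# Hypothetical placeholder for the gene indices chosen in the preprocessing
# stage; a real pipeline would obtain these from ReliefF.
selected = np.arange(10)

# Each sample is now described by its distances to the 6 training samples
# rather than by its gene expression values.
D_train = dissimilarity_space(X_train[:, selected], X_train[:, selected])
D_test = dissimilarity_space(X_test[:, selected], X_train[:, selected])
```

In this sketch the dimensionality of the new representation equals the number of prototypes (6) instead of the number of selected genes, so any conventional classifier can then be trained on `D_train` and evaluated on `D_test`.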