Mapping microarray gene expression data into dissimilarity spaces for tumor classification
Microarray gene expression data sets usually contain a large number of genes, but a small
number of samples. In this article, we present a two-stage classification model by combining
feature selection with the dissimilarity-based representation paradigm. In the preprocessing
stage, the ReliefF algorithm is used to generate a subset of top-ranked
genes; in the learning/classification stage, the samples represented by the previously
selected genes are mapped into a dissimilarity space, which is then used to construct
a classifier capable of separating the classes more easily than a feature-based model. The
ultimate aim of this paper is not to find the best subset of genes, but to analyze the performance
of the dissimilarity-based models by means of a comprehensive collection of experiments
for the classification of microarray gene expression data. To this end, we compare
the classification results of an artificial neural network, a support vector machine,
and Fisher's linear discriminant classifier built on the feature (gene) space with those on the
dissimilarity space when varying the number of genes selected by ReliefF, using eight different
microarray databases. The results show that the dissimilarity-based classifiers systematically
outperform the feature-based models. In addition, classification through the
proposed representation appears to be more robust (i.e. less sensitive to the number of
genes) than that obtained with the conventional feature-based representation.
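The core idea of the dissimilarity-based representation can be sketched as follows. This is a minimal illustration with synthetic data standing in for microarray samples; the nearest-mean classifier and all variable names are illustrative choices, not the exact models from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for microarray data: 20 samples x 50 "genes", two classes.
X = rng.normal(size=(20, 50))
X[10:] += 1.5                      # shift class 1 so the classes are separable
y = np.array([0] * 10 + [1] * 10)

def to_dissimilarity_space(X, prototypes):
    """Represent each sample by its Euclidean distances to the prototypes."""
    diff = X[:, None, :] - prototypes[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

# Use the training samples themselves as prototypes, a common choice in
# dissimilarity-based classification.
D = to_dissimilarity_space(X, X)          # shape (20, 20)

# A minimal nearest-mean classifier operating in the dissimilarity space.
means = np.stack([D[y == c].mean(axis=0) for c in (0, 1)])
pred = np.argmin(
    np.linalg.norm(D[:, None, :] - means[None, :, :], axis=2), axis=1
)
accuracy = (pred == y).mean()
```

In practice the prototype set would come from the ReliefF-selected genes' training samples, and the classifier on top of `D` could be any of the models compared in the paper (neural network, SVM, or linear discriminant).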
Dissimilarity-based representation for radiomics applications
Radiomics refers to the analysis of large numbers of quantitative tumor
features extracted from medical images to find useful predictive, diagnostic,
or prognostic information. Many recent studies have
proved that radiomics can offer a lot of useful information that physicians
cannot extract from the medical images and can be associated with other
information like gene or protein data. However, most of the classification
studies in radiomics report the use of feature selection methods without
identifying the machine learning challenges behind radiomics. In this paper, we
first show that the radiomics problem should be viewed as a high-dimensional,
low-sample-size, multi-view learning problem, and then we compare different
solutions proposed in multi-view learning for classifying radiomics data. Our
experiments, conducted on several real-world multi-view datasets, show that the
intermediate integration methods work significantly better than the filter and
embedded feature selection methods commonly used in radiomics.
Comment: conference paper, 6 pages, 2 figures
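Intermediate integration, as contrasted here with filter and embedded feature selection, combines the views at the level of an intermediate representation rather than concatenating raw features or picking features per view. A minimal sketch, assuming Euclidean dissimilarity matrices per view and a simple average as the combination rule (both illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# Two synthetic "views" of the same 12 patients (e.g., shape and texture
# radiomics feature groups); high-dimensional, low sample size.
view_a = rng.normal(size=(12, 100))
view_b = rng.normal(size=(12, 80))
views = [view_a, view_b]

def pairwise_euclidean(X):
    """Symmetric matrix of Euclidean distances between all sample pairs."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

# Intermediate integration: build one dissimilarity matrix per view
# (normalized by dimensionality so views are comparable), then combine
# them before any classifier sees the data.
per_view = [pairwise_euclidean(v) / np.sqrt(v.shape[1]) for v in views]
D_joint = np.mean(per_view, axis=0)
```

A downstream classifier then operates on `D_joint` (or on its rows as feature vectors), so no single view's raw dimensionality dominates the model.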
Improve the performance of transfer learning without fine-tuning using dissimilarity-based multi-view learning for breast cancer histology images
Breast cancer is one of the most common cancers and a leading cause of
cancer-related death in women. In the context of the ICIAR 2018 Grand
Challenge on Breast Cancer Histology Images, we compare one handcrafted feature
extractor and five transfer learning feature extractors based on deep learning.
We find that the deep learning networks pretrained on ImageNet have better
performance than the popular handcrafted features used for breast cancer
histology images. The best feature extractor achieves an average accuracy of
79.30%. To improve the classification performance, a random forest
dissimilarity based integration method is used to combine different feature
groups together. When the five deep learning feature groups are combined, the
average accuracy is improved to 82.90% (best accuracy 85.00%). When handcrafted
features are combined with the five deep learning feature groups, the average
accuracy is improved to 87.10% (best accuracy 93.00%).
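The random forest dissimilarity used for integration is typically defined from leaf co-occurrence: two samples are similar when many trees route them to the same leaf. The sketch below assumes a leaf-assignment matrix from an already-trained forest (here randomly generated for illustration) and shows only the dissimilarity computation, not the full pipeline:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical leaf assignments: leaf[i, t] is the leaf index that sample i
# reaches in tree t of a trained random forest (5 samples, 8 trees).
leaf = rng.integers(0, 4, size=(5, 8))

# Random forest dissimilarity: 1 minus the fraction of trees in which two
# samples end up in the same leaf.
same_leaf = (leaf[:, None, :] == leaf[None, :, :]).mean(axis=2)
D_rf = 1.0 - same_leaf
```

To integrate feature groups as in the abstract, one forest would be trained per group, each yielding its own `D_rf`, and the group matrices combined (for example, averaged) before the final classification step.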
Classifying sequences by the optimized dissimilarity space embedding approach: a case study on the solubility analysis of the E. coli proteome
We evaluate a version of the recently proposed classification system named
Optimized Dissimilarity Space Embedding (ODSE) that operates in the input space
of sequences of generic objects. The ODSE system has been originally presented
as a classification system for patterns represented as labeled graphs. However,
since ODSE is founded on the dissimilarity space representation of the input
data, the classifier can be easily adapted to any input domain where it is
possible to define a meaningful dissimilarity measure. Here we demonstrate the
effectiveness of the ODSE classifier for sequences by considering an
application dealing with the recognition of the solubility degree of the
Escherichia coli proteome. Solubility, or analogously aggregation propensity,
is an important property of protein molecules, which is intimately related to
the mechanisms underlying the chemico-physical process of folding. Each protein
of our dataset is initially associated with a solubility degree and is
represented as a sequence of symbols denoting the 20 amino acid residues. The
computational results reported here, which we stress were achieved with no
context-dependent tuning of the ODSE system, confirm the validity and
generality of the ODSE-based approach for structured data classification.
Comment: 10 pages, 49 references
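Embedding sequences into a dissimilarity space requires only a meaningful sequence dissimilarity. A minimal sketch using plain edit (Levenshtein) distance and a tiny set of illustrative prototype sequences, which are not the measure or prototypes used by the actual ODSE system:

```python
def levenshtein(a, b):
    """Edit distance between two symbol sequences (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

# Each sequence is mapped to its vector of distances to the prototypes:
# that vector is its coordinate in the dissimilarity space.
prototypes = ["MKT", "GAVL"]           # illustrative amino-acid fragments
seqs = ["MKT", "MKV", "GAV"]
embedded = [[levenshtein(s, p) for p in prototypes] for s in seqs]
```

ODSE additionally optimizes the prototype set and embedding, but any classifier can already be trained on vectors like `embedded` once such a dissimilarity measure is fixed.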
Dissimilarity-based Ensembles for Multiple Instance Learning
In multiple instance learning, objects are sets (bags) of feature vectors
(instances) rather than individual feature vectors. In this paper we address
the problem of how these bags can best be represented. Two standard approaches
are to use (dis)similarities between bags and prototype bags, or between bags
and prototype instances. The first approach results in a relatively
low-dimensional representation determined by the number of training bags, while
the second approach results in a relatively high-dimensional representation,
determined by the total number of instances in the training set. In this paper
a third, intermediate approach is proposed, which links the two approaches and
combines their strengths. Our classifier is inspired by a random subspace
ensemble, and considers subspaces of the dissimilarity space, defined by
subsets of instances, as prototypes. We provide guidelines for using such an
ensemble, and show state-of-the-art performances on a range of multiple
instance learning problems.
Comment: Submitted to IEEE Transactions on Neural Networks and Learning Systems, Special Issue on Learning in Non-(geo)metric Spaces
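The intermediate representation described above can be sketched directly: each bag is mapped to its minimum distances to a random subset of training instances, and each ensemble member uses a different subset. The data, subset size, and min-distance rule below are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy MIL data: each bag is a set of 4-dimensional instances.
bags = [rng.normal(size=(n, 4)) for n in (3, 5, 2, 4)]
all_instances = np.vstack(bags)        # 14 instances in total

def bag_dissim(bag, prototypes):
    """Dissimilarity of a bag to each prototype instance: the distance
    from the prototype to the bag's closest instance."""
    d = np.linalg.norm(bag[:, None, :] - prototypes[None, :, :], axis=2)
    return d.min(axis=0)

# The intermediate approach: each ensemble member uses a random SUBSET of
# instances as prototypes, i.e. a random subspace of the instance-based
# dissimilarity representation.
n_members, subset_size = 5, 6
members = []
for _ in range(n_members):
    idx = rng.choice(len(all_instances), size=subset_size, replace=False)
    members.append(np.stack([bag_dissim(b, all_instances[idx]) for b in bags]))
```

Each element of `members` is a bags-by-prototypes representation on which one base classifier would be trained; the ensemble's predictions are then combined, as in a standard random subspace ensemble.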