54 research outputs found

    A methodology to compare dimensionality reduction algorithms in terms of loss of quality

    Dimensionality Reduction (DR) is attracting increasing attention as a result of the growing need to handle huge amounts of data effectively. DR methods allow the number of initial features to be reduced considerably, until a subset is found that preserves the original properties of the data. However, their use entails an inherent loss of quality that is likely to affect the understanding of the data during analysis. This loss of quality can be a determining factor when selecting a DR method, because of the nature of each method. In this paper, we propose a methodology that allows different DR methods to be analyzed and compared with regard to the loss of quality they produce. The methodology uses the concept of preservation of geometry (quality assessment criteria) to assess this loss of quality. Experiments have been carried out using the most well-known DR algorithms and quality assessment criteria from the literature, applied to 12 real-world datasets. The results obtained so far show that it is possible to establish a method for selecting the most appropriate DR technique in terms of minimum loss of quality. The experiments also highlight some interesting relationships between the quality assessment criteria. Finally, the methodology allows the appropriate target dimensionality for reducing the data to be established while incurring a minimum loss of quality.
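
    A minimal sketch of the selection idea, assuming scikit-learn: several DR methods are applied to the same data and ranked by a geometry-preservation score. Trustworthiness stands in here for the paper's quality assessment criteria, and the example dataset, method list, and neighbourhood size are illustrative assumptions, not the paper's setup.

```python
# Hypothetical sketch: rank DR methods by a geometry-preservation criterion.
# Assumes scikit-learn; its trustworthiness score stands in for the paper's
# quality assessment criteria, and the digits data for its 12 datasets.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import MDS, Isomap, trustworthiness

X = load_digits().data[:300]  # illustrative subset, not one of the paper's datasets

methods = {
    "PCA": PCA(n_components=2),
    "Isomap": Isomap(n_components=2),
    "MDS": MDS(n_components=2),
}

scores = {}
for name, reducer in methods.items():
    X_low = reducer.fit_transform(X)
    # Trustworthiness lies in [0, 1]; higher means less loss of local geometry.
    scores[name] = trustworthiness(X, X_low, n_neighbors=12)

best = max(scores, key=scores.get)
print(scores, "-> minimum loss of quality:", best)
```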

    Image processing system based on similarity/dissimilarity measures to classify binary images from contour-based features

    Image Processing Systems (IPS) try to solve tasks such as image classification or segmentation based on image content. Many authors have proposed a variety of techniques to tackle the image classification task. Plenty of methods address the performance of the IPS [1], as well as the influence of external circumstances such as illumination, rotation, and noise [2]. However, there is increasing interest in classifying shapes from binary images (BI). Shape Classification (SC) from BI takes a segmented image as a sample (background segmentation [3]) and aims to identify objects based on their shape.
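
    As a hedged illustration of the contour-based approach (not the paper's actual system), the sketch below extracts the main contour of each binary image with OpenCV and classifies a query by nearest-neighbour search under a Hu-moment dissimilarity; the reference images, labels, and the OpenCV 4.x API are assumptions.

```python
# Hypothetical sketch of contour-based shape classification on binary images.
# cv2.matchShapes (Hu-moment based) stands in for the paper's
# similarity/dissimilarity measures; assumes OpenCV 4.x.
import cv2
import numpy as np

def main_contour(binary_img):
    """Largest external contour of a binary (0/255) image."""
    contours, _ = cv2.findContours(binary_img, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea)

def classify(query_img, reference_imgs, reference_labels):
    """Assign the label of the reference shape least dissimilar to the query."""
    q = main_contour(query_img)
    dissimilarities = [
        cv2.matchShapes(q, main_contour(ref), cv2.CONTOURS_MATCH_I1, 0.0)
        for ref in reference_imgs
    ]
    return reference_labels[int(np.argmin(dissimilarities))]
```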

    Making nonlinear manifold learning models interpretable: The manifold grand tour

    Dimensionality reduction is required to produce visualisations of high-dimensional data. In this framework, one of the most straightforward approaches to visualising high-dimensional data is based on reducing complexity and applying linear projections while tumbling the projection axes in a defined sequence, which generates a Grand Tour of the data. We propose using smooth nonlinear topographic maps of the data distribution to guide the Grand Tour, increasing the effectiveness of this approach by prioritising the linear views of the data that are most consistent with the global data structure in these maps. A further consequence of this approach is to enable direct visualisation of the topographic map onto projective spaces that discern structure in the data. The experimental results on standard databases reported in this paper, using self-organising maps and generative topographic mapping, illustrate the practical value of the proposed approach. The main novelty of our proposal is the definition of a systematic way to guide the search for data views in the Grand Tour, selecting and prioritising some of them based on nonlinear manifold models.
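
    A minimal sketch of the guiding idea, assuming the minisom package: candidate Grand Tour frames are scored by how faithfully they preserve the pairwise distances among the SOM prototypes, and the best-scoring views are prioritised. The map size, number of views, and scoring rule are illustrative assumptions rather than the paper's criterion.

```python
# Hypothetical sketch: prioritise Grand Tour views using a topographic map.
# Assumes the `minisom` package; scoring a view by how well it preserves the
# pairwise distances of the SOM prototypes is an illustrative stand-in for the
# paper's consistency criterion.
import numpy as np
from scipy.spatial.distance import pdist
from minisom import MiniSom

def ranked_views(X, n_views=50, seed=0):
    rng = np.random.default_rng(seed)

    # Fit a small self-organising map of the data distribution.
    som = MiniSom(8, 8, X.shape[1], sigma=1.0, learning_rate=0.5, random_seed=seed)
    som.train_random(X, 2000)
    prototypes = som.get_weights().reshape(-1, X.shape[1])
    d_map = pdist(prototypes)

    views = []
    for _ in range(n_views):
        # One Grand Tour frame: a random orthonormal 2-D projection basis.
        basis, _ = np.linalg.qr(rng.standard_normal((X.shape[1], 2)))
        d_view = pdist(prototypes @ basis)
        score = np.corrcoef(d_map, d_view)[0, 1]  # structure kept by this view
        views.append((score, basis))

    # Frames most consistent with the map's structure are shown first.
    return sorted(views, key=lambda v: -v[0])
```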

    MedVir: 3D visual interface applied to gene profile analysis

    The use of data mining techniques for the discovery of gene profiles of diseases such as cancer is becoming common in many research studies. These techniques do not usually analyze in depth the relationships between genes across the different manifestations of the disease in patients. This kind of analysis takes a considerable amount of time and is not always the focus of the research. However, it is crucial for generating personalized treatments to fight the disease. Thus, this research focuses on finding a mechanism for gene profile analysis that can be used by medical and biology experts. Results: In this research, the MedVir framework is proposed. It is an intuitive mechanism based on the visualization of medical data such as gene profiles, patients, and clinical data. MedVir, which is based on an Evolutionary Optimization technique, is a Dimensionality Reduction (DR) approach that presents the data in a three-dimensional space. Furthermore, thanks to Virtual Reality technology, MedVir allows experts to interact with the data in order to tailor the analysis to their experience and knowledge.
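
    A minimal sketch of an evolutionary-optimisation DR into three dimensions, under stated assumptions: a linear projection matrix is evolved to minimise a stress-like distortion of pairwise distances. The fitness function, mutation scheme, and all parameters are illustrative; MedVir's actual algorithm is not reproduced here.

```python
# Hypothetical sketch of evolutionary-optimisation-based DR into 3-D space.
# It evolves a linear mapping that minimises a stress-like distortion of pairwise
# distances; this is an illustration only, not MedVir's actual algorithm.
import numpy as np
from scipy.spatial.distance import pdist

def evolve_projection(X, generations=200, pop_size=30, seed=0):
    rng = np.random.default_rng(seed)
    d_high = pdist(X)

    def fitness(w):
        # Mismatch between original and projected pairwise distances (lower is better).
        return np.mean((pdist(X @ w) - d_high) ** 2)

    parent = rng.standard_normal((X.shape[1], 3)) * 0.1
    best_fit = fitness(parent)
    for _ in range(generations):
        offspring = [parent + 0.05 * rng.standard_normal(parent.shape)
                     for _ in range(pop_size)]
        fits = [fitness(w) for w in offspring]
        i = int(np.argmin(fits))
        if fits[i] < best_fit:
            best_fit, parent = fits[i], offspring[i]
    return X @ parent  # three-dimensional embedding for interactive exploration
```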

    Exploratory visualization of misclassified GPCRs from their transformed unaligned sequences using manifold learning techniques

    Class C G-protein-coupled receptors (GPCRs) are cell membrane proteins of great relevance to biology and pharmacology. Previous research has revealed an upper bound on the accuracy that can be achieved in their classification into subtypes from the unaligned transformation of their sequences. To investigate this, we focus on sequences that have been misclassified using supervised methods. These are visualized, using a nonlinear dimensionality reduction technique and phylogenetic trees, and then characterized against the rest of the data and, particularly, against the rest of the cases of their own subtype. This should help to discriminate between different types of misclassification and to build hypotheses about database quality problems and the extent to which GPCR sequence transformations limit subtype discriminability. The reported experiments provide a proof of concept for the proposed method.
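
    A minimal sketch of the visualisation step under stated assumptions: misclassified cases are flagged via cross-validated predictions of a standard classifier, and all sequences are then embedded with t-SNE (standing in for the paper's nonlinear DR technique) so the errors can be inspected against their own subtype. The feature matrix X and subtype labels y are assumed to be given.

```python
# Hypothetical sketch: highlight misclassified sequences in a nonlinear 2-D embedding.
# X (features from the transformed unaligned sequences) and y (subtype labels) are
# assumed to be given; t-SNE stands in for the paper's nonlinear DR technique.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

def plot_misclassified(X, y):
    y = np.asarray(y)
    # Flag sequences that a supervised classifier gets wrong under cross-validation.
    wrong = cross_val_predict(SVC(kernel="rbf"), X, y, cv=5) != y

    # Nonlinear dimensionality reduction of all sequences for visual inspection.
    emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(X)
    plt.scatter(emb[~wrong, 0], emb[~wrong, 1], c="lightgrey", label="correctly classified")
    plt.scatter(emb[wrong, 0], emb[wrong, 1], c="crimson", label="misclassified")
    plt.legend()
    plt.show()
```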