2,282 research outputs found
Trustworthiness and metrics in visualizing similarity of gene expression
BACKGROUND: Conventionally, the first step in analyzing the large and high-dimensional data sets measured by microarrays is visual exploration. Dendrograms of hierarchical clustering, self-organizing maps (SOMs), and multidimensional scaling have been used to visualize similarity relationships of data samples. We address two central properties of the methods: (i) Are the visualizations trustworthy, i.e., if two samples are visualized to be similar, are they really similar? (ii) The metric. The measure of similarity determines the result; we propose using a new learning metrics principle to derive a metric from interrelationships among data sets. RESULTS: The trustworthiness of hierarchical clustering, multidimensional scaling, and the self-organizing map were compared in visualizing similarity relationships among gene expression profiles. The self-organizing map was the best except that hierarchical clustering was the most trustworthy for the most similar profiles. Trustworthiness can be further increased by treating separately those genes for which the visualization is least trustworthy. We then proceed to improve the metric. The distance measure between the expression profiles is adjusted to measure differences relevant to functional classes of the genes. The genes for which the new metric is the most different from the usual correlation metric are listed and visualized with one of the visualization methods, the self-organizing map, computed in the new metric. CONCLUSIONS: The conjecture from the methodological results is that the self-organizing map can be recommended to complement the usual hierarchical clustering for visualizing and exploring gene expression data. Discarding the least trustworthy samples and improving the metric still improves it
Comparison of visualization methods of genome-wide SNP profiles in childhood acute lymphoblastic leukaemia
Data mining and knowledge discovery have been applied to datasets in various industries including biomedical data. Modelling, data mining and visualization in biomedical data address the problem of extracting knowledge from large and complex biomedical data. The current challenge of dealing with such data is to develop statistical-based and data mining methods that search and browse the underlying patterns within the data. In this paper, we employ several data reduction methods for visualizing genome- wide Single Nucleotide Polymorphism (SNP) datasets based on state-of-art data reduction techniques. Visualization approach has been selected based on the trustworthiness of the resultant visualizations. To deal with large amounts of genetic variation data, we have chosen to apply different data reduction methods to deal with the problem induced by high dimensionality. Based on the trustworthiness metric we found that neighbour Retrieval Visualizer (NeRV) outperformed other methods. This method optimizes the retrieval quality of Stochastic neighbour Embedding. The quality measure of the visualization (i.e. NeRV) showed excellent results, even though the dataset was reduced from 13917 to 2 dimensions. The visualization results will assist clinicians and biomedical researchers in understanding the systems biology of patients and how to compare different groups of clusters in visualizations. © 2008, Australian Computer Society, Inc
Implicit Multidimensional Projection of Local Subspaces
We propose a visualization method to understand the effect of
multidimensional projection on local subspaces, using implicit function
differentiation. Here, we understand the local subspace as the multidimensional
local neighborhood of data points. Existing methods focus on the projection of
multidimensional data points, and the neighborhood information is ignored. Our
method is able to analyze the shape and directional information of the local
subspace to gain more insights into the global structure of the data through
the perception of local structures. Local subspaces are fitted by
multidimensional ellipses that are spanned by basis vectors. An accurate and
efficient vector transformation method is proposed based on analytical
differentiation of multidimensional projections formulated as implicit
functions. The results are visualized as glyphs and analyzed using a full set
of specifically-designed interactions supported in our efficient web-based
visualization tool. The usefulness of our method is demonstrated using various
multi- and high-dimensional benchmark datasets. Our implicit differentiation
vector transformation is evaluated through numerical comparisons; the overall
method is evaluated through exploration examples and use cases
HyperNP: Interactive Visual Exploration of Multidimensional Projection Hyperparameters
Projection algorithms such as t-SNE or UMAP are useful for the visualization
of high dimensional data, but depend on hyperparameters which must be tuned
carefully. Unfortunately, iteratively recomputing projections to find the
optimal hyperparameter value is computationally intensive and unintuitive due
to the stochastic nature of these methods. In this paper we propose HyperNP, a
scalable method that allows for real-time interactive hyperparameter
exploration of projection methods by training neural network approximations.
HyperNP can be trained on a fraction of the total data instances and
hyperparameter configurations and can compute projections for new data and
hyperparameters at interactive speeds. HyperNP is compact in size and fast to
compute, thus allowing it to be embedded in lightweight visualization systems
such as web browsers. We evaluate the performance of the HyperNP across three
datasets in terms of performance and speed. The results suggest that HyperNP is
accurate, scalable, interactive, and appropriate for use in real-world
settings
ChemVA: Interactive visual analysis of chemical compound similarity in virtual screening
In the modern drug discovery process, medicinal chemists deal with the complexity of analysis of large ensembles of candidate molecules. Computational tools, such as dimensionality reduction (DR) and classification, are commonly used to efficiently process the multidimensional space of features. These underlying calculations often hinder interpretability of results and prevent experts from assessing the impact of individual molecular features on the resulting representations. To provide a solution for scrutinizing such complex data, we introduce ChemVA, an interactive application for the visual exploration of large molecular ensembles and their features. Our tool consists of multiple coordinated views: Hexagonal view, Detail view, 3D view, Table view, and a newly proposed Difference view designed for the comparison of DR projections. These views display DR projections combined with biological activity, selected molecular features, and confidence scores for each of these projections. This conjunction of views allows the user to drill down through the dataset and to efficiently select candidate compounds. Our approach was evaluated on two case studies of finding structurally similar ligands with similar binding affinity to a target protein, as well as on an external qualitative evaluation. The results suggest that our system allows effective visual inspection and comparison of different high-dimensional molecular representations. Furthermore, ChemVA assists in the identification of candidate compounds while providing information on the certainty behind different molecular representations.Fil: Sabando, María Virginia. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; ArgentinaFil: Ulbrich, Pavol. Masaryk University. Faculty of Sciences; República ChecaFil: Selzer, Matias Nicolas. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Laboratorio de Ciencias de la Imágenes; ArgentinaFil: Byska, Jan. Masaryk University. Faculty of Sciences; República ChecaFil: Mican, Jan. Masaryk University. Faculty of Sciences; República ChecaFil: Ponzoni, Ignacio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; ArgentinaFil: Soto, Axel Juan. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; ArgentinaFil: Ganuza, María Luján. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Laboratorio de Ciencias de la Imágenes; ArgentinaFil: Kozlikova, Barbora. Masaryk University. Faculty of Sciences; República Chec
- …