13,974 research outputs found

    The BioDICE Taverna plugin for clustering and visualization of biological data: a workflow for molecular compounds exploration

    Get PDF
    Background: In many experimental pipelines, clustering of multidimensional biological datasets is used to detect hidden structures in unlabelled input data. Taverna is a popular workflow management system that is used to design and execute scientific workflows and aid in silico experimentation. The availability of fast unsupervised methods for clustering and visualization in the Taverna platform is important to support a data-driven scientific discovery in complex and explorative bioinformatics applications. Results: This work presents a Taverna plugin, the Biological Data Interactive Clustering Explorer (BioDICE), that performs clustering of high-dimensional biological data and provides a nonlinear, topology preserving projection for the visualization of the input data and their similarities. The core algorithm in the BioDICE plugin is Fast Learning Self Organizing Map (FLSOM), which is an improved variant of the Self Organizing Map (SOM) algorithm. The plugin generates an interactive 2D map that allows the visual exploration of multidimensional data and the identification of groups of similar objects. The effectiveness of the plugin is demonstrated on a case study related to chemical compounds. Conclusions: The number and variety of available tools and its extensibility have made Taverna a popular choice for the development of scientific data workflows. This work presents a novel plugin, BioDICE, which adds a data-driven knowledge discovery component to Taverna. BioDICE provides an effective and powerful clustering tool, which can be adopted for the explorative analysis of biological datasets

    Path finding on a spherical self-organizing map using distance transformations

    Get PDF
    Spatialization methods create visualizations that allow users to analyze high-dimensional data in an intuitive manner and facilitates the extraction of meaningful information. Just as geographic maps are simpli ed representations of geographic spaces, these visualizations are esssentially maps of abstract data spaces that are created through dimensionality reduction. While we are familiar with geographic maps for path planning/ nding applications, research into using maps of high-dimensional spaces for such purposes has been largely ignored. However, literature has shown that it is possible to use these maps to track temporal and state changes within a high-dimensional space. A popular dimensionality reduction method that produces a mapping for these purposes is the Self-Organizing Map. By using its topology preserving capabilities with a colour-based visualization method known as the U-Matrix, state transitions can be visualized as trajectories on the resulting mapping. Through these trajectories, one can gather information on the transition path between two points in the original high-dimensional state space. This raises the interesting question of whether or not the Self-Organizing Map can be used to discover the transition path between two points in an n-dimensional space. In this thesis, we use a spherically structured Self-Organizing Map called the Geodesic Self-Organizing Map for dimensionality reduction and the creation of a topological mapping that approximates the n-dimensional space. We rst present an intuitive method for a user to navigate the surface of the Geodesic SOM. A new application of the distance transformation algorithm is then proposed to compute the path between two points on the surface of the SOM, which corresponds to two points in the data space. Discussions will then follow on how this application could be improved using some form of surface shape analysis. The new approach presented in this thesis would then be evaluated by analyzing the results of using the Geodesic SOM for manifold embedding and by carrying out data analyses using carbon dioxide emissions data

    Path finding on a spherical self-organizing map using distance transformations

    Get PDF
    Spatialization methods create visualizations that allow users to analyze high-dimensional data in an intuitive manner and facilitates the extraction of meaningful information. Just as geographic maps are simpli ed representations of geographic spaces, these visualizations are esssentially maps of abstract data spaces that are created through dimensionality reduction. While we are familiar with geographic maps for path planning/ nding applications, research into using maps of high-dimensional spaces for such purposes has been largely ignored. However, literature has shown that it is possible to use these maps to track temporal and state changes within a high-dimensional space. A popular dimensionality reduction method that produces a mapping for these purposes is the Self-Organizing Map. By using its topology preserving capabilities with a colour-based visualization method known as the U-Matrix, state transitions can be visualized as trajectories on the resulting mapping. Through these trajectories, one can gather information on the transition path between two points in the original high-dimensional state space. This raises the interesting question of whether or not the Self-Organizing Map can be used to discover the transition path between two points in an n-dimensional space. In this thesis, we use a spherically structured Self-Organizing Map called the Geodesic Self-Organizing Map for dimensionality reduction and the creation of a topological mapping that approximates the n-dimensional space. We rst present an intuitive method for a user to navigate the surface of the Geodesic SOM. A new application of the distance transformation algorithm is then proposed to compute the path between two points on the surface of the SOM, which corresponds to two points in the data space. Discussions will then follow on how this application could be improved using some form of surface shape analysis. The new approach presented in this thesis would then be evaluated by analyzing the results of using the Geodesic SOM for manifold embedding and by carrying out data analyses using carbon dioxide emissions data

    A combined measure for quantifying and qualifying the topology preservation of growing self-organizing maps

    Get PDF
    The Self-OrganizingMap (SOM) is a neural network model that performs an ordered projection of a high dimensional input space in a low-dimensional topological structure. The process in which such mapping is formed is defined by the SOM algorithm, which is a competitive, unsupervised and nonparametric method, since it does not make any assumption about the input data distribution. The feature maps provided by this algorithm have been successfully applied for vector quantization, clustering and high dimensional data visualization processes. However, the initialization of the network topology and the selection of the SOM training parameters are two difficult tasks caused by the unknown distribution of the input signals. A misconfiguration of these parameters can generate a feature map of low-quality, so it is necessary to have some measure of the degree of adaptation of the SOM network to the input data model. The topologypreservation is the most common concept used to implement this measure. Several qualitative and quantitative methods have been proposed for measuring the degree of SOM topologypreservation, particularly using Kohonen's model. In this work, two methods for measuring the topologypreservation of the Growing Cell Structures (GCSs) model are proposed: the topographic function and the topology preserving ma

    Self-Organizing Time Map: An Abstraction of Temporal Multivariate Patterns

    Full text link
    This paper adopts and adapts Kohonen's standard Self-Organizing Map (SOM) for exploratory temporal structure analysis. The Self-Organizing Time Map (SOTM) implements SOM-type learning to one-dimensional arrays for individual time units, preserves the orientation with short-term memory and arranges the arrays in an ascending order of time. The two-dimensional representation of the SOTM attempts thus twofold topology preservation, where the horizontal direction preserves time topology and the vertical direction data topology. This enables discovering the occurrence and exploring the properties of temporal structural changes in data. For representing qualities and properties of SOTMs, we adapt measures and visualizations from the standard SOM paradigm, as well as introduce a measure of temporal structural changes. The functioning of the SOTM, and its visualizations and quality and property measures, are illustrated on artificial toy data. The usefulness of the SOTM in a real-world setting is shown on poverty, welfare and development indicators

    Somoclu: An Efficient Parallel Library for Self-Organizing Maps

    Get PDF
    Somoclu is a massively parallel tool for training self-organizing maps on large data sets written in C++. It builds on OpenMP for multicore execution, and on MPI for distributing the workload across the nodes in a cluster. It is also able to boost training by using CUDA if graphics processing units are available. A sparse kernel is included, which is useful for high-dimensional but sparse data, such as the vector spaces common in text mining workflows. Python, R and MATLAB interfaces facilitate interactive use. Apart from fast execution, memory use is highly optimized, enabling training large emergent maps even on a single computer.Comment: 26 pages, 9 figures. The code is available at https://peterwittek.github.io/somoclu

    Batch and median neural gas

    Full text link
    Neural Gas (NG) constitutes a very robust clustering algorithm given euclidian data which does not suffer from the problem of local minima like simple vector quantization, or topological restrictions like the self-organizing map. Based on the cost function of NG, we introduce a batch variant of NG which shows much faster convergence and which can be interpreted as an optimization of the cost function by the Newton method. This formulation has the additional benefit that, based on the notion of the generalized median in analogy to Median SOM, a variant for non-vectorial proximity data can be introduced. We prove convergence of batch and median versions of NG, SOM, and k-means in a unified formulation, and we investigate the behavior of the algorithms in several experiments.Comment: In Special Issue after WSOM 05 Conference, 5-8 september, 2005, Pari

    eXamine: a Cytoscape app for exploring annotated modules in networks

    Get PDF
    Background. Biological networks have growing importance for the interpretation of high-throughput "omics" data. Statistical and combinatorial methods allow to obtain mechanistic insights through the extraction of smaller subnetwork modules. Further enrichment analyses provide set-based annotations of these modules. Results. We present eXamine, a set-oriented visual analysis approach for annotated modules that displays set membership as contours on top of a node-link layout. Our approach extends upon Self Organizing Maps to simultaneously lay out nodes, links, and set contours. Conclusions. We implemented eXamine as a freely available Cytoscape app. Using eXamine we study a module that is activated by the virally-encoded G-protein coupled receptor US28 and formulate a novel hypothesis about its functioning
    corecore