1,254 research outputs found

    Principal manifolds and graphs in practice: from molecular biology to dynamical systems

    We present several applications of non-linear data modeling, using principal manifolds and principal graphs constructed using the metaphor of elasticity (elastic principal graph approach). These approaches are generalizations of Kohonen's self-organizing maps, a class of artificial neural networks. Using several examples, we show the advantages of non-linear objects over linear ones for data approximation. We propose four numerical criteria for comparing linear and non-linear mappings of datasets into spaces of lower dimension. The examples are taken from comparative political science, from the analysis of high-throughput data in molecular biology, and from the analysis of dynamical systems. Comment: 12 pages, 9 figures

    Data complexity measured by principal graphs

    How can one measure the complexity of a finite set of vectors embedded in a multidimensional space? This is a non-trivial question which can be approached in many different ways. Here we suggest a set of data complexity measures using universal approximators, principal cubic complexes. Principal cubic complexes generalise the notion of principal manifolds for datasets with non-trivial topologies. The type of the principal cubic complex is determined by its dimension and a grammar of elementary graph transformations. The simplest grammar produces principal trees. We introduce three natural types of data complexity: 1) geometric (deviation of the data's approximator from some "idealized" configuration, such as deviation from harmonicity); 2) structural (how many elements of a principal graph are needed to approximate the data); and 3) construction complexity (how many applications of elementary graph transformations are needed to construct the principal object starting from the simplest one). We compute these measures for several simulated and real-life data distributions and show them in "accuracy-complexity" plots, which help to optimize the accuracy/complexity ratio. We discuss various issues connected with measuring data complexity. Software for computing data complexity measures from principal cubic complexes is provided as well. Comment: Computers and Mathematics with Applications, in press

    Geometrical complexity of data approximators

    Many methods have been developed to approximate a cloud of vectors embedded in a high-dimensional space by simpler objects, ranging from principal points and linear manifolds to self-organizing maps, neural gas, elastic maps, and various types of principal curves and principal trees. For each type of approximator, a measure of approximator complexity has been developed as well. These measures are necessary to find the balance between accuracy and complexity and to define the optimal approximations of a given type. We propose a measure of complexity (geometrical complexity) which is applicable to approximators of several types and which allows data approximations of different types to be compared. Comment: 10 pages, 3 figures, minor correction and extension

    Elastic principal manifolds and their practical applications

    Principal manifolds serve as a useful tool for many practical applications. These manifolds are defined as lines or surfaces passing through "the middle" of the data distribution. We propose an algorithm for fast construction of grid approximations of principal manifolds with a given topology. It is based on an analogy between a principal manifold and an elastic membrane. The first advantage of this method is the form of the functional to be minimized, which becomes quadratic at the vertex position refinement step. This makes the algorithm very effective, especially for parallel implementations. Another advantage is that the same algorithmic kernel is applied to construct principal manifolds of different dimensions and topologies. We demonstrate how the flexibility of the approach allows numerous adaptive strategies, such as principal graph construction. The algorithm is implemented as a C++ package elmap and as a part of the stand-alone data visualization tool VidaExpert, available on the web. We describe the approach and provide several examples of its application with speed performance characteristics. Comment: 26 pages, 10 figures, edited final version
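
The abstract above notes that the functional becomes quadratic once each data point is assigned to a grid vertex. A minimal Python sketch of that idea for a one-dimensional chain of nodes follows. It assumes the commonly described elastic energy (a mean squared approximation term, an edge-stretching term weighted by `lam`, and a rib-bending term weighted by `mu`). The function names and parameter values are illustrative assumptions and are not taken from the elmap package or VidaExpert.

```python
def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(data, nodes):
    """Partition step: index of the nearest node for every data point."""
    return [min(range(len(nodes)), key=lambda j: sqdist(x, nodes[j]))
            for x in data]


def energy(data, nodes, owner, lam, mu):
    """Assumed elastic energy: approximation + stretching + bending terms."""
    n, k = len(data), len(nodes)
    u_y = sum(sqdist(x, nodes[j]) for x, j in zip(data, owner)) / n
    u_e = lam * sum(sqdist(nodes[i], nodes[i + 1]) for i in range(k - 1))
    u_r = mu * sum(
        sum((a + c - 2 * b) ** 2
            for a, b, c in zip(nodes[i - 1], nodes[i], nodes[i + 1]))
        for i in range(1, k - 1))
    return u_y + u_e + u_r


def add_quadratic(A, coeffs, weight):
    """Add weight * (sum_i c_i * y_i)^2 to the quadratic form matrix A."""
    for i, ci in coeffs.items():
        for j, cj in coeffs.items():
            A[i][j] += weight * ci * cj


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + B[i][:] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def refine(data, owner, k, lam, mu):
    """Refinement step: with the partition fixed, the energy is quadratic in
    the node positions, so its minimum solves one linear system A Y = B."""
    n, dim = len(data), len(data[0])
    A = [[0.0] * k for _ in range(k)]
    B = [[0.0] * dim for _ in range(k)]
    for x, j in zip(data, owner):
        A[j][j] += 1.0 / n
        for d in range(dim):
            B[j][d] += x[d] / n
    for i in range(k - 1):       # edges of the chain (stretching)
        add_quadratic(A, {i: 1.0, i + 1: -1.0}, lam)
    for i in range(1, k - 1):    # ribs of the chain (bending)
        add_quadratic(A, {i - 1: 1.0, i: -2.0, i + 1: 1.0}, mu)
    return solve(A, B)


if __name__ == "__main__":
    # Toy data on a parabola; alternate partition and refinement steps.
    data = [(t / 10.0, (t / 10.0) ** 2) for t in range(-10, 11)]
    nodes = [(-1.0 + 0.5 * i, 0.5) for i in range(5)]
    for _ in range(10):
        owner = assign(data, nodes)
        nodes = refine(data, owner, len(nodes), lam=0.01, mu=0.01)
    print(nodes)
```

Alternating `assign` and `refine` resembles a k-means iteration with elastic coupling between neighbouring nodes. The dense solver is only for illustration: the system matrix here is banded, and it is the same for every coordinate, which is one plausible reason such a scheme parallelizes well.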

    Elastic Maps and Nets for Approximating Principal Manifolds and Their Application to Microarray Data Visualization

    Principal manifolds are defined as lines or surfaces passing through "the middle" of the data distribution. Linear principal manifolds (Principal Components Analysis) are routinely used for dimension reduction, noise filtering and data visualization. Recently, methods for constructing non-linear principal manifolds were proposed, including our elastic maps approach, which is based on a physical analogy with elastic membranes. We have developed a general geometric framework for constructing "principal objects" of various dimensions and topologies with the simplest quadratic form of the smoothness penalty, which allows very effective parallel implementations. Our approach is implemented in three programming languages (C++, Java and Delphi) with two graphical user interfaces (VidaExpert http://bioinfo.curie.fr/projects/vidaexpert and ViMiDa http://bioinfo-out.curie.fr/projects/vimida applications). In this paper we give an overview of the method of elastic maps and present in detail one of its major applications: the visualization of microarray data in bioinformatics. We show that the method of elastic maps outperforms linear PCA in terms of data approximation, representation of between-point distance structure, preservation of local point neighborhood and representation of point classes in low-dimensional spaces. Comment: 35 pages, 10 figures

    Robust and scalable learning of complex dataset topologies via ElPiGraph

    Large datasets represented by multidimensional data point clouds often possess non-trivial distributions with branching trajectories and excluded regions, with the recent single-cell transcriptomic studies of the developing embryo being notable examples. Reducing the complexity and producing compact and interpretable representations of such data remains a challenging task. Most of the existing computational methods are based on exploring the local data point neighbourhood relations, a step that can perform poorly in the case of multidimensional and noisy data. Here we present ElPiGraph, a scalable and robust method for approximation of datasets with complex structures which does not require computing the complete data distance matrix or the data point neighbourhood graph. This method is able to withstand high levels of noise and is capable of approximating complex topologies via principal graph ensembles that can be combined into a consensus principal graph. ElPiGraph deals efficiently with large and complex datasets in various fields, from biology, where it can be used to infer gene dynamics from single-cell RNA-Seq, to astronomy, where it can be used to explore complex structures in the distribution of galaxies. Comment: 32 pages, 14 figures