28 research outputs found

    Data compression and regression based on local principal curves.

    Frequently the predictor space of a multivariate regression problem of the type y = m(x_1, …, x_p) + ε is intrinsically one-dimensional, or at least of far lower dimension than p. Usual modeling attempts such as the additive model y = m_1(x_1) + … + m_p(x_p) + ε, which try to reduce the complexity of the regression problem by making additional structural assumptions, are then inefficient, as they ignore the inherent structure of the predictor space and involve complicated model and variable selection stages. In a fundamentally different approach, one may first approximate the predictor space by a (usually nonlinear) curve passing through it, and then regress the response only against the one-dimensional projections onto this curve. This reduces a p-dimensional regression problem to a one-dimensional one. As a tool for the compression of the predictor space we apply local principal curves. Building on the results presented in Einbeck et al. (Classification – The Ubiquitous Challenge. Springer, Heidelberg, 2005, pp. 256–263), we show how local principal curves can be parametrized and how the projections are obtained. The regression step can then be carried out using any nonparametric smoother. We illustrate the technique using data from the physical sciences.
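The two-stage idea in this abstract (project onto a curve, then smooth the response against the projection index) can be sketched as follows. This is only an illustration under stated assumptions: the curve is taken as an already fitted polyline with distinct nodes rather than a local principal curve, the smoother is a Gaussian Nadaraya-Watson estimator, and the names `project_to_polyline` and `kernel_smooth` are hypothetical, not from the paper.

```python
import numpy as np

def project_to_polyline(X, nodes):
    """Arc-length parameter of each row of X after projection onto a polyline.

    Assumes consecutive nodes are distinct (no zero-length segments).
    """
    seg_start = nodes[:-1]
    seg_vec = nodes[1:] - nodes[:-1]                    # (m, p) segment directions
    seg_len = np.linalg.norm(seg_vec, axis=1)           # (m,) segment lengths
    # Arc length accumulated up to the start of each segment.
    cum_len = np.concatenate([[0.0], np.cumsum(seg_len)[:-1]])
    # Fractional position of the orthogonal foot point on each segment, clipped to [0, 1].
    diff = X[:, None, :] - seg_start[None, :, :]        # (n, m, p)
    frac = np.clip(np.einsum("nmp,mp->nm", diff, seg_vec) / seg_len**2, 0.0, 1.0)
    foot = seg_start[None, :, :] + frac[..., None] * seg_vec[None, :, :]
    dist = np.linalg.norm(X[:, None, :] - foot, axis=2) # (n, m) distance to each segment
    best = np.argmin(dist, axis=1)                      # closest segment per point
    rows = np.arange(len(X))
    return cum_len[best] + frac[rows, best] * seg_len[best]

def kernel_smooth(t_train, y_train, t_eval, bandwidth):
    """Nadaraya-Watson estimate of E[y | t] with a Gaussian kernel."""
    w = np.exp(-0.5 * ((t_eval[:, None] - t_train[None, :]) / bandwidth) ** 2)
    return (w @ y_train) / w.sum(axis=1)

# Toy data: an L-shaped curve in the plane and three predictor points near it.
nodes = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
X = np.array([[0.5, 0.2], [1.3, 0.5], [0.9, -0.1]])
y = np.array([1.0, 3.0, 2.0])

t = project_to_polyline(X, nodes)        # one-dimensional projection indices
fit = kernel_smooth(t, y, np.linspace(0.0, 2.0, 5), bandwidth=0.3)
```

The regression itself only ever sees the scalar `t`, which is the reduction from p dimensions to one that the abstract describes; any other one-dimensional smoother could replace `kernel_smooth`.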

    Ellipse-based Principal Component Analysis for Self-intersecting Curve Reconstruction from Noisy Point Sets

    Surface reconstruction from cross cuts usually requires curve reconstruction from planar noisy point samples. The output curves must form a possibly disconnected 1-manifold for the surface reconstruction to proceed. This article describes an implemented algorithm for the reconstruction of planar curves (1-manifolds) out of noisy point samples of a self-intersecting or nearly self-intersecting planar curve C. A curve C: [a,b] ⊂ R → R^2 is self-intersecting if C(u) = C(v) for some u ≠ v with u, v ∈ (a,b); C(u) is then the self-intersection point. We consider only transversal self-intersections, i.e. those for which the tangents of the intersecting branches at the intersection point do not coincide (C′(u) ≠ C′(v)). In the presence of noise, curves which self-intersect cannot be distinguished from curves which nearly self-intersect. Existing algorithms for curve reconstruction out of either noisy point samples or pixel data do not produce a (possibly disconnected) piecewise linear 1-manifold approaching the whole point sample. The algorithm implemented in this work uses Principal Component Analysis (PCA) with elliptic support regions near the self-intersections. The algorithm was successful in recovering contours out of noisy slice samples of a surface for the Hand, Pelvis and Skull data sets. As a test of the correctness of the curves obtained at the slice levels, they were input into a surface reconstruction algorithm, leading to a reconstructed surface which reproduces the topological and geometrical properties of the original object. The algorithm reacts robustly not only to statistical non-correlation at the self-intersections (non-manifold neighborhoods) but also to occasional high noise at the non-self-intersecting (1-manifold) neighborhoods.

    Pose Estimation for Evaluating Standing Long Jumps via Dynamic Bayesian Networks

    A system is developed for automatically analyzing poses in a standing long jump. In the system, the silhouette of the jumper is first segmented from the background. A thinning algorithm is then used to find a rough skeleton from the silhouette. Some image processing techniques are applied to make the resulting skeleton smoother and simpler. Key points are extracted from the skeleton. Finally, a dynamic Bayesian network (DBN) is used to determine the corresponding pose from the key points. The experimental results show good pose estimation accuracy. According to the standing long jump standards, incorrect movements at different stages of the jump can thus be identified. Conference type: international. Conference dates: 17-20 June 2008. Conference location: Beijing, China.

    Principal manifolds and graphs in practice: from molecular biology to dynamical systems

    We present several applications of non-linear data modeling, using principal manifolds and principal graphs constructed using the metaphor of elasticity (elastic principal graph approach). These approaches are generalizations of Kohonen's self-organizing maps, a class of artificial neural networks. Using several examples, we show the advantages of non-linear objects for data approximation in comparison to linear ones. We propose four numerical criteria for comparing linear and non-linear mappings of datasets into spaces of lower dimension. The examples are taken from comparative political science, from the analysis of high-throughput data in molecular biology, and from the analysis of dynamical systems. Comment: 12 pages, 9 figures

    PCA Beyond The Concept of Manifolds: Principal Trees, Metro Maps, and Elastic Cubic Complexes

    Multidimensional data distributions can have complex topologies and variable local dimensions. To approximate complex data, we propose a new type of low-dimensional "principal object": a principal cubic complex. This complex is a generalization of linear and non-linear principal manifolds and includes them as a particular case. To construct such an object, we combine a method of topological grammars with the minimization of an elastic energy defined for its embedding into multidimensional data space. The whole complex is presented as a system of nodes and springs and as a product of one-dimensional continua (represented by graphs), and the grammars describe how these continua transform during the process of optimal complex construction. The simplest case of a topological grammar ("add a node", "bisect an edge") is equivalent to the construction of "principal trees", an object useful in many practical applications. We demonstrate how it can be applied to the analysis of bacterial genomes and to the visualization of cDNA microarray data using the "metro map" representation. The preprint is supplemented by an animation: "How the topological grammar constructs branching principal components (AnimatedBranchingPCA.gif)". Comment: 19 pages, 8 figures

    A fast approximate skeleton with guarantees for any cloud of points in a Euclidean space

    The tree reconstruction problem is to find an embedded straight-line tree that approximates a given cloud of unorganized points in $\mathbb{R}^m$ up to a certain error. A practical solution to this problem will accelerate the discovery of new colloidal products with desired physical properties such as viscosity. We define the Approximate Skeleton of any finite point cloud $C$ in a Euclidean space with theoretical guarantees. The Approximate Skeleton ASk$(C)$ always belongs to a given offset of $C$, i.e. the maximum distance from $C$ to ASk$(C)$ can be a given maximum error. The number of vertices in the Approximate Skeleton is close to the minimum number in an optimal tree by a factor of 2. The new Approximate Skeleton of any unorganized point cloud $C$ is computed in near-linear time in the number of points in $C$. Finally, the Approximate Skeleton outperforms past skeletonization algorithms on the size and accuracy of reconstruction for a large dataset of real micelles and random clouds.

    Spectral Dimensionality Reduction

    In this paper, we study and put under a common framework a number of non-linear dimensionality reduction methods, such as Locally Linear Embedding, Isomap, Laplacian Eigenmaps and kernel PCA, which are based on performing an eigen-decomposition (hence the name 'spectral'). That framework also includes classical methods such as PCA and metric multidimensional scaling (MDS). It also includes the data transformation step used in spectral clustering. We show that in all of these cases the learning algorithm estimates the principal eigenfunctions of an operator that depends on the unknown data density and on a kernel that is not necessarily positive semi-definite. This helps to generalize some of these algorithms so as to predict an embedding for out-of-sample examples without having to retrain the model. It also makes it more transparent what these algorithms are minimizing on the empirical data and gives a corresponding notion of generalization error.
    Keywords: non-parametric models, non-linear dimensionality reduction, kernel models
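The out-of-sample extension mentioned in this abstract can be illustrated with a Nyström-style formula. The sketch below is a simplification, not the paper's exact normalization: it assumes an uncentered Gaussian kernel (so the Gram matrix is positive semi-definite, unlike the general case the abstract covers), and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Pairwise Gaussian kernel values between the rows of A and the rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma**2))

def fit_spectral_embedding(X, n_components, sigma=1.0):
    """Leading eigenpairs of the (uncentered) training Gram matrix."""
    K = gaussian_kernel(X, X, sigma)
    eigvals, eigvecs = np.linalg.eigh(K)               # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]   # keep the largest ones
    return eigvals[order], eigvecs[:, order]

def embed(X_new, X_train, eigvals, eigvecs, sigma=1.0):
    """Nystrom-style embedding: e_k(x) = sum_i v_ik K(x, x_i) / sqrt(lambda_k)."""
    K_new = gaussian_kernel(X_new, X_train, sigma)
    return K_new @ eigvecs / np.sqrt(eigvals)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 3))
eigvals, eigvecs = fit_spectral_embedding(X_train, n_components=2)

train_coords = eigvecs * np.sqrt(eigvals)               # in-sample embedding
new_coords = embed(rng.normal(size=(5, 3)), X_train, eigvals, eigvecs)
```

A useful sanity check of the formula is that applying `embed` to a training point reproduces that point's in-sample coordinates, since K v_k = λ_k v_k; new points are embedded without recomputing the eigen-decomposition.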

    Graph Reconstruction by Discrete Morse Theory

    Get PDF
    Recovering hidden graph-like structures from potentially noisy data is a fundamental task in modern data analysis. Recently, a persistence-guided discrete Morse-based framework to extract a geometric graph from low-dimensional data has become popular. However, to date, there is very limited theoretical understanding of this framework in terms of graph reconstruction. This paper makes a first step towards closing this gap. Specifically, first, leveraging existing theoretical understanding of persistence-guided discrete Morse cancellation, we provide a simplified version of the existing discrete Morse-based graph reconstruction algorithm. We then introduce a simple and natural noise model and show that the aforementioned framework can correctly reconstruct a graph under this noise model, in the sense that it has the same loop structure as the hidden ground-truth graph and is also geometrically close. We also provide some experimental results for our simplified graph-reconstruction algorithm.

    Elastic principal manifolds and their practical applications

    Principal manifolds serve as a useful tool for many practical applications. These manifolds are defined as lines or surfaces passing through "the middle" of the data distribution. We propose an algorithm for fast construction of grid approximations of principal manifolds with a given topology. It is based on the analogy between a principal manifold and an elastic membrane. The first advantage of this method is the form of the functional to be minimized, which becomes quadratic at the step of refining the vertex positions. This makes the algorithm very efficient, especially for parallel implementations. Another advantage is that the same algorithmic kernel is applied to construct principal manifolds of different dimensions and topologies. We demonstrate how the flexibility of the approach allows numerous adaptive strategies, such as principal graph construction. The algorithm is implemented as a C++ package elmap and as part of the stand-alone data visualization tool VidaExpert, available on the web. We describe the approach and provide several examples of its application with speed performance characteristics. Comment: 26 pages, 10 figures, edited final version
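The remark that the functional becomes quadratic once the point-to-node assignment is fixed can be made concrete with a one-dimensional chain of nodes. The sketch below is an assumption-laden illustration, not the elmap implementation: the energy (mean squared distance to the nearest node, plus a stretching penalty `lam` on edges and a bending penalty `mu` on consecutive node triples) and all function names are chosen here for illustration.

```python
import numpy as np

def elastic_energy(X, nodes, lam, mu):
    """Approximation error plus stretching and bending penalties of a node chain."""
    d2 = ((X[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=2)
    data_term = d2.min(axis=1).mean()                  # mean squared distance to nearest node
    stretch = (np.diff(nodes, axis=0) ** 2).sum()      # sum of squared edge lengths
    bend = (np.diff(nodes, n=2, axis=0) ** 2).sum()    # sum of squared second differences
    return data_term + lam * stretch + mu * bend

def fit_elastic_chain(X, nodes, lam=0.01, mu=0.1, n_iter=20):
    """Alternate nearest-node assignment with an exact linear solve for the nodes."""
    n, k = len(X), len(nodes)
    E = np.diff(np.eye(k), axis=0)                     # (k-1, k) first-difference operator
    B = np.diff(np.eye(k), n=2, axis=0)                # (k-2, k) second-difference operator
    penalty = lam * E.T @ E + mu * B.T @ B
    nodes = nodes.astype(float).copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)                    # assignment step
        counts = np.bincount(nearest, minlength=k)
        sums = np.zeros_like(nodes)
        np.add.at(sums, nearest, X)                    # per-node sum of assigned points
        # With assignments fixed the energy is quadratic in the node positions,
        # so its minimizer solves one symmetric linear system.
        A = np.diag(counts / n) + penalty
        nodes = np.linalg.solve(A, sums / n)
    return nodes

rng = np.random.default_rng(0)
angle = rng.uniform(0.0, np.pi, size=200)
X = np.column_stack([np.cos(angle), np.sin(angle)]) + 0.05 * rng.normal(size=(200, 2))
init = np.column_stack([np.linspace(-1.0, 1.0, 10), np.full(10, 0.5)])
fitted = fit_elastic_chain(X, init)
```

Because both the assignment step and the linear solve can only lower the energy, `elastic_energy(X, fitted, 0.01, 0.1)` is never larger than the value at `init`; the same solve applies unchanged to other grid topologies once their edge and rib operators are supplied.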