12 research outputs found

    Spectral Convergence of the connection Laplacian from random samples

    Full text link
    Spectral methods that are based on eigenvectors and eigenvalues of discrete graph Laplacians, such as Diffusion Maps and Laplacian Eigenmaps are often used for manifold learning and non-linear dimensionality reduction. It was previously shown by Belkin and Niyogi \cite{belkin_niyogi:2007} that the eigenvectors and eigenvalues of the graph Laplacian converge to the eigenfunctions and eigenvalues of the Laplace-Beltrami operator of the manifold in the limit of infinitely many data points sampled independently from the uniform distribution over the manifold. Recently, we introduced Vector Diffusion Maps and showed that the connection Laplacian of the tangent bundle of the manifold can be approximated from random samples. In this paper, we present a unified framework for approximating other connection Laplacians over the manifold by considering its principle bundle structure. We prove that the eigenvectors and eigenvalues of these Laplacians converge in the limit of infinitely many independent random samples. We generalize the spectral convergence results to the case where the data points are sampled from a non-uniform distribution, and for manifolds with and without boundary

    Multiscale Geometric Methods for Data Sets II: Geometric Multi-Resolution Analysis

    Get PDF
    Data sets are often modeled as point clouds in RDR^D, for DD large. It is often assumed that the data has some interesting low-dimensional structure, for example that of a dd-dimensional manifold MM, with dd much smaller than DD. When MM is simply a linear subspace, one may exploit this assumption for encoding efficiently the data by projecting onto a dictionary of dd vectors in RDR^D (for example found by SVD), at a cost (n+D)d(n+D)d for nn data points. When MM is nonlinear, there are no "explicit" constructions of dictionaries that achieve a similar efficiency: typically one uses either random dictionaries, or dictionaries obtained by black-box optimization. In this paper we construct data-dependent multi-scale dictionaries that aim at efficient encoding and manipulating of the data. Their construction is fast, and so are the algorithms that map data points to dictionary coefficients and vice versa. In addition, data points are guaranteed to have a sparse representation in terms of the dictionary. We think of dictionaries as the analogue of wavelets, but for approximating point clouds rather than functions.Comment: Re-formatted using AMS styl

    Multiscale Geometric Methods for Data Sets I: Multiscale SVD, Noise and Curvature

    Get PDF
    Large data sets are often modeled as being noisy samples from probability distributions in R^D, with D large. It has been noticed that oftentimes the support M of these probability distributions seems to be well-approximated by low-dimensional sets, perhaps even by manifolds. We shall consider sets that are locally well approximated by k-dimensional planes, with k << D, with k-dimensional manifolds isometrically embedded in R^D being a special case. Samples from this distribution; are furthermore corrupted by D-dimensional noise. Certain tools from multiscale geometric measure theory and harmonic analysis seem well-suited to be adapted to the study of samples from such probability distributions, in order to yield quantitative geometric information about them. In this paper we introduce and study multiscale covariance matrices, i.e. covariances corresponding to the distribution restricted to a ball of radius r, with a fixed center and varying r, and under rather general geometric assumptions we study how their empirical, noisy counterparts behave. We prove that in the range of scales where these covariance matrices are most informative, the empirical, noisy covariances are close to their expected, noiseless counterparts. In fact, this is true as soon as the number of samples in the balls where the covariance matrices are computed is linear in the intrinsic dimension of M. As an application, we present an algorithm for estimating the intrinsic dimension of M
    corecore