Spectral Convergence of the connection Laplacian from random samples
Spectral methods that are based on eigenvectors and eigenvalues of discrete
graph Laplacians, such as Diffusion Maps and Laplacian Eigenmaps, are often used
for manifold learning and non-linear dimensionality reduction. It was
previously shown by Belkin and Niyogi \cite{belkin_niyogi:2007} that the
eigenvectors and eigenvalues of the graph Laplacian converge to the
eigenfunctions and eigenvalues of the Laplace-Beltrami operator of the manifold
in the limit of infinitely many data points sampled independently from the
uniform distribution over the manifold. Recently, we introduced Vector
Diffusion Maps and showed that the connection Laplacian of the tangent bundle
of the manifold can be approximated from random samples. In this paper, we
present a unified framework for approximating other connection Laplacians over
the manifold by considering its principal bundle structure. We prove that the
eigenvectors and eigenvalues of these Laplacians converge in the limit of
infinitely many independent random samples. We generalize the spectral
convergence results to the case where the data points are sampled from a
non-uniform distribution, and to manifolds with and without boundary.
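The graph-Laplacian construction that underlies this line of work can be sketched numerically. The following is a minimal illustration of Laplacian Eigenmaps on random samples from a manifold, not the paper's connection-Laplacian framework; the Gaussian kernel and the bandwidth `sigma` are illustrative choices:

```python
import numpy as np

def laplacian_eigenmaps(points, n_components=2, sigma=0.5):
    """Embed points via eigenvectors of a graph Laplacian built
    from a Gaussian kernel on pairwise distances."""
    # Pairwise squared distances between sample points.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    # Gaussian kernel weights; the bandwidth sigma is a free parameter.
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))
    L = D - W  # unnormalized graph Laplacian, symmetric PSD
    # Eigenpairs in ascending order; skip the constant eigenvector.
    vals, vecs = np.linalg.eigh(L)
    return vals, vecs[:, 1:1 + n_components]

# Sample points uniformly from the circle, a 1-D manifold in R^2.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
X = np.column_stack([np.cos(theta), np.sin(theta)])
vals, emb = laplacian_eigenmaps(X)
```

In the limit described in the abstract, the low eigenpairs of `L` (after suitable normalization) approximate those of the Laplace-Beltrami operator on the circle.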
Multiscale Geometric Methods for Data Sets II: Geometric Multi-Resolution Analysis
Data sets are often modeled as point clouds in R^D, for D large. It is
often assumed that the data has some interesting low-dimensional structure, for
example that of a d-dimensional manifold M, with d much smaller than D.
When M is simply a linear subspace, one may exploit this assumption for
encoding efficiently the data by projecting onto a dictionary of d vectors in
R^D (for example found by SVD), at a cost (n+D)d for n data points. When M
is nonlinear, there are no "explicit" constructions of dictionaries that
achieve a similar efficiency: typically one uses either random dictionaries, or
dictionaries obtained by black-box optimization. In this paper we construct
data-dependent multi-scale dictionaries that aim at efficient encoding and
manipulating of the data. Their construction is fast, and so are the algorithms
that map data points to dictionary coefficients and vice versa. In addition,
data points are guaranteed to have a sparse representation in terms of the
dictionary. We think of dictionaries as the analogue of wavelets, but for
approximating point clouds rather than functions.
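The linear building block of such dictionaries, encoding a cell of points by a few SVD coefficients, can be sketched as follows. This is a toy illustration under assumed choices (a hand-made 1-D curve in R^10, a fixed two-cell split, rank 1), not the paper's multiscale construction:

```python
import numpy as np

def local_svd_dictionary(points, d):
    """Fit a rank-d affine approximation to a cell of points via SVD,
    returning the cell center and the top-d principal directions."""
    center = points.mean(axis=0)
    # Rows of Vt are principal directions of the centered cell.
    _, _, Vt = np.linalg.svd(points - center, full_matrices=False)
    return center, Vt[:d]

def encode(x, center, basis):
    # d coefficients instead of D ambient coordinates.
    return basis @ (x - center)

def decode(coeffs, center, basis):
    return center + coeffs @ basis

# Points on a 1-D curve in R^10, split into two cells by arc length.
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0, 1, 400))
X = np.zeros((400, 10))
X[:, 0] = t
X[:, 1] = t ** 2
cells = [X[:200], X[200:]]
dicts = [local_svd_dictionary(c, d=1) for c in cells]

# Encode/decode a point from the first cell with a single coefficient.
c0, B0 = dicts[0]
x = X[10]
xhat = decode(encode(x, c0, B0), c0, B0)
err = np.linalg.norm(x - xhat)
```

Refining the cells at finer scales and storing only the corrections between scales is what turns this local SVD step into a multi-resolution dictionary.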
Multiscale Geometric Methods for Data Sets I: Multiscale SVD, Noise and Curvature
Large data sets are often modeled as being noisy samples from probability distributions in R^D, with D large. It has been noticed that oftentimes the support M of these probability distributions seems to be well-approximated by low-dimensional sets, perhaps even by manifolds. We shall consider sets that are locally well approximated by k-dimensional planes, with k << D, with k-dimensional manifolds isometrically embedded in R^D being a special case. Samples from this distribution are furthermore corrupted by D-dimensional noise. Certain tools from multiscale geometric measure theory and harmonic analysis seem well-suited to be adapted to the study of samples from such probability distributions, in order to yield quantitative geometric information about them. In this paper we introduce and study multiscale covariance matrices, i.e. covariances corresponding to the distribution restricted to a ball of radius r, with a fixed center and varying r, and under rather general geometric assumptions we study how their empirical, noisy counterparts behave. We prove that in the range of scales where these covariance matrices are most informative, the empirical, noisy covariances are close to their expected, noiseless counterparts. In fact, this is true as soon as the number of samples in the balls where the covariance matrices are computed is linear in the intrinsic dimension of M. As an application, we present an algorithm for estimating the intrinsic dimension of M.
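The multiscale covariance idea can be sketched numerically: restrict noisy samples to balls of growing radius, and read the intrinsic dimension off the gap in the covariance spectrum. The flat 2-D plane in R^5, the noise level, and the radii below are illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np

def multiscale_covariance_spectra(points, center, radii):
    """Eigenvalues (descending) of the covariance of the points that
    fall in balls of radius r around a fixed center, for each r."""
    spectra = []
    for r in radii:
        ball = points[np.linalg.norm(points - center, axis=1) <= r]
        cov = np.cov(ball.T)
        spectra.append(np.sort(np.linalg.eigvalsh(cov))[::-1])
    return spectra

# Noisy samples from a flat 2-D manifold (a plane) embedded in R^5.
rng = np.random.default_rng(2)
n, D, k = 2000, 5, 2
X = np.zeros((n, D))
X[:, :k] = rng.normal(size=(n, k))   # intrinsic 2-D coordinates
X += 0.01 * rng.normal(size=(n, D))  # small D-dimensional noise

spec = multiscale_covariance_spectra(X, X.mean(0), [0.25, 0.5, 1.0])

# At an informative scale, the top-k eigenvalues capture the manifold
# and the rest capture noise; estimate k from the largest spectral gap.
gaps = spec[-1][:-1] / spec[-1][1:]
est_dim = int(np.argmax(gaps)) + 1
```

For this flat example the gap between the second and third eigenvalues is large at every radius shown; on a curved manifold the abstract's point is that the gap is only visible in an intermediate range of scales, below the curvature scale but above the noise scale.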