    Tangent space estimation for smooth embeddings of Riemannian manifolds

    Numerous dimensionality reduction problems in data analysis involve the recovery of low-dimensional models or the learning of manifolds underlying sets of data. Many manifold learning methods require the estimation of the tangent space of the manifold at a point from locally available data samples. Local sampling conditions such as (i) the size of the neighborhood (sampling width) and (ii) the number of samples in the neighborhood (sampling density) affect the performance of learning algorithms. In this work, we propose a theoretical analysis of local sampling conditions for the estimation of the tangent space at a point P lying on an m-dimensional Riemannian manifold S in R^n. Assuming a smooth embedding of S in R^n, we estimate the tangent space T_P S by performing a Principal Component Analysis (PCA) on points sampled from the neighborhood of P on S. Our analysis explicitly takes into account the second-order properties of the manifold at P, namely the principal curvatures, as well as higher-order terms. We consider a random sampling framework and leverage recent results from random matrix theory to derive conditions on the sampling width and the local sampling density for an accurate estimation of tangent subspaces. We measure the estimation accuracy by the angle between the estimated tangent space and the true tangent space T_P S, and we give conditions for this angle to be bounded with high probability. In particular, we observe that the local sampling conditions are highly dependent on the correlation between the components in the second-order local approximation of the manifold. We finally provide numerical simulations to validate our theoretical findings.
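    As a rough illustration of the local-PCA estimator the abstract describes, the sketch below estimates the tangent plane of the unit sphere S^2 in R^3 at the north pole and reports the principal angle to the true tangent space. The function name, the sphere example, and the neighborhood radius are illustrative assumptions, not the paper's experimental setup.

```python
# A minimal sketch of tangent space estimation via local PCA, assuming the
# general setup from the abstract (not the authors' exact procedure).
import numpy as np

def estimate_tangent_space(points, p, radius, m):
    """Estimate the m-dimensional tangent space T_P S at p from the samples
    within Euclidean distance `radius` of p (the sampling width)."""
    nbrs = points[np.linalg.norm(points - p, axis=1) <= radius]
    centered = nbrs - nbrs.mean(axis=0)
    # PCA: the top-m right singular vectors span the estimated tangent space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:m].T  # n x m orthonormal basis

# Example: the unit sphere S^2 in R^3, with P the north pole, so the true
# tangent space T_P S is the xy-plane.
rng = np.random.default_rng(0)
samples = rng.normal(size=(20_000, 3))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)
p = np.array([0.0, 0.0, 1.0])
basis = estimate_tangent_space(samples, p, radius=0.3, m=2)

# The accuracy measure from the abstract: the angle between the estimated
# and the true tangent space, computed here via principal angles.
true_basis = np.eye(3)[:, :2]
cosines = np.linalg.svd(true_basis.T @ basis, compute_uv=False)
angle = np.arccos(np.clip(cosines.min(), -1.0, 1.0))
print(f"largest principal angle: {angle:.4f} rad")
```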

    Uncertainty-Aware Principal Component Analysis

    We present a technique to perform dimensionality reduction on data that is subject to uncertainty. Our method is a generalization of traditional principal component analysis (PCA) to multivariate probability distributions. In comparison to non-linear methods, linear dimensionality reduction techniques have the advantage that the characteristics of such probability distributions remain intact after projection. We derive a representation of the PCA sample covariance matrix that respects potential uncertainty in each of the inputs, building the mathematical foundation of our new method: uncertainty-aware PCA. In addition to the accuracy and performance gained by our approach over sampling-based strategies, our formulation allows us to perform sensitivity analysis with regard to the uncertainty in the data. For this, we propose factor traces as a novel visualization that enables a better understanding of the influence of uncertainty on the chosen principal components. We provide multiple examples of our technique using real-world datasets. As a special case, we show how to propagate multivariate normal distributions through PCA in closed form. Furthermore, we discuss extensions and limitations of our approach.
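    The closed-form propagation mentioned as a special case follows from the linearity of Gaussian distributions: if x ~ N(mu, Sigma) and W holds the principal directions, then the projection W^T x is again Gaussian with mean W^T mu and covariance W^T Sigma W. Below is a minimal sketch of this special case only, not the paper's full uncertainty-aware covariance construction; all names are illustrative.

```python
# Pushing a multivariate normal through a fixed linear PCA projection in
# closed form (the Gaussian special case noted in the abstract).
import numpy as np

rng = np.random.default_rng(1)

# Ordinary PCA on certain (uncertainty-free) training data gives W.
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))
center = X.mean(axis=0)
_, _, vt = np.linalg.svd(X - center, full_matrices=False)
W = vt[:2].T  # 4 x 2: top two principal directions

# An uncertain input, modeled as a Gaussian in the original space.
mu = np.array([1.0, -0.5, 0.2, 0.0])
Sigma = np.diag([0.1, 0.4, 0.2, 0.3])

# Linearity of Gaussians: the projection is again Gaussian, with mean
# W^T (mu - center) and covariance W^T Sigma W.
mu_proj = W.T @ (mu - center)
Sigma_proj = W.T @ Sigma @ W

# Sanity check against a sampling-based estimate.
draws = rng.multivariate_normal(mu, Sigma, size=100_000)
proj = (draws - center) @ W
print(np.allclose(proj.mean(axis=0), mu_proj, atol=0.01))  # True
print(np.allclose(np.cov(proj.T), Sigma_proj, atol=0.01))  # True
```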

    A note on the Hanson-Wright inequality for random vectors with dependencies

    We prove that quadratic forms in isotropic random vectors X in R^n, possessing the convex concentration property with constant K, satisfy the Hanson-Wright inequality with constant CK, where C is an absolute constant, thus eliminating the logarithmic (in the dimension) factors in a recent estimate by Vu and Wang. We also show that the concentration inequality for all Lipschitz functions implies a uniform version of the Hanson-Wright inequality for suprema of quadratic forms (in the spirit of the inequalities by Borell, Arcones-Giné and Ledoux-Talagrand). Previous results of this type relied on stronger isoperimetric properties of X and in some cases provided an upper bound on the deviations rather than a concentration inequality. In the last part of the paper we show that the uniform version of the Hanson-Wright inequality for Gaussian vectors can be used to recover a recent concentration inequality for empirical estimators of the covariance operator of B-valued Gaussian variables due to Koltchinskii and Lounici.
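    For a rough numerical sense of the concentration the inequality describes, the sketch below (an assumed Gaussian setup chosen for illustration, not the paper's proof technique) checks that for isotropic Gaussian X and symmetric A the quadratic form X^T A X concentrates around its mean tr(A), with standard deviation sqrt(2) ||A||_F, matching the sub-Gaussian regime of the Hanson-Wright tail.

```python
# Empirical concentration of a quadratic form X^T A X around tr(A) for
# isotropic Gaussian X; an illustrative setup, not taken from the paper.
import numpy as np

rng = np.random.default_rng(2)
n = 500
A = rng.normal(size=(n, n))
A = (A + A.T) / 2  # symmetrize for a clean closed-form variance

X = rng.normal(size=(20_000, n))        # isotropic Gaussian vectors in R^n
quad = np.einsum("ij,ij->i", X @ A, X)  # X^T A X, one value per sample

# For Gaussian X and symmetric A, E[X^T A X] = tr(A) and
# Var[X^T A X] = 2 ||A||_F^2, the sub-Gaussian regime of Hanson-Wright.
fro = np.linalg.norm(A, "fro")
print(f"mean: empirical {quad.mean():9.1f}  vs  tr(A)           {np.trace(A):9.1f}")
print(f"std : empirical {quad.std():9.1f}  vs  sqrt(2)*||A||_F  {np.sqrt(2) * fro:9.1f}")
```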