
    Generating low-dimensional denoised representations of nonlinear data with superparamagnetic agents

    Copyright ©2016 IEICE. Visualisation of high-dimensional data by means of a low-dimensional embedding plays a key role in explorative data analysis. Classical approaches to dimensionality reduction, such as principal component analysis (PCA) and multidimensional scaling (MDS), struggle or even fail to reveal the relevant data characteristics when applied to noisy or nonlinear data structures. We present a novel approach for dimensionality reduction in combination with an automatic noise cleaning. By employing self-organising agents that are governed by the dynamics of the superparamagnetic clustering algorithm, the method is able to generate denoised low-dimensional embeddings for which the characteristics of nonlinear data structures are preserved or even emphasised. These properties are illustrated and compared to other approaches by means of toy and real-world examples.

    Non-Asymptotic Analysis of Tangent Space Perturbation

    Constructing an efficient parameterization of a large, noisy data set of points lying close to a smooth manifold in high dimension remains a fundamental problem. One approach consists in recovering a local parameterization using the local tangent plane. Principal component analysis (PCA) is often the tool of choice, as it returns an optimal basis in the case of noise-free samples from a linear subspace. To process noisy data samples from a nonlinear manifold, PCA must be applied locally, at a scale small enough such that the manifold is approximately linear, but at a scale large enough such that structure may be discerned from noise. Using eigenspace perturbation theory and non-asymptotic random matrix theory, we study the stability of the subspace estimated by PCA as a function of scale, and bound (with high probability) the angle it forms with the true tangent space. By adaptively selecting the scale that minimizes this bound, our analysis reveals an appropriate scale for local tangent plane recovery. We also introduce a geometric uncertainty principle quantifying the limits of noise-curvature perturbation for stable recovery. With the purpose of providing perturbation bounds that can be used in practice, we propose plug-in estimates that make it possible to directly apply the theoretical results to real data sets.

    Comment: 53 pages. Revised manuscript with new content addressing application of results to real data sets.
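
The idea of applying PCA locally at a chosen scale and measuring the angle to the true tangent space can be illustrated with a small sketch. This is not the paper's analysis or its plug-in estimator. The toy data (a noisy unit circle), the helper names `local_pca_tangent` and `largest_principal_angle`, the noise level, and the candidate radii are all assumptions made here for illustration.

```python
import numpy as np

def local_pca_tangent(points, center, radius, dim):
    """Estimate a dim-dimensional tangent basis at `center` by PCA on the
    neighbours that lie within `radius` (the 'scale')."""
    dists = np.linalg.norm(points - center, axis=1)
    nbrs = points[dists <= radius]
    if len(nbrs) <= dim:
        raise ValueError("too few neighbours at this scale")
    centered = nbrs - nbrs.mean(axis=0)
    # Right singular vectors of the centered neighbourhood are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:dim].T  # orthonormal columns spanning the estimated tangent space

def largest_principal_angle(basis_a, basis_b):
    """Largest principal angle (radians) between two subspaces,
    each given by a matrix with orthonormal columns."""
    sigma = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    return float(np.arccos(np.clip(sigma.min(), -1.0, 1.0)))

# Assumed toy data: noisy samples of the unit circle in R^2.
# At the point (1, 0) the true tangent direction is the y-axis.
rng = np.random.default_rng(0)
theta = rng.uniform(-np.pi, np.pi, 4000)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
noisy = circle + 0.01 * rng.standard_normal(circle.shape)

center = np.array([1.0, 0.0])
true_tangent = np.array([[0.0], [1.0]])

# Angle between the estimated and true tangent for a few candidate scales.
angles = {}
for radius in (0.05, 0.3, 1.0):
    basis = local_pca_tangent(noisy, center, radius, dim=1)
    angles[radius] = largest_principal_angle(basis, true_tangent)
    print(f"radius={radius:<5} angle={angles[radius]:.4f} rad")
```

Scanning the radius in this way only explores the trade-off empirically, and it relies on knowing the true tangent. The abstract describes instead bounding the angle analytically and selecting the scale that minimizes that bound.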

    Universal rank-order transform to extract signals from noisy data

    We introduce an ordinate method for noisy data analysis, based solely on rank information and thus insensitive to outliers. The method is nonparametric and objective, and the required data processing is parsimonious. The main ingredients include a rank-order data matrix and its transform to a stable form, which provide linear trends in excellent agreement with least squares regression, despite the loss of magnitude information. A group symmetry orthogonal decomposition of the 2D rank-order transform for iid (white) noise is further ordered by principal component analysis. This two-step procedure provides a noise “etalon” used to characterize arbitrary stationary stochastic processes. The method readily distinguishes both the Ornstein-Uhlenbeck process and chaos generated by the logistic map from white noise. Ranking within randomness differs fundamentally from that in deterministic chaos and signals, thus forming the basis for signal detection. To further illustrate the breadth of applications, we apply this ordinate method to the canonical nonlinear parameter estimation problem of two-species radioactive decay, outperforming special-purpose least squares software. We demonstrate that the method excels when extracting trends in heavy-tailed noise and, unlike the Theil-Sen estimator, is not limited to linear regression. A simple expression is given that yields a close approximation for signal extraction of an underlying, generally nonlinear signal.
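
For readers unfamiliar with the Theil-Sen estimator that the abstract uses as a point of comparison, the following is a minimal sketch of that baseline only. It is not the rank-order transform described above, whose details the abstract does not give. The data set (an exact line with three gross outliers) and the function name `theil_sen` are assumptions chosen for illustration.

```python
import numpy as np
from itertools import combinations

def theil_sen(x, y):
    """Theil-Sen line fit: slope is the median of all pairwise slopes,
    intercept is the median of the residuals y - slope * x."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    slope = float(np.median(slopes))
    intercept = float(np.median(y - slope * x))
    return slope, intercept

# Assumed toy data: the exact line y = 2x + 1 on x = 0..19,
# with the last three points shifted upward by 50 (gross outliers).
x = np.arange(20, dtype=float)
y = 2.0 * x + 1.0
y[-3:] += 50.0

ts_slope, ts_intercept = theil_sen(x, y)
ls_slope, ls_intercept = np.polyfit(x, y, 1)  # ordinary least squares for comparison

print(f"Theil-Sen:     slope={ts_slope:.3f}, intercept={ts_intercept:.3f}")
print(f"Least squares: slope={ls_slope:.3f}, intercept={ls_intercept:.3f}")
```

Because most point pairs are uncontaminated, the median of the pairwise slopes stays at the true slope, while the least squares fit is pulled toward the outliers. As the abstract notes, this estimator is restricted to linear regression.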