
    Declutter and Resample: Towards parameter free denoising

    In many data analysis applications the following scenario is commonplace: we are given a point set that is supposed to sample a hidden ground truth $K$ in a metric space, but it has been corrupted with noise so that some of the data points lie far away from $K$, creating outliers, also termed {\em ambient noise}. One of the main goals of denoising algorithms is to eliminate such noise so that the curated data lie within a bounded Hausdorff distance of $K$. Popular denoising approaches such as deconvolution and thresholding often require the user to set several parameters and/or to choose an appropriate noise model while guaranteeing only asymptotic convergence. Our goal is to lighten this burden as much as possible while ensuring theoretical guarantees in all cases. Specifically, we first propose a simple denoising algorithm that requires only a single parameter but provides a theoretical guarantee on the quality of the output on general input points. We argue that this single parameter cannot be avoided. We next present a simple algorithm that avoids even this parameter by paying for it with a slight strengthening of the sampling condition on the input points, which is not unrealistic. We also provide some preliminary empirical evidence that our algorithms are effective in practice.
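
    The abstract does not spell out the procedure, so the sketch below is only a generic illustration of distance-based decluttering, not the paper's algorithm or its parameter-free variant: points whose distance to their k-th nearest neighbor is unusually large are treated as ambient noise. The function name `declutter` and the parameters `k` and `factor` are assumptions introduced for illustration.

```python
# Illustrative k-distance declutter sketch (NOT the paper's algorithm):
# drop points whose k-nearest-neighbor distance is large relative to the rest.
import numpy as np
from scipy.spatial import cKDTree

def declutter(points, k=10, factor=3.0):
    """Drop points whose k-NN distance exceeds `factor` times the median.

    `k` and `factor` are illustrative parameters, not taken from the paper.
    """
    tree = cKDTree(points)
    # query k+1 neighbors because each point is its own nearest neighbor
    dists, _ = tree.query(points, k=k + 1)
    kdist = dists[:, -1]                      # distance to k-th true neighbor
    keep = kdist <= factor * np.median(kdist)
    return points[keep]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # noisy circle sample plus uniform ambient outliers
    theta = rng.uniform(0, 2 * np.pi, 500)
    circle = np.c_[np.cos(theta), np.sin(theta)] + 0.02 * rng.standard_normal((500, 2))
    outliers = rng.uniform(-2, 2, (50, 2))
    cleaned = declutter(np.vstack([circle, outliers]))
    print(cleaned.shape)
```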

    Approximating Loops in a Shortest Homology Basis from Point Data

    Inference of topological and geometric attributes of a hidden manifold from its point data is a fundamental problem arising in many scientific studies and engineering applications. In this paper we present an algorithm to compute a set of loops from point data that presumably samples a smooth manifold $M \subset \mathbb{R}^d$. These loops approximate a {\em shortest} basis of the one-dimensional homology group $H_1(M)$ with coefficients in the finite field $\mathbb{Z}_2$. Previous results addressed the issue of computing the rank of the homology groups from point data, but there is no result on approximating the shortest basis of a manifold from its point sample. In arriving at our result, we also present a polynomial time algorithm for computing a shortest basis of $H_1(K)$ for any finite {\em simplicial complex} $K$ whose edges have non-negative weights.
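
    For context on the objects involved, the sketch below computes the rank of $H_1$ over $\mathbb{Z}_2$ for a small simplicial complex by Gaussian elimination on boundary matrices. This is the standard rank computation the abstract contrasts with; it does not approximate a shortest basis. The helper names and the example complex are illustrative.

```python
# Betti-1 over Z/2 for a small simplicial complex via boundary-matrix ranks.
# Illustrates the rank computation mentioned in the abstract; it does NOT
# compute a shortest homology basis.
import numpy as np

def rank_mod2(mat):
    """Rank of a 0/1 matrix over the field Z/2 via Gaussian elimination."""
    a = mat.copy() % 2
    rank, rows, cols = 0, a.shape[0], a.shape[1]
    for col in range(cols):
        pivot = next((r for r in range(rank, rows) if a[r, col]), None)
        if pivot is None:
            continue
        a[[rank, pivot]] = a[[pivot, rank]]   # swap pivot row into place
        for r in range(rows):
            if r != rank and a[r, col]:
                a[r] ^= a[rank]               # XOR is addition mod 2
        rank += 1
    return rank

def betti1(vertices, edges, triangles):
    """beta_1 = dim ker(d1) - rank(d2) over Z/2."""
    d1 = np.zeros((len(vertices), len(edges)), dtype=np.int64)
    for j, (u, v) in enumerate(edges):
        d1[u, j] = d1[v, j] = 1
    d2 = np.zeros((len(edges), len(triangles)), dtype=np.int64)
    edge_index = {frozenset(e): i for i, e in enumerate(edges)}
    for j, (a, b, c) in enumerate(triangles):
        for face in ((a, b), (b, c), (a, c)):
            d2[edge_index[frozenset(face)], j] = 1
    return (len(edges) - rank_mod2(d1)) - rank_mod2(d2)

# A hollow triangle has one independent 1-cycle, so beta_1 should be 1.
print(betti1(vertices=[0, 1, 2], edges=[(0, 1), (1, 2), (0, 2)], triangles=[]))
```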

    Towards Persistence-Based Reconstruction in Euclidean Spaces

    Manifold reconstruction has been extensively studied for the last decade or so, especially in two and three dimensions. Recently, significant improvements were made in higher dimensions, leading to new methods to reconstruct large classes of compact subsets of Euclidean space $\mathbb{R}^d$. However, the complexities of these methods scale up exponentially with $d$, which makes them impractical in medium or high dimensions, even for handling low-dimensional submanifolds. In this paper, we introduce a novel approach that stands in between classical reconstruction and topological estimation, and whose complexity scales up with the intrinsic dimension of the data. Specifically, when the data points are sufficiently densely sampled from a smooth $m$-submanifold of $\mathbb{R}^d$, our method retrieves the homology of the submanifold in time at most $c(m)\,n^5$, where $n$ is the size of the input and $c(m)$ is a constant depending solely on $m$. It can also provably handle a wide range of compact subsets of $\mathbb{R}^d$, though with worse complexities. Along the way to proving the correctness of our algorithm, we obtain new results on \v{C}ech, Rips, and witness complex filtrations in Euclidean spaces.
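
    The abstract refers to Čech, Rips, and witness complex filtrations. As a small, hedged illustration of the persistence pipeline in general (not the paper's method or its complexity bound), the snippet below builds a Rips filtration on a noisy circle sample with the GUDHI library and reads off the long-lived homology classes; the sample, `max_edge_length`, and the persistence threshold are assumptions for illustration.

```python
# Minimal persistence pipeline on a point sample, assuming the GUDHI library
# (`pip install gudhi`). Illustrates Rips filtrations in general, not the
# specific reconstruction method described in the paper.
import numpy as np
import gudhi

rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, 300)
points = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.standard_normal((300, 2))

rips = gudhi.RipsComplex(points=points, max_edge_length=1.0)
st = rips.create_simplex_tree(max_dimension=2)   # 2-simplices needed for H_1
diagram = st.persistence()                       # list of (dim, (birth, death))

# Keep only classes that persist over a long parameter range; for a circle we
# expect one long bar in H_0 and one in H_1.
long_lived = [(dim, (b, d)) for dim, (b, d) in diagram if d - b > 0.5]
print(long_lived)
```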

    Manifold Fitting

    While classical data analysis has addressed observations that are real numbers or elements of a real vector space, at present many statistical problems of high interest in the sciences address the analysis of data that consist of more complex objects, taking values in spaces that are naturally not (Euclidean) vector spaces but which still feature some geometric structure. Manifold fitting is a long-standing problem, and has finally been addressed in recent years by Fefferman et al. (2020, 2021a). We develop a method with a theoretical guarantee that fits a $d$-dimensional underlying manifold from noisy observations sampled in the ambient space $\mathbb{R}^D$. The new approach uses geometric structures to obtain the manifold estimator in the form of image sets via a two-step mapping approach. We prove that, under certain mild assumptions and with a sample size $N=\mathcal{O}(\sigma^{-(d+3)})$, these estimators are true $d$-dimensional smooth manifolds whose estimation error, as measured by the Hausdorff distance, is bounded by $\mathcal{O}(\sigma^2\log(1/\sigma))$ with high probability. Compared with the existing approaches proposed in Fefferman et al. (2018, 2021b), Genovese et al. (2014), and Yao and Xia (2019), our method exhibits superior efficiency while attaining very low error rates with a significantly reduced sample size, which scales polynomially in $\sigma^{-1}$ and exponentially in $d$. Extensive simulations are performed to validate our theoretical results. Our findings are relevant to various fields involving high-dimensional data in machine learning. Furthermore, our method opens up new avenues for existing non-Euclidean statistical methods in the sense that it has the potential to unify them to analyze data on manifolds in the ambient space domain.
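
    The authors' two-step mapping is not reproduced here; as a generic illustration of using local geometric structure to push noisy samples toward a $d$-dimensional manifold, the sketch below projects each point onto a tangent space estimated by local PCA over its neighbors. The function `local_pca_project`, the neighborhood size `k`, and the PCA step are assumptions for illustration, not the estimator of the paper.

```python
# Generic local-PCA smoothing sketch: project each noisy sample onto a
# d-dimensional tangent plane estimated from its neighbors. Illustrative only;
# this is NOT the manifold-fitting estimator described in the abstract.
import numpy as np
from scipy.spatial import cKDTree

def local_pca_project(points, d, k=20):
    """Move each point to its projection on a local rank-d PCA plane.

    `d` is the assumed intrinsic dimension; `k` is an illustrative
    neighborhood size.
    """
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)
    out = np.empty_like(points)
    for i, nbrs in enumerate(idx):
        nbh = points[nbrs]
        center = nbh.mean(axis=0)
        # top-d principal directions of the centered neighborhood
        _, _, vt = np.linalg.svd(nbh - center, full_matrices=False)
        basis = vt[:d]                                  # (d, D) orthonormal rows
        out[i] = center + (points[i] - center) @ basis.T @ basis
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    # noisy samples from a circle (d = 1) embedded in R^3
    t = rng.uniform(0, 2 * np.pi, 400)
    clean = np.c_[np.cos(t), np.sin(t), np.zeros_like(t)]
    noisy = clean + 0.05 * rng.standard_normal(clean.shape)
    fitted = local_pca_project(noisy, d=1)
    # mean radial deviation from the unit circle should shrink after projection
    print(np.abs(np.linalg.norm(fitted[:, :2], axis=1) - 1).mean())
```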

    What's the Situation with Intelligent Mesh Generation: A Survey and Perspectives

    Intelligent Mesh Generation (IMG) represents a novel and promising field of research, utilizing machine learning techniques to generate meshes. Despite its relative infancy, IMG has significantly broadened the adaptability and practicality of mesh generation techniques, delivering numerous breakthroughs and unveiling potential future pathways. However, a noticeable void exists in the contemporary literature concerning comprehensive surveys of IMG methods. This paper endeavors to fill this gap by providing a systematic and thorough survey of the current IMG landscape. With a focus on 113 preliminary IMG methods, we undertake a meticulous analysis from various angles, encompassing core algorithm techniques and their application scope, agent learning objectives, data types, targeted challenges, as well as advantages and limitations. We have curated and categorized the literature, proposing three unique taxonomies based on key techniques, output mesh unit elements, and relevant input data types. This paper also underscores several promising future research directions and challenges in IMG. To improve reader accessibility, a dedicated IMG project page is available at \url{https://github.com/xzb030/IMG_Survey}.