Declutter and Resample: Towards parameter free denoising
In many data analysis applications the following scenario is commonplace: we
are given a point set that is supposed to sample a hidden ground truth in a
metric space, but it has been corrupted with noise so that some of the data
points lie far away from the ground truth, creating outliers, also termed
ambient noise. One of the main goals of denoising algorithms is to eliminate
such noise so that the curated data lie within a bounded Hausdorff distance of
the ground truth. Popular
denoising approaches such as deconvolution and thresholding often require the
user to set several parameters and/or to choose an appropriate noise model
while guaranteeing only asymptotic convergence. Our goal is to lighten this
burden as much as possible while ensuring theoretical guarantees in all cases.
Specifically, first, we propose a simple denoising algorithm that requires
only a single parameter but provides a theoretical guarantee on the quality of
the output on general input points. We argue that this single parameter cannot
be avoided. We next present a simple algorithm that avoids even this parameter
by paying for it with a slight strengthening of the sampling condition on the
input points which is not unrealistic. We also provide some preliminary
empirical evidence that our algorithms are effective in practice.
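The quality measure named in this abstract, the Hausdorff distance, can be illustrated with a minimal Python sketch. This is not the paper's denoising algorithm; the function names and the toy point sets below are hypothetical and chosen only to show how a single outlier inflates the distance.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point of `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical toy data: a clean sample and a copy with one far outlier.
ground_truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
noisy = ground_truth + [(1.0, 5.0)]

print(hausdorff(ground_truth, ground_truth))  # 0.0
print(hausdorff(noisy, ground_truth))         # 5.0, caused by the outlier alone
```

The brute-force double loop is quadratic in the number of points, which is acceptable for an illustration but not for large inputs.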
Approximating Loops in a Shortest Homology Basis from Point Data
Inference of topological and geometric attributes of a hidden manifold from
its point data is a fundamental problem arising in many scientific studies and
engineering applications. In this paper we present an algorithm to compute a
set of loops from point data that presumably sample a smooth manifold. These
loops approximate a shortest basis of the one-dimensional homology group over
coefficients in a finite field. Previous results addressed the issue of
computing the rank of
the homology groups from point data, but there is no result on approximating
the shortest basis of a manifold from its point sample. In arriving at our
result, we also present a polynomial time algorithm for computing a shortest
basis of the one-dimensional homology group for any finite simplicial complex
whose edges have non-negative weights.
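As a rough illustration of the simpler task this abstract attributes to earlier work, computing the rank of a homology group, the following Python sketch computes the rank of the one-dimensional homology group of a small simplicial complex. It assumes coefficients in the two-element field (an assumption, since the abstract as listed here does not name the field), and it does not compute a shortest basis. The helper names `rank_gf2` and `betti_1` are hypothetical.

```python
def rank_gf2(rows):
    """Rank over the two-element field of a matrix whose rows are int bitmasks."""
    pivots = {}  # leading-bit position -> reduced row with that leading bit
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = row
                break
            row ^= pivots[lead]  # eliminate the leading bit and keep reducing
    return len(pivots)


def betti_1(vertices, edges, triangles):
    """Rank of H_1 = (#edges - rank d1) - rank d2, with mod-2 coefficients."""
    v_index = {v: i for i, v in enumerate(vertices)}
    e_index = {frozenset(e): i for i, e in enumerate(edges)}
    # Boundary of an edge: its two endpoints.
    d1 = [(1 << v_index[a]) | (1 << v_index[b]) for a, b in edges]
    # Boundary of a triangle: its three edges.
    d2 = [
        sum(1 << e_index[frozenset(pair)] for pair in ((a, b), (b, c), (a, c)))
        for a, b, c in triangles
    ]
    return len(edges) - rank_gf2(d1) - rank_gf2(d2)


vertices = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c"), ("a", "c")]
print(betti_1(vertices, edges, []))                 # hollow triangle: 1
print(betti_1(vertices, edges, [("a", "b", "c")]))  # filled triangle: 0
```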
Towards Persistence-Based Reconstruction in Euclidean Spaces
Manifold reconstruction has been extensively studied for the last decade or
so, especially in two and three dimensions. Recently, significant improvements
were made in higher dimensions, leading to new methods to reconstruct large
classes of compact subsets of d-dimensional Euclidean space. However, the complexities
of these methods scale up exponentially with d, which makes them impractical in
medium or high dimensions, even for handling low-dimensional submanifolds. In
this paper, we introduce a novel approach that stands in-between classical
reconstruction and topological estimation, and whose complexity scales up with
the intrinsic dimension of the data. Specifically, when the data points are
sufficiently densely sampled from a smooth -submanifold of , our
method retrieves the homology of the submanifold in time at most ,
where is the size of the input and is a constant depending solely on
. It can also provably handle a wide range of compact subsets of
Euclidean space, though with worse complexities. Along the way to proving the
correctness of our algorithm, we obtain new results on Čech, Rips, and
witness complex filtrations in Euclidean spaces.
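For readers unfamiliar with the Rips complexes mentioned in this abstract, here is a minimal Python sketch that builds the edges and triangles of a Rips complex at one scale. It assumes the convention that simplices are spanned by points whose pairwise distances are at most `scale` (some texts use twice the scale instead), and it is not the paper's reconstruction method. The function name `rips_complex` is hypothetical.

```python
import math
from itertools import combinations


def rips_complex(points, scale):
    """Edges and triangles on point indices with all pairwise distances <= scale."""
    n = len(points)
    edges = [
        (i, j)
        for i, j in combinations(range(n), 2)
        if math.dist(points[i], points[j]) <= scale
    ]
    edge_set = set(edges)
    # A triangle is included exactly when all three of its edges are present.
    triangles = [
        (i, j, k)
        for i, j, k in combinations(range(n), 3)
        if (i, j) in edge_set and (j, k) in edge_set and (i, k) in edge_set
    ]
    return edges, triangles


# Hypothetical toy data: the four corners of a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
small_edges, small_triangles = rips_complex(square, 1.0)
large_edges, large_triangles = rips_complex(square, 1.5)
print(len(small_edges), len(small_triangles))  # 4 0: a hollow square, one loop
print(len(large_edges), len(large_triangles))  # 6 4: diagonals fill the loop
```

Varying `scale` and tracking how loops appear and disappear is the idea behind a filtration; the sketch evaluates only two fixed scales.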
Manifold Fitting
While classical data analysis has addressed observations that are real
numbers or elements of a real vector space, at present many statistical
problems of high interest in the sciences address the analysis of data that
consist of more complex objects, taking values in spaces that are naturally not
(Euclidean) vector spaces but which still feature some geometric structure.
Manifold fitting is a long-standing problem, and has finally been addressed in
recent years by Fefferman et al. (2020, 2021a). We develop a method with a
theoretical guarantee that fits a -dimensional underlying manifold from noisy
observations sampled in the ambient space . The new approach uses
geometric structures to obtain the manifold estimator in the form of image sets
via a two-step mapping approach. We prove that, under certain mild assumptions
and with a sample size , these estimators are
true -dimensional smooth manifolds whose estimation error, as measured by
the Hausdorff distance, is bounded by
with high probability. Compared with the existing approaches proposed in
Fefferman et al. (2018, 2021b); Genovese et al. (2014); Yao and Xia (2019), our
method exhibits superior efficiency while attaining very low error rates with a
significantly reduced sample size, which scales polynomially in
and exponentially in . Extensive simulations are performed to validate our
theoretical results. Our findings are relevant to various fields involving
high-dimensional data in machine learning. Furthermore, our method opens up new
avenues for existing non-Euclidean statistical methods in the sense that it has
the potential to unify them to analyze data on manifolds in the ambient space
domain.
Comment: 60 pages
What's the Situation with Intelligent Mesh Generation: A Survey and Perspectives
Intelligent Mesh Generation (IMG) represents a novel and promising field of
research, utilizing machine learning techniques to generate meshes. Despite its
relative infancy, IMG has significantly broadened the adaptability and
practicality of mesh generation techniques, delivering numerous breakthroughs
and unveiling potential future pathways. However, a noticeable void exists in
the contemporary literature concerning comprehensive surveys of IMG methods.
This paper endeavors to fill this gap by providing a systematic and thorough
survey of the current IMG landscape. With a focus on 113 preliminary IMG
methods, we undertake a meticulous analysis from various angles, encompassing
core algorithm techniques and their application scope, agent learning
objectives, data types, targeted challenges, as well as advantages and
limitations. We have curated and categorized the literature, proposing three
unique taxonomies based on key techniques, output mesh unit elements, and
relevant input data types. This paper also underscores several promising future
research directions and challenges in IMG. To augment reader accessibility, a
dedicated IMG project page is available at
https://github.com/xzb030/IMG_Survey