241 research outputs found

    Optimal rates of convergence for persistence diagrams in Topological Data Analysis

    Computational topology has recently developed substantially in the direction of data analysis, giving rise to the field of topological data analysis. Topological persistence, or persistent homology, is a fundamental tool in this field. In this paper, we study topological persistence in general metric spaces from a statistical perspective. We show that persistent homology fits naturally into general statistical frameworks and that persistence diagrams can be used as statistics with interesting convergence properties. Numerical experiments in various contexts illustrate our results.

    A Statistical Approach to Topological Data Analysis

    Until very recently, topological data analysis and topological inference methods mostly relied on deterministic approaches. The major part of this habilitation thesis presents a statistical approach to such topological methods. We first develop model selection tools for selecting simplicial complexes in a given filtration. Next, we study the estimation of persistent homology on metric spaces. We also study a robust version of topological data analysis. Related to this last topic, we also investigate the problem of Wasserstein deconvolution. The second part of the habilitation thesis gathers our contributions in other fields of statistics, including a model selection method for Gaussian mixtures, an implementation of the slope heuristic for calibrating penalties, and a study of Breiman’s permutation importance measure in the context of random forests.

    Estimating the Reach of a Manifold

    Various problems in manifold estimation make use of a quantity called the reach, denoted by $\tau_M$, which is a measure of the regularity of the manifold. This paper is the first investigation into the problem of how to estimate the reach. First, we study the geometry of the reach through an approximation perspective. We derive new geometric results on the reach for submanifolds without boundary. An estimator $\hat{\tau}$ of $\tau_M$ is proposed in a framework where tangent spaces are known, and bounds assessing its efficiency are derived. In the case of an i.i.d. random point cloud $\mathbb{X}_n$, $\hat{\tau}(\mathbb{X}_n)$ is shown to achieve uniform expected loss bounds over a $\mathcal{C}^3$-like model. Finally, we obtain upper and lower bounds on the minimax rate for estimating the reach.

    Statistical Analysis and Parameter Selection for Mapper

    In this article, we study the statistical convergence of the 1-dimensional Mapper to its continuous analogue, the Reeb graph. We show that the Mapper is an optimal estimator of the Reeb graph, which gives, as a byproduct, a method to automatically tune its parameters and compute confidence regions on its topological features, such as its loops and flares. This makes it possible to avoid the brute-force approach of testing a large grid of parameters and keeping the most stable ones, which is widely used in visualization, clustering and feature selection with the Mapper.

    The bottleneck degree of algebraic varieties

    A bottleneck of a smooth algebraic variety $X \subset \mathbb{C}^n$ is a pair of distinct points $x, y \in X$ such that the Euclidean normal spaces at $x$ and $y$ contain the line spanned by $x$ and $y$. The narrowness of bottlenecks is a fundamental complexity measure in the algebraic geometry of data. In this paper we study the number of bottlenecks of affine and projective varieties, which we call the bottleneck degree. The bottleneck degree is a measure of the complexity of computing all bottlenecks of an algebraic variety, using for example numerical homotopy methods. We show that the bottleneck degree is a function of classical invariants such as Chern classes and polar classes. We give the formula explicitly in low dimension and provide an algorithm to compute it in the general case.
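
    The definition above can be illustrated on a small real example. The sketch below is only an illustration and not the algorithm from the paper: it assumes a real ellipse $x^2/a^2 + y^2/b^2 = 1$ in the plane, reads "the line spanned by $x$ and $y$" as the line through the two points, and uses the fact that the normal line of a plane curve at a point has the direction of the gradient of its defining polynomial. The function names are hypothetical, and the two points are assumed to lie on the ellipse.

```python
def ellipse_gradient(point, a, b):
    """Gradient of f(x, y) = x^2/a^2 + y^2/b^2 - 1, a normal direction at `point`."""
    x, y = point
    return (2.0 * x / a**2, 2.0 * y / b**2)


def is_parallel(u, v, tol=1e-12):
    """Two plane vectors are parallel when their 2D cross product vanishes."""
    return abs(u[0] * v[1] - u[1] * v[0]) <= tol


def is_bottleneck(p, q, a, b):
    """Check the bottleneck condition for two points assumed to lie on the ellipse.

    The pair is a bottleneck when the chord q - p is parallel to the
    normal direction at p and to the normal direction at q.
    """
    if p == q:
        return False  # bottlenecks are pairs of distinct points
    chord = (q[0] - p[0], q[1] - p[1])
    return (is_parallel(ellipse_gradient(p, a, b), chord)
            and is_parallel(ellipse_gradient(q, a, b), chord))


if __name__ == "__main__":
    a, b = 2.0, 1.0
    print(is_bottleneck((a, 0.0), (-a, 0.0), a, b))  # endpoints of the major axis
    print(is_bottleneck((0.0, b), (0.0, -b), a, b))  # endpoints of the minor axis
    print(is_bottleneck((a, 0.0), (0.0, b), a, b))   # adjacent vertices
```

    With $a \neq b$, the two axis-aligned vertex pairs satisfy the condition while adjacent vertices do not. The paper works with complex varieties, so this real check does not count bottlenecks in the sense of the bottleneck degree.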

    Improved rates for Wasserstein deconvolution with ordinary smooth error in dimension one

    This paper deals with the estimation of a probability measure on the real line from data observed with an additive noise. We are interested in rates of convergence for the Wasserstein metric of order $p \geq 1$. The distribution of the errors is assumed to be known and to belong to a class of supersmooth or ordinary smooth distributions. We obtain in the univariate situation an improved upper bound in the ordinary smooth case and less restrictive conditions for the existing bound in the supersmooth one. In the ordinary smooth case, a lower bound is also provided, and numerical experiments illustrating the rates of convergence are presented.
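
    For readers unfamiliar with the loss used here, the Wasserstein metric of order $p$ has a simple form on the real line. The sketch below is a minimal illustration of that metric only, not of the deconvolution estimator studied in the paper. It assumes two empirical samples of equal size, in which case the optimal coupling matches sorted values, so that $W_p^p = \frac{1}{n}\sum_{i=1}^{n} |x_{(i)} - y_{(i)}|^p$. The function name is hypothetical.

```python
def wasserstein_1d(xs, ys, p=1.0):
    """Empirical Wasserstein distance of order p between two equal-size
    samples on the real line (illustrative helper, name is an assumption)."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    if p < 1:
        raise ValueError("p must be >= 1")
    # In one dimension the optimal transport plan pairs order statistics.
    xs_sorted = sorted(xs)
    ys_sorted = sorted(ys)
    mean_cost = sum(abs(x - y) ** p for x, y in zip(xs_sorted, ys_sorted)) / len(xs)
    return mean_cost ** (1.0 / p)


if __name__ == "__main__":
    sample = [0.0, 1.0, 2.0]
    shifted = [3.0, 1.0, 2.0]  # the same sample shifted by 1, in a different order
    print(wasserstein_1d(sample, shifted, p=1))
    print(wasserstein_1d(sample, shifted, p=2))
```

    Because a pure shift moves every order statistic by the same amount, both calls return the size of the shift regardless of $p$.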