1,260 research outputs found

    Rates of convergence for robust geometric inference

    Distances to compact sets are widely used in the field of Topological Data Analysis for inferring geometric and topological features from point clouds. In this context, the distance to a probability measure (DTM) has been introduced by Chazal et al. (2011) as a robust alternative to the distance to a compact set. In practice, the DTM can be estimated by its empirical counterpart, that is, the distance to the empirical measure (DTEM). In this paper we give a tight control of the deviation of the DTEM. Our analysis relies on a local analysis of empirical processes. In particular, we show that the rates of convergence of the DTEM directly depend on the regularity at zero of a particular quantile function which contains some local information about the geometry of the support. This quantile function is the relevant quantity to describe precisely how difficult a geometric inference problem is. Several numerical experiments illustrate the convergence of the DTEM and also confirm that our bounds are tight.
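
As an illustration of the empirical quantity discussed in this abstract, the following is a minimal sketch of a distance to the empirical measure. It is not taken from the paper: the function name `empirical_dtm`, the mass parameter `m`, and the rounding `k = ceil(m * n)` are assumptions made for the example. It uses the common simplification in which the squared DTM at a query point is the mean of the squared distances to its `k` nearest sample points.

```python
import numpy as np

def empirical_dtm(points, queries, m):
    """Sketch of the distance to the empirical measure (DTEM).

    Assumption: the mass parameter m in (0, 1] is rounded up to
    k = ceil(m * n) neighbours, and the squared DTM is the average
    squared distance to those k nearest sample points.
    """
    points = np.asarray(points, dtype=float)    # shape (n, d)
    queries = np.asarray(queries, dtype=float)  # shape (q, d)
    n = points.shape[0]
    k = max(1, int(np.ceil(m * n)))

    # Pairwise squared Euclidean distances, shape (q, n).
    diff = queries[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)

    # Keep the k smallest squared distances per query (order irrelevant).
    knn_sq = np.partition(sq_dist, k - 1, axis=1)[:, :k]
    return np.sqrt(knn_sq.mean(axis=1))

# Small sample with one far-away outlier at (10, 10).
sample = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
dtm_values = empirical_dtm(sample, [[0.0, 0.0]], m=0.5)
```

With `m = 0.5` and four sample points, only the two nearest points contribute, so the outlier does not affect the value at the origin. This is the robustness property the abstract refers to.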

    DTM-based Filtrations

    Despite strong stability properties, the persistent homology of filtrations classically used in Topological Data Analysis, such as the Čech or Vietoris-Rips filtrations, is very sensitive to the presence of outliers in the data from which it is computed. In this paper, we introduce and study a new family of filtrations, the DTM-filtrations, built on top of point clouds in the Euclidean space which are more robust to noise and outliers. The approach adopted in this work relies on the notion of distance-to-measure functions, and extends some previous work on the approximation of such functions. Comment: Abel Symposia, Springer, In press, Topological Data Analysis

    A Statistical Approach to Topological Data Analysis

    Until very recently, topological data analysis and topological inference methods mostly relied on deterministic approaches. The major part of this habilitation thesis presents a statistical approach to such topological methods. We first develop model selection tools for selecting simplicial complexes in a given filtration. Next, we study the estimation of persistent homology on metric spaces. We also study a robust version of topological data analysis. Related to this last topic, we also investigate the problem of Wasserstein deconvolution. The second part of the habilitation thesis gathers our contributions in other fields of statistics, including a model selection method for Gaussian mixtures, an implementation of the slope heuristic for calibrating penalties, and a study of Breiman’s permutation importance measure in the context of random forests.

    A Unifying Model of Genome Evolution Under Parsimony

    We present a data structure called a history graph that offers a practical basis for the analysis of genome evolution. It conceptually simplifies the study of parsimonious evolutionary histories by representing both substitutions and double cut and join (DCJ) rearrangements in the presence of duplications. The problem of constructing parsimonious history graphs thus subsumes related maximum parsimony problems in the fields of phylogenetic reconstruction and genome rearrangement. We show that tractable functions can be used to define upper and lower bounds on the minimum number of substitutions and DCJ rearrangements needed to explain any history graph. These bounds become tight for a special type of unambiguous history graph called an ancestral variation graph (AVG), which constrains in its combinatorial structure the number of operations required. We finally demonstrate that for a given history graph $G$, a finite set of AVGs describes all parsimonious interpretations of $G$, and this set can be explored with a few sampling moves. Comment: 52 pages, 24 figures

    Tight bounds for the learning of homotopy à la Niyogi, Smale, and Weinberger for subsets of Euclidean spaces and of Riemannian manifolds

    In this article we extend and strengthen the seminal work by Niyogi, Smale, and Weinberger on the learning of the homotopy type from a sample of an underlying space. In their work, Niyogi, Smale, and Weinberger studied samples of $C^2$ manifolds with positive reach embedded in $\mathbb{R}^d$. We extend their results in the following ways: In the first part of our paper we consider both manifolds of positive reach -- a more general setting than $C^2$ manifolds -- and sets of positive reach embedded in $\mathbb{R}^d$. The sample $P$ of such a set $\mathcal{S}$ does not have to lie directly on it. Instead, we assume that the two one-sided Hausdorff distances -- $\varepsilon$ and $\delta$ -- between $P$ and $\mathcal{S}$ are bounded. We provide explicit bounds in terms of $\varepsilon$ and $\delta$ that guarantee that there exists a parameter $r$ such that the union of balls of radius $r$ centred at the sample $P$ deformation-retracts to $\mathcal{S}$. In the second part of our paper we study homotopy learning in a significantly more general setting -- we investigate sets of positive reach and submanifolds of positive reach embedded in a Riemannian manifold with bounded sectional curvature. To this end we introduce a new version of the reach in the Riemannian setting inspired by the cut locus. Yet again, we provide tight bounds on $\varepsilon$ and $\delta$ for both cases (submanifolds as well as sets of positive reach), exhibiting the tightness by an explicit construction. Comment: 74 pages, 29 figures
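
The two one-sided Hausdorff distances mentioned in this abstract can be illustrated on finite point sets. The sketch below is an assumption-laden example, not the paper's construction: the set $\mathcal{S}$ is replaced by a finite stand-in, the function name `one_sided_hausdorff` is hypothetical, and the abstract does not say which direction is called $\varepsilon$ and which $\delta$, so the code reports both directions without naming them.

```python
import numpy as np

def one_sided_hausdorff(a, b):
    """One-sided Hausdorff distance from a to b: max over points of a
    of the distance to the nearest point of b (finite sets only)."""
    a = np.asarray(a, dtype=float)  # shape (n_a, d)
    b = np.asarray(b, dtype=float)  # shape (n_b, d)
    # Pairwise Euclidean distances, shape (n_a, n_b).
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return dist.min(axis=1).max()

# Hypothetical data: P is a sample, S_approx a finite stand-in for the set.
P = [[0.0, 0.0], [1.0, 0.0]]
S_approx = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]

sample_to_set = one_sided_hausdorff(P, S_approx)  # how far P strays from the set
set_to_sample = one_sided_hausdorff(S_approx, P)  # how well P covers the set
```

The two values generally differ: here every sample point lies on the stand-in set, while the point at (3, 0) is poorly covered by the sample. This is why the abstract bounds the two directions separately.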