14 research outputs found

    Approximating Local Homology from Samples

    Recently, multi-scale notions of local homology (a variant of persistent homology) have been used to study the local structure of spaces around a given point from a point cloud sample. Current reconstruction guarantees rely on constructing embedded complexes, which becomes difficult in high dimensions. We show that the persistence diagrams used for estimating local homology can be approximated using families of Vietoris-Rips complexes, whose simple constructions are robust in any dimension. To the best of our knowledge, our results, for the first time, make applications based on local homology, such as stratification learning, feasible in high dimensions. Comment: 23 pages, 14 figures.
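
    As a rough illustration of why Vietoris-Rips complexes are simple to construct in any dimension, the sketch below builds the edges and triangles of such a complex from pairwise distances alone. The function name `rips_complex`, the scale parameter `r`, and the convention that a simplex is included when all of its pairwise distances are at most `r` are assumptions made for this example; it is not the construction analyzed in the paper.

```python
from itertools import combinations
from math import dist


def rips_complex(points, r):
    """Return (edges, triangles) of a Vietoris-Rips complex at scale r.

    Assumed convention: a simplex is included when every pairwise
    distance between its vertices is at most r.
    """
    n = len(points)
    # Edges: pairs of points within distance r of each other.
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if dist(points[i], points[j]) <= r]
    edge_set = set(edges)
    # Triangles: triples whose three edges are all present.
    triangles = [(i, j, k) for i, j, k in combinations(range(n), 3)
                 if (i, j) in edge_set
                 and (i, k) in edge_set
                 and (j, k) in edge_set]
    return edges, triangles


# Hypothetical sample: the four corners of a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
small_edges, small_triangles = rips_complex(square, 1.0)
large_edges, large_triangles = rips_complex(square, 1.5)
```

    Only distances are used, so the same code runs unchanged for points of any dimension. At scale 1.0 the square has its four sides and no triangles; at scale 1.5 the diagonals are added and all four triangles appear.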

    The Normalized Graph Cut and Cheeger Constant: from Discrete to Continuous

    Let M be a bounded domain of a Euclidean space with smooth boundary. We relate the Cheeger constant of M and the conductance of a neighborhood graph defined on a random sample from M. By restricting the minimization defining the latter over a particular class of subsets, we obtain consistency (after normalization) as the sample size increases, and show that any minimizing sequence of subsets has a subsequence converging to a Cheeger set of M.
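
    To make the discrete side of this comparison concrete, the sketch below builds a neighborhood graph on a sample and evaluates the conductance of one candidate subset. It assumes the common definition, cut size divided by the smaller of the two volumes (sums of degrees); the names `neighborhood_graph` and `conductance`, the radius `r`, and the sample points are hypothetical, and the normalization used in the paper is not reproduced here.

```python
from math import dist


def neighborhood_graph(points, r):
    """Connect every pair of sample points at distance at most r."""
    n = len(points)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if dist(points[i], points[j]) <= r:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def conductance(adj, subset):
    """Cut size of `subset` divided by the smaller of the two volumes."""
    s = set(subset)
    # Number of edges with exactly one endpoint in the subset.
    cut = sum(1 for i in s for j in adj[i] if j not in s)
    vol_s = sum(len(adj[i]) for i in s)
    vol_c = sum(len(adj[i]) for i in adj if i not in s)
    smaller = min(vol_s, vol_c)
    if smaller == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return cut / smaller


# Hypothetical sample: six equally spaced points, which form a path graph.
sample = [(float(x), 0.0) for x in range(6)]
graph = neighborhood_graph(sample, 1.1)
half = conductance(graph, {0, 1, 2})
```

    The conductance of the graph is the minimum of this ratio over subsets; the abstract's result concerns restricting that minimization to a particular class of subsets, which this sketch does not attempt.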

    The Theory of the Interleaving Distance on Multidimensional Persistence Modules

    In 2009, Chazal et al. introduced $\epsilon$-interleavings of persistence modules. $\epsilon$-interleavings induce a pseudometric $d_I$ on (isomorphism classes of) persistence modules, the interleaving distance. The definitions of $\epsilon$-interleavings and $d_I$ generalize readily to multidimensional persistence modules. In this paper, we develop the theory of multidimensional interleavings, with a view towards applications to topological data analysis. We present four main results. First, we show that on 1-D persistence modules, $d_I$ is equal to the bottleneck distance $d_B$. This result, which first appeared in an earlier preprint of this paper, has since appeared in several other places, and is now known as the isometry theorem. Second, we present a characterization of the $\epsilon$-interleaving relation on multidimensional persistence modules. This expresses transparently the sense in which two $\epsilon$-interleaved modules are algebraically similar. Third, using this characterization, we show that when we define our persistence modules over a prime field, $d_I$ satisfies a universality property. This universality result is the central result of the paper. It says that $d_I$ satisfies a stability property generalizing one which $d_B$ is known to satisfy, and that in addition, if $d$ is any other pseudometric on multidimensional persistence modules satisfying the same stability property, then $d \leq d_I$. We also show that a variant of this universality result holds for $d_B$, over arbitrary fields. Finally, we show that $d_I$ restricts to a metric on isomorphism classes of finitely presented multidimensional persistence modules. Comment: Major revision; exposition improved throughout. To appear in Foundations of Computational Mathematics. 36 pages.

    Data Analysis with the Morse-Smale Complex: The msr Package for R

    In many areas, scientists deal with increasingly high-dimensional data sets. An important goal for these scientists is to gain a qualitative understanding of the process or system from which the data is gathered. Often, both input variables and an outcome are observed and the data can be characterized as a sample from a high-dimensional scalar function. This work presents the R package msr for exploratory data analysis of multivariate scalar functions based on the Morse-Smale complex. The Morse-Smale complex provides a topologically meaningful decomposition of the domain. The msr package implements a discrete approximation of the Morse-Smale complex for data sets. In previous work this approximation has been exploited for visualization and partition-based regression, which are both supported in the msr package. The visualization combines the Morse-Smale complex with dimension-reduction techniques for a visual summary representation that serves as a guide for interactive exploration of the high-dimensional function. In a similar fashion, the regression employs a combination of linear models based on the Morse-Smale decomposition of the domain. This regression approach yields topologically accurate estimates and facilitates interpretation of general trends and statistical comparisons between partitions. In this manner, the msr package supports high-dimensional data understanding and exploration through the Morse-Smale complex.

    A new approximation Algorithm for the Matching Distance in Multidimensional Persistence

    Topological Persistence has proven to be a promising framework for dealing with problems concerning shape analysis and comparison. In this context, it was originally introduced by taking into account 1-dimensional properties of shapes, modeled by real-valued functions. More recently, Topological Persistence has been generalized to consider multidimensional properties of shapes, coded by vector-valued functions. This extension has led to the introduction of suitable shape descriptors, named the multidimensional persistence Betti numbers functions, and a distance to compare them, the so-called multidimensional matching distance. In this paper we propose a new computational framework to deal with the multidimensional matching distance. We start by proving some new theoretical results, and then we use them to formulate an algorithm for computing such a distance up to an arbitrary threshold error.

    Failure Filtrations for Fenced Sensor Networks

    In this paper we consider the question of sensor network coverage for a 2-dimensional domain. We seek to compute the probability that a set of sensors fails to cover the domain, given only non-metric, local (who is talking to whom) information and a probability distribution of failure of each node. This builds on the work of de Silva and Ghrist, who analyzed this problem in the deterministic situation. We first show that it is part of a slightly larger class of problems which is #P-complete, and thus fast algorithms likely do not exist unless P = NP. We then give a deterministic algorithm which is feasible in the case of a small set of sensors, and give a dynamic algorithm for an arbitrary set of sensors failing over time, which utilizes a new criterion for coverage based on the one proposed by de Silva and Ghrist. These algorithms build on the theory of topological persistence.

    Limit theory for point processes in manifolds

    Let $Y_i$, $i \geq 1$, be i.i.d. random variables having values in an $m$-dimensional manifold $\mathcal{M} \subset \mathbb{R}^d$ and consider sums $\sum_{i=1}^n \xi(n^{1/m} Y_i, \{n^{1/m} Y_j\}_{j=1}^n)$, where $\xi$ is a real valued function defined on pairs $(y, \mathcal{Y})$, with $y \in \mathbb{R}^d$ and $\mathcal{Y} \subset \mathbb{R}^d$ locally finite. Subject to $\xi$ satisfying a weak spatial dependence and continuity condition, we show that such sums satisfy weak laws of large numbers, variance asymptotics and central limit theorems. We show that the limit behavior is controlled by the value of $\xi$ on homogeneous Poisson point processes on $m$-dimensional hyperplanes tangent to $\mathcal{M}$. We apply the general results to establish the limit theory of dimension and volume content estimators, Rényi and Shannon entropy estimators and clique counts in the Vietoris-Rips complex on $\{Y_i\}_{i=1}^n$. Comment: Published at http://dx.doi.org/10.1214/12-AAP897 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    Curvature as a Collective Coordinate in Enhanced Sampling Membrane Simulations

    The plasticity of membranes plays an important functional role in cells, cell components, and micelles, where bending, budding, and remodeling implement numerous recognition and communication processes. Comparatively, molecular simulation methods to induce, control, and quantitatively characterize such deformations remain scarce. This work defines a novel collective coordinate associated with membrane bending, which strives to combine realism (by preserving the notion of local atomic curvatures) and low computational cost (allowing its evaluation at every time step of a molecular dynamics simulation). Enhanced sampling simulations along this conformational coordinate provide convenient access to the underlying bending free energy landscape. To showcase its potential, the method is applied to three state-of-the-art problems: the determination of the bending free energy landscape of a 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphoethanolamine (POPE) bilayer, the formation of a POPE liposome, and the study of the influence of the Pseudomonas quinolone signal on the budding of Gram-negative bacterial outer membranes.

    Persistence-Based Clustering in Riemannian Manifolds

    We present a novel clustering algorithm that combines a mode-seeking phase with a cluster merging phase. While mode detection is performed by a standard graph-based hill-climbing scheme, the novelty of our approach resides in its use of topological persistence theory to guide the merges between clusters. An interesting feature of our algorithm is that it provides additional feedback in the form of a finite set of points in the plane, called a persistence diagram, which provably reflects the prominence of each of the modes of the density. Such feedback is an invaluable tool in practice, as it enables the user to determine a set of parameter values that will make the algorithm compute a relevant clustering on the next run. In terms of generality, our approach requires only (approximate) pairwise distances between the data points, as well as rough estimates of the density at these points. It is therefore applicable in virtually any metric space. At the same time, its complexity remains reasonable: although the size of the input distance matrix may be up to quadratic in the number of data points, a careful implementation only uses a linear amount of main memory and barely takes more time to run than is spent reading the input. Taking advantage of recent advances in topological persistence theory, we are able to give a theoretically sound notion of what the correct number $k$ of clusters is, and to prove that under mild sampling conditions and a relevant choice of parameters (made possible in practice by the persistence diagram) our clustering scheme computes a set of $k$ clusters whose spatial locations are bound to those of the basins of attraction of the peaks of the density. These guarantees hold in a large variety of contexts, including when data points are distributed along some unknown Riemannian manifold.
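
    The two phases described in this abstract, hill climbing on a neighborhood graph followed by persistence-guided merging, can be sketched in a few lines with a union-find structure. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `persistence_clustering`, the adjacency-list input `neighbors`, the merging threshold `tau`, and the toy densities are all hypothetical.

```python
def persistence_clustering(density, neighbors, tau):
    """Label each vertex by the density peak of its cluster.

    density[i]   : estimated density at vertex i
    neighbors[i] : vertices adjacent to i in the neighborhood graph
    tau          : clusters whose peak rises less than tau above the
                   level at which they meet a higher cluster are merged
    """
    n = len(density)
    parent = [None] * n  # None marks a vertex not yet processed

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Visit vertices from highest to lowest density.
    for i in sorted(range(n), key=lambda v: -density[v]):
        higher = [j for j in neighbors[i] if parent[j] is not None]
        if not higher:
            parent[i] = i  # local maximum: start a new cluster
            continue
        # Hill climbing: attach i to its highest already-visited neighbor.
        best = max(higher, key=lambda j: density[j])
        parent[i] = find(best)
        # Merging: i may connect clusters; absorb the less prominent ones.
        for j in higher:
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            low, high = sorted((ri, rj), key=lambda r: density[r])
            if density[low] - density[i] < tau:
                parent[low] = high  # the higher peak stays the root
    return [find(i) for i in range(n)]


# Hypothetical 1-D example: a path graph with peaks at vertices 1 and 3.
toy_density = [1.0, 5.0, 2.0, 4.0, 1.5, 0.5]
toy_neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
two_clusters = persistence_clustering(toy_density, toy_neighbors, tau=1.0)
one_cluster = persistence_clustering(toy_density, toy_neighbors, tau=3.0)
```

    In the toy example the lower peak (density 4.0) meets the higher one at the saddle vertex of density 2.0, so its prominence is 2.0: it survives as a separate cluster for `tau=1.0` and is absorbed for `tau=3.0`. This sketch returns only the labels and does not compute the persistence diagram mentioned in the abstract.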