
    Approximating Persistent Homology in Euclidean Space Through Collapses

    The Čech complex is one of the most widely used tools in applied algebraic topology. Unfortunately, due to the inclusive nature of the Čech filtration, the number of simplices grows exponentially in the number of input points. A practical consequence is that computations may have to terminate at smaller scales than the application calls for. In this paper we propose two methods to approximate the Čech persistence module. Both are constructed on the level of spaces, i.e., as sequences of simplicial complexes induced by nerves. We also show how the bottleneck distance between such persistence modules can be understood by how tightly they are sandwiched on the level of spaces. In turn, this implies the correctness of our approximation methods. Finally, we implement our methods and apply them to some example point clouds in Euclidean space.
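
    The flavor of the problem is easy to demonstrate. For Euclidean points one standard workaround (a library-level alternative, not the paper's collapse-based method) is the alpha complex, which has the same persistent homology as the Čech filtration but only $O(n^{\lceil d/2 \rceil})$ simplices. A minimal sketch, assuming the gudhi library is installed:

        # Sketch: Čech-equivalent persistence via the alpha complex.
        # Not the paper's approximation; the alpha complex is a standard
        # Čech-equivalent construction for Euclidean point clouds.
        import numpy as np
        import gudhi

        rng = np.random.default_rng(0)
        points = rng.standard_normal((200, 3))  # toy point cloud in R^3

        alpha = gudhi.AlphaComplex(points=points)
        st = alpha.create_simplex_tree()        # filtration values are squared radii
        diagram = st.persistence()

        print("number of simplices:", st.num_simplices())
        print("first H1 intervals:", st.persistence_intervals_in_dimension(1)[:5])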

    Deterministic Sampling and Range Counting in Geometric Data Streams

    We present memory-efficient deterministic algorithms for constructing epsilon-nets and epsilon-approximations of streams of geometric data. Unlike probabilistic approaches, these deterministic samples provide guaranteed bounds on their approximation factors. We show how our deterministic samples can be used to answer approximate online iceberg geometric queries on data streams. We use these techniques to approximate several robust statistics of geometric data streams, including Tukey depth, simplicial depth, regression depth, the Theil-Sen estimator, and the least median of squares. Our algorithms use only a polylogarithmic amount of memory, provided the desired approximation factors are inverse-polylogarithmic. We also include a lower bound for non-iceberg geometric queries. (12 pages, 1 figure)
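
    To illustrate the flavor of deterministic sampling (a toy one-dimensional sketch, not the paper's construction for general geometric ranges): a sorted sample can be deterministically halved by keeping every other element, which perturbs any interval count by a bounded amount per halving step, and a merge-reduce scheme keeps total memory polylogarithmic:

        # Toy sketch: deterministic sample for 1-D interval range counting
        # via merge-and-halve. Names and parameters are illustrative only.
        def merge_halve(a, b):
            # Merge two sorted equal-weight samples, keep every other element.
            merged = sorted(a + b)
            return merged[::2]

        def merge_reduce(stream, block=64):
            levels = {}   # level i holds at most one sample of element weight 2**i
            buf = []
            for x in stream:
                buf.append(x)
                if len(buf) == block:
                    s, i = sorted(buf), 0
                    buf = []
                    while i in levels:        # carry, like binary addition
                        s = merge_halve(levels.pop(i), s)
                        i += 1
                    levels[i] = s
            return levels, buf

        def approx_count(levels, buf, lo, hi):
            total = sum(lo <= x <= hi for x in buf)
            for i, s in levels.items():
                total += (2 ** i) * sum(lo <= x <= hi for x in s)
            return total

        data = list(range(10_000))
        levels, buf = merge_reduce(data)
        print(approx_count(levels, buf, 100, 200), "vs exact", 101)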

    Sparse Nerves in Practice

    Topological data analysis combines machine learning with methods from algebraic topology. Persistent homology, a method to characterize topological features occurring in data at multiple scales, is of particular interest. A major obstacle to the widespread use of persistent homology is its computational complexity. In order to calculate persistent homology of large datasets, a number of approximations can be applied to reduce its complexity. We propose algorithms for calculating approximate sparse nerves for classes of Dowker dissimilarities, including all finite Dowker dissimilarities and Dowker dissimilarities whose homology is Čech persistent homology. All other sparsification methods and software packages that we are aware of calculate persistent homology with either an additive or a multiplicative interleaving. In dowker_homology, we allow for any non-decreasing interleaving function α. We analyze the computational complexity of the algorithms and present some benchmarks. For Euclidean data in dimensions larger than three, the sizes of the simplicial complexes we create are in general smaller than the ones created by SimBa. Especially when calculating persistent homology in higher homology dimensions, the differences can become substantial.
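
    To make the central object concrete: given a dissimilarity Λ between witnesses and landmarks, the Dowker nerve at scale t contains a simplex on a set of landmarks σ whenever a single witness is within t of every landmark in σ. A minimal sketch of that definition (our illustration, not the dowker_homology API):

        # Sketch of the Dowker nerve at a fixed scale t.
        # Lam[w][l] is the dissimilarity between witness w and landmark l.
        from itertools import combinations

        def dowker_nerve(Lam, t, max_dim=2):
            n_witnesses, n_landmarks = len(Lam), len(Lam[0])
            simplices = []
            for k in range(1, max_dim + 2):          # simplices with k vertices
                for sigma in combinations(range(n_landmarks), k):
                    # sigma is present if one witness covers all its landmarks
                    if any(all(Lam[w][l] <= t for l in sigma)
                           for w in range(n_witnesses)):
                        simplices.append(sigma)
            return simplices

        Lam = [[0.1, 0.9, 0.4],
               [0.8, 0.2, 0.3]]
        print(dowker_nerve(Lam, t=0.5))
        # -> [(0,), (1,), (2,), (0, 2), (1, 2)]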

    Adaptive Data Depth via Multi-Armed Bandits

    Data depth, introduced by Tukey (1975), is an important tool in data science, robust statistics, and computational geometry. One chief barrier to its broader practical utility is that many common measures of depth are computationally intensive, requiring on the order of $n^d$ operations to exactly compute the depth of a single point within a data set of $n$ points in $d$-dimensional space. Often, however, we are not directly interested in the absolute depths of the points, but rather in their relative ordering. For example, we may want to find the most central point in a data set (a generalized median), or to identify and remove all outliers (points on the fringe of the data set with low depth). With this observation, we develop a novel instance-adaptive algorithm for data depth computation by reducing the problem of exactly computing $n$ depths to an $n$-armed stochastic multi-armed bandit problem, which we can efficiently solve. We focus our exposition on simplicial depth, developed by Liu (1990), which has emerged as a promising notion of depth due to its interpretability and asymptotic properties. We provide general instance-dependent theoretical guarantees for our proposed algorithms, which readily extend to many other common measures of data depth including majority depth, Oja depth, and likelihood depth. When specialized to the case where the gaps in the data follow a power law distribution with parameter $\alpha < 2$, we show that we can reduce the complexity of identifying the deepest point in the data set (the simplicial median) from $O(n^d)$ to $\tilde{O}(n^{d-(d-1)\alpha/2})$, where $\tilde{O}$ suppresses logarithmic factors. We corroborate our theoretical results with numerical experiments on synthetic data, showing the practical utility of our proposed methods. (Keywords: multi-armed bandits, data depth, adaptivity, large-scale computation, simplicial depth)
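
    The reduction is straightforward to sketch: the simplicial depth of a point is the probability that a random $(d+1)$-subset of the data spans a simplex containing it, so each point becomes an arm whose pulls are cheap Monte Carlo containment tests, and a best-arm algorithm such as successive elimination finds the deepest point without estimating every depth to high precision. A simplified 2-D sketch (ours, not the authors' code):

        # Sketch: finding the simplicial median in 2-D by successive elimination.
        # One "pull" of arm i samples a random triangle from the data and tests
        # whether point i lies inside it (an unbiased depth estimate).
        import numpy as np

        def cross2(o, u, v):
            # z-component of (u - o) x (v - o)
            return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])

        def in_triangle(p, a, b, c):
            s1, s2, s3 = cross2(a, b, p), cross2(b, c, p), cross2(c, a, p)
            has_neg = (s1 < 0) or (s2 < 0) or (s3 < 0)
            has_pos = (s1 > 0) or (s2 > 0) or (s3 > 0)
            return not (has_neg and has_pos)

        def simplicial_median(X, delta=0.05, batch=100, max_pulls=4000, seed=0):
            rng = np.random.default_rng(seed)
            n = len(X)
            active = np.arange(n)
            pulls, wins = np.zeros(n), np.zeros(n)
            while len(active) > 1 and pulls[active[0]] < max_pulls:
                for i in active:
                    # Vertices sampled with replacement for simplicity;
                    # degenerate triples are rare and affect all arms equally.
                    idx = rng.choice(n, size=(batch, 3))
                    wins[i] += sum(in_triangle(X[i], X[a], X[b], X[c])
                                   for a, b, c in idx)
                    pulls[i] += batch
                means = wins[active] / pulls[active]
                t = pulls[active[0]]
                # Hoeffding-style confidence radius shared by all active arms.
                rad = np.sqrt(np.log(4 * n * t / delta) / (2 * t))
                active = active[means >= means.max() - 2 * rad]
            means = wins[active] / pulls[active]
            return active[int(np.argmax(means))]

        X = np.random.default_rng(1).standard_normal((200, 2))
        print("estimated simplicial median index:", simplicial_median(X))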

    A Nonparametric Multivariate Control Chart Based on Data Depth

    For the design of most multivariate control charts, it is assumed that the observations follow a multivariate normal distribution. In practice, this assumption is rarely satisfied. In this work, a distribution-free EWMA control chart for multivariate processes is proposed. This chart is based on the sequential rank of data depth measures.
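
    The general recipe behind such charts: reduce each multivariate observation to the rank of its depth within a reference sample, then monitor that univariate statistic with the EWMA recursion $Z_t = \lambda r_t + (1 - \lambda) Z_{t-1}$. A sketch using Mahalanobis depth as a stand-in (the paper's exact rank statistic and control limits differ):

        # Sketch: EWMA chart on depth ranks (Mahalanobis depth as a stand-in).
        import numpy as np

        def mahalanobis_depth(x, mean, cov_inv):
            d2 = (x - mean) @ cov_inv @ (x - mean)
            return 1.0 / (1.0 + d2)            # deeper points score closer to 1

        def ewma_depth_chart(reference, stream, lam=0.1, L=3.0):
            mean = reference.mean(axis=0)
            cov_inv = np.linalg.inv(np.cov(reference.T))
            ref_depths = np.sort([mahalanobis_depth(x, mean, cov_inv)
                                  for x in reference])
            m = len(ref_depths)

            z = 0.5                             # in-control ranks are ~Uniform(0,1)
            sigma = np.sqrt(lam / ((2 - lam) * 12))  # asymptotic EWMA sd of U(0,1)
            signals = []
            for t, x in enumerate(stream):
                depth = mahalanobis_depth(x, mean, cov_inv)
                # Empirical rank of the new depth among the reference depths.
                rank = np.searchsorted(ref_depths, depth) / m
                z = lam * rank + (1 - lam) * z
                if abs(z - 0.5) > L * sigma:
                    signals.append(t)
            return signals

        rng = np.random.default_rng(0)
        ref = rng.multivariate_normal([0, 0], [[1, 0.3], [0.3, 1]], size=500)
        shifted = rng.multivariate_normal([1.5, 1.5], [[1, 0.3], [0.3, 1]], size=50)
        print("first out-of-control signal at t =", ewma_depth_chart(ref, shifted)[:1])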