Approximating Persistent Homology in Euclidean Space Through Collapses
The Čech complex is one of the most widely used tools in applied
algebraic topology. Unfortunately, due to the inclusive nature of the Čech
filtration, the number of simplices grows exponentially in the number of input
points. A practical consequence is that computations may have to terminate at
smaller scales than what the application calls for.
In this paper we propose two methods to approximate the Čech persistence
module. Both are constructed on the level of spaces, i.e. as sequences of
simplicial complexes induced by nerves. We also show how the bottleneck
distance between such persistence modules can be understood by how tightly they
are sandwiched on the level of spaces. In turn, this implies the correctness of
our approximation methods.
Finally, we implement our methods and apply them to some example point clouds
in Euclidean space.
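As a concrete illustration of the kind of approximation this abstract describes, here is a minimal sketch, assuming the GUDHI and NumPy libraries are installed (`pip install gudhi numpy`). Naive subsampling stands in for the paper's collapse-based construction, and the alpha complex stands in for the Čech filtration (for Euclidean point clouds they have the same persistent homology, up to GUDHI's squared-radius filtration convention); the bottleneck distance then quantifies how well the small module approximates the full one.

```python
# Minimal sketch: approximate Cech-type persistence by subsampling, and
# measure the approximation error with the bottleneck distance.
# Subsampling is a generic stand-in for the paper's collapse-based methods.
import numpy as np
import gudhi

rng = np.random.default_rng(0)
# Noisy circle: H1 should contain one prominent interval (the loop).
theta = rng.uniform(0, 2 * np.pi, 400)
points = np.c_[np.cos(theta), np.sin(theta)] + rng.normal(0, 0.05, (400, 2))

def diagram(pts, dim=1):
    # The alpha complex of Euclidean data has the same persistent homology
    # as the Cech filtration (GUDHI stores squared radii as filtration values).
    st = gudhi.AlphaComplex(points=pts).create_simplex_tree()
    st.persistence()
    return st.persistence_intervals_in_dimension(dim)

full = diagram(points)
small = diagram(points[rng.choice(len(points), 100, replace=False)])

# A small bottleneck distance certifies a tight approximation, mirroring
# the "sandwiching" guarantee described in the abstract.
print("bottleneck distance:", gudhi.bottleneck_distance(full, small))
```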
Deterministic Sampling and Range Counting in Geometric Data Streams
We present memory-efficient deterministic algorithms for constructing
epsilon-nets and epsilon-approximations of streams of geometric data. Unlike
probabilistic approaches, these deterministic samples provide guaranteed bounds
on their approximation factors. We show how our deterministic samples can be
used to answer approximate online iceberg geometric queries on data streams. We
use these techniques to approximate several robust statistics of geometric data
streams, including Tukey depth, simplicial depth, regression depth, the
Theil-Sen estimator, and the least median of squares. Our algorithms use only a
polylogarithmic amount of memory, provided the desired approximation factors
are inverse-polylogarithmic. We also include a lower bound for non-iceberg
geometric queries.
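The paper's deterministic samples are built for general geometric range spaces; as a toy one-dimensional illustration of the underlying merge-and-reduce idea, the sketch below maintains a deterministic sample of a numeric stream for interval counting. The class name, buffer size, and demo are simplifications invented for this example, not the paper's data structure.

```python
# Toy merge-and-reduce sketch: a deterministic, memory-efficient sample of a
# numeric stream that answers approximate interval-counting queries.
import bisect
import random

class StreamApprox1D:
    def __init__(self, buffer_size=64):
        self.k = buffer_size
        self.current = []   # partially filled level-0 buffer
        self.levels = []    # levels[i]: one sorted buffer of k points, or None

    def update(self, x):
        self.current.append(x)
        if len(self.current) == self.k:
            self._carry(sorted(self.current), 0)
            self.current = []

    def _carry(self, buf, level):
        # Binary-counter style: whenever a level is already occupied, merge
        # the two sorted buffers and keep every other element, promoting the
        # halved buffer one level up. Halving a sorted buffer perturbs any
        # interval count by a bounded amount, which is why the sample is
        # deterministic rather than probabilistic.
        while level < len(self.levels) and self.levels[level] is not None:
            merged = sorted(self.levels[level] + buf)
            self.levels[level] = None
            buf = merged[::2]
            level += 1
        if level == len(self.levels):
            self.levels.append(None)
        self.levels[level] = buf

    def count(self, lo, hi):
        # Each point in a level-i buffer represents 2**i stream elements.
        total = sum(
            (1 << i) * (bisect.bisect_right(b, hi) - bisect.bisect_left(b, lo))
            for i, b in enumerate(self.levels) if b is not None)
        return total + sum(1 for x in self.current if lo <= x <= hi)

# Demo: memory is O(buffer_size * number of levels), i.e. polylogarithmic.
random.seed(0)
s = StreamApprox1D()
data = [random.random() for _ in range(100_000)]
for v in data:
    s.update(v)
print(s.count(0.2, 0.5), "vs exact", sum(0.2 <= v <= 0.5 for v in data))
```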
Sparse Nerves in Practice
Topological data analysis combines machine learning with methods from
algebraic topology. Persistent homology, a method to characterize topological
features occurring in data at multiple scales, is of particular interest. A
major obstacle to the widespread use of persistent homology is its
computational complexity. To calculate the persistent homology of large
datasets, a number of approximations can be applied to reduce this complexity.
We propose algorithms for the calculation of approximate sparse
nerves for classes of Dowker dissimilarities including all finite Dowker
dissimilarities and Dowker dissimilarities whose homology is Čech persistent
homology. All other sparsification methods and software packages that we are
aware of calculate persistent homology with either an additive or a
multiplicative interleaving. In dowker_homology, we allow for any
non-decreasing interleaving function. We analyze the computational
complexity of the algorithms and present some benchmarks. For Euclidean data in
dimensions larger than three, the sizes of simplicial complexes we create are
in general smaller than the ones created by SimBa. Especially when calculating
persistent homology in higher homology dimensions, the differences can become
substantial.
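The dowker_homology API itself is not reproduced here; as a point of reference, the sketch below uses GUDHI's sparse Rips construction, a standard sparsification with a multiplicative interleaving guarantee, to show the kind of size reduction at stake (the point count, edge-length threshold, and sparsification parameter are arbitrary choices for the example).

```python
# Sketch: compare the size of a dense Rips complex with a sparsified one.
# Uses GUDHI's sparse Rips (multiplicative interleaving), not dowker_homology.
import numpy as np
import gudhi

rng = np.random.default_rng(1)
points = rng.uniform(size=(150, 4))   # Euclidean data in dimension 4

def complex_size(sparse=None):
    rips = gudhi.RipsComplex(points=points, max_edge_length=0.6, sparse=sparse)
    return rips.create_simplex_tree(max_dimension=3).num_simplices()

print("dense complex:   ", complex_size())
print("sparse (eps=0.5):", complex_size(sparse=0.5))
```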
Adaptive Data Depth via Multi-Armed Bandits
Data depth, introduced by Tukey (1975), is an important tool in data science,
robust statistics, and computational geometry. One chief barrier to its broader
practical utility is that many common measures of depth are computationally
intensive, requiring on the order of $n^d$ operations to exactly compute the
depth of a single point within a data set of $n$ points in $d$-dimensional
space. Often, however, we are not directly interested in the absolute depths of
the points, but rather in their relative ordering. For example, we may want to
find the most central point in a data set (a generalized median), or to
identify and remove all outliers (points on the fringe of the data set with low
depth). With this observation, we develop a novel and instance-adaptive
algorithm for adaptive data depth computation by reducing the problem of
exactly computing depths to an $n$-armed stochastic multi-armed bandit
problem, which we can efficiently solve. We focus our exposition on simplicial
depth, developed by Liu (1990), which has emerged as a promising notion of
depth due to its interpretability and asymptotic properties. We provide general
instance-dependent theoretical guarantees for our proposed algorithms, which
readily extend to many other common measures of data depth including majority
depth, Oja depth, and likelihood depth. When specialized to the case where the
gaps in the data follow a power law distribution with parameter $\alpha$, we
show that we can reduce the complexity of identifying the deepest point in the
data set (the simplicial median) from $O(n^d)$ to
$\tilde{O}(n^{d-(d-1)\alpha/2})$, where $\tilde{O}$ suppresses logarithmic
factors. We corroborate our theoretical results with numerical experiments on
synthetic data, showing the practical utility of our proposed methods.
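A rough sketch of the bandit reduction (not the authors' exact algorithm) for two-dimensional simplicial depth: each data point is an arm, a single "pull" tests whether a random triangle of data points contains it (a Bernoulli draw whose mean is the point's simplicial depth), and successive elimination concentrates sampling on the points that are still plausibly deepest. The sampling schedule and confidence bounds below are generic Hoeffding-style choices invented for the example.

```python
# Sketch: find the (approximate) simplicial median in 2-D via a
# successive-elimination multi-armed bandit over Monte Carlo depth estimates.
import numpy as np

def orient(p, q, r):
    # Sign of the signed area of triangle pqr.
    return np.sign((q[0]-p[0]) * (r[1]-p[1]) - (q[1]-p[1]) * (r[0]-p[0]))

def contains(a, b, c, x):
    # x lies in triangle abc iff all three orientation tests agree in sign.
    return orient(a, b, x) == orient(b, c, x) == orient(c, a, x)

def simplicial_median(X, pulls_per_round=200, delta=0.01, max_rounds=100):
    n, rng = len(X), np.random.default_rng(0)
    active = np.arange(n)
    wins = np.zeros(n)
    pulls = np.zeros(n)
    for _ in range(max_rounds):
        for i in active:
            for _ in range(pulls_per_round):  # one pull = one random triangle
                a, b, c = rng.choice(n, size=3, replace=False)
                wins[i] += contains(X[a], X[b], X[c], X[i])
            pulls[i] += pulls_per_round
        mean = wins[active] / pulls[active]
        # Anytime Hoeffding-style confidence radius.
        rad = np.sqrt(np.log(2 * n * pulls[active]**2 / delta)
                      / (2 * pulls[active]))
        active = active[mean + rad >= np.max(mean - rad)]  # drop shallow points
        if len(active) == 1:
            break
    return active[np.argmax(wins[active] / pulls[active])]

X = np.random.default_rng(1).normal(size=(300, 2))
print("estimated simplicial median:", X[simplicial_median(X)])
```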
A Nonparametric Multivariate Control Chart Based on Data Depth
For the design of most multivariate control charts, it is assumed that the observations follow a multivariate normal distribution. In practice, this assumption is rarely satisfied. In this work, a distribution-free EWMA control chart for multivariate processes is proposed. The chart is based on the sequential rank of data depth measures.
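A rough sketch of such a chart, with Mahalanobis depth as an easy-to-compute stand-in for the depth measure (the paper's depth and exact rank statistic may differ): each new observation is reduced to the sequential rank of its depth within a reference sample, which is roughly Uniform(0, 1) in control regardless of the underlying distribution, and the usual EWMA recursion $Z_t = \lambda R_t + (1-\lambda) Z_{t-1}$ is charted against an asymptotic lower control limit.

```python
# Sketch: distribution-free EWMA chart on sequential ranks of data depth.
# Mahalanobis depth is used only because it is cheap to compute.
import numpy as np

def mahalanobis_depth(x, mean, cov_inv):
    # Larger for points near the center of the reference sample.
    return 1.0 / (1.0 + (x - mean) @ cov_inv @ (x - mean))

def ewma_depth_chart(reference, stream, lam=0.1, L=3.0):
    mean = reference.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(reference.T))
    ref_depths = np.array([mahalanobis_depth(x, mean, cov_inv)
                           for x in reference])
    m = len(ref_depths)
    # In control, the rank statistic r is roughly Uniform(0, 1): mean 1/2,
    # variance 1/12, so the EWMA has asymptotic std
    # sqrt((1/12) * lam / (2 - lam)).
    lcl = 0.5 - L * np.sqrt((1.0 / 12.0) * lam / (2.0 - lam))
    z, alarms = 0.5, []
    for t, x in enumerate(stream):
        r = (np.sum(ref_depths < mahalanobis_depth(x, mean, cov_inv)) + 0.5) / m
        z = lam * r + (1.0 - lam) * z
        if z < lcl:               # persistently shallow points raise an alarm
            alarms.append(t)
    return alarms

rng = np.random.default_rng(2)
reference = rng.normal(size=(200, 2))
stream = np.vstack([rng.normal(size=(100, 2)),             # in control
                    rng.normal(2.0, 1.0, size=(100, 2))])  # shifted mean
print("first alarms:", ewma_depth_chart(reference, stream)[:5])
```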