Statistical topological data analysis using persistence landscapes
We define a new topological summary for data that we call the persistence
landscape. Since this summary lies in a vector space, it is easy to combine
with tools from statistics and machine learning, in contrast to the standard
topological summaries. Viewed as a random variable with values in a Banach
space, this summary obeys a strong law of large numbers and a central limit
theorem. We show how a number of standard statistical tests can be used for
statistical inference using this summary. We also prove that this summary is
stable and that it can be used to provide lower bounds for the bottleneck and
Wasserstein distances.
Comment: 26 pages, final version, to appear in Journal of Machine Learning
Research, includes two additional examples not in the journal version: random
geometric complexes and Erdos-Renyi random clique complexes
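As a rough illustration of why this summary combines easily with statistical tools, the sketch below evaluates landscape functions on a grid and averages them pointwise. It assumes the standard definition, in which the k-th landscape function at t is the k-th largest value of max(0, min(t - b, d - t)) over the points (b, d) of a persistence diagram. The function names `persistence_landscape` and `mean_landscape`, the grid discretization, and the toy diagram are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def persistence_landscape(diagram, grid, k_max):
    """Evaluate the first k_max landscape functions on a grid.

    diagram: iterable of (birth, death) pairs with finite death values.
    grid:    1-D sequence of evaluation points t.
    Returns an array of shape (k_max, len(grid)).
    """
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    grid = np.asarray(grid, dtype=float)
    births = diagram[:, 0][:, None]
    deaths = diagram[:, 1][:, None]
    # One "tent" function per diagram point: max(0, min(t - b, d - t)).
    tents = np.maximum(0.0, np.minimum(grid[None, :] - births,
                                       deaths - grid[None, :]))
    # At each grid point, sort tent heights in decreasing order, so that
    # row k-1 holds the k-th largest value, i.e. the k-th landscape function.
    tents = -np.sort(-tents, axis=0)
    out = np.zeros((k_max, grid.size))
    n = min(k_max, tents.shape[0])
    out[:n] = tents[:n]  # landscapes beyond the number of points stay zero
    return out


def mean_landscape(diagrams, grid, k_max):
    """Pointwise average of the landscapes of several diagrams."""
    return np.mean([persistence_landscape(d, grid, k_max) for d in diagrams],
                   axis=0)


# Toy diagram with two points (hypothetical data).
L = persistence_landscape([(0, 4), (1, 3)], [0, 1, 2, 3, 4], k_max=3)
```

Because each landscape is an ordinary array on a common grid, averages, norms, and test statistics can be computed with standard vector operations, which is not directly possible for persistence diagrams themselves.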
Topological Data Analysis for Object Data
Statistical analysis on object data presents many challenges. Basic summaries
such as means and variances are difficult to compute. We apply ideas from
topology to study object data. We present a framework for using persistence
landscapes to vectorize object data and perform statistical analysis. We apply
this pipeline to some biological images that were previously shown to be
challenging to study using shape theory. Surprisingly, the most persistent
features are shown to be "topological noise" and the statistical analysis
depends on the less persistent features which we refer to as the "geometric
signal". We also describe the first steps to a new approach to using topology
for object data analysis, which applies topology to distributions on object
spaces.
Comment: 16 pages, 12 figures
Statistical Methods in Topological Data Analysis for Complex, High-Dimensional Data
The utilization of statistical methods and their applications within the new
field of study known as Topological Data Analysis has tremendous potential
for broadening our exploration and understanding of complex, high-dimensional
data spaces. This paper provides an introductory overview of the mathematical
underpinnings of Topological Data Analysis, the workflow to convert samples of
data to topological summary statistics, and some of the statistical methods
developed for performing inference on these topological summary statistics. The
intention of this non-technical overview is to motivate statisticians who are
interested in learning more about the subject.
Comment: 15 pages, 7 Figures, 27th Annual Conference on Applied Statistics in
Agriculture
The persistence landscape and some of its properties
Persistence landscapes map persistence diagrams into a function space, which
may often be taken to be a Banach space or even a Hilbert space. In the latter
case, it is a feature map and there is an associated kernel. The main advantage
of this summary is that it allows one to apply tools from statistics and
machine learning. Furthermore, the mapping from persistence diagrams to
persistence landscapes is stable and invertible. We introduce a weighted
version of the persistence landscape and define a one-parameter family of
Poisson-weighted persistence landscape kernels that may be useful for learning.
We also demonstrate some additional properties of the persistence landscape.
First, the persistence landscape may be viewed as a tropical rational function.
Second, in many cases it is possible to exactly reconstruct all of the
component persistence diagrams from an average persistence landscape. It
follows that the persistence landscape kernel is characteristic for certain
generic empirical measures. Finally, the persistence landscape distance may be
arbitrarily small compared to the interleaving distance.
Comment: 18 pages, to appear in the Proceedings of the 2018 Abel Symposium
Persistence Flamelets: multiscale Persistent Homology for kernel density exploration
In recent years there has been noticeable interest in the study of the "shape
of data". Among the many ways a "shape" could be defined, topology is the most
general one, as it describes an object in terms of its connectivity structure:
connected components (topological features of dimension 0), cycles (features of
dimension 1) and so on. There is a growing number of techniques, generally
denoted as Topological Data Analysis, aimed at estimating topological
invariants of a fixed object; when we allow this object to change, however,
little has been done to investigate the evolution in its topology. In this work
we define the Persistence Flamelets, a multiscale version of one of the most
popular tools in TDA, the Persistence Landscape. We examine its theoretical
properties and we show how it could be used to gain insights on the KDE bandwidth
parameter.