3,464 research outputs found
Topological Data Analysis with Bregman Divergences
Given a finite set in a metric space, the topological analysis generalizes
hierarchical clustering using a 1-parameter family of homology groups to
quantify connectivity in all dimensions. The connectivity is compactly
described by the persistence diagram. One limitation of the current framework
is the reliance on metric distances, whereas in many practical applications
objects are compared by non-metric dissimilarity measures. Examples are the
Kullback-Leibler divergence, which is commonly used for comparing text and
images, and the Itakura-Saito divergence, popular for speech and sound. These
are two members of the broad family of dissimilarities called Bregman
divergences.
We show that the framework of topological data analysis can be extended to
general Bregman divergences, widening the scope of possible applications. In
particular, we prove that appropriately generalized Cech and Delaunay (alpha)
complexes capture the correct homotopy type, namely that of the corresponding
union of Bregman balls. Consequently, their filtrations give the correct
persistence diagram, namely the one generated by the uniformly growing Bregman
balls. Moreover, we show that unlike the metric setting, the filtration of
Vietoris-Rips complexes may fail to approximate the persistence diagram. We
propose algorithms to compute the thus generalized Cech, Vietoris-Rips and
Delaunay complexes and experimentally test their efficiency. Lastly, we explain
their surprisingly good performance by making a connection with discrete Morse
theory
Statistical topological data analysis using persistence landscapes
We define a new topological summary for data that we call the persistence
landscape. Since this summary lies in a vector space, it is easy to combine
with tools from statistics and machine learning, in contrast to the standard
topological summaries. Viewed as a random variable with values in a Banach
space, this summary obeys a strong law of large numbers and a central limit
theorem. We show how a number of standard statistical tests can be used for
statistical inference using this summary. We also prove that this summary is
stable and that it can be used to provide lower bounds for the bottleneck and
Wasserstein distances.Comment: 26 pages, final version, to appear in Journal of Machine Learning
Research, includes two additional examples not in the journal version: random
geometric complexes and Erdos-Renyi random clique complexe
Using Topological Data Analysis for diagnosis pulmonary embolism
Pulmonary Embolism (PE) is a common and potentially lethal condition. Most
patients die within the first few hours from the event. Despite diagnostic
advances, delays and underdiagnosis in PE are common.To increase the diagnostic
performance in PE, current diagnostic work-up of patients with suspected acute
pulmonary embolism usually starts with the assessment of clinical pretest
probability using plasma d-Dimer measurement and clinical prediction rules. The
most validated and widely used clinical decision rules are the Wells and Geneva
Revised scores. We aimed to develop a new clinical prediction rule (CPR) for PE
based on topological data analysis and artificial neural network. Filter or
wrapper methods for features reduction cannot be applied to our dataset: the
application of these algorithms can only be performed on datasets without
missing data. Instead, we applied Topological data analysis (TDA) to overcome
the hurdle of processing datasets with null values missing data. A topological
network was developed using the Iris software (Ayasdi, Inc., Palo Alto). The PE
patient topology identified two ares in the pathological group and hence two
distinct clusters of PE patient populations. Additionally, the topological
netowrk detected several sub-groups among healthy patients that likely are
affected with non-PE diseases. TDA was further utilized to identify key
features which are best associated as diagnostic factors for PE and used this
information to define the input space for a back-propagation artificial neural
network (BP-ANN). It is shown that the area under curve (AUC) of BP-ANN is
greater than the AUCs of the scores (Wells and revised Geneva) used among
physicians. The results demonstrate topological data analysis and the BP-ANN,
when used in combination, can produce better predictive models than Wells or
revised Geneva scores system for the analyzed cohortComment: 18 pages, 5 figures, 6 tables. arXiv admin note: text overlap with
arXiv:cs/0308031 by other authors without attributio
Persistence Bag-of-Words for Topological Data Analysis
Persistent homology (PH) is a rigorous mathematical theory that provides a
robust descriptor of data in the form of persistence diagrams (PDs). PDs
exhibit, however, complex structure and are difficult to integrate in today's
machine learning workflows. This paper introduces persistence bag-of-words: a
novel and stable vectorized representation of PDs that enables the seamless
integration with machine learning. Comprehensive experiments show that the new
representation achieves state-of-the-art performance and beyond in much less
time than alternative approaches.Comment: Accepted for the Twenty-Eight International Joint Conference on
Artificial Intelligence (IJCAI-19). arXiv admin note: substantial text
overlap with arXiv:1802.0485
- …