3,464 research outputs found

    Topological Data Analysis with Bregman Divergences

    Given a finite set in a metric space, the topological analysis generalizes hierarchical clustering using a 1-parameter family of homology groups to quantify connectivity in all dimensions. The connectivity is compactly described by the persistence diagram. One limitation of the current framework is the reliance on metric distances, whereas in many practical applications objects are compared by non-metric dissimilarity measures. Examples are the Kullback-Leibler divergence, which is commonly used for comparing text and images, and the Itakura-Saito divergence, popular for speech and sound. These are two members of the broad family of dissimilarities called Bregman divergences. We show that the framework of topological data analysis can be extended to general Bregman divergences, widening the scope of possible applications. In particular, we prove that appropriately generalized Cech and Delaunay (alpha) complexes capture the correct homotopy type, namely that of the corresponding union of Bregman balls. Consequently, their filtrations give the correct persistence diagram, namely the one generated by the uniformly growing Bregman balls. Moreover, we show that, unlike in the metric setting, the filtration of Vietoris-Rips complexes may fail to approximate the persistence diagram. We propose algorithms to compute the generalized Cech, Vietoris-Rips and Delaunay complexes and experimentally test their efficiency. Lastly, we explain their surprisingly good performance by making a connection with discrete Morse theory.
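    As a minimal sketch of the divergences named in this abstract (not the authors' code), a Bregman divergence can be written from its standard definition D_F(p || q) = F(p) - F(q) - <grad F(q), p - q>. The function names below are illustrative assumptions. With the negative Shannon entropy as generator F, this gives the Kullback-Leibler divergence on probability vectors; with the Burg entropy it gives the Itakura-Saito divergence; with the squared norm it gives the squared Euclidean distance.

```python
import math

def bregman(F, grad_F, p, q):
    """Bregman divergence D_F(p || q) = F(p) - F(q) - <grad F(q), p - q>."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_F(q), p, q))
    return F(p) - F(q) - inner

# Generator functions F and their gradients (inputs must be strictly positive
# for the two entropy-based generators).
def neg_entropy(x):
    return sum(xi * math.log(xi) for xi in x)

def neg_entropy_grad(x):
    return [math.log(xi) + 1.0 for xi in x]

def burg_entropy(x):
    return -sum(math.log(xi) for xi in x)

def burg_entropy_grad(x):
    return [-1.0 / xi for xi in x]

def sq_norm(x):
    return sum(xi * xi for xi in x)

def sq_norm_grad(x):
    return [2.0 * xi for xi in x]

def kullback_leibler(p, q):
    # Equals sum p_i log(p_i / q_i) when p and q both sum to 1.
    return bregman(neg_entropy, neg_entropy_grad, p, q)

def itakura_saito(p, q):
    # Equals sum (p_i / q_i - log(p_i / q_i) - 1).
    return bregman(burg_entropy, burg_entropy_grad, p, q)

def squared_euclidean(p, q):
    return bregman(sq_norm, sq_norm_grad, p, q)

if __name__ == "__main__":
    p, q = [0.5, 0.5], [0.9, 0.1]
    # The two entropy-based divergences are not symmetric in their arguments.
    print(kullback_leibler(p, q), kullback_leibler(q, p))
    print(itakura_saito(p, q), itakura_saito(q, p))
    print(squared_euclidean(p, q), squared_euclidean(q, p))
```

    Swapping the arguments changes the Kullback-Leibler and Itakura-Saito values but not the squared Euclidean one, which illustrates why these dissimilarities fall outside the metric setting the abstract refers to.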

    Statistical topological data analysis using persistence landscapes

    We define a new topological summary for data that we call the persistence landscape. Since this summary lies in a vector space, it is easy to combine with tools from statistics and machine learning, in contrast to the standard topological summaries. Viewed as a random variable with values in a Banach space, this summary obeys a strong law of large numbers and a central limit theorem. We show how a number of standard statistical tests can be used for statistical inference using this summary. We also prove that this summary is stable and that it can be used to provide lower bounds for the bottleneck and Wasserstein distances. Comment: 26 pages, final version, to appear in Journal of Machine Learning Research, includes two additional examples not in the journal version: random geometric complexes and Erdos-Renyi random clique complexes.
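    The abstract does not spell out the construction, so the following is a hedged sketch using the usual definition of the persistence landscape: each diagram point (b, d) contributes a "tent" function max(0, min(t - b, d - t)), and the k-th landscape function at t is the k-th largest tent value. The function names and the 1-based index k are assumptions made for illustration. Because landscapes are ordinary real-valued functions, they can be averaged pointwise, which is the vector-space property the abstract emphasizes.

```python
def landscape(diagram, k, t):
    """k-th persistence landscape function (k = 1, 2, ...) evaluated at t.

    diagram is a list of (birth, death) pairs with birth <= death.
    """
    tents = sorted(
        (max(0.0, min(t - b, d - t)) for b, d in diagram),
        reverse=True,
    )
    # Beyond the number of diagram points, the landscape is identically zero.
    return tents[k - 1] if k <= len(tents) else 0.0

def mean_landscape(diagrams, k, t):
    """Pointwise average of the k-th landscape over several diagrams."""
    return sum(landscape(dgm, k, t) for dgm in diagrams) / len(diagrams)

if __name__ == "__main__":
    dgm = [(0.0, 4.0), (1.0, 3.0)]
    print([landscape(dgm, k, 2.0) for k in (1, 2, 3)])
    print(mean_landscape([[(0.0, 4.0)], [(0.0, 2.0)]], 1, 2.0))
```

    Averaging persistence diagrams directly is not straightforward, whereas the mean landscape above is a plain pointwise average.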

    Using Topological Data Analysis for diagnosis pulmonary embolism

    Pulmonary embolism (PE) is a common and potentially lethal condition. Most patients die within the first few hours from the event. Despite diagnostic advances, delays and underdiagnosis in PE are common. To increase the diagnostic performance in PE, the current diagnostic work-up of patients with suspected acute pulmonary embolism usually starts with the assessment of clinical pretest probability using plasma D-dimer measurement and clinical prediction rules. The most validated and widely used clinical decision rules are the Wells and revised Geneva scores. We aimed to develop a new clinical prediction rule (CPR) for PE based on topological data analysis and an artificial neural network. Filter or wrapper methods for feature reduction cannot be applied to our dataset, because these algorithms can only be run on datasets without missing data. Instead, we applied topological data analysis (TDA) to overcome the hurdle of processing datasets with missing data (null values). A topological network was developed using the Iris software (Ayasdi, Inc., Palo Alto). The PE patient topology identified two areas in the pathological group and hence two distinct clusters of PE patient populations. Additionally, the topological network detected several subgroups among healthy patients that are likely affected by non-PE diseases. TDA was further used to identify the key features most strongly associated with a PE diagnosis, and this information was used to define the input space for a back-propagation artificial neural network (BP-ANN). The area under the curve (AUC) of the BP-ANN is greater than the AUCs of the scores (Wells and revised Geneva) used among physicians. The results demonstrate that topological data analysis and the BP-ANN, when used in combination, can produce better predictive models than the Wells or revised Geneva scoring systems for the analyzed cohort. Comment: 18 pages, 5 figures, 6 tables. arXiv admin note: text overlap with arXiv:cs/0308031 by other authors without attribution.

    Persistence Bag-of-Words for Topological Data Analysis

    Persistent homology (PH) is a rigorous mathematical theory that provides a robust descriptor of data in the form of persistence diagrams (PDs). PDs, however, exhibit complex structure and are difficult to integrate into today's machine learning workflows. This paper introduces persistence bag-of-words: a novel and stable vectorized representation of PDs that enables seamless integration with machine learning. Comprehensive experiments show that the new representation achieves state-of-the-art performance and beyond in much less time than alternative approaches. Comment: Accepted for the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19). arXiv admin note: substantial text overlap with arXiv:1802.0485