Topology, Metrics and Data: Computational Methods and Applications.

Abstract

PhD Theses. The field of topological data analysis (TDA) combines notions from computational geometry and algebraic topology to analyze data. This thesis presents methods and efficient algorithms that extend the TDA toolset. After introducing the needed background on Euler characteristic curves and persistent homology, the former objects are extended to bi-dimensional filtrations. The results are Euler characteristic surfaces, which capture insights about data over a pair of parameters. Moreover, algorithms to compute these objects are described for both image and point data. Persistent homology in the ℓ1 metric is also studied. It is proven that in this setting Alpha and Čech filtrations are not equivalent in general. On the other hand, two new filtrations, Alpha flag and Minibox, are defined and proven equivalent to Čech filtrations in homological dimensions zero and one. Algorithms for finding Minibox edges are described, and Minibox filtrations are shown empirically, through computational experiments, to speed up the computation of Čech persistence diagrams. Then a new family of summary functions of persistence diagrams is defined, which is related to persistence landscapes. These are called cumulative landscapes and are used to vectorize the information contained in persistence diagrams. In particular, discretizations of these functions and their Fourier coefficients are used to obtain feature vectors that can be applied in supervised classification problems. The effectiveness of these feature vectors for the classification of data is compared against vectors obtained using persistence landscapes on two open-source datasets. Finally, a novel method is described for the analysis of high-dimensional genomics data. Optimized metrics are defined on genomic vectors making use of a loss function. These are used in combination with a distance-based classification method, showing good performance compared to standard machine learning algorithms. Moreover, the structure of the given optimized metrics helps identify the coordinates of the genomic vectors that are most important for the classification task under study.
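
As an illustration of the Euler characteristic curves mentioned in the abstract, the following minimal sketch evaluates the standard alternating sum of simplex counts, chi(K_t) = sum over simplices with filtration value at most t of (-1)^dim, on a toy filtered simplicial complex. It is not taken from the thesis: the function name `euler_characteristic_curve`, the input layout (a list of vertex tuples paired with filtration values) and the toy data are assumptions made only for this example.

```python
from typing import Iterable, List, Sequence, Tuple


def euler_characteristic_curve(
    simplices: Iterable[Tuple[Sequence[int], float]],
    thresholds: Sequence[float],
) -> List[int]:
    """Evaluate the Euler characteristic of each sublevel complex K_t.

    `simplices` holds (vertices, filtration_value) pairs; a simplex with
    k + 1 vertices has dimension k and contributes (-1)**k once its
    filtration value is <= t. The layout is hypothetical, chosen for
    readability rather than efficiency.
    """
    simplices = list(simplices)
    curve = []
    for t in thresholds:
        chi = 0
        for vertices, value in simplices:
            if value <= t:
                dim = len(vertices) - 1
                chi += (-1) ** dim
        curve.append(chi)
    return curve


# Toy filtration of a filled triangle: vertices appear first,
# then the edges, and finally the 2-simplex.
toy_filtration = [
    ((0,), 0.0), ((1,), 0.0), ((2,), 0.0),
    ((0, 1), 1.0), ((1, 2), 1.0), ((0, 2), 2.0),
    ((0, 1, 2), 3.0),
]

curve = euler_characteristic_curve(toy_filtration, [0.0, 1.0, 2.0, 3.0])
print(curve)
```

An Euler characteristic surface, in the sense described above, would be obtained analogously by thresholding on a pair of parameters instead of a single one; this quadratic-time loop is meant only to make the definition concrete, not to reflect the efficient algorithms the thesis describes.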
