1,435 research outputs found

    Inference of Ancestral Recombination Graphs through Topological Data Analysis

    Get PDF
    The recent explosion of genomic data has underscored the need for interpretable and comprehensive analyses that can capture complex phylogenetic relationships within and across species. Recombination, reassortment and horizontal gene transfer constitute examples of pervasive biological phenomena that cannot be captured by tree-like representations. Starting from hundreds of genomes, we are interested in the reconstruction of potential evolutionary histories leading to the observed data. Ancestral recombination graphs represent potential histories that explicitly accommodate recombination and mutation events across orthologous genomes. However, they are computationally costly to reconstruct, usually being infeasible for more than few tens of genomes. Recently, Topological Data Analysis (TDA) methods have been proposed as robust and scalable methods that can capture the genetic scale and frequency of recombination. We build upon previous TDA developments for detecting and quantifying recombination, and present a novel framework that can be applied to hundreds of genomes and can be interpreted in terms of minimal histories of mutation and recombination events, quantifying the scales and identifying the genomic locations of recombinations. We implement this framework in a software package, called TARGet, and apply it to several examples, including small migration between different populations, human recombination, and horizontal evolution in finches inhabiting the Gal\'apagos Islands.Comment: 33 pages, 12 figures. The accompanying software, instructions and example files used in the manuscript can be obtained from https://github.com/RabadanLab/TARGe

    Distances and Isomorphism between Networks and the Stability of Network Invariants

    Full text link
    We develop the theoretical foundations of a network distance that has recently been applied to various subfields of topological data analysis, namely persistent homology and hierarchical clustering. While this network distance has previously appeared in the context of finite networks, we extend the setting to that of compact networks. The main challenge in this new setting is the lack of an easy notion of sampling from compact networks; we solve this problem in the process of obtaining our results. The generality of our setting means that we automatically establish results for exotic objects such as directed metric spaces and Finsler manifolds. We identify readily computable network invariants and establish their quantitative stability under this network distance. We also discuss the computational complexity involved in precisely computing this distance, and develop easily-computable lower bounds by using the identified invariants. By constructing a wide range of explicit examples, we show that these lower bounds are effective in distinguishing between networks. Finally, we provide a simple algorithm that computes a lower bound on the distance between two networks in polynomial time and illustrate our metric and invariant constructions on a database of random networks and a database of simulated hippocampal networks

    Optimal rates of convergence for persistence diagrams in Topological Data Analysis

    Full text link
    Computational topology has recently known an important development toward data analysis, giving birth to the field of topological data analysis. Topological persistence, or persistent homology, appears as a fundamental tool in this field. In this paper, we study topological persistence in general metric spaces, with a statistical approach. We show that the use of persistent homology can be naturally considered in general statistical frameworks and persistence diagrams can be used as statistics with interesting convergence properties. Some numerical experiments are performed in various contexts to illustrate our results

    Persistent topology for natural data analysis - A survey

    Full text link
    Natural data offer a hard challenge to data analysis. One set of tools is being developed by several teams to face this difficult task: Persistent topology. After a brief introduction to this theory, some applications to the analysis and classification of cells, lesions, music pieces, gait, oil and gas reservoirs, cyclones, galaxies, bones, brain connections, languages, handwritten and gestured letters are shown

    Forman's Ricci curvature - From networks to hypernetworks

    Full text link
    Networks and their higher order generalizations, such as hypernetworks or multiplex networks are ever more popular models in the applied sciences. However, methods developed for the study of their structural properties go little beyond the common name and the heavy reliance of combinatorial tools. We show that, in fact, a geometric unifying approach is possible, by viewing them as polyhedral complexes endowed with a simple, yet, the powerful notion of curvature - the Forman Ricci curvature. We systematically explore some aspects related to the modeling of weighted and directed hypernetworks and present expressive and natural choices involved in their definitions. A benefit of this approach is a simple method of structure-preserving embedding of hypernetworks in Euclidean N-space. Furthermore, we introduce a simple and efficient manner of computing the well established Ollivier-Ricci curvature of a hypernetwork.Comment: to appear: Complex Networks '18 (oral presentation
    • …
    corecore