719 research outputs found

    The space of ultrametric phylogenetic trees

    The reliability of a phylogenetic inference method from genomic sequence data is ensured by its statistical consistency. Bayesian inference methods produce a sample of phylogenetic trees from the posterior distribution given sequence data. Hence the question of statistical consistency of such methods is equivalent to the consistency of the summary of the sample. More generally, statistical consistency is ensured by the tree space used to analyse the sample. In this paper, we consider two standard parameterisations of phylogenetic time-trees used in evolutionary models: inter-coalescent interval lengths and absolute times of divergence events. For each of these parameterisations we introduce a natural metric space on ultrametric phylogenetic trees. We compare the introduced spaces with existing models of tree space, and we formulate and justify several formal requirements that a metric space on phylogenetic trees must satisfy in order to be suitable for statistical analysis. We show that only a few known constructions of the space of phylogenetic trees satisfy these requirements. However, our results suggest that these basic requirements are not enough to distinguish between the two metric spaces we introduce, and that the choice between metric spaces requires additional properties to be considered. In particular, the summary tree minimising the squared distance to the trees in the sample might differ between parameterisations. This suggests that further fundamental insight is needed into the problem of statistical consistency of phylogenetic inference methods.
    Comment: Minor changes. This version has been published in JTB. 27 pages, 9 figures
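To illustrate why the choice of parameterisation can matter, here is a minimal sketch, not the paper's construction. It assumes two trees that share the same ranked topology, so each tree can be described by a vector of inter-coalescent interval lengths, and it assumes that absolute divergence times are the running sums of those intervals. The function names and the use of plain Euclidean distance on the parameter vectors are illustrative assumptions.

```python
from itertools import accumulate
import math


def intervals_to_times(intervals):
    """Convert inter-coalescent interval lengths to absolute event times.

    Assumption: the k-th divergence time (measured from the present)
    is the sum of the first k interval lengths.
    """
    return list(accumulate(intervals))


def euclidean(a, b):
    """Plain Euclidean distance between two parameter vectors (assumed metric)."""
    return math.dist(a, b)


# Two hypothetical trees with the same ranked topology, given as interval lengths.
tree_a = [1.0, 1.0, 1.0]
tree_b = [2.0, 1.0, 1.0]

# Distance measured on interval lengths.
d_intervals = euclidean(tree_a, tree_b)

# Distance measured on absolute divergence times.
d_times = euclidean(intervals_to_times(tree_a), intervals_to_times(tree_b))
```

In this toy case, changing only the first interval shifts every later divergence time, so the two parameterisations assign different distances to the same pair of trees.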

    Ultrametric embedding: application to data fingerprinting and to fast data clustering

    We begin with pervasive ultrametricity due to high dimensionality and/or spatial sparsity. The question of how the extent or degree of ultrametricity can be quantified leads us to a discussion of various practical cases in which ultrametricity is partially or locally present in data. We show how ultrametricity can be assessed in text or document collections, and in time series signals. An important point here is that, to benefit from this perspective, the data may need to be recoded. Such data recoding can also be powerful in proximity searching, as we will show, where the data is embedded globally and not locally in an ultrametric space.
    Comment: 14 pages, 1 figure. New content and modified title compared to the 19 May 2006 version
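One simple way to put a number on the degree of ultrametricity is sketched below. It is an assumption-laden illustration, not necessarily the measure used in the paper: it relies on the fact that in an ultrametric space every triangle is isosceles with the two largest sides equal, and it reports the fraction of point triples that satisfy this within a tolerance. The function name and the `rel_tol` parameter are hypothetical.

```python
from itertools import combinations
import math


def ultrametric_fraction(points, rel_tol=1e-9):
    """Fraction of triples whose two largest pairwise distances are equal.

    A value of 1.0 means every sampled triangle satisfies the strong
    triangle inequality d(x, z) <= max(d(x, y), d(y, z)) for all labelings.
    Distances are Euclidean here (an assumption of this sketch).
    """
    triples = list(combinations(range(len(points)), 3))
    if not triples:
        raise ValueError("need at least three points")
    hits = 0
    for i, j, k in triples:
        sides = sorted([
            math.dist(points[i], points[j]),
            math.dist(points[i], points[k]),
            math.dist(points[j], points[k]),
        ])
        # Ultrametric triangle: the two largest sides coincide.
        if math.isclose(sides[1], sides[2], rel_tol=rel_tol):
            hits += 1
    return hits / len(triples)
```

For large data sets, enumerating all triples is cubic in the number of points, so in practice one would likely sample triples at random instead.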

    Tropical Geometry of Phylogenetic Tree Space: A Statistical Perspective

    Phylogenetic trees are the fundamental mathematical representation of evolutionary processes in biology. As data objects, they are characterized by the challenges associated with "big data," as well as the complication that their discrete geometric structure results in a non-Euclidean phylogenetic tree space, which poses computational and statistical limitations. We propose and study a novel framework, based on tropical geometry, for analyzing sets of phylogenetic trees. In particular, we focus on characterizing our framework for statistical analyses of evolutionary biological processes represented by phylogenetic trees. Our setting exhibits analytic, geometric, and topological properties that are desirable for theoretical studies in probability and statistics, as well as increased computational efficiency over the current state-of-the-art. We demonstrate our approach on seasonal influenza data.
    Comment: 28 pages, 5 figures, 1 table
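For orientation, the distance commonly called the tropical metric on vectors taken modulo a constant shift is d(u, v) = max_i(u_i - v_i) - min_i(u_i - v_i). The abstract does not state which metric the framework uses, so the sketch below should be read as an assumption about the setting rather than the authors' definition; the function name is hypothetical.

```python
def tropical_distance(u, v):
    """Tropical (generalized Hilbert projective) distance between two vectors.

    Computes max_i(u_i - v_i) - min_i(u_i - v_i). The value is unchanged
    when the same constant is added to every coordinate of u or of v.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    if not u:
        raise ValueError("vectors must be non-empty")
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)
```

For example, `tropical_distance([0, 1, 2], [0, 0, 0])` and `tropical_distance([5, 6, 7], [0, 0, 0])` give the same value, which reflects the shift invariance.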

    Replica symmetry breaking related to a general ultrametric space III: the case of general measure

    A family of replica matrices, related to general ultrametric spaces with general measures, is introduced. These matrices generalize the known Parisi matrices. Some functionals of the replica approach are computed. A replica symmetry breaking solution is found.
    Comment: 21 pages