17 research outputs found

    Diagonalizing the genome II: toward possible applications

    In a previous paper, we showed that the orientable cover of the moduli space of real genus zero algebraic curves with marked points is a compact aspherical manifold tiled by associahedra, which resolves the singularities of the space of phylogenetic trees. In this draft of a sequel, we construct a related (stacky) resolution of a space of real quadratic forms, and suggest, perhaps without much justification, that systems of oscillators parametrized by such objects may provide useful models in genomics. Comment: 11 pages, 3 figures

    Mathematical Models and Biological Meaning: Taking Trees Seriously

    We compare three basic kinds of discrete mathematical models used to portray phylogenetic relationships among species and higher taxa: phylogenetic trees, Hennig trees and Nelson cladograms. All three models are trees, as that term is commonly used in mathematics; the difference between them lies in the biological interpretation of their vertices and edges. Phylogenetic trees and Hennig trees carry exactly the same information, and translation between these two kinds of trees can be accomplished by a simple algorithm. On the other hand, evolutionary concepts such as monophyly are represented as different mathematical substructures in the two models. For each phylogenetic or Hennig tree, there is a Nelson cladogram carrying the same information, but the requirement that all taxa be represented by leaves necessarily makes the representation less efficient. Moreover, we claim that it is necessary to give some interpretation to the edges and internal vertices of a Nelson cladogram in order to make it useful as a biological model. One possibility is to interpret internal vertices as sets of characters and the edges as statements of inclusion; however, this interpretation carries little more than incomplete phenetic information. We assert that from the standpoint of phylogenetics, one is forced to regard each internal vertex of a Nelson cladogram as an actual (albeit unsampled) species simply to justify the use of synapomorphies rather than symplesiomorphies. Comment: 15 pages including 6 figures [5 pdf, 1 jpg]. Converted from original MS Word manuscript to PDFLaTeX

    Skeletal Rigidity of Phylogenetic Trees

    Motivated by geometric origami and the straight skeleton construction, we outline a map between spaces of phylogenetic trees and spaces of planar polygons. The limitations of this map are studied through explicit examples, culminating in a proof of a structural rigidity result. Comment: 17 pages, 12 figures

    Consistency and convergence rate of phylogenetic inference via regularization

    It is common in phylogenetics to have some, perhaps partial, information about the overall evolutionary tree of a group of organisms and wish to find an evolutionary tree of a specific gene for those organisms. There may not be enough information in the gene sequences alone to accurately reconstruct the correct "gene tree." Although the gene tree may deviate from the "species tree" due to a variety of genetic processes, in the absence of evidence to the contrary it is parsimonious to assume that they agree. A common statistical approach in these situations is to develop a likelihood penalty to incorporate such additional information. Recent studies using simulation and empirical data suggest that a likelihood penalty quantifying concordance with a species tree can significantly improve the accuracy of gene tree reconstruction compared to using sequence data alone. However, the consistency of such an approach has not yet been established, nor have convergence rates been bounded. Because phylogenetics is a non-standard inference problem, the standard theory does not apply. In this paper, we propose a penalized maximum likelihood estimator for gene tree reconstruction, where the penalty is the square of the Billera-Holmes-Vogtmann geodesic distance from the gene tree to the species tree. We prove that this method is consistent, and derive its convergence rate for estimating the discrete gene tree structure and continuous edge lengths (representing the amount of evolution that has occurred on that branch) simultaneously. We find that the regularized estimator is "adaptive fast converging," meaning that it can reconstruct all edges of length greater than any given threshold from gene sequences of polynomial length. Our method does not require the species tree to be known exactly; in fact, our asymptotic theory holds for any such guide tree. Comment: 34 pages, 5 figures. To appear in The Annals of Statistics
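
    As a rough sketch of the estimator this abstract describes (the notation here is assumed for illustration and is not taken from the paper): let $\ell_k(T)$ denote the log-likelihood of a candidate gene tree $T$ (topology together with edge lengths) given $k$ aligned sites, $T^{0}$ the guide (species) tree, $d_{\mathrm{BHV}}$ the Billera-Holmes-Vogtmann geodesic distance, and $\lambda_k > 0$ a hypothetical regularization weight. A penalized maximum likelihood estimator of this kind would then take a form such as

    \[
      \hat{T}_k \in \operatorname*{arg\,max}_{T} \left\{ \frac{1}{k}\,\ell_k(T) \;-\; \lambda_k \, d_{\mathrm{BHV}}\!\left(T, T^{0}\right)^{2} \right\}.
    \]

    The $1/k$ normalization and the dependence of $\lambda_k$ on $k$ are illustrative assumptions; the abstract states only that the penalty is the squared BHV distance from the gene tree to the guide tree.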

    The Most Parsimonious Reconciliation Problem in the Presence of Incomplete Lineage Sorting and Hybridization Is NP-Hard

    The maximum parsimony phylogenetic reconciliation problem seeks to explain incongruity between a gene phylogeny and a species phylogeny with respect to a set of evolutionary events. While the reconciliation problem is well-studied for species and gene trees subject to events such as duplication, transfer, loss, and deep coalescence, recent work has examined species phylogenies that incorporate hybridization and are thus represented by networks rather than trees. In this paper, we show that the problem of computing a maximum parsimony reconciliation for a gene tree and species network is NP-hard even when only considering deep coalescence. This result suggests that future work on maximum parsimony reconciliation for species networks should explore approximation algorithms and heuristics.