Diagonalizing the genome II: toward possible applications
In a previous paper, we showed that the orientable cover of the moduli space
of real genus zero algebraic curves with marked points is a compact aspherical
manifold tiled by associahedra, which resolves the singularities of the space
of phylogenetic trees. In this draft of a sequel, we construct a related
(stacky) resolution of a space of real quadratic forms, and suggest, perhaps
without much justification, that systems of oscillators parametrized by such
objects may provide useful models in genomics.
Comment: 11 pages, 3 figures
Mathematical Models and Biological Meaning: Taking Trees Seriously
We compare three basic kinds of discrete mathematical models used to portray
phylogenetic relationships among species and higher taxa: phylogenetic trees,
Hennig trees and Nelson cladograms. All three models are trees, as that term is
commonly used in mathematics; the difference between them lies in the
biological interpretation of their vertices and edges. Phylogenetic trees and
Hennig trees carry exactly the same information, and translation between these
two kinds of trees can be accomplished by a simple algorithm. On the other
hand, evolutionary concepts such as monophyly are represented by different
mathematical substructures in the two models. For
each phylogenetic or Hennig tree, there is a Nelson cladogram carrying the same
information, but the requirement that all taxa be represented by leaves
necessarily makes the representation less efficient. Moreover, we claim that it
is necessary to give some interpretation to the edges and internal vertices of
a Nelson cladogram in order to make it useful as a biological model. One
possibility is to interpret internal vertices as sets of characters and the
edges as statements of inclusion; however, this interpretation carries little
more than incomplete phenetic information. We assert that from the standpoint
of phylogenetics, one is forced to regard each internal vertex of a Nelson
cladogram as an actual (albeit unsampled) species simply to justify the use of
synapomorphies rather than symplesiomorphies.
Comment: 15 pages including 6 figures [5 pdf, 1 jpg]. Converted from original
MS Word manuscript to PDFLaTeX
Skeletal Rigidity of Phylogenetic Trees
Motivated by geometric origami and the straight skeleton construction, we
outline a map between spaces of phylogenetic trees and spaces of planar
polygons. The limitations of this map are studied through explicit examples,
culminating in a proof of a structural rigidity result.
Comment: 17 pages, 12 figures
Consistency and convergence rate of phylogenetic inference via regularization
It is common in phylogenetics to have some, perhaps partial, information
about the overall evolutionary tree of a group of organisms and wish to find an
evolutionary tree of a specific gene for those organisms. There may not be
enough information in the gene sequences alone to accurately reconstruct the
correct "gene tree." Although the gene tree may deviate from the "species tree"
due to a variety of genetic processes, in the absence of evidence to the
contrary it is parsimonious to assume that they agree. A common statistical
approach in these situations is to develop a likelihood penalty to incorporate
such additional information. Recent studies using simulation and empirical data
suggest that a likelihood penalty quantifying concordance with a species tree
can significantly improve the accuracy of gene tree reconstruction compared to
using sequence data alone. However, the consistency of such an approach has not
yet been established, nor have convergence rates been bounded. Because
phylogenetics is a non-standard inference problem, the standard theory does not
apply. In this paper, we propose a penalized maximum likelihood estimator for
gene tree reconstruction, where the penalty is the square of the
Billera-Holmes-Vogtmann geodesic distance from the gene tree to the species
tree. We prove that this method is consistent, and derive its convergence rate
for estimating the discrete gene tree structure and continuous edge lengths
(representing the amount of evolution that has occurred on that branch)
simultaneously. We find that the regularized estimator is "adaptive fast
converging," meaning that it can reconstruct all edges of length greater than
any given threshold from gene sequences of polynomial length. Our method does
not require the species tree to be known exactly; in fact, our asymptotic
theory holds for any such guide tree.
Comment: 34 pages, 5 figures. To appear in The Annals of Statistics
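As a rough illustration of the estimator described in this abstract, the penalized objective might be written as below. This is only a sketch under stated assumptions: the symbols $\ell_k$ (log-likelihood of a gene tree given $k$ sites), $T_0$ (the guide or species tree), $\lambda_k$ (a nonnegative penalty weight), and the $1/k$ normalization are notation introduced here for illustration and may differ from the paper's own formulation.

```latex
% Hypothetical notation (not taken from the abstract):
%   T        : candidate gene tree (topology together with edge lengths)
%   T_0      : guide (species) tree
%   \ell_k   : log-likelihood of T given k aligned sites
%   \lambda_k: nonnegative penalty weight (assumed to depend on k)
%   d_BHV    : Billera-Holmes-Vogtmann geodesic distance
\[
  \hat{T}_k \;=\; \operatorname*{arg\,max}_{T}
  \left\{ \frac{1}{k}\,\ell_k(T) \;-\; \lambda_k \, d_{\mathrm{BHV}}(T, T_0)^{2} \right\}
\]
```

In this form, a larger $\lambda_k$ pulls the estimate toward the guide tree, while $\lambda_k = 0$ recovers ordinary maximum likelihood from the sequence data alone.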
The Most Parsimonious Reconciliation Problem in the Presence of Incomplete Lineage Sorting and Hybridization Is NP-Hard
The maximum parsimony phylogenetic reconciliation problem seeks to explain incongruity between a gene phylogeny and a species phylogeny with respect to a set of evolutionary events. While the reconciliation problem is well-studied for species and gene trees subject to events such as duplication, transfer, loss, and deep coalescence, recent work has examined species phylogenies that incorporate hybridization and are thus represented by networks rather than trees. In this paper, we show that the problem of computing a maximum parsimony reconciliation for a gene tree and species network is NP-hard even when only considering deep coalescence. This result suggests that future work on maximum parsimony reconciliation for species networks should explore approximation algorithms and heuristics.