134 research outputs found

    Consistency Properties of Species Tree Inference by Minimizing Deep Coalescences

    Methods for inferring species trees from sets of gene trees need to account for the possibility of discordance among the gene trees. Assuming that discordance is caused by incomplete lineage sorting, species tree estimates can be obtained by finding those species trees that minimize the number of "deep" coalescence events required for a given collection of gene trees. Efficient algorithms now exist for applying the minimizing-deep-coalescence (MDC) criterion, and simulation experiments have demonstrated its promising performance. However, it has also been noted from simulation results that the MDC criterion is not always guaranteed to infer the correct species tree estimate. In this article, we investigate the consistency of the MDC criterion. Using the multispecies coalescent model, we show that there are indeed anomaly zones for the MDC criterion for asymmetric four-taxon species tree topologies, and for all species tree topologies with five or more taxa. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/90434/1/cmb-2E2010-2E0102.pd
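
    As a rough illustration of the quantity the MDC criterion minimizes, the sketch below counts extra lineages for one rooted gene tree on one rooted species tree. It is not the article's algorithm: it assumes trees given as nested Python tuples of leaf names, one sampled allele per species, and a brute-force least-common-ancestor lookup. The names `edges`, `deep_coalescences`, and `mdc_score` are made up for this example.

```python
def edges(tree):
    """Return (leaf set of tree, list of (child leaf set, parent leaf set) edges)."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    child_sets, out = [], []
    for child in tree:
        leaves, child_edges = edges(child)
        child_sets.append(leaves)
        out.extend(child_edges)
    here = frozenset().union(*child_sets)
    out.extend((leaves, here) for leaves in child_sets)
    return here, out


def deep_coalescences(gene_tree, species_tree):
    """Count extra lineages needed to fit gene_tree inside species_tree."""
    root, sp_edges = edges(species_tree)
    sp_nodes = {child for child, _ in sp_edges} | {root}

    def lca(leaves):
        # Smallest species-tree cluster that contains all the given leaves.
        return min((c for c in sp_nodes if leaves <= c), key=len)

    _, g_edges = edges(gene_tree)
    total = 0
    for v in sp_nodes:
        if v == root:
            continue  # the root has no branch above it
        # Gene lineages that leave the branch above v: the lower end maps
        # inside v, the upper end maps strictly above v.
        k = sum(1 for child, parent in g_edges
                if lca(child) <= v and v < lca(parent))
        total += max(k - 1, 0)
    return total


def mdc_score(gene_trees, species_tree):
    """Sum of extra lineages over a collection of gene trees."""
    return sum(deep_coalescences(g, species_tree) for g in gene_trees)


if __name__ == "__main__":
    species = (("A", "B"), "C")
    genes = [(("A", "B"), "C"), (("A", "C"), "B")]
    print(mdc_score(genes, species))
```

    Under these assumptions, an MDC estimate is any candidate species tree with the smallest `mdc_score`. Enumerating candidates is only practical for a handful of taxa, which is why efficient algorithms matter in practice.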

    A polynomial time algorithm for calculating the probability of a ranked gene tree given a species tree

    In this paper, we provide a polynomial time algorithm to calculate the probability of a ranked gene tree topology for a given species tree, where a ranked tree topology is a tree topology with the internal vertices being ordered. The probability of a gene tree topology can thus be calculated in polynomial time if the number of orderings of the internal vertices is a polynomial number. However, the complexity of calculating the probability of a gene tree topology with an exponential number of rankings for a given species tree remains unknown.

    An analytical comparison of coalescent-based multilocus methods: The three-taxon case

    Incomplete lineage sorting (ILS) is a common source of gene tree incongruence in multilocus analyses. A large number of methods have been developed to infer species trees in the presence of ILS. Here we provide a mathematical analysis of several coalescent-based methods. Our analysis is performed on a three-taxon species tree and assumes that the gene trees are correctly reconstructed along with their branch lengths.
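
    For orientation, the three-taxon setting has a well-known closed form under the multispecies coalescent, sketched below. For species tree ((A,B),C) with internal branch length t in coalescent units, the matching gene tree topology has probability 1 - (2/3)e^(-t) and each of the two discordant topologies has probability (1/3)e^(-t). This is the textbook result, not the analysis carried out in the paper, and the function name `three_taxon_gene_tree_probs` is made up for this example.

```python
import math


def three_taxon_gene_tree_probs(t):
    """Gene tree topology probabilities for species tree ((A,B),C).

    t is the internal branch length in coalescent units (must be >= 0).
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    discordant = math.exp(-t) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,  # matches the species tree
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


if __name__ == "__main__":
    for t in (0.0, 0.5, 2.0):
        print(t, three_taxon_gene_tree_probs(t))
```

    With t = 0 the three topologies are equally likely. As t grows, the matching topology dominates, which is why short internal branches are the hard case for summary methods.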

    Coalescent histories for lodgepole species trees

    Coalescent histories are combinatorial structures that describe for a given gene tree and species tree the possible lists of branches of the species tree on which the gene tree coalescences take place. Properties of the number of coalescent histories for gene trees and species trees affect a variety of probabilistic calculations in mathematical phylogenetics. Exact and asymptotic evaluations of the number of coalescent histories, however, are known only in a limited number of cases. Here we introduce a particular family of species trees, the lodgepole species trees $(\lambda_n)_{n\geq 0}$, in which tree $\lambda_n$ has $m=2n+1$ taxa. We determine the number of coalescent histories for the lodgepole species trees, in the case that the gene tree matches the species tree, showing that this number grows with $m!!$ in the number of taxa $m$. This computation demonstrates the existence of tree families in which the growth in the number of coalescent histories is faster than exponential. Further, it provides a substantial improvement on the lower bound for the ratio of the largest number of matching coalescent histories to the smallest number of matching coalescent histories for trees with $m$ taxa, increasing a previous bound of $(\sqrt{\pi}/32)[(5m-12)/(4m-6)]\, m\sqrt{m}$ to $[\sqrt{m-1}/(4\sqrt{e})]^{m}$. We discuss the implications of our enumerative results for phylogenetic computations.
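
    To get a feel for the growth rates mentioned in this abstract, the sketch below evaluates the double factorial m!! and the two lower-bound expressions, read here as (sqrt(pi)/32)[(5m-12)/(4m-6)] m sqrt(m) and [sqrt(m-1)/(4 sqrt(e))]^m. It only evaluates these formulas numerically and does not enumerate coalescent histories. The function names are made up for this example.

```python
import math


def double_factorial(m):
    """m!! = m * (m - 2) * (m - 4) * ... down to 1 or 2."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def old_bound(m):
    """Earlier lower bound: (sqrt(pi)/32) * [(5m-12)/(4m-6)] * m * sqrt(m)."""
    return (math.sqrt(math.pi) / 32) * ((5 * m - 12) / (4 * m - 6)) * m * math.sqrt(m)


def new_bound(m):
    """Lodgepole-based lower bound: [sqrt(m-1) / (4 sqrt(e))]^m."""
    return (math.sqrt(m - 1) / (4 * math.sqrt(math.e))) ** m


if __name__ == "__main__":
    for n in (3, 10, 50, 100):
        m = 2 * n + 1  # number of taxa in the lodgepole tree lambda_n
        print(m, double_factorial(m), old_bound(m), new_bound(m))
```

    The base of the second expression exceeds 1 only once m - 1 > 16e (about 43.5), so the improvement is asymptotic: for small m the older polynomial expression is the larger of the two.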

    Parsimonious Inference of Hybridization in the Presence of Incomplete Lineage Sorting

    Hybridization plays an important evolutionary role in several groups of organisms. A phylogenetic approach to detect hybridization entails sequencing multiple loci across the genomes of a group of species of interest, reconstructing their gene trees, and taking their differences as indicators of hybridization. However, methods that follow this approach mostly ignore population effects, such as incomplete lineage sorting (ILS). Given that hybridization occurs between closely related organisms, ILS may very well be at play and, hence, must be accounted for in the analysis framework. To address this issue, we present a parsimony criterion for reconciling gene trees within the branches of a phylogenetic network, and a local search heuristic for inferring phylogenetic networks from collections of gene-tree topologies under this criterion. This framework enables phylogenetic analyses while accounting for both hybridization and ILS. Further, we propose two techniques for incorporating information about uncertainty in gene-tree estimates. Our simulation studies demonstrate the good performance of our framework in terms of identifying the location of hybridization events, as well as estimating the proportions of genes that underwent hybridization. Also, our framework handles large data sets efficiently in our experiments. Further, in analyzing a yeast data set, we demonstrate issues that arise when analyzing real data sets. While a probabilistic approach was recently introduced for this problem, and while parsimonious reconciliations have accuracy issues under certain settings, our parsimony framework provides a much more computationally efficient technique for this type of analysis. Our framework now allows for genome-wide scans for hybridization, while also accounting for ILS.

    Model-based approach to test hard polytomies in the Eulaemus clade of the most diverse South American lizard genus Liolaemus (Liolaemini, Squamata)

    Lack of resolution in a phylogenetic tree is usually represented as a polytomy, and often adding more data (loci and taxa) resolves the species tree. These are the ‘soft’ polytomies, but in other cases additional data fail to resolve relationships; these are the ‘hard’ polytomies. This latter case is often interpreted as a simultaneous radiation of lineages in the history of a clade. Although hard polytomies are difficult to address, model-based approaches provide new tools to test these hypotheses. Here, we used a clade of 144 species of the South American lizard clade Eulaemus to estimate phylogenies using a traditional concatenated matrix and three species tree methods: *BEAST, BEST, and minimizing deep coalescences (MDC). The different species tree methods recovered largely discordant results, but all resolved the same polytomy (e.g. very short internodes amongst lineages and low nodal support in Bayesian methods). We simulated data sets under eight explicit evolutionary models (including hard polytomies), tested these against empirical data (a total of 14 loci), and found support for two polytomies as the most plausible hypothesis for diversification of this clade. We discuss the performance of these methods and their limitations under the challenging scenario of hard polytomies. Affiliation: Olave, Melisa. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Nacional Patagónico; Argentina. Affiliation: Avila, Luciano Javier. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Nacional Patagónico; Argentina. Affiliation: Sites, Jack W. University Brigham Young; United States. Affiliation: Morando, Mariana. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Nacional Patagónico; Argentina.

    AUGIST: inferring species trees while accommodating gene tree uncertainty

    Summary: AUGIST (accommodating uncertainty in genealogies while inferring species trees) is a new software package for inferring species trees while accommodating uncertainty in gene genealogies. It is written for the Mesquite software system and provides sampling procedures to incorporate uncertainty in gene tree reconstruction while providing confidence estimates for inferred species trees.

    The Influence of Gene Flow on Species Tree Estimation: A Simulation Study

    Gene flow among populations or species and incomplete lineage sorting (ILS) are two evolutionary processes responsible for generating gene tree discordance and therefore hindering species tree estimation. Numerous studies have evaluated the impacts of ILS on species tree inference, yet the ramifications of gene flow on species trees remain less studied. Here, we simulate and analyse multilocus sequence data generated with ILS and gene flow to quantify their impacts on species tree inference. We characterize species tree estimation errors under various models of gene flow, such as the isolation-migration model, the n-island model, and gene flow between non-sister species or involving ancestral species, and species boundaries crossed by a single gene copy (allelic introgression) or by a single migrant individual. These patterns of gene flow are explored on species trees of different sizes (4 vs. 10 species), at different time scales (shallow vs. deep), and with different migration rates. Species trees are estimated with the multispecies coalescent model using Bayesian methods (BEST and *BEAST) and with a summary statistic approach (MPEST) that facilitates phylogenomic-scale analysis. Even in cases where the topology of the species tree is estimated with high accuracy, we find that gene flow can result in overestimates of population sizes (species tree dilation) and underestimates of species divergence times (species tree compression). Signatures of migration events remain present in the distribution of coalescent times for gene trees, and with sufficient data it is possible to identify those loci that have crossed species boundaries. These results highlight the need for careful sampling design in phylogeographic and species delimitation studies as gene flow, introgression, or incorrect sample assignments can bias the estimation of the species tree topology and of parameter estimates such as population sizes and divergence times.