132 research outputs found

    Parsimonious Inference of Hybridization in the Presence of Incomplete Lineage Sorting

    Hybridization plays an important evolutionary role in several groups of organisms. A phylogenetic approach to detecting hybridization entails sequencing multiple loci across the genomes of a group of species of interest, reconstructing their gene trees, and taking their differences as indicators of hybridization. However, methods that follow this approach mostly ignore population effects, such as incomplete lineage sorting (ILS). Given that hybridization occurs between closely related organisms, ILS may very well be at play and, hence, must be accounted for in the analysis framework. To address this issue, we present a parsimony criterion for reconciling gene trees within the branches of a phylogenetic network, and a local search heuristic for inferring phylogenetic networks from collections of gene-tree topologies under this criterion. This framework enables phylogenetic analyses while accounting for both hybridization and ILS. Further, we propose two techniques for incorporating information about uncertainty in gene-tree estimates. Our simulation studies demonstrate the good performance of our framework in identifying the location of hybridization events, as well as in estimating the proportions of genes that underwent hybridization. The framework also handled large data sets efficiently in our experiments. Further, in analyzing a yeast data set, we demonstrate issues that arise when analyzing real data sets. While a probabilistic approach was recently introduced for this problem, and while parsimonious reconciliations have accuracy issues under certain settings, our parsimony framework provides a much more computationally efficient technique for this type of analysis. Our framework now allows for genome-wide scans for hybridization, while also accounting for ILS.

    A polynomial time algorithm for calculating the probability of a ranked gene tree given a species tree

    In this paper, we provide a polynomial time algorithm to calculate the probability of a ranked gene tree topology for a given species tree, where a ranked tree topology is a tree topology with the internal vertices being ordered. The probability of a gene tree topology can thus be calculated in polynomial time if the number of orderings of the internal vertices is a polynomial number. However, the complexity of calculating the probability of a gene tree topology with an exponential number of rankings for a given species tree remains unknown.
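To make the notion of "number of orderings of the internal vertices" concrete, the following minimal sketch counts the rankings of a rooted tree topology. It is not the paper's algorithm (which computes probabilities); it only counts rankings, using the standard formula n! divided by the product of subtree sizes, where sizes are counted in internal vertices. The nested-tuple tree encoding and the function name `count_rankings` are assumptions made for illustration, and ties between vertices are assumed not to occur.

```python
from math import factorial


def count_rankings(tree):
    """Count orderings of internal vertices in which every vertex
    precedes its descendants.

    `tree` is a nested tuple; leaves are strings (hypothetical encoding).
    Uses: (#internal vertices)! / product over internal vertices of the
    number of internal vertices in the subtree rooted there.
    """
    subtree_sizes = []

    def visit(node):
        # Leaves contribute no internal vertices.
        if isinstance(node, str):
            return 0
        size = 1 + sum(visit(child) for child in node)
        subtree_sizes.append(size)
        return size

    total = visit(tree)
    denominator = 1
    for size in subtree_sizes:
        denominator *= size
    return factorial(total) // denominator


caterpillar = ((("a", "b"), "c"), "d")
balanced = (("a", "b"), ("c", "d"))
print(count_rankings(caterpillar), count_rankings(balanced))
```

A caterpillar topology has exactly one ranking, while balanced topologies have many, which is why the number of rankings can grow quickly with the number of taxa.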

    Model-based approach to test hard polytomies in the Eulaemus clade of the most diverse South American lizard genus Liolaemus (Liolaemini, Squamata)

    Lack of resolution in a phylogenetic tree is usually represented as a polytomy, and often adding more data (loci and taxa) resolves the species tree. These are the ‘soft’ polytomies, but in other cases additional data fail to resolve relationships; these are the ‘hard’ polytomies. This latter case is often interpreted as a simultaneous radiation of lineages in the history of a clade. Although hard polytomies are difficult to address, model-based approaches provide new tools to test these hypotheses. Here, we used a clade of 144 species of the South American lizard clade Eulaemus to estimate phylogenies using a traditional concatenated matrix and three species tree methods: *BEAST, BEST, and minimizing deep coalescences (MDC). The different species tree methods recovered largely discordant results, but all resolved the same polytomy (e.g. very short internodes amongst lineages and low nodal support in Bayesian methods). We simulated data sets under eight explicit evolutionary models (including hard polytomies), tested these against empirical data (a total of 14 loci), and found support for two polytomies as the most plausible hypothesis for diversification of this clade. We discuss the performance of these methods and their limitations under the challenging scenario of hard polytomies.
    Affiliation: Olave, Melisa. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Nacional Patagónico; Argentina
    Affiliation: Avila, Luciano Javier. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Nacional Patagónico; Argentina
    Affiliation: Sites, Jack W. Brigham Young University; United States
    Affiliation: Morando, Mariana. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Nacional Patagónico; Argentina

    Understanding How Stochasticity Impacts Reconstructions of Recent Species Divergent History.

    Molecular phylogenetic studies are complicated by the fact that differentiation between orthologous gene copies is determined by two stochastic processes: lineage sorting (the coalescent) and mutation. The former can lead to discrepancies between the species divergence history and genealogies, while the latter can result in differences between genealogies and estimated gene trees. Only recently has the idea of incorporating the coalescent process into species-tree estimation been applied in empirical phylogenetics. My thesis examines the impact of these two sources of stochasticity on reconstructing recent species divergence histories where incomplete lineage sorting is prevalent. Using simulated data, the effect of mutational variance on the accuracy of species-tree estimates is re-evaluated for different methods, ranging from the simplest “democratic vote” to a maximum-likelihood method that includes branch-length information; the implications for sampling design and for methods of gene-tree and species-tree estimation are discussed in Chapters II and III. While future phylogenetic studies will benefit from the new species-tree estimation methods, what is not clear is the extent to which species relationships estimated with data and methods that predate these developments are robust. I propose a parametric bootstrap species tree (PBST) approach to assess the reliability of past phylogenetic studies in which the stochastic lineage sorting process was overlooked, and apply the approach in a meta-analysis of East African cichlid phylogenies in Chapter IV. Another obstacle to applying species-tree estimation in empirical phylogenetic studies is obtaining a multi-locus sequence data set. Next-generation sequencing (NGS) combined with the reduced-representation library technique holds promise, but concerns exist about whether data with high NGS error rates are amenable to direct use in phylogenetic analysis. The use of NGS as primary data for reconstructing the divergence history of four montane grasshopper species is explored in Chapter IV, and parametric simulation is used to examine three possible sources of uncertainty in the estimated species tree: the true species divergence history, sequencing errors, and the error-correction method. Possible improvements in sampling design and the methodological developments needed for future studies are discussed. The last chapter explores the use of gene divergence history combined with geographic information to infer speciation models.
    Ph.D. Ecology and Evolutionary Biology. University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/91486/1/huatengh_1.pd

    Species Delimitation Using a Combined Coalescent and Information-Theoretic Approach: An Example from North American Myotis Bats

    Coalescent model–based methods for phylogeny estimation force systematists to confront issues related to the identification of species boundaries. Unlike conventional phylogenetic analysis, where species membership can be assessed qualitatively after the phylogeny is estimated, the phylogenies that are estimated under a coalescent model treat aggregates of individuals as the operational taxonomic units and thus require a priori definition of these sets because the models assume that the alleles in a given lineage are sampled from a single panmictic population. Fortunately, the use of coalescent model–based approaches allows systematists to conduct probabilistic tests of species limits by calculating the probability of competing models of lineage composition. Here, we conduct the first exploration of the issues related to applying such tests to a complex empirical system. Sequence data from multiple loci were used to assess species limits and phylogeny in a clade of North American Myotis bats. After estimating gene trees at each locus, the likelihood of models representing all hierarchical permutations of lineage composition was calculated and Akaike information criterion scores were computed. Metrics borrowed from information theory suggest that there is strong support for several models that include multiple evolutionary lineages within the currently described species Myotis lucifugus and M. evotis. Although these results are preliminary, they illustrate the practical importance of coupled species delimitation and phylogeny estimation
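As a rough illustration of how Akaike information criterion scores can be used to compare competing models of lineage composition, the sketch below computes AIC = 2k − 2 ln L and normalizes the scores into Akaike weights. The use of Akaike weights is an assumption about which information-theoretic metrics are meant, and the log-likelihoods and parameter counts are made-up placeholders, not values from the study.

```python
from math import exp


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def akaike_weights(aic_scores):
    """Relative support for each model, normalized to sum to 1."""
    best = min(aic_scores)
    relative = [exp(-(score - best) / 2) for score in aic_scores]
    total = sum(relative)
    return [value / total for value in relative]


# Hypothetical models: (log-likelihood, number of lineages treated as parameters).
models = {"one lineage": (-520.0, 1), "two lineages": (-510.0, 2), "three lineages": (-509.5, 3)}
scores = {name: aic(lnl, k) for name, (lnl, k) in models.items()}
weights = akaike_weights(list(scores.values()))
for name, weight in zip(scores, weights):
    print(f"{name}: AIC={scores[name]:.1f}, weight={weight:.3f}")
```

A model that splits a described species into more lineages is only favored when its gain in likelihood outweighs the penalty for the extra parameters.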

    From Gene Trees to Species Trees: Algorithms for Parsimonious Reconciliation

    One of the criteria for inferring a species tree from a collection of gene trees, when gene tree incongruence is assumed to be due to incomplete lineage sorting (ILS), is minimizing deep coalescences (MDC). Exact algorithms for inferring the species tree from rooted, binary trees under MDC were recently introduced. Nevertheless, in phylogenetic analyses of biological data sets, estimated gene trees may differ from true gene trees, be incompletely resolved, and not necessarily be rooted. Further, the MDC criterion considers only the topologies of the gene trees. The contributions of this work are threefold: (1) We propose new MDC formulations for the cases in which the gene trees are unrooted/binary, rooted/non-binary, and unrooted/non-binary, and prove structural theorems that allow us to extend the algorithms for the rooted/binary gene tree case to these cases in a straightforward manner. (2) We propose an algorithm for inferring a species tree from a collection of gene trees with coalescence times that takes into account not only the topology of the gene trees but also the coalescence times. (3) We devise MDC-based algorithms for cases in which multiple alleles per species may be sampled. We have implemented all of the algorithms in the PhyloNet software package and studied their performance in coalescent-based simulation studies in comparison with other methods including democratic vote, greedy consensus, STEM, and GLASS.
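The MDC criterion for the simplest case (rooted gene trees, one allele per species, identical leaf sets) can be sketched as follows. This is an illustrative scoring function, not PhyloNet's implementation or its exact search algorithms; the nested-tuple encoding and the names `extra_lineages` and `mdc_score` are assumptions. For each species-tree branch, the number of gene lineages leaving it is the number of maximal gene-tree clades contained in the branch's cluster, and every lineage beyond the first counts as one extra lineage.

```python
def leaf_set(tree):
    """Set of leaf labels under a node; leaves are strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def lineages_leaving(gene_tree, cluster):
    """Number of maximal gene-tree clades whose leaves all lie in `cluster`."""
    if leaf_set(gene_tree) <= cluster:
        return 1
    if isinstance(gene_tree, str):
        return 0
    return sum(lineages_leaving(child, cluster) for child in gene_tree)


def extra_lineages(species_tree, gene_tree):
    """Sum of (lineages - 1) over all non-root species-tree branches.

    Assumes both trees have the same leaf set, one allele per species.
    """
    total = 0

    def visit(node, is_root):
        nonlocal total
        if not is_root:
            total += lineages_leaving(gene_tree, leaf_set(node)) - 1
        if not isinstance(node, str):
            for child in node:
                visit(child, False)

    visit(species_tree, True)
    return total


def mdc_score(species_tree, gene_trees):
    """Total extra lineages over a collection of gene trees."""
    return sum(extra_lineages(species_tree, g) for g in gene_trees)


species = (("a", "b"), ("c", "d"))
genes = [(("a", "b"), ("c", "d")), (("a", "c"), ("b", "d"))]
print(mdc_score(species, genes))
```

Under this criterion, the inferred species tree is a candidate that minimizes `mdc_score` over the gene-tree collection; the search over candidates is omitted here.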