Consistency Properties of Species Tree Inference by Minimizing Deep Coalescences
Methods for inferring species trees from sets of gene trees need to account for the possibility of discordance among the gene trees. Assuming that discordance is caused by incomplete lineage sorting, species tree estimates can be obtained by finding those species trees that minimize the number of deep coalescence events required for a given collection of gene trees. Efficient algorithms now exist for applying the minimizing-deep-coalescence (MDC) criterion, and simulation experiments have demonstrated its promising performance. However, simulation results have also shown that the MDC criterion does not always infer the correct species tree. In this article, we investigate the consistency of the MDC criterion. Using the multispecies coalescent model, we show that there are indeed anomaly zones for the MDC criterion for asymmetric four-taxon species tree topologies, and for all species tree topologies with five or more taxa.
Peer reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/90434/1/cmb-2E2010-2E0102.pd
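The MDC criterion scores a candidate species tree by the number of extra lineages that a gene tree requires. The following Python sketch assumes rooted binary trees written as nested tuples and one sampled lineage per species. It counts extra lineages branch by branch and picks the best tree from a hand-supplied candidate list. The brute-force search is only an illustration. It is not one of the efficient MDC algorithms the abstract refers to, and the function names are hypothetical.

```python
from itertools import chain

# Trees are nested tuples of leaf labels, e.g. (("A", "B"), "C").
# Assumption: one gene copy per species, so gene and species leaves coincide.


def leaf_set(tree):
    """Return the frozenset of leaf labels below `tree`."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset(chain.from_iterable(leaf_set(child) for child in tree))


def clusters_with_parents(tree, parent=None):
    """List (cluster, parent_cluster) for every node; the root's parent is None."""
    here = leaf_set(tree)
    pairs = [(here, parent)]
    if not isinstance(tree, str):
        for child in tree:
            pairs.extend(clusters_with_parents(child, here))
    return pairs


def extra_lineages(species_tree, gene_tree):
    """Sum over non-root species branches of (lineages leaving the branch - 1)."""
    gene_pairs = clusters_with_parents(gene_tree)
    root = leaf_set(species_tree)
    cost = 0
    for cluster, _ in clusters_with_parents(species_tree):
        if cluster == root:
            continue  # the root branch is not scored
        # Lineages leaving this branch = maximal gene clusters nested inside it.
        exiting = sum(
            1 for c, p in gene_pairs
            if c <= cluster and (p is None or not p <= cluster)
        )
        cost += exiting - 1
    return cost


def mdc_best(candidates, gene_trees):
    """Brute-force MDC: the candidate with the smallest total extra-lineage cost."""
    return min(candidates, key=lambda s: sum(extra_lineages(s, g) for g in gene_trees))


ab_c, ac_b, bc_a = (("A", "B"), "C"), (("A", "C"), "B"), (("B", "C"), "A")
genes = [ab_c] * 3 + [ac_b, bc_a]
print(extra_lineages(ab_c, ac_b))            # 1
print(mdc_best([ab_c, ac_b, bc_a], genes))   # (('A', 'B'), 'C')
```

The scoring simply returns the minimizer, so nothing in it guards against the anomaly zones the abstract describes; the sketch only makes the criterion concrete.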
A polynomial time algorithm for calculating the probability of a ranked gene tree given a species tree
In this paper, we provide a polynomial time algorithm to calculate the probability of a ranked gene tree topology for a given species tree, where a ranked tree topology is a tree topology whose internal vertices are ordered. The probability of a gene tree topology can thus be calculated in polynomial time if the number of orderings of its internal vertices is polynomial in the number of taxa. However, the complexity of calculating the probability of a gene tree topology with an exponential number of rankings for a given species tree remains unknown.
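Whether the ranked-tree algorithm yields a polynomial-time method for unranked topologies depends on how many rankings a topology admits. The short sketch below is an illustration, not the paper's algorithm. It counts rankings with the hook-length formula for trees: with m internal nodes, the count is m! divided by the product, over internal nodes, of the number of internal nodes in each node's subtree. Trees are assumed to be nested tuples of leaf labels, and the helper names are hypothetical.

```python
from math import factorial, prod


def _internal_subtree_sizes(tree, sizes):
    """Record, for each internal node, how many internal nodes its subtree holds."""
    if isinstance(tree, str):
        return 0  # leaves are not ranked
    count = 1 + sum(_internal_subtree_sizes(child, sizes) for child in tree)
    sizes.append(count)
    return count


def num_rankings(tree):
    """Number of orderings of internal nodes that respect ancestry (rankings)."""
    sizes = []
    m = _internal_subtree_sizes(tree, sizes)
    return factorial(m) // prod(sizes)


caterpillar = ((("A", "B"), "C"), "D")
balanced4 = (("A", "B"), ("C", "D"))
balanced8 = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))
print(num_rankings(caterpillar))  # 1
print(num_rankings(balanced4))    # 2
print(num_rankings(balanced8))    # 80
```

Caterpillar topologies have a single ranking, while fully balanced topologies admit a number of rankings that grows faster than any polynomial in the number of taxa. That is the case the abstract leaves open.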
An analytical comparison of coalescent-based multilocus methods: The three-taxon case
Incomplete lineage sorting (ILS) is a common source of gene tree incongruence in multilocus analyses. A large number of methods have been developed to infer species trees in the presence of ILS. Here we provide a mathematical analysis of several coalescent-based methods. Our analysis is performed on a three-taxon species tree and assumes that the gene trees are correctly reconstructed along with their branch lengths.
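The multispecies coalescent gives closed-form gene tree topology probabilities in the three-taxon setting. For species tree ((A,B),C) with internal branch length t in coalescent units, the matching topology has probability 1 - (2/3)e^{-t}, and each of the other two topologies has probability (1/3)e^{-t}. The sketch below evaluates these, assuming one sampled lineage per species. It uses only topologies, whereas the analysis in the abstract also uses branch lengths. The function name is hypothetical.

```python
import math


def three_taxon_gene_tree_probs(t):
    """Gene tree topology probabilities for species tree ((A,B),C).

    t is the internal branch length in coalescent units; one lineage per species.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # With probability exp(-t), A and B fail to coalesce on the internal branch;
    # the three lineages then meet in the root population, and each pair is
    # equally likely (1/3) to coalesce first.
    mismatch = math.exp(-t) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * mismatch,
        "((A,C),B)": mismatch,
        "((B,C),A)": mismatch,
    }


for t in (0.0, 0.1, 1.0):
    probs = three_taxon_gene_tree_probs(t)
    print(t, round(probs["((A,B),C)"], 3), round(probs["((A,C),B)"], 3))
```

Since 1 - (2/3)e^{-t} is at least (1/3)e^{-t} whenever t >= 0, the matching topology is always the most probable one for three taxa. Anomalies of this kind first appear with four or more taxa.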
Coalescent histories for lodgepole species trees
Coalescent histories are combinatorial structures that describe, for a given gene tree and species tree, the possible lists of branches of the species tree on which the gene tree coalescences take place. Properties of the number of coalescent histories for gene trees and species trees affect a variety of probabilistic calculations in mathematical phylogenetics. Exact and asymptotic evaluations of the number of coalescent histories, however, are known only in a limited number of cases. Here we introduce a particular family of species trees, the lodgepole species trees, indexed so that each tree in the family has a specified number of taxa. We determine the number of coalescent histories for the lodgepole species trees in the case that the gene tree matches the species tree, and characterize how this number grows with the number of taxa. This computation demonstrates the existence of tree families in which the number of coalescent histories grows faster than exponentially. Further, it substantially improves the previous lower bound on the ratio of the largest to the smallest number of matching coalescent histories for trees with a given number of taxa. We discuss the implications of our enumerative results for phylogenetic computations.
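The definition of a coalescent history can be turned directly into a count. The following sketch is a generic recursion for the matching case, written from that definition. It assumes trees are rooted, binary, and given as nested tuples. It is not the closed form the abstract derives for lodgepole trees, whose definition is not reproduced here, and the function name is hypothetical.

```python
def count_matching_histories(species_tree):
    """Number of coalescent histories when the gene tree matches the species tree.

    The gene node matching species node v (at depth d, root depth 0) may coalesce
    on the branch above v or on any ancestral branch, encoded as a level 0..d
    (level 0 is the branch above the root). A child's level can be no smaller than
    its parent's, so a child never coalesces above its parent.
    """

    def histories(node, depth, level):
        # Ways to place all coalescences strictly below `node`,
        # given that `node` itself is placed at `level`.
        total = 1
        for child in node:
            if isinstance(child, str):
                continue  # leaves involve no coalescence
            child_depth = depth + 1
            total *= sum(
                histories(child, child_depth, lvl)
                for lvl in range(level, child_depth + 1)
            )
        return total

    if isinstance(species_tree, str):
        return 1
    return histories(species_tree, 0, 0)


caterpillar4 = ((("A", "B"), "C"), "D")
balanced4 = (("A", "B"), ("C", "D"))
print(count_matching_histories(caterpillar4))  # 5
print(count_matching_histories(balanced4))     # 4
```

For caterpillar trees the recursion reproduces the Catalan numbers (5 for four taxa, 14 for five), the known count for matching caterpillars. It recomputes subproblems without memoization, so it is intended for small trees only.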
Parsimonious Inference of Hybridization in the Presence of Incomplete Lineage Sorting
Hybridization plays an important evolutionary role in several groups of organisms. A phylogenetic approach to detecting hybridization entails sequencing multiple loci across the genomes of a group of species of interest, reconstructing their gene trees, and taking their differences as indicators of hybridization. However, methods that follow this approach mostly ignore population effects, such as incomplete lineage sorting (ILS). Given that hybridization occurs between closely related organisms, ILS may very well be at play and, hence, must be accounted for in the analysis framework. To address this issue, we present a parsimony criterion for reconciling gene trees within the branches of a phylogenetic network, and a local search heuristic for inferring phylogenetic networks from collections of gene-tree topologies under this criterion. This framework enables phylogenetic analyses that account for both hybridization and ILS. Further, we propose two techniques for incorporating information about uncertainty in gene-tree estimates. Our simulation studies demonstrate the good performance of our framework in identifying the locations of hybridization events, as well as in estimating the proportions of genes that underwent hybridization. Our framework also handled large data sets efficiently in our experiments. In addition, in analyzing a yeast data set, we demonstrate issues that arise when analyzing real data sets. While a probabilistic approach was recently introduced for this problem, and while parsimonious reconciliations have accuracy issues under certain settings, our parsimony framework provides a much more computationally efficient technique for this type of analysis. Our framework now allows for genome-wide scans for hybridization, while also accounting for ILS.
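The abstract's parsimony criterion reconciles each gene tree within the branches of a network. The sketch below substitutes a simpler, hypothetical proxy. The network is represented only by the list of trees it displays. Each gene tree is scored against its best displayed tree using the extra-lineage count, and the share of genes assigned to each displayed tree serves as a rough proportion estimate. This is an assumption made for illustration and is not claimed to be the paper's criterion. All names are hypothetical.

```python
from itertools import chain

# Hypothetical simplification: a network is given only as the trees it displays.


def leaf_set(tree):
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset(chain.from_iterable(leaf_set(c) for c in tree))


def clusters_with_parents(tree, parent=None):
    here = leaf_set(tree)
    pairs = [(here, parent)]
    if not isinstance(tree, str):
        for child in tree:
            pairs.extend(clusters_with_parents(child, here))
    return pairs


def extra_lineages(species_tree, gene_tree):
    """Deep-coalescence cost of gene_tree inside one (non-network) species tree."""
    gene_pairs = clusters_with_parents(gene_tree)
    root = leaf_set(species_tree)
    cost = 0
    for cluster, _ in clusters_with_parents(species_tree):
        if cluster != root:  # the root branch is not scored
            cost += sum(1 for c, p in gene_pairs
                        if c <= cluster and (p is None or not p <= cluster)) - 1
    return cost


def score_network(displayed_trees, gene_trees):
    """Total cost and per-displayed-tree gene proportions; ties go to the first tree."""
    counts = [0] * len(displayed_trees)
    total = 0
    for g in gene_trees:
        costs = [extra_lineages(t, g) for t in displayed_trees]
        best = min(range(len(costs)), key=costs.__getitem__)
        counts[best] += 1
        total += costs[best]
    return total, [n / len(gene_trees) for n in counts]


# Toy network: B is a hybrid of the A and C lineages, so it displays two trees.
displayed = [(("A", "B"), "C"), ("A", ("B", "C"))]
genes = [(("A", "B"), "C")] * 6 + [(("B", "C"), "A")] * 3 + [(("A", "C"), "B")]
print(score_network(displayed, genes))  # (1, [0.7, 0.3])
```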
Model-based approach to test hard polytomies in the Eulaemus clade of the most diverse South American lizard genus Liolaemus (Liolaemini, Squamata)
Lack of resolution in a phylogenetic tree is usually represented as a polytomy, and adding more data (loci and taxa) often resolves the species tree. These are the ‘soft’ polytomies; in other cases, additional data fail to resolve relationships, and these are the ‘hard’ polytomies. The latter case is often interpreted as a simultaneous radiation of lineages in the history of a clade. Although hard polytomies are difficult to address, model-based approaches provide new tools to test these hypotheses. Here, we used 144 species of the South American lizard clade Eulaemus to estimate phylogenies using a traditional concatenated matrix and three species tree methods: *BEAST, BEST, and minimizing deep coalescences (MDC). The different species tree methods produced largely discordant results, but all recovered the same polytomy (indicated by, e.g., very short internodes among lineages and low nodal support in Bayesian methods). We simulated data sets under eight explicit evolutionary models (including hard polytomies), tested these against empirical data (a total of 14 loci), and found support for two polytomies as the most plausible hypothesis for the diversification of this clade. We discuss the performance of these methods and their limitations under the challenging scenario of hard polytomies.
Affiliation: Olave, Melisa. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Nacional Patagónico; Argentina
Affiliation: Avila, Luciano Javier. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Nacional Patagónico; Argentina
Affiliation: Sites, Jack W. Brigham Young University; United States
Affiliation: Morando, Mariana. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Nacional Patagónico; Argentina
AUGIST: inferring species trees while accommodating gene tree uncertainty
Summary: AUGIST (accommodating uncertainty in genealogies while inferring species trees) is a new software package for inferring species trees while accommodating uncertainty in gene genealogies. It is written for the Mesquite software system and provides sampling procedures that incorporate uncertainty in gene tree reconstruction while providing confidence estimates for inferred species trees.
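The sampling idea behind AUGIST can be sketched generically. Repeatedly draw one gene tree per locus from that locus's set of plausible trees, infer a species tree from each draw, and report how often each species tree appears. The Python below is not AUGIST's Mesquite interface. `resampled_support` and the toy `majority_topology` inference function are hypothetical stand-ins, and trees are plain strings for simplicity.

```python
import random
from collections import Counter


def resampled_support(gene_tree_sets, infer, replicates=100, seed=0):
    """Estimate support for species trees by resampling one gene tree per locus.

    gene_tree_sets: one list of plausible gene trees per locus (for example,
    bootstrap or posterior samples). infer: any function mapping a list of gene
    trees to a species tree (a placeholder for a real inference method).
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    counts = Counter()
    for _ in range(replicates):
        sample = [rng.choice(trees) for trees in gene_tree_sets]
        counts[infer(sample)] += 1
    return {tree: n / replicates for tree, n in counts.items()}


def majority_topology(gene_trees):
    """Toy stand-in for species tree inference: the most frequent gene tree."""
    return Counter(gene_trees).most_common(1)[0][0]


loci = [["((A,B),C)", "((A,C),B)"], ["((A,B),C)"], ["((A,B),C)"]]
print(resampled_support(loci, majority_topology, replicates=200))  # {'((A,B),C)': 1.0}
```

Here the first locus is uncertain, but the other two loci outvote it in every replicate, so the resampled support for ((A,B),C) stays at 1.0.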
The Influence of Gene Flow on Species Tree Estimation: A Simulation Study
Gene flow among populations or species and incomplete lineage sorting (ILS) are two evolutionary processes responsible for generating gene tree discordance and therefore hindering species tree estimation. Numerous studies have evaluated the impacts of ILS on species tree inference, yet the ramifications of gene flow for species trees remain less studied. Here, we simulate and analyse multilocus sequence data generated with ILS and gene flow to quantify their impacts on species tree inference. We characterize species tree estimation errors under various models of gene flow: the isolation-migration model, the n-island model, gene flow between non-sister species or involving ancestral species, and species boundaries crossed by a single gene copy (allelic introgression) or by a single migrant individual. These patterns of gene flow are explored on species trees of different sizes (4 vs. 10 species), at different time scales (shallow vs. deep), and with different migration rates. Species trees are estimated with the multispecies coalescent model using Bayesian methods (BEST and *BEAST) and with a summary statistic approach (MPEST) that facilitates phylogenomic-scale analysis. Even in cases where the topology of the species tree is estimated with high accuracy, we find that gene flow can result in overestimates of population sizes (species tree dilation) and underestimates of species divergence times (species tree compression). Signatures of migration events remain present in the distribution of coalescent times for gene trees, and with sufficient data it is possible to identify the loci that have crossed species boundaries. These results highlight the need for careful sampling design in phylogeographic and species delimitation studies, as gene flow, introgression, or incorrect sample assignments can bias the estimation of the species tree topology and of parameters such as population sizes and divergence times.
- …