667 research outputs found
Efficient Bayesian species tree inference under the multispecies coalescent
We develop a Bayesian method for inferring the species phylogeny under the multispecies coalescent (MSC)
model. To improve the mixing properties of the Markov chain Monte Carlo (MCMC) algorithm that traverses
the space of species trees, we implement two efficient MCMC proposals: the first is based on the
Subtree Pruning and Regrafting (SPR) algorithm and the second is based on a node-slider algorithm. Like
the Nearest-Neighbor Interchange (NNI) algorithm we implemented previously, both new algorithms propose
changes to the species tree while simultaneously altering the gene trees at multiple genetic loci to automatically
avoid conflicts with the newly proposed species tree. The method integrates over gene trees, naturally
taking account of the uncertainty of gene tree topology and branch lengths given the sequence data. A simulation
study was performed to examine the statistical properties of the new method. The method was found
to show excellent statistical performance, inferring the correct species tree with near certainty when 10 loci
were included in the dataset. The prior on species trees has some impact, particularly for small numbers
of loci. We analyzed several previously published datasets (both real and simulated) for rattlesnakes and
Philippine shrews, in comparison with alternative methods. The results suggest that the Bayesian coalescent-based
method is statistically more efficient than heuristic methods based on summary statistics, and that our
implementation is computationally more efficient than alternative full-likelihood methods under the MSC.
Parameter estimates for the rattlesnake data suggest drastically different evolutionary dynamics between the
nuclear and mitochondrial loci, even though they support largely consistent species trees. We discuss the different
challenges facing the marginal likelihood calculation and transmodel MCMC as alternative strategies
for estimating posterior probabilities for species trees.
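Under transmodel MCMC, the posterior probability of a species tree is estimated as the fraction of MCMC samples in which that tree is visited. The toy sketch below illustrates only that idea. It assumes a hypothetical, fixed set of three topologies with made-up unnormalized posterior weights, and it has no gene trees, branch lengths, SPR moves or node-slider moves. It shows the Metropolis-Hastings accept/reject step and the frequency estimator.

```python
import math
import random
from collections import Counter


def mh_topology_posterior(log_post, n_iter=200_000, seed=1):
    """Estimate posterior probabilities of a few candidate species trees.

    log_post maps a (hypothetical) topology label to its unnormalized
    log posterior. The proposal picks one of the *other* topologies
    uniformly at random. That proposal is symmetric, so the Hastings ratio is 1.
    """
    rng = random.Random(seed)
    states = list(log_post)
    current = states[0]
    counts = Counter()
    for _ in range(n_iter):
        proposal = rng.choice([s for s in states if s != current])
        log_alpha = log_post[proposal] - log_post[current]
        # Accept with probability min(1, posterior ratio).
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            current = proposal
        counts[current] += 1
    # Posterior probability of each topology = fraction of samples spent there.
    return {s: counts[s] / n_iter for s in states}
```

For example, `mh_topology_posterior({t: math.log(w) for t, w in {"((A,B),C)": 0.7, "((A,C),B)": 0.2, "((B,C),A)": 0.1}.items()})` returns frequencies close to 0.7, 0.2 and 0.1. In the real problem, the hard part is proposing good moves in a vast tree space. This toy sidesteps that by enumerating the space.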
Bayesian Species Delimitation Can Be Robust to Guide-Tree Inference Errors
Unguided Species Delimitation Using DNA Sequence Data from Multiple Loci
A method was developed for simultaneous Bayesian inference of species delimitation and species phylogeny using the multispecies coalescent model. The method eliminates the need for a user-specified guide tree in species delimitation and incorporates phylogenetic uncertainty in a Bayesian framework. The nearest-neighbor interchange algorithm was adapted to propose changes to the species tree, with the gene trees for multiple loci altered in the proposal to avoid conflicts with the newly proposed species tree. We also modify our previous scheme for specifying priors for species delimitation models to construct joint priors for models of species delimitation and species phylogeny. As in our earlier method, the modified algorithm integrates over gene trees, taking account of the uncertainty of gene tree topology and branch lengths given the sequence data. We conducted a simulation study to examine the statistical properties of the method using six populations (two sequences each) and three true species, with values of divergence times and ancestral population sizes that are realistic for recently diverged species. The results suggest that the method tends to be conservative, with high posterior probabilities being a reliable indicator of species status. Simulation results also indicate that the power of the method to delimit species increases with longer divergence times in the species tree and with more gene loci. Reanalyses of two datasets, of cavefish and of coast horned lizards, suggest considerable phylogenetic uncertainty even though the data are informative about species delimitation. We discuss the impact on Bayesian species delimitation of the prior on models of species delimitation and species phylogeny and of the prior on population size parameters (θ).
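The abstract says only that NNI was adapted to propose species-tree changes while the gene trees are adjusted jointly. The hedged illustration below covers the topological part alone. It enumerates the rooted NNI neighbors of a tree stored as nested 2-tuples, a representation assumed here and not taken from the paper. It ignores node ages, population sizes and the accompanying gene-tree changes.

```python
def canonical(tree):
    """Sort children recursively so equal rooted topologies compare equal."""
    if isinstance(tree, str):
        return tree
    left, right = (canonical(c) for c in tree)
    return tuple(sorted((left, right), key=repr))


def nni_neighbors(tree):
    """All rooted trees one nearest-neighbor interchange away from `tree`.

    Trees are nested 2-tuples with string leaves. At an internal node
    ((a, b), c), the interchange swaps c with a or with b.
    """
    if isinstance(tree, str):
        return set()
    out = []
    left, right = tree
    # Interchanges across the edge joining this node to an internal child.
    for inner, other in ((left, right), (right, left)):
        if not isinstance(inner, str):
            a, b = inner
            out.append(((a, other), b))
            out.append(((b, other), a))
    # Interchanges deeper inside each subtree.
    for sub in nni_neighbors(left):
        out.append((sub, right))
    for sub in nni_neighbors(right):
        out.append((left, sub))
    return {canonical(t) for t in out}
```

Each internal edge of a rooted binary tree yields two neighbors. The balanced four-taxon tree `(("A", "B"), ("C", "D"))` therefore has four NNI neighbors.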
Molecular phylogenetics: principles and practice
Phylogenies are important for addressing various biological questions, such as the relationships among species or genes, the origin and spread of viral infections, and the demographic changes and migration patterns of species. Advances in sequencing technologies have taken phylogenetic analysis to new heights. Phylogenies have permeated nearly every branch of biology, and the plethora of phylogenetic methods and software packages now available may seem daunting to an experimental biologist. Here, we review the major methods of phylogenetic analysis, including parsimony, distance, likelihood and Bayesian methods. We discuss their strengths and weaknesses and provide guidance for their use.
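Of the reviewed methods, parsimony is the easiest to state concretely. The minimal sketch below implements Fitch's standard small-parsimony algorithm. It counts the minimum number of character changes at one site on a fixed rooted binary tree. This is a textbook algorithm and not code from the review. The nested-tuple tree encoding is an assumption.

```python
def fitch(tree, states):
    """Fitch small-parsimony score for one site on a rooted binary tree.

    tree: nested 2-tuples whose leaves are taxon names (strings).
    states: dict mapping taxon name -> observed character state.
    Returns (candidate state set at this node, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    (left_set, left_cost), (right_set, right_cost) = (fitch(c, states) for c in tree)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed here.
        return common, left_cost + right_cost
    # Disjoint sets: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1
```

Take states `{"t1": "A", "t2": "A", "t3": "G", "t4": "G"}`. On `(("t1", "t2"), ("t3", "t4"))` the score is 1, and on `(("t1", "t3"), ("t2", "t4"))` it is 2. Parsimony prefers the first tree for this site.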
Molecular clock dating
This chapter reviews the history of the molecular clock,
its impact on molecular evolution, and the controversies
surrounding mechanisms of evolutionary rate variation
and the application of the clock to date species divergences.
We review current molecular clock dating methods,
including maximum likelihood and Bayesian methods,
with an emphasis on relaxing the clock and on
incorporating uncertainties into fossil calibrations.
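The baseline that relaxed-clock methods generalize is the strict clock. Two lineages that split at time $t$ have each accumulated substitutions at the same rate $r$. Their pairwise distance $d$ therefore grows as $2rt$. The numbers below are purely hypothetical and illustrate the arithmetic only. In practice, $r$ is itself estimated from calibrations such as fossils, with the uncertainty the chapter emphasizes.

```latex
% Strict clock: both lineages accumulate substitutions at rate r since divergence
d = 2 r t \quad\Longrightarrow\quad t = \frac{d}{2r}
% Hypothetical values: d = 0.02 substitutions/site, r = 10^{-9} substitutions/site/year
t = \frac{0.02}{2 \times 10^{-9}\ \text{yr}^{-1}} = 10^{7}\ \text{years}
```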
The Influence of Gene Flow on Species Tree Estimation: A Simulation Study
Gene flow among populations or species and incomplete lineage sorting (ILS) are two evolutionary processes responsible for generating gene tree discordance and therefore hindering species tree estimation. Numerous studies have evaluated the impacts of ILS on species tree inference, yet the ramifications of gene flow on species trees remain less studied. Here, we simulate and analyse multilocus sequence data generated with ILS and gene flow to quantify their impacts on species tree inference. We characterize species tree estimation errors under various models of gene flow, such as the isolation-migration model, the n-island model, and gene flow between non-sister species or involving ancestral species, as well as cases in which species boundaries are crossed by a single gene copy (allelic introgression) or by a single migrant individual. These patterns of gene flow are explored on species trees of different sizes (4 vs. 10 species), at different time scales (shallow vs. deep), and with different migration rates. Species trees are estimated with the multispecies coalescent model using Bayesian methods (BEST and *BEAST) and with a summary statistic approach (MPEST) that facilitates phylogenomic-scale analysis. Even in cases where the topology of the species tree is estimated with high accuracy, we find that gene flow can result in overestimates of population sizes (species tree dilation) and underestimates of species divergence times (species tree compression). Signatures of migration events remain present in the distribution of coalescent times for gene trees, and with sufficient data it is possible to identify those loci that have crossed species boundaries. These results highlight the need for careful sampling design in phylogeographic and species delimitation studies, as gene flow, introgression, or incorrect sample assignments can bias the estimation of the species tree topology and of parameter estimates such as population sizes and divergence times.
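The signature of migration in coalescent times has a simple explanation. Consider one sequence from each of two sister species under the multispecies coalescent with no gene flow. The two sequences cannot coalesce before the divergence time τ. In the ancestral population they then coalesce at rate 2/θ, so the expected waiting time is θ/2 (time and θ in expected substitutions per site). A locus whose coalescent time is younger than τ must therefore have crossed the species boundary, and pooling such loci drags estimates of τ down ("compression"). The sketch below is a toy that assumes coalescent times are known exactly. Real analyses must infer them from sequences. The parameter values in the usage are hypothetical.

```python
import random


def simulate_msc_pair_times(tau, theta, n_loci, seed=0):
    """Coalescent times for one sequence from each of two sister species
    under the MSC with NO gene flow (units: expected substitutions per site).

    The lineages cannot meet before the split at tau. In the ancestral
    population they coalesce at rate 2/theta, i.e. after Exp(mean theta/2).
    """
    rng = random.Random(seed)
    return [tau + rng.expovariate(2.0 / theta) for _ in range(n_loci)]


def flag_possible_gene_flow(times, tau):
    """Indices of loci that coalesced more recently than the species split.

    Under the pure MSC this cannot happen, so such loci suggest gene flow
    (or a mis-assigned sample, or an underestimated tau).
    """
    return [i for i, t in enumerate(times) if t < tau]
```

With `times = simulate_msc_pair_times(0.01, 0.005, 100)`, no locus is flagged. After setting a hypothetical introgressed locus `times[3] = 0.0005`, only index 3 is reported.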
Neutral Evolution as Diffusion in phenotype space: reproduction with mutation but without selection
The process of "Evolutionary Diffusion", i.e. reproduction with local
mutation but without selection in a biological population, resembles standard
diffusion in many ways. However, Evolutionary Diffusion allows the formation of
local peaks with a characteristic width that undergo drift, even in the
infinite-population limit. We analytically calculate the mean peak width and
the effective random-walk step size, and obtain the distribution of the peak
width, which has a power-law tail. We find that independent local mutations act
as a diffusion of interacting particles with increased step size.
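The toy simulation below illustrates the process on a one-dimensional phenotype axis. It is a finite-population model chosen for illustration, not the paper's analytical (infinite-population) treatment. Each generation, every offspring copies a uniformly chosen parent with no selection, then takes a Gaussian mutation step. The Gaussian step and the function names are assumptions. The population's standard deviation serves as a crude peak width. It can be compared with independent random walkers, whose spread grows like `step * sqrt(generations)`. No claim is made that this toy reproduces the paper's peak-width results.

```python
import random
import statistics


def evolutionary_diffusion(n_individuals, n_generations, step=1.0, seed=0):
    """Neutral reproduction with local mutation on a 1-D phenotype axis.

    Each generation, every offspring copies a uniformly chosen parent
    (no selection) and then mutates by a Gaussian step with sd `step`.
    Returns the list of phenotypes after the final generation.
    """
    rng = random.Random(seed)
    pop = [0.0] * n_individuals
    for _ in range(n_generations):
        pop = [rng.choice(pop) + rng.gauss(0.0, step) for _ in range(n_individuals)]
    return pop


def spread(pop):
    """Population standard deviation of phenotypes (a crude peak width)."""
    return statistics.pstdev(pop)
```

Tracking `statistics.mean(pop)` across runs shows the whole cluster wandering. This corresponds to the drift of the peak that the abstract describes.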
Coalescent-based genome analyses resolve the early branches of the euarchontoglires
Despite numerous large-scale phylogenomic studies, certain parts of the mammalian tree are extraordinarily difficult to resolve. We used the coding regions from 19 completely sequenced genomes to study the relationships within the super-clade Euarchontoglires (Primates, Rodentia, Lagomorpha, Dermoptera and Scandentia), because the placement of Scandentia within this clade is controversial. The difficulty in resolving this issue is due to the short time spans between the early divergences of Euarchontoglires, which may cause incongruent gene trees. The conflict in the data can be depicted by network analyses, and the contentious relationships are best reconstructed by coalescent-based analyses. Such methods are expected to be superior to analyses of concatenated data in reconstructing a species tree from numerous gene trees. The total concatenated dataset used to study the relationships in this group comprises 5,875 protein-coding genes (9,799,170 nucleotides) from all orders except Dermoptera (flying lemurs). Reconstruction of the species tree from 1,006 gene trees using coalescent models placed Scandentia as the sister group to the primates, in agreement with maximum likelihood analyses of concatenated nucleotide sequence data. Additionally, both analytical approaches favoured the tarsier as the sister taxon to Anthropoidea, placing it in the Haplorrhine clade. When divergence times are short, as in radiations over periods of a few million years, even genome-scale analyses struggle to resolve phylogenetic relationships. On these short branches, processes such as incomplete lineage sorting and possibly hybridization occur, making it preferable to base phylogenomic analyses on coalescent methods.
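A first look at conflict among gene trees is to tally how often each clade appears across loci. The sketch below is a simple summary of gene-tree discordance only. It is not the coalescent-based method used in the study, which explicitly models incomplete lineage sorting. The three gene trees in the usage are hypothetical and are not the study's 1,006 gene trees.

```python
from collections import Counter


def clades(tree):
    """Clades (frozensets of leaf names) of a rooted nested-tuple tree,
    excluding single leaves but including the root clade."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(c) for c in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def clade_support(gene_trees):
    """Fraction of gene trees that contain each clade."""
    counts = Counter(c for t in gene_trees for c in clades(t))
    return {c: n / len(gene_trees) for c, n in counts.items()}
```

Suppose two of three hypothetical gene trees group Primates with Scandentia and one groups Scandentia with Glires. Then `clade_support` gives those clades 2/3 and 1/3 support. When internal branches are short, even the best-supported clade can fall well short of unanimity.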
Clades and clans: a comparison study of two evolutionary models
The Yule-Harding-Kingman (YHK) model and the proportional to distinguishable
arrangements (PDA) model are two binary tree generating models that are widely
used in evolutionary biology. Understanding the distributions of clade sizes
under these two models provides valuable insights into macro-evolutionary
processes, and is important in hypothesis testing and Bayesian analyses in
phylogenetics. Here we show that these distributions are log-convex, which
implies that very large clades or very small clades are more likely to occur
under these two models. Moreover, we prove that there exists a critical value
$\kappa(n)$ for each $n \geq 4$ such that for a given clade with size $k$,
the probability that this clade is contained in a random tree with $n$ leaves
generated under the YHK model is higher than that under the PDA model if
$1 < k < \kappa(n)$, and lower if $\kappa(n) < k < n$. Finally, we extend our results
to binary unrooted trees, and obtain similar results for the distributions of
clan sizes.
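The crossover can be checked numerically from two standard closed forms, which the abstract does not state. Under YHK, the expected number of clades of size $k$ in an $n$-leaf tree is $2n/(k(k+1))$ for $1 \le k < n$. By exchangeability of leaf labels, each of the $\binom{n}{k}$ leaf subsets receives an equal share. Under PDA, every rooted binary labelled tree is equally likely, and there are $(2m-3)!!$ such trees on $m$ leaves. A fixed $k$-set is a clade in $(2k-3)!!\,(2n-2k-1)!!$ of them. The sketch below uses exact fractions. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def double_factorial(m):
    """m!! with the convention (-1)!! = 0!! = 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def p_yhk(n, k):
    """P(a given set of k leaves forms a clade) in a YHK tree with n leaves, 1 <= k < n."""
    return Fraction(2 * n, k * (k + 1)) / comb(n, k)


def p_pda(n, k):
    """Same probability under PDA (uniform over rooted binary labelled trees)."""
    return Fraction(double_factorial(2 * k - 3) * double_factorial(2 * n - 2 * k - 1),
                    double_factorial(2 * n - 3))


def first_k_where_pda_wins(n):
    """Smallest clade size 1 < k < n with p_yhk(n, k) < p_pda(n, k), or None."""
    for k in range(2, n):
        if p_yhk(n, k) < p_pda(n, k):
            return k
    return None
```

For $n = 4$, a given pair forms a clade with probability 2/9 under YHK versus 1/5 under PDA. A given triple does so with probability 1/6 versus 1/5, so the ordering flips between $k = 2$ and $k = 3$. For $n = 3$, the two models agree (1/3), which is consistent with the result being stated for $n \geq 4$.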
The Impact of Cross-Species Gene Flow on Species Tree Estimation
Recent analyses of genomic sequence data suggest cross-species gene flow is common in both plants and animals, posing challenges to species tree estimation. We examine the levels of gene flow needed to mislead species tree estimation with three species and either episodic introgressive hybridization or continuous migration between an outgroup and one ingroup species. Several species tree estimation methods are examined, including the majority-vote method based on the most common gene tree topology (with either the true or reconstructed gene trees used), the UPGMA method based on the average sequence distances (or average coalescent times) between species, and the full-likelihood method based on multilocus sequence data. Our results suggest that the majority-vote method based on gene tree topologies is more robust to gene flow than the UPGMA method based on coalescent times, and both are more robust than likelihood assuming a multispecies coalescent (MSC) model with no cross-species gene flow. Comparison of the continuous migration model with the episodic introgression model suggests that a small amount of gene flow per generation can cause drastic changes to the genetic history of the species and mislead species tree methods, especially if the species diverged through radiative speciation events. Estimates of parameters under the MSC with gene flow suggest that African mosquito species in the Anopheles gambiae species complex constitute such an example of extreme impact of gene flow on species phylogeny. [IM; introgression; migration; MSci; multispecies coalescent; species tree.]
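The majority-vote method works under the MSC without gene flow for the following reason. Take three species with species tree ((A,B),C) and internal branch length $T$ in coalescent units ($2N$ generations). The matching gene tree then has probability $1 - \tfrac{2}{3}e^{-T}$, and each of the two mismatching topologies has probability $\tfrac{1}{3}e^{-T}$. The matching topology is therefore always the most probable, and with enough loci it wins the vote. Gene flow between the outgroup and one ingroup species adds extra gene trees of one mismatching type, which can overturn this ordering. The sketch below models only the no-gene-flow case. It treats gene trees as known (the "true gene trees" setting), and its function names are hypothetical.

```python
import math
import random
from collections import Counter


def gene_tree_probs(internal_branch):
    """Rooted gene-tree topology probabilities for species tree ((A,B),C)
    under the MSC with no gene flow; internal_branch is in coalescent units."""
    p_mismatch = math.exp(-internal_branch) / 3.0
    return {"((A,B),C)": 1.0 - 2.0 * p_mismatch,
            "((A,C),B)": p_mismatch,
            "((B,C),A)": p_mismatch}


def simulate_gene_trees(internal_branch, n_loci, seed=0):
    """Draw independent gene-tree topologies, one per locus."""
    rng = random.Random(seed)
    probs = gene_tree_probs(internal_branch)
    return rng.choices(list(probs), weights=list(probs.values()), k=n_loci)


def majority_vote(gene_trees):
    """Species tree estimate = most common gene-tree topology."""
    return Counter(gene_trees).most_common(1)[0][0]
```

With $T = 0.5$, the matching topology has probability of about 0.60 and each alternative about 0.20. A vote over 1,000 simulated loci recovers ((A,B),C).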
- …
