An analytical comparison of coalescent-based multilocus methods: The three-taxon case
Incomplete lineage sorting (ILS) is a common source of gene tree incongruence
in multilocus analyses. A large number of methods have been developed to infer
species trees in the presence of ILS. Here we provide a mathematical analysis
of several coalescent-based methods. Our analysis is performed on a three-taxon
species tree and assumes that the gene trees are correctly reconstructed along
with their branch lengths
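As a minimal illustration of the three-taxon setting (a sketch, not the analysis from the paper above): under the standard multispecies coalescent with species tree ((A,B),C) and an internal branch of length t in coalescent units, the gene tree topology matching the species tree has probability 1 - (2/3)e^(-t), and each of the two discordant topologies has probability (1/3)e^(-t). The function name and the dictionary keys below are hypothetical, chosen only for this example.

```python
import math


def three_taxon_topology_probs(t):
    """Gene tree topology probabilities for the species tree ((A,B),C).

    t is the internal branch length in coalescent units (assumed >= 0).
    This is the textbook three-taxon result, not code from the paper.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # Probability of each discordant topology: the A and B lineages fail
    # to coalesce on the internal branch (e^-t), then one of the three
    # equally likely pairs coalesces first in the root population.
    discordant = math.exp(-t) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


probs = three_taxon_topology_probs(1.0)
```

As t approaches 0 all three topologies become equally likely, which is the regime where ILS makes gene trees least informative about the species tree.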
Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting
Phylogenetic networks are necessary to represent the tree of life expanded by
edges to represent events such as horizontal gene transfers, hybridizations or
gene flow. Not all species follow the paradigm of vertical inheritance of their
genetic material. While a great deal of research has flourished into the
inference of phylogenetic trees, statistical methods to infer phylogenetic
networks are still limited and under development. The main disadvantage of
existing methods is a lack of scalability. Here, we present a statistical
method to infer phylogenetic networks from multi-locus genetic data in a
pseudolikelihood framework. Our model accounts for incomplete lineage sorting
through the coalescent model, and for horizontal inheritance of genes through
reticulation nodes in the network. Computation of the pseudolikelihood is fast
and simple, and it avoids the burdensome calculation of the full likelihood
which can be intractable with many species. Moreover, estimation at the
quartet-level has the added computational benefit that it is easily
parallelizable. Simulation studies comparing our method to a full likelihood
approach show that our pseudolikelihood approach is much faster without
compromising accuracy. We applied our method to reconstruct the evolutionary
relationships among swordtails and platyfishes (Xiphophorus: Poeciliidae),
which is characterized by widespread hybridizations
PoMo : an allele frequency-based approach for species tree estimation
This work was supported by a grant from the Austrian Science Fund (FWF, P24551-B25 to C.K.). N.D.M. and D.S. were members of the Vienna Graduate School of Population Genetics, which is supported by a grant of the Austrian Science Fund (FWF, W1225-B20). N.D.M. was partially supported by the Institute for Emerging Infections, funded by the Oxford Martin School.
Incomplete lineage sorting can cause incongruencies of the overall species-level phylogenetic tree with the phylogenetic trees for individual genes or genomic segments. If these incongruencies are not accounted for, it is possible to incur several biases in species tree estimation. Here, we present a simple maximum likelihood approach that accounts for ancestral variation and incomplete lineage sorting. We use a POlymorphisms-aware phylogenetic MOdel (PoMo) that we have recently shown to efficiently estimate mutation rates and fixation biases from within- and between-species variation data. We extend this model to perform efficient estimation of species trees. We test the performance of PoMo in several different scenarios of incomplete lineage sorting using simulations and compare it with existing methods both in accuracy and computational speed. In contrast to other approaches, our model does not use coalescent theory but is allele frequency based. We show that PoMo is well suited for genome-wide species tree estimation and that on such data it is more accurate than previous approaches.
Parsimonious Inference of Hybridization in the Presence of Incomplete Lineage Sorting
Hybridization plays an important evolutionary role in several groups of organisms.
A phylogenetic approach to detect hybridization entails sequencing multiple loci
across the genomes of a group of species of interest, reconstructing their gene trees,
and taking their differences as indicators of hybridization. However, methods that
follow this approach mostly ignore population effects, such as incomplete lineage
sorting (ILS). Given that hybridization occurs between closely related organisms, ILS
may very well be at play and, hence, must be accounted for in the analysis
framework. To address this issue, we present a parsimony criterion for reconciling
gene trees within the branches of a phylogenetic network, and a local search heuristic
for inferring phylogenetic networks from collections of gene-tree topologies under this
criterion. This framework enables phylogenetic analyses while accounting for both
hybridization and ILS. Further, we propose two techniques for incorporating
information about uncertainty in gene-tree estimates. Our simulation studies
demonstrate the good performance of our framework in terms of identifying the
location of hybridization events, as well as estimating the proportions of genes that
underwent hybridization. Also, our framework shows good performance in terms of
efficiency on handling large data sets in our experiments. Further, in analyzing a
yeast data set, we demonstrate issues that arise when analyzing real data sets. While
a probabilistic approach was recently introduced for this problem, and while
parsimonious reconciliations have accuracy issues under certain settings, our
parsimony framework provides a much more computationally efficient technique for
this type of analysis. Our framework now allows for genome-wide scans for
hybridization, while also accounting for ILS
Linking great apes genome evolution across time scales using polymorphism-aware phylogenetic models
The genomes of related species contain valuable information on the history of the considered taxa. Great apes in particular exhibit variation of evolutionary patterns along their genomes. However, the great ape data also bring new challenges, such as the presence of incomplete lineage sorting and ancestral shared polymorphisms. Previous methods for genome-scale analysis are restricted to very few individuals or cannot disentangle the contribution of mutation rates and fixation biases. This represents a limitation both for the understanding of these forces as well as for the detection of regions affected by selection. Here, we present a new model designed to estimate mutation rates and fixation biases from genetic variation within and between species. We relax the assumption of instantaneous substitutions, modeling substitutions as mutational events followed by a gradual fixation. Hence, we straightforwardly account for shared ancestral polymorphisms and incomplete lineage sorting. We analyze genome-wide synonymous site alignments of human, chimpanzee, and two orangutan species. From each taxon, we include data from several individuals. We estimate mutation rates and GC-biased gene conversion intensity. We find that both mutation rates and biased gene conversion vary with GC content. We also find lineage-specific differences, with weaker fixation biases in orangutan species, suggesting a reduced historical effective population size. Finally, our results are consistent with directional selection acting on coding sequences in relation to exonic splicing enhancers.
An HMM-based Comparative Genomic Framework for Detecting Introgression in Eukaryotes
One outcome of interspecific hybridization and subsequent effects of
evolutionary forces is introgression, which is the integration of genetic
material from one species into the genome of an individual in another species.
The evolution of several groups of eukaryotic species has involved
hybridization, and cases of adaptation through introgression have been already
established. In this work, we report on a new comparative genomic framework for
detecting introgression in genomes, called PhyloNet-HMM, which combines
phylogenetic networks, that capture reticulate evolutionary relationships among
genomes, with hidden Markov models (HMMs), that capture dependencies within
genomes. A novel aspect of our work is that it also accounts for incomplete
lineage sorting and dependence across loci.
Application of our model to variation data from chromosome 7 in the mouse
(Mus musculus domesticus) genome detects a recently reported adaptive
introgression event involving the rodent poison resistance gene Vkorc1, in
addition to other newly detected introgression regions. Based on our analysis,
it is estimated that about 12% of all sites within chromosome 7 are of
introgressive origin (these cover about 18 Mbp of chromosome 7, and over 300
genes). Further, our model detects no introgression in two negative control
data sets. Our work provides a powerful framework for systematic analysis of
introgression while simultaneously accounting for dependence across sites,
point mutations, recombination, and ancestral polymorphism
A polynomial time algorithm for calculating the probability of a ranked gene tree given a species tree
In this paper, we provide a polynomial time algorithm to calculate the
probability of a ranked gene tree topology for a given species tree,
where a ranked tree topology is a tree topology with the internal vertices
being ordered. The probability of a gene tree topology can thus be calculated
in polynomial time if the number of orderings of the internal vertices is a
polynomial number. However, the complexity of calculating the probability of a
gene tree topology with an exponential number of rankings for a given species
tree remains unknown
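To make the notion of "number of orderings of the internal vertices" concrete, the following is a small sketch (not the algorithm from the paper above) that counts the rankings of a rooted tree topology. It assumes rankings are strict orderings in which every internal vertex is ranked after its ancestors, and uses the standard counting formula k! divided by the product, over internal vertices, of the number of internal vertices in each subtree. The nested-tuple tree encoding and the function name are hypothetical choices for this example.

```python
from math import factorial


def count_rankings(tree):
    """Count the rankings of a rooted tree topology.

    tree is a nested tuple whose leaves are strings, for example
    (("a", "b"), ("c", "d")). A ranking is assumed to be a strict
    ordering of the internal vertices that respects ancestry.
    """
    sizes = []  # internal-vertex count of the subtree at each internal vertex

    def internal_nodes(node):
        if not isinstance(node, tuple):
            return 0  # a leaf contributes no internal vertices
        k = 1 + sum(internal_nodes(child) for child in node)
        sizes.append(k)
        return k

    total = internal_nodes(tree)
    denom = 1
    for k in sizes:
        denom *= k
    # The quotient is always an integer for a valid tree.
    return factorial(total) // denom


caterpillar = ((("a", "b"), "c"), "d")
balanced = (("a", "b"), ("c", "d"))
n_caterpillar = count_rankings(caterpillar)
n_balanced = count_rankings(balanced)
```

A caterpillar topology has a single ranking, whereas balanced topologies admit many, which is why the number of rankings matters for the running time discussed in the abstract.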
