AsymmeTree: A Flexible Python Package for the Simulation of Complex Gene Family Histories
AsymmeTree is a flexible and easy-to-use Python package for the simulation of gene family
histories. It simulates species trees and considers the joint action of gene duplication, loss, conversion,
and horizontal transfer to evolve gene families along the species tree. To generate realistic scenarios,
evolution rate heterogeneity from various sources is modeled. Finally, nucleotide or amino acid
sequences (optionally with indels, among-site rate heterogeneity, and invariant sites) can be simulated
along the gene phylogenies. For all steps, users can choose from a spectrum of alternative methods
and parameters. These choices include most options that are commonly used in comparable tools but
also some that are usually not found, such as the innovation model for species evolution. While output
files for each individual step can be generated, AsymmeTree is primarily intended to be integrated in
complex Python pipelines designed to assess the performance of data analysis methods. It allows the
user to interact with, analyze, and possibly manipulate the simulated scenarios. AsymmeTree is freely
available on GitHub.
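The duplication and loss process named in the abstract above can be illustrated with a minimal, stdlib-only sketch. This is not AsymmeTree's API: the function name `simulate_copy_number` and all of its parameters are hypothetical. It assumes a simple linear birth-death model on a single branch, in which each gene copy duplicates and is lost at constant rates.

```python
import random


def simulate_copy_number(branch_length, dup_rate, loss_rate,
                         start_copies=1, rng=None):
    """Gillespie-style simulation of gene copy number along one branch.

    Hypothetical helper for illustration only. Each copy duplicates at
    rate `dup_rate` and is lost at rate `loss_rate` (per unit time).
    """
    rng = rng or random.Random()
    copies = start_copies
    time_left = branch_length
    while copies > 0:
        total_rate = copies * (dup_rate + loss_rate)
        if total_rate == 0:
            break  # no events can occur
        wait = rng.expovariate(total_rate)  # time to the next event
        if wait > time_left:
            break  # next event falls beyond the end of the branch
        time_left -= wait
        # Choose the event type in proportion to its rate.
        if rng.random() < dup_rate / (dup_rate + loss_rate):
            copies += 1
        else:
            copies -= 1
    return copies


if __name__ == "__main__":
    rng = random.Random(42)
    sizes = [simulate_copy_number(1.0, 0.5, 0.3, rng=rng) for _ in range(10)]
    print(sizes)
```

A full gene family simulator would additionally track the tree topology, handle speciation events, and model transfer and conversion; the sketch only follows the family size.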
A two-phase approach for detecting recombination in nucleotide sequences
Genetic recombination can produce heterogeneous phylogenetic histories within
a set of homologous genes. Delineating recombination events is important in the
study of molecular evolution, as inference of such events provides a clearer
picture of the phylogenetic relationships among different gene sequences or
genomes. Nevertheless, detecting recombination events can be a daunting task,
as the performance of different recombination-detecting approaches can vary,
depending on evolutionary events that take place after recombination. We
recently evaluated the effects of post-recombination events on the prediction
accuracy of recombination-detecting approaches using simulated nucleotide
sequence data. The main conclusion, supported by other studies, is that one
should not depend on a single method when searching for recombination events.
In this paper, we introduce a two-phase strategy, applying three statistical
measures to detect the occurrence of recombination events, and a Bayesian
phylogenetic approach in delineating breakpoints of such events in nucleotide
sequences. We evaluate the performance of these approaches using simulated
data, and demonstrate the applicability of this strategy to empirical data. The
two-phase strategy proves to be time-efficient when applied to large datasets,
and yields high-confidence results.
Comment: 5 pages, 3 figures. Chan CX, Beiko RG and Ragan MA (2007). A
two-phase approach for detecting recombination in nucleotide sequences. In
Hazelhurst S and Ramsay M (Eds) Proceedings of the First Southern African
Bioinformatics Workshop, 28-30 January, Johannesburg, 9-1
Accurate reconstruction of insertion-deletion histories by statistical phylogenetics
The Multiple Sequence Alignment (MSA) is a computational abstraction that
represents a partial summary either of indel history, or of structural
similarity. Taking the former view (indel history), it is possible to use
formal automata theory to generalize the phylogenetic likelihood framework for
finite substitution models (Dayhoff's probability matrices and Felsenstein's
pruning algorithm) to arbitrary-length sequences. In this paper, we report
results of a simulation-based benchmark of several methods for reconstruction
of indel history. The methods tested include a relatively new algorithm for
statistical marginalization of MSAs that sums over a stochastically-sampled
ensemble of the most probable evolutionary histories. For mammalian
evolutionary parameters on several different trees, the single most likely
history sampled by our algorithm appears less biased than histories
reconstructed by other MSA methods. The algorithm can also be used for
alignment-free inference, where the MSA is explicitly summed out of the
analysis. As an illustration of our method, we discuss reconstruction of the
evolutionary histories of human protein-coding genes.
Comment: 28 pages, 15 figures. arXiv admin note: text overlap with
arXiv:1103.434
High genetic diversity at the extreme range edge: nucleotide variation at nuclear loci in Scots pine (Pinus sylvestris L.) in Scotland
Nucleotide polymorphism at 12 nuclear loci was studied in Scots pine populations across an environmental gradient in Scotland, to evaluate the impacts of demographic history and selection on genetic diversity. At eight loci, diversity patterns were compared between Scottish and continental European populations. At these loci, a similar level of diversity (θsil=~0.01) was found in Scottish vs mainland European populations, contrary to expectations for recent colonization; however, less rapid decay of linkage disequilibrium was observed in the former (ρ=0.0086±0.0009 vs ρ=0.0245±0.0022, respectively). Scottish populations also showed a deficit of rare nucleotide variants (multi-locus Tajima's D=0.316 vs D=−0.379) and differed significantly from mainland populations in allelic frequency and/or haplotype structure at several loci. Within Scotland, western populations showed slightly reduced nucleotide diversity (πtot=0.0068) compared with those from the south and east (0.0079 and 0.0083, respectively) and a recombination-to-diversity ratio about three times higher (ρ/θ=0.71 vs 0.15 and 0.18, respectively). By comparison with results from coalescent simulations, the observed allelic frequency spectrum in the western populations was compatible with a relatively recent bottleneck (0.00175 × 4Ne generations) that reduced the population to about 2% of the present size. However, heterogeneity in the allelic frequency distribution among geographical regions in Scotland suggests that subsequent admixture of populations with different demographic histories may also have played a role.
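The abstract above reads a positive Tajima's D as a deficit of rare variants. A minimal sketch of the standard single-locus statistic (Tajima 1989) may help show why. The function name `tajimas_d` is hypothetical, and the sketch assumes gap-free aligned sequences of equal length; it is not the multi-locus procedure the authors used.

```python
import math
from collections import Counter


def tajimas_d(sequences):
    """Tajima's D for a list of aligned, equal-length sequences.

    Illustrative helper only. Returns NaN when there are no segregating
    sites, since the statistic is then undefined.
    """
    n = len(sequences)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to equal length")

    pairs = n * (n - 1) // 2
    seg_sites = 0  # S: number of segregating sites
    pi = 0.0       # mean number of pairwise differences
    for column in zip(*sequences):
        counts = Counter(column)
        if len(counts) > 1:
            seg_sites += 1
            same = sum(c * (c - 1) // 2 for c in counts.values())
            pi += (pairs - same) / pairs
    if seg_sites == 0:
        return math.nan

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


if __name__ == "__main__":
    # All variants are singletons (rare): D is negative.
    print(tajimas_d(["AAAA", "TAAA", "ATAA", "AATA"]))
    # All variants at intermediate frequency: D is positive.
    print(tajimas_d(["AAA", "AAA", "TTT", "TTT"]))
```

Rare variants add to the count of segregating sites but contribute little to pairwise diversity, so an excess of them pushes D below zero, while a deficit pushes it above zero.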
The inference of gene trees with species trees
Molecular phylogeny has focused mainly on improving models for the
reconstruction of gene trees based on sequence alignments. Yet, most
phylogeneticists seek to reveal the history of species. Although the histories
of genes and species are tightly linked, they are seldom identical, because
genes duplicate, are lost or horizontally transferred, and because alleles can
co-exist in populations for periods that may span several speciation events.
Building models describing the relationship between gene and species trees can
thus improve the reconstruction of gene trees when a species tree is known, and
vice-versa. Several approaches have been proposed to solve the problem in one
direction or the other, but in general neither gene trees nor species trees are
known. Only a few studies have attempted to jointly infer gene trees and
species trees. In this article we review the various models that have been used
to describe the relationship between gene trees and species trees. These models
account for gene duplication and loss, transfer or incomplete lineage sorting.
Some of them consider several types of events together, but none exists
currently that considers the full repertoire of processes that generate gene
trees along the species tree. Simulations as well as empirical studies on
genomic data show that combining gene tree-species tree models with models of
sequence evolution improves gene tree reconstruction. In turn, these better
gene trees provide a better basis for studying genome evolution or
reconstructing ancestral chromosomes and ancestral gene sequences. We predict
that gene tree-species tree methods that can deal with genomic data sets will
be instrumental to advancing our understanding of genomic evolution.
Comment: Review article in relation to the "Mathematical and Computational
Evolutionary Biology" conference, Montpellier, 201
A phylogeny of birds based on over 1,500 loci collected by target enrichment and high-throughput sequencing
Evolutionary relationships among birds in Neoaves, the clade comprising the
vast majority of avian diversity, have vexed systematists due to the ancient,
rapid radiation of numerous lineages. We applied a new phylogenomic approach to
resolve relationships in Neoaves using target enrichment (sequence capture) and
high-throughput sequencing of ultraconserved elements (UCEs) in avian genomes.
We collected sequence data from UCE loci for 32 members of Neoaves and one
outgroup (chicken) and analyzed data sets that differed in their amount of
missing data. An alignment of 1,541 loci that allowed missing data was 87%
complete and resulted in a highly resolved phylogeny with broad agreement
between the Bayesian and maximum-likelihood (ML) trees. Although results from
the 100% complete matrix of 416 UCE loci were similar, the Bayesian and ML
trees differed to a greater extent in this analysis, suggesting that increasing
from 416 to 1,541 loci led to increased stability and resolution of the tree.
Novel results of our study include surprisingly close relationships between
phenotypically divergent bird families, such as tropicbirds (Phaethontidae) and
the sunbittern (Eurypygidae) as well as between bustards (Otididae) and turacos
(Musophagidae). This phylogeny bolsters support for monophyletic waterbird and
landbird clades and also strongly supports controversial results from previous
studies, including the sister relationship between passerines and parrots and
the non-monophyly of raptorial birds in the hawk and falcon families. Although
significant challenges remain to fully resolving some of the deep relationships
in Neoaves, especially among lineages outside the waterbirds and landbirds,
this study suggests that increased data will yield an increasingly resolved
avian phylogeny.
Comment: 30 pages, 1 table, 4 figures, 1 supplementary table, 3 supplementary
figure