transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences
BACKGROUND: Alignments of homologous DNA sequences are crucial for comparative genomics and phylogenetic analysis. However, multiple alignment represents a computationally difficult problem. For protein-coding DNA sequences, it is more advantageous in terms of both speed and accuracy to align the amino-acid sequences specified by the DNA sequences rather than the DNA sequences themselves. Many implementations making use of this concept of "translated alignments" are incomplete in the sense that they require the user to manually translate the DNA sequences and to perform the amino-acid alignment. As such, they are not well suited to large-scale automated alignments of large and/or numerous DNA data sets. RESULTS: transAlign is an open-source Perl script that aligns protein-coding DNA sequences via their amino-acid translations to take advantage of the superior multiple-alignment capabilities and speed of an amino-acid alignment. It operates by translating each DNA sequence into its corresponding amino-acid sequence, passing the entire matrix to ClustalW for alignment, and then back-translating the resulting amino-acid alignment to derive the aligned DNA sequences. In the translation step, transAlign determines the optimal orientation and reading frame for each DNA sequence according to the desired genetic code. It also checks for apparent frame shifts in the DNA sequences and can handle frame-shifted sequences in one of three ways (delete, align as amino acids regardless, or profile align as DNA). As a set of comparative benchmarks derived from six protein-coding genes for mammals shows, the strategy implemented in transAlign always improves the speed and usually the apparent accuracy of the alignment of protein-coding DNA sequences. CONCLUSION: transAlign represents one of few full and cross-platform implementations of the concept of translated alignments. 
Both the advantages accruing from performing a translated alignment and the suite of user-definable options available in the program mean that transAlign is ideally suited for large-scale automated alignments of very large and/or very numerous protein-coding DNA data sets. However, the good performance offered by the program also translates to the alignment of any set of protein-coding sequences. transAlign, including the source code, is freely available at http://www.tierzucht.tum.de/Bininda-Emonds/ (under "Programs").
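The translate, align, and back-translate strategy described in the transAlign abstract can be illustrated with a minimal Python sketch. This is not transAlign's Perl code: all function names are hypothetical, only the standard genetic code is built in, the frame-selection rule (fewest internal stop codons) is an assumption because the abstract does not state the criterion used, and the amino-acid alignment is supplied by hand rather than obtained from ClustalW.

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def best_frame(dna):
    """Pick the orientation and frame with the fewest internal stop codons.

    ASSUMPTION: this criterion is illustrative; the abstract only says that
    the optimal orientation and reading frame are determined.
    Returns (dna trimmed to whole codons in that frame, its translation).
    """
    candidates = []
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            sub = strand[offset:]
            sub = sub[:len(sub) - len(sub) % 3]
            protein = translate(sub)
            stops = protein.rstrip("*").count("*")  # ignore a terminal stop
            candidates.append((stops, sub, protein))
    # min() keeps the first minimum, so forward frame 0 wins ties.
    _, sub, protein = min(candidates, key=lambda c: c[0])
    return sub, protein


def back_translate(aligned_protein, dna):
    """Thread codons onto an aligned protein: each gap becomes '---'."""
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    residues = sum(1 for aa in aligned_protein if aa != "-")
    if residues != len(codons):
        raise ValueError("aligned protein and DNA disagree in length")
    remaining = iter(codons)
    return "".join("---" if aa == "-" else next(remaining) for aa in aligned_protein)


if __name__ == "__main__":
    dna = {"seq1": "ATGGCCAAA", "seq2": "ATGAAA"}
    # Hand-written stand-in for the amino-acid alignment step.
    aligned_protein = {"seq1": "MAK", "seq2": "M-K"}
    for name, seq in dna.items():
        oriented, _ = best_frame(seq)
        print(name, back_translate(aligned_protein[name], oriented))
```

Because gaps are inserted only as whole codons, the back-translated DNA alignment keeps every sequence in frame, which is the property the translated-alignment approach relies on.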
Fast Genes and Slow Clades: Comparative Rates of Molecular Evolution in Mammals
Although interest in the rate of molecular evolution and the molecular clock remains high, our knowledge for most groups in these areas is derived largely from a patchwork of studies limited in both their taxon coverage and the number of genes examined. Using a comprehensive molecular data set of 44 genes (18 nDNA, 11 tRNA and 15 additional mtDNA genes) together with a virtually complete and dated phylogeny of extant mammals, I (1) describe differences in the rate of molecular evolution (i.e., substitution rate) within this group in an explicit phylogenetic and quantitative framework and (2) present the first attempt to localize the phylogenetic positions of any rate shifts. Significant rate differences were few and confirmed several long-held trends, including a progressive rate slowdown within hominids and a reduced substitution rate within Cetacea. However, many new patterns were also uncovered, including the mammalian orders being characterized generally by basal rate slowdowns. A link between substitution rate and the size of a clade (which derives from its net speciation rate) is also suggested, with the species-poor major clades (“orders”) showing more decreased rates that often extend throughout the entire clade. Significant rate increases were rare, with the rates within (murid) rodents being fast, but not significantly so with respect to other mammals as a whole. Despite clear lineage-specific differences, rates generally change gradually along these lineages, supporting the potential existence of a local molecular clock in mammals. Together, these results will lay the foundation for a broad-scale analysis to establish the correlates and causes of the rate of molecular evolution in mammals.
Correlates of substitution rate variation in mammalian protein-coding sequences
BACKGROUND: Rates of molecular evolution in different lineages can vary widely, and some of this
variation might be predictable from aspects of species' biology. Investigating such predictable rate
variation can help us to understand the causes of molecular evolution, and could also help to
improve molecular dating methods. Here we present a comprehensive study of the life history
correlates of substitution rate variation across the mammals, comparing results for mitochondrial
and nuclear loci, and for synonymous and non-synonymous sites. We use phylogenetic comparative
methods, refined to take into account the special nature of substitution rate data. Particular
attention is paid to the widespread correlations between the components of mammalian life
history, which can complicate the interpretation of results.
RESULTS: We find that mitochondrial synonymous substitution rates, estimated from the 9 longest
mitochondrial genes, show strong negative correlations with body mass and with maximum
recorded lifespan. But lifespan is the sole variable to remain after multiple regression and model
simplification. Nuclear synonymous substitution rates, estimated from 6 genes, show strong
negative correlations with body mass and generation time, and a strong positive correlation with
fecundity. In contrast to the mitochondrial results, the same trends are evident in rates of
nonsynonymous substitution.
CONCLUSION: A substantial proportion of variation in mammalian substitution rates can be
explained by aspects of their life history, implying that molecular and life history evolution are
closely interlinked in this group. The strength and consistency of the nuclear body mass effect
suggests that molecular dating studies may have been systematically misled, but also that methods
could be improved by incorporating the finding as a priori information. Mitochondrial synonymous
rates also show the body mass effect, but for apparently quite different reasons, and the strength
of the relationship with maximum lifespan provides support for the hypothesis that mtDNA
damage is causally linked to aging.
A phylogenetic supertree of the fowls (Galloanserae, Aves)
The fowls (Anseriformes and Galliformes) comprise one of the major lineages of birds and occupy almost all biogeographical regions of the world. The group contains the most economically important of all bird species, each with a long history of domestication, and is an ideal model for studying ecological and evolutionary patterns. Yet, despite the relatively large amount of systematic attention fowls have attracted because of their socio-economic and biological importance, the species-level relationships within this clade remain controversial. Here we used the supertree method matrix representation with parsimony to generate a robust estimate of species-level relationships of fowls. The supertree represents one of the most comprehensive estimates for the group to date, including 376 species (83.2% of all species; all 162 Anseriformes and 214 Galliformes) and all but one genus. The supertree was well-resolved (81.1%) and supported the monophyly of both Anseriformes and Galliformes. The supertree supported the partitioning of Anseriformes into the three traditional families Anhimidae, Anseranatidae, and Anatidae, although it provided relatively poor resolution within Anatidae. For Galliformes, the majority-rule supertree was largely consistent with the hypothesis of sequential sister-group relationships between Megapodiidae, Cracidae, and the remaining Galliformes. However, our species-level supertree indicated that more than 30% of the polytypic genera examined were not monophyletic, suggesting that results from genus-level comparative studies using the average of the constituent species’ traits should be interpreted with caution until analogous species-level comparative studies are available. Poorly resolved areas of the supertree reflect gaps or outstanding conflict within the existing phylogenetic database, highlighting areas in need of more study in addition to those species not present on the tree at all due to insufficient information.
Even so, our supertree will provide a valuable foundation for understanding the diverse biology of fowls in a robust phylogenetic framework.
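The matrix representation with parsimony (MRP) method named in this abstract recodes each source tree as binary characters before a parsimony search. The Python sketch below shows only that coding step, using the standard Baum-Ragan scheme: for every non-root internal node of a source tree, taxa inside the clade score 1, other taxa in that tree score 0, and taxa absent from the tree score "?". The nested-tuple tree format, the function names, and the toy trees are assumptions made for illustration; the sketch does not reproduce the authors' data or any weighting, and the parsimony analysis itself is left to dedicated software.

```python
def leaves(tree):
    """Return the set of taxon names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree, is_root=True):
    """Yield the leaf set of every internal node except the root."""
    if isinstance(tree, str):
        return
    if not is_root:
        yield leaves(tree)
    for child in tree:
        yield from clades(child, is_root=False)


def mrp_matrix(source_trees):
    """Build the Baum-Ragan MRP matrix as {taxon: character string}."""
    all_taxa = sorted(set().union(*(leaves(t) for t in source_trees)))
    matrix = {taxon: [] for taxon in all_taxa}
    for tree in source_trees:
        present = leaves(tree)
        for clade in clades(tree):
            for taxon in all_taxa:
                if taxon not in present:
                    state = "?"  # taxon missing from this source tree
                elif taxon in clade:
                    state = "1"  # taxon belongs to the clade
                else:
                    state = "0"  # taxon sampled but outside the clade
                matrix[taxon].append(state)
    return {taxon: "".join(states) for taxon, states in matrix.items()}


if __name__ == "__main__":
    # Two hypothetical source trees with partially overlapping taxa.
    trees = [(("A", "B"), ("C", "D")), (("A", "C"), "E")]
    for taxon, row in mrp_matrix(trees).items():
        print(taxon, row)
```

In practice the resulting matrix is usually analysed together with an all-zero outgroup so that the supertree can be rooted.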
SuperCAT: a supertree database for combined and integrative multilocus sequence typing analysis of the Bacillus cereus group of bacteria (including B. cereus, B. anthracis and B. thuringiensis)
The Bacillus cereus group of bacteria is an important group including mammalian and insect pathogens, such as B. anthracis, the anthrax bacterium; B. thuringiensis, used as a biological pesticide; and B. cereus, often involved in food poisoning incidents. To characterize the population structure and epidemiology of these bacteria, five separate multilocus sequence typing (MLST) schemes have been developed, which makes results difficult to compare. Therefore, we have developed a database that compiles and integrates MLST data from all five schemes for the B. cereus group, accessible at http://mlstoslo.uio.no/. Supertree techniques were used to combine the phylogenetic information from analysis of all schemes and datasets, in order to produce an integrated view of the B. cereus group population. The database currently contains strain information and sequence data for 1029 isolates and 26 housekeeping gene fragments, which can be searched by keywords, MLST scheme, or sequence similarity. Supertrees can be browsed according to various criteria such as species, isolate source, or genetic distance, and subtrees containing strains of interest can be extracted. Besides analysing the available data, users can enter their own sequences and compare them to the database and/or include them in the supertree reconstructions.
Inferring the Tree of Life: chopping a phylogenomic problem down to size?
The combination of molecular sequence data and bioinformatics has revolutionized phylogenetic inference over the past decade, vastly increasing the scope of the evolutionary trees that we are able to infer. A recent paper in BMC Biology describing a new phylogenomic pipeline to help automate the inference of evolutionary trees from public sequence databases provides another important tool in our efforts to derive the Tree of Life.
Phylogeny and divergence of the pinnipeds (Carnivora: Mammalia) assessed using a multigene dataset
BACKGROUND: Phylogenetic comparative methods are often improved by complete phylogenies with meaningful branch lengths (e.g., divergence dates). This study presents a dated molecular supertree for all 34 world pinniped species derived from a weighted matrix representation with parsimony (MRP) supertree analysis of 50 gene trees, each determined under a maximum likelihood (ML) framework. Divergence times were determined by mapping the same sequence data (plus two additional genes) on to the supertree topology and calibrating the ML branch lengths against a range of fossil calibrations. We assessed the sensitivity of our supertree topology in two ways: 1) a second supertree with all mtDNA genes combined into a single source tree, and 2) likelihood-based supermatrix analyses. Divergence dates were also calculated using a Bayesian relaxed molecular clock with rate autocorrelation to test the sensitivity of our supertree results further. RESULTS: The resulting phylogenies all agreed broadly with recent molecular studies, in particular supporting the monophyly of Phocidae, Otariidae, and the two phocid subfamilies, as well as an Odobenidae + Otariidae sister relationship; areas of disagreement were limited to four more poorly supported regions. Neither the supertree nor supermatrix analyses supported the monophyly of the two traditional otariid subfamilies, supporting suggestions for the need for taxonomic revision in this group. Phocid relationships were similar to other recent studies and deeper branches were generally well-resolved. Halichoerus grypus was nested within a paraphyletic Pusa, although relationships within Phocina tend to be poorly supported. Divergence date estimates for the supertree were in good agreement with other studies and the available fossil record; however, the Bayesian relaxed molecular clock divergence date estimates were significantly older.
CONCLUSION: Our results join other recent studies and highlight the need for a re-evaluation of pinniped taxonomy, especially as regards the subfamilial classification of otariids and the generic nomenclature of Phocina. Even with the recent publication of new sequence data, the available genetic sequence information for several species, particularly those in Arctocephalus, remains very limited, especially for nuclear markers. However, resolution of parts of the tree will probably remain difficult, even with additional data, due to apparent rapid radiations. Our study addresses the lack of a recent pinniped phylogeny that includes all species and robust divergence dates for all nodes, and will therefore prove indispensable to comparative and macroevolutionary studies of this group of carnivores.
- …