18 research outputs found

    transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences

    BACKGROUND: Alignments of homologous DNA sequences are crucial for comparative genomics and phylogenetic analysis. However, multiple alignment represents a computationally difficult problem. For protein-coding DNA sequences, it is more advantageous in terms of both speed and accuracy to align the amino-acid sequences specified by the DNA sequences rather than the DNA sequences themselves. Many implementations making use of this concept of "translated alignments" are incomplete in the sense that they require the user to manually translate the DNA sequences and to perform the amino-acid alignment. As such, they are not well suited to large-scale automated alignments of large and/or numerous DNA data sets. RESULTS: transAlign is an open-source Perl script that aligns protein-coding DNA sequences via their amino-acid translations to take advantage of the superior multiple-alignment capabilities and speed of an amino-acid alignment. It operates by translating each DNA sequence into its corresponding amino-acid sequence, passing the entire matrix to ClustalW for alignment, and then back-translating the resulting amino-acid alignment to derive the aligned DNA sequences. In the translation step, transAlign determines the optimal orientation and reading frame for each DNA sequence according to the desired genetic code. It also checks for apparent frame shifts in the DNA sequences and can handle frame-shifted sequences in one of three ways (delete, align as amino acids regardless, or profile align as DNA). As a set of comparative benchmarks derived from six protein-coding genes for mammals shows, the strategy implemented in transAlign always improves the speed and usually the apparent accuracy of the alignment of protein-coding DNA sequences. CONCLUSION: transAlign represents one of few full and cross-platform implementations of the concept of translated alignments. 
Both the advantages accruing from performing a translated alignment and the suite of user-definable options available in the program mean that transAlign is ideally suited for large-scale automated alignments of very large and/or very numerous protein-coding DNA data sets. However, the good performance offered by the program also translates to the alignment of any set of protein-coding sequences. transAlign, including the source code, is freely available at http://www.tierzucht.tum.de/Bininda-Emonds/ (under "Programs").
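
The translate / align / back-translate idea described in the abstract above can be illustrated with a minimal Python sketch. This is not transAlign's Perl code. The function names `translate` and `back_translate` are hypothetical, the standard genetic code is assumed, and the aligned amino-acid sequence is supplied by hand rather than produced by ClustalW. The sketch assumes the DNA is already in the correct orientation and reading frame, and it does not attempt frame-shift handling.

```python
from itertools import product

# Standard genetic code, codons enumerated with base order T, C, A, G
# (first base varies slowest). "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def split_codons(dna):
    """Split an in-frame DNA string into complete codons (trailing bases dropped)."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


def translate(dna):
    """Translate in-frame DNA to amino acids; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(codon, "X") for codon in split_codons(dna))


def back_translate(aligned_protein, dna):
    """Thread the original codons onto a gapped amino-acid alignment row.

    Each residue consumes the next codon of `dna`; each gap '-' becomes '---'.
    """
    codons = split_codons(dna)
    out = []
    k = 0  # index of the next unused codon
    for residue in aligned_protein:
        if residue == "-":
            out.append("---")
            continue
        if k >= len(codons):
            raise ValueError("alignment row has more residues than codons")
        if CODON_TABLE.get(codons[k], "X") != residue.upper():
            raise ValueError(f"codon {codons[k]} does not encode {residue}")
        out.append(codons[k])
        k += 1
    if k != len(codons):
        raise ValueError("alignment row has fewer residues than codons")
    return "".join(out)


if __name__ == "__main__":
    dna = "ATGGCCAAA"
    protein = translate(dna)          # unaligned amino-acid sequence
    aligned = "M-AK"                  # hand-made stand-in for an aligner's output
    print(protein, back_translate(aligned, dna))
```

Because gaps are inserted as whole `---` triplets, the back-translated DNA alignment keeps every codon intact, which is the property that makes aligning via the amino-acid translation attractive for protein-coding sequences.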

    Correlates of substitution rate variation in mammalian protein-coding sequences

    BACKGROUND: Rates of molecular evolution in different lineages can vary widely, and some of this variation might be predictable from aspects of species' biology. Investigating such predictable rate variation can help us to understand the causes of molecular evolution, and could also help to improve molecular dating methods. Here we present a comprehensive study of the life history correlates of substitution rate variation across the mammals, comparing results for mitochondrial and nuclear loci, and for synonymous and non-synonymous sites. We use phylogenetic comparative methods, refined to take into account the special nature of substitution rate data. Particular attention is paid to the widespread correlations between the components of mammalian life history, which can complicate the interpretation of results. RESULTS: We find that mitochondrial synonymous substitution rates, estimated from the 9 longest mitochondrial genes, show strong negative correlations with body mass and with maximum recorded lifespan. But lifespan is the sole variable to remain after multiple regression and model simplification. Nuclear synonymous substitution rates, estimated from 6 genes, show strong negative correlations with body mass and generation time, and a strong positive correlation with fecundity. In contrast to the mitochondrial results, the same trends are evident in rates of nonsynonymous substitution. CONCLUSION: A substantial proportion of variation in mammalian substitution rates can be explained by aspects of their life history, implying that molecular and life history evolution are closely interlinked in this group. The strength and consistency of the nuclear body mass effect suggests that molecular dating studies may have been systematically misled, but also that methods could be improved by incorporating the finding as a priori information. 
Mitochondrial synonymous rates also show the body mass effect, but for apparently quite different reasons, and the strength of the relationship with maximum lifespan provides support for the hypothesis that mtDNA damage is causally linked to aging.

    Phylogeny and divergence of the pinnipeds (Carnivora: Mammalia) assessed using a multigene dataset

    BACKGROUND: Phylogenetic comparative methods are often improved by complete phylogenies with meaningful branch lengths (e.g., divergence dates). This study presents a dated molecular supertree for all 34 world pinniped species derived from a weighted matrix representation with parsimony (MRP) supertree analysis of 50 gene trees, each determined under a maximum likelihood (ML) framework. Divergence times were determined by mapping the same sequence data (plus two additional genes) on to the supertree topology and calibrating the ML branch lengths against a range of fossil calibrations. We assessed the sensitivity of our supertree topology in two ways: 1) a second supertree with all mtDNA genes combined into a single source tree, and 2) likelihood-based supermatrix analyses. Divergence dates were also calculated using a Bayesian relaxed molecular clock with rate autocorrelation to test the sensitivity of our supertree results further. RESULTS: The resulting phylogenies all agreed broadly with recent molecular studies, in particular supporting the monophyly of Phocidae, Otariidae, and the two phocid subfamilies, as well as an Odobenidae + Otariidae sister relationship; areas of disagreement were limited to four more poorly supported regions. Neither the supertree nor supermatrix analyses supported the monophyly of the two traditional otariid subfamilies, supporting suggestions for the need for taxonomic revision in this group. Phocid relationships were similar to other recent studies and deeper branches were generally well-resolved. Halichoerus grypus was nested within a paraphyletic Pusa, although relationships within Phocina tend to be poorly supported. Divergence date estimates for the supertree were in good agreement with other studies and the available fossil record; however, the Bayesian relaxed molecular clock divergence date estimates were significantly older. 
CONCLUSION: Our results join other recent studies and highlight the need for a re-evaluation of pinniped taxonomy, especially as regards the subfamilial classification of otariids and the generic nomenclature of Phocina. Even with the recent publication of new sequence data, the available genetic sequence information for several species, particularly those in Arctocephalus, remains very limited, especially for nuclear markers. However, resolution of parts of the tree will probably remain difficult, even with additional data, due to apparent rapid radiations. Our study addresses the lack of a recent pinniped phylogeny that includes all species and robust divergence dates for all nodes, and will therefore prove indispensable to comparative and macroevolutionary studies of this group of carnivores.

    A higher-level MRP supertree of placental mammals

    BACKGROUND: The higher-level phylogeny of placental mammals has long been a phylogenetic Gordian knot, with disagreement about both the precise contents of, and relationships between, the extant orders. A recent MRP supertree that favoured 'outdated' hypotheses (notably, monophyly of both Artiodactyla and Lipotyphla) has been heavily criticised for including low-quality and redundant data. We apply a stringent data selection protocol designed to minimise these problems to a much-expanded data set of morphological, molecular and combined source trees, to produce a supertree that includes every family of extant placental mammals. RESULTS: The supertree is well-resolved and supports both polyphyly of Lipotyphla and paraphyly of Artiodactyla with respect to Cetacea. The existence of four 'superorders' – Afrotheria, Xenarthra, Laurasiatheria and Euarchontoglires – is also supported. The topology is highly congruent with recent (molecular) phylogenetic analyses of placental mammals, but is considerably more comprehensive, being the first phylogeny to include all 113 extant families without making a priori assumptions of suprafamilial monophyly. Subsidiary analyses reveal that the data selection protocol played a key role in the major changes relative to a previously published higher-level supertree of placentals. CONCLUSION: The supertree should provide a useful framework for hypothesis testing in phylogenetic comparative biology, and supports the idea that biogeography has played a crucial role in the evolution of placental mammals. Our results demonstrate the importance of minimising poor and redundant data when constructing supertrees.

    Inferring the Tree of Life: chopping a phylogenomic problem down to size?

    The combination of molecular sequence data and bioinformatics has revolutionized phylogenetic inference over the past decade, vastly increasing the scope of the evolutionary trees that we are able to infer. A recent paper in BMC Biology describing a new phylogenomic pipeline to help automate the inference of evolutionary trees from public sequence databases provides another important tool in our efforts to derive the Tree of Life.

    Updating the evolutionary history of Carnivora (Mammalia): a new species-level supertree complete with divergence time estimates

    BACKGROUND: Although it has proven to be an important foundation for investigations of carnivoran ecology, biology and evolution, the complete species-level supertree for Carnivora of Bininda-Emonds et al. is showing its age. Additional, largely molecular sequence data are now available for many species and the advancement of computer technology means that many of the limitations of the original analysis can now be avoided. We therefore sought to provide an updated estimate of the phylogenetic relationships within all extant Carnivora, again using supertree analysis to be able to analyze as much of the global phylogenetic database for the group as possible. RESULTS: In total, 188 source trees were combined, representing 114 trees from the literature together with 74 newly constructed gene trees derived from nearly 45,000 bp of sequence data from GenBank. The greater availability of sequence data means that the new supertree is almost completely resolved and also better reflects current phylogenetic opinion (for example, supporting a monophyletic Mephitidae, Eupleridae and Prionodontidae; placing Nandinia binotata as sister to the remaining Feliformia). Following an initial rapid radiation, diversification rate analyses indicate a downturn in the net speciation rate within the past three million years as well as a possible increase some 18.0 million years ago; numerous diversification rate shifts within the order were also identified. CONCLUSIONS: Together, the two carnivore supertrees remain the only complete phylogenetic estimates for all extant species and the new supertree, like the old one, will form a key tool in helping us to further understand the biology of this charismatic group of carnivores.