38 research outputs found

    Assessing the Accuracy of Ancestral Protein Reconstruction Methods

    The phylogenetic inference of ancestral protein sequences is a powerful technique for the study of molecular evolution, but any conclusions drawn from such studies are only as good as the accuracy of the reconstruction method. Every inference method leads to errors in the ancestral protein sequence, resulting in potentially misleading estimates of the ancestral protein's properties. To assess the accuracy of ancestral protein reconstruction methods, we performed computational population evolution simulations featuring near-neutral evolution under purifying selection, speciation, and divergence using an off-lattice protein model where fitness depends on the ability to be stable in a specified target structure. We were thus able to compare the thermodynamic properties of the true ancestral sequences with the properties of “ancestral sequences” inferred by maximum parsimony, maximum likelihood, and Bayesian methods. Surprisingly, we found that methods such as maximum parsimony and maximum likelihood that reconstruct a “best guess” amino acid at each position overestimate thermostability, while a Bayesian method that sometimes chooses less-probable residues from the posterior probability distribution does not. Maximum likelihood and maximum parsimony apparently tend to eliminate variants at a position that are slightly detrimental to structural stability simply because such detrimental variants are less frequent. Other properties of ancestral proteins might be similarly overestimated. This suggests that ancestral reconstruction studies require greater care to come to credible conclusions regarding functional evolution. Inferred functional patterns that mimic reconstruction bias should be reevaluated.
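
The contrast between a "best guess" reconstruction and sampling from the posterior can be illustrated with a minimal sketch. This is not the simulation used in the study: the posterior below is hypothetical (every site gives a stabilising residue `L` probability 0.8 and a slightly destabilising residue `G` probability 0.2), and the function names are illustrative assumptions.

```python
import random


def best_guess(posterior):
    """Pick the single most probable residue at each site (ML-style)."""
    return "".join(max(site, key=site.get) for site in posterior)


def sample_sequence(posterior, rng):
    """Draw one residue per site in proportion to its posterior probability."""
    seq = []
    for site in posterior:
        residues = list(site)
        weights = [site[r] for r in residues]
        seq.append(rng.choices(residues, weights=weights, k=1)[0])
    return "".join(seq)


# Hypothetical per-site posterior: 'L' is favoured, 'G' is the rarer,
# slightly destabilising variant. These numbers are made up for illustration.
posterior = [{"L": 0.8, "G": 0.2} for _ in range(50)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
ml_seq = best_guess(posterior)
sampled = [sample_sequence(posterior, rng) for _ in range(200)]
mean_g = sum(s.count("G") for s in sampled) / len(sampled)

print("G count in best-guess sequence:", ml_seq.count("G"))
print("mean G count in sampled sequences:", mean_g)
```

The best-guess sequence contains no `G` at all, whereas sampled sequences carry roughly the expected 20% of them; this is the mechanism by which per-site argmax can remove rare, slightly detrimental variants.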

    Bringing Molecules Back into Molecular Evolution

    Much molecular-evolution research is concerned with sequence analysis. Yet these sequences represent real, three-dimensional molecules with complex structure and function. Here I highlight a growing trend in the field to incorporate molecular structure and function into computational molecular-evolution work. I consider three focus areas: reconstruction and analysis of past evolutionary events, such as phylogenetic inference or methods to infer selection pressures; development of toy models and simulations to identify fundamental principles of molecular evolution; and atom-level, highly realistic computational modeling of molecular structure and function aimed at making predictions about possible future evolutionary events.

    More Taxa Are Not Necessarily Better for the Reconstruction of Ancestral Character States

    We show that the accuracy of reconstructing an ancestral state is not an increasing function of the size of taxon sampling. Comment: 21 pages

    Resurrection of an ancestral 5S rRNA

    Background: In addition to providing phylogenetic relationships, tree-making procedures such as parsimony and maximum likelihood can make specific predictions of actual historical sequences. Resurrection of such sequences can be used to understand early events in evolution. In the case of RNA, the nature of parsimony is such that when applied to multiple RNA sequences it typically predicts ancestral sequences that satisfy the base-pairing constraints associated with secondary structure. The case for such sequences being actual ancestors is greatly improved if they can be shown to be biologically functional. Results: A unique common ancestral sequence of 28 Vibrio 5S ribosomal RNA sequences predicted by parsimony was resurrected and found to be functional in the context of the E. coli cellular environment. The functionality of various point variants and intermediates that were constructed as part of the resurrection was examined in detail. When introduced separately, the changes at single-stranded positions and individual double variants at base-paired positions were also viable. An additional double variant was examined at a different base-paired position and it was also valid. Conclusions: The results show that, at least in the case of the 5S rRNAs considered here, ancestors predicted by parsimony are likely to be realistic when the prediction is not overly influenced by single outliers. It is especially noteworthy that the phenotype of the predicted ancestors could be anticipated as a cumulative consequence of the phenotypes of the individual variants that comprised them. Thus, point mutation data is potentially useful in evaluating the reasonableness of ancestral sequences predicted by parsimony or other methods. The results also suggest that, in the absence of significant tertiary structure constraints, double variants that preserve pairing in stem regions will typically be accepted. Overall, the results suggest that it will be feasible to resurrect additional meaningful 5S rRNA ancestors as well as ancestral sequences of many different types of RNA.
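
How parsimony predicts an ancestral state at a single alignment position can be sketched with the bottom-up pass of Fitch's algorithm. The tree, leaf names and nucleotides below are hypothetical and are not taken from the Vibrio data set; the sketch covers only one site on a rooted binary tree and omits the top-down pass that assigns states to internal nodes.

```python
def fitch(tree, leaf_states):
    """Bottom-up Fitch pass for one site.

    tree: a leaf name (str) or a 2-tuple of subtrees (rooted binary tree).
    leaf_states: dict mapping leaf name -> observed character.
    Returns (candidate state set at this node, minimum number of changes).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, leaf_states)
    right_set, right_cost = fitch(right, leaf_states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # Children disagree: keep both options and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical five-taxon tree with a single outlier ('E' carries U).
tree = ((("A", "B"), "C"), ("D", "E"))
states = {"A": "G", "B": "G", "C": "G", "D": "G", "E": "U"}

root_set, changes = fitch(tree, states)
print(root_set, changes)
```

In this toy case the single outlier costs one change but does not alter the inferred root state, which is the situation in which the abstract expects parsimony predictions to be realistic.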

    Mutational Patterns in RNA Secondary Structure Evolution Examined in Three RNA Families

    The goal of this work was to study mutational patterns in the evolution of RNA secondary structure. We analyzed bacterial tmRNA, RNaseP and eukaryotic telomerase RNA secondary structures, mapping structural variability onto phylogenetic trees constructed primarily from rRNA sequences. We found that secondary structures evolve both by whole stem insertion/deletion, and by mutations that create or disrupt stem base pairing. We analyzed the evolution of stem lengths and constructed substitution matrices describing the changes responsible for the variation in the RNA stem length. In addition, we used principal component analysis of the stem length data to determine the most variable stems in different families of RNA. This data provides new insights into the evolution of RNA secondary structures and patterns of variation in the lengths of double helical regions of RNA molecules. Our findings will facilitate design of improved mutational models for RNA structure evolution.

    MetaPIGA v2.0: maximum likelihood large phylogeny estimation using the metapopulation genetic algorithm and other stochastic heuristics

    Background: The development, in the last decade, of stochastic heuristics implemented in robust application software has made large phylogeny inference a key step in most comparative studies involving molecular sequences. Still, the choice of phylogeny inference software is often dictated by a combination of parameters not related to the raw performance of the implemented algorithm(s) but rather by practical issues such as ergonomics and/or the availability of specific functionalities. Results: Here, we present MetaPIGA v2.0, a robust implementation of several stochastic heuristics for large phylogeny inference (under maximum likelihood), including a Simulated Annealing algorithm, a classical Genetic Algorithm, and the Metapopulation Genetic Algorithm (metaGA) together with complex substitution models, discrete Gamma rate heterogeneity, and the possibility to partition data. MetaPIGA v2.0 also implements the Likelihood Ratio Test, the Akaike Information Criterion, and the Bayesian Information Criterion for automated selection of substitution models that best fit the data. Heuristics and substitution models are highly customizable through manual batch files and command line processing. However, MetaPIGA v2.0 also offers an extensive graphical user interface for parameter setting, generating and running batch files, following run progress, and manipulating result trees. MetaPIGA v2.0 uses standard formats for data sets and trees, is platform independent, runs on 32- and 64-bit systems, and takes advantage of multiprocessor and multicore computers. Conclusions: The metaGA resolves the major problem inherent to classical Genetic Algorithms by maintaining high inter-population variation even under strong intra-population selection. Implementation of the metaGA together with additional stochastic heuristics in a single software package will allow rigorous optimization of each heuristic as well as a meaningful comparison of performances among these algorithms. MetaPIGA v2.0 gives access both to high customization for the phylogeneticist and to an ergonomic interface and functionalities assisting the non-specialist for sound inference of large phylogenetic trees using nucleotide sequences. MetaPIGA v2.0 and its extensive user manual are freely available to academics at http://www.metapiga.org.
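
Model selection by information criteria, as mentioned in the abstract, rests on two standard formulas: AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of free parameters, L the maximised likelihood and n the sample size. The sketch below applies them to made-up fits; it does not reproduce MetaPIGA's own implementation, and the log-likelihoods, the parameter counts (substitution-model parameters only) and the use of alignment length as n are assumptions for illustration.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


# Hypothetical fits: model name -> (maximised log-likelihood, free parameters).
fits = {
    "JC69": (-5230.0, 0),
    "HKY85": (-5105.0, 4),
    "GTR": (-5100.0, 8),
}
n_sites = 1000  # assumed alignment length, used as the sample size n

best_aic = min(fits, key=lambda m: aic(*fits[m]))
best_bic = min(fits, key=lambda m: bic(*fits[m], n_sites))

print("AIC selects:", best_aic)
print("BIC selects:", best_bic)
```

With these invented numbers AIC prefers the richer GTR model while BIC, which penalises extra parameters more heavily at this sample size, prefers HKY85; the two criteria need not agree.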

    Non-homogeneous models of sequence evolution in the Bio++ suite of libraries and programs

    Background: Accurately modeling the sequence substitution process is required for the correct estimation of evolutionary parameters, be they phylogenetic relationships, substitution rates or ancestral states; it is also crucial to simulate realistic data sets. Such simulation procedures are needed to estimate the null distribution of complex statistics, an approach referred to as parametric bootstrapping, and are also used to test the quality of phylogenetic reconstruction programs. It has often been observed that homologous sequences can vary widely in their nucleotide or amino-acid compositions, revealing that sequence evolution has changed substantially among lineages, and may therefore be most appropriately approached through non-homogeneous models. Several programs implementing such models have been developed, but they are limited in their possibilities: only a few particular models are available for likelihood optimization, and data sets cannot be easily generated using the resulting estimated parameters. Results: We hereby present a general implementation of non-homogeneous models of substitutions. It is available as dedicated classes in the Bio++ libraries and can hence be used in any C++ program. Two programs that use these classes are also presented. The first one, Bio++ Maximum Likelihood (BppML), estimates parameters of any non-homogeneous model and the second one, Bio++ Sequence Generator (BppSeqGen), simulates the evolution of sequences from these models. These programs allow the user to describe non-homogeneous models through a property file with a simple yet powerful syntax, without any programming required. Conclusion: We show that the general implementation introduced here can accommodate virtually any type of non-homogeneous model of sequence evolution, including heterotachous ones, while being computationally efficient. We furthermore illustrate the use of such general models for parametric bootstrapping, using tests of non-homogeneity applied to an already published ribosomal RNA data set.

    Reconstructed Ancestral Enzymes Impose a Fitness Cost upon Modern Bacteria Despite Exhibiting Favourable Biochemical Properties

    Ancestral sequence reconstruction has been widely used to study historical enzyme evolution, both from biochemical and cellular perspectives. Two properties of reconstructed ancestral proteins/enzymes are commonly reported relative to their contemporaries: high thermostability and high catalytic activity. Increased protein stability is associated with lower aggregation rates, higher soluble protein abundance and a greater capacity to evolve, and therefore, these proteins could be considered “superior” to their contemporary counterparts. In this study, we investigate the relationship between the favourable in vitro biochemical properties of reconstructed ancestral enzymes and the organismal fitness they confer in vivo. We have previously reconstructed several ancestors of the enzyme LeuB, which is essential for leucine biosynthesis. Our initial fitness experiments revealed that overexpression of ANC4, a reconstructed LeuB that exhibits high stability and activity, was only able to partially rescue the growth of a ΔleuB strain, and that a strain complemented with this enzyme was outcompeted by strains carrying one of its descendants. When we expanded our study to include five reconstructed LeuBs and one contemporary, we found that neither in vitro protein stability nor the catalytic rate was correlated with fitness. Instead, fitness showed a strong, negative correlation with estimated evolutionary age (based on phylogenetic relationships). Our findings suggest that, for reconstructed ancestral enzymes, superior in vitro properties do not translate into organismal fitness in vivo. The molecular basis of the relationship between fitness and the inferred age of ancestral LeuB enzymes is unknown, but may be related to the reconstruction process. We also hypothesise that the ancestral enzymes may be incompatible with the other, contemporary enzymes of the metabolic network. France. Agence nationale de la recherch