1,115 research outputs found

    Correcting the Bias of Empirical Frequency Parameter Estimators in Codon Models

    Markov models of codon substitution are powerful inferential tools for studying biological processes such as natural selection and preferences in amino acid substitution. The equilibrium character distributions of these models are almost always estimated using nucleotide frequencies observed in a sequence alignment, primarily as a matter of historical convention. In this note, we demonstrate that a popular class of such estimators are biased, and that this bias has an adverse effect on goodness of fit and estimates of substitution rates. We propose a “corrected” empirical estimator that begins with observed nucleotide counts, but accounts for the nucleotide composition of stop codons. We show via simulation that the corrected estimates outperform the de facto standard estimates not just by providing better estimates of the frequencies themselves, but also by leading to improved estimation of other parameters in the evolutionary models. On a curated collection of sequence alignments, our estimators show a significant improvement in goodness of fit compared to the approach. Maximum likelihood estimation of the frequency parameters appears to be warranted in many cases, albeit at a greater computational cost. Our results demonstrate that there is little justification, either statistical or computational, for continued use of the -style estimators

    Episodic Evolution and Adaptation of Chloroplast Genomes in Ancestral Grasses

    It has been suggested that the chloroplast genomes of the grass family, Poaceae, have undergone an elevated evolutionary rate compared to most other angiosperms, yet the details of this phenomenon have remained obscure. To know how the rate change occurred during evolution, estimation of the time-scale with reliable calibrations is needed. The recent finding of 65 Ma grass phytoliths in Cretaceous dinosaur coprolites places the diversification of the grasses to the Cretaceous period, and provides a reliable calibration in studying the tempo and mode of grass chloroplast evolution. By using chloroplast genome data from angiosperms and by taking account of new paleontological evidence, we now show that episodic rate acceleration both in terms of non-synonymous and synonymous substitutions occurred in the common ancestral branch of the core Poaceae (a group formed by rice, wheat, maize, and their allies) accompanied by adaptive evolution in several chloroplast proteins, while the rate reverted to the slow rate typical of most monocot species in the terminal branches. Our finding of episodic rate acceleration in the ancestral grasses accompanied by adaptive molecular evolution has a profound bearing on the evolution of grasses, which form a highly successful group of plants. The widely used model for estimating divergence times was based on the assumption of correlated rates between ancestral and descendant lineages. However, the assumption is proved to be inadequate in approximating the episodic rate acceleration in the ancestral grasses, and the assumption of independent rates is more appropriate. This finding has implications for studies of molecular evolutionary rates and time-scale of evolution in other groups of organisms

    Chromatin signature of embryonic pluripotency is established during genome activation

    available in PMC 2011 April 8. After fertilization the embryonic genome is inactive until transcription is initiated during the maternal–zygotic transition. This transition coincides with the formation of pluripotent cells, which in mammals can be used to generate embryonic stem cells. To study the changes in chromatin structure that accompany pluripotency and genome activation, we mapped the genomic locations of histone H3 molecules bearing lysine trimethylation modifications before and after the maternal–zygotic transition in zebrafish. Histone H3 lysine 27 trimethylation (H3K27me3), which is repressive, and H3K4me3, which is activating, were not detected before the transition. After genome activation, more than 80% of genes were marked by H3K4me3, including many inactive developmental regulatory genes that were also marked by H3K27me3. Sequential chromatin immunoprecipitation demonstrated that the same promoter regions had both trimethylation marks. Such bivalent chromatin domains also exist in embryonic stem cells and are thought to poise genes for activation while keeping them repressed. Furthermore, we found many inactive genes that were uniquely marked by H3K4me3. Despite this activating modification, these monovalent genes were neither expressed nor stably bound by RNA polymerase II. Inspection of published data sets revealed similar monovalent domains in embryonic stem cells. Moreover, H3K4me3 marks could form in the absence of both sequence-specific transcriptional activators and stable association of RNA polymerase II, as indicated by the analysis of an inducible transgene. These results indicate that bivalent and monovalent domains might poise embryonic genes for activation and that the chromatin profile associated with pluripotency is established during the maternal–zygotic transition. National Institutes of Health (U.S.) (grant 1R01 HG004069); National Institutes of Health (U.S.) (grant 5R01 GM56211); Human Frontier Science Program (Strasbourg, France) (LT-00090/2007); European Molecular Biology Organization (fellowship

    CodonTest: Modeling Amino Acid Substitution Preferences in Coding Sequences

    Codon models of evolution have facilitated the interpretation of selective forces operating on genomes. These models, however, assume a single rate of non-synonymous substitution irrespective of the nature of amino acids being exchanged. Recent developments have shown that models which allow for amino acid pairs to have independent rates of substitution offer improved fit over single rate models. However, these approaches have been limited by the necessity for large alignments in their estimation. An alternative approach is to assume that substitution rates between amino acid pairs can be subdivided into rate classes, dependent on the information content of the alignment. However, given the combinatorially large number of such models, an efficient model search strategy is needed. Here we develop a Genetic Algorithm (GA) method for the estimation of such models. A GA is used to assign amino acid substitution pairs to a series of rate classes, where is estimated from the alignment. Other parameters of the phylogenetic Markov model, including substitution rates, character frequencies and branch lengths are estimated using standard maximum likelihood optimization procedures. We apply the GA to empirical alignments and show improved model fit over existing models of codon evolution. Our results suggest that current models are poor approximations of protein evolution and thus gene and organism specific multi-rate models that incorporate amino acid substitution biases are preferred. We further anticipate that the clustering of amino acid substitution rates into classes will be biologically informative, such that genes with similar functions exhibit similar clustering, and hence this clustering will be useful for the evolutionary fingerprinting of genes

    Genome-Wide Modeling of Transcription Preinitiation Complex Disassembly Mechanisms using ChIP-chip Data

    Apparent occupancy levels of proteins bound to DNA in vivo can now be routinely measured on a genomic scale. A challenge in relating these occupancy levels to assembly mechanisms that are defined with biochemically isolated components lies in the veracity of assumptions made regarding the in vivo system. Assumptions regarding the behavior of molecules in vivo can be proven neither true nor false, and thus are necessarily subjective. Nevertheless, within those confines, connecting in vivo protein-DNA interaction observations with defined biochemical mechanisms is an important step towards fully defining and understanding assembly/disassembly mechanisms in vivo. To this end, we have developed a computational program, PathCom, that models in vivo protein-DNA occupancy data as biochemical mechanisms under the assumption that occupancy levels can be related to binding duration and explicitly defined assembly/disassembly reactions. We exemplify the process with the assembly of the general transcription factors (TBP, TFIIB, TFIIE, TFIIF, TFIIH, and RNA polymerase II) at the genes of the budding yeast Saccharomyces. Within the assumptions inherent in the system, our modeling suggests that TBP occupancy at promoters is rather transient compared to other general factors, despite the importance of TBP in nucleating assembly of the preinitiation complex. PathCom is suitable for modeling any assembly/disassembly pathway, given that all the proteins (or species) come together to form a complex

    Implications of the Plastid Genome Sequence of Typha (Typhaceae, Poales) for Understanding Genome Evolution in Poaceae

    Plastid genomes of the grasses (Poaceae) are unusual in their organization and rates of sequence evolution. There has been a recent surge in the availability of grass plastid genome sequences, but a comprehensive comparative analysis of genome evolution has not been performed that includes any related families in the Poales. We report on the plastid genome of Typha latifolia, the first non-grass Poales sequenced to date, and we present comparisons of genome organization and sequence evolution within Poales. Our results confirm that grass plastid genomes exhibit acceleration in both genomic rearrangements and nucleotide substitutions. Poaceae have multiple structural rearrangements, including three inversions, three gene losses (accD, ycf1, ycf2), intron losses in two genes (clpP, rpoC1), and expansion of the inverted repeat (IR) into both large and small single-copy regions. These rearrangements are restricted to the Poaceae, and IR expansion into the small single-copy region correlates with the phylogeny of the family. Comparisons of 73 protein-coding genes for 47 angiosperms including nine Poaceae genera confirm that the branch leading to Poaceae has significantly accelerated rates of change relative to other monocots and angiosperms. Furthermore, rates of sequence evolution within grasses are lower, indicating a deceleration during diversification of the family. Overall there is a strong correlation between accelerated rates of genomic rearrangements and nucleotide substitutions in Poaceae, a phenomenon that has been noted recently throughout angiosperms. The cause of the correlation is unknown, but faulty DNA repair has been suggested in other systems including bacterial and animal mitochondrial genomes

    Advantages of a Mechanistic Codon Substitution Model for Evolutionary Analysis of Protein-Coding Sequences

    A mechanistic codon substitution model, in which each codon substitution rate is proportional to the product of a codon mutation rate and the average fixation probability depending on the type of amino acid replacement, has advantages over nucleotide, amino acid, and empirical codon substitution models in evolutionary analysis of protein-coding sequences. It can approximate a wide range of codon substitution processes. If no selection pressure on amino acids is taken into account, it will become equivalent to a nucleotide substitution model. If mutation rates are assumed not to depend on the codon type, then it will become essentially equivalent to an amino acid substitution model. Mutation at the nucleotide level and selection at the amino acid level can be separately evaluated. The present scheme for single nucleotide mutations is equivalent to the general time-reversible model, but multiple nucleotide changes in infinitesimal time are allowed. Selective constraints on the respective types of amino acid replacements are tailored to each gene in a linear function of a given estimate of selective constraints. Their good estimates are those calculated by maximizing the respective likelihoods of empirical amino acid or codon substitution frequency matrices. Akaike and Bayesian information criteria indicate that the present model performs far better than the other substitution models for all five phylogenetic trees of highly-divergent to highly-homologous sequences of chloroplast, mitochondrial, and nuclear genes. It is also shown that multiple nucleotide changes in infinitesimal time are significant in long branches, although they may be caused by compensatory substitutions or other mechanisms. The variation of selective constraint over sites fits the datasets significantly better than variable mutation rates, except for 10 slow-evolving nuclear genes of 10 mammals. A critical finding for phylogenetic analysis is that assuming variable mutation rates over sites leads to the overestimation of branch lengths

    Mutationism and the Dual Causation of Evolutionary Change

    The rediscovery of Mendel's laws a century ago launched the science that William Bateson called "genetics," and led to a new view of evolution combining selection, particulate inheritance, and the newly characterized phenomenon of "mutation." This "mutationist" view clashed with the earlier view of Darwin, and the later "Modern Synthesis," by allowing discontinuity, and by recognizing mutation (or more properly, mutation-and-altered-development) as a source of creativity, direction, and initiative. By the mid-20th century, the opposing Modern Synthesis view was a prevailing orthodoxy: under its influence, "evolution" was redefined as "shifting gene frequencies," that is, the sorting out of pre-existing variation without new mutations; and the notion that mutation-and-altered-development can exert a predictable influence on the course of evolutionary change was seen as heretical. Nevertheless, mutationist ideas re-surfaced: the notion of mutational determinants of directionality emerged in molecular evolution by 1962, followed in the 1980s by an interest among evolutionary developmental biologists in a shaping or creative role of developmental propensities of variation, and more recently, a recognition by theoretical evolutionary geneticists of the importance of discontinuity and of new mutations in adaptive dynamics. The synthetic challenge presented by these innovations is to integrate mutation-and-altered-development into a new understanding of the dual causation of evolutionary change--a broader and more predictive understanding that already can lay claim to important empirical and theoretical results--and to develop a research program appropriately emphasizing the emergence of variation as a cause of propensities of evolutionary change

    Contrasting Patterns of Transposable Element Insertions in Drosophila Heat-Shock Promoters

    The proximal promoter regions of heat-shock genes harbor a remarkable number of P transposable element (TE) insertions relative to both positive and negative control proximal promoter regions in natural populations of Drosophila melanogaster. We have screened the sequenced genomes of 12 species of Drosophila to test whether this pattern is unique to these populations. In the 12 species' genomes, transposable element insertions are no more abundant in promoter regions of single-copy heat-shock genes than in promoters with similar or dissimilar architecture. Also, insertions appear randomly distributed across the promoter region, whereas insertions clustered near the transcription start site in promoters of single-copy heat-shock genes in D. melanogaster natural populations. Hsp70 promoters exhibit more TE insertions per promoter than all other gene sets in the 12 species, as they do in natural populations of D. melanogaster. Insertions in the Hsp70 promoter region, however, cluster away from the transcription start site in the 12 species, but near it in natural populations of D. melanogaster. These results suggest that D. melanogaster heat-shock promoters are unique in terms of their interaction with transposable elements, and confirm that Hsp70 promoters are distinctive in TE insertions across Drosophila