1,993 research outputs found

    Variation in the Correlation of G + C Composition with Synonymous Codon Usage Bias among Bacteria

    G + C composition at the third codon position (GC3) is widely reported to be correlated with synonymous codon usage bias. However, no quantitative attempt has been made to compare the extent of this correlation among different genomes. Here, we applied Shannon entropy from information theory to measure the degree of GC3 bias and that of synonymous codon usage bias of each gene. The strength of the correlation of GC3 with synonymous codon usage bias, quantified by a correlation coefficient, varied widely among bacterial genomes, ranging from −0.07 to 0.95. Previous analyses suggesting that the relationship between GC3 and synonymous codon usage bias is independent of species are thus inconsistent with the more detailed analyses obtained here for individual species.
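
    The entropy-based measure mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the following assumes a two-state (G/C versus A/T) Shannon entropy in bits at third codon positions, and takes "bias" as one minus that entropy. The function names and the choice of normalization are hypothetical, not the authors' definitions.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy in bits of a discrete distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def gc3_entropy(cds):
    """Entropy of the G/C vs. A/T split at third codon positions.

    Assumption: `cds` is an in-frame coding sequence; ambiguous bases are ignored.
    """
    third = cds.upper()[2::3]
    gc = sum(base in "GC" for base in third)
    at = sum(base in "AT" for base in third)
    return shannon_entropy([gc, at])


def gc3_bias(cds):
    """Assumed bias score: 0 for a balanced split, 1 for all-GC or all-AT."""
    return 1.0 - gc3_entropy(cds)
```

    Under these assumptions, a gene whose third positions are evenly split between G/C and A/T scores 0, and a gene that uses only G or C there scores 1. Correlating such per-gene scores with a codon usage bias score across a genome would give one correlation coefficient per genome.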

    Environmental shaping of codon usage and functional adaptation across microbial communities.

    Microbial communities represent the largest portion of the Earth's biomass. Metagenomics projects use high-throughput sequencing to survey these communities and shed light on genetic capabilities that enable microbes to inhabit every corner of the biosphere. Metagenome studies are generally based on (i) classifying and ranking functions of identified genes; and (ii) estimating the phyletic distribution of constituent microbial species. To understand microbial communities at the systems level, it is necessary to extend these studies beyond the species' boundaries and capture higher levels of metabolic complexity. We evaluated 11 metagenome samples and demonstrated that microbes inhabiting the same ecological niche share common preferences for synonymous codons, regardless of their phylogeny. By exploring concepts of translational optimization through codon usage adaptation, we demonstrated that community-wide bias in codon usage can be used as a prediction tool for lifestyle-specific genes across the entire microbial community, effectively considering microbial communities as meta-genomes. These findings set up a 'functional metagenomics' platform for the identification of genes relevant for adaptations of entire microbial communities to environments. Our results provide valuable arguments in defining the concept of microbial species through the context of their interactions within the community.

    Conspiracy in bacterial genomes

    The rank-ordered distribution of codon usage frequencies for 123 bacteria is best fitted by a three-parameter function that is the sum of a constant, an exponential, and a linear term in the rank n. The parameters depend on the total GC content, two of them parabolically. The rank-ordered distribution of the amino acids is fitted by a straight line. The Shannon entropy computed over all the codons is well fitted by a parabola in the GC content, while the partial entropies computed over subsets of the codons show distinct, peculiar behavior, exhibiting a first conspiracy effect. Moreover, the sum of the codon usage frequencies over particular sets, e.g. with C and A (respectively G and U) as the i-th nucleotide, shows a clear linear dependence on the GC content, exhibiting another conspiracy effect. Comment: revised version: introduction and conclusion enhanced, references added, figures added, some tables removed.

    The Mystery of Two Straight Lines in Bacterial Genome Statistics. Release 2007

    In special coordinates (codon position-specific nucleotide frequencies), bacterial genomes form two straight lines in 9-dimensional space: one line for eubacterial genomes, another for archaeal genomes. All 348 distinct bacterial genomes available in GenBank in April 2007 belong to these lines with high accuracy. The main challenge now is to explain the observed high accuracy. A new phenomenon of complementary symmetry for codon position-specific nucleotide frequencies is observed. The results of an analysis of several codon usage models are presented. We demonstrate that the mean-field approximation, which is also known as the context-free model, the complete independence model, or the Segre variety, can serve as a reasonable approximation to the real codon usage. The first two principal components of codon usage correlate strongly with genomic G+C content and the optimal growth temperature, respectively. The variation of codon usage along the third component is related to the curvature of the mean-field approximation. The first three eigenvalues in the codon usage PCA explain 59.1%, 7.8% and 4.7% of the variation. The codon usage of eubacterial and archaeal genomes is clearly distributed along two third-order curves with genomic G+C content as a parameter. Comment: Significantly extended version with new data for all the 348 distinct bacterial genomes available in GenBank in April 2007.
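
    The coordinates and the complete independence model named in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes the 9 dimensions arise from 12 position-specific frequencies (4 nucleotides at 3 codon positions) minus 3 normalization constraints, and all function names are hypothetical.

```python
def position_specific_frequencies(cds):
    """Nucleotide frequencies at codon positions 1, 2 and 3 of an in-frame sequence.

    Returns a list of three dicts mapping A, C, G, T to frequencies.
    Trailing bases that do not complete a codon and ambiguous bases are ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    counts = [{b: 0 for b in "ACGT"} for _ in range(3)]
    for i in range(usable):
        base = seq[i]
        if base in "ACGT":
            counts[i % 3][base] += 1
    freqs = []
    for pos in counts:
        total = sum(pos.values())
        freqs.append({b: (pos[b] / total if total else 0.0) for b in "ACGT"})
    return freqs


def reduced_vector(freqs):
    """Assumed 9-dimensional coordinates: drop T at each position,
    since the four frequencies at a position sum to 1."""
    return [pos[b] for pos in freqs for b in "ACG"]


def mean_field_codon_frequency(freqs, codon):
    """Complete independence model: the codon frequency is approximated by
    the product of the position-specific frequencies of its three bases."""
    c = codon.upper()
    return freqs[0][c[0]] * freqs[1][c[1]] * freqs[2][c[2]]
```

    For example, calling `position_specific_frequencies` on a concatenation of all coding sequences of a genome would yield one 9-dimensional point per genome, which is the kind of point the abstract describes as lying on a straight line.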

    Measure of synonymous codon usage diversity among genes in bacteria

    Background: In many bacteria, intragenomic diversity in synonymous codon usage among genes has been reported. However, no quantitative attempt has been made to compare the diversity levels among different genomes. Here, we introduce a mean dissimilarity-based index (Dmean) for quantifying the level of diversity in synonymous codon usage among all genes within a genome. Results: The application of Dmean to 268 bacterial genomes shows that in bacteria with extremely biased genomic G+C compositions there is little diversity in synonymous codon usage among genes. Furthermore, our findings contradict previous reports. For example, a low level of diversity in codon usage among genes has been reported for Helicobacter pylori, but based on Dmean, the diversity level of this species is higher than those of more than half of the bacteria tested here. The discrepancies between our findings and previous reports are probably due to differences in the methods used for measuring codon usage diversity. Conclusion: We recommend that Dmean be used to measure the diversity level of codon usage among genes. This measure can be applied to other compositional features such as amino acid usage and dinucleotide relative abundance as a genomic signature.

    Estimating translational selection in Eukaryotic Genomes

    Natural selection on codon usage is a pervasive force that acts on a large variety of prokaryotic and eukaryotic genomes. Despite this, obtaining reliable estimates of selection on codon usage has proved complicated, perhaps because the selection coefficients involved are very small. In this work, a population genetics model is used to measure the strength of selected codon usage bias, S, in 10 eukaryotic genomes. It is shown that the strength of selection is closely linked to expression and that reliable estimates of selection coefficients can only be obtained for genes with very similar expression levels. We compare the strength of selected codon usage for orthologous genes across all 10 genomes classified according to expression categories. Fungal genomes present the largest S values (2.24–2.56), whereas multicellular invertebrate and plant genomes present more moderate values (0.61–1.91). The large mammalian genomes (human and mouse) show low S values (0.22–0.51) for the most highly expressed genes. This might not be evidence for selection in these organisms, as the technique used here to estimate S does not properly account for nucleotide composition heterogeneity along such genomes. The relationship between estimated S values and empirical estimates of population size is presented here for the first time. It is shown, as theoretically expected, that population size has an important role in the operation of translational selection.

    Quantification of codon selection for comparative bacterial genomics

    Background: Statistics measuring codon selection seek to compare genes by their sensitivity to selection for translational efficiency, but existing statistics lack a model for testing the significance of differences between genes. Here, we introduce a new statistic for measuring codon selection, the Adaptive Codon Enrichment (ACE). Results: This statistic represents codon usage bias in terms of a probabilistic distribution, quantifying the extent that preferred codons are over-represented in the gene of interest relative to the mean and variance that would result from stochastic sampling of codons. Expected codon frequencies are derived from the observed codon usage frequencies of a broad set of genes, such that they are likely to reflect nonselective, genome-wide influences on codon usage (e.g. mutational biases). The relative adaptiveness of synonymous codons is deduced from the frequency of codon usage in a pre-selected set of genes relative to the expected frequency. The ACE can predict both transcript abundance during rapid growth and the rate of synonymous substitutions, with accuracy comparable to or greater than existing metrics. We further examine how the composition of reference gene sets affects the accuracy of the statistic, and suggest methods for selecting appropriate reference sets for any genome, including bacteriophages. Finally, we demonstrate that the ACE may naturally be extended to quantify the genome-wide influence of codon selection in a manner that is sensitive to a large fraction of codons in the genome. This reveals substantial variation among genomes, correlated with the tRNA gene number, even among groups of bacteria where previously proposed whole-genome measures show little variation. Conclusions: The statistical framework of the ACE allows rigorous comparison of the level of codon selection acting on genes, both within a genome and between genomes.