Quantitative relationship between synonymous codon usage bias and GC composition across unicellular genomes

Abstract

BACKGROUND: Codon usage bias has been widely reported to correlate with GC composition. However, the quantitative relationship between codon usage bias and GC composition across species has not been reported.

RESULTS: Using an informatics method (SCUO) that we developed previously from Shannon information theory and maximum entropy theory, we investigated the quantitative relationship between codon usage bias and GC composition. Regression over 70 bacterial and 16 archaeal genomes gave, in bacteria, $\mathrm{SCUO} = -2.06\,\mathrm{GC3} + 2.05\,\mathrm{GC3}^2 + 0.65$ (r = 0.91) and, in archaea, $\mathrm{SCUO} = -1.79\,\mathrm{GC3} + 1.85\,\mathrm{GC3}^2 + 0.56$ (r = 0.89). We developed an analytical model, based on SCUO, that quantifies synonymous codon usage bias from GC compositions. The parameters of this model were inferred by inspecting the relationship between codon usage bias and GC composition across the same 70 bacterial and 16 archaeal genomes. We further simplified this relationship so that it uses only GC3. This simple model was supported by computational simulation.

CONCLUSIONS: The synonymous codon usage bias can be expressed simply as $1 + \frac{p}{2}\log_2\frac{p}{2} + \frac{1-p}{2}\log_2\frac{1-p}{2}$, where $p = \mathrm{GC3}$. The software we developed for measuring SCUO (codonO) is available at
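As an illustration only, the two fitted regressions and the simplified GC3-only expression from the abstract can be evaluated side by side. The sketch below is not the codonO software; the function names (`scuo_regression`, `scuo_model`) and the helper `_plog2` are hypothetical, and the convention that x*log2(x) equals 0 at x = 0 is an assumption made here so that the end points p = 0 and p = 1 are defined.

```python
import math

# Regression coefficients (a, b, c) as reported in the abstract, for
# SCUO = a * GC3 + b * GC3**2 + c.
REGRESSION_COEFFS = {
    "bacteria": (-2.06, 2.05, 0.65),
    "archaea": (-1.79, 1.85, 0.56),
}


def scuo_regression(gc3, domain="bacteria"):
    """Evaluate the fitted quadratic for the given domain (hypothetical helper)."""
    a, b, c = REGRESSION_COEFFS[domain]
    return a * gc3 + b * gc3 ** 2 + c


def _plog2(x):
    """Return x * log2(x), taking the value at x = 0 to be 0 (assumed convention)."""
    return x * math.log2(x) if x > 0 else 0.0


def scuo_model(p):
    """Evaluate 1 + (p/2)log2(p/2) + ((1-p)/2)log2((1-p)/2), with p = GC3."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("GC3 must lie in [0, 1]")
    return 1.0 + _plog2(p / 2) + _plog2((1 - p) / 2)


if __name__ == "__main__":
    # Tabulate the three curves over a coarse GC3 grid.
    for gc3 in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(
            f"GC3={gc3:.1f}  "
            f"bacteria={scuo_regression(gc3, 'bacteria'):.3f}  "
            f"archaea={scuo_regression(gc3, 'archaea'):.3f}  "
            f"model={scuo_model(gc3):.3f}"
        )
```

Under these assumptions the simplified expression is symmetric about GC3 = 0.5, where it reaches its minimum of 0, and rises to 0.5 at GC3 = 0 or 1; the fitted quadratics follow the same U shape with somewhat different end points.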
