The Mystery of Two Straight Lines in Bacterial Genome Statistics. Release 2007
In special coordinates (codon position-specific nucleotide frequencies)
bacterial genomes form two straight lines in 9-dimensional space: one line for
eubacterial genomes, another for archaeal genomes. All 348 distinct
bacterial genomes available in GenBank in April 2007 belong to these lines
with high accuracy. The main challenge now is to explain the observed high
accuracy. A new phenomenon, complementary symmetry for codon
position-specific nucleotide frequencies, is observed. The results of an analysis
of several codon usage models are presented. We demonstrate that the
mean-field approximation, also known as the context-free model, the complete
independence model, or the Segre variety, can serve as a reasonable approximation
to the real codon usage. The first two principal components of codon usage
correlate strongly with genomic G+C content and the optimal growth temperature,
respectively. The variation of codon usage along the third component is related
to the curvature of the mean-field approximation. The first three eigenvalues in
the codon usage PCA explain 59.1%, 7.8% and 4.7% of the variation. Codon usage in the
eubacterial and archaeal genomes is clearly distributed along two third-order
curves with genomic G+C content as a parameter.
Comment: Significantly extended version with new data for all the 348 distinct
bacterial genomes available in Genbank in April 200
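
The mean-field (complete independence) model mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard reading of that model: each codon frequency is approximated by the product of the nucleotide frequencies at its three positions. The 12 position-specific frequencies satisfy three normalization constraints, which leaves 9 independent coordinates. The function names and the toy sequence are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGT"


def codon_counts(seq):
    """Count in-frame codons of a coding sequence (hypothetical helper)."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # drop an incomplete trailing codon
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    # Skip codons containing ambiguous symbols such as N.
    return Counter(c for c in codons if set(c) <= set(NUCLEOTIDES))


def position_frequencies(counts):
    """Return freqs[k][n]: frequency of nucleotide n at codon position k."""
    total = sum(counts.values())
    freqs = [dict.fromkeys(NUCLEOTIDES, 0.0) for _ in range(3)]
    for codon, n in counts.items():
        for k, nt in enumerate(codon):
            freqs[k][nt] += n / total
    return freqs


def mean_field(freqs):
    """Approximate each codon frequency by a product of position frequencies."""
    return {
        a + b + c: freqs[0][a] * freqs[1][b] * freqs[2][c]
        for a, b, c in product(NUCLEOTIDES, repeat=3)
    }


if __name__ == "__main__":
    counts = codon_counts("ATGGCCATGGCG")  # toy sequence, not real data
    freqs = position_frequencies(counts)
    approx = mean_field(freqs)
    print(approx["ATG"], counts["ATG"] / sum(counts.values()))
```

Comparing `approx` with the observed codon frequencies for a genome gives a rough sense of how well the independence assumption holds.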