
    Investigations of Oligonucleotide Usage Variance Within and Between Prokaryotes

    Oligonucleotide usage in archaeal and bacterial genomes can be linked to a number of properties, including codon usage (trinucleotides), DNA base-stacking energy (dinucleotides), and DNA structural conformation (di- to tetranucleotides). We wanted to assess the statistical information potential of different DNA ‘word sizes’ and to explore how oligonucleotide frequencies differ between coding and non-coding regions. In addition, we used oligonucleotide frequencies to investigate DNA composition and how DNA sequence patterns change within and between prokaryotic organisms. Among other results, we found that prokaryotic chromosomes can be described by hexanucleotide frequencies, suggesting that prokaryotic DNA is predominantly short-range correlated, i.e., information in prokaryotic genomes is encoded in short oligonucleotides. Oligonucleotide usage varied more within AT-rich and host-associated genomes than within GC-rich and free-living genomes, and this variation was mainly located in non-coding regions. Bias (selective pressure) in tetranucleotide usage correlated with GC content, and coding regions were more biased than non-coding regions. Non-coding regions were also, on average, approximately 5.5% more AT-rich than coding regions in the 402 chromosomes examined. Pronounced DNA compositional differences were found both within and between AT-rich and GC-rich genomes. In non-coding regions, GC-rich genomes were more similar and more biased in tetranucleotide usage than AT-rich genomes. The differences between AT-rich and GC-rich genomes may be attributable to lifestyle, since tetranucleotide usage within host-associated bacteria was, on average, more dissimilar and less biased than that of free-living archaea and bacteria.
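As a rough illustration of the oligonucleotide ("word") frequencies discussed above, the following Python sketch counts overlapping k-mers and computes AT content. It is not the study's pipeline: the function names are hypothetical, only one strand is counted, and words containing non-ACGT characters are skipped; the abstract does not say how these cases were handled.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
VALID = set(BASES)

def oligo_frequencies(seq: str, k: int) -> dict[str, float]:
    """Relative frequency of every possible k-letter DNA word in `seq`.

    Counts overlapping words on the given strand only; words containing
    any non-ACGT character (e.g. 'N') are skipped (an assumption).
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= VALID
    )
    total = sum(counts.values())
    # Report all 4**k words, including those never observed (frequency 0.0).
    return {
        word: (counts[word] / total if total else 0.0)
        for word in map("".join, product(BASES, repeat=k))
    }

def at_content(seq: str) -> float:
    """Fraction of A+T among unambiguous bases (ACGT) in `seq`."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in BASES)
    return (seq.count("A") + seq.count("T")) / acgt if acgt else 0.0
```

Comparing `at_content` on concatenated non-coding versus coding sequence would give the kind of coding/non-coding AT difference the abstract reports, given an annotation to split the genome.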

    Analysis of intra-genomic GC content homogeneity within prokaryotes

    BACKGROUND: Bacterial genomes possess varying GC content (the proportion of guanines (G) and cytosines (C) among all four bases in the genome), but within a given genome, GC content can also vary locally along the chromosome, with some regions significantly more or less GC-rich than the genome average. We examined how GC content varies within microbial genomes to assess whether this property can be associated with biological functions related to the organism's environment and phylogeny. We use a new quantity, GCVAR, the intra-genomic GC content variability relative to the average GC content of the whole genome. A low GCVAR indicates intra-genomic GC homogeneity, and a high GCVAR indicates heterogeneity. RESULTS: The regression analyses indicated that GCVAR was significantly associated with domain (i.e., archaea or bacteria), phylum, and oxygen requirement. GCVAR was significantly higher among anaerobes than among both aerobic and facultative microbes. Although an association has previously been found between mean genomic GC content and oxygen requirement, our analysis suggests that no such association exists when phylogenetic bias is accounted for. A significant association between GCVAR and mean GC content was also found, but it appears to be non-linear and varies greatly among phyla. CONCLUSIONS: Our findings show that GCVAR is linked with oxygen requirement, while mean genomic GC content is not. We therefore suggest that GCVAR be used as a complement to mean GC content.
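The abstract does not give the formula for GCVAR. The sketch below is a hypothetical stand-in that only captures the idea of "variability with respect to the average GC content": it splits a sequence into non-overlapping windows (the 1000 bp default is an assumption), computes GC content per window, and reports the mean absolute deviation from the genome-wide GC content, scaled by that genome-wide value. The published measure may use a different window size, deviation measure, or transform.

```python
def windowed_gc(seq: str, window: int) -> list[float]:
    """GC fraction of each complete, non-overlapping window (ACGT bases only)."""
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        acgt = sum(chunk.count(b) for b in "ACGT")
        if acgt:
            values.append((chunk.count("G") + chunk.count("C")) / acgt)
    return values

def gcvar_sketch(seq: str, window: int = 1000) -> float:
    """Hypothetical GCVAR stand-in: mean |window GC - genome GC| / genome GC.

    0.0 means every window matches the genome-wide GC content (homogeneous);
    larger values mean more intra-genomic GC heterogeneity.
    """
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    genome_gc = (seq.count("G") + seq.count("C")) / acgt
    windows = windowed_gc(seq, window)
    if not windows or genome_gc == 0:
        raise ValueError("need at least one full window and non-zero GC content")
    mean_abs_dev = sum(abs(gc - genome_gc) for gc in windows) / len(windows)
    return mean_abs_dev / genome_gc
```

Dividing by the genome-wide GC content makes the value relative, so genomes with different mean GC can be compared; whether the original quantity is normalised this way is not stated in the abstract.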

    Abundant Oligonucleotides Common to Most Bacteria

    BACKGROUND: Bacteria show a bias in their genomic oligonucleotide composition far beyond that dictated by G+C content. Patterns of over- and underrepresented oligonucleotides carry a phylogenetic signal and are thus diagnostic for individual species. Patterns of short oligomers have been investigated by multiple groups in large numbers of bacterial genomes. However, the global distribution of the most highly overrepresented mid-sized oligomers has not yet been assessed across all prokaryotes. We surveyed overrepresented mid-length oligomers across all prokaryotes and normalised for base composition and embedded oligomers using zero- and second-order Markov models. PRINCIPAL FINDINGS: Here we report a presumably ancient set of oligomers that is conserved and overrepresented in nearly all branches of prokaryotic life, including Archaea. These oligomers are either adenine-rich homopurines with one to three guanine nucleotides, or homopyrimidines with one to four cytosine nucleotides. They show no consistent preference for coding or non-coding regions and do not aggregate in any coding frame, implying a role in DNA structure and as polypeptide binding sites. Structural parameters indicate that these oligonucleotides are an extreme and rigid form of B-DNA, prone to forming triple-stranded helices under common physiological conditions. Moreover, the narrow minor grooves of these structures are recognised by DNA-binding and nucleoid-associated proteins such as HU. CONCLUSION: Homopurine and homopyrimidine oligomers exhibit distinct and unusual structural features and are present at high copy number in nearly all prokaryotic lineages. This suggests a non-neutral role for these oligonucleotides in bacterial genome organization that has been maintained throughout evolution.
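Normalising oligomer counts with a Markov model, as described above, is commonly done by comparing each observed count with the count expected from shorter sub-words. The sketch below assumes the standard maximal-order estimator for a Markov chain of order m, E(w) = N(w[1..m+1]) × Π_{i=2..k−m} N(w[i..i+m]) / N(w[i..i+m−1]), and reports observed/expected ratios; the function names are hypothetical, and this is not necessarily the exact normalisation the authors used.

```python
from collections import Counter

def kmer_counts(seq: str, k: int) -> Counter:
    """Overlapping k-mer counts; k = 0 maps the empty word to len(seq)."""
    if k == 0:
        return Counter({"": len(seq)})
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def overrepresentation(seq: str, k: int, order: int) -> dict[str, float]:
    """Observed/expected ratio for every k-mer present in `seq`.

    The expectation comes from a Markov chain of the given order estimated
    from the sequence itself (requires 0 <= order <= k - 2). Ratios well
    above 1 indicate overrepresented words.
    """
    if not 0 <= order <= k - 2:
        raise ValueError("order must satisfy 0 <= order <= k - 2")
    seq = seq.upper()
    observed = kmer_counts(seq, k)
    big = kmer_counts(seq, order + 1)   # (m+1)-mer counts
    small = kmer_counts(seq, order)     # m-mer counts at the overlaps
    ratios = {}
    for word, obs in observed.items():
        expected = float(big[word[:order + 1]])
        for i in range(1, k - order):
            # Sub-words of an observed word always occur, so no zero division.
            expected *= big[word[i:i + order + 1]] / small[word[i:i + order]]
        ratios[word] = obs / expected
    return ratios
```

With `order=0` the expectation depends only on base composition; with `order=2` it also absorbs the frequencies of embedded trinucleotides, which is why a strictly periodic sequence scores a ratio near 1 under the second-order model but a very large ratio under the zero-order one.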

    Amino Acid Usage Is Asymmetrically Biased in AT- and GC-Rich Microbial Genomes

    INTRODUCTION: Genomic base composition ranges from less than 25% AT to more than 85% AT in prokaryotes. Since only a small fraction of prokaryotic genomes is non-coding, even a minor change in genomic base composition will induce profound changes in the encoded proteins. We examined how amino acid and codon frequencies were distributed in over 2000 microbial genomes and how these distributions were affected by changes in base composition. In addition, we wanted to know how genome-wide amino acid usage was biased in the different genomes and how changes to base composition and mutations affected this bias. To this end, we used a Generalized Additive Mixed-effects Model (GAMM) to explore non-linear associations and strong data dependencies among closely related microbes; principal component analysis (PCA) was used to examine genomic amino acid and codon frequencies, while the concept of relative entropy was used to analyze genomic mutation rates. RESULTS: We found that genomic amino acid frequencies carried a stronger phylogenetic signal than codon frequencies, but that this signal was weak compared to that of genomic %AT. Further, in contrast to codon usage bias (CUB), amino acid usage bias (AAUB) was distributed differently in AT- and GC-rich genomes: AT-rich genomes did not prefer specific amino acids over others to the same extent as GC-rich genomes. AAUB was also associated with relative entropy; genomes with low AAUB contained more random mutations, as a consequence of relaxed purifying selection, than genomes with higher AAUB. CONCLUSION: Genomic base composition has a substantial effect on both amino acid and codon frequencies in bacterial genomes. While phylogeny influenced amino acid usage more in GC-rich genomes, AT content drove amino acid usage in AT-rich genomes. We found the GAMM to be an excellent tool for analyzing the genomic data used in this study.

    Relative entropy differences in bacterial chromosomes, plasmids, phages and genomic islands

    BACKGROUND: We sought to assess whether the concept of relative entropy (information capacity) could aid our understanding of horizontal gene transfer in microbes. We analyzed the differences in information capacity between prokaryotic chromosomes, genomic islands (GIs), phages, and plasmids. Relative entropy was estimated using the Kullback-Leibler measure. RESULTS: Relative entropy was highest in bacterial chromosomes and decreased in the order chromosomes > GIs > phages > plasmids. There was an association between relative entropy and AT content in chromosomes, phages, plasmids, and GIs, with the strongest association in phages. Relative entropy was also found to be lower in the obligate intracellular Mycobacterium leprae than in the related M. tuberculosis when measured on a shared set of highly conserved genes. CONCLUSIONS: We argue that relative entropy differences reflect how plasmids, phages, and GIs interact with microbial host chromosomes, and that all these biological entities are, or have been, subjected to different selective pressures. The rate at which horizontally acquired DNA is ameliorated within the chromosome is likely to account for the small differences between chromosomes and stably incorporated GIs, compared with transient or independent replicons such as phages and plasmids.
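Relative entropy in the Kullback-Leibler sense is D(P||Q) = Σ_w p(w) log2(p(w)/q(w)). The abstract does not state which distributions were compared, so the sketch below assumes P is the observed k-mer distribution (tetranucleotides by default) and Q is the distribution expected from base composition alone. The function name is hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"
VALID = set(BASES)

def relative_entropy(seq: str, k: int = 4) -> float:
    """D(P || Q) in bits, where P = observed k-mer frequencies and
    Q = k-mer frequencies expected from base composition alone (assumption)."""
    seq = seq.upper()
    words = [seq[i:i + k] for i in range(len(seq) - k + 1)
             if set(seq[i:i + k]) <= VALID]
    if not words:
        raise ValueError("no complete ACGT words of length k")
    counts = Counter(words)
    base_counts = Counter(b for b in seq if b in VALID)
    n_bases = sum(base_counts.values())
    base_p = {b: base_counts[b] / n_bases for b in BASES}
    d = 0.0
    for word, c in counts.items():
        p = c / len(words)
        # q > 0 here because every base in an observed word occurs in seq.
        q = math.prod(base_p[b] for b in word)
        d += p * math.log2(p / q)
    return d
```

Because Q is a proper distribution over all 4^k words, the result is never negative; it is 0 when word usage is exactly what base composition predicts and grows as word usage departs from that expectation.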

    Serratia symbiotica from the Aphid Cinara cedri: A Missing Link from Facultative to Obligate Insect Endosymbiont

    The genome sequencing of Buchnera aphidicola BCc from the aphid Cinara cedri, which has the smallest known Buchnera genome, revealed that this bacterium had lost its symbiotic role, as it was not able to synthesize tryptophan or riboflavin. Moreover, the biosynthesis of tryptophan is shared with the endosymbiont Serratia symbiotica SCc, which coexists with B. aphidicola in this aphid. Whole-genome sequencing of S. symbiotica SCc reveals an endosymbiont at a stage of genome reduction closer to that of an obligate endosymbiont, such as B. aphidicola from Acyrthosiphon pisum, than to that of the S. symbiotica strain that is a facultative endosymbiont in A. pisum and shows much less gene decay. Comparing the two S. symbiotica genomes enables us to propose an evolutionary scenario for the transition from facultative to obligate endosymbiont. Metabolic inferences for B. aphidicola BCc and S. symbiotica SCc reveal that most of the functions carried out by B. aphidicola in A. pisum are now either conserved in B. aphidicola BCc or taken over by S. symbiotica. In addition, there are several cases of metabolic complementation that provide functional stability to the whole consortium and evolutionary preservation of the actors involved.

    Examination of Genome Homogeneity in Prokaryotes Using Genomic Signatures

    BACKGROUND: DNA word frequencies, normalized for genomic AT content, are remarkably stable within prokaryotic genomes and are therefore said to reflect a "genomic signature." Genomic signatures can be used to phylogenetically classify organisms from arbitrarily sampled DNA. They can also be used to search for horizontally transferred DNA or for DNA regions subjected to special selective forces. Thus, the stability of the genomic signature can be used as a measure of genomic homogeneity. The factors associated with the stability of genomic signatures are not known, which motivated us to investigate further. We analyzed the intra-genomic variance of genomic signatures based on AT content normalization (0th-order Markov model), as well as genomic signatures normalized by shorter DNA words (1st- and 2nd-order Markov models), for 636 sequenced prokaryotic genomes. Regression models were fitted with intra-genomic signature variance as the response variable and, as predictors, a set of factors representing genomic properties such as genomic AT content, genome size, habitat, phylum, oxygen requirement, optimal growth temperature, and oligonucleotide usage variance (OUV, a measure of oligonucleotide usage bias, computed as the variance between genomic tetranucleotide frequencies and Markov-chain-approximated tetranucleotide frequencies). PRINCIPAL FINDINGS: Regression analysis revealed that OUV was the most important factor (p < 0.001) determining intra-genomic homogeneity as measured by genomic signatures. This means that the less random the oligonucleotide usage (i.e., the higher the OUV), the more homogeneous the genome is in terms of its genomic signature. The other factors influencing variance in the genomic signature (p < 0.001) were genomic AT content, phylum, and oxygen requirement. CONCLUSIONS: Genomic homogeneity in prokaryotes is intimately linked to genomic GC content, oligonucleotide usage bias (OUV), and aerobiosis, while OUV is in turn associated with genomic GC content, aerobiosis, and habitat.
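OUV is described above as the variance between observed tetranucleotide frequencies and Markov-chain-approximated ones. A minimal sketch for the 0th-order (AT-content-normalized) case might look like the following, assuming "variance" means the mean squared difference over all 256 tetranucleotides; the exact definition and any scaling used in the paper may differ, and the function name is hypothetical.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"
VALID = set(BASES)

def ouv_zero_order(seq: str, k: int = 4) -> float:
    """Hypothetical OUV sketch: mean squared difference between observed
    k-mer frequencies and those expected from base composition (0th order)."""
    seq = seq.upper()
    words = [seq[i:i + k] for i in range(len(seq) - k + 1)
             if set(seq[i:i + k]) <= VALID]
    if not words:
        raise ValueError("no complete ACGT words of length k")
    counts = Counter(words)
    base_counts = Counter(b for b in seq if b in VALID)
    n_bases = sum(base_counts.values())
    squared_diffs = []
    for word in map("".join, product(BASES, repeat=k)):
        observed = counts[word] / len(words)
        # 0th-order expectation: product of the word's base frequencies.
        expected = math.prod(base_counts[b] / n_bases for b in word)
        squared_diffs.append((observed - expected) ** 2)
    return sum(squared_diffs) / len(squared_diffs)
```

A 1st- or 2nd-order variant would replace `expected` with an estimate built from dinucleotide or trinucleotide frequencies; higher OUV means word usage departs further from what the chosen model predicts.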

    Clustering metagenomic sequences with interpolated Markov models

    BACKGROUND: Sequencing of environmental DNA (often called metagenomics) has shown tremendous potential to uncover the vast number of unknown microbes that cannot be cultured and sequenced by traditional methods. Because the output from metagenomic sequencing is a large set of reads of unknown origin, clustering together reads that were sequenced from the same species is a crucial analysis step. Many effective approaches to this task rely on sequenced genomes in public databases, but these genomes are a highly biased sample that is not necessarily representative of the environments of interest to many metagenomics projects. RESULTS: We present SCIMM (Sequence Clustering with Interpolated Markov Models), an unsupervised sequence clustering method. SCIMM achieves greater clustering accuracy than previous unsupervised approaches. We examine the limitations of unsupervised learning on complex datasets and suggest a hybrid of SCIMM and the supervised learning method Phymm, called PHYSCIMM, that performs better when evolutionarily close training genomes are available. CONCLUSIONS: SCIMM and PHYSCIMM are highly accurate methods for clustering metagenomic sequences. SCIMM operates entirely unsupervised, making it ideal for environments containing mostly novel microbes. PHYSCIMM uses supervised learning to improve clustering in environments containing microbial strains from well-characterized genera. SCIMM and PHYSCIMM are available open source from http://www.cbcb.umd.edu/software/scimm.
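The core idea behind clustering reads with interpolated Markov models is to score each read under per-cluster sequence models and assign it to the best-scoring one. The sketch below is a deliberately simplified stand-in, not SCIMM: it interpolates Markov models of orders 0 to `max_order` with equal, fixed weights and add-one smoothing (real IMMs typically weight each order by how well the training data support that context), and it performs a single assignment step. The class and function names are hypothetical.

```python
import math
from collections import defaultdict

BASES = "ACGT"
VALID = set(BASES)

class SimpleIMM:
    """Equal-weight interpolation of Markov models of order 0..max_order
    (a simplification; not SCIMM's weighting scheme)."""

    def __init__(self, max_order: int = 3):
        self.max_order = max_order
        # counts[j][context][base] = times `base` followed the length-j `context`
        self.counts = [defaultdict(lambda: dict.fromkeys(BASES, 0))
                       for _ in range(max_order + 1)]

    def train(self, seqs):
        for seq in seqs:
            seq = seq.upper()
            for i, base in enumerate(seq):
                if base not in VALID:
                    continue
                for j in range(min(i, self.max_order) + 1):
                    ctx = seq[i - j:i]
                    if set(ctx) <= VALID:
                        self.counts[j][ctx][base] += 1
        return self

    def prob(self, ctx: str, base: str) -> float:
        """P(base | ctx), averaged over every order the context length allows."""
        orders = range(min(len(ctx), self.max_order) + 1)
        p = 0.0
        for j in orders:
            row = self.counts[j].get(ctx[len(ctx) - j:])  # .get avoids creating rows
            seen = row[base] if row else 0
            total = sum(row.values()) if row else 0
            p += (seen + 1) / (total + 4)  # add-one (Laplace) smoothing
        return p / len(orders)

    def log_likelihood(self, seq: str) -> float:
        seq = seq.upper()
        return sum(math.log(self.prob(seq[max(0, i - self.max_order):i], base))
                   for i, base in enumerate(seq) if base in VALID)

def assign(reads, models):
    """Index of the model giving each read the highest log-likelihood."""
    return [max(range(len(models)), key=lambda m: models[m].log_likelihood(r))
            for r in reads]
```

In an unsupervised setting, one would alternate between retraining each model on the reads currently assigned to it and reassigning the reads, in the spirit of k-means; that loop and its initialization are omitted here.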