522 research outputs found
Initiator tRNA genes template the 3\u27 CCA end at high frequencies in bacteria.
BACKGROUND: While the CCA sequence at the mature 3\u27 end of tRNAs is conserved and critical for translational function, a genetic template for this sequence is not always contained in tRNA genes. In eukaryotes and Archaea, the CCA ends of tRNAs are synthesized post-transcriptionally by CCA-adding enzymes. In Bacteria, tRNA genes template CCA sporadically.
RESULTS: In order to understand the variation in how prokaryotic tRNA genes template CCA, we re-annotated tRNA genes in tRNAdb-CE database version 0.8. Among 132,129 prokaryotic tRNA genes, initiator tRNA genes template CCA at the highest average frequency (74.1%) over all functional classes except selenocysteine and pyrrolysine tRNA genes (88.1% and 100% respectively). Across bacterial phyla and a wide range of genome sizes, many lineages exist in which predominantly initiator tRNA genes template CCA. Convergent and parallel retention of CCA templating in initiator tRNA genes evolved in independent histories of reductive genome evolution in Bacteria. Also, in a majority of cyanobacterial and actinobacterial genera, predominantly initiator tRNA genes template CCA. We also found that a surprising fraction of archaeal tRNA genes template CCA.
CONCLUSIONS: We suggest that cotranscriptional synthesis of initiator tRNA CCA 3\u27 ends can complement inefficient processing of initiator tRNA precursors, bootstrap rapid initiation of protein synthesis from a non-growing state, or contribute to an increase in cellular growth rates by reducing overheads of mass and energy to maintain nonfunctional tRNA precursor pools. More generally, CCA templating in structurally non-conforming tRNA genes can afford cells robustness and greater plasticity to respond rapidly to environmental changes and stimuli
Initiator tRNA genes template the 3' CCA end at high frequencies in bacteria.
BackgroundWhile the CCA sequence at the mature 3' end of tRNAs is conserved and critical for translational function, a genetic template for this sequence is not always contained in tRNA genes. In eukaryotes and Archaea, the CCA ends of tRNAs are synthesized post-transcriptionally by CCA-adding enzymes. In Bacteria, tRNA genes template CCA sporadically.ResultsIn order to understand the variation in how prokaryotic tRNA genes template CCA, we re-annotated tRNA genes in tRNAdb-CE database version 0.8. Among 132,129 prokaryotic tRNA genes, initiator tRNA genes template CCA at the highest average frequency (74.1%) over all functional classes except selenocysteine and pyrrolysine tRNA genes (88.1% and 100% respectively). Across bacterial phyla and a wide range of genome sizes, many lineages exist in which predominantly initiator tRNA genes template CCA. Convergent and parallel retention of CCA templating in initiator tRNA genes evolved in independent histories of reductive genome evolution in Bacteria. Also, in a majority of cyanobacterial and actinobacterial genera, predominantly initiator tRNA genes template CCA. We also found that a surprising fraction of archaeal tRNA genes template CCA.ConclusionsWe suggest that cotranscriptional synthesis of initiator tRNA CCA 3' ends can complement inefficient processing of initiator tRNA precursors, "bootstrap" rapid initiation of protein synthesis from a non-growing state, or contribute to an increase in cellular growth rates by reducing overheads of mass and energy to maintain nonfunctional tRNA precursor pools. More generally, CCA templating in structurally non-conforming tRNA genes can afford cells robustness and greater plasticity to respond rapidly to environmental changes and stimuli
tRNA signatures reveal polyphyletic origins of streamlined SAR11 genomes among the alphaproteobacteria
Phylogenomic analyses are subject to bias from compositional convergence and
noise from horizontal gene transfer (HGT). Compositional convergence is a
likely cause of controversy regarding phylogeny of the SAR11 group of
Alphaproteobacteria that have extremely streamlined, A+T-biased genomes. While
careful modeling can reduce artifacts caused by convergence, the most
consistent and robust phylogenetic signal in genomes may lie distributed among
encoded functional features that govern macromolecular interactions. Here we
develop a novel phyloclassification method based on signatures derived from
bioinformatically defined tRNA Class-Informative Features (CIFs). tRNA CIFs are
enriched for features that underlie tRNA-protein interactions. Using a simple
tRNA-CIF-based phyloclassifier, we obtained results consistent with those of
bias-corrected whole proteome phylogenomic studies, rejecting monophyly of
SAR11 and affiliating most strains with Rhizobiales with strong statistical
support. Yet SAR11 and Rickettsiales tRNA genes share distinct patterns of
A+T-richness, as expected from their elevated genomic A+T compositions. Using
conventional supermatrix methods on total tRNA sequence data, we could recover
the artifactual result of a monophyletic SAR11 grouping with Rickettsiales.
Thus tRNA CIF-based phyloclassification is more robust to base content
convergence than supermatrix phylogenomics on whole tRNA sequences. Also, given
the notoriously promiscuous HGT of aminoacyl-tRNA synthetases, tRNA CIF-based
phyloclassification may be relatively robust to HGT of network components. We
describe how unique features of tRNA-protein interaction networks facilitate
the mining of traits governing macromolecular interactions from genomic data,
and discuss why interaction-governing traits may be especially useful to solve
difficult problems in microbial classification and phylogeny
tRNA functional signatures classify plastids as late-branching cyanobacteria.
BackgroundEukaryotes acquired the trait of oxygenic photosynthesis through endosymbiosis of the cyanobacterial progenitor of plastid organelles. Despite recent advances in the phylogenomics of Cyanobacteria, the phylogenetic root of plastids remains controversial. Although a single origin of plastids by endosymbiosis is broadly supported, recent phylogenomic studies are contradictory on whether plastids branch early or late within Cyanobacteria. One underlying cause may be poor fit of evolutionary models to complex phylogenomic data.ResultsUsing Posterior Predictive Analysis, we show that recently applied evolutionary models poorly fit three phylogenomic datasets curated from cyanobacteria and plastid genomes because of heterogeneities in both substitution processes across sites and of compositions across lineages. To circumvent these sources of bias, we developed CYANO-MLP, a machine learning algorithm that consistently and accurately phylogenetically classifies ("phyloclassifies") cyanobacterial genomes to their clade of origin based on bioinformatically predicted function-informative features in tRNA gene complements. Classification of cyanobacterial genomes with CYANO-MLP is accurate and robust to deletion of clades, unbalanced sampling, and compositional heterogeneity in input tRNA data. CYANO-MLP consistently classifies plastid genomes into a late-branching cyanobacterial sub-clade containing single-cell, starch-producing, nitrogen-fixing ecotypes, consistent with metabolic and gene transfer data.ConclusionsPhylogenomic data of cyanobacteria and plastids exhibit both site-process heterogeneities and compositional heterogeneities across lineages. These aspects of the data require careful modeling to avoid bias in phylogenomic estimation. Furthermore, we show that amino acid recoding strategies may be insufficient to mitigate bias from compositional heterogeneities. However, the combination of our novel tRNA-specific strategy with machine learning in CYANO-MLP appears robust to these sources of bias with high accuracy in phyloclassification of cyanobacterial genomes. CYANO-MLP consistently classifies plastids as late-branching Cyanobacteria, consistent with independent evidence from signature-based approaches and some previous phylogenetic studies
FAST: FAST Analysis of Sequences Toolbox.
FAST (FAST Analysis of Sequences Toolbox) provides simple, powerful open source command-line tools to filter, transform, annotate and analyze biological sequence data. Modeled after the GNU (GNU's Not Unix) Textutils such as grep, cut, and tr, FAST tools such as fasgrep, fascut, and fastr make it easy to rapidly prototype expressive bioinformatic workflows in a compact and generic command vocabulary. Compact combinatorial encoding of data workflows with FAST commands can simplify the documentation and reproducibility of bioinformatic protocols, supporting better transparency in biological data science. Interface self-consistency and conformity with conventions of GNU, Matlab, Perl, BioPerl, R, and GenBank help make FAST easy and rewarding to learn. FAST automates numerical, taxonomic, and text-based sorting, selection and transformation of sequence records and alignment sites based on content, index ranges, descriptive tags, annotated features, and in-line calculated analytics, including composition and codon usage. Automated content- and feature-based extraction of sites and support for molecular population genetic statistics make FAST useful for molecular evolutionary analysis. FAST is portable, easy to install and secure thanks to the relative maturity of its Perl and BioPerl foundations, with stable releases posted to CPAN. Development as well as a publicly accessible Cookbook and Wiki are available on the FAST GitHub repository at https://github.com/tlawrence3/FAST. The default data exchange format in FAST is Multi-FastA (specifically, a restriction of BioPerl FastA format). Sanger and Illumina 1.8+ FastQ formatted files are also supported. FAST makes it easier for non-programmer biologists to interactively investigate and control biological data at the speed of thought
TFAM detects co-evolution of tRNA identity rules with lateral transfer of histidyl-tRNA synthetase
We present TFAM, an automated, statistical method to classify the identity of tRNAs. TFAM, currently optimized for bacteria, classifies initiator tRNAs and predicts the charging identity of both typical and atypical tRNAs such as suppressors with high confidence. We show statistical evidence for extensive variation in tRNA identity determinants among bacterial genomes due to variation in overall tDNA base content. With TFAM we have detected the first case of eukaryotic-like tRNA identity rules in bacteria. An α-proteobacterial clade encompassing Rhizobiales, Caulobacter crescentus and Silicibacter pomeroyi, unlike a sister clade containing the Rickettsiales, Zymomonas mobilis and Gluconobacter oxydans, uses the eukaryotic identity element A73 instead of the highly conserved prokaryotic element C73. We confirm divergence of bacterial histidylation rules by demonstrating perfect covariation of α-proteobacterial tRNA(His) acceptor stems and residues in the motif IIb tRNA-binding pocket of their histidyl-tRNA synthetases (HisRS). Phylogenomic analysis supports lateral transfer of a eukaryotic-like HisRS into the α-proteobacteria followed by in situ adaptation of the bacterial tDNA(His) and identity rule divergence. Our results demonstrate that TFAM is an effective tool for the bioinformatics, comparative genomics and evolutionary study of tRNA identity
An iterative strategy combining biophysical criteria and duration hidden Markov) models for structural predictions of Chlamydia trachomatis s66 promoters
Background: Promoter identification is a first step in the quest to explain gene regulation in bacteria. It has been demonstrated that the initiation of bacterial transcription depends upon the stability and topology of DNA in the promoter region as well as the binding affinity between the RNA polymerase σ-factor and promoter. However, promoter prediction algorithms to date have not explicitly used an ensemble of these factors as predictors. In addition, most promoter models have been trained on data from Escherichia coli. Although it has been shown that transcriptional mechanisms are similar among various bacteria, it is quite possible that the differences between Escherichia coli and Chlamydia trachomatis are large enough to recommend an organism-specific modeling effort.
Results: Here we present an iterative stochastic model building procedure that combines such biophysical metrics as DNA stability, curvature, twist and stress-induced DNA duplex destabilization along with duration hidden Markov model parameters to model Chlamydia trachomatis σ66 promoters from 29 experimentally verified sequences. Initially, iterative duration hidden Markov modeling of the training set sequences provides a scoring algorithm for Chlamydia trachomatis RNA polymerase σ66/DNA binding. Subsequently, an iterative application of Stepwise Binary Logistic Regression selects multiple promoter predictors and deletes/replaces training set sequences to determine an optimal training set. The resulting model predicts the final training set with a high degree of accuracy and provides insights into the structure of the promoter region. Model based genome-wide predictions are provided so that optimal promoter candidates can be experimentally evaluated, and refined models developed. Co-predictions with three other algorithms are also supplied to enhance reliability.
Conclusion: This strategy and resulting model support the conjecture that DNA biophysical properties, along with RNA polymerase σ-factor/DNA binding collaboratively, contribute to a sequence\u27s ability to promote transcription. This work provides a baseline model that can evolve as new Chlamydia trachomatis σ66 promoters are identified with assistance from the provided genome-wide predictions. The proposed methodology is ideal for organisms with few identified promoters and relatively small genomes
An iterative strategy combining biophysical criteria and duration hidden Markov) models for structural predictions of Chlamydia trachomatis s66 promoters
Background: Promoter identification is a first step in the quest to explain gene regulation in bacteria. It has been demonstrated that the initiation of bacterial transcription depends upon the stability and topology of DNA in the promoter region as well as the binding affinity between the RNA polymerase σ-factor and promoter. However, promoter prediction algorithms to date have not explicitly used an ensemble of these factors as predictors. In addition, most promoter models have been trained on data from Escherichia coli. Although it has been shown that transcriptional mechanisms are similar among various bacteria, it is quite possible that the differences between Escherichia coli and Chlamydia trachomatis are large enough to recommend an organism-specific modeling effort.
Results: Here we present an iterative stochastic model building procedure that combines such biophysical metrics as DNA stability, curvature, twist and stress-induced DNA duplex destabilization along with duration hidden Markov model parameters to model Chlamydia trachomatis σ66 promoters from 29 experimentally verified sequences. Initially, iterative duration hidden Markov modeling of the training set sequences provides a scoring algorithm for Chlamydia trachomatis RNA polymerase σ66/DNA binding. Subsequently, an iterative application of Stepwise Binary Logistic Regression selects multiple promoter predictors and deletes/replaces training set sequences to determine an optimal training set. The resulting model predicts the final training set with a high degree of accuracy and provides insights into the structure of the promoter region. Model based genome-wide predictions are provided so that optimal promoter candidates can be experimentally evaluated, and refined models developed. Co-predictions with three other algorithms are also supplied to enhance reliability.
Conclusion: This strategy and resulting model support the conjecture that DNA biophysical properties, along with RNA polymerase σ-factor/DNA binding collaboratively, contribute to a sequence\u27s ability to promote transcription. This work provides a baseline model that can evolve as new Chlamydia trachomatis σ66 promoters are identified with assistance from the provided genome-wide predictions. The proposed methodology is ideal for organisms with few identified promoters and relatively small genomes
The Genomic Pattern of tDNA Operon Expression in E. coli
In fast-growing microorganisms, a tRNA concentration profile enriched in major isoacceptors selects for the biased usage of cognate codons. This optimizes translational rate for the least mass invested in the translational apparatus. Such translational streamlining is thought to be growth-regulated, but its genetic basis is poorly understood. First, we found in reanalysis of the E. coli tRNA profile that the degree to which it is translationally streamlined is nearly invariant with growth rate. Then, using least squares multiple regression, we partitioned tRNA isoacceptor pools to predicted tDNA operons from the E. coli K12 genome. Co-expression of tDNAs in operons explains the tRNA profile significantly better than tDNA gene dosage alone. Also, operon expression increases significantly with proximity to the origin of replication, oriC, at all growth rates. Genome location explains about 15% of expression variation in a form, at a given growth rate, that is consistent with replication-dependent gene concentration effects. Yet the change in the tRNA profile with growth rate is less than would be expected from such effects. We estimated per-copy expression rates for all tDNA operons that were consistent with independent estimates for rDNA operons. We also found that tDNA operon location, and the location dependence of expression, were significantly different in the leading and lagging strands. The operonic organization and genomic location of tDNA operons are significant factors influencing their expression. Nonrandom patterns of location and strandedness shown by tDNA operons in E. coli suggest that their genomic architecture may be under selection to satisfy physiological demand for tRNA expression at high growth rates
- …