tRNAdb 2009: compilation of tRNA sequences and tRNA genes
One of the first specialized collections of nucleic acid sequences in life sciences was the "compilation of tRNA sequences and sequences of tRNA genes" (http://www.trna.uni-bayreuth.de). Here, an updated and completely restructured version of this compilation is presented (http://trnadb.bioinf.uni-leipzig.de). The new database, tRNAdb, is hosted and maintained in cooperation between the universities of Leipzig, Marburg, and Strasbourg. Reimplemented as a relational database, tRNAdb will be updated periodically and is searchable in a highly flexible and user-friendly way. Currently, it contains more than 12 000 tRNA genes, classified into families according to amino acid specificity. Furthermore, the implementation of the NCBI taxonomy tree facilitates phylogeny-related queries. The database provides various services including graphical representations of tRNA secondary structures, a customizable output of aligned or un-aligned sequences with a variety of individual and combinable search criteria, as well as the construction of consensus sequences for any selected set of tRNAs.
YAMAT-seq: an efficient method for high-throughput sequencing of mature transfer RNAs.
Besides translation, transfer RNAs (tRNAs) play many non-canonical roles in various biological pathways and exhibit highly variable expression profiles. To unravel the emerging complexities of tRNA biology and the molecular mechanisms underlying them, an efficient tRNA sequencing method is required. However, the rigid structure of tRNA has presented a challenge to the development of such methods. We report the development of Y-shaped Adapter-ligated MAture TRNA sequencing (YAMAT-seq), an efficient and convenient method for high-throughput sequencing of mature tRNAs. YAMAT-seq circumvents the issue of inefficient adapter ligation, a characteristic of conventional RNA sequencing methods for mature tRNAs, by employing the efficient and specific ligation of a Y-shaped adapter to mature tRNAs using T4 RNA Ligase 2. Subsequent cDNA amplification and next-generation sequencing successfully yield numerous mature tRNA sequences. YAMAT-seq has high specificity for mature tRNAs and high sensitivity to detect most isoacceptors from a minute amount of total RNA. Moreover, YAMAT-seq shows quantitative capability to estimate expression levels of mature tRNAs, and has high reproducibility and broad applicability for various cell lines. YAMAT-seq thus provides a high-throughput technique for identifying tRNA profiles and their regulation in various transcriptomes, which could play important regulatory roles in translation and other biological processes.
An RNA foldability metric; implications for the design of rapidly foldable RNA sequences
Evidence is presented suggesting, for the first time, that the protein
foldability metric sigma=(T_theta - T_f)/T_theta, where T_theta and T_f are,
respectively, the collapse and folding transition temperatures, could be used
also to measure the foldability of RNA sequences. The importance of sigma is
discussed in the context of the in silico design of rapidly foldable RNA
sequences.
Comment: To appear in Biophysical Chemistry
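The metric quoted in the abstract above is a simple ratio, so a minimal sketch may help make it concrete. The function name and the example temperatures are hypothetical and are not taken from the paper. The reading that a smaller sigma indicates faster folding is the usual interpretation in the protein literature and is an assumption here.

```python
def foldability_sigma(t_theta: float, t_f: float) -> float:
    """Return sigma = (T_theta - T_f) / T_theta.

    t_theta: collapse transition temperature (any consistent absolute unit).
    t_f:     folding transition temperature (same unit as t_theta).
    """
    if t_theta <= 0:
        raise ValueError("t_theta must be positive")
    return (t_theta - t_f) / t_theta


# Hypothetical temperatures, chosen only to illustrate the arithmetic.
sigma_close = foldability_sigma(t_theta=100.0, t_f=95.0)  # transitions close together
sigma_far = foldability_sigma(t_theta=100.0, t_f=50.0)    # transitions far apart
print(sigma_close, sigma_far)
```

When the two transition temperatures coincide, sigma is zero; the further the folding temperature falls below the collapse temperature, the closer sigma gets to one.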
tRNA functional signatures classify plastids as late-branching cyanobacteria.
Background: Eukaryotes acquired the trait of oxygenic photosynthesis through endosymbiosis of the cyanobacterial progenitor of plastid organelles. Despite recent advances in the phylogenomics of Cyanobacteria, the phylogenetic root of plastids remains controversial. Although a single origin of plastids by endosymbiosis is broadly supported, recent phylogenomic studies are contradictory on whether plastids branch early or late within Cyanobacteria. One underlying cause may be poor fit of evolutionary models to complex phylogenomic data. Results: Using Posterior Predictive Analysis, we show that recently applied evolutionary models poorly fit three phylogenomic datasets curated from cyanobacteria and plastid genomes because of heterogeneities in both substitution processes across sites and compositions across lineages. To circumvent these sources of bias, we developed CYANO-MLP, a machine learning algorithm that consistently and accurately phylogenetically classifies ("phyloclassifies") cyanobacterial genomes to their clade of origin based on bioinformatically predicted function-informative features in tRNA gene complements. Classification of cyanobacterial genomes with CYANO-MLP is accurate and robust to deletion of clades, unbalanced sampling, and compositional heterogeneity in input tRNA data. CYANO-MLP consistently classifies plastid genomes into a late-branching cyanobacterial sub-clade containing single-cell, starch-producing, nitrogen-fixing ecotypes, consistent with metabolic and gene transfer data. Conclusions: Phylogenomic data of cyanobacteria and plastids exhibit both site-process heterogeneities and compositional heterogeneities across lineages. These aspects of the data require careful modeling to avoid bias in phylogenomic estimation. Furthermore, we show that amino acid recoding strategies may be insufficient to mitigate bias from compositional heterogeneities. However, the combination of our novel tRNA-specific strategy with machine learning in CYANO-MLP appears robust to these sources of bias with high accuracy in phyloclassification of cyanobacterial genomes. CYANO-MLP consistently classifies plastids as late-branching Cyanobacteria, consistent with independent evidence from signature-based approaches and some previous phylogenetic studies.
tRNA signatures reveal polyphyletic origins of streamlined SAR11 genomes among the alphaproteobacteria
Phylogenomic analyses are subject to bias from compositional convergence and
noise from horizontal gene transfer (HGT). Compositional convergence is a
likely cause of controversy regarding phylogeny of the SAR11 group of
Alphaproteobacteria that have extremely streamlined, A+T-biased genomes. While
careful modeling can reduce artifacts caused by convergence, the most
consistent and robust phylogenetic signal in genomes may lie distributed among
encoded functional features that govern macromolecular interactions. Here we
develop a novel phyloclassification method based on signatures derived from
bioinformatically defined tRNA Class-Informative Features (CIFs). tRNA CIFs are
enriched for features that underlie tRNA-protein interactions. Using a simple
tRNA-CIF-based phyloclassifier, we obtained results consistent with those of
bias-corrected whole proteome phylogenomic studies, rejecting monophyly of
SAR11 and affiliating most strains with Rhizobiales with strong statistical
support. Yet SAR11 and Rickettsiales tRNA genes share distinct patterns of
A+T-richness, as expected from their elevated genomic A+T compositions. Using
conventional supermatrix methods on total tRNA sequence data, we could recover
the artifactual result of a monophyletic SAR11 grouping with Rickettsiales.
Thus tRNA CIF-based phyloclassification is more robust to base content
convergence than supermatrix phylogenomics on whole tRNA sequences. Also, given
the notoriously promiscuous HGT of aminoacyl-tRNA synthetases, tRNA CIF-based
phyloclassification may be relatively robust to HGT of network components. We
describe how unique features of tRNA-protein interaction networks facilitate
the mining of traits governing macromolecular interactions from genomic data,
and discuss why interaction-governing traits may be especially useful to solve
difficult problems in microbial classification and phylogeny.
Physical Complexity of Symbolic Sequences
A practical measure for the complexity of sequences of symbols ("strings")
is introduced that is rooted in automata theory but avoids the problems of
Kolmogorov-Chaitin complexity. This physical complexity can be estimated for
ensembles of sequences, for which it reverts to the difference between the
maximal entropy of the ensemble and the actual entropy given the specific
environment within which the sequence is to be interpreted. Thus, the physical
complexity measures the amount of information about the environment that is
coded in the sequence, and is conditional on such an environment. In practice,
an estimate of the complexity of a string can be obtained by counting the
number of loci per string that are fixed in the ensemble, while the volatile
positions represent, again with respect to the environment, randomness. We
apply this measure to tRNA sequence data.
Comment: 12 pages LaTeX2e, 3 postscript figures, uses elsart.cls.
Substantially improved and clarified version, includes application to EMBL
tRNA sequence data.
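The estimate described in the abstract above (maximal entropy minus actual entropy, with fixed loci contributing information and volatile loci contributing randomness) can be sketched for a set of aligned sequences. This is a minimal illustration, not the paper's procedure: it assumes sites are independent, so the ensemble entropy is approximated by a sum of per-site entropies, and it takes logarithms to the base of the alphabet size so that each site contributes at most one unit. The function name and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def physical_complexity(sequences, alphabet_size=4):
    """Estimate complexity as (sequence length) - (sum of per-site entropies).

    Assumes all sequences are aligned to the same length and that sites
    vary independently (a simplifying assumption of this sketch).
    """
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")

    n = len(sequences)
    total_entropy = 0.0
    for i in range(length):
        counts = Counter(s[i] for s in sequences)
        # Per-site entropy in units of log base alphabet_size: 0 for a
        # fixed site, 1 for a site uniform over the whole alphabet.
        total_entropy -= sum(
            (c / n) * math.log(c / n, alphabet_size) for c in counts.values()
        )
    # Maximal entropy is one unit per site, i.e. the sequence length.
    return length - total_entropy


# Toy ensemble: the first two positions are fixed, the third is fully volatile.
toy = ["GCA", "GCU", "GCG", "GCC"]
print(physical_complexity(toy))
```

In the toy ensemble the two fixed positions each count as one unit of information and the uniformly varying third position counts as none, which matches the abstract's description of counting fixed loci.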
Transfer RNA genes experience exceptionally elevated mutation rates.
Transfer RNAs (tRNAs) are a central component for the biological synthesis of proteins, and they are among the most highly conserved and frequently transcribed genes in all living things. Despite their clear significance for fundamental cellular processes, the forces governing tRNA evolution are poorly understood. We present evidence that transcription-associated mutagenesis and strong purifying selection are key determinants of patterns of sequence variation within and surrounding tRNA genes in humans and diverse model organisms. Remarkably, the mutation rate at broadly expressed cytosolic tRNA loci is likely between 7 and 10 times greater than the nuclear genome average. Furthermore, evolutionary analyses provide strong evidence that tRNA genes, but not their flanking sequences, experience strong purifying selection acting against this elevated mutation rate. We also find a strong correlation between tRNA expression levels and the mutation rates in their immediate flanking regions, suggesting a simple method for estimating individual tRNA gene activity. Collectively, this study illuminates the extreme competing forces in tRNA gene evolution and indicates that mutations at tRNA loci contribute disproportionately to mutational load and have unexplored fitness consequences in human populations.