3,816 research outputs found

    tRNAdb 2009: compilation of tRNA sequences and tRNA genes

    One of the first specialized collections of nucleic acid sequences in life sciences was the ‘compilation of tRNA sequences and sequences of tRNA genes’ (http://www.trna.uni-bayreuth.de). Here, an updated and completely restructured version of this compilation is presented (http://trnadb.bioinf.uni-leipzig.de). The new database, tRNAdb, is hosted and maintained in cooperation between the universities of Leipzig, Marburg, and Strasbourg. Reimplemented as a relational database, tRNAdb will be updated periodically and is searchable in a highly flexible and user-friendly way. Currently, it contains more than 12 000 tRNA genes, classified into families according to amino acid specificity. Furthermore, the implementation of the NCBI taxonomy tree facilitates phylogeny-related queries. The database provides various services including graphical representations of tRNA secondary structures, a customizable output of aligned or un-aligned sequences with a variety of individual and combinable search criteria, as well as the construction of consensus sequences for any selected set of tRNAs.
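The consensus-sequence service mentioned above can be illustrated with a minimal sketch. This is a generic majority-rule consensus over pre-aligned sequences, not tRNAdb's actual implementation; the function name `majority_consensus`, the `threshold` parameter, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter


def majority_consensus(aligned, threshold=0.5, ambiguous="N"):
    """Return a majority-rule consensus for equal-length aligned sequences.

    Hypothetical helper, not part of tRNAdb. A column contributes its most
    common symbol only if that symbol's frequency exceeds `threshold`;
    otherwise the `ambiguous` placeholder is emitted.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")

    consensus = []
    for i in range(length):
        # Count the symbols observed in alignment column i.
        counts = Counter(seq[i] for seq in aligned)
        symbol, count = counts.most_common(1)[0]
        consensus.append(symbol if count / len(aligned) > threshold else ambiguous)
    return "".join(consensus)


# Toy alignment (made-up data): the last column has no majority symbol.
print(majority_consensus(["GCAU", "GCAA", "GCCG"]))  # GCAN
```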

    YAMAT-seq: an efficient method for high-throughput sequencing of mature transfer RNAs.

    Besides translation, transfer RNAs (tRNAs) play many non-canonical roles in various biological pathways and exhibit highly variable expression profiles. To unravel the emerging complexities of tRNA biology and the molecular mechanisms underlying them, an efficient tRNA sequencing method is required. However, the rigid structure of tRNA has posed a challenge to the development of such methods. We report the development of Y-shaped Adapter-ligated MAture TRNA sequencing (YAMAT-seq), an efficient and convenient method for high-throughput sequencing of mature tRNAs. YAMAT-seq circumvents the issue of inefficient adapter ligation, a characteristic of conventional RNA sequencing methods for mature tRNAs, by employing the efficient and specific ligation of a Y-shaped adapter to mature tRNAs using T4 RNA Ligase 2. Subsequent cDNA amplification and next-generation sequencing successfully yield numerous mature tRNA sequences. YAMAT-seq has high specificity for mature tRNAs and high sensitivity to detect most isoacceptors from minute amounts of total RNA. Moreover, YAMAT-seq shows quantitative capability to estimate expression levels of mature tRNAs, and has high reproducibility and broad applicability for various cell lines. YAMAT-seq thus provides a high-throughput technique for identifying tRNA profiles and their regulation in various transcriptomes, which could play important regulatory roles in translation and other biological processes.

    An RNA foldability metric; implications for the design of rapidly foldable RNA sequences

    Evidence is presented suggesting, for the first time, that the protein foldability metric sigma=(T_theta - T_f)/T_theta, where T_theta and T_f are, respectively, the collapse and folding transition temperatures, could be used also to measure the foldability of RNA sequences. The importance of sigma is discussed in the context of the in silico design of rapidly foldable RNA sequences. Comment: To appear in Biophysical Chemistry.
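As a small worked example of the metric quoted above, the sketch below evaluates sigma = (T_theta - T_f)/T_theta directly. The function name and the temperatures are hypothetical values chosen for illustration, not data from the paper. In the protein-folding literature a smaller sigma is generally associated with faster folding, and the abstract suggests the same reading may carry over to RNA.

```python
def foldability_sigma(t_theta: float, t_f: float) -> float:
    """Compute sigma = (T_theta - T_f) / T_theta.

    t_theta: collapse transition temperature (any absolute scale, e.g. kelvin)
    t_f:     folding transition temperature (same units as t_theta)
    """
    if t_theta <= 0:
        raise ValueError("T_theta must be positive")
    return (t_theta - t_f) / t_theta


# Hypothetical temperatures, for illustration only.
sigma = foldability_sigma(t_theta=350.0, t_f=315.0)
print(f"sigma = {sigma:.2f}")  # sigma = 0.10
```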

    tRNA functional signatures classify plastids as late-branching cyanobacteria.

    Background: Eukaryotes acquired the trait of oxygenic photosynthesis through endosymbiosis of the cyanobacterial progenitor of plastid organelles. Despite recent advances in the phylogenomics of Cyanobacteria, the phylogenetic root of plastids remains controversial. Although a single origin of plastids by endosymbiosis is broadly supported, recent phylogenomic studies are contradictory on whether plastids branch early or late within Cyanobacteria. One underlying cause may be poor fit of evolutionary models to complex phylogenomic data. Results: Using Posterior Predictive Analysis, we show that recently applied evolutionary models poorly fit three phylogenomic datasets curated from cyanobacteria and plastid genomes because of heterogeneities in both substitution processes across sites and of compositions across lineages. To circumvent these sources of bias, we developed CYANO-MLP, a machine learning algorithm that consistently and accurately phylogenetically classifies ("phyloclassifies") cyanobacterial genomes to their clade of origin based on bioinformatically predicted function-informative features in tRNA gene complements. Classification of cyanobacterial genomes with CYANO-MLP is accurate and robust to deletion of clades, unbalanced sampling, and compositional heterogeneity in input tRNA data. CYANO-MLP consistently classifies plastid genomes into a late-branching cyanobacterial sub-clade containing single-cell, starch-producing, nitrogen-fixing ecotypes, consistent with metabolic and gene transfer data. Conclusions: Phylogenomic data of cyanobacteria and plastids exhibit both site-process heterogeneities and compositional heterogeneities across lineages. These aspects of the data require careful modeling to avoid bias in phylogenomic estimation. Furthermore, we show that amino acid recoding strategies may be insufficient to mitigate bias from compositional heterogeneities. However, the combination of our novel tRNA-specific strategy with machine learning in CYANO-MLP appears robust to these sources of bias with high accuracy in phyloclassification of cyanobacterial genomes. CYANO-MLP consistently classifies plastids as late-branching Cyanobacteria, consistent with independent evidence from signature-based approaches and some previous phylogenetic studies.

    tRNA signatures reveal polyphyletic origins of streamlined SAR11 genomes among the alphaproteobacteria

    Phylogenomic analyses are subject to bias from compositional convergence and noise from horizontal gene transfer (HGT). Compositional convergence is a likely cause of controversy regarding phylogeny of the SAR11 group of Alphaproteobacteria that have extremely streamlined, A+T-biased genomes. While careful modeling can reduce artifacts caused by convergence, the most consistent and robust phylogenetic signal in genomes may lie distributed among encoded functional features that govern macromolecular interactions. Here we develop a novel phyloclassification method based on signatures derived from bioinformatically defined tRNA Class-Informative Features (CIFs). tRNA CIFs are enriched for features that underlie tRNA-protein interactions. Using a simple tRNA-CIF-based phyloclassifier, we obtained results consistent with those of bias-corrected whole proteome phylogenomic studies, rejecting monophyly of SAR11 and affiliating most strains with Rhizobiales with strong statistical support. Yet SAR11 and Rickettsiales tRNA genes share distinct patterns of A+T-richness, as expected from their elevated genomic A+T compositions. Using conventional supermatrix methods on total tRNA sequence data, we could recover the artifactual result of a monophyletic SAR11 grouping with Rickettsiales. Thus tRNA CIF-based phyloclassification is more robust to base content convergence than supermatrix phylogenomics on whole tRNA sequences. Also, given the notoriously promiscuous HGT of aminoacyl-tRNA synthetases, tRNA CIF-based phyloclassification may be relatively robust to HGT of network components. We describe how unique features of tRNA-protein interaction networks facilitate the mining of traits governing macromolecular interactions from genomic data, and discuss why interaction-governing traits may be especially useful to solve difficult problems in microbial classification and phylogeny

    Physical Complexity of Symbolic Sequences

    A practical measure for the complexity of sequences of symbols ("strings") is introduced that is rooted in automata theory but avoids the problems of Kolmogorov-Chaitin complexity. This physical complexity can be estimated for ensembles of sequences, for which it reverts to the difference between the maximal entropy of the ensemble and the actual entropy given the specific environment within which the sequence is to be interpreted. Thus, the physical complexity measures the amount of information about the environment that is coded in the sequence, and is conditional on such an environment. In practice, an estimate of the complexity of a string can be obtained by counting the number of loci per string that are fixed in the ensemble, while the volatile positions represent, again with respect to the environment, randomness. We apply this measure to tRNA sequence data. Comment: 12 pages LaTeX2e, 3 postscript figures, uses elsart.cls. Substantially improved and clarified version, includes application to EMBL tRNA sequence data.
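The ensemble estimate described above (maximal entropy minus actual entropy) can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes aligned sequences of equal length, treats sites as independent so that the ensemble entropy is approximated by a sum of per-site entropies, and measures entropy in units of log base alphabet size so that each site contributes at most 1. The function name and the toy ensemble are hypothetical.

```python
import math
from collections import Counter


def physical_complexity(seqs, alphabet_size=4):
    """Estimate complexity as (max entropy) - (sum of per-site entropies).

    Assumption: sites are independent, so the ensemble entropy is
    approximated site by site. Entropies use log base `alphabet_size`,
    so the maximal entropy equals the sequence length.
    """
    if not seqs:
        raise ValueError("need at least one sequence")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must have equal length")

    n = len(seqs)
    total_entropy = 0.0
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        # Shannon entropy of column i; a fixed locus contributes zero.
        total_entropy -= sum(
            (c / n) * math.log(c / n, alphabet_size) for c in counts.values()
        )
    return length - total_entropy


# Toy ensemble (made-up data): two fixed loci, one partly variable, one random.
ensemble = ["GCAU", "GCAA", "GCCG", "GCGC"]
print(round(physical_complexity(ensemble), 2))  # 2.25
```

In this toy case the two fixed loci each contribute 1, the fully volatile last position contributes 0, and the partly conserved third position contributes 0.25, which matches the abstract's picture of fixed loci carrying information and volatile positions representing randomness.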