15 research outputs found

    Gene processing control loops suggested by sequencing, splicing, and RNA folding

    Background: Small RNAs are known to regulate diverse gene expression processes including translation, transcription, and splicing. Among small RNAs, the microRNAs (miRNAs) of 17 to 27 nucleotides (nts) undergo a biogenesis that includes primary transcription, RNA excision and folding, nuclear export, and cytoplasmic processing, followed by bioactivity as regulatory agents. We propose that analogous hairpins from RNA molecules that function as part of the spliceosome might also be the source of small regulatory RNAs (somewhat smaller than miRNAs). Results: Deep sequencing technology has enabled discovery of a novel 16-nt RNA sequence in total RNA from human brain that we propose is derived from RNU1, an RNA component of spliceosome assembly. Bioinformatic alignments prompt two questions: whether the novel 16-nt sequence or its precursor has a regulatory function, and how its processing intersects with the miRNA biogenesis pathway. Specifically, our preliminary in silico investigations reveal that the sequence could regulate splicing factor Arg/Ser rich 1 (SFRS1), a gene coding for an essential protein component of the spliceosome. All 16-base source sequences in the UCSC Human Genome Browser are within the 14 instances of RNU1 genes listed in wgEncodeGencodeAutoV3. Furthermore, 10 of the 14 instances of the sequence are also within a common 28-nt hairpin-forming subsequence of RNU1. Conclusions: An abundant 16-nt RNA sequence is sourced from a spliceosomal RNA, lies in a stem of a predicted RNA hairpin, and includes reverse complements of subsequences of the 3'UTR of a gene coding for a spliceosome protein. Thus RNU1 could function both as a component of spliceosome assembly and as an inhibitor of production of the essential spliceosome protein coded by SFRS1. Beyond this example, a general procedure is needed for systematic discovery of multiple alignments of sequencing, splicing, and RNA folding data.
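The kind of in silico check described in this abstract, looking for stretches of a target 3'UTR that are reverse complements of parts of a short RNA, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names and the toy sequences are hypothetical, and only exact Watson-Crick matches of a fixed window length are considered.

```python
# Toy sketch only: sequences used with these helpers are invented placeholders,
# not the 16-nt RNA or the SFRS1 3'UTR discussed in the abstract.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (A-U and C-G pairs only)."""
    return rna.upper().replace("T", "U").translate(COMPLEMENT)[::-1]


def complementary_sites(small_rna: str, target: str, min_len: int = 6):
    """Return (target_start, small_rna_start, length) for every exact
    reverse-complement match between a min_len window of the small RNA
    and the target sequence."""
    small_rna = small_rna.upper().replace("T", "U")
    target = target.upper().replace("T", "U")
    hits = []
    for i in range(len(small_rna) - min_len + 1):
        site = reverse_complement(small_rna[i:i + min_len])
        start = target.find(site)
        while start != -1:  # collect every occurrence, including overlapping ones
            hits.append((start, i, min_len))
            start = target.find(site, start + 1)
    return hits


if __name__ == "__main__":
    print(complementary_sites("AUGGCCUA", "GGUAGGCCAUCC", min_len=8))
```

A realistic analysis would also allow G-U wobble pairs and mismatches and would score candidate duplexes thermodynamically; the sketch deliberately omits all of that.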

    A Pseudo-tRNA Modulates Antibiotic Resistance in Bacillus cereus

    Bacterial genomic islands are often flanked by tRNA genes, which act as sites for the integration of foreign DNA into the host chromosome. For example, Bacillus cereus ATCC14579 contains a pathogenicity island flanked by a predicted pseudo-tRNA, tRNA(Other), which does not function in translation. Deletion of tRNA(Other) led to significant changes in cell wall morphology and antibiotic resistance and was accompanied by changes in the expression of numerous genes involved in oxidative stress responses, several of which contain significant complementarities to sequences surrounding tRNA(Other). This suggested that tRNA(Other) might be expressed as part of a larger RNA, and RACE analysis subsequently confirmed the existence of several RNA species that significantly extend both the 3′ and 5′ ends of tRNA(Other). tRNA(Other) expression levels were found to be responsive to changes in extracellular iron concentration, consistent with the presence of three putative ferric uptake regulator (Fur) binding sites in the 5′ leader region of one of these larger RNAs. Taken together with previous data, this study now suggests that tRNA(Other) may function by providing a tRNA-like structural element within a larger regulatory RNA. These findings illustrate that while integration of genomic islands often leaves tRNA genes intact and functional, in other instances inactivation may generate tRNA-like elements that are then recruited to other functions in the cell.

    Emerging applications of read profiles towards the functional annotation of the genome

    Functional annotation of the genome in various species is important for understanding their phenotypic complexity. The road towards functional annotation involves several challenges, ranging from experiments on individual molecules to large-scale analysis of high-throughput sequencing (HTS) data. HTS data are typically the result of a protocol designed to address specific research questions. Sequencing produces reads which, when mapped to a reference genome, often form distinct patterns (read profiles). Interpretation of these read profiles is essential for analysis in relation to the research question addressed. Several strategies have been employed at varying levels of abstraction, ranging from somewhat ad hoc to more systematic analysis of read profiles. These include methods that compare read profiles, ranging from direct (non-sequence-based) alignment to classification of patterns into functional groups. In this review, we highlight the emerging applications of read profiles for the annotation of non-coding RNA and cis-regulatory regions such as enhancers and promoters. We also discuss the biological rationale behind their formation.
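As a rough illustration of what a read profile is, the sketch below builds a per-base coverage vector from mapped read intervals over one locus. It is a simplified assumption about the data, not a method from the review: reads are given as hypothetical half-open (start, end) coordinates on a single strand, and the function name is invented for this example.

```python
def read_profile(reads, locus_start, locus_end):
    """Per-base coverage over the half-open interval [locus_start, locus_end).

    `reads` is an iterable of (start, end) half-open genomic intervals of
    reads already mapped to the same chromosome and strand (an assumption
    of this sketch).
    """
    profile = [0] * (locus_end - locus_start)
    for start, end in reads:
        # Clip each read to the locus so partially overlapping reads still count.
        lo = max(start, locus_start)
        hi = min(end, locus_end)
        for pos in range(lo, hi):
            profile[pos - locus_start] += 1
    return profile


if __name__ == "__main__":
    toy_reads = [(100, 105), (102, 108), (102, 108)]
    print(read_profile(toy_reads, 100, 110))
```

Profiles of this form are the raw material that comparison or classification methods would then operate on, for example after normalizing by the total read count at the locus.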

    Characterization of statistical features for plant microRNA prediction

    Background: Several tools are available to identify miRNAs from deep-sequencing data; however, only a few of them, like miRDeep, can identify novel miRNAs and are also available as a standalone application. Given the differences between plant and animal miRNAs, particularly in the distribution of hairpin length and the nature of complementarity with the duplex partner (or miRNA star), the underlying statistical features of miRDeep, and of other tools using similar features, are likely to be affected. Results: The potential effects on features such as minimum free energy, stability of secondary structures, and excision length were examined, and the parameters of those displaying sizable changes were estimated for plant-specific miRNAs. We found that most of these features acquired a new set of values or distributions for plant-specific miRNAs. While the stretch of conserved positions (nucleus) in mature miRNAs was relatively longer in plants, the difference in the distribution of minimum free energy between real and background hairpins was marginal. However, the choice of source (species) of background sequences was found to affect both the minimum free energy and miRNA hairpin stability. The new parameters were tested on an Illumina dataset from maize seedlings, and the results were compared with those obtained using default parameters. The newly parameterized model was found to have much improved specificity and sensitivity over its default counterpart. Conclusions: In summary, the present study reports the behavior of a few general and tool-specific statistical features for improving the prediction accuracy of plant miRNAs from deep-sequencing data.

    Differential and coherent processing patterns from small RNAs

    Post-transcriptional processing events related to short RNAs are often reflected in their read profile patterns emerging from high-throughput sequencing data. MicroRNA arm switching across different tissues is a well-known example of what we define as differential processing. Here, short RNAs from the nine cell lines of the ENCODE project, irrespective of their annotation status, were analyzed for genomic loci representing differential or coherent processing. We observed differential processing predominantly in RNAs annotated as miRNA, snoRNA or tRNA. Four out of five known cases of differentially processed miRNAs that were in the input dataset were recovered and several novel cases were discovered. In contrast to differential processing, coherent processing is widespread in both annotated and unannotated regions. While the annotated loci predominantly consist of ~24nt short RNAs, the unannotated loci by comparison consist of ~17nt short RNAs. Furthermore, these ~17nt short RNAs are significantly enriched for overlap with transcription start sites and DNase I hypersensitive sites (p-value < 0.01), which are characteristic features of transcription initiation RNAs. We discuss how the computational pipeline developed in this study has the potential to be applied to other forms of RNA-seq data for further transcriptome-wide studies of differential and coherent processing.

    Massive-Scale RNA-Seq Analysis of Non Ribosomal Transcriptome in Human Trisomy 21

    Hybridization- and tag-based technologies have been successfully used in Down syndrome to identify genes involved in various aspects of the pathogenesis. However, these technologies suffer from several limitations and, to date, information about rare but relevant RNA species, such as long and small non-coding RNAs, is completely missing. Indeed, no published work has yet described the whole transcriptional landscape of Down syndrome. Although recent advances in high-throughput RNA sequencing have revealed the complexity of transcriptomes, most studies rely on polyA enrichment protocols, which detect only a small fraction of total RNA content. By contrast, massive-scale RNA sequencing of rRNA-depleted samples allows the survey of the complete set of coding and non-coding RNA species, now emerging as novel contributors to pathogenic mechanisms. Hence, in this work we analysed for the first time the complete transcriptome of human trisomic endothelial progenitor cells at an unprecedented level of resolution and sensitivity by RNA sequencing. Our analysis allowed us to detect differential expression of even lowly expressed genes crucial for the pathogenesis, to disclose novel regions of active transcription outside currently annotated loci, and to investigate a plethora of non-polyadenylated long and short non-coding RNAs. Novel splice isoforms for a large subset of crucial genes, and novel extended untranslated regions for known genes (possibly novel miRNA targets or regulatory sites for gene transcription), were also identified in this study. Coupling rRNA depletion of samples and high-throughput RNA sequencing with the easy availability of these cells makes this approach very feasible for transcriptome studies, offering the possibility of investigating in depth the blood-related pathological features of Down syndrome, as well as other genetic disorders.

    RNAplonc: a classifier for the identification of long non-coding RNAs in plants

    Long non-coding RNAs (lncRNAs) are a class of non-coding RNA that has gained attention in recent years as a higher layer of regulation of gene expression in cells. There is, however, a lack of specific computational approaches to reliably predict lncRNAs in plants, in contrast with the myriad of prediction tools available for mammalian lncRNAs. Given that the biological features and mechanisms generating lncRNAs in the cell are likely different between animals and plants, plant-specific tools are needed for these studies. With this in mind, we present here RNAplonc, a classifier approach for the identification of lncRNAs in plants from mRNA-based data. To build this tool, we used publicly available lncRNA and mRNA sequences from six plant genomes: Arabidopsis thaliana, Cucumis sativus, Glycine max, Oryza sativa, Populus trichocarpa and Setaria italica. These data were extracted from the public databases PLNlncRbase, GreeNC and Phytozome, from which we used 22,543 lncRNAs and 29,960 mRNAs as a training set. We selected the 16 features that could best classify lncRNAs from 5,468 features with the REPTree algorithm. After an extensive comparison with tools used for lncRNA identification in plants (CPC) and animals (PLEK and lncRScan-SVM), we found that RNAplonc obtained a better accuracy (92%) on the training dataset than the 77% accuracy obtained with the CPC tool. We also found that RNAplonc produced more reliable lncRNA predictions from plant transcripts, as estimated for 17 datasets in 13 species from the CANTATAdb, GreeNC and PNRD databases. We also evaluated RNAplonc performance in two case studies that identified lncRNAs from Populus tomentosa and Gossypium, respectively. RNAplonc could correctly identify 98.5% of biologically validated lncRNAs in Populus and 99.1% in Gossypium. RNAplonc, its documentation and training datasets are available at the website: http://rnaplonc.cp.utfpr.edu.br/. We can conclude that RNAplonc correctly retrieves known plant lncRNAs. Moreover, RNAplonc can be a strategy for lncRNA discovery, providing a rich resource of candidate lncRNAs specifically for plants.
    Fundação Araucária de Apoio ao Desenvolvimento Científico e Tecnológico do Paraná; Universidade Tecnológica Federal do Paraná (UTFPR).
    Long non-coding RNAs (lncRNAs) belong to the class of RNAs that do not code for proteins and are involved in diverse biological functions, such as chromatin modification, post-transcriptional regulation, translation, nuclear organization and various developmental processes. There is currently a lack of specific computational approaches for identifying lncRNAs in plants, in contrast to the variety of tools available for mammals. Unlike other classes of non-coding RNAs, the distinction between plant and animal lncRNAs has not yet been clarified. Given this scenario, this work presents RNAplonc, an approach for the identification of lncRNAs in plants. It was built from publicly available lncRNA and mRNA sequences from six plant genomes: Arabidopsis thaliana, Cucumis sativus, Glycine max, Oryza sativa, Populus trichocarpa and Setaria italica. A total of 22,543 lncRNAs and 29,960 mRNAs from the public databases PLNlncRbase, GreeNC and Phytozome were used as the training set. In addition, 5,468 features were evaluated with 10 machine learning algorithms. The results of the sensitivity and specificity analysis of classification led to the selection of 16 features with the REPTree algorithm, which reached 93% correct classification of lncRNAs. Next, the performance of RNAplonc was compared with a tool widely used for identifying lncRNAs in plants (CPC) and two others applied to animals (PLEK and lncRScan-SVM). RNAplonc obtained a sensitivity of 99.83% in identifying lncRNAs in the training dataset when compared with the CPC tool. The performance of RNAplonc was also evaluated in two independent case studies that identified lncRNAs with biological evidence in Populus and Gossypium, recovering 98.5% and 99.1% of the lncRNAs identified in Populus and Gossypium, respectively. All documentation and the datasets used (training and testing) are available at: http://rnaplonc.cp.utfpr.edu.br/. Finally, RNAplonc is believed to be a strategy that can contribute to the discovery of candidate lncRNAs specifically for plants.

    Prediction and expression analysis of regulatory non-coding RNAs in the bacterium Herbaspirillum seropedicae SmR1

    Abstract: Herbaspirillum seropedicae strain SmR1 is an endophytic bacterium capable of fixing nitrogen and promoting the growth of important agricultural crops. Its genome was completely sequenced and annotated by the Programa Genoma do Estado do Paraná (GENOPAR Consortium (www.genopar.org)). This bacterium has a single circular chromosome of 5,513,887 base pairs with 4,735 annotated ORFs, which represent 88.3% of the genome. In bacteria, non-coding RNAs with regulatory function (ncRNAs) can modulate various physiological responses and act through different mechanisms, such as RNA-RNA base pairing and RNA-protein interactions. High-throughput sequencing technologies, such as the SOLiD platform, are enabling large-scale identification of ncRNAs, revealing the existence of many non-coding transcripts and indicating that the number of regulatory ncRNAs may be larger than previously thought. Traditionally, in silico approaches for identifying these molecules involve associating sigma70 factor promoter sequences with Rho-independent terminator sequences, and/or conservation of primary and secondary RNA structures. The aim of this work was to identify ncRNAs present in H. seropedicae SmR1 and evaluate their expression. To this end, the complete genome was searched with the bioinformatics tools Gsalgorithm and nocoRNAc, which were used to identify genome regions flanked by promoter sequences and Rho-independent terminator sequences as candidates for encoding ncRNAs. In addition, the Cufflinks tool was used to locate genome regions with considerable transcription levels and free of ORFs. To evaluate ncRNA expression, the transcriptome of H. seropedicae SmR1 grown under three different conditions was determined by RNA-Seq using the SOLiD sequencing platform. A total of 98 ncRNAs were confirmed in the transcriptome dataset. Comparison of the ncRNAs with the RFAM and RIBEX databases revealed that only eight transcripts identified in this study had already been described in other bacterial species, notably the following: 4.5S RNA, 6S RNA (SsrS), Intron_gpI, tmRNA, and TPP Riboswitch. The function of the 90 novel ncRNAs will be investigated in vivo.

    Methods in and Applications of the Sequencing of Short Non-Coding RNAs

    Short non-coding RNAs are important for all domains of life. With the advent of modern molecular biology their applicability to medicine has become apparent in settings ranging from diagnostic biomarkers to therapeutics and fields ranging from oncology to neurology. In addition, a critical, recent technological development is high-throughput sequencing of nucleic acids. The convergence of modern biotechnology with developments in RNA biology presents opportunities in both basic research and medical settings. Here I present two novel methods for leveraging high-throughput sequencing in the study of short non-coding RNAs, as well as a study in which they are applied to Alzheimer's disease (AD). The computational methods presented here include High-throughput Annotation of Modified Ribonucleotides (HAMR), which enables researchers to detect post-transcriptional covalent modifications to RNAs in a high-throughput manner. In addition, I describe Classification of RNAs by Analysis of Length (CoRAL), a computational method that allows researchers to characterize the pathways responsible for short non-coding RNA biogenesis. Lastly, I present an application of the study of non-coding RNAs to Alzheimer's disease. When applied to the study of AD, it is apparent that several classes of non-coding RNAs, particularly tRNAs and tRNA fragments, show striking changes in the dorsolateral prefrontal cortex of affected human brains. Interestingly, the nature of these changes differs between mitochondrial and nuclear tRNAs, implicating an association between Alzheimer's disease and perturbation of mitochondrial function. In addition, by combining known genetic factors of AD with genes that are differentially expressed and targets of regulatory RNAs that are differentially expressed, I construct a network of genes that are potentially relevant to the pathogenesis of the disease. By combining genetics data with novel results from the study of non-coding RNAs, we can further elucidate the molecular mechanisms that underlie Alzheimer's disease pathogenesis.