    COMPUTER METHODS FOR PRE-MICRORNA SECONDARY STRUCTURE PREDICTION

    This thesis presents a new algorithm to predict the secondary structure of pre-microRNAs. Accurate prediction of pre-microRNA secondary structure is important in miRNA informatics. Building on a recently proposed model for RNA secondary structure prediction, nucleotide cyclic motifs (NCM), we propose and implement a Modified NCM (MNCM) model with a physics-based scoring strategy to tackle the problem of pre-microRNA folding. Our microRNAfold is implemented as a globally optimal algorithm built from bottom-up locally optimal solutions. It has been shown that studying the functions of multiple genes and predicting the secondary structures of multiple related microRNAs is more important and meaningful, since many polygenic traits in animals and plants are controlled by more than a single gene. We propose a parallel algorithm based on a master-slave architecture to predict the secondary structure from an input sequence. The experimental results show that our algorithm is able to produce the optimal secondary structure of polycistronic microRNAs, and that the trend of speedups of our parallel algorithm matches that of the theoretical speedups. Conserved secondary structures are likely to be functional, and secondary structural characteristics shared between endogenous pre-miRNAs may contribute toward efficient biogenesis. Identifying conserved secondary structures, and conserved characteristics of RNA more generally, is therefore an important research area. Once characteristics are extracted from the secondary structures of RNAs, corresponding patterns or rules can be mined and applied. We propose to use conserved microRNA characteristics in two ways: to improve prediction through a knowledge base, and to distinguish real microRNAs from pseudo microRNAs. Through statistical analysis of classification performance, we verify that the conserved characteristics extracted from microRNAs’ secondary structures are sufficiently precise.
Gene suppression is a powerful tool for functional genomics and for eliminating specific gene products. However, current gene suppression vectors can silence only a single gene at a time. We therefore design an efficient polycistronic microRNA vector, together with a web-based tool that allows users to design their own microRNA vectors online.
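The MNCM scoring model is not specified in this abstract, but the idea of assembling a globally optimal structure from bottom-up optimal solutions on shorter subsequences can be illustrated with the classic Nussinov base-pair-maximization recurrence. This is a swapped-in textbook technique, not microRNAfold or MNCM: it simply counts Watson-Crick and G-U pairs instead of applying a physics-based energy score, and the minimum hairpin loop of 3 nt is an assumed parameter.

```python
# Illustrative Nussinov-style dynamic programming; NOT the MNCM/microRNAfold model.
# Canonical Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> str:
    """Return one dot-bracket structure with the maximum number of nested pairs."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    # dp[i][j] = best pair count for seq[i..j]; spans too short for a hairpin stay 0.
    dp = [[0] * n for _ in range(n)]

    def pair_score(i: int, k: int, j: int) -> int:
        # Score if k pairs with j: best of seq[i..k-1] + best of seq[k+1..j-1] + 1.
        left = dp[i][k - 1] if k > i else 0
        return left + dp[k + 1][j - 1] + 1

    # Bottom-up: fill shorter spans first so every sub-solution is ready when needed.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if (seq[k], seq[j]) in PAIRS:
                    best = max(best, pair_score(i, k, j))
            dp[i][j] = best

    # Traceback from the full-length optimum to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS and pair_score(i, k, j) == dp[i][j]:
                structure[k], structure[j] = "(", ")"
                stack.extend([(i, k - 1), (k + 1, j - 1)])
                break
    return "".join(structure)


print(nussinov("GGGAAAUCC"))  # (((...)))
```

Cells with the same span depend only on shorter spans, so they are independent of one another; this is the property that makes such recurrences candidates for master-slave parallelization, although the abstract does not say how microRNAfold divides its work.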

    Genome-Wide Analysis of RNA Secondary Structure in Eukaryotes

    The secondary structure of an RNA molecule plays an integral role in its maturation, regulation, and function. Over the past decades, myriad studies have revealed specific examples of structural elements that direct the expression and function of both protein-coding messenger RNAs (mRNAs) and non-coding RNAs (ncRNAs). In this work, we develop and apply a novel high-throughput, sequencing-based structure mapping approach to study RNA secondary structure in three eukaryotic organisms. First, we assess global patterns of secondary structure across protein-coding transcripts and identify a conserved mark of strongly reduced base pairing at transcription start and stop sites, which we hypothesize aids ribosome recruitment and function. We also find empirical evidence for reduced base pairing within microRNA (miRNA) target sites, lending further support to the notion that mRNAs, too, are subject to selective pressures beyond their protein-coding sequence. Next, we integrate our structure mapping approaches with transcriptome-wide sequencing of ribosomal RNA-depleted (RNA-seq), small (smRNA-seq), and ribosome-bound (ribo-seq) RNA populations to investigate the impact of RNA secondary structure on gene expression regulation in the model organism Arabidopsis thaliana. We find that secondary structure and mRNA abundance are strongly anti-correlated, likely because highly structured transcripts tend to be degraded and/or processed into smRNAs. Finally, we develop a likelihood model and a Bayesian Markov chain Monte Carlo (MCMC) algorithm that use the sequencing data from our structure mapping approaches to generate single-nucleotide-resolution predictions of RNA secondary structure. We show that this likelihood framework resolves ambiguities arising from the sequencing protocol and significantly increases prediction accuracy.
In total, our findings provide, on a global scale, both validation of existing hypotheses regarding RNA biology and new insights into the regulatory and functional consequences of RNA secondary structure. Furthermore, the development of a statistical approach to structure prediction from sequencing data offers the promise of true genome-wide determination of RNA secondary structure.
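The abstract does not give a scoring formula, but one common way to turn paired structure-mapping libraries into a per-nucleotide signal is a log ratio of double-stranded to single-stranded read coverage, which can then be summarized per transcript and rank-correlated with abundance. The sketch below assumes exactly that: the ds/ss read counts, the pseudocount of 1, and the use of Spearman correlation are illustrative choices, not the likelihood model or MCMC procedure described above.

```python
import math


def structure_scores(ds_counts, ss_counts, pseudocount=1.0):
    """Per-nucleotide log2((ds + p) / (ss + p)); > 0 suggests paired, < 0 unpaired.

    ds_counts / ss_counts: reads from hypothetical double- and single-strand
    specific libraries at each position of one transcript (an assumption).
    """
    return [math.log2((d + pseudocount) / (s + pseudocount))
            for d, s in zip(ds_counts, ss_counts)]


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-transcript mean structure scores and abundances (not real data).
mean_scores = [0.8, 0.3, -0.2, -0.6]
abundance = [12.0, 40.0, 95.0, 210.0]
print(spearman(mean_scores, abundance))  # -1.0 for this perfectly inverse toy data
```

A rank-based correlation is used here only because it makes no linearity assumption about how structure scores relate to abundance.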

    Networks of intergenic long-range enhancers and snpRNAs drive castration-resistant phenotype of prostate cancer and contribute to pathogenesis of multiple common human disorders

    The biological and mechanistic relevance of intergenic disease-associated genetic loci (IDAGL) containing highly statistically significant disease-linked SNPs remains unknown. Here we present experimental and clinical evidence revealing an important role of IDAGL in human diseases. A targeted RT-PCR screen coupled with sequencing of purified PCR products detects widespread transcription at multiple IDAGL and identifies 96 small non-coding trans-regulatory RNAs of ~100-300 nt in length containing SNPs associated with 21 common human disorders (snpRNAs). The functionality of snpRNAs is supported by multiple independent lines of experimental evidence demonstrating their cell-type-specific expression and the evolutionary conservation of their sequences, genomic coordinates, and biological effects. Analysis of chromatin state signatures, expression profiling experiments using microarray and Q-PCR technologies, and luciferase reporter assays indicate that many IDAGL are Polycomb-regulated long-range enhancers. Expression of snpRNAs in human and mouse cells markedly affects cellular behavior and induces allele-specific, clinically relevant phenotypic changes: NLRP1-locus snpRNAs exert regulatory effects on monocyte/macrophage trans-differentiation, induce prostate cancer (PC) susceptibility snpRNAs, and transform low-malignancy hormone-dependent human PC cells into highly malignant androgen-independent PC. Q-PCR analysis and luciferase reporter assays demonstrate that snpRNA sequences represent allele-specific “decoy” targets of microRNAs and function as SNP-allele-specific modifiers of microRNA expression and activity. We demonstrate that the trans-acting RNA molecules facilitating androgen depletion-independent growth (ADIG) in vitro and the castration-resistant (CR) phenotype of PC in vivo contain intergenic 8q24-locus SNP variants that were recently linked with increased risk of developing PC.
The expression level of 8q24-locus PC susceptibility snpRNAs is regulated by NLRP1-locus snpRNAs, which are transcribed from an intergenic long-range enhancer sequence located in the 17p13 region, ~30 kb from the NLRP1 gene. Q-PCR analysis of clinical PC samples reveals markedly increased snpRNA expression levels in tumor tissues compared with the adjacent normal prostate [NLRP1-locus and 8q24-locus snpRNAs, respectively: 122-fold and 45-fold in Gleason 7 tumors (p = 0.03); 370-fold and 127-fold in Gleason 8 tumors (p = 0.0001)]. Highly concordant expression profiles of the NLRP1-locus snpRNAs and the 8q24 CR-locus snpRNAs (r = 0.896; p < 0.0001) in clinical PC samples, together with experimental evidence of trans-regulatory effects of NLRP1-locus snpRNAs on the expression of 8q24-locus snpRNAs, indicate that the ADIG and CR phenotype of human PC cells can be triggered by RNA molecules transcribed from the NLRP1-locus intergenic enhancer and by downstream activation of the 8q24-locus snpRNAs. Our results define the intergenic NLRP1 and 8q24 regions as regulatory loci of the ADIG and CR phenotype of human PC, reveal previously unknown molecular links between the innate immunity/inflammasome system and the development of hormone-independent PC, and identify novel diagnostic and therapeutic targets whose exploration should be highly beneficial for the clinical management of PC.
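Fold changes between tumor and adjacent normal tissue, like those reported above, are commonly derived from Q-PCR cycle thresholds (Ct) with the 2^-ΔΔCt method. The abstract does not state the normalization used, so this sketch assumes a single reference gene and roughly 100% amplification efficiency; the Ct values in the example are hypothetical and are not the study's data.

```python
def fold_change_ddct(ct_target_tumor: float, ct_ref_tumor: float,
                     ct_target_normal: float, ct_ref_normal: float) -> float:
    """Relative expression (tumor vs. normal) by the 2^-ddCt method.

    Assumes ~100% PCR efficiency for both target and reference amplicons.
    """
    d_tumor = ct_target_tumor - ct_ref_tumor     # normalize target to reference gene
    d_normal = ct_target_normal - ct_ref_normal
    dd_ct = d_tumor - d_normal                   # tumor relative to normal tissue
    return 2 ** (-dd_ct)                         # each cycle ~ a doubling of template


# Hypothetical Ct values: the target crosses threshold 6 cycles earlier in tumor.
print(fold_change_ddct(24.0, 18.0, 30.0, 18.0))  # 64.0
```

Because each Ct unit corresponds to roughly a twofold difference, a 6-cycle shift maps to a 64-fold change; departures from ideal efficiency would require an efficiency-corrected model instead.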

    miRA: adaptable novel miRNA identification in plants using small RNA sequencing data

    BACKGROUND: MicroRNAs (miRNAs) are short regulatory RNAs derived from longer precursor RNAs. miRNA biogenesis has been studied in animals and plants, with recent work elucidating more complex aspects such as non-conserved, species-specific, and heterogeneous miRNA precursor populations. Small RNA sequencing data can help in computationally identifying the genomic loci of miRNA precursors. The challenge is to predict a valid miRNA precursor from the inhomogeneous read coverage of a complex RNA library: while the mature miRNA typically produces many sequence reads, the rest of the precursor is covered very sparsely. Recent results suggest that alternative miRNA biogenesis pathways may lead to a more diverse miRNA precursor population than previously assumed. In plants, this diversity manifests itself in, e.g., complex secondary structures and expression from multiple loci within precursors. Current miRNA identification algorithms often depend on existing gene annotation and/or make use of specific miRNA precursor features such as precursor length and secondary structure. Consequently, and in view of the emerging understanding of a more complex miRNA biogenesis in plants, current tools may fail to characterise organism-specific and heterogeneous miRNA populations. RESULTS: miRA is a new tool to identify miRNA precursors in plants, allowing for heterogeneous and complex precursor populations. miRA requires small RNA sequencing data and a corresponding reference genome, and evaluates precursor secondary structures and precursor processing accuracy; key parameters can be adapted to the specific organism under investigation.
We show that miRA outperforms the currently best plant miRNA prediction tools in both sensitivity and specificity, for data involving Arabidopsis thaliana and the volvocine alga Chlamydomonas reinhardtii; the latter has been shown to exhibit a heterogeneous and complex precursor population with little cross-species miRNA sequence conservation, and therefore constitutes an ideal model organism. Furthermore, we identify novel miRNAs in the Chlamydomonas-related organism Volvox carteri. CONCLUSIONS: We propose miRA, a new plant miRNA identification tool that is well adapted to complex precursor populations. miRA is particularly suited for organisms with no existing miRNA annotation, or without a known related organism with well-characterized miRNAs. Moreover, miRA has proven its ability to identify species-specific miRNAs. miRA is flexible in its parameter settings and produces user-friendly output files in various formats (pdf, csv, genome-browser-suitable annotation files, etc.). It is freely available at https://github.com/mhuttner/miRA. The authors acknowledge funding from the Deutsche Forschungsgemeinschaft (SFB 960), the Bavarian Genome Research Network (BayGene), and the Bavarian Biosystems Network (BioSysNet).
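One way to make "precursor processing accuracy" concrete is to ask how consistently the reads mapped to a candidate precursor start at the same 5' position, since precisely processed miRNAs tend to yield reads with homogeneous ends while degradation fragments do not. The function below is a hypothetical illustration of that idea; its name, the `tolerance` parameter, and the dominant-start rule are assumptions and are not taken from miRA's code.

```python
from collections import Counter


def processing_precision(read_starts, tolerance=0):
    """Share of reads whose 5' start lies within `tolerance` nt of the most common start.

    read_starts: 0-based 5' positions of reads mapped to one precursor arm.
    Hypothetical metric for illustration only; not miRA's implementation.
    """
    if not read_starts:
        return 0.0
    counts = Counter(read_starts)
    dominant, _ = counts.most_common(1)[0]  # most frequent 5' start position
    consistent = sum(c for pos, c in counts.items() if abs(pos - dominant) <= tolerance)
    return consistent / len(read_starts)


# Toy example: three reads start at 10, one at 11, one stray fragment at 30.
print(processing_precision([10, 10, 10, 11, 30]))               # 0.6
print(processing_precision([10, 10, 10, 11, 30], tolerance=1))  # 0.8
```

A small non-zero tolerance accommodates the 1-nt end heterogeneity often seen in sequenced small RNAs, at the cost of admitting more noise.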

    Analysis of Machine Learning Based Methods for Identifying MicroRNA Precursors

    MicroRNAs are a type of non-coding RNA that were discovered less than a decade ago but are now known to be incredibly important in regulating gene expression despite their small size. However, because of their small size and several other limiting factors, experimental procedures have had limited success in discovering new microRNAs. Computational methods are therefore vital to discovering novel microRNAs. Many different approaches, with varying degrees of success, have been used to scan genomic sequences for novel microRNAs. This work provides an overview of these computational methods, focusing particularly on those based on machine learning techniques. The results of experiments performed on several machine-learning-based microRNA detectors are provided, along with an analysis of their performance.
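Many machine-learning-based precursor detectors first encode a candidate hairpin as a fixed-length feature vector before training a classifier. As one concrete example, the sketch below computes the 32 local structure-sequence triplet features associated with the triplet-SVM approach: for each interior nucleotide, the nucleotide itself combined with the paired/unpaired states of it and its two neighbours. It assumes a dot-bracket structure is already available from a folding tool and is not tied to any specific detector evaluated in this work.

```python
from itertools import product

# The 8 paired/unpaired patterns over a 3-nt window; "(" = paired, "." = unpaired.
PATTERNS = ["".join(p) for p in product("(.", repeat=3)]
# 4 middle nucleotides x 8 patterns = 32 feature names, e.g. "A(.(".
FEATURE_NAMES = [n + p for n in "ACGU" for p in PATTERNS]


def triplet_features(seq: str, dot_bracket: str) -> list:
    """Return 32 normalised triplet frequencies for one hairpin."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    seq = seq.upper().replace("T", "U")
    # Both "(" and ")" count as paired; everything else counts as unpaired.
    states = ["(" if c in "()" else "." for c in dot_bracket]
    counts = dict.fromkeys(FEATURE_NAMES, 0)
    for i in range(1, len(seq) - 1):  # interior positions have two neighbours
        key = seq[i] + "".join(states[i - 1:i + 2])
        if key in counts:  # silently skips ambiguous bases such as N
            counts[key] += 1
    total = sum(counts.values()) or 1
    return [counts[name] / total for name in FEATURE_NAMES]


features = triplet_features("GGGAAAUCC", "(((...)))")
print(len(features))  # 32
```

The resulting vectors could then be fed to any standard classifier; normalising by the number of windows keeps hairpins of different lengths comparable.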

    miRanalyzer: a microRNA detection and analysis tool for next-generation sequencing experiments

    Next-generation sequencing now allows the sequencing of small RNA molecules and the estimation of their expression levels. Consequently, there will be a high demand for bioinformatics tools able to cope with the several gigabytes of sequence data generated by each deep-sequencing experiment. In this context, we developed miRanalyzer, a web server tool for the analysis of deep-sequencing experiments on small RNAs. The web server requires a simple input file containing a list of unique reads and their copy numbers (expression levels). Using these data, miRanalyzer (i) detects all known microRNA sequences annotated in miRBase, (ii) finds all perfect matches against other libraries of transcribed sequences, and (iii) predicts new microRNAs. The prediction of new microRNAs is especially important because many species have very few known microRNAs. We therefore implemented a highly accurate machine learning algorithm for the prediction of new microRNAs that reaches AUC values of 97.9% and recall values of up to 75% on unseen data. The web tool summarizes all the described steps in a single output page, which provides a comprehensive overview of the analysis and links to more detailed output pages for each analysis module. miRanalyzer is available at http://web.bioinformatics.cicbiogune.es/microRNA/
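The input described above, a list of unique reads with their copy numbers, can be produced from raw reads by collapsing identical sequences. This sketch assumes the reads are already adapter-trimmed and emits a hypothetical tab-separated `sequence<TAB>count` layout; the exact file format the server accepts should be checked against its own documentation.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse raw reads into (unique sequence, copy number), most abundant first.

    Assumes adapter-trimmed reads; blank lines are ignored and case is normalised.
    """
    counts = Counter(r.strip().upper() for r in reads if r.strip())
    # Sort by descending count, then alphabetically for a stable, reproducible order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def to_tsv(collapsed):
    """Render as hypothetical 'sequence<TAB>count' lines (format is an assumption)."""
    return "\n".join(f"{seq}\t{n}" for seq, n in collapsed)


reads = ["UGAGGUAGUAGGUUGUAUAGUU", "ugagguaguagguuguauaguu", "ACGU", ""]
print(to_tsv(collapse_reads(reads)))
```

Collapsing before upload also shrinks multi-gigabyte read files considerably, since highly expressed miRNAs contribute many identical reads.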

    Inferring Noncoding RNA Families and Classes by Means of Genome-Scale Structure-Based Clustering

    The RFAM database defines families of ncRNAs by means of sequence similarities that are sufficient to establish homology. In some cases, such as microRNAs and box H/ACA snoRNAs, functional commonalities define classes of RNAs that are characterized by structural similarities and typically consist of multiple RNA families. Recent advances in high-throughput transcriptomics and comparative genomics have produced very large sets of putative noncoding RNAs and regulatory RNA signals. For many of them, evidence for stabilizing selection acting on their secondary structures has been derived, and at least approximate models of their structures have been computed. The overwhelming majority of these hypothetical RNAs cannot be assigned to established families or classes. We present here a structure-based clustering approach that is capable of extracting putative RNA classes from genome-wide surveys for structured RNAs. The LocARNA (local alignment of RNA) tool implements a novel variant of the Sankoff algorithm that is fast enough to deal with several thousand candidate sequences. The method is also robust against false positive predictions, i.e., contamination of the input data with unstructured or nonconserved sequences. We have successfully tested the LocARNA-based clustering approach on the sequences of the RFAM-seed alignments. Furthermore, we have applied it to a previously published set of 3,332 predicted structured elements in the Ciona intestinalis genome (Missal K, Rose D, Stadler PF (2005) Noncoding RNAs in Ciona intestinalis. Bioinformatics 21 (Supplement 2): i77–i78). In addition to recovering, e.g., tRNAs as a structure-based class, the method identifies several RNA families, including microRNA and snoRNA candidates, and suggests several novel classes of ncRNAs for which no representative has yet been experimentally characterized.
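Once pairwise structure-aware alignment scores are available, a clustering step groups candidates into putative classes. The sketch below assumes a precomputed symmetric similarity matrix (for instance, normalised pairwise alignment scores) and applies plain average-linkage agglomerative clustering with a cut-off. It is a generic illustration: it does not implement the Sankoff-style alignment, and it is not claimed to be the exact clustering procedure used with LocARNA.

```python
def average_linkage(sim, threshold):
    """Greedy agglomerative clustering on a symmetric similarity matrix.

    Repeatedly merges the two clusters with the highest mean pairwise similarity
    until no pair reaches `threshold`. Returns clusters as sorted index lists.
    """
    clusters = [[i] for i in range(len(sim))]

    def link(a, b):
        # Average linkage: mean similarity over all cross-cluster member pairs.
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, pair = None, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = link(clusters[x], clusters[y])
                if best is None or s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break  # remaining clusters are too dissimilar to merge
        x, y = pair
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


# Hypothetical similarities among four candidates: {0, 1} and {2, 3} look alike.
sim = [[1.0, 0.9, 0.1, 0.0],
       [0.9, 1.0, 0.2, 0.1],
       [0.1, 0.2, 1.0, 0.8],
       [0.0, 0.1, 0.8, 1.0]]
print(average_linkage(sim, 0.5))  # [[0, 1], [2, 3]]
```

This naive version recomputes linkages at every step, which is fine for illustration; clustering thousands of candidates would call for a cached or library implementation.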