8 research outputs found

    RNA Accessibility in cubic time

    Background: The accessibility of RNA binding motifs controls the efficacy of many biological processes. Examples are the binding of miRNA, siRNA or bacterial sRNA to their respective targets. Similarly, the accessibility of the Shine-Dalgarno sequence is essential for translation to start in prokaryotes. Furthermore, many classes of RNA binding proteins require the binding site to be single-stranded. Results: We introduce a way to compute the accessibility of all intervals within an RNA sequence in O(n^3) time. This improves on previous implementations where only intervals of one defined length were computed in the same time. While the algorithm is in the same efficiency class as sampling approaches, the results, especially if the probabilities get small, are much more exact. Conclusions: Our algorithm significantly speeds up methods for the prediction of RNA-RNA interactions and other applications that require the accessibility of RNA molecules. The algorithm is already available in the program RNAplfold of the ViennaRNA package.
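The abstract above contrasts its exact method with sampling approaches. As a rough illustration of the sampling baseline only (not the cubic-time algorithm, which is not described in enough detail here to reproduce), the accessibility of an interval can be estimated as the fraction of sampled structures in which every position of the interval is unpaired. The function name and the dot-bracket strings below are hypothetical; real use would need structures drawn from the Boltzmann ensemble.

```python
def interval_accessibility(samples, start, end):
    """Estimate the probability that positions start..end (0-based,
    inclusive) are all unpaired, from dot-bracket structure samples."""
    if not samples:
        raise ValueError("need at least one sampled structure")
    # A structure counts as a hit when the whole interval is single-stranded.
    hits = sum(
        1 for structure in samples
        if all(symbol == "." for symbol in structure[start:end + 1])
    )
    return hits / len(samples)


# Hypothetical samples for a 10-nt sequence (made up for illustration).
toy_samples = ["((..))....", "..........", "(((...)))."]
estimate = interval_accessibility(toy_samples, 6, 9)
```

With few samples, intervals that are rarely accessible are often estimated as exactly zero, which is one way to read the abstract's remark that exact computation is more reliable when probabilities get small.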

    A comprehensive comparison of general RNA-RNA interaction prediction methods

    RNA-RNA interactions are fast emerging as a major functional component in many newly discovered non-coding RNAs. Basepairing is believed to be a major contributor to the stability of these intermolecular interactions, much like intramolecular basepairs formed in RNA secondary structure. As such, using algorithms similar to those for predicting RNA secondary structure, computational methods have been recently developed for the prediction of RNA-RNA interactions. We provide the first comprehensive comparison comprising 14 methods that predict general intermolecular basepairs. To evaluate these, we compile an extensive data set of 54 experimentally confirmed fungal snoRNA-rRNA interactions and 102 bacterial sRNA-mRNA interactions. We test the performance accuracy of all methods, evaluating the effects of tool settings, sequence length, and multiple sequence alignment usage and quality. Our results show that, unlike for RNA secondary structure prediction, the overall best performing tools are non-comparative energy-based tools utilizing accessibility information that predict short interactions on this data set. Furthermore, we find that maintaining high accuracy across biologically different data sets and increasing input lengths remains a huge challenge, with implications for de novo transcriptome-wide searches. Finally, we make our interaction data set publicly available for future development and benchmarking efforts.

    ViennaRNA Package 2.0

    Background: Secondary structure forms an important intermediate level of description of nucleic acids that encapsulates the dominating part of the folding energy, is often well conserved in evolution, and is routinely used as a basis to explain experimental findings. Based on carefully measured thermodynamic parameters, exact dynamic programming algorithms can be used to compute ground states, base pairing probabilities, as well as thermodynamic properties. Results: The ViennaRNA Package has been a widely used compilation of RNA secondary structure related computer programs for nearly two decades. Major changes in the structure of the standard energy model, the Turner 2004 parameters, the pervasive use of multi-core CPUs, and an increasing number of algorithmic variants prompted a major technical overhaul of both the underlying RNAlib and the interactive user programs. New features include an expanded repertoire of tools to assess RNA-RNA interactions and restricted ensembles of structures, additional output information such as centroid structures and maximum expected accuracy structures derived from base pairing probabilities, or z-scores for locally stable secondary structures, and support for input in fasta format. Updates were implemented without compromising the computational efficiency of the core algorithms and ensuring compatibility with earlier versions. Conclusions: The ViennaRNA Package 2.0, supporting concurrent computations via OpenMP, can be downloaded from http://www.tbi.univie.ac.at/RNA.

    Genome-Wide Analysis of mRNAs and lncRNAs of Intramuscular Fat Related to Lipid Metabolism in Two Pig Breeds

    Background/Aims: Long non-coding RNAs (lncRNAs) can regulate adipogenesis and lipid accumulation. Intramuscular fat deposition appears to vary between pig breeds, and the regulatory mechanism has not yet been fully elucidated at the molecular level. Moreover, little is known about the function and profile of lncRNAs in intramuscular fat deposition and metabolism in pig. The aim of this study was thus to explore the regulatory functions of lncRNAs in intramuscular fat deposition. Methods: Laiwu (LW) and Large White (LY) pigs, which differ significantly in fat deposition, were selected for this study. RNA-seq technology and bioinformatics methods were used to comparatively analyze the gene expression profiles of intramuscular fat between LW and LY pigs to identify key mRNAs and lncRNAs associated with lipid metabolism and adipogenesis. Real-time fluorescence-based quantitative PCR was applied to verify the expression level of the differentially expressed mRNAs and lncRNAs. Results: A total of 513 mRNAs and 55 lncRNAs were differentially expressed between the two pig breeds. By co-expression network construction as well as cis- and trans-regulated target gene analysis, 31 key lncRNAs were identified. Gene Ontology and KEGG pathway analyses revealed that differentially expressed genes and lncRNAs were mainly involved in the biological processes and pathways related to adipogenesis and lipid metabolism. Conclusion: XLOC_046142, XLOC_004398 and XLOC_015408 may target MAPKAPK2, NR1D2 and AKR1C4, respectively, and play critical regulatory roles in intramuscular adipogenesis and lipid accumulation in pig. XLOC_064871 and XLOC_011001 may play a role in lipid metabolism-related disease via regulating TRIB3 and BRCA1. This study provides a valuable resource for lncRNA study and improves our understanding of the biological roles of lipid metabolism-related genes and the molecular mechanism of intramuscular fat metabolism and deposition.

    Investigating the concept of accessibility for predicting novel RNA-RNA interactions

    State-of-the-art methods for predicting novel trans RNA-RNA interactions use the so-called accessibility as a key concept. It estimates whether a region in a given RNA sequence is accessible for forming trans interactions, using a thermodynamic model which quantifies its secondary structure features. RNA-RNA interactions are then predicted by finding the minimum free energy base pairing between the two transcripts, taking into account the accessibility as an energy penalty. We investigated the underlying assumptions of this approach using the two methods RNAPLEX and INTARNA on two datasets, containing sRNA-mRNA and snoRNA-rRNA interactions, respectively. We find that (1) known trans RNA-RNA interactions frequently overlap regions containing RNA structure features, (2) the estimated accessibility reflects sRNA structures fairly well, but often disagrees with structures of longer transcripts, (3) the prediction performance of RNA-RNA interaction prediction methods is independent of the quality of the estimated accessibility profiles, and (4) one important overall effect of accessibility profiles is to prevent the thermodynamic model from predicting interactions that are too long. Based on our findings, we conclude that the accessibility concept, as applied to the minimum free energy approach to predicting novel RNA-RNA interactions, has conceptual limitations, and we discuss potential ways of improving the field in the future.
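To make "accessibility as an energy penalty" concrete, here is a minimal sketch of one commonly used formulation, assumed here rather than taken from the abstract: the probability that a region is unpaired is converted to an opening energy, -RT ln(P), and added to the hybridization energy for each of the two partners. The function names, the 37 °C temperature, and the numbers in the example are illustrative assumptions.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal/(mol*K)
TEMPERATURE = 310.15      # K, i.e. 37 degrees Celsius (assumed)


def opening_energy(p_unpaired):
    """Energy (kcal/mol) needed to make a region single-stranded,
    given the probability that it is unpaired in the ensemble."""
    if not 0.0 < p_unpaired <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -GAS_CONSTANT * TEMPERATURE * math.log(p_unpaired)


def total_interaction_energy(hybrid_energy, p_unpaired_1, p_unpaired_2):
    """Hybridization energy plus the opening penalties of both sites."""
    return (hybrid_energy
            + opening_energy(p_unpaired_1)
            + opening_energy(p_unpaired_2))


# Hypothetical duplex of -10 kcal/mol whose first site is rarely open.
penalized = total_interaction_energy(-10.0, 0.01, 1.0)
```

Because the penalty grows as the unpaired probability shrinks, and longer intervals are generally less likely to be fully unpaired, this form is consistent with finding (4) above that accessibility discourages overly long predicted interactions.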

    Recurrent Neural Networks and Their Applications to RNA Secondary Structure Inference

    Recurrent neural networks (RNNs) are state-of-the-art sequential machine learning tools, but have difficulty learning sequences with long-range dependencies due to the exponential growth or decay of gradients backpropagated through the RNN. Some methods overcome this problem by modifying the standard RNN architecture to force the recurrent weight matrix W to remain orthogonal throughout training. The first half of this thesis presents a novel orthogonal RNN architecture that enforces orthogonality of W by parametrizing with a skew-symmetric matrix via the Cayley transform. We present rules for backpropagation through the Cayley transform, show how to deal with the Cayley transform's singularity, and compare its performance on benchmark tasks to other orthogonal RNN architectures. The second half explores two deep learning approaches to problems in RNA secondary structure inference and compares them to a standard structure inference tool, the nearest neighbor thermodynamic model (NNTM). The first uses RNNs to detect paired or unpaired nucleotides in the RNA structure, which are then converted into synthetic auxiliary data that direct NNTM structure predictions. The second method uses recurrent and convolutional networks to directly infer RNA base pairs. In many cases, these approaches improve over NNTM structure predictions by 20-30 percentage points.
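The parametrization mentioned above can be sketched with the textbook Cayley transform, W = (I + A)^(-1) (I - A), which maps any real skew-symmetric matrix A to an orthogonal matrix. This is only the basic transform, written with NumPy; the thesis's treatment of the transform's singularity and its backpropagation rules are not reproduced, and the function name and parameter values are hypothetical.

```python
import numpy as np


def cayley_orthogonal(params, n):
    """Build an n x n orthogonal matrix from n*(n-1)/2 free parameters
    via the Cayley transform of a skew-symmetric matrix."""
    upper = np.zeros((n, n))
    # Fill the strict upper triangle with the free parameters.
    upper[np.triu_indices(n, k=1)] = params
    skew = upper - upper.T  # skew-symmetric: skew.T == -skew
    identity = np.eye(n)
    # Solve (I + A) W = (I - A) instead of forming an explicit inverse.
    return np.linalg.solve(identity + skew, identity - skew)


W = cayley_orthogonal(np.array([0.3, -1.2, 0.7]), 3)
orthogonality_error = np.abs(W.T @ W - np.eye(3)).max()
```

Since the free parameters are unconstrained, they can be updated by ordinary gradient steps while W stays orthogonal. The basic transform cannot produce matrices with an eigenvalue of -1, which is the singularity the abstract refers to.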

    Expanding the SnoRNA Interaction Network: Conservation of Guiding Function in Vertebrates

    Small nucleolar RNAs (snoRNAs) are one of the most abundant and evolutionarily ancient groups of small non-coding RNAs. Their main function is to target chemical modifications of ribosomal RNAs (rRNAs) and small nuclear RNAs (snRNAs). They fall into two classes, box C/D snoRNAs and box H/ACA snoRNAs, which are clearly distinguished by conserved sequence motifs and the type of modification that they govern. The box H/ACA snoRNAs are responsible for targeting pseudouridylation sites and the box C/D snoRNAs for directing 2’-O-methylation of ribonucleotides. A subclass that localizes to the Cajal bodies, termed scaRNAs, is responsible for methylation and pseudouridylation of snRNAs. In addition, an amazing diversity of non-canonical functions of individual snoRNAs arose. The modification patterns in rRNAs and snRNAs are retained during evolution, making it possible even to project them from yeast onto human. The stringent conservation of modification sites and the slow evolution of rRNAs and snRNAs contrast with the rapid evolution of snoRNA sequences. Recent studies that incorporate high-throughput sequencing experiments still identify undetected snoRNAs even in well-studied organisms such as human. The snoRNAbase, which has been the standard database for human snoRNAs, has not been updated since 2006 and misses these new data. Along with the lack of a centralized data collection across species that also incorporates snoRNA class-specific characteristics, the need arose to integrate distributed data from literature and databases into a comprehensive snoRNA set. Although several snoRNA studies included pro forma target predictions in individual species, and more and more studies focus on non-canonical functions of subclasses, a systematic survey on the guiding function and especially functional homologies of snoRNAs was not available.
To establish a sound set of snoRNAs, a computational snoRNA annotation pipeline named snoStrip, which identifies homologous snoRNAs in related species, was employed. For large-scale investigation of snoRNA function, state-of-the-art target predictions were performed with our software RNAsnoop and PLEXY. Further, a new measure, the Interaction Conservation Index (ICI), was developed to evaluate the conservation of snoRNA function. The snoStrip pipeline was applied to vertebrate species for which genome sequences were available. In addition, it was used in several ncRNA annotation studies (48 avian, spotted gar) of newly assembled genomes to contribute the snoRNA genes. Detailed target analysis of the new vertebrate snoRNA set revealed that, in general, functions of homologous snoRNAs are evolutionarily stable; thus, members of the same snoRNA family guide equivalent modifications. The conservation of snoRNA sequences is high at target binding regions while the remaining sequence varies significantly. In addition to elucidating principles of correlated evolution, it was possible, with the help of the ICI measure, to assign functions to previously orphan snoRNAs and to associate snoRNAs as partners to known but so far unexplained chemical modifications. As a further pattern, redundant guiding became apparent. For many modification sites, more than one snoRNA encodes the appropriate antisense element (ASE), which could ensure constant modification through snoRNAs that have different expression patterns. Furthermore, predictions of snoRNA functions in conjunction with sequence conservation could identify distant homologies. Due to the high overall entropy of snoRNA sequences, such relationships are hard to detect by means of sequence homology search methods alone. The snoRNA interaction network was further expanded through novel snoRNAs that were detected in data from high-throughput experiments in human and mouse.
Through subsequent target analysis, the new snoRNAs could immediately explain known modifications that had no appropriate snoRNA guide assigned before. In a further study, a full catalog of expressed snoRNAs in human was provided. Besides canonical snoRNAs, recent findings such as AluACAs, sno-lncRNAs and extraordinarily short SNORD-like transcripts were also taken into account. Again, the target analysis workflow identified undetected connections between snoRNA guides and modifications. In particular, some species- or clade-specific interactions of SNORD-like genes emerged that seem to act as bona fide snoRNA guides for rRNA and snRNA modifications. For all high-confidence new snoRNA genes identified during this work, official gene names were requested from the HUGO Gene Nomenclature Committee (HGNC) to avoid further naming confusion.