29 research outputs found

    Rethinking Performance Measures of RNA Secondary Structure Problems

    Accurate RNA secondary structure prediction is vital for understanding cellular regulation and disease mechanisms. Deep learning (DL) methods have surpassed traditional algorithms by predicting complex features such as pseudoknots and multi-interacting base pairs. However, traditional distance measures struggle to handle such tertiary interactions, and the evaluation measures currently in use (F1 score, MCC) have limitations. We propose the Weisfeiler-Lehman graph kernel (WL) as an alternative metric. Embracing graph-based metrics like WL enables fair and accurate evaluation of RNA structure prediction algorithms. Further, WL provides informative guidance, as demonstrated in an RNA design experiment. Comment: 12 pages, accepted at the Machine Learning for Structural Biology Workshop, NeurIPS 202
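
For reference, the F1 score mentioned in the abstract above is usually computed over sets of base pairs. The following is a minimal sketch of that calculation, not code from the paper. It assumes both structures are given in plain dot-bracket notation, which cannot encode pseudoknots or multi-interacting base pairs (one of the limitations the abstract points to). The function names `parse_dot_bracket` and `base_pair_f1` are hypothetical, and returning 1.0 when both structures have no pairs is a convention chosen here.

```python
def parse_dot_bracket(structure):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack = []
    pairs = set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def base_pair_f1(reference, predicted):
    """F1 score of predicted base pairs against reference base pairs."""
    ref = parse_dot_bracket(reference)
    pred = parse_dot_bracket(predicted)
    tp = len(ref & pred)   # pairs present in both structures
    fp = len(pred - ref)   # predicted pairs absent from the reference
    fn = len(ref - pred)   # reference pairs the prediction missed
    denom = 2 * tp + fp + fn
    # Assumption: two structures with no pairs at all count as a perfect match.
    return 2 * tp / denom if denom else 1.0
```

Because the score only counts exact pair matches, a prediction that shifts a stem by one position is penalized as heavily as one that places it somewhere unrelated, which is the kind of behavior a graph-based measure is meant to improve on.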

    Finding RNA structure in the unstructured RBPome

    BACKGROUND: RNA-binding proteins (RBPs) play vital roles in many processes in the cell. Different RBPs bind RNA with different sequence and structure specificities. While sequence specificities for a large set of 205 RBPs have been reported through the RNAcompete compendium, structure specificities are known for only a small fraction. The main limitation lies in the design of the RNAcompete technology, which tests RBP binding against unstructured RNA probes, making it difficult to infer structural preferences from these data. We recently developed RCK, an algorithm to infer sequence and structural binding models from RNAcompete data. The set of binding models enables, for the first time, a large-scale assessment of RNA structure in the RBPome. RESULTS: We re-validate and uncover the role of RNA structure in the RBPome through novel analysis of the largest-scale dataset to date. First, we show that RNA structure exists in presumably unstructured RNA probes and that its variability is correlated with RNA binding. Second, we examine the structural binding preferences of RBPs and discover an overall preference to bind RNA loops. Third, we significantly improve protein-binding prediction using RNA structure, both in vitro and in vivo. Lastly, we demonstrate that RNA structural binding preferences can be inferred for new proteins solely from their amino acid content. CONCLUSIONS: By counter-intuitively demonstrating through our analysis that we can predict both the RNA structure of and RBP binding to these putatively unstructured RNAs, we transform a compendium of RNA-binding proteins into a valuable resource for structure-based binding models. We uncover the important role RNA structure plays in protein-RNA interaction for hundreds of RNA-binding proteins.

    RCK: accurate and efficient inference of sequence- and structure-based protein–RNA binding models from RNAcompete data

    Motivation: Protein-RNA interactions, which play vital roles in many processes, are mediated through both RNA sequence and structure. CLIP-based methods, which measure protein-RNA binding in vivo, suffer from experimental noise and systematic biases, whereas in vitro experiments capture a clearer signal of protein-RNA binding. Among them, RNAcompete provides binding affinities of a specific protein to more than 240 000 unstructured RNA probes in one experiment. The computational challenge is to infer RNA structure- and sequence-based binding models from these data. The state of the art in sequence models, DeepBind, does not model structural preferences. RNAcontext models both sequence and structure preferences, but is outperformed by GraphProt. Unfortunately, GraphProt cannot detect structural preferences from RNAcompete data due to the unstructured nature of the data, as noted by its developers, nor can it be tractably run on the full RNAcompete dataset. Results: We develop RCK, an efficient, scalable algorithm that infers both sequence and structure preferences based on a new k-mer based model. Remarkably, even though RNAcompete data is designed to be unstructured, RCK can still learn structural preferences from it. RCK significantly outperforms both RNAcontext and DeepBind in in vitro binding prediction for 244 RNAcompete experiments. Moreover, RCK is also faster and uses less memory, which enables scalability. While currently on par with existing methods in in vivo binding prediction on a small-scale test, we demonstrate that RCK will increasingly benefit from experimentally measured RNA structure profiles as compared to computationally predicted ones. By running RCK on the entire RNAcompete dataset, we generate and provide as a resource a set of protein-RNA structure-based models on an unprecedented scale. National Institutes of Health (U.S.) (Grant R01GM081871

    Characterization of mammalian Lipocalin UTRs in silico: Predictions for their role in posttranscriptional regulation

    The Lipocalin family is a group of homologous proteins characterized by its wide array of functional capabilities. As extracellular proteins, they can bind small hydrophobic ligands through a well-conserved β-barrel fold. The evolutionary history of Lipocalins sprawls across many different taxa and shows great divergence even within chordates. This variability is also found in their heterogeneous tissue expression pattern. Although a handful of promoter regions have been previously described, studies on UTR regulatory roles in Lipocalin gene expression are scarce. Here we report a comprehensive bioinformatic analysis showing that complex post-transcriptional regulation exists in Lipocalin genes, as suggested by the presence of alternative UTRs with substantial sequence conservation in mammals, alongside a high diversity of transcription start sites and alternative promoters. Strong selective pressure could have operated upon Lipocalin UTRs, leading to an enrichment in particular sequence motifs that limit the choice of secondary structures. Mapping these regulatory features to the expression pattern of early and late diverging Lipocalins suggests that UTRs represent an additional phylogenetic signal, which may help to uncover how functional pleiotropy originated within the Lipocalin family. Ministerio de Ciencia e Innovación BFU2015-68149-R; Ministerio de Economía, Industria y Competitividad, Gobierno de España BFU2011-2397

    Design of RNAs: comparing programs for inverse RNA folding.

    Computational programs for predicting RNA sequences with desired folding properties have been extensively developed and expanded in the past several years. Given a secondary structure, these programs aim to predict sequences that fold into a target minimum free energy secondary structure, while considering various constraints. This procedure is called inverse RNA folding. Inverse RNA folding has traditionally been used to design optimized RNAs with favorable properties, an application that is expected to grow considerably in light of advances in the expanding fields of synthetic biology and RNA nanostructures. Moreover, it was recently demonstrated that inverse RNA folding can be used as a valuable preprocessing step in computational detection of novel noncoding RNAs. This review describes the most popular freeware programs that have been developed for such purposes, starting from RNAinverse, which was devised when the inverse RNA folding problem was formulated. The most recently published ones that take RNA secondary structure as input are antaRNA, RNAiFold and incaRNAfbinv, each having different features that could be beneficial to specific biological problems in practice. The various programs also use distinct approaches, ranging from ant colony optimization to constraint programming, in addition to adaptive walk, simulated annealing and Boltzmann sampling. This review compares the various programs and provides a simple description of the options available, to help practitioners select the most suitable program. It is geared toward specific tasks requiring RNA design based on input secondary structure, with an outlook toward the future of RNA design programs.

    Accurate classification of RNA structures using topological fingerprints

    While RNAs are well known to possess complex structures, functionally similar RNAs often have little sequence similarity. While the exact size and spacing of base-paired regions vary, functionally similar RNAs have pronounced similarity in the arrangement, or topology, of base-paired stems. Furthermore, predicted RNA structures often lack pseudoknots (a crucial aspect of biological activity), and are only partially correct, or incomplete. A topological approach addresses all of these difficulties. In this work we describe each RNA structure as a graph that can be converted to a topological spectrum (RNA fingerprint). The set of subgraphs in an RNA structure, its RNA fingerprint, can be compared with the fingerprints of other RNA structures to identify and correctly classify functionally related RNAs. Topologically similar RNAs can be identified even when a large fraction, up to 30%, of the stems are omitted, indicating that highly accurate structures are not necessary. We investigate the performance of the RNA fingerprint approach on a set of eight highly curated RNA families, with diverse sizes and functions, containing pseudoknots, and with little sequence similarity, which makes for an especially difficult test set. In spite of the difficult test set, the RNA fingerprint approach is very successful (ROC AUC > 0.95). Because it includes pseudoknots, the RNA fingerprint approach covers a wider range of possible structures than methods based only on secondary structure, and its tolerance for incomplete structures suggests that it can be applied even to predicted structures. Source code is freely available at https://github.rcac.purdue.edu/mgribsko/XIOS_RNA_fingerprint

    A hybrid approach to assess the structural impact of long noncoding RNA mutations uncovers key NEAT1 interactions in colorectal cancer

    Long noncoding RNAs (lncRNAs) are emerging players in cancer and hold potential as prognostic biomarkers or therapeutic targets. Earlier studies have identified somatic mutations in lncRNAs that are associated with tumor relapse after therapy, but the mechanisms behind these associations remain unknown. Given the relevance of secondary structure for the function of some lncRNAs, some of these mutations may have a functional impact through structural disturbance. Here, we examined the potential structural and functional impact of a novel A > G point mutation in NEAT1 that has been recurrently observed in tumors of colorectal cancer patients experiencing relapse after treatment. We used the nextPARS structural probing approach to provide the first empirical evidence that this mutation alters NEAT1 structure. We further evaluated the potential effects of this structural alteration using computational tools and found that this mutation likely alters the binding propensities of several NEAT1-interacting miRNAs. Differential expression analysis on these miRNA networks shows upregulation of Vimentin, consistent with previous findings. We propose a hybrid pipeline that can be used to explore the potential functional effects of lncRNA somatic mutations. Funding information: H2020 European Research Council, Grant/Award Number: 724173. Peer Reviewed. Postprint (published version

    Giant reverse transcriptase-encoding transposable elements at telomeres

    Author Posting. © The Author(s), 2017. This is the author's version of the work. It is posted here by permission of Oxford University Press for personal use, not for redistribution. The definitive version was published in Molecular Biology and Evolution 34 (2017): 2245–2257, doi:10.1093/molbev/msx159. Transposable elements are omnipresent in eukaryotic genomes and have a profound impact on chromosome structure, function and evolution. Their structural and functional diversity is thought to be reasonably well-understood, especially in retroelements, which transpose via an RNA intermediate copied into cDNA by the element-encoded reverse transcriptase, and are characterized by a compact structure. Here we report a novel type of expandable eukaryotic retroelements, which we call Terminons. These elements can attach to G-rich telomeric repeat overhangs at the chromosome ends, in a process apparently facilitated by complementary C-rich repeats at the 3’-end of the RNA template immediately adjacent to a hammerhead ribozyme motif. Terminon units, which can exceed 40 kb in length, display an unusually complex and diverse structure, and can form very long chains, with host genes often captured between units. As the principal polymerizing component, Terminons contain Athena reverse transcriptases previously described in bdelloid rotifers and belonging to the enigmatic group of Penelope-like elements, but can additionally accumulate multiple co-oriented ORFs, including DEDDy 3’-exonucleases, GDSL esterases/lipases, GIY-YIG-like endonucleases, rolling-circle replication initiator (Rep) proteins, and putatively structural ORFs with coiled-coil motifs and transmembrane domains. The extraordinary length and complexity of Terminons and the high degree of inter-family variability in their ORF content challenge the current views on the structural organization of eukaryotic retroelements, and highlight their possible connections with the viral world and the implications for the elevated frequency of gene transfer. This work was supported by the National Institutes of Health (grant GM111917 to I.A.). 2018-05-3

    Intermolecular base-pairing interactions, a unique topology and exoribonuclease-resistant noncoding RNAs drive formation of viral chimeric RNAs in plants

    In plants, exoribonuclease-resistant RNAs (xrRNAs) are produced by many viruses. Whereas xrRNAs contribute to the pathogenicity of these viruses, the role of xrRNAs in the virus infectious cycle remains elusive. Here, we show that xrRNAs produced by a benyvirus (a multipartite RNA virus with four genomic segments) in plants are involved in the formation of monocistronic coat protein (CP)-encoding chimeric RNAs. We discovered that naturally occurring chimeric RNAs are composed of the 5'-end of RNA 2 and the 3'-end of either RNA 3 or RNA 4, bearing the conserved exoribonuclease-resistant 'coremin' region. Using computational tools and site-directed mutagenesis, we show that de novo formation of chimeric RNAs requires an intermolecular base-pairing interaction between 'coremin' and the 3'-proximal part of the CP gene of RNA 2, as well as a stem-loop structure immediately adjacent to the CP gene. Moreover, knockdown of the expression of the XRN4 gene, encoding a 5'->3' exoribonuclease, inhibits biogenesis of both xrRNAs and chimeric RNAs. Our findings suggest a novel mechanism involving a unique topology of the intermolecular base-pairing complex between xrRNAs and RNA 2 to promote formation of chimeric RNAs in plants. XrRNAs, essential for chimeric RNA biogenesis, are generated through the action of the cytoplasmic Xrn4 5'->3' exoribonuclease, conserved in all plant species.