29 research outputs found

Rethinking Performance Measures of RNA Secondary Structure Problems

Author: Fertmann Daniel
Franke Jörg K. H.
Hutter Frank
Runge Frederic
Publication date: 04/12/2023

Accurate RNA secondary structure prediction is vital for understanding cellular regulation and disease mechanisms. Deep learning (DL) methods have surpassed traditional algorithms by predicting complex features like pseudoknots and multi-interacting base pairs. However, traditional distance measures can hardly deal with such tertiary interactions, and the currently used evaluation measures (F1 score, MCC) have limitations. We propose the Weisfeiler-Lehman graph kernel (WL) as an alternative metric. Embracing graph-based metrics like WL enables fair and accurate evaluation of RNA structure prediction algorithms. Further, WL provides informative guidance, as demonstrated in an RNA design experiment.

Comment: 12 pages, Accepted at the Machine Learning for Structural Biology Workshop, NeurIPS 202
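To make the idea of a graph-kernel metric concrete, here is a minimal sketch of a Weisfeiler-Lehman subtree similarity between two structures of the same sequence. It is an illustration only, not the authors' implementation: the graph encoding (nucleotides as nodes, typed backbone and base-pair edges), the function names, the number of iterations, and the cosine normalisation are all assumptions made for this example. Because base pairs are given as an arbitrary list of index pairs, pseudoknots and multi-interacting bases need no special handling.

```python
from collections import Counter
from math import sqrt


def rna_graph(sequence, pairs):
    """Adjacency map with typed edges: 'b' = backbone, 'p' = base pair (assumed encoding)."""
    adj = {i: set() for i in range(len(sequence))}
    for i in range(len(sequence) - 1):
        adj[i].add(("b", i + 1))
        adj[i + 1].add(("b", i))
    for i, j in pairs:
        adj[i].add(("p", j))
        adj[j].add(("p", i))
    return adj


def wl_features(sequence, pairs, iterations=3):
    """Count node labels over several rounds of WL neighbourhood relabelling."""
    adj = rna_graph(sequence, pairs)
    labels = {i: sequence[i] for i in adj}  # round 0: the nucleotide itself
    features = Counter(labels.values())
    for _ in range(iterations):
        # New label = old label plus the sorted multiset of (edge type + neighbour label).
        labels = {
            i: labels[i] + "[" + ",".join(sorted(kind + labels[j] for kind, j in adj[i])) + "]"
            for i in adj
        }
        features.update(labels.values())
    return features


def wl_similarity(sequence, pairs_a, pairs_b, iterations=3):
    """Cosine-normalised WL kernel between two structures of one non-empty sequence."""
    fa = wl_features(sequence, pairs_a, iterations)
    fb = wl_features(sequence, pairs_b, iterations)
    dot = sum(fa[k] * fb[k] for k in fa.keys() & fb.keys())
    norm = sqrt(sum(v * v for v in fa.values()) * sum(v * v for v in fb.values()))
    return dot / norm


sequence = "GGGAAAUCCC"
reference = [(0, 9), (1, 8), (2, 7)]  # hypothetical reference structure (0-based pairs)
predicted = [(0, 9), (1, 8)]          # hypothetical prediction missing one pair
print(wl_similarity(sequence, reference, reference))
print(wl_similarity(sequence, reference, predicted))
```

Unlike a pair-wise F1 score, the similarity here depends on the neighbourhood of every nucleotide, so a missing pair also changes the labels of nearby nodes in later rounds.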

arXiv.org e-Print Archive

Finding RNA structure in the unstructured RBPome

Author: Berger B.
Ohler U.
Orenstein Y.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/02/2018

BACKGROUND: RNA-binding proteins (RBPs) play vital roles in many processes in the cell. Different RBPs bind RNA with different sequence and structure specificities. While sequence specificities for a large set of 205 RBPs have been reported through the RNAcompete compendium, structure specificities are known for only a small fraction. The main limitation lies in the design of the RNAcompete technology, which tests RBP binding against unstructured RNA probes, making it difficult to infer structural preferences from these data. We recently developed RCK, an algorithm to infer sequence and structural binding models from RNAcompete data. The set of binding models enables, for the first time, a large-scale assessment of RNA structure in the RBPome.

RESULTS: We re-validate and uncover the role of RNA structure in the RBPome through novel analysis of the largest-scale dataset to date. First, we show that RNA structure exists in presumably unstructured RNA probes and that its variability is correlated with RNA-binding. Second, we examine the structural binding preferences of RBPs and discover an overall preference to bind RNA loops. Third, we significantly improve protein-binding prediction using RNA structure, both in vitro and in vivo. Lastly, we demonstrate that RNA structural binding preferences can be inferred for new proteins from solely their amino acid content.

CONCLUSIONS: By counter-intuitively demonstrating through our analysis that we can predict both the RNA structure of and RBP binding to these putatively unstructured RNAs, we transform a compendium of RNA-binding proteins into a valuable resource for structure-based binding models. We uncover the important role RNA structure plays in protein-RNA interaction for hundreds of RNA-binding proteins.

DSpace@MIT

Directory of Open Access Journals

MDC Repository

RCK: accurate and efficient inference of sequence- and structure-based protein–RNA binding models from RNAcompete data

Author: Berger Leighton Bonnie
Orenstein Yaron
Wang Yuhao
Publication venue: 'Oxford University Press (OUP)'
Publication date: 16/05/2018

Motivation: Protein-RNA interactions, which play vital roles in many processes, are mediated through both RNA sequence and structure. CLIP-based methods, which measure protein-RNA binding in vivo, suffer from experimental noise and systematic biases, whereas in vitro experiments capture a clearer signal of protein RNA-binding. Among them, RNAcompete provides binding affinities of a specific protein to more than 240 000 unstructured RNA probes in one experiment. The computational challenge is to infer RNA structure- and sequence-based binding models from these data. The state-of-the-art in sequence models, Deepbind, does not model structural preferences. RNAcontext models both sequence and structure preferences, but is outperformed by GraphProt. Unfortunately, GraphProt cannot detect structural preferences from RNAcompete data due to the unstructured nature of the data, as noted by its developers, nor can it be tractably run on the full RNAcompete dataset.

Results: We develop RCK, an efficient, scalable algorithm that infers both sequence and structure preferences based on a new k-mer based model. Remarkably, even though RNAcompete data is designed to be unstructured, RCK can still learn structural preferences from it. RCK significantly outperforms both RNAcontext and Deepbind in in vitro binding prediction for 244 RNAcompete experiments. Moreover, RCK is also faster and uses less memory, which enables scalability. While currently on par with existing methods in in vivo binding prediction on a small scale test, we demonstrate that RCK will increasingly benefit from experimentally measured RNA structure profiles as compared to computationally predicted ones. By running RCK on the entire RNAcompete dataset, we generate and provide as a resource a set of protein-RNA structure-based models on an unprecedented scale.

National Institutes of Health (U.S.) (Grant R01GM081871

DSpace@MIT

Characterization of mammalian Lipocalin UTRs in silico: Predictions for their role in posttranscriptional regulation

Author: Diez-Hermano Sergio
Ganfornina María Dolores
Gutiérrez Pozo Gabriel
Mejías Romero Andrés
Sánchez Diego
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2019

The Lipocalin family is a group of homologous proteins characterized by its broad array of functional capabilities. As extracellular proteins, they can bind small hydrophobic ligands through a well-conserved β-barrel folding. Lipocalin evolutionary history sprawls across many different taxa and shows great divergence even within chordates. This variability is also found in their heterogeneous tissue expression pattern. Although a handful of promoter regions have been previously described, studies on UTR regulatory roles in Lipocalin gene expression are scarce. Here we report a comprehensive bioinformatic analysis showing that complex post-transcriptional regulation exists in Lipocalin genes, as suggested by the presence of alternative UTRs with substantial sequence conservation in mammals, alongside a high diversity of transcription start sites and alternative promoters. Strong selective pressure could have operated upon Lipocalin UTRs, leading to an enrichment in particular sequence motifs that limit the choice of secondary structures. Mapping these regulatory features to the expression pattern of early and late diverging Lipocalins suggests that UTRs represent an additional phylogenetic signal, which may help to uncover how functional pleiotropy originated within the Lipocalin family.

Ministerio de Ciencia e Innovación BFU2015-68149-R
Ministerio de Economía, Industria y Competitividad, Gobierno de España BFU2011-2397

Docta Complutense

Directory of Open Access Journals

Digital.CSIC

idUS. Depósito de Investigación Universidad de Sevilla

FigShare

Design of RNAs: comparing programs for inverse RNA folding.

Author: Barash Danny
Churkin Alexander
Ponty Yann
Reinharz Vladimir
Retwitzer Matan Drory
Waldispühl Jérôme
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2017

Computational programs for predicting RNA sequences with desired folding properties have been extensively developed and expanded in the past several years. Given a secondary structure, these programs aim to predict sequences that fold into a target minimum free energy secondary structure, while considering various constraints. This procedure is called inverse RNA folding. Inverse RNA folding has been traditionally used to design optimized RNAs with favorable properties, an application that is expected to grow considerably in the future in light of advances in the expanding new fields of synthetic biology and RNA nanostructures. Moreover, it was recently demonstrated that inverse RNA folding can successfully be used as a valuable preprocessing step in computational detection of novel noncoding RNAs. This review describes the most popular freeware programs that have been developed for such purposes, starting from RNAinverse, which was devised when the inverse RNA folding problem was first formulated. The most recently published ones that consider RNA secondary structure as input are antaRNA, RNAiFold and incaRNAfbinv, each having different features that could be beneficial to specific biological problems in practice. The various programs also use distinct approaches, ranging from ant colony optimization to constraint programming, in addition to adaptive walk, simulated annealing and Boltzmann sampling. This review compares the various programs and provides a simple description of the various possibilities that would benefit practitioners in selecting the most suitable program. It is geared for specific tasks requiring RNA design based on input secondary structure, with an outlook toward the future of RNA design programs.
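As a small illustration of the inverse folding setting, the sketch below checks a necessary (not sufficient) condition that any designed sequence must satisfy: every base pair of the target structure has to be formed by a pairable pair of nucleotides. It does not fold anything and is not taken from any of the reviewed programs. The function names are hypothetical, the target is assumed to be in pseudoknot-free dot-bracket notation, and the set of allowed pairs (Watson-Crick plus G-U wobble) is an assumption of this example.

```python
# Pairs assumed to be allowed: Watson-Crick plus G-U wobble.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_dot_bracket(structure):
    """Return the sorted list of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)


def is_compatible(sequence, structure):
    """True if every pair in the target structure can be formed by the sequence."""
    if len(sequence) != len(structure):
        return False
    return all(
        (sequence[i], sequence[j]) in ALLOWED_PAIRS
        for i, j in parse_dot_bracket(structure)
    )


target = "((..))"
print(parse_dot_bracket(target))
print(is_compatible("GCAAGC", target))
print(is_compatible("GAAAGC", target))
```

A compatible sequence is only a candidate: a design program would still have to confirm, with a folding algorithm, that the target is the sequence's minimum free energy structure.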

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Polytechnique

Isolation and Characterisation of Alongshan Virus in Russia

Author: Bell-Sakyi Lesley
Belova Oxana A
Bespyatova Liubov A
Bugmyrin Sergey V
Gmyl Anatoly P
Gmyl Larissa V
Gushchin Vladimir A
Ivannikova Anna Y
Karganova Galina G
Kholodilov Ivan S
Klimentov Alexander S
Litov Alexander G
Luchinina Svetlana V
Nikitin Nikolai A
Polienko Alexandra E
Shchetinin Alexey M
Yakovlev Alexander S
Publication venue: 'MDPI AG'
Publication date: 01/04/2020

University of Liverpool Repository

Accurate classification of RNA structures using topological fingerprints

While RNAs are well known to possess complex structures, functionally similar RNAs often have little sequence similarity. While the exact size and spacing of base-paired regions vary, functionally similar RNAs have pronounced similarity in the arrangement, or topology, of base-paired stems. Furthermore, predicted RNA structures often lack pseudoknots (a crucial aspect of biological activity), and are only partially correct, or incomplete. A topological approach addresses all of these difficulties. In this work we describe each RNA structure as a graph that can be converted to a topological spectrum (RNA fingerprint). The set of subgraphs in an RNA structure, its RNA fingerprint, can be compared with the fingerprints of other RNA structures to identify and correctly classify functionally related RNAs. Topologically similar RNAs can be identified even when a large fraction, up to 30%, of the stems are omitted, indicating that highly accurate structures are not necessary. We investigate the performance of the RNA fingerprint approach on a set of eight highly curated RNA families, with diverse sizes and functions, containing pseudoknots, and with little sequence similarity, an especially difficult test set. In spite of the difficult test set, the RNA fingerprint approach is very successful (ROC AUC > 0.95). Due to the inclusion of pseudoknots, the RNA fingerprint approach both covers a wider range of possible structures than methods based only on secondary structure, and its tolerance for incomplete structures suggests that it can be applied even to predicted structures. Source code is freely available at https://github.rcac.purdue.edu/mgribsko/XIOS_RNA_fingerprint

Crossref

Directory of Open Access Journals

PubMed Central

Purdue E-Pubs

FigShare

A hybrid approach to assess the structural impact of long noncoding RNA mutations uncovers key NEAT1 interactions in colorectal cancer

Author: Aydın Efe
Chorostecki Uciel
Gabaldon Toni
Saus Ester
Publication venue: Wiley
Publication date: 01/01/2023

Long noncoding RNAs (lncRNAs) are emerging players in cancer and they hold potential as prognostic biomarkers or therapeutic targets. Earlier studies have identified somatic mutations in lncRNAs that are associated with tumor relapse after therapy, but the underlying mechanisms behind these associations remain unknown. Given the relevance of secondary structure for the function of some lncRNAs, some of these mutations may have a functional impact through structural disturbance. Here, we examined the potential structural and functional impact of a novel A > G point mutation in NEAT1 that has been recurrently observed in tumors of colorectal cancer patients experiencing relapse after treatment. We used the nextPARS structural probing approach to provide the first empirical evidence that this mutation alters NEAT1 structure. We further evaluated the potential effects of this structural alteration using computational tools and found that this mutation likely alters the binding propensities of several NEAT1-interacting miRNAs. Differential expression analysis on these miRNA networks shows upregulation of Vimentin, consistent with previous findings. We propose a hybrid pipeline that can be used to explore the potential functional effects of lncRNA somatic mutations.

Funding information: H2020 European Research Council, Grant/Award Number: 724173

Lund University Publications

UPCommons. Portal del coneixement obert de la UPC

Giant reverse transcriptase-encoding transposable elements at telomeres

Author: Arkhipova Irina R.
Rodriguez Fernando
Yushenova Irina A.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/05/2017

Author Posting. © The Author(s), 2017. This is the author's version of the work. It is posted here by permission of Oxford University Press for personal use, not for redistribution. The definitive version was published in Molecular Biology and Evolution 34 (2017): 2245–2257, doi:10.1093/molbev/msx159.

Transposable elements are omnipresent in eukaryotic genomes and have a profound impact on chromosome structure, function and evolution. Their structural and functional diversity is thought to be reasonably well-understood, especially in retroelements, which transpose via an RNA intermediate copied into cDNA by the element-encoded reverse transcriptase, and are characterized by a compact structure. Here we report a novel type of expandable eukaryotic retroelements, which we call Terminons. These elements can attach to G-rich telomeric repeat overhangs at the chromosome ends, in a process apparently facilitated by complementary C-rich repeats at the 3’-end of the RNA template immediately adjacent to a hammerhead ribozyme motif. Terminon units, which can exceed 40 kb in length, display an unusually complex and diverse structure, and can form very long chains, with host genes often captured between units. As the principal polymerizing component, Terminons contain Athena reverse transcriptases previously described in bdelloid rotifers and belonging to the enigmatic group of Penelope-like elements, but can additionally accumulate multiple co-oriented ORFs, including DEDDy 3’-exonucleases, GDSL esterases/lipases, GIY-YIG-like endonucleases, rolling-circle replication initiator (Rep) proteins, and putatively structural ORFs with coiled-coil motifs and transmembrane domains. The extraordinary length and complexity of Terminons and the high degree of inter-family variability in their ORF content challenge the current views on the structural organization of eukaryotic retroelements, and highlight their possible connections with the viral world and the implications for the elevated frequency of gene transfer.

This work was supported by the National Institutes of Health (grant GM111917 to I.A.).

Woods Hole Open Access Server

Intermolecular base-pairing interactions, a unique topology and exoribonuclease-resistant noncoding RNAs drive formation of viral chimeric RNAs in plants

Author: Gil José Fernando
Mansi Mansi
Nemes Katalin
Poimenopoulou Efstratia
Savenkov Eugene
Publication date: 01/01/2024

In plants, exoribonuclease-resistant RNAs (xrRNAs) are produced by many viruses. Whereas xrRNAs contribute to the pathogenicity of these viruses, the role of xrRNAs in the virus infectious cycle remains elusive. Here, we show that xrRNAs produced by a benyvirus (a multipartite RNA virus with four genomic segments) in plants are involved in the formation of monocistronic coat protein (CP)-encoding chimeric RNAs. Naturally occurring chimeric RNAs, we discovered, are composed of the 5'-end of RNA 2 and the 3'-end of either RNA 3 or RNA 4 bearing the conserved exoribonuclease-resistant 'coremin' region. Using computational tools and site-directed mutagenesis, we show that de novo formation of chimeric RNAs requires an intermolecular base-pairing interaction between 'coremin' and the 3'-proximal part of the CP gene of RNA 2 as well as a stem-loop structure immediately adjacent to the CP gene. Moreover, knockdown of the expression of the XRN4 gene, encoding a 5'->3' exoribonuclease, inhibits biogenesis of both xrRNAs and chimeric RNAs. Our findings suggest a novel mechanism involving a unique topology of the intermolecular base-pairing complex between xrRNAs and RNA 2 to promote formation of chimeric RNAs in plants. XrRNAs, essential for chimeric RNA biogenesis, are generated through the action of the cytoplasmic Xrn4 5'->3' exoribonuclease conserved in all plant species.

Epsilon Open Archive