40 research outputs found

    Automatic Exploration of the Natural Variability of RNA Non-Canonical Geometric Patterns with a Parameterized Sampling Technique

    Motivation. Recurrent substructures in RNA, known as 3D motifs, consist of networks of base pair interactions and are critical to understanding the relationship between structure and function. Their structure is naturally expressed as a graph, which has led to many graph-based algorithms to automatically catalog identical motifs found in 3D structures. Yet, due to the complexity of the problem, state-of-the-art methods are often optimized to find exact matches, limiting the search to a subset of potential solutions, or do not allow explicit control over the desired variability. Results. We developed FuzzTree, a method able to efficiently sample approximate instances of an RNA motif, abstracted as a subgraph within a target RNA structure. It is the first method that allows explicit control over (1) the admissible geometric variability in the interactions; (2) the number of missing edges; and (3) the introduction of discontinuities in the backbone given close distances in the 3D structure. Our tool relies on multidimensional Boltzmann sampling, with complexity parameterized by the treewidth of the requested motif. We applied our method to the well-known internal loop Kink-Turn motif, which can be divided into 12 subgroups. Given only the graph representing the main Kink-Turn subgroup, FuzzTree retrieved over 3/4 of all kink-turns. We also highlighted two occurrences of new sampled patterns. Our tool is available as free software and can be customized for different parameters and types of graphs.

    Combining structure probing data on RNA mutants with evolutionary information reveals RNA-binding interfaces

    Systematic structure probing experiments (e.g. SHAPE) of RNA mutants, such as the mutate-and-map protocol, give direct access to the genetic robustness of ncRNA structures. Comparative studies of homologous sequences provide a distinct, yet complementary, approach to analyzing structural and functional properties of non-coding RNAs. In this paper, we introduce a formal framework to combine the biochemical signal collected from mutate-and-map experiments with the evolutionary information available in multiple sequence alignments. We apply neutral theory principles to detect complex long-range dependencies between nucleotides of a single-stranded RNA, and implement these ideas in a software package called aRNhAck. We illustrate the biological significance of this signal and show that the nucleotide networks calculated with aRNhAck are correlated with nucleotides located in RNA-RNA, RNA-protein, RNA-DNA and RNA-ligand interfaces. aRNhAck is freely available at http://csb.cs.mcgill.ca/arnhack

    Using Structural and Evolutionary Information to Detect and Correct Pyrosequencing Errors in Noncoding RNAs.

    Extended version of RECOMB'13. The analysis of the sequence-structure relationship in RNA molecules is not only essential for evolutionary studies but also for concrete applications such as error correction in next generation sequencing (NGS) technologies. The prohibitive sizes of the mutational and conformational landscapes, combined with the volume of data to process, require efficient algorithms to compute sequence-structure properties. In this article, we address the correction of NGS errors by calculating which mutations most increase the likelihood of a sequence to a given structure and RNA family. We introduce RNApyro, an efficient, linear time and space inside-outside algorithm that computes exact mutational probabilities under secondary structure and evolutionary constraints given as a multiple sequence alignment with a consensus structure. We develop a scoring scheme combining classical stacking base-pair energies with novel isostericity scores and apply our techniques to correct pointwise errors in 5S and 16S rRNA sequences. Our results suggest that RNApyro is a promising algorithm to complement existing tools in the NGS error-correction pipeline.

    A linear inside-outside algorithm for correcting sequencing errors in structured RNA sequences

    Analysis of the sequence-structure relationship in RNA molecules is essential to evolutionary studies but also to concrete applications such as error-correction methodologies in sequencing technologies. The prohibitive sizes of the mutational and conformational landscapes, combined with the volume of data to process, require efficient algorithms to compute sequence-structure properties. More specifically, here we aim to calculate which mutations most increase the likelihood of a sequence to a given structure and RNA family. In this paper, we introduce RNApyro, an efficient linear-time and space inside-outside algorithm that computes exact mutational probabilities under secondary structure and evolutionary constraints given as a multiple sequence alignment with a consensus structure. We develop a scoring scheme combining classical stacking base pair energies with novel isostericity scales, and apply our techniques to correct point-wise errors in 5S rRNA sequences. Our results suggest that RNApyro is a promising algorithm to complement existing tools in the NGS error-correction pipeline.

    incaRNAfbinv : a web server for the fragment-based design of RNA sequences

    In recent years, new methods for computational RNA design have been developed and applied to various problems in synthetic biology and nanotechnology. Lately, there has been considerable interest in incorporating essential biological information when solving the inverse RNA folding problem. Correspondingly, RNAfbinv aims at including biologically meaningful constraints and is the only program to date that performs a fragment-based design of RNA sequences. In doing so it allows the design of sequences that do not necessarily fold exactly into the target, as long as the overall coarse-grained tree graph shape is preserved. Augmented by the weighted sampling algorithm of incaRNAtion, our web server called incaRNAfbinv implements the method devised in RNAfbinv and offers an interactive environment for the inverse folding of RNA using a fragment-based design approach. It takes as input: a target RNA secondary structure; optional sequence and motif constraints; optional target minimum free energy, neutrality, and GC content. In addition to the design of synthetic regulatory sequences, it can be used as a pre-processing step for the detection of novel naturally occurring RNAs. The two complementary methodologies RNAfbinv and incaRNAtion are merged together and fully implemented in our web server incaRNAfbinv, available at http://www.cs.bgu.ac.il/incaRNAfbinv

    AlphaFold2 can predict single-mutation effects on structure and phenotype

    AlphaFold2 (AF) is a promising tool, but is it accurate enough to predict single mutation effects? Here, we report that a measure for localized structural deformation between protein pairs differing by only 1-3 mutations is correlated across 4,645 experimental and AF-predicted structures. Furthermore, analysis of ~11,000 proteins shows that the local structural change correlates with various phenotypic changes. These findings suggest that AF can predict the magnitude of single-mutation effects in many proteins, and we propose a method to identify those proteins for which AF is most predictive.

    Design of RNAs: comparing programs for inverse RNA folding.

    Computational programs for predicting RNA sequences with desired folding properties have been extensively developed and expanded in the past several years. Given a secondary structure, these programs aim to predict sequences that fold into a target minimum free energy secondary structure, while considering various constraints. This procedure is called inverse RNA folding. Inverse RNA folding has traditionally been used to design optimized RNAs with favorable properties, an application that is expected to grow considerably in light of advances in the expanding fields of synthetic biology and RNA nanostructures. Moreover, it was recently demonstrated that inverse RNA folding can be used as a valuable preprocessing step in the computational detection of novel noncoding RNAs. This review describes the most popular freeware programs that have been developed for such purposes, starting from RNAinverse, which was devised when the inverse RNA folding problem was formulated. The most recently published ones that take an RNA secondary structure as input are antaRNA, RNAiFold and incaRNAfbinv, each having different features that could be beneficial to specific biological problems in practice. The various programs also use distinct approaches, ranging from ant colony optimization to constraint programming, in addition to adaptive walk, simulated annealing and Boltzmann sampling. This review compares the various programs and provides a simple description of the various possibilities that would benefit practitioners in selecting the most suitable program. It is geared toward specific tasks requiring RNA design based on an input secondary structure, with an outlook toward the future of RNA design programs.

    Algorithmic properties of evolved structured RNAs

    Ribonucleic acids (RNAs) are ubiquitous in every living organism and perform or mediate a wealth of essential biological functions. To achieve these functions, some RNA families fold into complex three-dimensional conformations which are mostly determined by physical forces. For those families, functionality is conserved through structure rather than sequence, posing a challenge for traditional comparative studies. But they also share a history through evolution, which consolidates the sequences in each family. To better understand the extant structured RNA functional families, an analysis of their sequence-structure maps is necessary but arduous due to their complexity and size. In this thesis, we address key questions related to the RNA sequence-structure maps and develop efficient algorithmic principles to address them. The resulting methods combine signals extracted from evolutionary information, thermodynamic energies, and experimental data on mutational disruptions. Our contributed algorithms leverage dynamic programming techniques, the information-theoretic principle of mutual information, and heuristics on directed graphs. We apply the resulting tools to correct sequencing errors, design structured RNAs, detect binding interfaces and discover/annotate conserved interaction networks in 3D models. This thesis opens new avenues of research in structural prediction and sequence design, and raises methodological questions regarding the treatment of non-structural dependencies and the development of a novel library of conserved RNA structural fragments.

    alpha beta DCA method identifies unspecific binding but specific disruption of the group I intron by the StpA chaperone

    Chaperone proteins, the most disordered among all protein groups, help RNAs fold into their functional structure by destabilizing misfolded configurations or stabilizing the functional ones. But disentangling the mechanism underlying RNA chaperoning is challenging, mostly because of the inherent disorder of the chaperones and the transient nature of their interactions with RNA. In particular, it is unclear how specific the interactions are and what role is played by amino acid charge and polarity patterns. Here, we address these questions in the RNA chaperone StpA. We adapted direct coupling analysis (DCA) into the alpha beta DCA method, which can treat in tandem sequences written in two alphabets, nucleotides and amino acids. With alpha beta DCA, we could analyze StpA-RNA interactions and show consistency with a previously proposed two-pronged mechanism: StpA disrupts specific positions in the group I intron while globally and loosely binding to the entire structure. Moreover, the interactions are strongly associated with the charge pattern: negatively charged regions in the destabilizing StpA amino-terminal affect a few specific positions in the RNA, located in stems and in the pseudoknot. In contrast, positive regions in the carboxy-terminal contain strongly coupled amino acids that promote nonspecific or weakly specific binding to the RNA. The present study opens new avenues to examine the functions of disordered proteins and to design disruptive proteins based on their charge patterns.