144 research outputs found

Comparative analysis of RNA secondary structure accuracy on predicted RNA 3D models

Author: Deb Indrajit
Kulkarni Mandar
Thangappan Jayaraman
Wu Sangwook
Publication date: 01/01/2023

University of Dundee Online Publications

Probing of RNA structures in a positive sense RNA virus reveals selection pressures for structural elements.

Author: Aviran Sharon
Choudhary Krishna
Lucks Julius B
Perry Keith L
Thompson Jeremy R
Watters Kyle E
Publication venue: eScholarship, University of California
Publication date: 01/03/2018

In single stranded (+)-sense RNA viruses, RNA structural elements (SEs) play essential roles in the infection process from replication to encapsidation. Using selective 2'-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq) and covariation analysis, we explore the structural features of the third genome segment of cucumber mosaic virus (CMV), RNA3 (2216 nt), both in vitro and in plant cell lysates. Comparing SHAPE-Seq and covariation analysis results revealed multiple SEs in the coat protein open reading frame and 3' untranslated region. Four of these SEs were mutated and serially passaged in Nicotiana tabacum plants to identify biologically selected changes to the original mutated sequences. After passaging, loop mutants showed partial reversion to their wild-type sequence and SEs that were structurally disrupted by mutations were restored to wild-type-like structures via synonymous mutations in planta. These results support the existence and selection of virus open reading frame SEs in the host organism and provide a framework for further studies on the role of RNA structure in viral infection. Additionally, this work demonstrates the applicability of high-throughput chemical probing in plant cell lysates and presents a new method for calculating SHAPE reactivities from overlapping reverse transcriptase priming sites

eScholarship - University of California

From RNA folding to inverse folding: a computational study: Folding and design of RNA molecules

Author: Nono Saha Cyrille Merleau
Publication date: 10/02/2023

Since the discovery of the structure of DNA in the early 1950s and its double-chained complement of information hinting at its means of replication, biologists have recognized the strong connection between molecular structure and function. In the past two decades, there has been a surge of research on an ever-growing class of RNA molecules that are non-coding but whose various folded structures allow a diverse array of vital functions. From the well-known splicing and modification of ribosomal RNA, non-coding RNAs (ncRNAs) are now known to be intimately involved in possibly every stage of DNA transcription and protein translation, as well as RNA signalling and gene regulation processes. Despite the rapid development and declining cost of modern molecular methods, they typically can only describe the structural conformations of ncRNAs in vitro, which differ from their in vivo counterparts. Moreover, it is estimated that only a tiny fraction of known ncRNAs has been documented experimentally, often at a high cost. There is thus a growing realization that computational methods must play a central role in the analysis of ncRNAs. Not only do computational approaches hold the promise of rapidly characterizing many ncRNAs yet to be described, but there is also the hope that by understanding the rules that determine their structure, we will gain better insight into their function and design. Many studies have revealed that ncRNA functions are performed by high-level structures that often depend on their low-level structures, such as the secondary structure. This thesis studies the computational folding mechanism and inverse folding of ncRNAs at the secondary-structure level. We describe the development of two bioinformatic tools that have the potential to improve our understanding of RNA secondary structure.
These tools are as follows: (1) RAFFT, for efficient prediction of pseudoknot-free RNA folding pathways using the fast Fourier transform (FFT); (2) aRNAque, an evolutionary algorithm inspired by Lévy flights for RNA inverse folding with or without pseudoknots (a secondary structure motif that often poses difficulties for bio-computational detection). The first tool, RAFFT, implements a novel heuristic to predict RNA secondary structure formation pathways that has two components: (i) a folding algorithm and (ii) a kinetic ansatz. When considering the best prediction in the ensemble of 50 secondary structures predicted by RAFFT, its performance matches that of recent deep-learning-based structure prediction methods. RAFFT also acts as a folding kinetic ansatz, which we tested on two RNAs: the CFSE and a classic bi-stable sequence. In both test cases, fewer structures were required to reproduce the full kinetics, whereas known methods (such as Treekin) required a sample of 20,000 structures or more. The second tool, aRNAque, implements an evolutionary algorithm (EA) inspired by the Lévy flight, allowing both local and global search, and supports pseudoknotted target structures. The number of point mutations at every step of aRNAque's EA is drawn from a Zipf distribution. Therefore, our proposed method increases the diversity of designed RNA sequences and reduces the average number of evaluations of the evolutionary algorithm. The overall performance showed improved empirical results compared to existing tools through intensive benchmarks on both pseudoknotted and pseudoknot-free datasets. In conclusion, we highlight some promising extensions of the versatile RAFFT method to RNA-RNA interaction studies. We also provide an outlook on both tools' implications in studying evolutionary dynamics
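The Zipf-distributed mutation step mentioned in the abstract above can be illustrated with a minimal sketch. This is not aRNAque's code: the function names, the exponent value of 1.5, and the truncation of the distribution at the sequence length are all assumptions made for illustration. The heavy tail means most steps change a single base (local search), while an occasional step changes many bases at once (global search).

```python
import random

def zipf_weights(max_k, exponent):
    """Unnormalised truncated Zipf weights: P(k) is proportional to k**(-exponent)."""
    return [k ** (-exponent) for k in range(1, max_k + 1)]

def sample_mutation_count(seq_len, exponent=1.5, rng=random):
    """Draw how many positions to mutate, between 1 and seq_len (assumed truncation)."""
    weights = zipf_weights(seq_len, exponent)
    return rng.choices(range(1, seq_len + 1), weights=weights, k=1)[0]

def mutate(sequence, exponent=1.5, rng=random, alphabet="ACGU"):
    """Return a mutated copy of the sequence and the number of point mutations applied."""
    n = sample_mutation_count(len(sequence), exponent, rng)
    positions = rng.sample(range(len(sequence)), n)  # distinct positions
    seq = list(sequence)
    for pos in positions:
        # Always substitute a different base so each chosen position really changes.
        seq[pos] = rng.choice([b for b in alphabet if b != seq[pos]])
    return "".join(seq), n

if __name__ == "__main__":
    rng = random.Random(42)
    child, n_mut = mutate("GGGAAACCC", rng=rng)
    print(child, n_mut)
```

Passing a seeded `random.Random` instance as `rng` keeps runs reproducible, which is convenient when comparing exponents.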

Qucosa - Publikationsserver der Universität Leipzig

The 3′ Splice Site of Influenza A Segment 7 mRNA Can Exist in Two Conformations: A Pseudoknot and a Hairpin

Author: A Dallas
A Honig
A Pasternak
A Watakabe
AF Muro
Alfred Lewin
AP Gultyaev
AS Abu Almakarem
B Streicher
C Chen
C Ehresmann
CA Theimer
DD Loeb
DH Mathews
DH Turner
DN Frank
Douglas H. Turner
E Buratti
E Kierzek
EJ Merino
Elzbieta Kierzek
G Stiver
HY Mei
J Ciesiolka
J Dushoff
JA Holland
JL Chen
JL Childs
JL Jenkins
JO Deshler
K Katoh
KA Wilkinson
L Domenjoud
L Jaeger
L Kierebom
L Kirsebom
LP Labuda
Lumbini I. Dela-Moss
LW Hung
M Frugier
M Madjid
MA Larkin
MB Warf
MD Disney
MR Hilleman
MT Cheah
N Ban
NB Ulyanov
NC Robb
NN Singh
NP Johnson
NR Pace
P Massin
P Palese
PL Nixon
PV Cornish
Q Ge
QS Du
R Liang
RA Fouchier
RA Lamb
RL Gonzalez Jr
RM Krug
RR Gutell
RS Brown
Ryszard Kierzek
S Aggarwal
S Barik
S Cao
S Jacquenet
S Rudisser
SA Woodson
Salvatore F. Priore
SB Jang
SF Priore
SJ Sucheck
SL Heilman-Miller
SM Tompkins
SR Shih
T Hermann
T Xia
U Nagaswamy
V Girish
W Winkler
WA Ziehler
Walter N. Moss
WD Wilson
WN Moss
WW Thompson
X Tan
Y Bao
YV Lerman
Publication venue: Public Library of Science
Publication date: 07/06/2012

The 3′ splice site of influenza A segment 7 is used to produce mRNA for the M2 ion-channel protein, which is critical to the formation of viable influenza virions. Native gel analysis, enzymatic/chemical structure probing, and oligonucleotide binding studies of a 63 nt fragment, containing the 3′ splice site, key residues of an SF2/ASF splicing factor binding site, and a polypyrimidine tract, provide evidence for an equilibrium between pseudoknot and hairpin structures. This equilibrium is sensitive to multivalent cations, and can be forced towards the pseudoknot by addition of 5 mM cobalt hexammine. In the two conformations, the splice site and other functional elements exist in very different structural environments. In particular, the splice site is sequestered in the middle of a double helix in the pseudoknot conformation, while in the hairpin it resides in a two-by-two nucleotide internal loop. The results suggest that segment 7 mRNA splicing can be controlled by a conformational switch that exposes or hides the splice site

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Geometric combinatorics and computational molecular biology: branching polytopes for RNA sequences

Author: Drellich Elizabeth
Gainer-Dewar Andrew
Harrington Heather A.
He Qijun
Heitsch Christine
Poznanović Svetlana
Publication date: 16/06/2016

Questions in computational molecular biology generate various discrete optimization problems, such as DNA sequence alignment and RNA secondary structure prediction. However, the optimal solutions are fundamentally dependent on the parameters used in the objective functions. The goal of a parametric analysis is to elucidate such dependencies, especially as they pertain to the accuracy and robustness of the optimal solutions. Techniques from geometric combinatorics, including polytopes and their normal fans, have been used previously to give parametric analyses of simple models for DNA sequence alignment and RNA branching configurations. Here, we present a new computational framework, and proof-of-principle results, which give the first complete parametric analysis of the branching portion of the nearest neighbor thermodynamic model for secondary structure prediction for real RNA sequences. Comment: 17 pages, 8 figures

arXiv.org e-Print Archive

Crossref

Oxford University Research Archive

Graphical methods in RNA structure matching

Author: Huang Jiajie
Publication venue: Purdue University (bepress)
Publication date: 01/01/2016

Eukaryotic genomes are pervasively transcribed; almost every base can be found in an RNA transcript. This is a surprising observation since most of the genome does not encode proteins. This RNA must serve an important regulatory function – important because producing non-coding RNA is an energy intensive process, and in the absence of strong selection one would expect it to disappear. RNA families with common functions have specifically conserved structural motifs, which are directly related to the functional roles of RNA in catalysis and regulation. Because the conserved structures depend on base-pairing, similar RNA structures may have little or no detectable sequence similarity, making the identification of conserved RNAs difficult. This is a particularly serious problem when studying regulatory structures in RNA. In many cases, such as that of cellular internal ribosome entry sites, although we can identify RNAs that have similar regulatory responses, it is difficult to tell whether the RNAs have common structural features using current methods. Available tools for identifying common structures based on RNA sequence suffer from one or more of the following problems: they do not consider pseudoknots, which are important in many catalytic and regulatory structures; they do not consider near minimum free energy structures, which is important as many RNAs exist as an ensemble of structures of nearly equal energy; they require many examples of known structures in order to train a computational model; they require impractical amounts of computational time, precluding their use on long sequences or genomic scale; or they use a similarity function that cannot identify RNAs as having similar structure, even when they are from one of the well characterized known classes. The approach presented here has the potential to address all of these issues, allowing novel RNA structures that are shared between RNAs with little or no sequence similarity to be discovered. 
This provides a powerful tool to investigate and explain the pervasive transcription observed in eukaryotic genomes

Purdue E-Pubs

Computing the Partition Function for Kinetically Trapped RNA Secondary Structures

Author: Clote Peter
Lorenz William A.
Publication venue: Public Library of Science
Publication date: 01/01/2011

An RNA secondary structure is locally optimal if there is no lower energy structure that can be obtained by the addition or removal of a single base pair, where energy is defined according to the widely accepted Turner nearest neighbor model. Locally optimal structures form kinetic traps, since any evolution away from a locally optimal structure must involve energetically unfavorable folding steps. Here, we present a novel, efficient algorithm to compute the partition function over all locally optimal secondary structures of a given RNA sequence. Our software, RNAlocopt, runs in O(n^3) time and O(n^2) space. Additionally, RNAlocopt samples a user-specified number of structures from the Boltzmann subensemble of all locally optimal structures. We apply RNAlocopt to show that (1) the number of locally optimal structures is far smaller than the total number of structures; indeed, the number of locally optimal structures is approximately equal to the square root of the number of all structures, (2) the structural diversity of this subensemble may be either similar to or quite different from the structural diversity of the entire Boltzmann ensemble, a situation that depends on the type of input RNA, (3) the (modified) maximum expected accuracy structure, computed by taking into account base pairing frequencies of locally optimal structures, is a more accurate prediction of the native structure than other current thermodynamics-based methods. The software RNAlocopt constitutes a technical breakthrough in our study of the folding landscape for RNA secondary structures. For the first time, locally optimal structures (kinetic traps in the Turner energy model) can be rapidly generated for long RNA sequences, which was previously impossible with methods that involved exhaustive enumeration. Use of locally optimal structures leads to state-of-the-art secondary structure prediction, as benchmarked against methods involving the computation of minimum free energy and of maximum expected accuracy.
Web server and source code available at http://bioinformatics.bc.edu/clotelab/RNAlocopt/
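The definition of local optimality given in this abstract can be made concrete with a small sketch. It is not RNAlocopt and does not use the Turner model: as a loudly labelled assumption, energy here is simply minus the number of base pairs, and the minimum hairpin loop size of 3 and all function names are illustrative choices. Under this toy energy, a structure is locally optimal exactly when no further valid base pair can be added.

```python
# Toy assumptions: canonical and wobble pairs only, pseudoknot-free structures,
# energy = -(number of base pairs). This is NOT the Turner nearest neighbor model.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # minimum number of unpaired bases enclosed by a hairpin

def energy(pairs):
    """Toy energy: each base pair contributes -1."""
    return -len(pairs)

def crosses(p, q):
    """True if pairs p and q would form a pseudoknot (crossing arcs)."""
    (i, j), (k, l) = p, q
    return i < k < j < l or k < i < l < j

def can_add(seq, pairs, i, j):
    """Check whether pair (i, j), with i < j, can be added to the structure."""
    if j - i <= MIN_LOOP or (seq[i], seq[j]) not in CANONICAL:
        return False
    paired = {x for p in pairs for x in p}
    if i in paired or j in paired:
        return False
    return not any(crosses((i, j), p) for p in pairs)

def neighbors(seq, pairs):
    """Yield every structure reachable by removing or adding one base pair."""
    for p in pairs:
        yield pairs - {p}
    for i in range(len(seq)):
        for j in range(i + MIN_LOOP + 1, len(seq)):
            if can_add(seq, pairs, i, j):
                yield pairs | {(i, j)}

def is_locally_optimal(seq, pairs):
    """No single-pair move leads to a strictly lower energy."""
    e = energy(pairs)
    return all(energy(nb) >= e for nb in neighbors(seq, pairs))

if __name__ == "__main__":
    seq = "GGGAAACCC"
    full = frozenset({(0, 8), (1, 7), (2, 6)})
    print(is_locally_optimal(seq, full))         # full hairpin stem
    print(is_locally_optimal(seq, frozenset()))  # open chain
```

Replacing `energy` with a more realistic model would leave the neighbor enumeration and the optimality test unchanged, since only the single-pair move set matters to the definition.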

CiteSeerX

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

PATTERNA: transcriptome-wide search for functional RNA elements via structural data signatures.

Author: Aviran Sharon
Ledda Mirko
Publication venue: eScholarship, University of California
Publication date: 01/03/2018

Establishing a link between RNA structure and function remains a great challenge in RNA biology. The emergence of high-throughput structure profiling experiments is revolutionizing our ability to decipher structure, yet principled approaches for extracting information on structural elements directly from these data sets are lacking. We present PATTERNA, an unsupervised pattern recognition algorithm that rapidly mines RNA structure motifs from profiling data. We demonstrate that PATTERNA detects motifs with an accuracy comparable to commonly used thermodynamic models and highlight its utility in automating data-directed structure modeling from large data sets. PATTERNA is versatile and compatible with diverse profiling techniques and experimental conditions

eScholarship - University of California

Analysis of Genomic and Proteomic Sequences using DSP Techniques

Author: Kakumani Raja Sekhar
Publication date: 12/03/2013

Analysis of biological sequences by detecting hidden periodicities and symbolic patterns has been an active area of research for a couple of decades. The hidden periodic components and the patterns help locate biologically relevant motifs such as protein coding regions (exons), CpG islands (CGIs) and hot-spots that characterize various biological functions. The discrete nature of biological sequences has prompted many researchers to use digital signal processing (DSP) techniques for their analysis. After mapping the biological sequences to numerical sequences, various DSP techniques using digital filters, wavelets, neural networks, filter banks etc. have been developed to detect the hidden periodicities and recurring patterns in these sequences. This thesis attempts to develop effective DSP-based techniques to solve some of the important problems in biological sequence analysis. Specifically, DSP techniques such as statistically optimal null filters (SONFs), matched filters and neural-network-based algorithms are developed for the analysis of deoxyribonucleic acid (DNA), ribonucleic acid (RNA) and protein sequences. In the first part of this study, DNA sequences are investigated in order to identify the locations of CGIs and protein coding regions, i.e., exons. SONFs, which are known for their ability to efficiently estimate short-duration signals embedded in noise by combining the maximum signal-to-noise ratio and the least squares optimization criteria, are utilized to solve these problems. Basis sequences characterizing CGIs and exons are formulated for use in the SONF technique. In the second part of this study, RNA sequences are analyzed to predict their secondary structures. For this purpose, matched filters based on 2-dimensional convolution are developed to identify the locations of stem and loop patterns in the RNA secondary structure.
The knowledge of the stem and loop patterns thus obtained is then used to predict the presence of pseudoknots, leading to the determination of the entire RNA secondary structure. Finally, in the third part of this thesis, protein sequences are analyzed to solve the problems of predicting protein secondary structure and identifying the locations of hot-spots. For predicting the protein secondary structure a two-stage neural network scheme is developed, whereas for predicting the locations of hot-spots an SONF-based approach is proposed. Hot-spots in proteins exhibit a characteristic frequency corresponding to their biological function. A basis function is formulated based on this characteristic frequency to be used in SONFs to detect the locations of hot-spots belonging to the corresponding functional group. Extensive experiments are performed throughout the thesis to demonstrate the effectiveness and validity of the various schemes and techniques developed in this investigation. The performance of the proposed techniques is compared with that of previously reported techniques for the analysis of biological sequences. For this purpose, the results obtained are validated using databases with known annotations. It is shown that the proposed schemes result in performance superior to that of some of the existing techniques
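The general idea described in this abstract, mapping a biological sequence to numerical sequences and then looking for a hidden periodicity, can be sketched as follows. This is a plain discrete Fourier transform check for period-3 structure, not the SONF or matched-filter methods of the thesis; the binary indicator mapping and the function names are assumptions chosen for illustration.

```python
import cmath

def indicator(seq, base):
    """Binary indicator sequence: 1.0 where the base occurs, 0.0 elsewhere."""
    return [1.0 if b == base else 0.0 for b in seq]

def dft_power(signal, k):
    """Squared magnitude of the k-th discrete Fourier coefficient."""
    n = len(signal)
    coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal))
    return abs(coeff) ** 2

def period3_power(seq, alphabet="ACGT"):
    """Total spectral power at frequency N/3, summed over the four indicator sequences.

    Assumes len(seq) is a multiple of 3 so that N/3 is an exact frequency bin.
    """
    k = len(seq) // 3
    return sum(dft_power(indicator(seq, b), k) for b in alphabet)

if __name__ == "__main__":
    print(period3_power("ATG" * 20))   # strongly period-3
    print(period3_power("ACGT" * 15))  # period-4, little power at N/3
```

In practice such a measure is usually evaluated over a sliding window, so that peaks mark candidate regions along a long sequence rather than summarising the whole sequence at once.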

Concordia University Research Repository