If the Current Clique Algorithms are Optimal, so is Valiant's Parser

Author: Abboud Amir
Backurs Arturs
Williams Virginia Vassilevska
Publication date: 05/11/2015

The CFG recognition problem is: given a context-free grammar $\mathcal{G}$ and a string $w$ of length $n$, decide if $w$ can be obtained from $\mathcal{G}$. This is the most basic parsing question and is a core computer science problem. Valiant's parser from 1975 solves the problem in $O(n^{\omega})$ time, where $\omega<2.373$ is the matrix multiplication exponent. Dozens of parsing algorithms have been proposed over the years, yet Valiant's upper bound remains unbeaten. The best combinatorial algorithms have mildly subcubic $O(n^3/\log^3{n})$ complexity. Lee (JACM'01) provided evidence that fast matrix multiplication is needed for CFG parsing, and that very efficient and practical algorithms might be hard or even impossible to obtain. Lee showed that any algorithm for a more general parsing problem with running time $O(|\mathcal{G}|\cdot n^{3-\varepsilon})$ can be converted into a surprising subcubic algorithm for Boolean Matrix Multiplication. Unfortunately, Lee's hardness result required that the grammar size be $|\mathcal{G}|=\Omega(n^6)$. Nothing was known for the more relevant case of constant size grammars. In this work, we prove that any improvement on Valiant's algorithm, even for constant size grammars, either in terms of runtime or by avoiding the inefficiencies of fast matrix multiplication, would imply a breakthrough algorithm for the $k$-Clique problem: given a graph on $n$ nodes, decide if there are $k$ that form a clique. Besides classifying the complexity of a fundamental problem, our reduction has led us to similar lower bounds for more modern and well-studied cubic time problems for which faster algorithms are highly desirable in practice: RNA Folding, a central problem in computational biology, and Dyck Language Edit Distance, answering an open question of Saha (FOCS'14).
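To make the CFG recognition problem in this abstract concrete, here is a minimal sketch of the textbook cubic-time CYK recognizer. It is not Valiant's parser and is not taken from the paper; it only illustrates the kind of combinatorial baseline the abstract contrasts with the O(n^ω) bound. The function name `cyk_recognize`, the rule-table layout, and the toy balanced-parentheses grammar are assumptions made for illustration, and the grammar is assumed to be in Chomsky normal form.

```python
from itertools import product


def cyk_recognize(word, unary_rules, binary_rules, start="S"):
    """Return True if `word` is derivable from `start` in a CNF grammar.

    unary_rules:  dict mapping terminal -> set of nonterminals A with A -> terminal
    binary_rules: dict mapping (B, C)   -> set of nonterminals A with A -> B C
    (Both layouts are illustrative assumptions, not a standard API.)
    """
    n = len(word)
    if n == 0:
        return False  # this sketch ignores the empty-string rule S -> epsilon
    # table[i][l - 1] holds the nonterminals deriving word[i : i + l]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = set(unary_rules.get(ch, ()))
    for length in range(2, n + 1):          # substring length
        for i in range(n - length + 1):     # substring start
            for split in range(1, length):  # length of the left part
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for b, c in product(left, right):
                    table[i][length - 1] |= binary_rules.get((b, c), set())
    return start in table[0][n - 1]


# Toy CNF grammar for non-empty balanced parentheses (hypothetical example):
#   S -> S S | L R | L X,   X -> S R,   L -> "(",   R -> ")"
UNARY = {"(": {"L"}, ")": {"R"}}
BINARY = {
    ("S", "S"): {"S"},
    ("L", "R"): {"S"},
    ("L", "X"): {"S"},
    ("S", "R"): {"X"},
}
```

For a fixed grammar, the three nested loops over `length`, `i`, and `split` give the cubic dependence on the string length.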

arXiv.org e-Print Archive

Crossref

Prediction of secondary structures for large RNA molecules

Author: Mathuriya Amrita
Publication venue: Georgia Institute of Technology
Publication date: 12/01/2009

The prediction of correct secondary structures of large RNAs is one of the unsolved challenges of computational molecular biology. Among the major obstacles is the fact that accurate calculations scale as O(n⁴), so the computational requirements become prohibitive as the length increases. We present a new parallel multicore and scalable program called GTfold, which is one to two orders of magnitude faster than the de facto standard programs mfold and RNAfold for folding large RNA viral sequences and achieves comparable accuracy of prediction. We analyze the algorithm's concurrency and describe the parallelism for a shared memory environment such as a symmetric multiprocessor or multicore chip. We are seeing a paradigm shift to multicore chips, and parallelism must be explicitly addressed to continue gaining performance with each new generation of systems. We provide a rigorous proof of correctness of an optimized algorithm for internal loop calculations called the internal loop speedup algorithm (ILSA), which reduces the time complexity of internal loop computations from O(n⁴) to O(n³), and show that exact algorithms such as ILSA can be executed with our method in an affordable amount of time. The proof gives insight into solving these kinds of combinatorial problems. We have documented detailed pseudocode of the algorithm for predicting minimum free energy secondary structures, which provides a base for implementing future algorithmic improvements and improved thermodynamic models in GTfold. GTfold is written in C/C++ and freely available as open source from our website.
M.S. Committee Chair: Bader, David; Committee Co-Chair: Heitsch, Christine; Committee Member: Harvey, Stephen; Committee Member: Vuduc, Richar
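As a rough illustration of the cubic-time dynamic programming that underlies secondary structure prediction, the sketch below implements the simplified Nussinov-style recursion, which maximizes the number of base pairs instead of minimizing free energy. It is not GTfold's algorithm and not ILSA. The function name `nussinov_max_pairs`, the allowed pair set (Watson-Crick plus G-U wobble), and the default minimum hairpin loop of 3 unpaired bases are assumptions chosen for this example.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in `seq` (simplified model).

    min_loop is the minimum number of unpaired bases enclosed by any pair
    (an illustrative assumption; real energy models are far richer).
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    # best[i][j] = max pairs within seq[i..j]; entries with j <= i stay 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]              # case 1: base j is unpaired
            for k in range(i, j - min_loop):    # case 2: base j pairs with k
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1] if n else 0
```

The loops over `span`, `i`, and `k` give O(n³) time and O(n²) space, which is the scaling that makes long sequences expensive even in this stripped-down model.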

Scholarly Materials And Research @ Georgia Tech

Statistical mechanics of RNA folding: importance of alphabet size

Author: A.D. Ellington
B.M.R. Stadler
Chao Tang
E. Shakhnovich
E.L. Kussell
Eldon Emberly
H. Li
H.S. Chan
I. Tinoco
I.L. Hofacker
J. Miller
J.S. McCaskill
L.F. Landweber
M. Zuker
Ned S. Wingreen
R. Bundschuh
R. Mélin
Ranjan Mukhopadhyay
S. Govindarajan
S.Y. Le
W. Fontana
Publication venue: 'American Physical Society (APS)'
Publication date: 04/08/2003

We construct a minimalist model of RNA secondary-structure formation and use it to study the mapping from sequence to structure. There are strong, qualitative differences between two-letter and four- or six-letter alphabets. With only two kinds of bases, there are many alternate folding configurations, yielding thermodynamically stable ground states only for a small set of structures of high designability, i.e., total number of associated sequences. In contrast, sequences made from four bases, as found in nature, or six bases have far fewer competing folding configurations, resulting in a much greater average stability of the ground state.
Comment: 7 figures; uses revtex

arXiv.org e-Print Archive

Crossref

Recommended from our members

PATTERNA: transcriptome-wide search for functional RNA elements via structural data signatures.

Author: Aviran Sharon
Ledda Mirko
Publication venue: eScholarship, University of California
Publication date: 01/03/2018

Establishing a link between RNA structure and function remains a great challenge in RNA biology. The emergence of high-throughput structure profiling experiments is revolutionizing our ability to decipher structure, yet principled approaches for extracting information on structural elements directly from these data sets are lacking. We present PATTERNA, an unsupervised pattern recognition algorithm that rapidly mines RNA structure motifs from profiling data. We demonstrate that PATTERNA detects motifs with an accuracy comparable to commonly used thermodynamic models and highlight its utility in automating data-directed structure modeling from large data sets. PATTERNA is versatile and compatible with diverse profiling techniques and experimental conditions.

eScholarship - University of California

Translocation of structured polynucleotides through nanopores

Author: Akeson M
Bates M
Cate J H
Cech T R
Chuang J
Flomenbom O
Gerland U
Gutell R R
Hofacker I L
Karlin S
Koch S J
Lubensky D K
McCaskill J S
Meller A
Metzler R
Ralf Bundschuh
Terence Hwa
Ulrich Gerland
Walter A E
Zuker M
Publication venue: 'IOP Publishing'
Publication date: 13/02/2004

We investigate theoretically the translocation of structured RNA/DNA molecules through narrow pores which allow single but not double strands to pass. The unzipping of basepaired regions within the molecules presents significant kinetic barriers for the translocation process. We show that this circumstance may be exploited to determine the full basepairing pattern of polynucleotides, including RNA pseudoknots. The crucial requirement is that the translocation dynamics (i.e., the length of the translocated molecular segment) needs to be recorded as a function of time with a spatial resolution of a few nucleotides. This could be achieved, for instance, by applying a mechanical driving force for translocation and recording force-extension curves (FECs) with a device such as an atomic force microscope or optical tweezers. Our analysis suggests that with this added spatial resolution, nanopores could be transformed into a powerful experimental tool to study the folding of nucleic acids.
Comment: 9 pages, 5 figures

arXiv.org e-Print Archive

Crossref


Linear-time Algorithms for RNA Folding: Partition Function, Stochastic Sampling and RNA-RNA Interaction

Author: Zhang He
Publication venue: 'Oregon State University'

RNAs play important roles in the central dogma of molecular biology and are involved in multiple biological processes such as chromatin modification, transcriptional interference and translation initiation. The functions of RNAs, especially non-coding RNAs, are highly related to their secondary structures; therefore, computational methods for RNA structure prediction are of great interest. In this dissertation, we propose linear-time algorithms for the RNA folding partition function, stochastic sampling and RNA-RNA interaction, which can efficiently and accurately predict and analyze RNA secondary structure. Partition function-based methods have been proposed to compute folding ensembles and estimate structure and base pair probabilities. However, the classical partition function algorithm scales cubically with sequence length, and is therefore slow for long sequences. We design a linear-time heuristic algorithm, LinearPartition, to approximate the partition function and base pairing probabilities, which is shown to be orders of magnitude faster than classical folding systems such as Vienna RNAfold and CONTRAfold. More interestingly, the resulting base pairing probabilities are even better correlated with the ground truth structures. On the other hand, the partition function and the estimated base-pairing probabilities provide compact representations of the exponentially large ensemble, but they cannot provide direct and intuitive descriptions, and cannot be directly used for accessibility prediction. The stochastic sampling algorithm, which samples secondary structures according to their probabilities in the Boltzmann ensemble, is widely used, e.g., for accessibility prediction. However, current sampling algorithms are unnecessarily complicated, repeatedly perform redundant work, and scale cubically with the sequence length. These issues prevent them from being used for full-length viral genomes such as SARS-CoV-2.
To alleviate these problems, we first propose a hypergraph framework under which the sampling algorithm can be greatly simplified, then present a lazy-saving sampling strategy under this framework in which redundant work is eliminated. Finally, we propose LinearSampling, the first end-to-end linear-time stochastic sampling algorithm, which can be used to detect potential SARS-CoV-2 regions for diagnostics and treatment. Many RNAs function through RNA-RNA interactions. Two-strand folding, which can directly predict structures while accounting for RNA-RNA interaction, is also highly desirable. Some existing tools, such as RNAhybrid and RNAplex, are not only less informative but also less accurate because they omit the competition between intermolecular and intramolecular base pairs. Another group of tools, such as RNAup, focuses on predicting the binding region rather than the two-strand co-folding structure. Other tools like RNAcofold are too slow due to cubic runtime complexity. To address these issues, we propose LinearCoFold and LinearCoPartition, which are able to predict the two-strand folding structure, partition function and base pairing probabilities in linear runtime and space. Our new cofolding algorithms are orders of magnitude faster than the baseline RNAcofold, and achieve better PPV and sensitivity on the RNA-RNA interaction dataset.
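The cubic scaling of the classical partition function calculation mentioned in this abstract can be seen in a toy version of the recursion. The sketch below sums Boltzmann weights over all nested structures, assuming every base pair contributes the same constant weight. It is a simplified stand-in, not LinearPartition, Vienna RNAfold, or any real energy model; the name `partition_function`, the parameter `pair_weight`, the allowed pair set, and the minimum loop size of 3 are all illustrative assumptions.

```python
def partition_function(seq, pair_weight=1.0, min_loop=3):
    """Sum of weights over all nested secondary structures of `seq`.

    Each base pair multiplies a structure's weight by `pair_weight`
    (a hypothetical constant Boltzmann factor). With pair_weight=1.0 the
    result is simply the number of admissible structures.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    # z[i][j] covers seq[i..j]; empty intervals (j < i) count as weight 1
    z = [[1.0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            total = z[i][j - 1]                 # base j unpaired
            for k in range(i, j - min_loop):    # base j paired with base k
                if (seq[k], seq[j]) in pairs:
                    left = z[i][k - 1] if k > i else 1.0
                    total += left * pair_weight * z[k + 1][j - 1]
            z[i][j] = total
    return z[0][n - 1] if n else 1.0
```

As with the other interval recursions, the three nested loops give O(n³) time, which is the cost that linear-time heuristics aim to avoid on long sequences.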

ScholarsArchive@OSU

RNA secondary structure prediction from multi-aligned sequences

It has been well accepted that the RNA secondary structures of most functional non-coding RNAs (ncRNAs) are closely related to their functions and are conserved during evolution. Hence, prediction of conserved secondary structures from evolutionarily related sequences is one important task in RNA bioinformatics; the methods are useful not only to further functional analyses of ncRNAs but also to improve the accuracy of secondary structure predictions and to find novel functional RNAs from the genome. In this review, I focus on common secondary structure prediction from a given aligned RNA sequence, in which one secondary structure whose length is equal to that of the input alignment is predicted. I systematically review and classify existing tools and algorithms for the problem, by utilizing the information employed in the tools and by adopting a unified viewpoint based on maximum expected gain (MEG) estimators. I believe that this classification will allow a deeper understanding of each tool and provide users with useful information for selecting tools for common secondary structure predictions.
Comment: A preprint of an invited review manuscript that will be published in a chapter of the book `Methods in Molecular Biology'. Note that this version of the manuscript may differ from the published version

arXiv.org e-Print Archive

CiteSeerX

Crossref

Ab initio RNA folding

Author: Cragnolini Tristan
Derreumaux Philippe
Pasquali Samuela
Publication venue: 'IOP Publishing'
Publication date: 30/12/2014

RNA molecules are essential cellular machines performing a wide variety of functions for which a specific three-dimensional structure is required. Over the last several years, experimental determination of RNA structures through X-ray crystallography and NMR seems to have reached a plateau in the number of structures resolved each year, but as more and more RNA sequences are being discovered, the need for structure prediction tools to complement experimental data is strong. Theoretical approaches to RNA folding have been developed since the late nineties when the first algorithms for secondary structure prediction appeared. Over the last 10 years a number of prediction methods for 3D structures have been developed, first based on bioinformatics and data-mining, and more recently based on a coarse-grained physical representation of the systems. In this review we present the challenges of RNA structure prediction and the main ideas behind bioinformatic approaches and physics-based approaches. We focus on the description of the more recent physics-based phenomenological models and on how they are built to include the specificity of the interactions of RNA bases, whose role is critical in folding. Through examples from different models, we point out the strengths of physics-based approaches, which are able not only to predict equilibrium structures, but also to investigate dynamical and thermodynamical behavior, and the open challenges to include more key interactions ruling RNA folding.
Comment: 28 pages, 18 figures

arXiv.org e-Print Archive

Hal-Diderot