If the Current Clique Algorithms are Optimal, so is Valiant's Parser
The CFG recognition problem is: given a context-free grammar $\mathcal{G}$
and a string $w$ of length $n$, decide if $w$ can be obtained from
$\mathcal{G}$. This is the most basic parsing question and is a core computer
science problem. Valiant's parser from 1975 solves the problem in $O(n^{\omega})$
time, where $\omega < 2.373$ is the matrix multiplication
exponent. Dozens of parsing algorithms have been proposed over the years, yet
Valiant's upper bound remains unbeaten. The best combinatorial algorithms have
mildly subcubic complexity.
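Valiant's parser improves on a cubic baseline, and the textbook CYK (Cocke-Younger-Kasami) recognizer illustrates that baseline. The sketch below shows only the baseline and is not Valiant's matrix-multiplication-based algorithm. It assumes the grammar is already in Chomsky normal form. The dictionary encoding of rules, the example grammar and the name `cyk_recognize` are hypothetical choices for illustration.

```python
from itertools import product

def cyk_recognize(word, start, unary, binary):
    """Decide whether `word` can be derived from `start` in a CNF grammar.

    unary:  maps a terminal to the set of nonterminals A with a rule A -> terminal
    binary: maps a pair (B, C) to the set of nonterminals A with a rule A -> B C
    Cubic in the word length n for a fixed grammar.
    """
    n = len(word)
    if n == 0:
        return False  # a CNF grammar without S -> epsilon derives no empty word
    # table[i][j] holds every nonterminal that derives word[i..j] (inclusive)
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][i] = set(unary.get(ch, ()))
    for length in range(2, n + 1):           # length of the substring
        for i in range(n - length + 1):      # start of the substring
            j = i + length - 1               # end of the substring (inclusive)
            for k in range(i, j):            # split into word[i..k] and word[k+1..j]
                for b, c in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= binary.get((b, c), set())
    return start in table[0][n - 1]

# Hypothetical CNF grammar for non-empty balanced parentheses:
# S -> L R | L X | S S,   X -> S R,   L -> "(",   R -> ")"
unary = {"(": {"L"}, ")": {"R"}}
binary = {("L", "R"): {"S"}, ("L", "X"): {"S"}, ("S", "S"): {"S"}, ("S", "R"): {"X"}}

print(cyk_recognize("(()())", "S", unary, binary))  # True
print(cyk_recognize("(()", "S", unary, binary))     # False
```

The three nested loops over substring length, start and split point are what make the baseline cubic in $n$. Subcubic parsers attack this same table computation.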
Lee (JACM'01) provided evidence that fast matrix multiplication is needed for
CFG parsing, and that very efficient and practical algorithms might be hard or
even impossible to obtain. Lee showed that any algorithm for a more general
parsing problem with running time $O(|\mathcal{G}| \cdot n^{3-\varepsilon})$ can
be converted into a surprising subcubic algorithm for Boolean Matrix
Multiplication. Unfortunately, Lee's hardness result required that the grammar
size be $|\mathcal{G}| = \Omega(n^6)$. Nothing was known for the more relevant
case of constant size grammars.
In this work, we prove that any improvement on Valiant's algorithm, even for
constant size grammars, either in terms of runtime or by avoiding the
inefficiencies of fast matrix multiplication, would imply a breakthrough
algorithm for the $k$-Clique problem: given a graph on $n$ nodes, decide if
there are $k$ nodes that form a clique.
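To pin down the $k$-Clique problem statement, here is a minimal brute-force sketch that checks every $k$-subset of nodes. It only illustrates the definition and is not one of the algorithms the reduction is concerned with. The edge-list input format and the name `has_k_clique` are assumptions.

```python
from itertools import combinations

def has_k_clique(n, edges, k):
    """Return True if the undirected graph on nodes 0..n-1 contains a k-clique.

    Brute force: examines all C(n, k) node subsets, so only small graphs are practical.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    for subset in combinations(range(n), k):
        # a subset is a clique when every pair of its nodes is adjacent
        if all(v in adj[u] for u, v in combinations(subset, 2)):
            return True
    return False

# A 4-cycle plus the chord 0-2: {0, 1, 2} is a triangle, but 1-3 is missing.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
print(has_k_clique(4, edges, 3))  # True
print(has_k_clique(4, edges, 4))  # False
```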
Besides classifying the complexity of a fundamental problem, our reduction
has led us to similar lower bounds for more modern and well-studied cubic time
problems for which faster algorithms are highly desirable in practice: RNA
Folding, a central problem in computational biology, and Dyck Language Edit
Distance, answering an open question of Saha (FOCS'14).
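As an assumption, RNA Folding is taken here to mean the simplified task of maximizing the number of non-crossing complementary base pairs. Its textbook cubic solution, a Nussinov-style dynamic program, splits intervals the same way CFG parsing does. The pairing set (Watson-Crick plus G-U), the `min_loop` parameter and the name `max_pairs` are illustrative assumptions, not definitions taken from the paper.

```python
# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def max_pairs(seq, min_loop=0):
    """Nussinov-style DP: maximum number of non-crossing base pairs in `seq`.

    `min_loop` is the minimum number of unpaired bases a pair must enclose
    (0 lets adjacent bases pair). O(n^3) time, O(n^2) space.
    """
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]  # best[i][j] = optimum for seq[i..j]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            score = best[i][j - 1]                  # base j left unpaired
            for k in range(i, j - min_loop):        # base j paired with base k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1] if k + 1 <= j - 1 else 0
                    score = max(score, left + inner + 1)
            best[i][j] = score
    return best[0][n - 1]

print(max_pairs("AUAU"))                     # 2
print(max_pairs("GGGAAACCC", min_loop=3))    # 3
```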
Prediction of secondary structures for large RNA molecules
The prediction of correct secondary structures of large RNAs is one of the unsolved challenges of computational molecular biology. Among the major obstacles is the fact that accurate calculations scale as O(n⁓), so the computational requirements become prohibitive as the length increases. We present a new parallel, multicore and scalable program called GTfold, which is one to two orders of magnitude faster than the de facto standard programs mfold and RNAfold for folding large RNA viral sequences and achieves comparable prediction accuracy. We analyze the algorithm's concurrency and describe the parallelism for a shared-memory environment such as a symmetric multiprocessor or multicore chip. We are seeing a paradigm shift to multicore chips, and parallelism must be explicitly addressed to continue gaining performance with each new generation of systems.
We provide a rigorous proof of correctness of an optimized algorithm for internal loop calculations called the internal loop speedup algorithm (ILSA), which reduces the time complexity of internal loop computations from O(n⁓) to O(n³), and we show that exact algorithms such as ILSA can be executed with our method in an affordable amount of time. The proof gives insight into solving these kinds of combinatorial problems. We have documented detailed pseudocode of the algorithm for predicting minimum free energy secondary structures, which provides a basis for implementing future algorithmic improvements and an improved thermodynamic model in GTfold. GTfold is written in C/C++ and is freely available as open source from our website.
M.S. Committee Chair: Bader, David; Committee Co-Chair: Heitsch, Christine; Committee Member: Harvey, Stephen; Committee Member: Vuduc, Richar
Statistical mechanics of RNA folding: importance of alphabet size
We construct a minimalist model of RNA secondary-structure formation and use
it to study the mapping from sequence to structure. There are strong,
qualitative differences between two-letter and four or six-letter alphabets.
With only two kinds of bases, there are many alternate folding configurations,
yielding thermodynamically stable ground states only for a small set of
structures of high designability, i.e., structures with a large total number of
associated sequences. In contrast, sequences made from four bases, as found in
nature, or six bases have far fewer competing folding configurations, resulting
in a much greater average stability of the ground state.
Comment: 7 figures; uses revtex
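One way to see why a small alphabet admits many competing configurations is to count the secondary structures a sequence can form. The sketch below counts all non-crossing pairings allowed by a given pairing rule and ignores energetics entirely. The two-letter rule (A pairs only with B), the Watson-Crick-only four-letter rule, `min_loop=3` and the name `count_structures` are illustrative assumptions, not the paper's minimalist model.

```python
# Assumed pairing rules for illustration (not the paper's energy model):
TWO_LETTER = {("A", "B"), ("B", "A")}                           # A pairs only with B
FOUR_LETTER = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}  # Watson-Crick only

def count_structures(seq, pairs, min_loop=3):
    """Count the secondary structures (sets of non-crossing pairs) of `seq`.

    A pair (k, j) is allowed when (seq[k], seq[j]) is in `pairs` and it encloses
    at least `min_loop` bases. The empty structure is included in the count.
    """
    n = len(seq)
    if n == 0:
        return 1
    # count[i][j] = number of structures on seq[i..j]; a single base has just one
    count = [[1] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            total = count[i][j - 1]                  # base j stays unpaired
            for k in range(i, j - min_loop):         # base j pairs with base k
                if (seq[k], seq[j]) in pairs:
                    left = count[i][k - 1] if k > i else 1
                    inner = count[k + 1][j - 1] if k + 1 <= j - 1 else 1
                    total += left * inner
            count[i][j] = total
    return count[0][n - 1]

print(count_structures("ABBABAABBAAB", TWO_LETTER))
print(count_structures("ACGUUGCAAGCU", FOUR_LETTER))
```

On uniformly random sequences, a base is complementary to about half of the other positions under the two-letter rule but only about a quarter under the Watson-Crick rule. This is one intuition for the larger number of alternative structures. The printed counts let you check it on particular sequences.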
PATTERNA: transcriptome-wide search for functional RNA elements via structural data signatures.
Establishing a link between RNA structure and function remains a great challenge in RNA biology. The emergence of high-throughput structure profiling experiments is revolutionizing our ability to decipher structure, yet principled approaches for extracting information on structural elements directly from these data sets are lacking. We present PATTERNA, an unsupervised pattern recognition algorithm that rapidly mines RNA structure motifs from profiling data. We demonstrate that PATTERNA detects motifs with an accuracy comparable to commonly used thermodynamic models and highlight its utility in automating data-directed structure modeling from large data sets. PATTERNA is versatile and compatible with diverse profiling techniques and experimental conditions.
Translocation of structured polynucleotides through nanopores
We investigate theoretically the translocation of structured RNA/DNA
molecules through narrow pores which allow single but not double strands to
pass. The unzipping of basepaired regions within the molecules presents
significant kinetic barriers for the translocation process. We show that this
circumstance may be exploited to determine the full basepairing pattern of
polynucleotides, including RNA pseudoknots. The crucial requirement is that the
translocation dynamics (i.e., the length of the translocated molecular segment)
needs to be recorded as a function of time with a spatial resolution of a few
nucleotides. This could be achieved, for instance, by applying a mechanical
driving force for translocation and recording force-extension curves (FECs)
with a device such as an atomic force microscope or optical tweezers. Our
analysis suggests that with this added spatial resolution, nanopores could be
transformed into a powerful experimental tool to study the folding of nucleic
acids.
Comment: 9 pages, 5 figures
Linear-time Algorithms for RNA Folding: Partition Function, Stochastic Sampling and RNA-RNA Interaction
RNAs play important roles in the central dogma of molecular biology and are involved in multiple biological processes such as chromatin modification, transcriptional interference and translation initiation. The functions of RNAs, especially non-coding RNAs, are closely related to their secondary structures; therefore, computational methods for RNA structure prediction are of great interest. In this dissertation, we propose linear-time algorithms for the RNA folding partition function, stochastic sampling and RNA-RNA interaction, which can efficiently and accurately predict and analyze RNA secondary structure. Partition function-based methods compute folding ensembles and estimate structure and base-pair probabilities. However, the classical partition function algorithm scales cubically with sequence length and is therefore slow for long sequences. We design a linear-time heuristic algorithm, LinearPartition, to approximate the partition function and base-pairing probabilities, which is shown to be orders of magnitude faster than classical folding systems such as Vienna RNAfold and CONTRAfold. More interestingly, the resulting base-pairing probabilities are even better correlated with the ground-truth structures. On the other hand, the partition function and the estimated base-pairing probabilities provide compact representations of the exponentially large ensemble, but they cannot provide direct and intuitive descriptions and cannot be used directly for accessibility prediction. Stochastic sampling, which samples secondary structures according to their probabilities in the Boltzmann ensemble, is widely used, e.g., for accessibility prediction. However, current sampling algorithms are unnecessarily complicated, repeatedly perform redundant work, and scale cubically with the sequence length. These issues prevent them from being used for full-length viral genomes such as SARS-CoV-2.
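The classical cubic partition function can be sketched for a toy energy model in which every allowed base pair contributes a constant free energy. This is a McCaskill-style recursion over the Nussinov decomposition. It is not LinearPartition's linear-time algorithm and not a real nearest-neighbor energy model. `PAIR_ENERGY`, `RT`, the pairing set and the name `partition_function` are illustrative assumptions.

```python
import math

# Toy model assumptions: allowed pairs and a constant free energy per pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
PAIR_ENERGY = -1.0  # kcal/mol per base pair (illustrative, not a real parameter)
RT = 0.6163         # kcal/mol, roughly R * 310.15 K (37 degrees C)

def partition_function(seq, min_loop=3):
    """Return the table Z, where Z[i][j] is the partition function of seq[i:j].

    Cubic-time recursion: the last base of seq[i:j] is either unpaired or
    paired with some base k, which splits the interval into two parts.
    """
    n = len(seq)
    w = math.exp(-PAIR_ENERGY / RT)              # Boltzmann weight of one pair
    Z = [[1.0] * (n + 1) for _ in range(n + 1)]  # empty/one-base intervals: weight 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length                       # half-open interval [i, j)
            total = Z[i][j - 1]                  # base j-1 unpaired
            for k in range(i, j - 1 - min_loop): # base j-1 paired with base k
                if (seq[k], seq[j - 1]) in PAIRS:
                    total += Z[i][k] * w * Z[k + 1][j - 1]
            Z[i][j] = total
    return Z

seq = "GGGAAACCC"
Z = partition_function(seq)
print(Z[0][len(seq)])                    # partition function of the whole sequence
print(-RT * math.log(Z[0][len(seq)]))    # ensemble free energy in kcal/mol
```

The three nested loops are the cubic scaling the paragraph above refers to. Linear-time heuristics avoid it by pruning the states considered at each position.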
To alleviate these problems, we first propose a hypergraph framework under which the sampling algorithm can be greatly simplified, and then present a lazy-saving sampling strategy under this framework in which redundant work is eliminated. Finally, we propose LinearSampling, the first end-to-end linear-time stochastic sampling algorithm, which can be used to identify potential regions of SARS-CoV-2 for diagnostics and treatment. Many RNAs function through RNA-RNA interactions, so two-strand folding, which directly predicts structures while taking RNA-RNA interaction into account, is also highly desirable. Some existing tools, such as RNAhybrid and RNAplex, are not only less informative but also less accurate because they ignore the competition between intermolecular and intramolecular base pairs. Another group of tools, such as RNAup, focuses on predicting the binding region rather than the two-strand co-folding structure. Other tools, like RNAcofold, are too slow due to their cubic runtime complexity. To address these issues, we propose LinearCoFold and LinearCoPartition, which predict the two-strand folding structure, partition function and base-pairing probabilities in linear runtime and space. Our new co-folding algorithms are orders of magnitude faster than the baseline RNAcofold and achieve better PPV and sensitivity on the RNA-RNA interaction dataset.
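Stochastic sampling draws structures with probability proportional to their Boltzmann weight. It does so by tracing back through a filled partition function table and choosing each decomposition at random in proportion to its weight. The minimal sketch below assumes a toy model in which every allowed pair contributes a constant free energy. It uses the classical cubic fill followed by a randomized traceback, not LinearSampling's hypergraph framework or lazy-saving strategy. `PAIR_ENERGY`, `RT`, `fill_partition_function` and `sample_structure` are hypothetical names and values.

```python
import math
import random

# Toy model assumptions: allowed pairs and a constant free energy per pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
PAIR_ENERGY = -1.0  # kcal/mol per base pair (illustrative, not a real parameter)
RT = 0.6163         # kcal/mol, roughly R * 310.15 K

def fill_partition_function(seq, min_loop=3):
    """Z[i][j] = partition function of the half-open interval seq[i:j]."""
    n = len(seq)
    w = math.exp(-PAIR_ENERGY / RT)
    Z = [[1.0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            total = Z[i][j - 1]
            for k in range(i, j - 1 - min_loop):
                if (seq[k], seq[j - 1]) in PAIRS:
                    total += Z[i][k] * w * Z[k + 1][j - 1]
            Z[i][j] = total
    return Z

def sample_structure(seq, Z, rng, min_loop=3):
    """Draw one dot-bracket structure with probability proportional to its Boltzmann weight."""
    w = math.exp(-PAIR_ENERGY / RT)
    structure = ["."] * len(seq)
    stack = [(0, len(seq))]                    # half-open intervals still to resolve
    while stack:
        i, j = stack.pop()
        if j - i < 2:
            continue
        # Pick a decomposition of seq[i:j] with probability (its weight) / Z[i][j].
        r = rng.random() * Z[i][j]
        acc = Z[i][j - 1]                      # weight of "base j-1 unpaired"
        if r < acc:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - 1 - min_loop):   # weight of "base j-1 paired with k"
            if (seq[k], seq[j - 1]) in PAIRS:
                acc += Z[i][k] * w * Z[k + 1][j - 1]
                if r < acc:
                    structure[k], structure[j - 1] = "(", ")"
                    stack.append((i, k))
                    stack.append((k + 1, j - 1))
                    break
        else:
            # Only reached through floating-point round-off; treat j-1 as unpaired.
            stack.append((i, j - 1))
    return "".join(structure)

seq = "GGGAAACCC"
Z = fill_partition_function(seq)
rng = random.Random(42)                        # seeded for reproducibility
for _ in range(3):
    print(sample_structure(seq, Z, rng))
```

Every sample repeats the traceback from scratch, so drawing many samples repeats work. Avoiding that repetition is the motivation the paragraph above gives for its lazy-saving strategy.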
RNA secondary structure prediction from multi-aligned sequences
It is well accepted that the RNA secondary structures of most
functional non-coding RNAs (ncRNAs) are closely related to their functions and
are conserved during evolution. Hence, prediction of conserved secondary
structures from evolutionarily related sequences is an important task in RNA
bioinformatics; such methods are useful not only for further functional analyses
of ncRNAs but also for improving the accuracy of secondary structure predictions
and finding novel functional RNAs in the genome. In this review, I focus on
common secondary structure prediction from a given alignment of RNA sequences,
in which a single secondary structure whose length equals that of the input
alignment is predicted. I systematically review and classify existing tools and
algorithms for the problem, by utilizing the information employed in the tools
and by adopting a unified viewpoint based on maximum expected gain (MEG)
estimators. I believe that this classification will allow a deeper
understanding of each tool and provide users with useful information for
selecting tools for common secondary structure predictions.
Comment: A preprint of an invited review manuscript that will be published in
a chapter of the book 'Methods in Molecular Biology'. Note that this version
of the manuscript may differ from the published version.
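A γ-centroid-style decoder illustrates one maximum expected gain (MEG) estimator. Given a base-pair probability matrix `p`, it finds the non-crossing structure that maximizes the sum of (γ + 1)·p[i][j] − 1 over its pairs, using a Nussinov-style DP. For common structure prediction, `p` is assumed to be precomputed over the alignment columns, for instance by averaging per-sequence probabilities. How each reviewed tool actually builds this matrix differs and is not shown. The names `meg_decode`, `gamma` and `min_loop` are hypothetical.

```python
def meg_decode(p, gamma=1.0, min_loop=3):
    """Return a dot-bracket structure maximizing sum over its pairs of (gamma + 1) * p[i][j] - 1.

    p: n x n list of base-pair probabilities (only p[i][j] with i < j is read).
    Pairs with non-positive gain never improve the objective, so they are skipped.
    """
    n = len(p)
    if n == 0:
        return ""

    def gain(i, j):
        return (gamma + 1.0) * p[i][j] - 1.0

    best = [[0.0] * n for _ in range(n)]      # best[i][j]: optimal gain on columns i..j
    choice = [[None] * n for _ in range(n)]   # partner k of column j, or None if unpaired
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            best[i][j], choice[i][j] = best[i][j - 1], None
            for k in range(i, j - min_loop):
                g = gain(k, j)
                if g <= 0:
                    continue
                left = best[i][k - 1] if k > i else 0.0
                inner = best[k + 1][j - 1] if k + 1 <= j - 1 else 0.0
                if left + inner + g > best[i][j]:
                    best[i][j], choice[i][j] = left + inner + g, k
    # Traceback: rebuild the optimal structure from the recorded choices.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        k = choice[i][j]
        if k is None:
            stack.append((i, j - 1))
        else:
            structure[k], structure[j] = "(", ")"
            stack.append((i, k - 1))
            stack.append((k + 1, j - 1))
    return "".join(structure)

p = [[0.0] * 9 for _ in range(9)]
p[0][8], p[1][7], p[2][6] = 0.9, 0.8, 0.3
print(meg_decode(p, gamma=1.0))  # ((.....))
print(meg_decode(p, gamma=4.0))  # (((...)))
```

Raising γ lowers the probability threshold 1/(γ + 1) that a pair must exceed to have positive gain. This trades precision for sensitivity, as the example shows when the 0.3-probability pair appears at γ = 4.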
Ab initio RNA folding
RNA molecules are essential cellular machines performing a wide variety of
functions for which a specific three-dimensional structure is required. Over
the last several years, experimental determination of RNA structures through
X-ray crystallography and NMR seems to have reached a plateau in the number of
structures resolved each year, but as more and more RNA sequences are being
discovered, the need for structure prediction tools to complement experimental data
is strong. Theoretical approaches to RNA folding have been developed since the
late seventies, when the first algorithms for secondary structure prediction
appeared. Over the last 10 years a number of prediction methods for 3D
structures have been developed, first based on bioinformatics and data-mining,
and more recently based on a coarse-grained physical representation of the
systems. In this review we present the challenges of RNA structure
prediction and the main ideas behind bioinformatic approaches and physics-based
approaches. We will focus on the description of the more recent physics-based
phenomenological models and on how they are built to include the specificity of
the interactions of RNA bases, whose role is critical in folding. Through
examples from different models, we will point out the strengths of
physics-based approaches, which are able not only to predict equilibrium
structures but also to investigate dynamical and thermodynamical behavior, as
well as the open challenges in including further key interactions that govern
RNA folding.
Comment: 28 pages, 18 figures