6 research outputs found
If the Current Clique Algorithms are Optimal, so is Valiant's Parser
The CFG recognition problem is: given a context-free grammar
and a string of length , decide if can be obtained from
. This is the most basic parsing question and is a core computer
science problem. Valiant's parser from 1975 solves the problem in
time, where is the matrix multiplication
exponent. Dozens of parsing algorithms have been proposed over the years, yet
Valiant's upper bound remains unbeaten. The best combinatorial algorithms have
mildly subcubic complexity.
Lee (JACM'01) provided evidence that fast matrix multiplication is needed for
CFG parsing, and that very efficient and practical algorithms might be hard or
even impossible to obtain. Lee showed that any algorithm for a more general
parsing problem with running time can
be converted into a surprising subcubic algorithm for Boolean Matrix
Multiplication. Unfortunately, Lee's hardness result required that the grammar
size be . Nothing was known for the more relevant
case of constant size grammars.
In this work, we prove that any improvement on Valiant's algorithm, even for
constant size grammars, either in terms of runtime or by avoiding the
inefficiencies of fast matrix multiplication, would imply a breakthrough
algorithm for the -Clique problem: given a graph on nodes, decide if
there are that form a clique.
Besides classifying the complexity of a fundamental problem, our reduction
has led us to similar lower bounds for more modern and well-studied cubic time
problems for which faster algorithms are highly desirable in practice: RNA
Folding, a central problem in computational biology, and Dyck Language Edit
Distance, answering an open question of Saha (FOCS'14)
GPU accelerated RNA-RNA interaction program, A
2021 Spring.Includes bibliographical references.RNA-RNA interaction (RRI) is important in processes like gene regulation, and is known to play roles in diseases including cancer and Alzheimer's. Large RRI computations run for days, weeks or even months, because the algorithms have time and space complexity of, respectively, O(N3M3) and O(N2M2), for sequences length N and M, and there is a need for high-throughput RRI tools. GPU parallelization of such algorithms is a challenge. We first show that the most computationally expensive part of base pair maximization (BPM) algorithms comprises O(N3) instances of upper banded tropical matrix products. We develop the first GPU library for this attaining close to theoretical machine peak (TMP). We next optimize other (fifth degree polynomial) terms in the computation and develop the first GPU implementation of the complete BPMax algorithm. We attain 12% of GPU TMP, a significant speedup over the original parallel CPU implementation, which attains less than 1% of CPU TMP. We also perform a large scale study of three small viral RNAs, hypothesized to be relevant to COVID-19
Faster Algorithms for RNA-folding using the Four-Russians method
The secondary structure that maximizes the number of non-crossing matchings between complimentary bases of an RNA sequence of length n can be computed in O(n 3) time using dynamic programming. Four-Russians is a technique that will reduce the running time for certain dynamic programming algorithms by a factor after a preprocessing step where solutions to all smaller subproblems of a fixed size are exhaustively enumerated. Frid and Gusfield designed an O ( n3) algorithm for RNA folding using the Four-Russians technique. However, in their log n algorithm the preprocessing is interleaved with the algorithm computation. We simplify the algorithm and the analysis by doing the preprocessing once prior to the algorithm computation. We call this the two-vector method. We also show variants where instead of exhaustive preprocessing, we only solve the subproblems encountered in the main algorithm once and memoize the results. We give a proof of correctness and explore the practical advantages over the earlier method. The Nussinov algorithm admits an O(n 2) parallel algorithm. We show an parallel algorithm using the two-vector idea that improves the time bound to O(n 2 / log n). We have implemented the parallel algorithm on Graphical processing units using CUDA platform. We discuss the organization of the data structures to exploit coalesced memory access for fast running time. These ideas also help in improving the running time of the serial algorithms. For sequences of up to 6000 bases the parallel algorithm takes only about 2 secs, the two-vector and memoized versions are faster than the Frid-Gusfield algorithm by a factor of 3, and faster than Nussinov by a factor of 20.