Rank, select and access in grammar-compressed strings
Given a string $S$ over a fixed alphabet, a
grammar compressor produces a context-free grammar that
generates $S$ and only $S$. In this paper we describe data structures to
support the following operations on a grammar-compressed string:
$\mathrm{rank}_c(S,i)$ (return the number of occurrences of symbol $c$ before
position $i$ in $S$); $\mathrm{select}_c(S,i)$ (return the position of the $i$th
occurrence of $c$ in $S$); and $\mathrm{access}(S,i,j)$ (return the substring
$S[i..j]$). For rank and select we describe data structures of size
bits that support the two operations in time. We
propose another structure that uses
bits and that supports the two queries in , where
is an arbitrary constant. To our knowledge, we are the first to
study the asymptotic complexity of rank and select in the grammar-compressed
setting, and we provide a hardness result showing that significantly improving
the bounds we achieve would imply a major breakthrough on a hard
graph-theoretical problem. Our main result for access is a method that requires
bits of space and time to extract
consecutive symbols from $S$. Alternatively, we can achieve query time using bits of space. This matches a lower bound stated by Verbin
and Yu for strings whose length is polynomially related to the grammar size.
Comment: 16 pages
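To make the three operations concrete, here is a minimal Python sketch on a toy straight-line grammar. It is an illustration only and not the data structures of the paper: the class name `ToySLP`, the rule layout, and the 0-based indexing are all assumptions made for this example. Each query simply walks down from the start symbol, so its cost is proportional to the grammar height, and it stores one counter per rule.

```python
from collections import Counter

class ToySLP:
    """Toy straight-line program (hypothetical layout): each rule is either a
    terminal character (str) or a pair (left, right) of rule names."""

    def __init__(self, rules, start):
        self.rules, self.start = rules, start
        self._len, self._cnt = {}, {}

    def length(self, x):
        # Length of the expansion of rule x (memoized).
        if x not in self._len:
            r = self.rules[x]
            self._len[x] = 1 if isinstance(r, str) else self.length(r[0]) + self.length(r[1])
        return self._len[x]

    def counts(self, x):
        # Per-symbol occurrence counts in the expansion of rule x (memoized).
        if x not in self._cnt:
            r = self.rules[x]
            self._cnt[x] = Counter(r) if isinstance(r, str) else self.counts(r[0]) + self.counts(r[1])
        return self._cnt[x]

    def char_at(self, i):
        # Symbol at 0-based position i, found by descending from the start rule.
        if not 0 <= i < self.length(self.start):
            raise IndexError(i)
        x = self.start
        while not isinstance(self.rules[x], str):
            left, right = self.rules[x]
            if i < self.length(left):
                x = left
            else:
                i -= self.length(left)
                x = right
        return self.rules[x]

    def access(self, i, j):
        # Substring S[i..j], both ends inclusive (naive: one descent per symbol).
        return "".join(self.char_at(k) for k in range(i, j + 1))

    def rank(self, c, i):
        # Number of occurrences of c strictly before position i.
        x, total = self.start, 0
        while i > 0:
            r = self.rules[x]
            if isinstance(r, str):
                total += (r == c)  # i is 1 here: the single terminal is included
                break
            left, right = r
            if i < self.length(left):
                x = left
            else:
                total += self.counts(left)[c]
                i -= self.length(left)
                x = right
        return total

    def select(self, c, k):
        # 0-based position of the k-th occurrence of c (k counts from 1).
        if not 1 <= k <= self.counts(self.start)[c]:
            raise ValueError("no such occurrence")
        x, pos = self.start, 0
        while not isinstance(self.rules[x], str):
            left, right = self.rules[x]
            in_left = self.counts(left)[c]
            if k <= in_left:
                x = left
            else:
                k -= in_left
                pos += self.length(left)
                x = right
        return pos

# Example grammar whose start rule X6 expands to "abaababa".
rules = {"X1": "b", "X2": "a", "X3": ("X2", "X1"), "X4": ("X3", "X2"),
         "X5": ("X4", "X3"), "X6": ("X5", "X4")}
slp = ToySLP(rules, "X6")
```

With this grammar, `slp.access(0, 7)` reproduces the whole string, and `slp.rank("a", i)` agrees with counting `"a"` in the first `i` symbols of the expanded text.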
Multivariate Fine-Grained Complexity of Longest Common Subsequence
We revisit the classic combinatorial pattern matching problem of finding a
longest common subsequence (LCS). For two strings of length $n$, a
textbook algorithm solves LCS in time $O(n^2)$, but although much effort has
been spent, no $O(n^{2-\varepsilon})$-time algorithm is known. Recent work
indeed shows that such an algorithm would refute the Strong Exponential Time
Hypothesis (SETH) [Abboud, Backurs, Vassilevska Williams + Bringmann,
Künnemann FOCS'15].
Despite the quadratic-time barrier, for over 40 years an enduring scientific
interest continued to produce fast algorithms for LCS and its variations.
Particular attention was put into identifying and exploiting input parameters
that yield strongly subquadratic time algorithms for special cases of interest,
e.g., differential file comparison. This line of research was successfully
pursued until 1990, at which time significant improvements came to a halt. In
this paper, using the lens of fine-grained complexity, our goal is to (1)
justify the lack of further improvements and (2) determine whether some special
cases of LCS admit faster algorithms than currently known.
To this end, we provide a systematic study of the multivariate complexity of
LCS, taking into account all parameters previously discussed in the literature:
the input size, the length of the shorter string,
the length of an LCS of the two strings, the numbers of
deletions, the alphabet size, as well as
the numbers of matching pairs and dominant pairs. For any class of
instances defined by fixing each parameter individually to a polynomial in
terms of the input size, we prove a SETH-based lower bound matching one of
three known algorithms. Specifically, we determine the optimal running time for
LCS under SETH as a function of these parameters.
[...] Comment: Presented at SODA'18. Full Version. 66 pages
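For reference, the textbook quadratic-time algorithm mentioned in this abstract can be sketched as the standard dynamic program below. This is a generic illustration, not one of the three parameterized algorithms the paper refers to; the function names are chosen for this example. The helper `matching_pairs` assumes the usual definition of a matching pair as an index pair (i, j) with `x[i] == y[j]`.

```python
from collections import Counter

def lcs_length(x: str, y: str) -> int:
    """Length of a longest common subsequence of x and y.

    Standard dynamic program: time O(len(x) * len(y)), keeping one row at a time.
    """
    prev = [0] * (len(y) + 1)  # row for the empty prefix of x
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                cur.append(prev[j - 1] + 1)             # extend a common subsequence
            else:
                cur.append(max(prev[j], cur[j - 1]))    # skip a symbol of x or of y
        prev = cur
    return prev[-1]

def matching_pairs(x: str, y: str) -> int:
    """Number of index pairs (i, j) with x[i] == y[j] (assumed definition)."""
    cx, cy = Counter(x), Counter(y)
    return sum(cx[c] * cy[c] for c in cx)
```

For instance, `lcs_length("ABCBDAB", "BDCABA")` evaluates to 4, and the same pair of strings has 12 matching pairs.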
Grammar Boosting: A New Technique for Proving Lower Bounds for Computation over Compressed Data
Grammar compression is a general compression framework in which a string
is represented as a context-free grammar whose
language contains only that string. In this paper, we focus on studying the limitations
of algorithms and data structures operating on strings in grammar-compressed
form. Previous work focused on proving lower bounds for grammars constructed
using algorithms that achieve the approximation ratio
. Unfortunately, for the majority of
grammar compressors, is either unknown or satisfies
. In their seminal paper, Charikar et al. [IEEE
Trans. Inf. Theory 2005] studied seven popular grammar compression algorithms:
RePair, Greedy, LongestMatch, Sequential, Bisection, LZ78, and
$\alpha$-Balanced. Only one of them ($\alpha$-Balanced) is known to achieve
.
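As a small illustration of how one of the listed compressors yields a grammar-like representation, the following Python sketch computes an LZ78 parse. It is a simplified sketch under stated assumptions, not the construction analyzed in the paper: the function names are hypothetical, phrase index 0 stands for the empty phrase, and a final phrase that ends inside an already known phrase is stored with an empty extension character. Each pair (previous phrase, character) can be read as a rule that expands to an earlier phrase followed by one symbol.

```python
def lz78_phrases(text):
    """Return the LZ78 parse of text as (previous phrase index, next char) pairs.

    Phrase indices start at 1; index 0 denotes the empty phrase.
    """
    trie = {}      # (phrase index, char) -> index of the phrase extended by char
    phrases = []
    node = 0       # index of the phrase matched so far
    for ch in text:
        if (node, ch) in trie:
            node = trie[(node, ch)]      # keep extending the current match
        else:
            phrases.append((node, ch))   # new phrase = matched phrase + ch
            trie[(node, ch)] = len(phrases)
            node = 0
    if node:
        phrases.append((node, ""))       # text ended inside a known phrase
    return phrases

def lz78_expand(phrases):
    """Rebuild the text by expanding each phrase from the earlier one it refers to."""
    out = [""]
    for prev, ch in phrases:
        out.append(out[prev] + ch)
    return "".join(out[1:])
```

For the input `"ababab"` the parse consists of the four phrases `a`, `b`, `ab`, `ab`, and `lz78_expand` reconstructs the original text from the pairs.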
We develop the first technique for proving lower bounds for data structures
and algorithms on grammars that is fully general and does not depend on the
approximation ratio of the grammar compressor used. Using this
technique, we first prove that time is required
for random access on RePair, Greedy, LongestMatch, Sequential, and Bisection,
while time is required for random access to LZ78. All
these lower bounds hold within space and
match the existing upper bounds. We also generalize this technique to prove
several conditional lower bounds for compressed computation. For example, we
prove that unless the Combinatorial $k$-Clique Conjecture fails, there is no
combinatorial algorithm for CFG parsing on Bisection (for which it holds
) that runs in time for all constants and . Previously,
this was known only for