77 research outputs found
String attractors : Verification and optimization
String attractors [STOC 2018] are combinatorial objects recently introduced to unify all known dictionary compression techniques in a single theory. A set γ ⊆ [1.n] is a k-attractor for a string S ∈ Σn if and only if every distinct substring of S of length at most k has an occurrence crossing at least one of the positions in γ. Finding the smallest k-attractor is NP-hard for k ≥ 3, but polylogarithmic approximations can be found using reductions from dictionary compressors. It is easy to reduce the k-attractor problem to a set-cover instance where the string's positions are interpreted as sets of substrings. The main result of this paper is a much more powerful reduction based on the truncated suffix tree. Our new characterization of the problem leads to more efficient algorithms for string attractors: we show how to check the validity and minimality of a k-attractor in near-optimal time and how to quickly compute exact solutions. For example, we prove that a minimum 3-attractor can be found in O(n) time when |Σ| ∈ O(3+ϵ√log n) for some constant ϵ > 0, despite the problem being NP-hard for large Σ. © Dominik Kempa, Alberto Policriti, Nicola Prezza, and Eva Rotenberg.Peer reviewe
Compressibility-Aware Quantum Algorithms on Strings
Sublinear time quantum algorithms have been established for many fundamental
problems on strings. This work demonstrates that new, faster quantum algorithms
can be designed when the string is highly compressible. We focus on two popular
and theoretically significant compression algorithms -- the Lempel-Ziv77
algorithm (LZ77) and the Run-length-encoded Burrows-Wheeler Transform (RL-BWT),
and obtain the results below.
We first provide a quantum algorithm running in time
for finding the LZ77 factorization of an input string with
factors. Combined with multiple existing results, this yields an
time quantum algorithm for finding the RL-BWT encoding
with BWT runs. Note that . We complement these
results with lower bounds proving that our algorithms are optimal (up to
polylog factors).
Next, we study the problem of compressed indexing, where we provide a
time quantum algorithm for constructing a recently
designed space structure with equivalent capabilities as the
suffix tree. This data structure is then applied to numerous problems to obtain
sublinear time quantum algorithms when the input is highly compressible. For
example, we show that the longest common substring of two strings of total
length can be computed in time, where is the
number of factors in the LZ77 factorization of their concatenation. This beats
the best known time quantum algorithm when is
sufficiently small
Faster Approximate String Matching for Short Patterns
We study the classical approximate string matching problem, that is, given
strings and and an error threshold , find all ending positions of
substrings of whose edit distance to is at most . Let and
have lengths and , respectively. On a standard unit-cost word RAM with
word size we present an algorithm using time When is
short, namely, or this
improves the previously best known time bounds for the problem. The result is
achieved using a novel implementation of the Landau-Vishkin algorithm based on
tabulation and word-level parallelism.Comment: To appear in Theory of Computing System
Text Indexing and Searching in Sublinear Time
We introduce the first index that can be built in o(n) time for a text of length n, and can also be queried in o(q) time for a pattern of length q. On an alphabet of size ?, our index uses O(n log ?) bits, is built in O(n log ? / ?{log n}) deterministic time, and computes the number of occurrences of the pattern in time O(q/log_? n + log n log_? n). Each such occurrence can then be found in O(log n) time. Other trade-offs between the space usage and the cost of reporting occurrences are also possible
Internal shortest absent word queries
Given a string T of length n over an alphabet Σ ⊂ {1, 2, . . . , nO(1)} of size σ, we are to preprocess T so that given a range [i, j], we can return a representation of a shortest string over Σ that is absent in the fragment T[i] · · · T[j] of T. For any positive integer k ∈ [1, log logσ n], we present an O((n/k) · log logσ n)-size data structure, which can be constructed in O(n logσ n) time, and answers queries in time O(log logσ k)
- …