Recompression: a simple and powerful technique for word equations
In this paper we present an application of a simple technique of local
recompression, previously developed by the author in the context of compressed
membership problems and compressed pattern matching, to word equations. The
technique is based on local modification of variables (replacing X by aX or Xa)
and iterative replacement of pairs of letters appearing in the equation by a
`fresh' letter, which can be seen as a bottom-up compression of the solution of
the given word equation, to be more specific, building an SLP (Straight-Line
Programme) for the solution of the word equation.
Using this technique we give new, independent and self-contained proofs of
most of the known results for word equations. To be more specific, the
presented (nondeterministic) algorithm runs in O(n log n) space and in time
polynomial in log N, where N is the size of the length-minimal solution of the
word equation. The presented algorithm can be easily generalised to a generator
of all solutions of the given word equation (without increasing the space
usage). Furthermore, a further analysis of the algorithm yields a doubly
exponential upper bound on the size of the length-minimal solution. The
presented algorithm does not use the exponential bound on the exponent of
periodicity. Conversely, the analysis of the algorithm yields an independent
proof of the exponential bound on the exponent of periodicity.
We believe that the presented algorithm, its idea and analysis are far
simpler than all those previously applied. Furthermore, thanks to it we can obtain a
unified and simple approach to most of the known results for word equations.
As a small additional result we show that for O(1) variables (with arbitrarily
many appearances in the equation) word equations can be solved in linear space,
i.e. they are context-sensitive.
Comment: Submitted to a journal. Since the previous version the proofs were
simplified and the overall presentation improved.
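The pair-replacement idea in the abstract above can be illustrated on a plain word. The following is a minimal sketch, not the paper's algorithm: it works on a single known string rather than on an equation with variables, and it picks the most frequent pair greedily (a choice closer to Re-Pair-style grammar compression). The names `replace_pair`, `build_slp` and `expand`, and the fresh-letter naming scheme `N0, N1, ...`, are hypothetical.

```python
from collections import Counter


def replace_pair(word, pair, fresh):
    """Replace non-overlapping occurrences of `pair`, left to right, by `fresh`."""
    out, i = [], 0
    while i < len(word):
        if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
            out.append(fresh)
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out


def build_slp(text):
    """Compress a non-empty string bottom-up into (root symbol, rules).

    Each rule maps a fresh letter to the pair of letters it replaced,
    so the rules form a straight-line programme for `text`.
    Assumes the input letters are single characters, so the
    multi-character fresh names cannot collide with them.
    """
    if not text:
        raise ValueError("text must be non-empty")
    word, rules, next_id = list(text), {}, 0
    while len(word) > 1:
        counts = Counter(zip(word, word[1:]))
        pair = max(counts, key=counts.get)  # greedy choice (assumption)
        fresh = f"N{next_id}"
        next_id += 1
        rules[fresh] = pair
        word = replace_pair(word, pair, fresh)
    return word[0], rules


def expand(symbol, rules):
    """Decompress a symbol back into the string it derives."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)
```

Calling `build_slp("abababab")` and then `expand` on the returned root reproduces the input, which is the sense in which the rules form an SLP for the word.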
Pattern matching in Lempel-Ziv compressed strings: fast, simple, and deterministic
Countless variants of the Lempel-Ziv compression are widely used in many
real-life applications. This paper is concerned with a natural modification of
the classical pattern matching problem inspired by the popularity of such
compression methods: given an uncompressed pattern s[1..m] and a Lempel-Ziv
representation of a string t[1..N], does s occur in t? Farach and Thorup gave a
randomized O(n log^2(N/n) + m) time solution for this problem, where n is the size
of the compressed representation of t. We improve their result by developing a
faster and fully deterministic O(n log(N/n) + m) time algorithm with the same
space complexity. Note that for highly compressible texts, log(N/n) might be of
order n, so for such inputs the improvement is very significant. A (tiny)
fragment of our method can be used to give an asymptotically optimal solution
for the substring hashing problem considered by Farach and Muthukrishnan.
Comment: submitted
One-variable word equations in linear time
In this paper we consider word equations with one variable (and arbitrarily
many appearances of it). A recent technique of recompression, which is
applicable to general word equations, is shown to be suitable also in this
case. While in the general case it is non-deterministic, it determinises in the case of
one variable and the obtained running time is O(n + #_X log n), where #_X is
the number of appearances of the variable in the equation. This matches the
previously-best algorithm due to Dąbrowski and Plandowski. Then, using a
couple of heuristics as well as a more detailed time analysis, the running time is
lowered to O(n) in the RAM model. Unfortunately no new properties of solutions are
shown.
Comment: submitted to a journal, general overhaul over the previous version
Repetition Detection in a Dynamic String
A string UU for a non-empty string U is called a square. Squares have been well-studied both from a combinatorial and an algorithmic perspective. In this paper, we are the first to consider the problem of maintaining a representation of the squares in a dynamic string S of length at most n. We present an algorithm that updates this representation in n^{o(1)} time. This representation allows us to report a longest square-substring of S in O(1) time and all square-substrings of S in O(output) time. We achieve this by introducing a novel tool: maintaining prefix-suffix matches of two dynamic strings.
We extend the above result to address the problem of maintaining a representation of all runs (maximal repetitions) of the string. Runs are known to capture the periodic structure of a string, and, as an application, we show that our representation of runs allows us to efficiently answer periodicity queries for substrings of a dynamic string. These queries have proven useful in static pattern matching problems and our techniques have the potential of offering solutions to these problems in a dynamic text setting.
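To make the definition of a square concrete, here is a brute-force sketch for a static string. It is only an illustration of what "longest square-substring" means. It does not maintain anything under updates and has nothing like the update or query times stated in the abstract. The function names are hypothetical.

```python
def is_square(s):
    """True if s = UU for some non-empty string U."""
    half, rem = divmod(len(s), 2)
    return rem == 0 and half > 0 and s[:half] == s[half:]


def longest_square(s):
    """Return a longest square substring of s (leftmost one), or None.

    Tries even lengths from longest to shortest, so the first hit is a
    longest square. Brute force, roughly cubic time in the worst case.
    """
    n = len(s)
    for length in range(n - n % 2, 0, -2):
        for start in range(n - length + 1):
            candidate = s[start:start + length]
            if is_square(candidate):
                return candidate
    return None
```

For example, `longest_square("abaabaab")` finds `"abaaba"`, the square of `U = "aba"`, while a square-free input such as `"abc"` yields `None`.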
Synchronization Strings: Codes for Insertions and Deletions Approaching the Singleton Bound
We introduce synchronization strings as a novel way of efficiently dealing
with synchronization errors, i.e., insertions and deletions. Synchronization
errors are strictly more general and much harder to deal with than commonly
considered half-errors, i.e., symbol corruptions and erasures. For every
, synchronization strings allow to index a sequence with an
size alphabet such that one can efficiently transform
synchronization errors into half-errors. This powerful new
technique has many applications. In this paper, we focus on designing insdel
codes, i.e., error correcting block codes (ECCs) for insertion deletion
channels.
While ECCs for both half-errors and synchronization errors have been
intensely studied, the latter has largely resisted progress. Indeed, it took
until 1999 for the first insdel codes with constant rate, constant distance,
and constant alphabet size to be constructed by Schulman and Zuckerman. Insdel
codes for asymptotically large or small noise rates were given in 2016 by
Guruswami et al. but these codes are still polynomially far from the optimal
rate-distance tradeoff. This makes the understanding of insdel codes up to this
work equivalent to what was known for regular ECCs after Forney introduced
concatenated codes in his doctoral thesis 50 years ago.
A direct application of our synchronization strings based indexing method
gives a simple black-box construction which transforms any ECC into an equally
efficient insdel code with a slightly larger alphabet size. This instantly
transfers much of the highly developed understanding for regular ECCs over
large constant alphabets into the realm of insdel codes. Most notably, we
obtain efficient insdel codes which get arbitrarily close to the optimal
rate-distance tradeoff given by the Singleton bound for the complete noise
spectrum.
Efficient Computation of Sequence Mappability
Sequence mappability is an important task in genome re-sequencing. In the
-mappability problem, for a given sequence of length , our goal
is to compute a table whose th entry is the number of indices such
that length- substrings of starting at positions and have at
most mismatches. Previous works on this problem focused on heuristic
approaches to compute a rough approximation of the result or on the case of
. We present several efficient algorithms for the general case of the
problem. Our main result is an algorithm that works in time and space for
. It requires a careful adaptation of the technique of Cole
et al. [STOC 2004] to avoid multiple counting of pairs of substrings. We also
show -time algorithms to compute all results for a fixed
and all or a fixed and all . Finally we show
that the -mappability problem cannot be solved in strongly subquadratic
time for unless the Strong Exponential Time Hypothesis
fails.
Comment: Accepted to SPIRE 201
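The mappability table described in the abstract above can be computed directly from its definition by comparing every pair of substrings. The sketch below is a naive quadratic-pair baseline, not one of the paper's algorithms. The parameter names `m` (substring length) and `k` (mismatch budget), and the choice to exclude the index itself from the count, are assumptions made for this sketch.

```python
def hamming(u, v):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(u, v))


def mappability(text, m, k):
    """Naive mappability table.

    Entry i counts the start positions j != i (assumption) such that the
    length-m substrings of `text` starting at i and j differ in at most
    k positions.
    """
    starts = range(len(text) - m + 1)
    table = []
    for i in starts:
        count = sum(
            1
            for j in starts
            if j != i and hamming(text[i:i + m], text[j:j + m]) <= k
        )
        table.append(count)
    return table
```

On `"abab"` with `m = 2` and `k = 0` the table is `[1, 0, 1]`: the two occurrences of `"ab"` match each other exactly, and `"ba"` has no other exact match.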
Covering Problems for Partial Words and for Indeterminate Strings
We consider the problem of computing a shortest solid cover of an
indeterminate string. An indeterminate string may contain non-solid symbols,
each of which specifies a subset of the alphabet that could be present at the
corresponding position. We also consider covering partial words, which are a
special case of indeterminate strings where each non-solid symbol is a don't
care symbol. We prove that the indeterminate string covering problem and the partial
word covering problem are NP-complete for a binary alphabet and show that both
problems are fixed-parameter tractable with respect to , the number of
non-solid symbols. For the indeterminate string covering problem we obtain a
-time algorithm. For the partial word covering
problem we obtain a -time algorithm. We
prove that, unless the Exponential Time Hypothesis is false, no
-time solution exists for either problem, which shows
that our algorithm for this case is close to optimal. We also present an
algorithm for both problems which is feasible in practice.
Comment: full version (simplified and corrected); preliminary version appeared
at ISAAC 2014; 14 pages, 4 figures
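As an illustration of the partial word covering problem, the sketch below checks whether a solid string covers a partial word and then searches for a shortest solid cover by exhaustive enumeration. It is a brute-force sketch that takes exponential time, not the fixed-parameter algorithm from the abstract. It assumes that `?` denotes the don't care symbol and that an occurrence only has to match position by position, with no consistency requirement between overlapping occurrences. All names are hypothetical.

```python
from itertools import product

WILDCARD = "?"  # assumed notation for the don't care symbol


def occurs_at(cover, word, pos):
    """True if the solid string `cover` matches `word` at `pos`."""
    window = word[pos:pos + len(cover)]
    return all(w == WILDCARD or w == c for c, w in zip(cover, window))


def is_cover(cover, word):
    """True if every position of `word` lies inside an occurrence of `cover`."""
    m, n = len(cover), len(word)
    if m == 0 or m > n:
        return False
    covered_up_to = 0  # positions [0, covered_up_to) are covered so far
    for pos in range(n - m + 1):
        if pos > covered_up_to:
            return False  # position covered_up_to can no longer be covered
        if occurs_at(cover, word, pos):
            covered_up_to = pos + m
    return covered_up_to == n


def shortest_solid_cover(word, alphabet):
    """Try all solid candidates by increasing length (exponential time)."""
    for m in range(1, len(word) + 1):
        for candidate in product(alphabet, repeat=m):
            cover = "".join(candidate)
            if is_cover(cover, word):
                return cover
    return None
```

For the partial word `"ab?ba"` over the alphabet `"ab"`, the search returns `"aba"`: it occurs at position 0 and, using the don't care symbol, at position 2.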