Two strings at Hamming distance 1 cannot be both quasiperiodic
We present a generalization of a known fact from combinatorics on words
related to periodicity into quasiperiodicity. A string is called periodic if it
has a period which is at most half of its length. A string S is called
quasiperiodic if it has a non-trivial cover, that is, there exists a string C
that is shorter than S and such that every position in S is inside one of
the occurrences of C in S. It is a folklore fact that two strings that
differ at exactly one position cannot be both periodic. Here we prove a more
general fact that two strings that differ at exactly one position cannot be
both quasiperiodic. Along the way we obtain new insights into combinatorics of
quasiperiodicities.
Comment: 6 pages, 3 figures
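The two definitions in the abstract above can be checked by brute force on short strings. The following is a minimal sketch, not the paper's proof technique; the function names `is_periodic`, `is_quasiperiodic`, and `counterexamples` are illustrative assumptions. It tests the definitions directly and searches exhaustively for a pair of quasiperiodic strings at Hamming distance 1.

```python
from itertools import product

def is_periodic(s):
    """True if s has a period p with 1 <= p <= len(s) / 2."""
    n = len(s)
    return any(all(s[i] == s[i + p] for i in range(n - p))
               for p in range(1, n // 2 + 1))

def is_quasiperiodic(s):
    """True if some string shorter than s covers every position of s."""
    n = len(s)
    for k in range(1, n):
        c = s[:k]              # any cover must be a prefix of s
        reach = 0              # positions [0, reach) are covered so far
        for i in range(n - k + 1):
            if i > reach:      # position `reach` can no longer be covered
                break
            if s.startswith(c, i):
                reach = i + k
        if reach == n:
            return True
    return False

def counterexamples(max_len, alphabet="ab"):
    """Pairs of quasiperiodic strings that differ at exactly one position."""
    found = []
    for n in range(1, max_len + 1):
        for letters in product(alphabet, repeat=n):
            s = "".join(letters)
            if not is_quasiperiodic(s):
                continue
            for i in range(n):
                for a in alphabet:
                    if a != s[i]:
                        u = s[:i] + a + s[i + 1:]
                        if is_quasiperiodic(u):
                            found.append((s, u))
    return found
```

For instance, "abaababa" is covered by "aba" but has no period of length at most 4, so it is quasiperiodic without being periodic. By the stated result, `counterexamples` should return an empty list for any length bound.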
Internal Pattern Matching Queries in a Text and Applications
We consider several types of internal queries: questions about subwords of a
text. As the main tool we develop an optimal data structure for the problem
called here internal pattern matching. This data structure provides
constant-time answers to queries about occurrences of one subword x in
another subword y of a given text, assuming that |y| = O(|x|),
which allows for a constant-space representation of all occurrences. This
problem can be viewed as a natural extension of the well-studied pattern
matching problem. The data structure has linear size and admits a linear-time
construction algorithm.
Using the solution to the internal pattern matching problem, we obtain very
efficient data structures answering queries about: primitivity of subwords,
periods of subwords, general substring compression, and cyclic equivalence of
two subwords. All these results improve upon the best previously known
counterparts. The linear construction time of our data structure also allows us
to improve the algorithm for finding {\delta}-subrepetitions in a text (a more
general version of maximal repetitions, also called runs). For any fixed
{\delta} we obtain the first linear-time algorithm, which matches the linear
time complexity of the algorithm computing runs. Our data structure has already
been used as a part of the efficient solutions for subword suffix rank &
selection, as well as substring compression using Burrows-Wheeler transform
composed with run-length encoding.
Comment: 31 pages, 9 figures; accepted to SODA 201
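The constant-space representation of occurrences mentioned in the abstract above rests on a standard periodicity fact: when the containing subword is at most twice as long as the pattern, the starting positions of the pattern form an arithmetic progression, so three integers describe them all. The sketch below only illustrates that fact with a naive scan. It is not the paper's data structure, and the helper names `occurrences` and `as_progression` are assumptions made for this example.

```python
def occurrences(x, y):
    """Starting positions of x in y, found by a naive scan."""
    return [i for i in range(len(y) - len(x) + 1) if y.startswith(x, i)]

def as_progression(positions):
    """Return (first, step, count) if positions form an arithmetic
    progression, and None otherwise."""
    if not positions:
        return (0, 0, 0)
    step = positions[1] - positions[0] if len(positions) > 1 else 0
    if any(b - a != step for a, b in zip(positions, positions[1:])):
        return None
    return (positions[0], step, len(positions))
```

Without the length restriction the compact form can fail: for the pattern "a" in "aaba" the occurrences are 0, 1, 3, which is not a progression, and `as_progression` returns None.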
Efficient Ranking of Lyndon Words and Decoding Lexicographically Minimal de Bruijn Sequence
We give efficient algorithms for ranking Lyndon words of length n over an
alphabet of size {\sigma}. The rank of a Lyndon word is its position in the
sequence of lexicographically ordered Lyndon words of the same length. The
outputs are integers of exponential size, and the complexity of arithmetic
operations on such large integers cannot be ignored. Our model of computation
is the word-RAM, in which basic arithmetic operations on (large) numbers of
size at most {\sigma}^n take O(n) time. Our algorithm for ranking Lyndon words
makes O(n^2) arithmetic operations (this would imply directly cubic time on
word-RAM). However, using an algebraic approach we are able to reduce the total
time complexity on the word-RAM to O(n^2 log {\sigma}). We also present an
O(n^3 log^2 {\sigma})-time algorithm that generates the Lyndon word of a given
length and rank in lexicographic order. Finally we use the connections between
Lyndon words and lexicographically minimal de Bruijn sequences (theorem of
Fredricksen and Maiorana) to develop the first polynomial-time algorithm for
decoding the minimal de Bruijn sequence of any rank n (it determines the position
of an arbitrary word of length n within the de Bruijn sequence).
Comment: Improved version of a paper presented at CPM 201
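The notions in the abstract above (rank of a Lyndon word, the Fredricksen-Maiorana construction, and decoding) can be made concrete with an exponential-time brute force. This is only a sketch of the definitions, assuming the alphabet {0, ..., sigma-1} and hypothetical function names; it does not reflect the efficient algorithms the abstract describes.

```python
from itertools import product

def is_lyndon(w):
    """A word is Lyndon if it is strictly smaller than each proper rotation."""
    return all(w < w[i:] + w[:i] for i in range(1, len(w)))

def lyndon_words(n, sigma):
    """Lyndon words of length n over {0, ..., sigma-1}, in lexicographic order."""
    return [w for w in product(range(sigma), repeat=n) if is_lyndon(w)]

def rank(w, sigma):
    """1-based position of the Lyndon word w among those of the same length."""
    return lyndon_words(len(w), sigma).index(tuple(w)) + 1

def minimal_de_bruijn(n, sigma):
    """Concatenate, in lexicographic order, all Lyndon words whose length
    divides n (theorem of Fredricksen and Maiorana)."""
    words = sorted(w for d in range(1, n + 1) if n % d == 0
                   for w in lyndon_words(d, sigma))
    return [a for w in words for a in w]

def decode(w, sigma):
    """0-based position of w in the cyclic minimal de Bruijn sequence."""
    n = len(w)
    seq = minimal_de_bruijn(n, sigma)
    ext = seq + seq[:n - 1]            # unroll the cyclic sequence
    return next(i for i in range(len(seq)) if ext[i:i + n] == list(w))
```

For n = 4 over a binary alphabet, the Lyndon words of length 4 are 0001, 0011, 0111, and the concatenation 0, 0001, 0011, 01, 0111, 1 yields the sequence 0000100110101111.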
On the Greedy Algorithm for the Shortest Common Superstring Problem with Reversals
We study a variation of the classical Shortest Common Superstring (SCS)
problem in which a shortest superstring of a finite set of strings S is
sought containing as a factor every string of S or its reversal. We call this
problem Shortest Common Superstring with Reversals (SCS-R). This problem has
been introduced by Jiang et al., who designed a greedy-like algorithm with
length approximation ratio 4. In this paper, we show that a natural
adaptation of the classical greedy algorithm for SCS has (optimal) compression
ratio 1/2, i.e., the sum of the overlaps in the output string is at least
half the sum of the overlaps in an optimal solution. We also provide a
linear-time implementation of our algorithm.
Comment: Published in Information Processing Letters
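One way to read "a natural adaptation of the classical greedy algorithm" is sketched below: repeatedly merge the two strings with the largest overlap, where either string may be taken as-is or reversed. This is an assumption about the general shape of such an algorithm, not the paper's linear-time implementation. It runs in polynomial time, breaks ties arbitrarily, and assumes a non-empty input in which no string (or its reversal) is a factor of another.

```python
def overlap(u, v):
    """Length of the longest proper suffix of u that is a prefix of v."""
    for k in range(min(len(u), len(v)) - 1, 0, -1):
        if u.endswith(v[:k]):
            return k
    return 0

def greedy_scs_r(strings):
    """Greedy merging in which each string may be used as-is or reversed."""
    pool = list(strings)
    while len(pool) > 1:
        best = None  # (overlap length, i, j, oriented u, oriented v)
        for i in range(len(pool)):
            for j in range(len(pool)):
                if i == j:
                    continue
                for u in (pool[i], pool[i][::-1]):
                    for v in (pool[j], pool[j][::-1]):
                        k = overlap(u, v)
                        if best is None or k > best[0]:
                            best = (k, i, j, u, v)
        k, i, j, u, v = best
        merged = u + v[k:]  # glue v onto u, sharing the overlap
        pool = [s for t, s in enumerate(pool) if t not in (i, j)] + [merged]
    return pool[0]
```

Every merge keeps each input or its reversal as a factor of some string in the pool, so the final string is a valid superstring with reversals. For example, `greedy_scs_r(["abc", "bcd", "edc"])` produces a string of length 5, saving 4 characters of overlap out of 9.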
