Comparison of LZ77-type Parsings
We investigate the relations between different variants of the LZ77 parsing
existing in the literature. All of them are defined as greedily constructed
parsings encoding each phrase by reference to a string occurring earlier in the
input. They differ by the phrase encodings: encoded by pairs (length + position
of an earlier occurrence) or by triples (length + position of an earlier
occurrence + the letter following the earlier occurring part); and they differ
by allowing or not allowing overlaps between the phrase and its earlier
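occurrence. For a given string of length $n$ over an alphabet of size $\sigma$,
denote the numbers of phrases in the parsings allowing (resp., not allowing)
overlaps by $z$ (resp., $\hat{z}$) for "pairs", and by $z_3$ (resp.,
$\hat{z}_3$) for "triples". We prove the following bounds and provide series of
examples showing that these bounds are tight:
$z \le \hat{z} \le z \cdot O(\log\frac{n}{z\log_\sigma z})$ and
$z_3 \le \hat{z}_3 \le z_3 \cdot O(\log\frac{n}{z_3\log_\sigma z_3})$;
$\frac{1}{2}\hat{z} < \hat{z}_3 \le \hat{z}$ and $\frac{1}{2} z < z_3 \le z$.
Comment: 6 pages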
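The four parsing variants described in this abstract can be sketched with a naive quadratic-time procedure. This is only an illustration under assumptions that are not taken from the paper: the function names are hypothetical, a letter with no earlier occurrence forms a phrase by itself in the "pairs" variant, and the last "triples" phrase may lack its trailing letter.

```python
def longest_previous_match(s, i, allow_overlap):
    """Length of the longest prefix of s[i:] that also occurs starting at some j < i.

    Without overlaps, the earlier occurrence must lie entirely inside s[:i].
    """
    best = 0
    for j in range(i):
        l = 0
        while (i + l < len(s)
               and (allow_overlap or j + l < i)
               and s[j + l] == s[i + l]):
            l += 1
        best = max(best, l)
    return best


def lz_parse(s, triples, allow_overlap):
    """Greedy LZ77-type parsing of s, returned as a list of phrases."""
    phrases = []
    i = 0
    while i < len(s):
        l = longest_previous_match(s, i, allow_overlap)
        if triples:
            # Earlier occurrence plus one explicit letter (assumed convention:
            # the final phrase is cut at the end of the string).
            l = min(l + 1, len(s) - i)
        else:
            # Assumed convention: a fresh letter is a phrase of length 1.
            l = max(l, 1)
        phrases.append(s[i:i + l])
        i += l
    return phrases


if __name__ == "__main__":
    text = "abababab"
    for triples in (False, True):
        for allow_overlap in (True, False):
            print(triples, allow_overlap, lz_parse(text, triples, allow_overlap))
```

Counting the phrases returned for each of the four flag combinations gives the four quantities that the abstract compares.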
EERTREE: An Efficient Data Structure for Processing Palindromes in Strings
We propose a new linear-size data structure which provides a fast access to
all palindromic substrings of a string or a set of strings. This structure
inherits some ideas from the construction of both the suffix trie and suffix
tree. Using this structure, we present simple and efficient solutions for a
number of problems involving palindromes.
Comment: 21 pages, 2 figures. Accepted to IWOCA 201
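A minimal sketch of a palindromic tree in the spirit of this abstract is given below. It follows the commonly described construction (two root nodes, suffix links, one new node at most per appended letter); the class layout and method names are assumptions for illustration and are not claimed to match the paper's presentation.

```python
class Eertree:
    """Palindromic tree: one node per distinct nonempty palindromic substring."""

    def __init__(self):
        # Node 0 is an imaginary root of length -1, node 1 is the empty palindrome.
        self.length = [-1, 0]
        self.link = [0, 0]
        self.edges = [{}, {}]
        self.text = []
        self.last = 1  # node of the longest palindromic suffix of the text so far

    def _find(self, v):
        # Follow suffix links until the palindrome at v can be extended
        # on both sides by the most recently appended letter.
        i = len(self.text) - 1
        while True:
            j = i - self.length[v] - 1
            if j >= 0 and self.text[j] == self.text[i]:
                return v
            v = self.link[v]

    def add(self, c):
        """Append letter c; return True if a new palindrome appeared."""
        self.text.append(c)
        v = self._find(self.last)
        created = c not in self.edges[v]
        if created:
            new_length = self.length[v] + 2
            if new_length == 1:
                suffix_link = 1  # single letters link to the empty palindrome
            else:
                suffix_link = self.edges[self._find(self.link[v])][c]
            self.length.append(new_length)
            self.link.append(suffix_link)
            self.edges.append({})
            self.edges[v][c] = len(self.length) - 1
        self.last = self.edges[v][c]
        return created

    def distinct_palindromes(self):
        return len(self.length) - 2  # exclude the two root nodes


def count_distinct_palindromes(s):
    tree = Eertree()
    for c in s:
        tree.add(c)
    return tree.distinct_palindromes()
```

For example, `count_distinct_palindromes("abba")` counts the palindromes a, b, bb and abba.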
On the Combinatorics of Palindromes and Antipalindromes
We prove a number of results on the structure and enumeration of palindromes
and antipalindromes. In particular, we study conjugates of palindromes,
palindromic pairs, rich words, and the counterparts of these notions for
antipalindromes.
Comment: 13 pages; submitted to DLT 201
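Two of the notions named in this abstract can be illustrated by brute-force checks. The definitions used here are assumptions based on common usage rather than on the paper's text: a binary word is treated as an antipalindrome if it equals the reversal of its letterwise complement, and a word of length n is treated as rich if it has exactly n distinct nonempty palindromic factors.

```python
def is_antipalindrome(w):
    """True if the binary word w equals the reversal of its complement."""
    complement = w.translate(str.maketrans("01", "10"))
    return w == complement[::-1]


def is_rich(w):
    """True if w has exactly len(w) distinct nonempty palindromic factors."""
    palindromes = {
        w[i:j]
        for i in range(len(w))
        for j in range(i + 1, len(w) + 1)
        if w[i:j] == w[i:j][::-1]
    }
    return len(palindromes) == len(w)


if __name__ == "__main__":
    print(is_antipalindrome("0011"), is_antipalindrome("010"))
    print(is_rich("abba"), is_rich("abca"))
```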
Subword complexity and power avoidance
We begin a systematic study of the relations between subword complexity of
infinite words and their power avoidance. Among other things, we show that
-- the Thue-Morse word has the minimum possible subword complexity over all
overlap-free binary words and all $(\frac{7}{3})$-power-free binary words, but not
over all $(\frac{7}{3})^+$-power-free binary words;
-- the twisted Thue-Morse word has the maximum possible subword complexity
over all overlap-free binary words, but no word has the maximum subword
complexity over all $(\frac{7}{3})$-power-free binary words;
-- if some word attains the minimum possible subword complexity over all
square-free ternary words, then one such word is the ternary Thue word;
-- the recently constructed 1-2-bonacci word has the minimum possible subword
complexity over all symmetric square-free ternary words.
Comment: 29 pages. Submitted to TC
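To make the two notions in this abstract concrete, the sketch below counts distinct factors of each length in a finite prefix of the Thue-Morse word and checks that prefix for overlaps. The function names are hypothetical, and a finite prefix only gives a lower bound on the subword complexity of the infinite word, so the counts are reliable only for factor lengths that are small relative to the prefix.

```python
def thue_morse(n):
    """First n letters of the Thue-Morse word: parity of the binary digit sum."""
    return "".join(str(bin(i).count("1") % 2) for i in range(n))


def factor_count(w, k):
    """Number of distinct factors (substrings) of length k in the finite word w."""
    return len({w[i:i + k] for i in range(len(w) - k + 1)})


def has_overlap(w):
    """True if w contains a factor of length 2p + 1 with period p for some p >= 1."""
    n = len(w)
    for p in range(1, n // 2 + 1):
        run = 0  # current number of consecutive positions with w[i] == w[i + p]
        for i in range(n - p):
            run = run + 1 if w[i] == w[i + p] else 0
            if run > p:
                return True
    return False


if __name__ == "__main__":
    prefix = thue_morse(1024)
    print([factor_count(prefix, k) for k in range(1, 9)])
    print(has_overlap(prefix))
```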
Searching Long Repeats in Streams
We consider two well-known related problems: Longest Repeated Substring (LRS) and Longest Repeated Reversed Substring (LRRS). Their streaming versions cannot be solved exactly; we show that only approximate solutions by Monte Carlo algorithms are possible, and prove a lower bound on consumed memory. For both problems, we present purely linear-time Monte Carlo algorithms working in O(E + n/E) space, where E is the additive approximation error. Within the same space bounds, we then present nearly real-time solutions, which require O(log n) time per symbol and O(n + n/E log n) time overall. The working space exactly matches the lower bound whenever E=O(n^{0.5}) and the size of the alphabet is Omega(n^{0.01}).
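For reference, the LRS problem itself can be stated as a small offline, exact baseline. This is not the streaming Monte Carlo algorithm of the abstract: it stores whole substrings and uses far more memory than the stated bounds. It also assumes that the two occurrences of the repeat are allowed to overlap.

```python
def longest_repeated_substring(s):
    """Return a longest substring of s that occurs at least twice (may overlap)."""

    def repeated(k):
        # Return some substring of length k that occurs twice, or None.
        seen = set()
        for i in range(len(s) - k + 1):
            sub = s[i:i + k]
            if sub in seen:
                return sub
            seen.add(sub)
        return None

    # A repeat of length k implies a repeat of length k - 1,
    # so the answer length can be found by binary search.
    lo, hi = 0, max(len(s) - 1, 0)
    best = ""
    while lo < hi:
        mid = (lo + hi + 1) // 2
        found = repeated(mid)
        if found is not None:
            lo, best = mid, found
        else:
            hi = mid - 1
    return best


if __name__ == "__main__":
    print(longest_repeated_substring("banana"))
```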
Palindromic k-Factorization in Pure Linear Time
Given a string s of length n over a general alphabet and an integer k, the problem is to decide whether s is a concatenation of k nonempty palindromes. Two previously known solutions for this problem work in time O(kn) and O(n log n), respectively. Here we settle the complexity of this problem in the word-RAM model, presenting an O(n)-time online deciding algorithm. The algorithm simultaneously finds the minimum odd number of factors and the minimum even number of factors in a factorization of a string into nonempty palindromes. We also demonstrate how to get an explicit factorization of s into k palindromes with an O(n)-time offline postprocessing.
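The decision problem stated in this abstract can be illustrated by a straightforward dynamic program. This is a slow baseline (quadratic palindrome table plus k rounds over reachable positions), not the linear-time algorithm the abstract announces, and the function name is hypothetical.

```python
def is_k_palindromic(s, k):
    """Decide whether s is a concatenation of exactly k nonempty palindromes."""
    n = len(s)
    # pal[i][j] is True when s[i..j] (inclusive) is a palindrome.
    pal = [[False] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(i, n):
            pal[i][j] = s[i] == s[j] and (j - i < 2 or pal[i + 1][j - 1])

    # reachable holds the prefix lengths that split into the current number of palindromes.
    reachable = {0}
    for _ in range(k):
        reachable = {
            j + 1
            for i in reachable
            for j in range(i, n)
            if pal[i][j]
        }
    return n in reachable


if __name__ == "__main__":
    print([k for k in range(1, 9) if is_k_palindromic("abacaba", k)])
```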
Binary Patterns in Binary Cube-Free Words: Avoidability and Growth
The avoidability of binary patterns by binary cube-free words is investigated
and the exact bound between unavoidable and avoidable patterns is found. All
avoidable patterns are shown to be D0L-avoidable. For avoidable patterns, the
growth rates of the avoiding languages are studied. All such languages, except
for the overlap-free language, are proved to have exponential growth. The exact
growth rates of languages avoiding minimal avoidable patterns are approximated
through computer-assisted upper bounds. Finally, a new example of a
pattern-avoiding language of polynomial growth is given.
Comment: 18 pages, 2 tables; submitted to RAIRO TIA (Special issue of Mons
Days 2012)
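The notion of growth used in this abstract can be explored numerically for the base language of binary cube-free words. The sketch below enumerates cube-free words level by level; it does not implement pattern avoidance or the computer-assisted bounds mentioned in the abstract, and the function names are hypothetical.

```python
def ends_with_cube(w):
    """True if some suffix of w has the form xxx with x nonempty."""
    n = len(w)
    return any(
        w[n - 3 * p:n - 2 * p] == w[n - 2 * p:n - p] == w[n - p:]
        for p in range(1, n // 3 + 1)
    )


def count_cube_free(max_len, alphabet="01"):
    """Numbers of cube-free words of lengths 1..max_len over the alphabet.

    Every prefix of a cube-free word is cube-free, so it suffices to extend
    cube-free words by one letter and test only the suffixes of the result.
    """
    counts = []
    level = [""]
    for _ in range(max_len):
        level = [w + c for w in level for c in alphabet if not ends_with_cube(w + c)]
        counts.append(len(level))
    return counts


if __name__ == "__main__":
    counts = count_cube_free(14)
    print(counts)
    # Ratios of consecutive counts give a rough, non-rigorous feel for the growth rate.
    print([round(b / a, 3) for a, b in zip(counts, counts[1:])])
```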