Internal Pattern Matching Queries in a Text and Applications
We consider several types of internal queries: questions about subwords of a
text. As the main tool we develop an optimal data structure for the problem
called here internal pattern matching. This data structure provides
constant-time answers to queries about occurrences of one subword $x$ in
another subword $y$ of a given text, assuming that $|y|=\mathcal{O}(|x|)$,
which allows for a constant-space representation of all occurrences. This
problem can be viewed as a natural extension of the well-studied pattern
matching problem. The data structure has linear size and admits a linear-time
construction algorithm.
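To illustrate why a constant-space answer is possible, the following minimal Python sketch answers one such query by brute force. It is not the data structure from the abstract: the function names `occurrences` and `as_progression` and the half-open index convention are assumptions made here for illustration. It relies on a standard periodicity fact: when the text subword is at most twice as long as the pattern subword, the occurrences form a single arithmetic progression, so three integers describe all of them.

```python
def occurrences(text, x_start, x_end, y_start, y_end):
    """Start positions (in text coordinates) of x = text[x_start:x_end]
    inside y = text[y_start:y_end], found by naive scanning."""
    x = text[x_start:x_end]
    y = text[y_start:y_end]
    positions = []
    pos = y.find(x)
    while pos != -1:
        positions.append(y_start + pos)
        pos = y.find(x, pos + 1)  # allow overlapping occurrences
    return positions


def as_progression(positions):
    """Compress a sorted list of positions into (first, difference, count).

    Raises ValueError if the positions are not an arithmetic progression,
    which cannot happen when len(y) <= 2 * len(x).
    """
    if not positions:
        return None
    if len(positions) == 1:
        return (positions[0], 0, 1)
    diff = positions[1] - positions[0]
    if any(b - a != diff for a, b in zip(positions, positions[1:])):
        raise ValueError("positions do not form an arithmetic progression")
    return (positions[0], diff, len(positions))


# Example: x = "abab" occurs in y = "abababab" at positions 0, 2, 4.
result = as_progression(occurrences("abababab", 0, 4, 0, 8))
```

Each query here costs time linear in the length of the text subword; the sketch only shows the shape of the answer, not the constant-time bound.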
Using the solution to the internal pattern matching problem, we obtain very
efficient data structures answering queries about: primitivity of subwords,
periods of subwords, general substring compression, and cyclic equivalence of
two subwords. All these results improve upon the best previously known
counterparts. The linear construction time of our data structure also allows us
to improve the algorithm for finding $\delta$-subrepetitions in a text (a more
general version of maximal repetitions, also called runs). For any fixed
$\delta$ we obtain the first linear-time algorithm, which matches the linear
time complexity of the algorithm computing runs. Our data structure has already
been used as a part of the efficient solutions for subword suffix rank &
selection, as well as substring compression using Burrows-Wheeler transform
composed with run-length encoding.
Comment: 31 pages, 9 figures; accepted to SODA 201
Partially Ordered Two-way Büchi Automata
We introduce partially ordered two-way Büchi automata and characterize
their expressive power in terms of fragments of first-order logic FO[<].
Partially ordered two-way Büchi automata are Büchi automata which can
change the direction in which the input is processed, with the constraint that
whenever a state is left, it is never re-entered. Nondeterministic
partially ordered two-way Büchi automata coincide with the first-order
fragment Sigma2. Our main contribution is that deterministic partially ordered
two-way Büchi automata are expressively complete for the first-order fragment
Delta2. As an intermediate step, we show that deterministic partially ordered
two-way Büchi automata are effectively closed under Boolean operations.
A small model property yields coNP-completeness of the emptiness problem and
the inclusion problem for deterministic partially ordered two-way Büchi
automata.
Comment: The results of this paper were presented at CIAA 2010; University of
Stuttgart, Computer Scienc
Truly Subquadratic-Time Extension Queries and Periodicity Detection in Strings with Uncertainties
Strings with don't care symbols, also called partial words, and more general indeterminate strings are a natural representation of strings containing uncertain symbols. A considerable effort has been made to obtain efficient algorithms for pattern matching and periodicity detection in such strings. Among those, a number of algorithms have been proposed that behave well on random data, but still their worst-case running time is Theta(n^2). We present the first truly subquadratic-time solutions for a number of such problems on partial words that can also be adapted to indeterminate strings over a constant-sized alphabet. We show that longest common compatible prefix queries (which correspond to longest common extension queries in regular strings) can be answered on-line in O(n * sqrt(n * log(n))) time after O(n * sqrt(n * log(n)))-time preprocessing. We also present O(n * sqrt(n * log(n)))-time algorithms for computing the prefix array and two types of border array of a partial word.
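As a point of reference for the query type named above, here is a minimal Python sketch of a longest common compatible prefix query answered by direct comparison. It is a naive baseline and not the subquadratic method of the paper; the hole symbol `?`, the 0-based indexing, and the names `HOLE`, `compatible` and `lccp` are assumptions made for this illustration.

```python
HOLE = "?"  # assumed representation of a don't care symbol


def compatible(a, b):
    """Two symbols are compatible if they are equal or either is a hole."""
    return a == b or a == HOLE or b == HOLE


def lccp(word, i, j):
    """Length of the longest common compatible prefix of the suffixes of
    `word` starting at positions i and j, by symbol-by-symbol comparison."""
    n = len(word)
    k = 0
    while i + k < n and j + k < n and compatible(word[i + k], word[j + k]):
        k += 1
    return k


# Example: in "a?cabc" the suffixes at 0 and 3 are compatible for 3 symbols.
length = lccp("a?cabc", 0, 3)
```

A single query takes O(n) time in the worst case, which is what leads to quadratic behaviour when many queries are issued.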
Small overlap monoids II: automatic structures and normal forms
We show that any finite monoid or semigroup presentation satisfying the small
overlap condition C(4) has word problem which is a deterministic rational
relation. It follows that the set of lexicographically minimal words forms a
regular language of normal forms, and that these normal forms can be computed
in linear time. We also deduce that C(4) monoids and semigroups are rational
(in the sense of Sakarovitch), asynchronous automatic, and word hyperbolic (in
the sense of Duncan and Gilman). From this it follows that C(4) monoids satisfy
analogues of Kleene's theorem, and admit decision algorithms for the rational
subset and finitely generated submonoid membership problems. We also prove some
automata-theoretic results which may be of independent interest.
Comment: 17 page
Repetitions in partial words
This thesis studies repetitions in partial words: words that, in addition to regular letters, may contain a number of unknown symbols, called "hole" or "don't care" symbols. More specifically, we present and solve an extension of the notion of repetition established by Axel Thue. We investigate partial words with infinitely many holes that satisfy these properties, as well as partial words that retain the properties after the insertion of an arbitrary, possibly infinite, number of holes. We then count the maximum number of distinct 2-repetitions compatible with the factors of a partial word. We show that the problem is hard in the general case, and we study the case of a single hole. Finally, we study some properties of unbordered and primitive partial words (words without repetitions) and give a characterization of the language of partial words with a critical factorization.
Longest common extension
Given a word w of length n and i, j ∈ [n], the longest common extension is the longest substring
starting at both i and j. In this note we estimate the average length of the longest common
extension over all words w and all pairs (i, j), as well as the typical maximum length of the
longest common extension.
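For very small n, the averaged quantity can be computed by exhaustive enumeration, as in the Python sketch below. This is only a brute-force check and not the estimate derived in the note; the binary alphabet, the 0-based positions, the restriction to pairs with i < j, and the names `lce` and `average_lce` are all assumptions made here.

```python
from itertools import product


def lce(w, i, j):
    """Length of the longest substring of w starting at both i and j."""
    k = 0
    while i + k < len(w) and j + k < len(w) and w[i + k] == w[j + k]:
        k += 1
    return k


def average_lce(n, alphabet="ab"):
    """Average of lce(w, i, j) over all words w of length n over `alphabet`
    and all position pairs i < j (exponential time; small n only)."""
    total = 0
    count = 0
    for w in product(alphabet, repeat=n):
        for i in range(n):
            for j in range(i + 1, n):
                total += lce(w, i, j)
                count += 1
    return total / count


# Example: average over all 2^6 binary words of length 6.
avg = average_lce(6)
```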
We also consider a variant of this problem, due to Blanchet-Sadri and Lazarow, in which the
word is allowed to contain ‘holes’, which are special symbols functioning as ‘jokers’, i.e. are
considered to be equal to any character. In particular, we estimate the average longest common
extension over all words w with a small number of holes, extending a result by Blanchet-Sadri,
Harred and Lazarow, and prove a similar result for words with holes appearing randomly.