5,611 research outputs found

    Internal Pattern Matching Queries in a Text and Applications

    We consider several types of internal queries: questions about subwords of a text. As the main tool we develop an optimal data structure for the problem called here internal pattern matching. This data structure provides constant-time answers to queries about occurrences of one subword $x$ in another subword $y$ of a given text, assuming that $|y|=\mathcal{O}(|x|)$, which allows for a constant-space representation of all occurrences. This problem can be viewed as a natural extension of the well-studied pattern matching problem. The data structure has linear size and admits a linear-time construction algorithm. Using the solution to the internal pattern matching problem, we obtain very efficient data structures answering queries about: primitivity of subwords, periods of subwords, general substring compression, and cyclic equivalence of two subwords. All these results improve upon the best previously known counterparts. The linear construction time of our data structure also allows us to improve the algorithm for finding $\delta$-subrepetitions in a text (a more general version of maximal repetitions, also called runs). For any fixed $\delta$ we obtain the first linear-time algorithm, which matches the linear time complexity of the algorithm computing runs. Our data structure has already been used as a part of the efficient solutions for subword suffix rank & selection, as well as substring compression using the Burrows-Wheeler transform composed with run-length encoding. Comment: 31 pages, 9 figures; accepted to SODA 201
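To make the query in this abstract concrete, here is a minimal brute-force sketch in Python. It is not the constant-time data structure the paper describes; it only shows what an internal pattern matching query returns. The function names and the half-open index convention are assumptions made for illustration. The helper `as_progression` reflects the standard fact that when $|y| < 2|x|$ the occurrences of $x$ in $y$ form a single arithmetic progression, which is what makes a constant-space answer possible.

```python
def internal_occurrences(text, x_start, x_end, y_start, y_end):
    """Naive internal pattern matching query (illustrative only).

    Returns the start positions, as indices into `text`, of all
    occurrences of x = text[x_start:x_end] inside y = text[y_start:y_end].
    Runs in time proportional to |x| * |y|, not in constant time.
    """
    x = text[x_start:x_end]
    y = text[y_start:y_end]
    hits = []
    pos = y.find(x)
    while pos != -1:
        hits.append(y_start + pos)
        pos = y.find(x, pos + 1)  # allow overlapping occurrences
    return hits


def as_progression(hits):
    """Compress sorted positions to (first, step, count).

    Returns None if `hits` is empty or is not an arithmetic progression.
    """
    if not hits:
        return None
    if len(hits) == 1:
        return (hits[0], 0, 1)
    step = hits[1] - hits[0]
    if any(b - a != step for a, b in zip(hits, hits[1:])):
        return None
    return (hits[0], step, len(hits))
```

For example, with `text = "aabaabaabaa"`, the query for `x = text[0:5]` in `y = text[0:9]` finds two overlapping occurrences whose positions compress to a single `(first, step, count)` triple.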

    Partially Ordered Two-way Büchi Automata

    We introduce partially ordered two-way Büchi automata and characterize their expressive power in terms of fragments of first-order logic FO[<]. Partially ordered two-way Büchi automata are Büchi automata which can change the direction in which the input is processed, with the constraint that once a state is left, it is never re-entered. Nondeterministic partially ordered two-way Büchi automata coincide with the first-order fragment Σ2. Our main contribution is that deterministic partially ordered two-way Büchi automata are expressively complete for the first-order fragment Δ2. As an intermediate step, we show that deterministic partially ordered two-way Büchi automata are effectively closed under Boolean operations. A small model property yields coNP-completeness of the emptiness problem and the inclusion problem for deterministic partially ordered two-way Büchi automata. Comment: The results of this paper were presented at CIAA 2010; University of Stuttgart, Computer Science

    Truly Subquadratic-Time Extension Queries and Periodicity Detection in Strings with Uncertainties

    Strings with don't care symbols, also called partial words, and more general indeterminate strings are a natural representation of strings containing uncertain symbols. A considerable effort has been made to obtain efficient algorithms for pattern matching and periodicity detection in such strings. Among those, a number of algorithms have been proposed that behave well on random data, but still their worst-case running time is Theta(n^2). We present the first truly subquadratic-time solutions for a number of such problems on partial words that can also be adapted to indeterminate strings over a constant-sized alphabet. We show that n longest common compatible prefix queries (which correspond to longest common extension queries in regular strings) can be answered on-line in O(n * sqrt(n * log(n))) time after O(n * sqrt(n * log(n)))-time preprocessing. We also present O(n * sqrt(n * log(n)))-time algorithms for computing the prefix array and two types of border array of a partial word.
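As a point of reference for the queries named in this abstract, the following Python sketch answers a single longest common compatible prefix query by direct scanning. It is the quadratic-in-the-worst-case baseline, not the subquadratic method of the paper. The hole marker `"?"`, the 0-based indexing, and the function names are assumptions chosen for illustration.

```python
HOLE = "?"  # hypothetical marker for a don't care symbol


def compatible(a, b):
    """Two symbols are compatible if they are equal or either is a hole."""
    return a == HOLE or b == HOLE or a == b


def lccp(word, i, j):
    """Length of the longest common compatible prefix of word[i:] and word[j:].

    Scans symbol by symbol, so one query costs O(n) time and
    n queries cost O(n^2) in the worst case.
    """
    n = len(word)
    k = 0
    while i + k < n and j + k < n and compatible(word[i + k], word[j + k]):
        k += 1
    return k
```

Note that compatibility is not transitive (`"a"` matches `"?"` and `"?"` matches `"b"`, yet `"a"` does not match `"b"`), which is one reason the usual suffix-based tools for regular strings do not carry over directly.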

    Small overlap monoids II: automatic structures and normal forms

    We show that any finite monoid or semigroup presentation satisfying the small overlap condition C(4) has a word problem that is a deterministic rational relation. It follows that the set of lexicographically minimal words forms a regular language of normal forms, and that these normal forms can be computed in linear time. We also deduce that C(4) monoids and semigroups are rational (in the sense of Sakarovitch), asynchronous automatic, and word hyperbolic (in the sense of Duncan and Gilman). From this it follows that C(4) monoids satisfy analogues of Kleene's theorem, and admit decision algorithms for the rational subset and finitely generated submonoid membership problems. We also prove some automata-theoretic results which may be of independent interest. Comment: 17 pages

    Repetitions in partial words

    This thesis studies repetitions in partial words: words that, besides regular letters, may contain a number of unknown symbols, called "hole" or "don't care" symbols. More precisely, we present and solve an extension of the notion of repetition established by Axel Thue. We investigate partial words with infinitely many holes that satisfy these properties, as well as partial words that preserve the properties after the insertion of an arbitrary, possibly infinite, number of holes. We then count the maximum number of distinct 2-repetitions compatible with the factors of a partial word. We show that the problem is hard in the general case, and we study it in the case of one hole. Finally, we study some properties of unbordered and primitive partial words (words without repetitions) and give a characterization of the language of partial words with a critical factorization.

    Longest common extension

    Given a word w of length n and i, j ∈ [n], the longest common extension is the longest substring starting at both i and j. In this note we estimate the average length of the longest common extension over all words w and all pairs (i, j), as well as the typical maximum length of the longest common extension. We also consider a variant of this problem, due to Blanchet-Sadri and Lazarow, in which the word is allowed to contain ‘holes’, which are special symbols functioning as ‘jokers’, i.e. are considered to be equal to any character. In particular, we estimate the average longest common extension over all words w with a small number of holes, extending a result by Blanchet-Sadri, Harred and Lazarow, and prove a similar result for words with holes appearing randomly.
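The quantity averaged in this abstract can be computed exhaustively for tiny inputs. The Python sketch below enumerates every word of a given length over a small alphabet and averages the longest common extension over position pairs. It only illustrates the definition and says nothing about the paper's estimates. Restricting to pairs with i < j, using 0-based positions, and the function names are assumptions made here.

```python
from itertools import combinations, product


def lce(word, i, j):
    """Length of the longest common extension of word at positions i and j."""
    n = len(word)
    k = 0
    while i + k < n and j + k < n and word[i + k] == word[j + k]:
        k += 1
    return k


def average_lce(n, alphabet="ab"):
    """Average LCE over all words of length n and all pairs i < j.

    Exhaustive enumeration: the cost grows as len(alphabet) ** n,
    so this is only usable for very small n. Requires n >= 2.
    """
    total = 0
    count = 0
    for letters in product(alphabet, repeat=n):
        for i, j in combinations(range(n), 2):
            total += lce(letters, i, j)
            count += 1
    return total / count
```

For instance, `lce("abab", 0, 2)` is 2, since the substring `"ab"` starts at both positions.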