Internal Pattern Matching Queries in a Text and Applications

Author: Kociumaka Tomasz
Radoszewski Jakub
Rytter Wojciech
Waleń Tomasz
Publication date: 13/10/2014

We consider several types of internal queries: questions about subwords of a text. As the main tool we develop an optimal data structure for the problem called here internal pattern matching. This data structure provides constant-time answers to queries about occurrences of one subword $x$ in another subword $y$ of a given text, assuming that $|y|=\mathcal{O}(|x|)$, which allows for a constant-space representation of all occurrences. This problem can be viewed as a natural extension of the well-studied pattern matching problem. The data structure has linear size and admits a linear-time construction algorithm. Using the solution to the internal pattern matching problem, we obtain very efficient data structures answering queries about: primitivity of subwords, periods of subwords, general substring compression, and cyclic equivalence of two subwords. All these results improve upon the best previously known counterparts. The linear construction time of our data structure also allows us to improve the algorithm for finding $\delta$-subrepetitions in a text (a more general version of maximal repetitions, also called runs). For any fixed $\delta$ we obtain the first linear-time algorithm, which matches the linear time complexity of the algorithm computing runs. Our data structure has already been used as part of efficient solutions for subword suffix rank & selection, as well as substring compression using the Burrows-Wheeler transform composed with run-length encoding.

Comment: 31 pages, 9 figures; accepted to SODA 201
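To make the "constant-space representation of all occurrences" concrete, here is a brute-force sketch of an internal pattern matching query. It is not the constant-time data structure from the abstract. The function names and the half-open index convention are assumptions made for illustration. It relies on a standard fact from combinatorics on words: when $|y| < 2|x|$, the start positions of $x$ in $y$ form a single arithmetic progression, so three integers describe them all.

```python
def occurrences(text, x_start, x_end, y_start, y_end):
    """Start positions (indices into text) where x = text[x_start:x_end]
    occurs entirely inside y = text[y_start:y_end]. Naive O(|x| * |y|) scan."""
    x = text[x_start:x_end]
    m = len(x)
    return [p for p in range(y_start, y_end - m + 1) if text[p:p + m] == x]


def as_progression(positions):
    """Compress sorted positions into (first, step, count).
    Returns None if they do not form an arithmetic progression."""
    if not positions:
        return (None, 0, 0)
    if len(positions) == 1:
        return (positions[0], 0, 1)
    step = positions[1] - positions[0]
    if any(b - a != step for a, b in zip(positions, positions[1:])):
        return None
    return (positions[0], step, len(positions))


if __name__ == "__main__":
    text = "abababa"
    # x = "aba" (length 3), y = "ababa" (length 5 < 2 * 3)
    pos = occurrences(text, 0, 3, 0, 5)
    print(pos, as_progression(pos))
```

For longer $y$ with $|y| = \mathcal{O}(|x|)$, one can split $y$ into a constant number of overlapping windows of length less than $2|x|$ and report one progression per window.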

arXiv.org e-Print Archive

Crossref

Partially Ordered Two-way Büchi Automata

Author: A.P. Sistla
C. Baier
C.A. Kapoutsis
E.M. Clarke Jr.
J.-P. Pécuchet
J.-É. Pin
J.R. Büchi
K. Lodaya
M.Y. Vardi
O. Kupferman
T. Schwentick
W. Thomas
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

We introduce partially ordered two-way Büchi automata and characterize their expressive power in terms of fragments of first-order logic FO[<]. Partially ordered two-way Büchi automata are Büchi automata which can change the direction in which the input is processed, with the constraint that whenever a state is left, it is never re-entered. Nondeterministic partially ordered two-way Büchi automata coincide with the first-order fragment $\Sigma_2$. Our main contribution is that deterministic partially ordered two-way Büchi automata are expressively complete for the first-order fragment $\Delta_2$. As an intermediate step, we show that deterministic partially ordered two-way Büchi automata are effectively closed under Boolean operations. A small model property yields coNP-completeness of the emptiness problem and the inclusion problem for deterministic partially ordered two-way Büchi automata.

Comment: The results of this paper were presented at CIAA 2010; University of Stuttgart, Computer Scienc

arXiv.org e-Print Archive

Crossref

Truly Subquadratic-Time Extension Queries and Periodicity Detection in Strings with Uncertainties

Author: Iliopoulos Costas S.
Radoszewski Jakub
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)
Publication date: 01/01/2016

Strings with don't care symbols, also called partial words, and more general indeterminate strings are a natural representation of strings containing uncertain symbols. A considerable effort has been made to obtain efficient algorithms for pattern matching and periodicity detection in such strings. Among those, a number of algorithms have been proposed that behave well on random data, but still their worst-case running time is $\Theta(n^2)$. We present the first truly subquadratic-time solutions for a number of such problems on partial words that can also be adapted to indeterminate strings over a constant-sized alphabet. We show that $n$ longest common compatible prefix queries (which correspond to longest common extension queries in regular strings) can be answered on-line in $O(n\sqrt{n \log n})$ time after $O(n\sqrt{n \log n})$-time preprocessing. We also present $O(n\sqrt{n \log n})$-time algorithms for computing the prefix array and two types of border array of a partial word.
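As a point of reference for the quadratic baseline mentioned above, the following is a naive sketch of a longest common compatible prefix query on a partial word. It is not the subquadratic method of the paper. The hole symbol `*` and the names `compatible`, `lccp` and `prefix_array` are assumptions chosen for illustration. Two symbols are treated as compatible when they are equal or when at least one of them is a hole.

```python
HOLE = "*"  # assumed marker for a don't care symbol


def compatible(a, b):
    """Two symbols match if they are equal or either one is a hole."""
    return a == b or a == HOLE or b == HOLE


def lccp(word, i, j):
    """Length of the longest common compatible prefix of the suffixes of
    word starting at 0-based positions i and j. Naive scan, O(n) per query."""
    n = len(word)
    k = 0
    while i + k < n and j + k < n and compatible(word[i + k], word[j + k]):
        k += 1
    return k


def prefix_array(word):
    """Entry i is lccp(word, 0, i). Naive O(n^2) construction."""
    return [lccp(word, 0, i) for i in range(len(word))]


if __name__ == "__main__":
    print(prefix_array("a*cab"))
```

Compatibility is not transitive (`a` matches `*` and `*` matches `b`, but `a` does not match `b`), which is one reason the standard suffix-based techniques for regular strings do not carry over directly.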

Dagstuhl Research Online Publication Server

King's Research Portal

Small overlap monoids II: automatic structures and normal forms

Author: Kambites Mark
Publication date: 31/10/2008

We show that any finite monoid or semigroup presentation satisfying the small overlap condition C(4) has a word problem which is a deterministic rational relation. It follows that the set of lexicographically minimal words forms a regular language of normal forms, and that these normal forms can be computed in linear time. We also deduce that C(4) monoids and semigroups are rational (in the sense of Sakarovitch), asynchronous automatic, and word hyperbolic (in the sense of Duncan and Gilman). From this it follows that C(4) monoids satisfy analogues of Kleene's theorem, and admit decision algorithms for the rational subset and finitely generated submonoid membership problems. We also prove some automata-theoretic results which may be of independent interest.

Comment: 17 page

arXiv.org e-Print Archive

Elsevier - Publisher Connector

Repetitions in partial words

Author: Mercas Robert
Publication venue: Universitat Rovira I Virgili
Publication date: 01/01/2010

This thesis studies repetitions in partial words: words that, in addition to regular letters, may contain a number of unknown symbols, called "hole" or "don't care" symbols. More specifically, we present and solve an extension of the notion of repetition established by Axel Thue. We investigate partial words with infinitely many holes that satisfy these properties, as well as partial words that preserve the properties after the insertion of an arbitrary, possibly infinite, number of holes. We then count the maximum number of distinct 2-repetitions compatible with the factors of a partial word. We show that the problem is hard in the general case, and we study it in the case of one hole. Finally, we study some properties of unbordered and primitive partial words (words without repetitions) and give a characterization of the language of partial words with a critical factorization.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Tesis Doctorals en Xarxa

Repositori Institucional URV

Longest common extension

Author: Bollobás B
Letzter S
Publication venue: Elsevier BV
Publication date: 01/02/2018

Given a word w of length n and i, j ∈ [n], the longest common extension is the longest substring starting at both i and j. In this note we estimate the average length of the longest common extension over all words w and all pairs (i, j), as well as the typical maximum length of the longest common extension. We also consider a variant of this problem, due to Blanchet-Sadri and Lazarow, in which the word is allowed to contain ‘holes’, which are special symbols functioning as ‘jokers’, i.e. are considered to be equal to any character. In particular, we estimate the average longest common extension over all words w with a small number of holes, extending a result by Blanchet-Sadri, Harred and Lazarow, and prove a similar result for words with holes appearing randomly
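The definition above can be checked by brute force on very short words. The sketch below is only an illustration, not the estimate derived in the note. It uses 0-based positions, averages over pairs with i < j, and defaults to a binary alphabet. All three choices are assumptions, and the function names are hypothetical.

```python
from itertools import product


def lce(w, i, j):
    """Length of the longest common extension of w at 0-based positions i and j."""
    k = 0
    while i + k < len(w) and j + k < len(w) and w[i + k] == w[j + k]:
        k += 1
    return k


def average_lce(n, alphabet="ab"):
    """Exact average of lce over all words of length n (n >= 2) and all
    pairs i < j. Exponential in n, so only usable for very small n."""
    total = 0
    count = 0
    for w in product(alphabet, repeat=n):
        for i in range(n):
            for j in range(i + 1, n):
                total += lce(w, i, j)
                count += 1
    return total / count


if __name__ == "__main__":
    for n in range(2, 9):
        print(n, average_lce(n))
```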

UCL Discovery