On lexicographic enumeration of regular and context-free languages
We show that it is possible to efficiently enumerate the words of a regular language in lexicographic order. The time needed for generating the next word is O(n) when enumerating words of length n. We also define a class of context-free languages for which efficient enumeration is possible.
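To make the enumeration task concrete, here is a minimal Python sketch that lists the length-n words of a regular language in lexicographic order. It is not the algorithm from the paper and does not claim its O(n) bound per word: it is a plain depth-first search over a DFA, pruned by a reachability table. The DFA encoding (a `delta` dictionary keyed by `(state, letter)`) and the function name `words_of_length` are assumptions made for illustration.

```python
def words_of_length(delta, start, accepting, alphabet, n):
    """Yield the length-n words accepted by a DFA, in lexicographic order.

    delta maps (state, letter) -> state; missing entries mean "reject".
    This encoding is an assumption of the sketch, not taken from the paper.
    """
    states = {start} | set(accepting) | {q for (q, _) in delta} | set(delta.values())

    # live[k] = states from which an accepting state is reachable in exactly k steps.
    live = [set(accepting)]
    for _ in range(n):
        prev = live[-1]
        live.append({q for q in states
                     if any(delta.get((q, a)) in prev for a in alphabet)})

    letters = sorted(alphabet)

    def extend(state, prefix):
        remaining = n - len(prefix)
        if remaining == 0:
            yield "".join(prefix)
            return
        # Trying letters in sorted order gives lexicographic output;
        # the live table prunes branches that cannot reach acceptance.
        for a in letters:
            nxt = delta.get((state, a))
            if nxt is not None and nxt in live[remaining - 1]:
                prefix.append(a)
                yield from extend(nxt, prefix)
                prefix.pop()

    if start in live[n]:
        yield from extend(start, [])


# Hypothetical example DFA: words over {a, b} with no two consecutive b's.
NO_BB = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0}
```

For instance, `list(words_of_length(NO_BB, 0, {0, 1}, "ab", 3))` gives `['aaa', 'aab', 'aba', 'baa', 'bab']`. Because of the pruning, every branch the search enters leads to at least one output word.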
Grammars for Document Spanners
We propose a new grammar-based language for defining information extractors from documents (text) that is built upon the well-studied framework of document spanners for extracting structured data from text. While previously studied formalisms for document spanners are mainly based on regular expressions, we use an extension of context-free grammars, called extraction grammars, to define the new class of context-free spanners. Extraction grammars are simply context-free grammars extended with variables that capture interval positions of the document, namely spans. While regular expressions are efficient for tokenizing and tagging, context-free grammars are also efficient for capturing structural properties. Indeed, we show that context-free spanners are strictly more expressive than their regular counterparts. We reason about the expressive power of our new class and present a pushdown-automata model that captures it. We show that extraction grammars can be evaluated with polynomial data complexity. Nevertheless, as the degree of the polynomial depends on the query, we present an enumeration algorithm for unambiguous extraction grammars that, after quintic preprocessing, outputs the results sequentially, without repetitions, with a constant delay between every two consecutive ones.
Detecting palindromes, patterns, and borders in regular languages
Given a language L and a nondeterministic finite automaton M, we consider
whether we can determine efficiently (in the size of M) if M accepts at least
one word in L, or infinitely many words. Given that M accepts at least one word
in L, we consider how long a shortest word can be. The languages L that we
examine include the palindromes, the non-palindromes, the k-powers, the
non-k-powers, the powers, the non-powers (also called primitive words), the
words matching a general pattern, the bordered words, and the unbordered words.
Comment: Full version of a paper submitted to LATA 2008. This is a new version
with John Loftus added as a co-author and containing new results on
unbordered words.
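As an illustration of one of these decision problems, the following Python sketch tests whether an NFA accepts at least one palindrome. It uses a standard search over pairs of states and is not necessarily the construction used in the paper. The NFA encoding (a collection of `(source, letter, target)` triples, no epsilon transitions) and the function name `accepts_palindrome` are assumptions made for this example.

```python
from collections import deque


def accepts_palindrome(transitions, start, accepting):
    """Return True if the NFA accepts at least one palindrome.

    transitions is an iterable of (p, a, q) triples without epsilon moves
    (an assumed encoding). A pair (p, q) is explored when some word x leads
    from start to p and the reverse of x leads from q to an accepting state.
    """
    fwd, bwd = {}, {}
    for p, a, q in transitions:
        fwd.setdefault(p, []).append((a, q))
        bwd.setdefault(q, []).append((a, p))

    seen = {(start, f) for f in accepting}
    queue = deque(seen)
    while queue:
        p, q = queue.popleft()
        if p == q:
            return True  # even-length palindrome: x followed by reversed x
        if any(target == q for _, target in fwd.get(p, [])):
            return True  # odd-length palindrome: x, a middle letter, reversed x
        # Grow x by one letter at the front half and, symmetrically, the back half.
        for a, p2 in fwd.get(p, []):
            for b, q2 in bwd.get(q, []):
                if a == b and (p2, q2) not in seen:
                    seen.add((p2, q2))
                    queue.append((p2, q2))
    return False
```

The search visits each pair of states at most once, so the number of pairs explored is at most quadratic in the number of states of M.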
Evaluation and Enumeration Problems for Regular Path Queries
Regular path queries (RPQs) are a central component of graph databases. We investigate decision and enumeration problems concerning the evaluation of RPQs under several semantics that have recently been considered: arbitrary paths, shortest paths, and simple paths. Whereas arbitrary and shortest paths can be enumerated with polynomial delay, the situation is much more intricate for simple paths. For instance, already the question of whether a given graph contains a simple path of a certain length has cases with highly non-trivial solutions and cases that are long-standing open problems. We study RPQ evaluation for simple paths from a parameterized complexity perspective and define a class of simple transitive expressions that is prominent in practice and for which we can prove a dichotomy for the evaluation problem. We observe that, even though simple path semantics is intractable for RPQs in general, it is feasible for the vast majority of RPQs that are used in practice. At the heart of our study on simple paths is a result of independent interest: the two disjoint paths problem in directed graphs is W[1]-hard if parameterized by the length of one of the two paths.
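For the tractable semantics, the usual idea can be sketched as a breadth-first search over pairs of a graph node and an automaton state. The Python sketch below returns one shortest path whose label sequence matches the query. It covers only this easy case, not the simple-path results of the paper. The input encoding (labelled edge triples, an NFA given as a dictionary from `(state, label)` to a set of states) and the function name are assumptions made for illustration.

```python
from collections import deque


def shortest_matching_path(edges, nfa, nfa_start, nfa_accepting, source, target):
    """Return a shortest source-to-target path whose labels the NFA accepts.

    edges: iterable of (u, label, v) triples.
    nfa:   dict mapping (state, label) -> set of next states (assumed encoding).
    Returns a list of edge triples, or None if no matching path exists.
    """
    out = {}
    for u, label, v in edges:
        out.setdefault(u, []).append((label, v))

    start = (source, nfa_start)
    parent = {start: None}  # (node, state) -> (previous pair, label used)
    queue = deque([start])
    while queue:
        node, state = queue.popleft()
        if node == target and state in nfa_accepting:
            path, cur = [], (node, state)
            while parent[cur] is not None:
                prev, label = parent[cur]
                path.append((prev[0], label, cur[0]))
                cur = prev
            return path[::-1]
        # Advance the graph and the automaton together on the same label.
        for label, nxt_node in out.get(node, []):
            for nxt_state in nfa.get((state, label), ()):
                key = (nxt_node, nxt_state)
                if key not in parent:
                    parent[key] = ((node, state), label)
                    queue.append(key)
    return None


# Hypothetical data: the query "knows* worksAt" as a two-state NFA.
EDGES = [("a", "knows", "b"), ("b", "knows", "c"),
         ("c", "worksAt", "y"), ("a", "worksAt", "x")]
QUERY = {(0, "knows"): {0}, (0, "worksAt"): {1}}
```

With this data, `shortest_matching_path(EDGES, QUERY, 0, {1}, "a", "y")` returns the three-edge path through `b` and `c`. The returned path may repeat nodes in general, which is exactly what simple-path semantics forbids.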
On the structural and combinatorial properties in 2-swap word permutation graphs
In this paper, we study the graph induced by the 2-swap
permutation on words with a fixed Parikh vector. A 2-swap is defined as a
pair of positions in a word; applying it yields the word obtained by
exchanging the symbols at those two positions. With these permutations, we
define a graph over a given Parikh vector. Each vertex in this graph
corresponds to a unique word with that Parikh vector, with an edge between
any pair of words if there exists a 2-swap transforming one into the other. We
provide several key combinatorial properties of this graph, including the exact
diameter of this graph, the clique number of the graph, and the relationships
between subgraphs within this graph. Additionally, we show that for every
vertex in the graph, there exists a Hamiltonian path starting at this vertex.
Finally, we provide an algorithm enumerating these paths from a given input
word with a delay of at most […] between outputting
edges, requiring […] preprocessing.
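The graph described here can be explored on small inputs with a short Python sketch. It computes the Parikh vector of a word and the words reachable from it by a single 2-swap, that is, by exchanging the symbols at two positions. Skipping swaps of equal symbols (which would leave the word unchanged) is an assumption of this sketch, as are the function names; it does not implement the enumeration algorithm from the paper.

```python
from collections import Counter
from itertools import combinations


def parikh_vector(word):
    """Return the symbol counts of word as a sorted tuple of (symbol, count)."""
    return tuple(sorted(Counter(word).items()))


def two_swap_neighbours(word):
    """Return the set of distinct words obtained from word by one 2-swap.

    Swapping two equal symbols gives back the same word, so those pairs
    are skipped (assumption: the graph has no self-loops).
    """
    result = set()
    for i, j in combinations(range(len(word)), 2):
        if word[i] != word[j]:
            chars = list(word)
            chars[i], chars[j] = chars[j], chars[i]
            result.add("".join(chars))
    return result
```

For example, `two_swap_neighbours("aab")` is `{"aba", "baa"}`, and both neighbours have the same Parikh vector as `"aab"`, since a swap never changes symbol counts.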