11 research outputs found

    On lexicographic enumeration of regular and context-free languages

    We show that the words of a regular language can be enumerated efficiently in lexicographic order. The time needed to generate the next word is O(n) when enumerating words of length n. We also define a class of context-free languages for which efficient enumeration is possible.
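To make the enumeration task concrete, here is a minimal sketch that lists the length-n words accepted by a DFA in lexicographic order. It is not the algorithm from the paper and does not claim its O(n) per-word bound; it is a plain depth-first search that prunes prefixes which cannot be completed. The function name `enumerate_words` and the dictionary encoding of the transition function are assumptions made for illustration.

```python
def enumerate_words(delta, start, accepting, alphabet, n):
    """Yield the length-n words accepted by a DFA, in lexicographic order.

    delta is a hypothetical encoding: delta[state][symbol] -> next state.
    Missing entries are treated as "no transition".
    """
    states = {start} | set(delta) | {t for row in delta.values() for t in row.values()}

    # alive[k] = states from which an accepting state is reachable in exactly k steps.
    alive = [set(accepting)]
    for _ in range(n):
        prev = alive[-1]
        alive.append({q for q in states
                      if any(delta.get(q, {}).get(a) in prev for a in alphabet)})

    def walk(q, prefix):
        remaining = n - len(prefix)
        if remaining == 0:
            yield "".join(prefix)
            return
        # Trying symbols in sorted order makes the output lexicographic.
        for a in sorted(alphabet):
            nxt = delta.get(q, {}).get(a)
            if nxt is not None and nxt in alive[remaining - 1]:
                prefix.append(a)
                yield from walk(nxt, prefix)
                prefix.pop()

    if start in alive[n]:
        yield from walk(start, [])


# Example DFA over {a, b} accepting words with an even number of a's.
even_a = {0: {"a": 1, "b": 0}, 1: {"a": 0, "b": 1}}
print(list(enumerate_words(even_a, 0, {0}, "ab", 3)))
```

Because every prefix explored is guaranteed to extend to an accepted word, the search never backtracks out of a dead end, which is the property that makes a bounded per-word cost plausible.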

    Acta Cybernetica : Volume 13. Number 1.

    Grammars for Document Spanners

    We propose a new grammar-based language for defining information extractors from documents (text) that is built upon the well-studied framework of document spanners for extracting structured data from text. While previously studied formalisms for document spanners are mainly based on regular expressions, we use an extension of context-free grammars, called extraction grammars, to define the new class of context-free spanners. Extraction grammars are simply context-free grammars extended with variables that capture interval positions of the document, namely spans. While regular expressions are efficient for tokenizing and tagging, context-free grammars are also efficient for capturing structural properties. Indeed, we show that context-free spanners are strictly more expressive than their regular counterparts. We reason about the expressive power of our new class and present a pushdown-automata model that captures it. We show that extraction grammars can be evaluated with polynomial data complexity. Nevertheless, as the degree of the polynomial depends on the query, we present an enumeration algorithm for unambiguous extraction grammars that, after quintic preprocessing, outputs the results sequentially, without repetitions, with a constant delay between every two consecutive ones.

    Enumerating Regular Languages with Bounded Delay

    Detecting palindromes, patterns, and borders in regular languages

    Given a language L and a nondeterministic finite automaton M, we consider whether we can determine efficiently (in the size of M) if M accepts at least one word in L, or infinitely many words. Given that M accepts at least one word in L, we consider how long a shortest word can be. The languages L that we examine include the palindromes, the non-palindromes, the k-powers, the non-k-powers, the powers, the non-powers (also called primitive words), the words matching a general pattern, the bordered words, and the unbordered words. Comment: Full version of a paper submitted to LATA 2008. This is a new version with John Loftus added as a co-author and containing new results on unbordered words.

    Evaluation and Enumeration Problems for Regular Path Queries

    Regular path queries (RPQs) are a central component of graph databases. We investigate decision and enumeration problems concerning the evaluation of RPQs under several semantics that have recently been considered: arbitrary paths, shortest paths, and simple paths. Whereas arbitrary and shortest paths can be enumerated in polynomial delay, the situation is much more intricate for simple paths. For instance, already the question of whether a given graph contains a simple path of a certain length has cases with highly non-trivial solutions and cases that are long-standing open problems. We study RPQ evaluation for simple paths from a parameterized complexity perspective and define a class of simple transitive expressions that is prominent in practice and for which we can prove a dichotomy for the evaluation problem. We observe that, even though simple path semantics is intractable for RPQs in general, it is feasible for the vast majority of RPQs that are used in practice. At the heart of our study on simple paths is a result of independent interest: the two disjoint paths problem in directed graphs is W[1]-hard if parameterized by the length of one of the two paths.

    On the structural and combinatorial properties in 2-swap word permutation graphs

    In this paper, we study the graph induced by the 2-swap permutation on words with a fixed Parikh vector. A 2-swap is defined as a pair of positions $s = (i, j)$ where the word $w$ induced by the swap $s$ on $v$ is $v[1] v[2] \dots v[i - 1] v[j] v[i + 1] \dots v[j - 1] v[i] v[j + 1] \dots v[n]$. With these permutations, we define the Configuration Graph, $G(P)$, defined over a given Parikh vector. Each vertex in $G(P)$ corresponds to a unique word with the Parikh vector $P$, with an edge between any pair of words $v$ and $w$ if there exists a swap $s$ such that $v \circ s = w$. We provide several key combinatorial properties of this graph, including the exact diameter of this graph, the clique number of the graph, and the relationships between subgraphs within this graph. Additionally, we show that for every vertex in the graph, there exists a Hamiltonian path starting at this vertex. Finally, we provide an algorithm enumerating these paths from a given input word of length $n$ with a delay of at most $O(\log n)$ between outputting edges, requiring $O(n \log n)$ preprocessing.
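The 2-swap definition above can be illustrated with a short sketch that applies a swap to a word and lists the neighbours of a word in the configuration graph. The function names `apply_swap` and `neighbours` are hypothetical. Skipping swaps of two equal symbols, which would map a word to itself, is an assumption of this sketch rather than something the abstract states. Positions are 1-indexed to match the abstract's notation.

```python
from collections import Counter


def apply_swap(v, i, j):
    """Return the word obtained from v by exchanging positions i and j (1-indexed, i < j)."""
    if not 1 <= i < j <= len(v):
        raise ValueError("expected 1 <= i < j <= len(v)")
    chars = list(v)
    chars[i - 1], chars[j - 1] = chars[j - 1], chars[i - 1]
    return "".join(chars)


def neighbours(v):
    """Return the set of distinct words reachable from v by a single 2-swap.

    Assumption: swaps of two equal symbols are ignored, since they leave v unchanged.
    """
    n = len(v)
    return {apply_swap(v, i, j)
            for i in range(1, n + 1)
            for j in range(i + 1, n + 1)
            if v[i - 1] != v[j - 1]}


word = "aab"
for w in sorted(neighbours(word)):
    # A swap only reorders symbols, so the Parikh vector (symbol counts) is unchanged.
    assert Counter(w) == Counter(word)
    print(w)
```

Since a swap never changes the symbol counts, every neighbour stays inside the same graph $G(P)$, which is why the graph can be defined per Parikh vector.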