15,297 research outputs found

    On lexicographic enumeration of regular and context-free languages

    Get PDF
    We show that it is possible to efficiently enumerate the words of a regular language in lexicographic order. The time needed for generating the next word is O(n) when enumerating words of length n. We also define a class of context-free languages for which efficient enumeration is possible

    Joining Extractions of Regular Expressions

    Get PDF
    Regular expressions with capture variables, also known as "regex formulas," extract relations of spans (interval positions) from text. These relations can be further manipulated via Relational Algebra as studied in the context of document spanners, Fagin et al.'s formal framework for information extraction. We investigate the complexity of querying text by Conjunctive Queries (CQs) and Unions of CQs (UCQs) on top of regex formulas. We show that the lower bounds (NP-completeness and W[1]-hardness) from the relational world also hold in our setting; in particular, hardness hits already single-character text! Yet, the upper bounds from the relational world do not carry over. Unlike the relational world, acyclic CQs, and even gamma-acyclic CQs, are hard to compute. The source of hardness is that it may be intractable to instantiate the relation defined by a regex formula, simply because it has an exponential number of tuples. Yet, we are able to establish general upper bounds. In particular, UCQs can be evaluated with polynomial delay, provided that every CQ has a bounded number of atoms (while unions and projection can be arbitrary). Furthermore, UCQ evaluation is solvable with FPT (Fixed-Parameter Tractable) delay when the parameter is the size of the UCQ

    On the Structure and Complexity of Rational Sets of Regular Languages

    Get PDF
    In a recent thread of papers, we have introduced FQL, a precise specification language for test coverage, and developed the test case generation engine FShell for ANSI C. In essence, an FQL test specification amounts to a set of regular languages, each of which has to be matched by at least one test execution. To describe such sets of regular languages, the FQL semantics uses an automata-theoretic concept known as rational sets of regular languages (RSRLs). RSRLs are automata whose alphabet consists of regular expressions. Thus, the language accepted by the automaton is a set of regular expressions. In this paper, we study RSRLs from a theoretic point of view. More specifically, we analyze RSRL closure properties under common set theoretic operations, and the complexity of membership checking, i.e., whether a regular language is an element of a RSRL. For all questions we investigate both the general case and the case of finite sets of regular languages. Although a few properties are left as open problems, the paper provides a systematic semantic foundation for the test specification language FQL

    Finite Automata for the Sub- and Superword Closure of CFLs: Descriptional and Computational Complexity

    Full text link
    We answer two open questions by (Gruber, Holzer, Kutrib, 2009) on the state-complexity of representing sub- or superword closures of context-free grammars (CFGs): (1) We prove a (tight) upper bound of 2O(n)2^{\mathcal{O}(n)} on the size of nondeterministic finite automata (NFAs) representing the subword closure of a CFG of size nn. (2) We present a family of CFGs for which the minimal deterministic finite automata representing their subword closure matches the upper-bound of 22O(n)2^{2^{\mathcal{O}(n)}} following from (1). Furthermore, we prove that the inequivalence problem for NFAs representing sub- or superword-closed languages is only NP-complete as opposed to PSPACE-complete for general NFAs. Finally, we extend our results into an approximation method to attack inequivalence problems for CFGs

    Coding-theorem Like Behaviour and Emergence of the Universal Distribution from Resource-bounded Algorithmic Probability

    Full text link
    Previously referred to as `miraculous' in the scientific literature because of its powerful properties and its wide application as optimal solution to the problem of induction/inference, (approximations to) Algorithmic Probability (AP) and the associated Universal Distribution are (or should be) of the greatest importance in science. Here we investigate the emergence, the rates of emergence and convergence, and the Coding-theorem like behaviour of AP in Turing-subuniversal models of computation. We investigate empirical distributions of computing models in the Chomsky hierarchy. We introduce measures of algorithmic probability and algorithmic complexity based upon resource-bounded computation, in contrast to previously thoroughly investigated distributions produced from the output distribution of Turing machines. This approach allows for numerical approximations to algorithmic (Kolmogorov-Chaitin) complexity-based estimations at each of the levels of a computational hierarchy. We demonstrate that all these estimations are correlated in rank and that they converge both in rank and values as a function of computational power, despite fundamental differences between computational models. In the context of natural processes that operate below the Turing universal level because of finite resources and physical degradation, the investigation of natural biases stemming from algorithmic rules may shed light on the distribution of outcomes. We show that up to 60\% of the simplicity/complexity bias in distributions produced even by the weakest of the computational models can be accounted for by Algorithmic Probability in its approximation to the Universal Distribution.Comment: 27 pages main text, 39 pages including supplement. Online complexity calculator: http://complexitycalculator.com

    Dynamic programming for graphs on surfaces

    Get PDF
    We provide a framework for the design and analysis of dynamic programming algorithms for surface-embedded graphs on n vertices and branchwidth at most k. Our technique applies to general families of problems where standard dynamic programming runs in 2O(k·log k). Our approach combines tools from topological graph theory and analytic combinatorics.Postprint (updated version
    • …
    corecore