13 research outputs found

    Parikh's Theorem: A simple and direct automaton construction

    Full text link
    Parikh's theorem states that the Parikh image of a context-free language is semilinear or, equivalently, that every context-free language has the same Parikh image as some regular language. We present a very simple construction that, given a context-free grammar, produces a finite automaton recognizing such a regular language.Comment: 12 pages, 3 figure

    Latent Tree Language Model

    Full text link
    In this paper we introduce Latent Tree Language Model (LTLM), a novel approach to language modeling that encodes syntax and semantics of a given sentence as a tree of word roles. The learning phase iteratively updates the trees by moving nodes according to Gibbs sampling. We introduce two algorithms to infer a tree for a given sentence. The first one is based on Gibbs sampling. It is fast, but does not guarantee to find the most probable tree. The second one is based on dynamic programming. It is slower, but guarantees to find the most probable tree. We provide comparison of both algorithms. We combine LTLM with 4-gram Modified Kneser-Ney language model via linear interpolation. Our experiments with English and Czech corpora show significant perplexity reductions (up to 46% for English and 49% for Czech) compared with standalone 4-gram Modified Kneser-Ney language model.Comment: Accepted to EMNLP 201

    Finite Automata for the Sub- and Superword Closure of CFLs: Descriptional and Computational Complexity

    Full text link
    We answer two open questions by (Gruber, Holzer, Kutrib, 2009) on the state-complexity of representing sub- or superword closures of context-free grammars (CFGs): (1) We prove a (tight) upper bound of 2O(n)2^{\mathcal{O}(n)} on the size of nondeterministic finite automata (NFAs) representing the subword closure of a CFG of size nn. (2) We present a family of CFGs for which the minimal deterministic finite automata representing their subword closure matches the upper-bound of 22O(n)2^{2^{\mathcal{O}(n)}} following from (1). Furthermore, we prove that the inequivalence problem for NFAs representing sub- or superword-closed languages is only NP-complete as opposed to PSPACE-complete for general NFAs. Finally, we extend our results into an approximation method to attack inequivalence problems for CFGs

    Approximating Petri Net Reachability Along Context-free Traces

    Get PDF
    We investigate the problem asking whether the intersection of a context-free language (CFL) and a Petri net language (PNL) is empty. Our contribution to solve this long-standing problem which relates, for instance, to the reachability analysis of recursive programs over unbounded data domain, is to identify a class of CFLs called the finite-index CFLs for which the problem is decidable. The k-index approximation of a CFL can be obtained by discarding all the words that cannot be derived within a budget k on the number of occurrences of non-terminals. A finite-index CFL is thus a CFL which coincides with its k-index approximation for some k. We decide whether the intersection of a finite-index CFL and a PNL is empty by reducing it to the reachability problem of Petri nets with weak inhibitor arcs, a class of systems with infinitely many states for which reachability is known to be decidable. Conversely, we show that the reachability problem for a Petri net with weak inhibitor arcs reduces to the emptiness problem of a finite-index CFL intersected with a PNL.Comment: 16 page

    Certified Context-Free Parsing: A formalisation of Valiant's Algorithm in Agda

    Get PDF
    Valiant (1975) has developed an algorithm for recognition of context free languages. As of today, it remains the algorithm with the best asymptotic complexity for this purpose. In this paper, we present an algebraic specification, implementation, and proof of correctness of a generalisation of Valiant's algorithm. The generalisation can be used for recognition, parsing or generic calculation of the transitive closure of upper triangular matrices. The proof is certified by the Agda proof assistant. The certification is representative of state-of-the-art methods for specification and proofs in proof assistants based on type-theory. As such, this paper can be read as a tutorial for the Agda system

    Interprocedural Reachability for Flat Integer Programs

    Full text link
    We study programs with integer data, procedure calls and arbitrary call graphs. We show that, whenever the guards and updates are given by octagonal relations, the reachability problem along control flow paths within some language w1* ... wd* over program statements is decidable in Nexptime. To achieve this upper bound, we combine a program transformation into the same class of programs but without procedures, with an Np-completeness result for the reachability problem of procedure-less programs. Besides the program, the expression w1* ... wd* is also mapped onto an expression of a similar form but this time over the transformed program statements. Several arguments involving context-free grammars and their generative process enable us to give tight bounds on the size of the resulting expression. The currently existing gap between Np-hard and Nexptime can be closed to Np-complete when a certain parameter of the analysis is assumed to be constant.Comment: 38 pages, 1 figur
    corecore