61 research outputs found

    flap: A Deterministic Parser with Fused Lexing

    Full text link
    Lexers and parsers are typically defined separately and connected by a token stream. This separate definition is important for modularity and reduces the potential for parsing ambiguity. However, materializing tokens as data structures and case-switching on tokens comes with a cost. We show how to fuse separately-defined lexers and parsers, drastically improving performance without compromising modularity or increasing ambiguity. We propose a deterministic variant of Greibach Normal Form that ensures deterministic parsing with a single token of lookahead and makes fusion strikingly simple, and prove that normalizing context free expressions into the deterministic normal form is semantics-preserving. Our staged parser combinator library, flap, provides a standard interface, but generates specialized token-free code that runs two to six times faster than ocamlyacc on a range of benchmarks.Comment: PLDI 2023 with appendi

    Turchin's Relation for Call-by-Name Computations: A Formal Approach

    Full text link
    Supercompilation is a program transformation technique that was first described by V. F. Turchin in the 1970s. In supercompilation, Turchin's relation as a similarity relation on call-stack configurations is used both for call-by-value and call-by-name semantics to terminate unfolding of the program being transformed. In this paper, we give a formal grammar model of call-by-name stack behaviour. We classify the model in terms of the Chomsky hierarchy and then formally prove that Turchin's relation can terminate all computations generated by the model.Comment: In Proceedings VPT 2016, arXiv:1607.0183

    Prime normal form and equivalence of simple grammars

    Get PDF
    AbstractA prefix-free language is prime if it cannot be decomposed into a concatenation of two prefix-free languages. We show that we can check in polynomial time if a language generated by a simple context-free grammar is prime. Our algorithm computes a canonical representation of a simple language, converting its arbitrary simple grammar into prime normal form (PNF); a simple grammar is in PNF if all its nonterminals define primes. We also improve the complexity of testing the equivalence of simple grammars. The best previously known algorithm for this problem worked in O(n13) time. We improve it to O(n7log2n) and O(n5polylogv) time, where n is the total size of the grammars involved, and v is the length of a shortest string derivable from a nonterminal, maximized over all nonterminals

    Probabilistic Parsing Strategies

    Full text link
    We present new results on the relation between purely symbolic context-free parsing strategies and their probabilistic counter-parts. Such parsing strategies are seen as constructions of push-down devices from grammars. We show that preservation of probability distribution is possible under two conditions, viz. the correct-prefix property and the property of strong predictiveness. These results generalize existing results in the literature that were obtained by considering parsing strategies in isolation. From our general results we also derive negative results on so-called generalized LR parsing.Comment: 36 pages, 1 figur

    Context-Free Grammars: Covers, Normal Forms, and Parsing

    Get PDF

    Separation of Test-Free Propositional Dynamic Logics over Context-Free Languages

    Full text link
    For a class L of languages let PDL[L] be an extension of Propositional Dynamic Logic which allows programs to be in a language of L rather than just to be regular. If L contains a non-regular language, PDL[L] can express non-regular properties, in contrast to pure PDL. For regular, visibly pushdown and deterministic context-free languages, the separation of the respective PDLs can be proven by automata-theoretic techniques. However, these techniques introduce non-determinism on the automata side. As non-determinism is also the difference between DCFL and CFL, these techniques seem to be inappropriate to separate PDL[DCFL] from PDL[CFL]. Nevertheless, this separation is shown but for programs without test operators.Comment: In Proceedings GandALF 2011, arXiv:1106.081

    A formalisation of the theory of context-free languages in higher order logic

    No full text
    We present a formalisation of the theory of context-free languages using the HOL4 theorem prover. The formalisation of this theory is not only interesting in its own right, but also gives insight into the kind of manipulations required to port a pen-and-paper proof to a theorem prover. The mechanisation proves to be an ideal case study of how intuitive textbook proofs can blow up in size and complexity, and how details from the textbook can change during formalisation. The mechanised theory provides the groundwork for our subsequent results about SLR parser generation. The theorems, even though well-established in the field, are interesting for the way they have to be “reproven” in a theorem prover. Proofs must be recast to be concrete enough for the prover: patching deductive gaps which are relatively easily grasped in a text proof, but beyond the automatic capabilities of contemporary tools. The library of proofs, techniques and notations developed here provides a basis from which further work on verified language theory can proceed at a quickened pace. We have mechanised classical results involving context-free grammars and pushdown automata. These include but are not limited to the equivalence between those two formalisms, the normalisation of CFGs, and the pumping lemma for proving a language is not context-free. As an application of this theory, we describe the verification of SLR parsing. Among the various properties proven about the parser we show, in particular, soundness: if the parser results in a parse tree on a given input, then the parse tree is valid with respect to the grammar, and the leaves of the parse tree match the input; and completeness: if the input belongs in the language of the grammar then the parser constructs the correct parse tree for the input with respect to the grammar. In addition, we develop a version of the algorithm that is executable by automatic translation from HOL to SML. This alternative version of the algorithm requires some interesting termination proofs. We conclude with a discussion of the issues we faced while mechanising pen-and-paper proofs. Carefully written formal proofs are regarded as rigorous for the audience they target. But when such proofs are implemented in a theorem prover, the level of detail required increases dramatically. We provide a discussion and a broad categorisation of the causes that give rise to this

    Context-Free Grammars: Covers, Normal Forms, and Parsing

    Get PDF
    This monograph develops a theory of grammatical covers, normal forms and parsing. Covers, formally defined in 1969, describe a relation between the sets of parses of two context-free grammars. If this relation exists then in a formal model of parsing it is possible to have, except for the output, for both grammars the same parser. Questions concerning the possibility to cover a certain grammar with grammars that conform to some requirements on the productions or the derivations will be raised and answered. Answers to these cover problems will be obtained by introducing algorithms that describe a transformation of an input grammar into an output grammar which satisfies the requirements. The main emphasis in this monograph is on transformations of context-free grammars to context-free grammars in some normal form. However, not only transformations of this kind will be discussed, but also transformations which yield grammars which have useful parsing properties
    corecore