55 research outputs found

    Practical LR Parser Generation

    Full text link
    Parsing is a fundamental building block in modern compilers, and for industrial programming languages, it is a surprisingly involved task. There are known approaches to generate parsers automatically, but the prevailing consensus is that automatic parser generation is not practical for real programming languages: LR/LALR parsers are considered to be far too restrictive in the grammars they support, and LR parsers are often considered too inefficient in practice. As a result, virtually all modern languages use recursive-descent parsers written by hand, a lengthy and error-prone process that dramatically increases the barrier to new programming language development. In this work we demonstrate that, contrary to the prevailing consensus, we can have the best of both worlds: for a very general, practical class of grammars -- a strict superset of Knuth's canonical LR -- we can generate parsers automatically, and the resulting parser code, as well as the generation procedure itself, is highly efficient. This advance relies on several new ideas, including novel automata optimization procedures; a new grammar transformation ("CPS"); per-symbol attributes; recursive-descent actions; and an extension of canonical LR parsing, which we refer to as XLR, which endows shift/reduce parsers with the power of bounded nondeterministic choice. With these ingredients, we can automatically generate efficient parsers for virtually all programming languages that are intuitively easy to parse -- a claim we support experimentally, by implementing the new algorithms in a new software tool called langcc, and running them on syntax specifications for Golang 1.17.8 and Python 3.9.12. The tool handles both languages automatically, and the generated code, when run on standard codebases, is 1.2x faster than the corresponding hand-written parser for Golang, and 4.3x faster than the CPython parser, respectively

    Object-oriented LR(1) parser generation

    Get PDF
    The LR parser has been around for a long time, and its workings, especially with respect to table compaction and use of the lookahead sets, have puzzled students who are new to the area of study. The aim of this project therefore is to provide an object oriented approach and discoverable algorithm to ease the difficulty of mastering these concepts. This will be accomplished by distributing the table interpreter across objects whose inter-relationships create an analogue to the state table. Hopefully this will provide a greater degree of readability and ease of trace throughout the parser generation process

    Reachability and error diagnosis in LR(1) automata

    Get PDF
    National audienceGiven an LR(1) automaton, what are the states in which an error can be detected? For each such " error state " , what is a minimal input sentence that causes an error in this state? We propose an algorithm that answers these questions. Such an algorithm allows building a collection of pairs of an erroneous input sentence and a diagnostic message, ensuring that this collection covers every error state, and maintaining this property as the grammar evolves. We report on an application of this technique to the CompCert ISO C99 parser, and discuss its strengths and limitations
    • …
    corecore