6 research outputs found

    Faster scannerless GLR parsing

    Get PDF
    Analysis and renovation of large software portfolios requires syntax analysis of multiple, usually embedded, languages and this is beyond the capabilities of many standard parsing techniques. The traditional separation between lexer and parser falls short due to the limitations of tokenization based on regular expressions when handling multiple lexical grammars. In such cases scannerless parsing provides a viable solution. It uses the power of context-free grammars to be able to deal with a wide variety of issues in parsing lexical syntax. However, it comes at the price of less efficiency. The structure of tokens is obtained using a more powerful but more time and memory intensive parsing algorithm. Scannerless grammars are also more non-deterministic than their tokenized counterparts, increasing the burden on the parsing algorithm even further. In this paper we investigate the application of the Right-Nulled Generalized LR parsing algorithm (RNGLR) to scannerless parsing. We adapt the Scannerless Generalized LR parsing and filtering algorithm (SGLR) to implement the optimizations of RNGLR. We present an updated parsing and filtering algorithm, called SRNGLR, and analyze its performance in comparison to SGLR on ambiguous grammars for the programming languages C, Java, Python, SASL, and C++. Measurements show that SRNGLR is on average 33% faster than SGLR, but is 95% faster on the highly ambiguous SASL grammar. For the mainstream languages C, C++, Java and Python the average speedup is 16%

    Evaluating GLR parsing algorithms

    Get PDF
    AbstractWe describe the behaviour of three variants of GLR parsing: (i) Farshi’s original correction to Tomita’s non-general algorithm; (ii) the Right Nulled GLR algorithm which provides a more efficient generalisation of Tomita and (iii) the Binary Right Nulled GLR algorithm, on three types of LR table. We present a guide to the parse-time behaviour of these algorithms which illustrates the inefficiencies in conventional Farshi-style GLR parsing. We also describe the tool GTB (Grammar Tool Box) which provides a platform for comparative studies of parsing algorithms; and use GTB to exercise the three GLR algorithms running with LR(0), SLR(1) and LR(1) tables for ANSI-C, ISO-Pascal and IBM VS-COBOL. We give results showing the size of the structures constructed by these parsers and the amount of searching required during the parse, which abstracts their runtime
    corecore