2 research outputs found

    Don't Panic! Better, Fewer, Syntax Errors for LR Parsers

    Syntax errors are generally easy to fix for humans, but not for parsers in general nor LR parsers in particular. Traditional 'panic mode' error recovery, though easy to implement and applicable to any grammar, often leads to a cascading chain of errors that drown out the original. More advanced error recovery techniques suffer less from this problem but have seen little practical use because their typical performance was seen as poor, their worst case unbounded, and the repairs they reported arbitrary. In this paper we introduce the CPCT+ algorithm, and an implementation of that algorithm, that address these issues. First, CPCT+ reports the complete set of minimum cost repair sequences for a given location, allowing programmers to select the one that best fits their intention. Second, on a corpus of 200,000 real-world syntactically invalid Java programs, CPCT+ is able to repair 98.37% of files within a timeout of 0.5s. Finally, CPCT+ uses the complete set of minimum cost repair sequences to reduce the cascading error problem, where incorrect error recovery causes further spurious syntax errors to be identified. Across the test corpus, CPCT+ reports 435,812 error locations to the user, reducing the cascading error problem substantially relative to the 981,628 error locations reported by panic mode.
    Comment: 32 pages, 18 figures

    Default Disambiguation for Online Parsers

    Since composed grammars are often ambiguous, grammar composition requires a mechanism for dealing with ambiguity: either ruling it out by using delimiters (which are awkward to work with), or by using disambiguation operators to filter a parse forest down to a single parse tree (where, in general, we cannot be sure that we have covered all possible parse forests). In this paper, we show that default disambiguation, which is inappropriate for batch parsing, works well for online parsing, where it can be overridden by the user if necessary. We extend language boxes -- a delimiter-based algorithm atop incremental parsing -- in such a way that default disambiguation can automatically insert, remove, or resize language boxes, leading to the automatic language boxes algorithm. The nature of the problem means that default disambiguation cannot always match a user's intention. However, our experimental evaluation shows that automatic language boxes behave acceptably in 98.8% of tests involving compositions of real-world programming languages.
    Comment: 14 pages, 6 tables, 8 figures. Note: This reverts this paper back to v1 (which was accidentally replaced with a different paper