2 research outputs found
Don't Panic! Better, Fewer, Syntax Errors for LR Parsers
Syntax errors are generally easy to fix for humans, but not for parsers in
general, or for LR parsers in particular. Traditional 'panic mode' error recovery,
though easy to implement and applicable to any grammar, often leads to a
cascading chain of errors that drown out the original. More advanced error
recovery techniques suffer less from this problem but have seen little
practical use because their typical performance was seen as poor, their worst
case unbounded, and the repairs they reported arbitrary. In this paper we
introduce the CPCT+ algorithm, and an implementation of that algorithm, that
address these issues. First, CPCT+ reports the complete set of minimum cost
repair sequences for a given location, allowing programmers to select the one
that best fits their intention. Second, on a corpus of 200,000 real-world
syntactically invalid Java programs, CPCT+ is able to repair 98.37% of files
within a timeout of 0.5s. Finally, CPCT+ uses the complete set of minimum cost
repair sequences to reduce the cascading error problem, where incorrect error
recovery causes further spurious syntax errors to be identified. Across the
test corpus, CPCT+ reports 435,812 error locations to the user, reducing the
cascading error problem substantially relative to the 981,628 error locations
reported by panic mode.
Comment: 32 pages, 18 figures
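To make the idea of a "complete set of minimum cost repair sequences" concrete, here is a minimal sketch. It is not the CPCT+ algorithm and uses no LR parser: it assumes a toy language of balanced parentheses, allows edits anywhere in the input rather than at a single error location, and searches by brute force. The names `is_balanced`, `single_edits`, and `min_cost_repairs` are hypothetical and chosen only for illustration.

```python
from typing import Iterator, List, Optional, Tuple

# A repair is (kind, position, token); a repair sequence is a tuple of repairs.
Repair = Tuple[str, int, str]
RepairSeq = Tuple[Repair, ...]


def is_balanced(tokens: str) -> bool:
    """Toy stand-in for 'the parser accepts this input' (input is only '(' and ')')."""
    depth = 0
    for tok in tokens:
        depth += 1 if tok == "(" else -1
        if depth < 0:
            return False
    return depth == 0


def single_edits(tokens: str) -> Iterator[Tuple[Repair, str]]:
    """Yield every input reachable by deleting or inserting exactly one token."""
    for i in range(len(tokens)):
        yield ("delete", i, tokens[i]), tokens[:i] + tokens[i + 1:]
    for i in range(len(tokens) + 1):
        for tok in "()":
            yield ("insert", i, tok), tokens[:i] + tok + tokens[i:]


def min_cost_repairs(tokens: str, max_cost: int = 3) -> Tuple[Optional[int], List[RepairSeq]]:
    """Return (cost, all repair sequences of that cost), or (None, []) if none is found.

    Each edit costs 1. The search proceeds level by level, so the first level
    containing any accepted input is the minimum cost, and every sequence at
    that level is reported rather than an arbitrary single one.
    """
    frontier: List[Tuple[RepairSeq, str]] = [((), tokens)]
    for cost in range(max_cost + 1):
        found = [seq for seq, text in frontier if is_balanced(text)]
        if found:
            return cost, found
        frontier = [
            (seq + (edit,), new_text)
            for seq, text in frontier
            for edit, new_text in single_edits(text)
        ]
    return None, []


cost, repairs = min_cost_repairs("(()")
for seq in repairs:
    print(cost, seq)
```

For the input `(()`, the cheapest repairs cost one edit, and the sketch lists every one of them (deleting a surplus `(` or inserting a `)`), which is the kind of choice the abstract describes offering to the programmer. The brute-force search grows exponentially with the cost bound, so `max_cost` is kept small here.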
Default Disambiguation for Online Parsers
Since composed grammars are often ambiguous, grammar composition requires a
mechanism for dealing with ambiguity: either ruling it out by using delimiters
(which are awkward to work with), or by using disambiguation operators to
filter a parse forest down to a single parse tree (where, in general, we cannot
be sure that we have covered all possible parse forests). In this paper, we
show that default disambiguation, which is inappropriate for batch parsing,
works well for online parsing, where it can be overridden by the user if
necessary. We extend language boxes -- a delimiter-based algorithm atop
incremental parsing -- in such a way that default disambiguation can
automatically insert, remove, or resize language boxes, leading to the
automatic language boxes algorithm. The nature of the problem means that
default disambiguation cannot always match a user's intention. However, our
experimental evaluation shows that automatic language boxes behave acceptably
in 98.8% of tests involving compositions of real-world programming languages.
Comment: 14 pages, 6 tables, 8 figures. Note: This reverts this paper back to
v1 (which was accidentally replaced with a different paper