65 research outputs found
Monadic compositional parsing with context using Maltese as a case study
Combinator-based parsing using functional programming provides an elegant, and compositional approach to parser design and development. By its very nature, sensitivity to context usually fails to be properly addressed in these approaches. We identify two techniques to deal with context compositionally, particularly suited for natural language parsing. As case studies, we present parsers for Maltese definite nouns and conjugated verbs of the first form.peer-reviewe
Parsing for agile modeling
Agile modeling refers to a set of methods that allow for a quick initial development of an importer and its further refinement. These requirements are not met simultaneously by the current parsing technology. Problems with parsing became a bottleneck in our research of agile modeling.
In this thesis we introduce a novel approach to specify and build parsers. Our approach allows for expressive, tolerant and composable parsers without sacrificing performance. The approach is based on a context-sensitive extension of parsing expression grammars that allows a grammar engineer to specify complex language restrictions. To insure high parsing performance we automatically analyze a grammar definition and choose different parsing strategies for different parts of the grammar.
We show that context-sensitive parsing expression grammars allow for highly composable, tolerant and variable-grained parsers that can be easily refined. Different parsing strategies significantly insure high-performance of parsers without sacrificing expressiveness of the underlying grammars
LL(1) Parsing with Derivatives and Zippers
In this paper, we present an efficient, functional, and formally verified
parsing algorithm for LL(1) context-free expressions based on the concept of
derivatives of formal languages. Parsing with derivatives is an elegant parsing
technique, which, in the general case, suffers from cubic worst-case time
complexity and slow performance in practice. We specialise the parsing with
derivatives algorithm to LL(1) context-free expressions, where alternatives can
be chosen given a single token of lookahead. We formalise the notion of LL(1)
expressions and show how to efficiently check the LL(1) property. Next, we
present a novel linear-time parsing with derivatives algorithm for LL(1)
expressions operating on a zipper-inspired data structure. We prove the
algorithm correct in Coq and present an implementation as a parser combinators
framework in Scala, with enumeration and pretty printing capabilities.Comment: Appeared at PLDI'20 under the title "Zippy LL(1) Parsing with
Derivatives
Constructing applicative functors
Applicative functors define an interface to computation that is more general, and correspondingly weaker, than that of monads. First used in parser libraries, they are now seeing a wide range of applications. This paper sets out to explore the space of non-monadic applicative functors useful in programming. We work with a generalization, lax monoidal functors, and consider several methods of constructing useful functors of this type, just as transformers are used to construct computational monads. For example, coends, familiar to functional programmers as existential types, yield a range of useful applicative functors, including left Kan extensions. Other constructions are final fixed points, a limited sum construction, and a generalization of the semi-direct product of monoids. Implementations in Haskell are included where possible
A Typed, Algebraic Approach to Parsing
In this paper, we recall the definition of the context-free expressions (or µ-regular expressions), an algebraic presentation of the context-free languages. Then, we define a core type system for the context-free expressions which gives a compositional criterion for identifying those context-free expressions which can be parsed unambiguously by predictive algorithms in the style of recursive descent or LL(1).
Next, we show how these typed grammar expressions can be used to derive a parser combinator library which both guarantees linear-time parsing with no backtracking and single-token lookahead, and which respects the natural denotational semantics of context-free expressions. Finally, we show how to exploit the type information to write a staged version of this library, which produces dramatic increases in performance, even outperforming code generated by the standard parser generator tool ocamlyacc
Syntax Error Recovery in Parsing Expression Grammars
Parsing Expression Grammars (PEGs) are a formalism used to describe top-down
parsers with backtracking. As PEGs do not provide a good error recovery
mechanism, PEG-based parsers usually do not recover from syntax errors in the
input, or recover from syntax errors using ad-hoc, implementation-specific
features. The lack of proper error recovery makes PEG parsers unsuitable for
using with Integrated Development Environments (IDEs), which need to build
syntactic trees even for incomplete, syntactically invalid programs.
We propose a conservative extension, based on PEGs with labeled failures,
that adds a syntax error recovery mechanism for PEGs. This extension associates
recovery expressions to labels, where a label now not only reports a syntax
error but also uses this recovery expression to reach a synchronization point
in the input and resume parsing. We give an operational semantics of PEGs with
this recovery mechanism, and use an implementation based on such semantics to
build a robust parser for the Lua language. We evaluate the effectiveness of
this parser, alone and in comparison with a Lua parser with automatic error
recovery generated by ANTLR, a popular parser generator.Comment: Published on ACM Symposium On Applied Computing 201
- …