2,909 research outputs found
Parsing Expression Grammars Made Practical
Parsing Expression Grammars (PEGs) define languages by specifying
recursive-descent parser that recognises them. The PEG formalism exhibits
desirable properties, such as closure under composition, built-in
disambiguation, unification of syntactic and lexical concerns, and closely
matching programmer intuition. Unfortunately, state of the art PEG parsers
struggle with left-recursive grammar rules, which are not supported by the
original definition of the formalism and can lead to infinite recursion under
naive implementations. Likewise, support for associativity and explicit
precedence is spotty. To remedy these issues, we introduce Autumn, a general
purpose PEG library that supports left-recursion, left and right associativity
and precedence rules, and does so efficiently. Furthermore, we identify infix
and postfix operators as a major source of inefficiency in left-recursive PEG
parsers and show how to tackle this problem. We also explore the extensibility
of the PEG paradigm by showing how one can easily introduce new parsing
operators and how our parser accommodates custom memoization and error handling
strategies. We compare our parser to both state of the art and battle-tested
PEG and CFG parsers, such as Rats!, Parboiled and ANTLR.Comment: "Proceedings of the International Conference on Software Language
Engineering (SLE 2015)" - 167-172 (ISBN : 978-1-4503-3686-4
Left Recursion in Parsing Expression Grammars
Parsing Expression Grammars (PEGs) are a formalism that can describe all
deterministic context-free languages through a set of rules that specify a
top-down parser for some language. PEGs are easy to use, and there are
efficient implementations of PEG libraries in several programming languages.
A frequently missed feature of PEGs is left recursion, which is commonly used
in Context-Free Grammars (CFGs) to encode left-associative operations. We
present a simple conservative extension to the semantics of PEGs that gives
useful meaning to direct and indirect left-recursive rules, and show that our
extensions make it easy to express left-recursive idioms from CFGs in PEGs,
with similar results. We prove the conservativeness of these extensions, and
also prove that they work with any left-recursive PEG.
PEGs can also be compiled to programs in a low-level parsing machine. We
present an extension to the semantics of the operations of this parsing machine
that let it interpret left-recursive PEGs, and prove that this extension is
correct with regards to our semantics for left-recursive PEGs.Comment: Extended version of the paper "Left Recursion in Parsing Expression
Grammars", that was published on 2012 Brazilian Symposium on Programming
Language
Generalizing input-driven languages: theoretical and practical benefits
Regular languages (RL) are the simplest family in Chomsky's hierarchy. Thanks
to their simplicity they enjoy various nice algebraic and logic properties that
have been successfully exploited in many application fields. Practically all of
their related problems are decidable, so that they support automatic
verification algorithms. Also, they can be recognized in real-time.
Context-free languages (CFL) are another major family well-suited to
formalize programming, natural, and many other classes of languages; their
increased generative power w.r.t. RL, however, causes the loss of several
closure properties and of the decidability of important problems; furthermore
they need complex parsing algorithms. Thus, various subclasses thereof have
been defined with different goals, spanning from efficient, deterministic
parsing to closure properties, logic characterization and automatic
verification techniques.
Among CFL subclasses, so-called structured ones, i.e., those where the
typical tree-structure is visible in the sentences, exhibit many of the
algebraic and logic properties of RL, whereas deterministic CFL have been
thoroughly exploited in compiler construction and other application fields.
After surveying and comparing the main properties of those various language
families, we go back to operator precedence languages (OPL), an old family
through which R. Floyd pioneered deterministic parsing, and we show that they
offer unexpected properties in two fields so far investigated in totally
independent ways: they enable parsing parallelization in a more effective way
than traditional sequential parsers, and exhibit the same algebraic and logic
properties so far obtained only for less expressive language families
Higher-Order Operator Precedence Languages
Floyd's Operator Precedence (OP) languages are a deterministic context-free
family having many desirable properties. They are locally and parallely
parsable, and languages having a compatible structure are closed under Boolean
operations, concatenation and star; they properly include the family of Visibly
Pushdown (or Input Driven) languages. OP languages are based on three relations
between any two consecutive terminal symbols, which assign syntax structure to
words. We extend such relations to k-tuples of consecutive terminal symbols, by
using the model of strictly locally testable regular languages of order k at
least 3. The new corresponding class of Higher-order Operator Precedence
languages (HOP) properly includes the OP languages, and it is still included in
the deterministic (also in reverse) context free family. We prove Boolean
closure for each subfamily of structurally compatible HOP languages. In each
subfamily, the top language is called max-language. We show that such languages
are defined by a simple cancellation rule and we prove several properties, in
particular that max-languages make an infinite hierarchy ordered by parameter
k. HOP languages are a candidate for replacing OP languages in the various
applications where they have have been successful though sometimes too
restrictive.Comment: In Proceedings AFL 2017, arXiv:1708.0622
Simple chain grammars and languages
A subclass of the LR(0)-grammars, the class of simple chain grammars is introduced. Although there exist simple chain grammars which are not LL(k) for any k>0, this new class of grammars is very closely related to the LL(1) and simple LL(1) grammars. In fact it can be shown that every simple chain grammar has an equivalent simple LL(1) grammar. Cover properties for simple chain grammars are investigated and a deterministic pushdown transducer which acts as a right parser for simple chain grammars is presented
- …