630 research outputs found
Left Recursion in Parsing Expression Grammars
Parsing Expression Grammars (PEGs) are a formalism that can describe all
deterministic context-free languages through a set of rules that specify a
top-down parser for some language. PEGs are easy to use, and there are
efficient implementations of PEG libraries in several programming languages.
A frequently missed feature of PEGs is left recursion, which is commonly used
in Context-Free Grammars (CFGs) to encode left-associative operations. We
present a simple conservative extension to the semantics of PEGs that gives
useful meaning to direct and indirect left-recursive rules, and show that our
extensions make it easy to express left-recursive idioms from CFGs in PEGs,
with similar results. We prove the conservativeness of these extensions, and
also prove that they work with any left-recursive PEG.
PEGs can also be compiled to programs in a low-level parsing machine. We
present an extension to the semantics of the operations of this parsing machine
that let it interpret left-recursive PEGs, and prove that this extension is
correct with regards to our semantics for left-recursive PEGs.Comment: Extended version of the paper "Left Recursion in Parsing Expression
Grammars", that was published on 2012 Brazilian Symposium on Programming
Language
On the Relation between Context-Free Grammars and Parsing Expression Grammars
Context-Free Grammars (CFGs) and Parsing Expression Grammars (PEGs) have
several similarities and a few differences in both their syntax and semantics,
but they are usually presented through formalisms that hinder a proper
comparison. In this paper we present a new formalism for CFGs that highlights
the similarities and differences between them. The new formalism borrows from
PEGs the use of parsing expressions and the recognition-based semantics. We
show how one way of removing non-determinism from this formalism yields a
formalism with the semantics of PEGs. We also prove, based on these new
formalisms, how LL(1) grammars define the same language whether interpreted as
CFGs or as PEGs, and also show how strong-LL(k), right-linear, and LL-regular
grammars have simple language-preserving translations from CFGs to PEGs
TRX: A Formally Verified Parser Interpreter
Parsing is an important problem in computer science and yet surprisingly
little attention has been devoted to its formal verification. In this paper, we
present TRX: a parser interpreter formally developed in the proof assistant
Coq, capable of producing formally correct parsers. We are using parsing
expression grammars (PEGs), a formalism essentially representing recursive
descent parsing, which we consider an attractive alternative to context-free
grammars (CFGs). From this formalization we can extract a parser for an
arbitrary PEG grammar with the warranty of total correctness, i.e., the
resulting parser is terminating and correct with respect to its grammar and the
semantics of PEGs; both properties formally proven in Coq.Comment: 26 pages, LMC
Parsing Expression Grammars Made Practical
Parsing Expression Grammars (PEGs) define languages by specifying
recursive-descent parser that recognises them. The PEG formalism exhibits
desirable properties, such as closure under composition, built-in
disambiguation, unification of syntactic and lexical concerns, and closely
matching programmer intuition. Unfortunately, state of the art PEG parsers
struggle with left-recursive grammar rules, which are not supported by the
original definition of the formalism and can lead to infinite recursion under
naive implementations. Likewise, support for associativity and explicit
precedence is spotty. To remedy these issues, we introduce Autumn, a general
purpose PEG library that supports left-recursion, left and right associativity
and precedence rules, and does so efficiently. Furthermore, we identify infix
and postfix operators as a major source of inefficiency in left-recursive PEG
parsers and show how to tackle this problem. We also explore the extensibility
of the PEG paradigm by showing how one can easily introduce new parsing
operators and how our parser accommodates custom memoization and error handling
strategies. We compare our parser to both state of the art and battle-tested
PEG and CFG parsers, such as Rats!, Parboiled and ANTLR.Comment: "Proceedings of the International Conference on Software Language
Engineering (SLE 2015)" - 167-172 (ISBN : 978-1-4503-3686-4
Capturing CFLs with Tree Adjoining Grammars
We define a decidable class of TAGs that is strongly equivalent to CFGs and
is cubic-time parsable. This class serves to lexicalize CFGs in the same manner
as the LCFGs of Schabes and Waters but with considerably less restriction on
the form of the grammars. The class provides a normal form for TAGs that
generate local sets in much the same way that regular grammars provide a normal
form for CFGs that generate regular sets.Comment: 8 pages, 3 figures. To appear in proceedings of ACL'9
Graph Interpolation Grammars as Context-Free Automata
A derivation step in a Graph Interpolation Grammar has the effect of scanning
an input token. This feature, which aims at emulating the incrementality of the
natural parser, restricts the formal power of GIGs. This contrasts with the
fact that the derivation mechanism involves a context-sensitive device similar
to tree adjunction in TAGs. The combined effect of input-driven derivation and
restricted context-sensitiveness would be conceivably unfortunate if it turned
out that Graph Interpolation Languages did not subsume Context Free Languages
while being partially context-sensitive. This report sets about examining
relations between CFGs and GIGs, and shows that GILs are a proper superclass of
CFLs. It also brings out a strong equivalence between CFGs and GIGs for the
class of CFLs. Thus, it lays the basis for meaningfully investigating the
amount of context-sensitiveness supported by GIGs, but leaves this
investigation for further research
Efficient Monitoring of Parametric Context Free Patterns
Recent developments in runtime verification and monitoring show that parametric regular and temporal logic specifications can be efficiently monitored against large programs. However, these logics reduce to ordinary finite automata, limiting their expressivity. For example, neither can specify structured properties that refer to the call stack of the program. While context-free grammars (CFGs) are expressive and well-understood, existing techniques of monitoring CFGs generate massive runtime overhead in real-life applications. This paper shows for the first time that monitoring parametric CFGs is practical (on the order of 10% or lower for average cases, several times faster than the state-of-the-art). We present a monitor synthesis algorithm for CFGs based on an LR(1) parsing algorithm, modified with stack cloning to account for good prefix matching. In addition, a logic-independent mechanism is introduced to support partial matching, allowing patterns to be checked against fragments of execution traces
- …