On the Relation between Context-Free Grammars and Parsing Expression Grammars
Context-Free Grammars (CFGs) and Parsing Expression Grammars (PEGs) have
several similarities and a few differences in both their syntax and semantics,
but they are usually presented through formalisms that hinder a proper
comparison. In this paper we present a new formalism for CFGs that highlights
the similarities and differences between them. The new formalism borrows from
PEGs the use of parsing expressions and the recognition-based semantics. We
show how one way of removing non-determinism from this formalism yields a
formalism with the semantics of PEGs. We also prove, based on these new
formalisms, how LL(1) grammars define the same language whether interpreted as
CFGs or as PEGs, and also show how strong-LL(k), right-linear, and LL-regular
grammars have simple language-preserving translations from CFGs to PEGs.
A Reference Interpreter for the Graph Programming Language GP 2
GP 2 is an experimental programming language for computing by graph
transformation. An initial interpreter for GP 2, written in the functional
language Haskell, provides a concise and simply structured reference
implementation. Despite its simplicity, the performance of the interpreter is
sufficient for the comparative investigation of a range of test programs. It
also provides a platform for the development of more sophisticated
implementations. Comment: In Proceedings GaM 2015, arXiv:1504.0244
Parallel parsing made practical
The property of local parsability allows inputs to be parsed by inspecting only a bounded-length string around the current token. This in turn enables the construction of a scalable, data-parallel parsing algorithm, which is presented in this work. Such an algorithm lends itself to automatic generation by a parser generator tool, which was implemented and is also presented here. Furthermore, to complete the framework for parallel input analysis, a parallel scanner can also be combined with the parser. To prove the practicality of a parallel lexing and parsing approach, we report the results of adapting JSON and Lua to a form fit for parallel parsing (i.e. an operator-precedence grammar) through simple grammar changes and scanning transformations. The approach is validated with performance figures from both high-performance and embedded multicore platforms, obtained by analyzing real-world inputs as a test bench. The results show that our approach matches or outperforms production-grade LR parsers in sequential execution, and achieves significant speedups and good scaling on multi-core machines. The work concludes with a broad and critical survey of past work on parallel parsing and of future directions for integration with semantic analysis and incremental parsing.
TRX: A Formally Verified Parser Interpreter
Parsing is an important problem in computer science and yet surprisingly
little attention has been devoted to its formal verification. In this paper, we
present TRX: a parser interpreter formally developed in the proof assistant
Coq, capable of producing formally correct parsers. We are using parsing
expression grammars (PEGs), a formalism essentially representing recursive
descent parsing, which we consider an attractive alternative to context-free
grammars (CFGs). From this formalization we can extract a parser for an
arbitrary PEG grammar with the guarantee of total correctness, i.e., the
resulting parser is terminating and correct with respect to its grammar and the
semantics of PEGs; both properties are formally proven in Coq. Comment: 26 pages, LMC
Stream Processing using Grammars and Regular Expressions
In this dissertation we study regular expression based parsing and the use of
grammatical specifications for the synthesis of fast, streaming
string-processing programs.
In the first part we develop two linear-time algorithms for regular
expression based parsing with Perl-style greedy disambiguation. The first
algorithm operates in two passes in a semi-streaming fashion, using a constant
amount of working memory and an auxiliary tape storage which is written in the
first pass and consumed by the second. The second algorithm is a single-pass
and optimally streaming algorithm which outputs as much of the parse tree as is
semantically possible based on the input prefix read so far, and resorts to
buffering as many symbols as is required to resolve the next choice. Optimality
is obtained by performing a PSPACE-complete pre-analysis on the regular
expression.
In the second part we present Kleenex, a language for expressing
high-performance streaming string processing programs as regular grammars with
embedded semantic actions, and its compilation to streaming string transducers
with worst-case linear-time performance. Its underlying theory is based on
transducer decomposition into oracle and action machines, and a finite-state
specialization of the streaming parsing algorithm presented in the first part.
In the second part we also develop a new linear-time streaming parsing
algorithm for parsing expression grammars (PEG) which generalizes the regular
grammars of Kleenex. The algorithm is based on a bottom-up tabulation algorithm
reformulated using least fixed points and evaluated using an instance of the
chaotic iteration scheme by Cousot and Cousot.
Applying Software Engineering Techniques to Parser Design: The Development of a C# Parser
In this paper we describe the development of a parser for the C# programming language. We outline the development process used,
detail its application to the development of a C# parser and present a number of metrics that describe the parser’s evolution. This
paper presents and reinforces an argument for the application of software engineering techniques in the area of parser design. The
development of a parser for the C# programming language is in itself important to software engineering, since parsers form the basis
for tools such as metrics generators, refactoring tools, pretty-printers and reverse engineering tools.
Verifying context-sensitive treebanks and heuristic parses in polynomial time
Proceedings of the 17th Nordic Conference of Computational Linguistics
NODALIDA 2009.
Editors: Kristiina Jokinen and Eckhard Bick.
NEALT Proceedings Series, Vol. 4 (2009), 190-197.
© 2009 The editors and contributors.
Published by Northern European Association for Language Technology (NEALT)
http://omilia.uio.no/nealt
Electronically published at Tartu University Library (Estonia)
http://hdl.handle.net/10062/9206
Best-First Surface Realization
Current work in surface realization concentrates on the use of general,
abstract algorithms that interpret large, reversible grammars. Little
attention has been paid so far to the many small and simple applications that
require coverage of a small sublanguage at different degrees of sophistication.
The system TG/2 described in this paper can be smoothly integrated with deep
generation processes; it integrates canned text, templates, and context-free
rules into a single formalism; it allows for both textual and tabular output;
and it can be parameterized according to linguistic preferences. These features
are based on suitably restricted production system techniques and on a generic
backtracking regime. Comment: 10 pages, LaTeX source, one EPS figur