1,028 research outputs found
Stream Processing using Grammars and Regular Expressions
In this dissertation we study regular expression based parsing and the use of
grammatical specifications for the synthesis of fast, streaming
string-processing programs.
In the first part we develop two linear-time algorithms for regular
expression based parsing with Perl-style greedy disambiguation. The first
algorithm operates in two passes in a semi-streaming fashion, using a constant
amount of working memory and an auxiliary tape storage which is written in the
first pass and consumed by the second. The second algorithm is a single-pass
and optimally streaming algorithm which outputs as much of the parse tree as is
semantically possible based on the input prefix read so far, and resorts to
buffering as many symbols as is required to resolve the next choice. Optimality
is obtained by performing a PSPACE-complete pre-analysis on the regular
expression.
In the second part we present Kleenex, a language for expressing
high-performance streaming string processing programs as regular grammars with
embedded semantic actions, and its compilation to streaming string transducers
with worst-case linear-time performance. Its underlying theory is based on
transducer decomposition into oracle and action machines, and a finite-state
specialization of the streaming parsing algorithm presented in the first part.
In the second part we also develop a new linear-time streaming parsing
algorithm for parsing expression grammars (PEG) which generalizes the regular
grammars of Kleenex. The algorithm is based on a bottom-up tabulation algorithm
reformulated using least fixed points and evaluated using an instance of the
chaotic iteration scheme by Cousot and Cousot
Cyclic Operator Precedence Grammars for Parallel Parsing
Operator precedence languages (OPL) enjoy the local parsability property,
which essentially means that a code fragment enclosed within a pair of markers
-- playing the role of parentheses -- can be compiled with no knowledge of its
external context. Such a property has been exploited to build parallel
compilers for languages formalized as OPLs. It has been observed, however, that
when the syntax trees of the sentences have a linear substructure, its parsing
must necessarily proceed sequentially making it impossible to split such a
subtree into chunks to be processed in parallel. Such an inconvenience is due
to the fact that so far much literature on OPLs has assumed the hypothesis that
equality precedence relation cannot be cyclic. This hypothesis was motivated by
the need to keep the mathematical notation as simple as possible.
We present an enriched version of operator precedence grammars, called
cyclic, that allows to use a simplified version of regular expressions in the
right hand sides of grammar's rules; for this class of operator precedence
grammars the acyclicity hypothesis of the equality precedence relation is no
more needed to guarantee the algebraic properties of the generated languages.
The expressive power of the cyclic grammars is now fully equivalent to that of
other formalisms defining OPLs such as operator precedence automata, monadic
second order logic and operator precedence expressions. As a result cyclic
operator precedence grammars now produce also unranked syntax trees and
sentences with flat unbounded substructures that can be naturally partitioned
into chunks suitable for parallel parsing.Comment: 23 pages, 8 figures. arXiv admin note: text overlap with
arXiv:2006.0123
Parsing for agile modeling
Agile modeling refers to a set of methods that allow for a quick initial development of an importer and its further refinement. These requirements are not met simultaneously by the current parsing technology. Problems with parsing became a bottleneck in our research of agile modeling.
In this thesis we introduce a novel approach to specify and build parsers. Our approach allows for expressive, tolerant and composable parsers without sacrificing performance. The approach is based on a context-sensitive extension of parsing expression grammars that allows a grammar engineer to specify complex language restrictions. To insure high parsing performance we automatically analyze a grammar definition and choose different parsing strategies for different parts of the grammar.
We show that context-sensitive parsing expression grammars allow for highly composable, tolerant and variable-grained parsers that can be easily refined. Different parsing strategies significantly insure high-performance of parsers without sacrificing expressiveness of the underlying grammars
Tunnel Parsing with counted repetitions
The article describes a new and efficient algorithm for parsing, called Tunnel Parsing, that parses from left to right on the basis of a context-free grammar without left recursion and rules that recognize empty words. The algorithm is applicable mostly for domain-specific languages. In the article, particular attention is paid to the parsing of grammar element repetitions. As a result of the parsing, a statically typed concrete syntax tree is built from top to bottom, that accurately reflects the grammar. The parsing is not done through a recursion, but through an iteration. The Tunnel Parsing algorithm uses the grammars directly without a prior refactoring and is with a linear time complexity for deterministic context-free grammars
Proceedings of the Third Symposium on Programming Languages and Software Tools : Kääriku, Estonia, August 23-24 1993
http://www.ester.ee/record=b1064507*es
Programming Language Techniques for Natural Language Applications
It is easy to imagine machines that can communicate in natural language. Constructing such machines is more difficult. The aim of this thesis is to demonstrate
how declarative grammar formalisms that distinguish between abstract and concrete syntax make it easier to develop natural language applications.
We describe how the type-theorectical grammar formalism Grammatical
Framework (GF) can be used as a high-level language for natural language
applications. By taking advantage of techniques from the field of programming
language implementation, we can use GF grammars to perform portable
and efficient parsing and linearization, generate speech recognition language
models, implement multimodal fusion and fission, generate support code for
abstract syntax transformations, generate dialogue managers, and implement
speech translators and web-based syntax-aware editors.
By generating application components from a declarative grammar, we can
reduce duplicated work, ensure consistency, make it easier to build multilingual
systems, improve linguistic quality, enable re-use across system domains, and
make systems more portable
Fling - A Fluent API Generator
We present the first general and practical solution of the fluent API problem - an algorithm, that given a deterministic language (equivalently, LR(k), k >= 0 language) encodes it in an unbounded parametric polymorphism type system employing only a polynomial number of types. The theoretical result is accompanied by an actual tool Fling - a fluent API compiler-compiler in the venue of YACC, tailored for embedding DSLs in Java
Proceedings
Proceedings of the NODALIDA 2011 Workshop
Constraint Grammar Applications.
Editors: Eckhard Bick, Kristin Hagen, Kaili Müürisep, Trond Trosterud.
NEALT Proceedings Series, Vol. 14 (2011), vi+69 pp.
© 2011 The editors and contributors.
Published by
Northern European Association for Language
Technology (NEALT)
http://omilia.uio.no/nealt .
Electronically published at
Tartu University Library (Estonia)
http://hdl.handle.net/10062/19231
- …