Search CORE

2,776 research outputs found

Stream Processing using Grammars and Regular Expressions

Author: Rasmussen Ulrik Terp
Publication venue
Publication date: 01/01/2016
Field of study

In this dissertation we study regular expression based parsing and the use of grammatical specifications for the synthesis of fast, streaming string-processing programs. In the first part we develop two linear-time algorithms for regular expression based parsing with Perl-style greedy disambiguation. The first algorithm operates in two passes in a semi-streaming fashion, using a constant amount of working memory and an auxiliary tape storage which is written in the first pass and consumed by the second. The second algorithm is a single-pass and optimally streaming algorithm which outputs as much of the parse tree as is semantically possible based on the input prefix read so far, and resorts to buffering as many symbols as is required to resolve the next choice. Optimality is obtained by performing a PSPACE-complete pre-analysis on the regular expression. In the second part we present Kleenex, a language for expressing high-performance streaming string processing programs as regular grammars with embedded semantic actions, and its compilation to streaming string transducers with worst-case linear-time performance. Its underlying theory is based on transducer decomposition into oracle and action machines, and a finite-state specialization of the streaming parsing algorithm presented in the first part. In the second part we also develop a new linear-time streaming parsing algorithm for parsing expression grammars (PEG) which generalizes the regular grammars of Kleenex. The algorithm is based on a bottom-up tabulation algorithm reformulated using least fixed points and evaluated using an instance of the chaotic iteration scheme by Cousot and Cousot

arXiv.org e-Print Archive

Copenhagen University Research Information System

Pattern matching in compilers

Author: Bílka Ondřej
Publication venue
Publication date: 01/01/2012
Field of study

In this thesis we develop tools for effective and flexible pattern matching. We introduce a new pattern matching system called amethyst. Amethyst is not only a generator of parsers of programming languages, but can also serve as an alternative to tools for matching regular expressions. Our framework also produces dynamic parsers. Its intended use is in the context of IDE (accurate syntax highlighting and error detection on the fly). Amethyst offers pattern matching of general data structures. This makes it a useful tool for implementing compiler optimizations such as constant folding, instruction scheduling, and dataflow analysis in general. The parsers produced are essentially top-down parsers. Linear time complexity is obtained by introducing the novel notion of structured grammars and regularized regular expressions. Amethyst uses techniques known from compiler optimizations to produce effective parsers.Comment: master thesi

arXiv.org e-Print Archive

CU Digital Repository

National Repository of Grey Literature

A one-pass valency-oriented chunker for German

Author: Barbaresi Adrien
Publication venue: HAL CCSD
Publication date: 07/12/2013
Field of study

International audienceNon-finite state parsers provide fine-grained information. However, they are computationally demanding. Therefore, it is interesting to see how far a shallow parsing approach is able to go. In a pattern-based matching operation, the transducer described here consists of POS-tags using regular expressions that take advantage of the characteristics of German grammar. The process aims at finding linguistically relevant phrases with a good precision, which enables in turn an estimation of the actual valency of a given verb. The chunker reads its input exactly once instead of using cascades, which greatly benefits computational efficiency. This finite-state chunking approach does not return a tree structure, but rather yields various kinds of linguistic information useful to the language researcher. Possible applications include simulation of text comprehension on the syntactical level, creation of selective benchmarks and failure analysis

HAL-ENS-LYON

Implementation and Optimization of PEG Parsers for Use on FPGAs

Author: Sinha Shikhar
Publication venue: Dartmouth Digital Commons
Publication date: 01/06/2021
Field of study

DARPA’s Guaranteed Architecture for Physical Security (GAPS) project requires a device to provably enforce security policies. As part of a solution that GE and Dartmouth have proposed for the GAPS project, parsers for Parsing Expression Grammars (PEGs) are required to run on a Field-Programmable Gate Array (FPGA). There exist programs, like Pegmatite, which produce PEG parsers written in VHDL, but these parsers have not yet been run on FPGAs. They have been run in simulators where they have been tested for correctness, but they need to be adapted for execution on FPGAs (Lucas et al., 2021). This thesis explores the process of modifying existing VHDL PEG parsers to run on FPGAs and optimizing their performance. We contribute two techniques to achieve performance improvements: (1) exploiting data parallelism, and (2) parsing the input packet as it arrives instead of waiting for the entire packet to be received. We were not able to execute our solution consistently on FPGAs, so we present an analysis of these techniques through simulations

Dartmouth Digital Commons (Dartmouth College)

Parsing for agile modeling

Author: Kurš Jan
Publication venue: Universität Bern
Publication date: 01/01/2016
Field of study

Agile modeling refers to a set of methods that allow for a quick initial development of an importer and its further refinement. These requirements are not met simultaneously by the current parsing technology. Problems with parsing became a bottleneck in our research of agile modeling. In this thesis we introduce a novel approach to specify and build parsers. Our approach allows for expressive, tolerant and composable parsers without sacrificing performance. The approach is based on a context-sensitive extension of parsing expression grammars that allows a grammar engineer to specify complex language restrictions. To insure high parsing performance we automatically analyze a grammar definition and choose different parsing strategies for different parts of the grammar. We show that context-sensitive parsing expression grammars allow for highly composable, tolerant and variable-grained parsers that can be easily refined. Different parsing strategies significantly insure high-performance of parsers without sacrificing expressiveness of the underlying grammars

BORIS Theses