Pattern matching in compilers
In this thesis we develop tools for effective and flexible pattern matching.
We introduce a new pattern matching system called amethyst. Amethyst is not
only a generator of parsers of programming languages, but can also serve as an
alternative to tools for matching regular expressions.
Our framework also produces dynamic parsers. They are intended for use in
IDEs (accurate syntax highlighting and on-the-fly error detection).
Amethyst offers pattern matching of general data structures. This makes it a
useful tool for implementing compiler optimizations such as constant folding,
instruction scheduling, and dataflow analysis in general.
The parsers produced are essentially top-down parsers. Linear time complexity
is obtained by introducing the novel notion of structured grammars and
regularized regular expressions. Amethyst uses techniques known from compiler
optimizations to produce effective parsers. Comment: master thesis
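The abstract above names constant folding as one optimization that can be phrased as pattern matching over a data structure. A minimal sketch of that idea follows; it is plain Python rather than amethyst syntax, and the tuple-based tree shape `(op, left, right)`, the `OPS` table, and the `fold` function are all assumptions made for illustration.

```python
import operator

# Hypothetical expression trees: an int constant, a str variable name,
# or a tuple (op, left, right). This encoding is assumed, not amethyst's.
OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

def fold(expr):
    """Fold every subtree whose two operands are integer constants."""
    if not isinstance(expr, tuple):
        return expr                      # leaf: constant or variable
    op, left, right = expr               # match the shape (op, left, right)
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)      # both operands known: compute now
    return (op, left, right)             # otherwise keep the rebuilt node
```

For example, `fold(('+', ('*', 2, 3), 'x'))` rewrites the constant subtree and leaves the variable in place.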
Left Recursion in Parsing Expression Grammars
Parsing Expression Grammars (PEGs) are a formalism that can describe all
deterministic context-free languages through a set of rules that specify a
top-down parser for some language. PEGs are easy to use, and there are
efficient implementations of PEG libraries in several programming languages.
A frequently missed feature of PEGs is left recursion, which is commonly used
in Context-Free Grammars (CFGs) to encode left-associative operations. We
present a simple conservative extension to the semantics of PEGs that gives
useful meaning to direct and indirect left-recursive rules, and show that our
extensions make it easy to express left-recursive idioms from CFGs in PEGs,
with similar results. We prove the conservativeness of these extensions, and
also prove that they work with any left-recursive PEG.
PEGs can also be compiled to programs in a low-level parsing machine. We
present an extension to the semantics of the operations of this parsing machine
that lets it interpret left-recursive PEGs, and prove that this extension is
correct with regard to our semantics for left-recursive PEGs. Comment: Extended version of the paper "Left Recursion in Parsing Expression
Grammars", which was published at the 2012 Brazilian Symposium on Programming
Languages
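To illustrate why left recursion matters for left-associative operators, here is a minimal sketch of one well-known way to evaluate a directly left-recursive rule in a top-down parser, often called seed growing. It is a swapped-in technique chosen for brevity, not the semantics or the parsing-machine extension the abstract describes. The grammar `E <- E '-' N / N` and all function names are assumptions.

```python
# Hypothetical grammar:  E <- E '-' N / N      N <- [0-9]+
# Results are (tree, end_position) pairs, or None on failure.

def parse_num(text, pos):
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    return (int(text[pos:end]), end) if end > pos else None

def expr_body(text, pos, memo):
    # First alternative: E '-' N (the recursive E reads the current seed).
    left = parse_expr(text, pos, memo)
    if left is not None:
        tree, p = left
        if p < len(text) and text[p] == '-':
            right = parse_num(text, p + 1)
            if right is not None:
                return ('-', tree, right[0]), right[1]
    # Second alternative: N
    return parse_num(text, pos)

def parse_expr(text, pos, memo=None):
    memo = {} if memo is None else memo
    if pos in memo:
        return memo[pos]          # inside the recursion: return the seed
    memo[pos] = None              # initial seed is failure
    while True:
        result = expr_body(text, pos, memo)
        best = memo[pos]
        # Stop once re-evaluating the body no longer consumes more input.
        if result is None or (best is not None and result[1] <= best[1]):
            return best
        memo[pos] = result        # grow the seed and try again
```

Calling `parse_expr("8-3-2", 0)` yields a tree nested to the left, which is the left-associative reading of the input.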
A syntactic language model based on incremental CCG parsing
Syntactically enriched language models (parsers) constitute a promising component in applications such as machine translation and speech recognition. To maintain a useful level of accuracy, existing parsers are non-incremental and must span a combinatorially growing space of possible structures as every input word is processed. This prohibits their incorporation into standard linear-time decoders. In this paper, we present an incremental, linear-time dependency parser based on Combinatory Categorial Grammar (CCG) and classification techniques. We devise a deterministic transform of CCGbank canonical derivations into incremental ones, and train our parser on this data. We discover that a cascaded, incremental version provides an appealing balance between efficiency and accuracy.
Linear Parsing Expression Grammars
PEGs were formalized by Ford in 2004, and have several pragmatic operators
(such as ordered choice and unlimited lookahead) for better expressing modern
programming language syntax. Since these operators are not explicitly defined
in classical formal language theory, it is important, and still challenging,
to characterize PEGs' expressiveness in the context of formal language theory. Since
PEGs are relatively new, several problems remain unsolved. One of these
problems is identifying a subclass of PEGs that is equivalent to DFAs. This
would allow some techniques from the theory of regular grammars to be applied to
PEGs. In this paper, we define Linear PEGs (LPEGs), a subclass of PEGs that is
equivalent to DFAs. Surprisingly, LPEGs are formalized by only excluding some
patterns of recursive nonterminals in PEGs, and include the full set of ordered
choice, unlimited lookahead, and greedy repetition, which are characteristic of
PEGs. Although deciding whether a parsing expression can be converted into a DFA is
undecidable in general, the formalism of LPEGs allows this to be judged
syntactically. Comment: Parsing expression grammars, Boolean finite automata, Packrat parsing
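The three operators the abstract calls characteristic of PEGs (ordered choice, lookahead, and greedy repetition) can be sketched as small recognizer combinators. This is an illustrative sketch only: the names `lit`, `seq`, `choice`, `star`, and `not_` are assumptions, and each recognizer simply returns the end position of a match or `None`.

```python
def lit(s):
    # Match the literal string s at pos.
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None

def seq(*parsers):
    # Match each parser in turn; fail as soon as one fails.
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*parsers):
    # Ordered choice: commit to the first alternative that succeeds.
    def parse(text, pos):
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse

def star(p):
    # Greedy repetition: consume as many matches as possible, never give back.
    def parse(text, pos):
        while True:
            end = p(text, pos)
            if end is None or end == pos:
                return pos
            pos = end
    return parse

def not_(p):
    # Negative lookahead: succeed without consuming input iff p fails.
    return lambda text, pos: pos if p(text, pos) is None else None
```

Under these definitions, `choice(lit("a"), lit("ab"))` never reaches its second alternative on input `"ab"`, and `seq(star(lit("a")), lit("a"))` fails on `"aaa"` because the repetition consumes every `a`. Both behaviours differ from the corresponding regular expressions.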