2,004 research outputs found
Graph Interpolation Grammars: a Rule-based Approach to the Incremental Parsing of Natural Languages
Graph Interpolation Grammars are a declarative formalism with an operational
semantics. Their goal is to emulate salient features of the human parser, and
notably incrementality. The parsing process defined by GIGs incrementally
builds a syntactic representation of a sentence as each successive lexeme is
read. A GIG rule specifies a set of parse configurations that trigger its
application and an operation to perform on a matching configuration. Rules are
partly context-sensitive; furthermore, they are reversible, meaning that their
operations can be undone, which allows the parsing process to be
nondeterministic. These two factors confer enough expressive power to the
formalism for parsing natural languages.Comment: 41 pages, Postscript onl
Generalizing input-driven languages: theoretical and practical benefits
Regular languages (RL) are the simplest family in Chomsky's hierarchy. Thanks
to their simplicity they enjoy various nice algebraic and logic properties that
have been successfully exploited in many application fields. Practically all of
their related problems are decidable, so that they support automatic
verification algorithms. Also, they can be recognized in real-time.
Context-free languages (CFL) are another major family well-suited to
formalize programming, natural, and many other classes of languages; their
increased generative power w.r.t. RL, however, causes the loss of several
closure properties and of the decidability of important problems; furthermore
they need complex parsing algorithms. Thus, various subclasses thereof have
been defined with different goals, spanning from efficient, deterministic
parsing to closure properties, logic characterization and automatic
verification techniques.
Among CFL subclasses, so-called structured ones, i.e., those where the
typical tree-structure is visible in the sentences, exhibit many of the
algebraic and logic properties of RL, whereas deterministic CFL have been
thoroughly exploited in compiler construction and other application fields.
After surveying and comparing the main properties of those various language
families, we go back to operator precedence languages (OPL), an old family
through which R. Floyd pioneered deterministic parsing, and we show that they
offer unexpected properties in two fields so far investigated in totally
independent ways: they enable parsing parallelization in a more effective way
than traditional sequential parsers, and exhibit the same algebraic and logic
properties so far obtained only for less expressive language families
Comparing and evaluating extended Lambek calculi
Lambeks Syntactic Calculus, commonly referred to as the Lambek calculus, was
innovative in many ways, notably as a precursor of linear logic. But it also
showed that we could treat our grammatical framework as a logic (as opposed to
a logical theory). However, though it was successful in giving at least a basic
treatment of many linguistic phenomena, it was also clear that a slightly more
expressive logical calculus was needed for many other cases. Therefore, many
extensions and variants of the Lambek calculus have been proposed, since the
eighties and up until the present day. As a result, there is now a large class
of calculi, each with its own empirical successes and theoretical results, but
also each with its own logical primitives. This raises the question: how do we
compare and evaluate these different logical formalisms? To answer this
question, I present two unifying frameworks for these extended Lambek calculi.
Both are proof net calculi with graph contraction criteria. The first calculus
is a very general system: you specify the structure of your sequents and it
gives you the connectives and contractions which correspond to it. The calculus
can be extended with structural rules, which translate directly into graph
rewrite rules. The second calculus is first-order (multiplicative
intuitionistic) linear logic, which turns out to have several other,
independently proposed extensions of the Lambek calculus as fragments. I will
illustrate the use of each calculus in building bridges between analyses
proposed in different frameworks, in highlighting differences and in helping to
identify problems.Comment: Empirical advances in categorial grammars, Aug 2015, Barcelona,
Spain. 201
Towards Zero-Overhead Disambiguation of Deep Priority Conflicts
**Context** Context-free grammars are widely used for language prototyping
and implementation. They allow formalizing the syntax of domain-specific or
general-purpose programming languages concisely and declaratively. However, the
natural and concise way of writing a context-free grammar is often ambiguous.
Therefore, grammar formalisms support extensions in the form of *declarative
disambiguation rules* to specify operator precedence and associativity, solving
ambiguities that are caused by the subset of the grammar that corresponds to
expressions.
**Inquiry** Implementing support for declarative disambiguation within a
parser typically comes with one or more of the following limitations in
practice: a lack of parsing performance, or a lack of modularity (i.e.,
disallowing the composition of grammar fragments of potentially different
languages). The latter subject is generally addressed by scannerless
generalized parsers. We aim to equip scannerless generalized parsers with novel
disambiguation methods that are inherently performant, without compromising the
concerns of modularity and language composition.
**Approach** In this paper, we present a novel low-overhead implementation
technique for disambiguating deep associativity and priority conflicts in
scannerless generalized parsers with lightweight data-dependency.
**Knowledge** Ambiguities with respect to operator precedence and
associativity arise from combining the various operators of a language. While
*shallow conflicts* can be resolved efficiently by one-level tree patterns,
*deep conflicts* require more elaborate techniques, because they can occur
arbitrarily nested in a tree. Current state-of-the-art approaches to solving
deep priority conflicts come with a severe performance overhead.
**Grounding** We evaluated our new approach against state-of-the-art
declarative disambiguation mechanisms. By parsing a corpus of popular
open-source repositories written in Java and OCaml, we found that our approach
yields speedups of up to 1.73x over a grammar rewriting technique when parsing
programs with deep priority conflicts--with a modest overhead of 1-2 % when
parsing programs without deep conflicts.
**Importance** A recent empirical study shows that deep priority conflicts
are indeed wide-spread in real-world programs. The study shows that in a corpus
of popular OCaml projects on Github, up to 17 % of the source files contain
deep priority conflicts. However, there is no solution in the literature that
addresses efficient disambiguation of deep priority conflicts, with support for
modular and composable syntax definitions
- …