1,814 research outputs found
A Syntactic Approach to Macro-Grammars for Context-Free Languages
We commence this thesis by setting a backdrop for the work. In [HN05] Herranz and Nogueira introduced the MTP (More Than Parsing) tool. They designed MTP to be an automatic parser generator. Their main contribution was a syntax formalism that avoids the burden of annotated grammars, an inconvenience present in most modern automatic parser generators, while still supplying the parser with quality data structures. The syntax formalism they introduced, GONF (Generalized Object Normal Form), is similar to the well-known BNF formalism for describing context-free grammars, the grammars currently used to build parsers. GONF allows the use of parameterized non-terminals in the description of grammars. However, it was necessary to prove that this extension did not cause the formalism to generate grammars outside the context-free class. GONF's parameterized non-terminals are simply macros, like those in regular programming languages. Grammars with macros have been studied in [Fis68, TN04, TN08]. These works proved that macro-grammars can actually generate context-sensitive languages. They also introduce some attempts at limiting macro-grammars to generate only context-free languages, though the results are not completely satisfactory and present some issues. Based on these works, I present a new formal framework for macro-grammars. I also provide a new characterization of macro-grammars and two different practical restrictions which ensure that a given macro-grammar remains within the boundaries of the context-free class. We can apply these restrictions to GONF or any other notation for macro-grammars.
Automatic acquisition of LFG resources for German - as good as it gets
We present data-driven methods for the acquisition of LFG resources from two German treebanks. We discuss problems specific to semi-free word order languages as well as problems arising from the data structures determined
by the design of the different treebanks. We compare two ways of encoding semi-free word order, as done in the two German treebanks, and argue that the design of the TiGer treebank is more adequate for the acquisition of LFG
resources. Furthermore, we describe an architecture for LFG grammar acquisition for German, based on the two German treebanks, and compare our results with a hand-crafted German LFG grammar.
Developing and applying heterogeneous phylogenetic models with XRate
Modeling sequence evolution on phylogenetic trees is a useful technique in
computational biology. Especially powerful are models which take account of the
heterogeneous nature of sequence evolution according to the "grammar" of the
encoded gene features. However, beyond a modest level of model complexity,
manual coding of models becomes prohibitively labor-intensive. We demonstrate,
via a set of case studies, the new built-in model-prototyping capabilities of
XRate (macros and Scheme extensions). These features allow rapid implementation
of phylogenetic models which would have previously been far more
labor-intensive. XRate's new capabilities for lineage-specific models,
ancestral sequence reconstruction, and improved annotation output are also
discussed. XRate's flexible model-specification capabilities and computational
efficiency make it well-suited to developing and prototyping phylogenetic
grammar models. XRate is available as part of the DART software package:
http://biowiki.org/DART . Comment: 34 pages, 3 figures, glossary of XRate model terminology
Regularly Controlled Bidirectional Linear Basic Grammars
We investigate the bidirectional application of grammar productions -- i.e., using the productions in the reversed direction too -- to linear basic grammars. As in the case of regularly controlled bidirectional context-free grammars (or RCB grammars), we provide bidirectional linear basic grammars with a regular control language over the rules (i.e., productions and their corresponding reductions). Our main result shows that under the so-called RS/B/f-mode of derivation, bidirectionality gives rise to a dramatic increase in generating power compared with (regularly controlled unidirectional) linear basic grammars.
Macro Grammars and Holistic Triggering for Efficient Semantic Parsing
To learn a semantic parser from denotations, a learning algorithm must search
over a combinatorially large space of logical forms for ones consistent with
the annotated denotations. We propose a new online learning algorithm that
searches faster as training progresses. The two key ideas are using macro
grammars to cache the abstract patterns of useful logical forms found thus far,
and holistic triggering to efficiently retrieve the most relevant patterns
based on sentence similarity. On the WikiTableQuestions dataset, we first
expand the search space of an existing model to improve the state-of-the-art
accuracy from 38.7% to 42.7%, and then use macro grammars and holistic
triggering to achieve an 11x speedup and an accuracy of 43.7%. Comment: EMNLP 201
Pantry: A Macro Library for Python
Python lacks a simple way to create custom syntax and constructs that go outside of its own syntax rules. Macros are a paradigm that makes this possible within a language. Macros allow a shorter piece of syntax to expand into a longer set of instructions at compile-time. This gives the capability to evolve the language to fit personal needs.
Pantry implements a hygienic text-substitution macro system for Python. Pantry achieves this by introducing an additional preparsing step that lexes and parses the source code. Pantry proposes a way to simply declare a pattern to be recognized, articulate instructions that replace the pattern, and replace the pattern in the source code. This form of meta-programming allows its users to write their Python code more concisely and present the language in a more natural and intuitive manner.
We validate Pantry’s utility through five use cases inspired by Python Enhancement Proposals (PEPs), which are requests from the Python community for features to be implemented in Python. Pantry fulfills these requests through the composition of macros that perform the new feature.
Automatic acquisition of Spanish LFG resources from the Cast3LB treebank
In this paper, we describe the automatic annotation of the Cast3LB Treebank with LFG f-structures for the subsequent extraction of Spanish probabilistic grammar and lexical resources. We adapt the approach and methodology of Cahill et al. (2004), O’Donovan et al. (2004) and elsewhere for English to Spanish and the Cast3LB treebank encoding. We report on the quality and coverage of the automatic f-structure annotation. Following the pipeline and integrated models of Cahill et al. (2004), we extract wide-coverage
probabilistic LFG approximations and parse unseen Spanish text into f-structures. We also extend Bikel’s (2002) Multilingual Parse Engine to include a Spanish language module. Using the retrained Bikel parser in the pipeline model gives the best results against a manually constructed gold standard (73.20% preds-only f-score). We also extract Spanish lexical resources: 4090 semantic form types with 98 frame types. Subcategorised prepositions and particles are included in the frames.