1,814 research outputs found

    A Syntactic Approach to Macro-Grammars for Context-Free Languages

    We commence this thesis by setting a backdrop for the work. In [HN05] Herranz and Nogueira introduced the MTP (More Than Parsing) tool. They designed MTP to be an automatic parser generator. Their main contribution was a syntax formalism that avoids the burden of annotated grammars, an inconvenience present in most modern automatic parser generators, while at the same time supplying the parser with quality data structures. The syntax formalism they introduced, GONF (Generalized Object Normal Form), is similar to the well-known BNF formalism for describing context-free grammars, the grammars used at present to build parsers. GONF allows for the use of parameterized non-terminals in the description of grammars. However, it was necessary to prove that this extension did not cause the formalism to generate grammars outside the context-free class. GONF's parameterized non-terminals are simply macros, like those in regular programming languages. Grammars with macros have been studied in [Fis68, TN04, TN08]. These works proved that macro-grammars can actually generate context-sensitive languages. They also introduce some attempts at limiting macro-grammars to generate only context-free languages, though the results are not completely satisfactory and present some issues. Based on these works, I present a new formal framework for macro-grammars. I also provide a new characterization for macro-grammars and two different practical restrictions which ensure that a given macro-grammar remains within the boundaries of the context-free class. We can apply these restrictions to GONF or any other notation for macro-grammars.

    Automatic acquisition of LFG resources for German - as good as it gets

    We present data-driven methods for the acquisition of LFG resources from two German treebanks. We discuss problems specific to semi-free word order languages as well as problems arising from the data structures determined by the design of the different treebanks. We compare two ways of encoding semi-free word order, as done in the two German treebanks, and argue that the design of the TiGer treebank is more adequate for the acquisition of LFG resources. Furthermore, we describe an architecture for LFG grammar acquisition for German, based on the two German treebanks, and compare our results with a hand-crafted German LFG grammar.

    Developing and applying heterogeneous phylogenetic models with XRate

    Modeling sequence evolution on phylogenetic trees is a useful technique in computational biology. Especially powerful are models which take account of the heterogeneous nature of sequence evolution according to the "grammar" of the encoded gene features. However, beyond a modest level of model complexity, manual coding of models becomes prohibitively labor-intensive. We demonstrate, via a set of case studies, the new built-in model-prototyping capabilities of XRate (macros and Scheme extensions). These features allow rapid implementation of phylogenetic models which would have previously been far more labor-intensive. XRate's new capabilities for lineage-specific models, ancestral sequence reconstruction, and improved annotation output are also discussed. XRate's flexible model-specification capabilities and computational efficiency make it well-suited to developing and prototyping phylogenetic grammar models. XRate is available as part of the DART software package: http://biowiki.org/DART. Comment: 34 pages, 3 figures, glossary of XRate model terminology

    Regularly Controlled Bidirectional Linear Basic Grammars

    We investigate the bidirectional application of grammar productions -- i.e., using the productions in the reversed direction too -- to linear basic grammars. As in the case of regularly controlled bidirectional context-free grammars (or RCB grammars), we provide bidirectional linear basic grammars with a regular control language over the rules (i.e., productions and their corresponding reductions). Our main result shows that under the so-called RS/B/f-mode of derivation, bidirectionality gives rise to a dramatic increase in generating power compared with (regularly controlled unidirectional) linear basic grammars.

    Macro Grammars and Holistic Triggering for Efficient Semantic Parsing

    To learn a semantic parser from denotations, a learning algorithm must search over a combinatorially large space of logical forms for ones consistent with the annotated denotations. We propose a new online learning algorithm that searches faster as training progresses. The two key ideas are using macro grammars to cache the abstract patterns of useful logical forms found thus far, and holistic triggering to efficiently retrieve the most relevant patterns based on sentence similarity. On the WikiTableQuestions dataset, we first expand the search space of an existing model to improve the state-of-the-art accuracy from 38.7% to 42.7%, and then use macro grammars and holistic triggering to achieve an 11x speedup and an accuracy of 43.7%. Comment: EMNLP 201

    Pantry: A Macro Library for Python

    Python lacks a simple way to create custom syntax and constructs that go outside its own syntax rules. Macros are a paradigm that makes this possible within a language: they allow a shorter piece of syntax to expand into a longer set of instructions at compile-time, which gives programmers the capability to evolve the language to fit personal needs. Pantry implements a hygienic text-substitution macro system for Python. Pantry achieves this by introducing an additional preparsing step that lexes and parses the source code. Pantry proposes a way to simply declare a pattern to be recognized, articulate instructions that replace the pattern, and replace the pattern in the source code. This form of meta-programming lets its users write their Python code more concisely and present the language in a more natural and intuitive manner. We validate Pantry’s utility through five use cases inspired by Python Enhancement Proposals (PEPs), which are requests from the Python community for features to be implemented in Python. Pantry fulfills these requests by composing macros that perform the new feature.

    Automatic acquisition of Spanish LFG resources from the Cast3LB treebank

    In this paper, we describe the automatic annotation of the Cast3LB Treebank with LFG f-structures for the subsequent extraction of Spanish probabilistic grammar and lexical resources. We adapt the approach and methodology developed for English by Cahill et al. (2004), O’Donovan et al. (2004) and elsewhere to Spanish and the Cast3LB treebank encoding. We report on the quality and coverage of the automatic f-structure annotation. Following the pipeline and integrated models of Cahill et al. (2004), we extract wide-coverage probabilistic LFG approximations and parse unseen Spanish text into f-structures. We also extend Bikel’s (2002) Multilingual Parse Engine to include a Spanish language module. Using the retrained Bikel parser in the pipeline model gives the best results against a manually constructed gold standard (73.20% preds-only f-score). We also extract Spanish lexical resources: 4090 semantic form types with 98 frame types. Subcategorised prepositions and particles are included in the frames.