Nonparametric Bayesian Inference and Efficient Parsing for Tree-adjoining Grammars
In the line of research extending statistical parsing to more expressive grammar formalisms, we demonstrate for the first time the use of tree-adjoining grammars (TAG). We present a Bayesian nonparametric model for estimating a probabilistic TAG from a parsed corpus, along with novel block sampling methods and approximation transformations for TAG that allow efficient parsing. Our work shows performance improvements on the Penn Treebank and finds more compact yet linguistically rich representations of the data, but more importantly provides techniques in grammar transformation and statistical inference that make practical the use of these more expressive systems, thereby enabling further experimentation along these lines.
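The abstract does not spell out the prior. A common choice in Bayesian grammar induction, assumed here rather than taken from the paper, is a Dirichlet process over elementary trees, whose Chinese-restaurant predictive rule scores a tree by how often it has already been used, smoothed by a base distribution. The class name, the string encoding of trees, and the caller-supplied base distribution `base_prob` are all hypothetical.

```python
from collections import Counter


class DPElementaryTreeModel:
    """Chinese-restaurant-process predictive distribution over elementary trees.

    Hypothetical illustration: trees are plain strings, and the base
    distribution P0 is a callable supplied by the caller.
    """

    def __init__(self, alpha, base_prob):
        self.alpha = alpha          # concentration parameter of the Dirichlet process
        self.base_prob = base_prob  # P0: maps a tree to its base probability
        self.counts = Counter()     # how often each elementary tree is currently in use
        self.total = 0              # total number of tree tokens currently in use

    def predictive(self, tree):
        # P(tree | current uses) = (n_tree + alpha * P0(tree)) / (n + alpha)
        return (self.counts[tree] + self.alpha * self.base_prob(tree)) / (self.total + self.alpha)

    def add(self, tree):
        self.counts[tree] += 1
        self.total += 1

    def remove(self, tree):
        # e.g. when a sampler takes a tree out before resampling its site
        self.counts[tree] -= 1
        if self.counts[tree] == 0:
            del self.counts[tree]
        self.total -= 1
```

In a sampler of this kind, resampling a site would `remove` the trees currently used there, score the alternatives with `predictive`, and `add` the chosen ones back; the rich-get-richer counts are what push the model toward reusing a compact set of trees.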
Mild context-sensitivity and tuple-based generalizations of context-free grammar
This paper classifies a family of grammar formalisms that extend context-free grammar by talking about tuples of terminal strings, rather than independently combining single terminal words into larger single phrases. These include a number of well-known formalisms, such as head grammar and linear context-free rewriting systems, but also a new formalism, (simple) literal movement grammar, which strictly extends the previously known formalisms while preserving polynomial time recognizability. The descriptive capacity of simple literal movement grammars is illustrated both formally, through a weak generative capacity argument, and in a more practical sense, by the description of conjunctive cross-serial relative clauses in Dutch. After sketching a complexity result and drawing a number of conclusions from the illustrations, it is then suggested that the notion of mild context-sensitivity currently in use, which depends on the rather loosely defined concept of constant growth, needs a modification to apply sensibly to the illustrated facts; an attempt at such a revision is proposed.
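To make "tuples of terminal strings" concrete, here is a minimal sketch of a hypothetical two-component grammar in LCFRS-style notation, not an example from the paper: A(x1, x2) pairs a^n with c^n, B(y1, y2) pairs b^m with d^m, and the rule S(x1 y1 x2 y2) <- A(x1, x2), B(y1, y2) interleaves their components. The result, {a^n b^m c^n d^m}, is the usual abstraction of cross-serial dependencies, which a context-free grammar that only concatenates single phrases cannot generate.

```python
from itertools import product


def derive_pairs(left, right, depth):
    """Pairs derived from the hypothetical rules X(eps, eps) and
    X(left x1, right x2) <- X(x1, x2), applied up to `depth` times."""
    pairs = [("", "")]
    for _ in range(depth):
        x1, x2 = pairs[-1]
        # each step extends both components at once, keeping them in lockstep
        pairs.append((left + x1, right + x2))
    return pairs


def cross_serial(depth):
    """Strings from S(x1 y1 x2 y2) <- A(x1, x2), B(y1, y2)."""
    a_pairs = derive_pairs("a", "c", depth)  # A yields (a^n, c^n)
    b_pairs = derive_pairs("b", "d", depth)  # B yields (b^m, d^m)
    # the S rule interleaves the components of the two tuples
    return {x1 + y1 + x2 + y2 for (x1, x2), (y1, y2) in product(a_pairs, b_pairs)}
```

For example, `cross_serial(2)` contains `"aabccd"` (n = 2, m = 1), where each `a` is matched with a `c` across the intervening `b`.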
Korean Grammar Using TAGs
This paper addresses various issues related to representing the Korean language using Tree Adjoining Grammars. Topics covered include Korean grammar using TAGs, Machine Translation between Korean and English using Synchronous Tree Adjoining Grammars (STAGs), handling scrambling using Multi-Component TAGs (MC-TAGs), and recovering empty arguments. The parsing data come from US military communication messages.
Monoid automata for displacement context-free languages
In 2007 Kambites presented an algebraic interpretation of the Chomsky–Schützenberger theorem for context-free languages. We give an interpretation of the corresponding theorem for the class of displacement context-free languages, which are equivalent to well-nested multiple context-free languages. We also obtain a characterization of k-displacement context-free languages in terms of monoid automata and show how such automata can be simulated on two stacks. We introduce simultaneous two-stack automata and compare different variants of their definition. All the definitions considered are shown to be equivalent, based on the geometric interpretation of the memory operations of these automata.
Comment: Revised version for ESSLLI Student Session 2013 selected paper
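A monoid automaton is a finite automaton whose transitions also multiply a register by elements of a fixed monoid; a word is accepted when the automaton ends in a final state with the register equal to the identity. Kambites' characterization of context-free languages uses polycyclic monoids, and the paper's construction for displacement languages is more involved. This sketch instead uses the simpler bicyclic monoid (a single counter with "push" and "pop") purely to show the acceptance condition, on a hypothetical automaton for {a^n b^n}; all names are illustrative.

```python
def bicyclic_mul(x, y):
    """Product in the bicyclic monoid.

    An element (a, b) reads as "pop a times, then push b times" on a
    one-symbol stack; composing x with y cancels x's pushes against y's pops.
    """
    a, b = x
    c, d = y
    k = min(b, c)
    return (a + c - k, b + d - k)


IDENTITY = (0, 0)
PUSH = (0, 1)
POP = (1, 0)  # PUSH then POP gives IDENTITY, but POP then PUSH does not

# Hypothetical automaton for {a^n b^n : n >= 0}:
# state -> input symbol -> (next state, monoid element to multiply in)
TRANSITIONS = {
    "q0": {"a": ("q0", PUSH), "b": ("q1", POP)},
    "q1": {"b": ("q1", POP)},
}
FINAL = {"q0", "q1"}


def accepts(word, start="q0"):
    state, register = start, IDENTITY
    for symbol in word:
        step = TRANSITIONS[state].get(symbol)
        if step is None:
            return False  # no transition on this symbol
        state, element = step
        register = bicyclic_mul(register, element)
    # accept iff we end in a final state and the register is the identity
    return state in FINAL and register == IDENTITY
```

The finite control enforces the shape a*b*, while the monoid register alone decides whether the counts match; richer monoids (polycyclic, or the ones the paper relates to two stacks) raise the power of the same acceptance scheme.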
A declarative characterization of different types of multicomponent tree adjoining grammars
Multicomponent Tree Adjoining Grammars (MCTAGs) are a formalism that has been shown to be useful for many natural language applications. The definition of non-local MCTAG, however, is problematic, since it refers to the process of the derivation itself: a simultaneity constraint must be respected concerning the way the members of the elementary tree sets are added. Looking only at the result of a derivation (i.e., the derived tree and the derivation tree), this simultaneity is no longer visible and therefore cannot be checked. In other words, this way of characterizing MCTAG does not allow one to abstract away from the concrete order of derivation. In this paper, we propose an alternative definition of MCTAG that characterizes the trees in the tree language of an MCTAG via the properties of the derivation trees (in the underlying TAG) that the MCTAG licenses. We provide similar characterizations for various types of MCTAG. These characterizations give a better understanding of the formalisms, allow a more systematic comparison of different types of MCTAG, and can furthermore be exploited for parsing.
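The idea of stating MCTAG conditions on derivation trees can be illustrated with tree-local MCTAG, where all members of an elementary tree set attach to the same elementary tree. This minimal sketch assumes a hypothetical dictionary encoding of the derivation tree and checks that condition on the finished derivation only, with no reference to the order of derivation steps. It is not the paper's formal definition, and the non-local variants the paper addresses need more than this check.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding of a TAG derivation tree: each elementary-tree
# instance maps to (mother instance, Gorn address of the attachment node),
# or to None if it is the root of the derivation.
Derivation = Dict[str, Optional[Tuple[str, str]]]


def is_tree_local(derivation: Derivation, tree_sets: List[List[str]]) -> bool:
    """Return True if every multicomponent set is attached tree-locally.

    Tree-locality here means: all members of a set attach to the same
    mother elementary tree, at pairwise distinct nodes (assuming no
    multiple adjunction at a single node).
    """
    for members in tree_sets:
        if len(members) < 2:
            continue  # singleton sets behave like ordinary TAG trees
        links = [derivation[m] for m in members]
        if any(link is None for link in links):
            return False  # a multicomponent set cannot contain the derivation root
        mothers = {mother for mother, _ in links}
        addresses = [address for _, address in links]
        if len(mothers) != 1 or len(set(addresses)) != len(addresses):
            return False
    return True
```

Because the function only reads mother/address links, two derivations that differ merely in the order in which the set members were added receive the same verdict.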
Principles and Implementation of Deductive Parsing
We present a system for generating parsers based directly on the metaphor of parsing as deduction. Parsing algorithms can be represented directly as deduction systems, and a single deduction engine can interpret such deduction systems so as to implement the corresponding parser. The method generalizes easily to parsers for augmented phrase structure formalisms, such as definite-clause grammars and other logic grammar formalisms, and has been used for rapid prototyping of parsing algorithms for a variety of formalisms, including variants of tree-adjoining grammars, categorial grammars, and lexicalized context-free grammars.
Comment: 69 pages, includes full Prolog code
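The parsing-as-deduction idea can be sketched in a few lines: items are facts, inference rules derive new items from old ones, and one generic agenda-driven engine computes their closure. The paper's system is written in Prolog; this Python version is only an illustration, with hypothetical names `deduce` and `cky_parser`, CKY-style items (A, i, j) meaning that category A spans words i to j, and a grammar of binary rules plus a lexicon.

```python
from collections import deque


def deduce(axioms, rules, goal):
    """Generic agenda-driven deduction engine.

    rules are functions rule(item, chart) that yield consequences of a newly
    added item combined with items already in the chart (including itself).
    """
    chart = set()
    agenda = deque(axioms)
    while agenda:
        item = agenda.popleft()
        if item in chart:
            continue  # already derived; nothing new to infer
        chart.add(item)
        for rule in rules:
            for consequence in rule(item, chart):
                if consequence not in chart:
                    agenda.append(consequence)
    return goal in chart


def cky_parser(grammar, lexicon, words, start="S"):
    """grammar: list of (lhs, left, right); lexicon: word -> list of categories."""
    # Axioms: [A, i, i+1] for each lexical entry A -> words[i]
    axioms = [(a, i, i + 1) for i, w in enumerate(words) for a in lexicon.get(w, ())]

    def complete(item, chart):
        # Inference rule: [B, i, j] and [C, j, k] with A -> B C give [A, i, k]
        b, i, j = item
        for lhs, left, right in grammar:
            if left == b:  # new item as the left child
                for c, j2, k in list(chart):
                    if c == right and j2 == j:
                        yield (lhs, i, k)
            if right == b:  # new item as the right child
                for c, h, i2 in list(chart):
                    if c == left and i2 == i:
                        yield (lhs, h, j)

    return deduce(axioms, [complete], (start, 0, len(words)))
```

For instance, with rules S -> NP VP and VP -> V NP and a lexicon mapping "John" and "Mary" to NP and "saw" to V, `cky_parser` accepts "John saw Mary" but not "Mary John saw". Swapping in different inference rules (for example, Earley-style items) would reuse `deduce` unchanged, which mirrors the abstract's point about a single deduction engine.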