Data-Oriented Parsing with Discontinuous Constituents and Function Tags
Statistical parsers are effective but are typically limited to producing projective dependencies or constituents. On the other hand, linguistically rich parsers recognize non-local relations and analyze both form and function phenomena, but they rely on extensive manual grammar development. We combine the advantages of the two by building a statistical parser that produces richer analyses. We investigate new techniques to implement treebank-based parsers that allow for discontinuous constituents. We present two systems. One system is based on a string-rewriting Linear Context-Free Rewriting System (LCFRS) and uses a Probabilistic Discontinuous Tree Substitution Grammar (PDTSG) to improve disambiguation performance. The other system encodes the discontinuities in the labels of phrase structure trees, allowing for efficient context-free grammar parsing. The two systems demonstrate that tree fragments as used in tree-substitution grammar improve disambiguation performance while capturing non-local relations on an as-needed basis. Additionally, we present results of models that produce function tags, resulting in a more linguistically adequate model of the data. We report substantial accuracy improvements in discontinuous parsing for German, English, and Dutch, including results on spoken Dutch.
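To make "discontinuous constituent" concrete: the fan-out of a constituent (the quantity an LCFRS nonterminal tracks) is the number of maximal contiguous blocks its words occupy, and a label-based encoding can split such a constituent into contiguous parts that a context-free grammar can derive. The sketch below is a simplified illustration under stated assumptions: constituents are given as lists of token positions, and the "LABEL*k" naming scheme is hypothetical, not necessarily the encoding used in the work above. It also ignores how the parts are merged back after parsing.

```python
def contiguous_blocks(indices):
    """Group token positions into maximal runs of consecutive integers."""
    blocks = []
    for i in sorted(indices):
        if blocks and i == blocks[-1][-1] + 1:
            blocks[-1].append(i)  # extends the current run
        else:
            blocks.append([i])    # a gap starts a new block
    return blocks


def fan_out(indices):
    """Number of contiguous spans a constituent covers (1 means continuous)."""
    return len(contiguous_blocks(indices))


def split_discontinuous(label, indices):
    """Give each contiguous block of a constituent its own label.

    The 'LABEL*k' convention is a hypothetical naming scheme for this sketch.
    """
    blocks = contiguous_blocks(indices)
    if len(blocks) == 1:
        return [(label, blocks[0])]
    return [(f"{label}*{k}", block) for k, block in enumerate(blocks, start=1)]


# A VP whose words sit at positions 0, 1 and 4, with other material in between.
vp = [0, 1, 4]
print(fan_out(vp))                    # 2
print(split_discontinuous("VP", vp))  # [('VP*1', [0, 1]), ('VP*2', [4])]
```

Continuous constituents (fan-out 1) keep their original label, so only the discontinuous ones enlarge the label set.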
Hybrid grammars for parsing of discontinuous phrase structures and non-projective dependency structures
We explore the concept of hybrid grammars, which formalize and generalize a range of existing frameworks for dealing with discontinuous syntactic structures. Covered are both discontinuous phrase structures and non-projective dependency structures. Technically, hybrid grammars are related to synchronous grammars, where one grammar component generates linear structures and another generates hierarchical structures. By coupling lexical elements of both components together, discontinuous structures result. Several types of hybrid grammars are characterized. We also discuss grammar induction from treebanks. The main advantage over existing frameworks is the ability of hybrid grammars to separate discontinuity of the desired structures from time complexity of parsing. This permits exploration of a large variety of parsing algorithms for discontinuous structures, with different properties. This is confirmed by the reported experimental results, which show wide variation in running time, accuracy, and frequency of parse failures.
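As an illustration of what makes a dependency structure non-projective, the sketch below checks for crossing arcs. It assumes a hypothetical encoding in which `heads[i]` is the head of token i+1 (tokens 1-indexed, 0 an artificial root at position 0); with the root arc included, a tree is projective exactly when no two arcs cross. This only illustrates the property the abstract refers to; it is not part of the hybrid-grammar framework itself.

```python
def is_projective(heads):
    """Return True if the dependency tree has no crossing arcs.

    heads[i] is the head of token i+1 (tokens are 1-indexed); 0 is the root.
    """
    # Represent each arc by its (left, right) endpoints.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for a_lo, a_hi in arcs:
        for b_lo, b_hi in arcs:
            # Arcs cross when one starts strictly inside the other and ends outside it.
            if a_lo < b_lo < a_hi < b_hi:
                return False
    return True


print(is_projective([2, 0, 2]))     # True: no arcs cross
print(is_projective([3, 4, 0, 3]))  # False: arcs 1-3 and 2-4 cross
```

The quadratic pairwise check keeps the sketch short; it is adequate for sentence-length inputs.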
Discontinuous Data-Oriented Parsing: A mildly context-sensitive all-fragments grammar
Recent advances in parsing technology have made treebank parsing with discontinuous constituents possible, with parser output of competitive quality (Kallmeyer and Maier, 2010). We apply Data-Oriented Parsing (DOP) to a grammar formalism that allows for discontinuous trees (LCFRS). Decisions during parsing are conditioned on all possible fragments, resulting in improved performance. Despite the fact that both DOP and discontinuity present formidable challenges in terms of computational complexity, the model is reasonably efficient, and surpasses the state of the art in discontinuous parsing.
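The computational challenge of conditioning on all fragments can be seen by counting them. The sketch below uses a standard recurrence for ordinary (continuous) trees: at each node, every nonterminal child is either cut off as a frontier node or expanded with one of its own fragments, so the counts multiply. Trees are assumed to be nested tuples `(label, child, ...)` with words as plain strings; the discontinuous LCFRS case in the paper is not modelled here.

```python
def fragments_rooted_at(tree):
    """Number of DOP fragments whose root is this node.

    Each nonterminal child contributes (1 + its own count) choices:
    cut it off, or expand it with one of its fragments. Words are always kept.
    """
    _, *children = tree
    total = 1
    for child in children:
        if isinstance(child, tuple):
            total *= 1 + fragments_rooted_at(child)
    return total


def total_fragments(tree):
    """Sum of fragment counts over all nonterminal nodes of the tree."""
    _, *children = tree
    return fragments_rooted_at(tree) + sum(
        total_fragments(c) for c in children if isinstance(c, tuple))


tree = ("S", ("NP", "John"), ("VP", ("V", "likes"), ("NP", "Mary")))
print(fragments_rooted_at(tree))  # 10
print(total_fragments(tree))      # 17
```

Because the counts multiply at every level, the number of fragments grows exponentially with tree size, which is why all-fragments models need compact representations rather than explicit enumeration.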
Curbing Feature Coding: Strictly Local Feature Assignment
Graf (2017) warns that every syntactic formalism faces a severe overgeneration problem because of the hidden power of subcategorization. Any constraint definable in monadic second-order logic can be compiled into the category system so that it is indirectly enforced as part of subcategorization. Not only does this kind of feature coding deprive syntactic proposals of their empirical bite, it also undermines computational efforts to limit syntactic formalisms via subregular complexity. This paper presents a subregular solution to feature coding. Instead of features being a cheap resource that comes for free, features must be assigned by a transduction. In particular, category features must be assigned by an input strictly local (ISL) tree-to-tree transduction, defined here for the first time. The restriction to ISL transductions correctly rules out various deviant category systems.
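The intuition behind "input strictly local" can be sketched with a toy relabeling in which each node's category is computed only from a bounded window of the input tree: here, the node's own label and its daughters' labels. This is a simplified illustration of bounded input locality under stated assumptions, not the formal ISL tree-to-tree transduction defined in the paper; the rule table and the bracketed categories are hypothetical.

```python
# Hypothetical rule table: (mother label, daughter labels) -> refined category.
RULES = {
    ("VP", ("V", "NP")): "VP[trans]",
    ("VP", ("V",)): "VP[intrans]",
}


def relabel(tree):
    """Assign each node a category from a depth-1 window of the input tree.

    The output label depends only on the node's input label and its
    daughters' input labels, never on more distant context.
    """
    label, *children = tree
    daughters = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    new_label = RULES.get((label, daughters), label)  # unmatched nodes keep their label
    new_children = [relabel(c) if isinstance(c, tuple) else c for c in children]
    return (new_label, *new_children)


tree = ("S", ("NP", "Kim"), ("VP", ("V", "sleeps")))
print(relabel(tree))
# ('S', ('NP', 'Kim'), ('VP[intrans]', ('V', 'sleeps')))
```

Because the window is bounded, such a relabeling cannot smuggle an arbitrary long-distance constraint into the category system, which is the kind of restriction the abstract argues for.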
Phrase Structure and Ancient Anatolian languages: Methodology and challenges for a Luwian syntactic annotation
For the Marie Skłodowska Curie (MSCA) funded project “SLUW – A computer-aided study of the (morpho)-syntax of Luwian” (European Grant Nr. 655954), a collection of phrase-structure trees from the Luwian corpus is currently being prepared. Luwian is a language belonging to the Anatolian branch of Indo-European; its structures are different from those of English, and the language itself is partly obscure. The present paper describes some special needs, open challenges, and methodologies relevant to the annotation of Luwian phrase structure.
- …