30 research outputs found
Discontinuous Data-Oriented Parsing: A mildly context-sensitive all-fragments grammar
Recent advances in parsing technology have made treebank parsing with discontinuous constituents possible, with parser output of competitive quality (Kallmeyer and Maier, 2010). We apply Data-Oriented Parsing (DOP) to a grammar formalism that allows for discontinuous trees (LCFRS). Decisions during parsing are conditioned on all possible fragments, resulting in improved performance. Despite the fact that both DOP and discontinuity present formidable challenges in terms of computational complexity, the model is reasonably efficient, and surpasses the state of the art in discontinuous parsing.
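As an illustration of what "discontinuous constituent" means in the abstract above, the following minimal sketch computes the contiguous blocks covered by a constituent and counts them (the fan-out, in LCFRS terms). The helper names `blocks` and `fan_out` and the example sentence are assumptions made for illustration; they are not taken from the paper.

```python
from typing import List, Set


def blocks(positions: Set[int]) -> List[List[int]]:
    """Group token positions into maximal runs of consecutive indices."""
    runs: List[List[int]] = []
    for p in sorted(positions):
        if runs and p == runs[-1][-1] + 1:
            runs[-1].append(p)  # extends the current contiguous block
        else:
            runs.append([p])    # starts a new block (a gap was crossed)
    return runs


def fan_out(positions: Set[int]) -> int:
    """Number of contiguous blocks a constituent covers (1 = continuous)."""
    return len(blocks(positions))


# Hypothetical example sentence, not from the paper.
tokens = ["What", "did", "you", "see", "?"]
# A verb phrase consisting of "What" (index 0) and "see" (index 3).
vp = {0, 3}
print(blocks(vp))   # [[0], [3]]
print(fan_out(vp))  # 2
```

A constituent with fan-out 1 is an ordinary continuous phrase; anything above 1 cannot be represented by a plain context-free tree over the original word order.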
Parsing as Reduction
We reduce phrase-representation parsing to dependency parsing. Our reduction
is grounded on a new intermediate representation, "head-ordered dependency
trees", shown to be isomorphic to constituent trees. By encoding order
information in the dependency labels, we show that any off-the-shelf, trainable
dependency parser can be used to produce constituents. When this parser is
non-projective, we can perform discontinuous parsing in a very natural manner.
Despite the simplicity of our approach, experiments show that the resulting
parsers are on par with strong baselines, such as the Berkeley parser for
English and the best single system in the SPMRL-2014 shared task. Results are
particularly striking for discontinuous parsing of German, where we surpass the
current state of the art by a wide margin.
Synchronous Context-Free Grammars and Optimal Linear Parsing Strategies
Synchronous Context-Free Grammars (SCFGs), also known as syntax-directed
translation schemata, are unlike context-free grammars in that they do not have
a binary normal form. In general, parsing with SCFGs takes space and time
polynomial in the length of the input strings, but with the degree of the
polynomial depending on the permutations of the SCFG rules. We consider linear
parsing strategies, which add one nonterminal at a time. We show that for a
given input permutation, the problems of finding the linear parsing strategy
with the minimum space and time complexity are both NP-hard.
A declarative characterization of different types of multicomponent tree adjoining grammars
Multicomponent Tree Adjoining Grammars (MCTAGs) are a formalism that has been shown to be useful for many natural language applications. The definition of non-local MCTAG, however, is problematic since it refers to the process of the derivation itself: a simultaneity constraint must be respected concerning the way the members of the elementary tree sets are added. Looking only at the result of a derivation (i.e., the derived tree and the derivation tree), this simultaneity is no longer visible and therefore cannot be checked. That is, this way of characterizing MCTAG does not allow one to abstract away from the concrete order of derivation. In this paper, we propose an alternative definition of MCTAG that characterizes the trees in the tree language of an MCTAG via the properties of the derivation trees (in the underlying TAG) that the MCTAG licenses. We provide similar characterizations for various types of MCTAG. These characterizations give a better understanding of the formalisms, allow a more systematic comparison of different types of MCTAG, and can furthermore be exploited for parsing.
Neural Combinatory Constituency Parsing
Tokyo Metropolitan University, Doctor (Information Science), doctoral thesis
A derivational model of discontinuous parsing
The notion of latent-variable probabilistic context-free derivation of syntactic structures is enhanced to allow heads and unrestricted discontinuities. The chosen formalization covers both constituent parsing and dependency parsing. The derivational model is accompanied by an equivalent probabilistic automaton model. With the new framework, one obtains a probability distribution over the space of all discontinuous parses. This lends itself to intrinsic evaluation in terms of perplexity, as shown in experiments.
Two characterisation results of multiple context-free grammars and their application to parsing
In the first part of this thesis, a Chomsky-Schützenberger characterisation and an automaton characterisation of multiple context-free grammars are proved. Furthermore, a framework for approximation of automata with storage is described. The second part develops each of the three theoretical results into a parsing algorithm.
Parsing TAG with Abstract Categorial Grammar.
This paper informally presents an Earley algorithm for TAG that behaves like the algorithm given by [SJ88]. This algorithm is a specialization to TAG of a more general algorithm dedicated to second-order ACGs. As second-order ACGs can encode Linear Context-Free Rewriting Systems (LCFRS) [dGP04], the main purpose of this paper is to give a rough presentation of formal tools that can be used to design efficient algorithms for LCFRS.