QuestionBank: creating a corpus of parse-annotated questions
This paper describes the development of QuestionBank, a corpus of 4000 parse-annotated questions for (i) use in training parsers employed in QA, and (ii) evaluation of question parsing. We present a series of experiments to investigate the effectiveness of QuestionBank as both an exclusive and a supplementary training resource for a state-of-the-art parser in parsing both question and non-question test sets. We introduce a new method for recovering empty nodes and their antecedents (capturing long-distance dependencies) from parser output in CFG trees using LFG f-structure reentrancies. Our main findings are: (i) using QuestionBank training data improves parser performance to 89.75% labelled bracketing f-score, an increase of almost 11% over the baseline; (ii) back-testing experiments on non-question data (Penn-II WSJ Section 23) show that the retrained parser does not suffer a performance drop on non-question material; (iii) ablation experiments show that the amount of training material provided by QuestionBank is sufficient to achieve optimal results; (iv) our method for recovering empty nodes captures long-distance dependencies in questions from the ATIS corpus with high precision (96.82%) but low recall (39.38%). In summary, QuestionBank provides a useful new resource for parser-based QA research.
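Labelled bracketing f-score, the metric behind the 89.75% figure, is the harmonic mean of precision and recall computed over labelled constituent spans. The sketch below is a simplified computation of that kind. It assumes each constituent is encoded as a hypothetical `(label, start, end)` tuple. It also ignores scorer-specific conventions, such as the treatment of punctuation, that the paper's actual evaluation may apply.

```python
from collections import Counter


def labelled_bracket_prf(gold, test):
    """Precision, recall and F1 over labelled brackets.

    Each bracket is a (label, start, end) tuple. Multisets (Counter) are
    used so that duplicate brackets, e.g. from unary chains, are matched
    at most as often as they occur in both trees.
    """
    gold_counts = Counter(gold)
    test_counts = Counter(test)
    # Counter & Counter keeps the minimum count of each bracket.
    matched = sum((gold_counts & test_counts).values())
    precision = matched / len(test) if test else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical brackets for a six-token question (not taken from QuestionBank).
gold = [("SBARQ", 0, 6), ("WHNP", 0, 1), ("SQ", 1, 5), ("NP", 2, 4), ("VP", 4, 5)]
test = [("SBARQ", 0, 6), ("WHNP", 0, 1), ("SQ", 1, 5), ("NP", 2, 5)]
p, r, f = labelled_bracket_prf(gold, test)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")  # P=0.75 R=0.60 F=0.67
```

In this example, the misattached NP costs both precision (one spurious bracket) and recall (one gold NP and the missing VP go unmatched).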
On the Complexity and Performance of Parsing with Derivatives
Current algorithms for context-free parsing impose a trade-off among ease
of understanding, ease of implementation, theoretical complexity, and practical
performance. No algorithm achieves all of these properties simultaneously.
Might et al. (2011) introduced parsing with derivatives, which handles
arbitrary context-free grammars while being both easy to understand and simple
to implement. Despite much initial enthusiasm and a multitude of independent
implementations, its worst-case complexity has never been proven to be better
than exponential. In fact, high-level arguments claiming it is fundamentally
exponential have been advanced and even accepted as part of the folklore.
Performance ended up being sluggish in practice, and this sluggishness was
taken as informal evidence of exponentiality.
In this paper, we reexamine the performance of parsing with derivatives. We
have discovered that it is not exponential but, in fact, cubic. Moreover,
simple (though perhaps not obvious) modifications to the implementation by
Might et al. (2011) lead to an implementation that is not only easy to
understand but also highly performant in practice.
Comment: 13 pages; 12 figures; implementation at
http://bitbucket.org/ucombinator/parsing-with-derivatives/ ; published in
PLDI '16, Proceedings of the 37th ACM SIGPLAN Conference on Programming
Language Design and Implementation, June 13 - 17, 2016, Santa Barbara, CA,
US
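Parsing with derivatives generalizes Brzozowski's derivative of a regular expression. The derivative of a language L with respect to a character c is the set of strings w such that cw is in L. A string is accepted if, after deriving by each of its characters in turn, the remaining language contains the empty string.

The sketch below shows only this regular-expression core. The context-free version of Might et al. (2011) additionally needs laziness, memoization, and a fixed-point computation of nullability to cope with recursive grammars; none of that appears here. The class names are illustrative. The smart constructors `alt` and `seq` are a common trick for keeping derivative terms small, not something taken from the paper.

```python
from dataclasses import dataclass


class Re:
    """Base class for regular-expression terms."""


@dataclass(frozen=True)
class Empty(Re):
    """Matches no string at all."""


@dataclass(frozen=True)
class Eps(Re):
    """Matches only the empty string."""


@dataclass(frozen=True)
class Chr(Re):
    c: str


@dataclass(frozen=True)
class Alt(Re):
    left: Re
    right: Re


@dataclass(frozen=True)
class Seq(Re):
    first: Re
    second: Re


@dataclass(frozen=True)
class Star(Re):
    inner: Re


def nullable(r: Re) -> bool:
    """True if r accepts the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    if isinstance(r, Seq):
        return nullable(r.first) and nullable(r.second)
    return False  # Empty and Chr


def alt(a: Re, b: Re) -> Re:
    """Alternation that drops Empty branches."""
    if isinstance(a, Empty):
        return b
    if isinstance(b, Empty):
        return a
    return Alt(a, b)


def seq(a: Re, b: Re) -> Re:
    """Sequence that collapses Empty and leading Eps."""
    if isinstance(a, Empty) or isinstance(b, Empty):
        return Empty()
    if isinstance(a, Eps):
        return b
    return Seq(a, b)


def derive(r: Re, c: str) -> Re:
    """Brzozowski derivative of r with respect to the character c."""
    if isinstance(r, Chr):
        return Eps() if r.c == c else Empty()
    if isinstance(r, Alt):
        return alt(derive(r.left, c), derive(r.right, c))
    if isinstance(r, Seq):
        d = seq(derive(r.first, c), r.second)
        # If the first part can be empty, c may also start the second part.
        return alt(d, derive(r.second, c)) if nullable(r.first) else d
    if isinstance(r, Star):
        return seq(derive(r.inner, c), r)
    return Empty()  # Empty and Eps have no non-empty strings to consume


def matches(r: Re, s: str) -> bool:
    """Derive by each character, then check for the empty string."""
    for c in s:
        r = derive(r, c)
    return nullable(r)


ab_star = Star(Seq(Chr("a"), Chr("b")))  # (ab)*
print(matches(ab_star, "abab"), matches(ab_star, "aba"))  # True False
```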
Interaction Grammars
Interaction Grammar (IG) is a grammatical formalism based on the notion of
polarity. Polarities express the resource sensitivity of natural languages by
modelling the distinction between saturated and unsaturated syntactic
structures. Syntactic composition is represented as a chemical reaction guided
by the saturation of polarities. It is expressed in a model-theoretic framework
where grammars are constraint systems using the notion of tree description and
parsing appears as a process of building tree description models satisfying
criteria of saturation and minimality.
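Polarity-driven composition can be caricatured as resource counting. The toy sketch below is a deliberate simplification, not IG itself: it ignores tree descriptions and models entirely and treats a structure as a bag of features. Each feature carries a hypothetical polarity: `+` (an available resource), `-` (an expected resource) or `=` (neutral). Composition pools the bags and cancels each matching `+`/`-` pair. The result counts as saturated when nothing unpaired remains. This checks only a necessary counting condition, not the minimality criterion or the tree-description models the abstract describes.

```python
from collections import Counter


def compose(*structures):
    """Toy polarity composition (not IG's model-theoretic machinery).

    Pools the polarized features of several structures and neutralizes
    each matching (+f, -f) pair into a saturated feature.
    """
    pos, neg, neutral = Counter(), Counter(), Counter()
    for features in structures:
        for name, pol in features:
            if pol == "+":
                pos[name] += 1
            elif pol == "-":
                neg[name] += 1
            else:
                neutral[name] += 1
    saturated = pos & neg  # pairs that cancel out
    return {
        "+": pos - saturated,  # unused resources
        "-": neg - saturated,  # unmet expectations
        "=": neutral + saturated,
    }


def is_saturated(result):
    """Saturated when no positive or negative feature is left over."""
    return not result["+"] and not result["-"]


# Hypothetical toy lexicon, not taken from any IG grammar.
verb = [("np", "-"), ("np", "-"), ("s", "=")]  # expects two NPs
john = [("np", "+")]
mary = [("np", "+")]
print(is_saturated(compose(verb, john, mary)))  # True
print(is_saturated(compose(verb, john)))  # False: one NP still expected
```

Supplying a third NP also fails: the extra `+np` is left unpaired. This mirrors, in miniature, the resource sensitivity that polarities are meant to capture.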
MUSE CSP: An Extension to the Constraint Satisfaction Problem
This paper describes an extension to the constraint satisfaction problem
(CSP) called MUSE CSP (MUltiply SEgmented Constraint Satisfaction Problem).
This extension is especially useful for those problems which segment into
multiple sets of partially shared variables. Such problems arise naturally in
signal processing applications including computer vision, speech processing,
and handwriting recognition. For these applications, it is often difficult to
segment the data in only one way given the low-level information utilized by
the segmentation algorithms. MUSE CSP can be used to compactly represent
several similar instances of the constraint satisfaction problem. If multiple
instances of a CSP have some common variables which have the same domains and
constraints, then they can be combined into a single instance of a MUSE CSP,
reducing the work required to apply the constraints. We introduce the concepts
of MUSE node consistency, MUSE arc consistency, and MUSE path consistency. We
then demonstrate how MUSE CSP can be used to compactly represent lexically
ambiguous sentences and the multiple sentence hypotheses that are often
generated by speech recognition algorithms so that grammar constraints can be
used to provide parses for all syntactically correct sentences. Algorithms for
MUSE arc and path consistency are provided. Finally, we discuss how to create a
MUSE CSP from a set of CSPs which are labeled to indicate when the same
variable is shared by more than a single CSP.
Comment: See http://www.jair.org/ for any accompanying files
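MUSE arc consistency builds on ordinary arc consistency. For reference, the sketch below is plain AC-3, Mackworth's classic algorithm, on a single CSP instance. It is not the MUSE variant, which must also account for variables shared across segmentations. The domains and the determiner/noun constraint in the example are hypothetical, chosen to echo the lexical-ambiguity use case.

```python
from collections import deque


def ac3(domains, constraints):
    """Enforce arc consistency with AC-3 on a single (non-MUSE) CSP.

    domains: dict mapping each variable to a set of values (pruned in place).
    constraints: dict mapping an arc (x, y) to a predicate allowed(vx, vy);
    both directions of every binary constraint should be present.
    Returns False if some domain becomes empty, True otherwise.
    """
    queue = deque(constraints)
    while queue:
        x, y = queue.popleft()
        allowed = constraints[(x, y)]
        # Values of x with no supporting value in y's domain.
        removed = {vx for vx in domains[x]
                   if not any(allowed(vx, vy) for vy in domains[y])}
        if removed:
            domains[x] -= removed
            if not domains[x]:
                return False
            # x's domain shrank, so re-check arcs (z, x) for neighbours z != y.
            for (z, w) in constraints:
                if w == x and z != y:
                    queue.append((z, w))
    return True


# Hypothetical example: an ambiguous word after a determiner.
def det_before_noun(a, b):
    return a != "det" or b == "noun"


domains = {"w1": {"det"}, "w2": {"noun", "verb"}}
constraints = {
    ("w1", "w2"): det_before_noun,
    ("w2", "w1"): lambda b, a: det_before_noun(a, b),
}
ok = ac3(domains, constraints)
print(ok, domains)  # True {'w1': {'det'}, 'w2': {'noun'}}
```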
From chunks to function-argument structure: a similarity-based approach
Chunk parsing has focused on the recognition of partial constituent structures at the level of individual chunks. Little attention has been paid to the question of how such partial analyses can be combined into larger structures for complete utterances. Such larger structures are not only desirable for a deeper syntactic analysis; they also constitute a necessary prerequisite for assigning function-argument structure. The present paper offers a similarity-based algorithm for assigning functional labels such as subject, object, head, and complement to complete syntactic structures on the basis of prechunked input. The evaluation of the algorithm has concentrated on measuring the quality of functional labels. It was performed on a German and an English treebank using two different annotation schemes at the level of function-argument structure. The results of 89.73% correct functional labels for German and 90.40% for English validate the general approach.
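The abstract does not spell out the similarity measure, so the sketch below substitutes a generic memory-based technique: a k-nearest-neighbour classifier with a simple overlap metric over flat feature tuples. The features (chunk category, position relative to the finite verb, case) and the stored instances are hypothetical illustrations. They are not the paper's representation, which assigns labels to complete syntactic structures.

```python
from collections import Counter


def overlap(a, b):
    """Number of feature positions on which two instances agree."""
    return sum(x == y for x, y in zip(a, b))


def assign_label(instance, memory, k=1):
    """Majority label among the k stored instances most similar to `instance`.

    memory: list of (feature_tuple, label) pairs. Ties in similarity keep
    the original memory order, because Python's sort is stable.
    """
    ranked = sorted(memory, key=lambda item: overlap(instance, item[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical instances: (chunk category, position w.r.t. finite verb, case).
memory = [
    (("NP", "before_verb", "nom"), "subject"),
    (("NP", "after_verb", "acc"), "object"),
    (("PP", "after_verb", "-"), "complement"),
    (("VP", "head", "-"), "head"),
]
print(assign_label(("NP", "after_verb", "acc"), memory))  # object
```

Accuracy of the kind reported above (the share of correct functional labels) would then be the fraction of held-out chunks whose predicted label matches the gold label.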