15 research outputs found

Object-Extraction and Question-Parsing using CCG

Author: Clark Stephen
Curran James R.
Steedman Mark
Publication date: 01/01/2004

QuestionBank: creating a corpus of parse-annotated questions

Author: Cahill Aoife
Judge John
van Genabith Josef
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2006

This paper describes the development of QuestionBank, a corpus of 4000 parse-annotated questions for (i) use in training parsers employed in QA, and (ii) evaluation of question parsing. We present a series of experiments to investigate the effectiveness of QuestionBank as both an exclusive and supplementary training resource for a state-of-the-art parser in parsing both question and non-question test sets. We introduce a new method for recovering empty nodes and their antecedents (capturing long distance dependencies) from parser output in CFG trees using LFG f-structure reentrancies. Our main findings are (i) using QuestionBank training data improves parser performance to 89.75% labelled bracketing f-score, an increase of almost 11% over the baseline; (ii) back-testing experiments on non-question data (Penn-II WSJ Section 23) show that the retrained parser does not suffer a performance drop on non-question material; (iii) ablation experiments show that the size of training material provided by QuestionBank is sufficient to achieve optimal results; (iv) our method for recovering empty nodes captures long distance dependencies in questions from the ATIS corpus with high precision (96.82%) and low recall (39.38%). In summary, QuestionBank provides a useful new resource in parser-based QA research.

CiteSeerX

Irish Universities

DCU Online Research Access Service

Strong domain variation and treebank-induced LFG resources

Author: Burke Michael
Cahill Aoife
Judge John
O'Donovan Ruth
van Genabith Josef
Way Andy
Publication venue: CSLI Publications
Publication date: 01/01/2005

In this paper we present a number of experiments to test the portability of existing treebank-induced LFG resources. We test the LFG parsing resources of Cahill et al. (2004) on the ATIS corpus, which represents a considerably different domain from the Penn-II Treebank Wall Street Journal sections from which the resources were induced. This testing shows under-performance at both c- and f-structure level as a result of the domain variation. We show that in order to adapt the LFG resources of Cahill et al. (2004) to this new domain, all that is necessary is to retrain the c-structure parser on data from the new domain.

CiteSeerX

Irish Universities

DCU Online Research Access Service

Constructive Type-Logical Supertagging with Self-Attention Networks

Author: Deoskar Tejaswini
Kogkalidis Konstantinos
Moortgat Michael
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 24/05/2019

We propose a novel application of self-attention networks towards grammar induction. We present an attention-based supertagger for a refined type-logical grammar, trained on constructing types inductively. In addition to achieving a high overall type accuracy, our model is able to learn the syntax of the grammar's type system along with its denotational semantics. This lifts the closed world assumption commonly made by lexicalized grammar supertaggers, greatly enhancing its generalization potential. This is evidenced both by its adequate accuracy over sparse word types and its ability to correctly construct complex types never seen during training, which, to the best of our knowledge, was as of yet unaccomplished.

Comment: REPL4NLP 4, ACL 201

arXiv.org e-Print Archive

Utrecht University Repository

Unsupervised Induction of Cross-Lingual Semantic Relations

Author: Lewis Mike
Steedman Mark
Publication date: 01/01/2013

Edinburgh Research Explorer

Semi-supervised CCG Lexicon Extension

Author: Steedman Mark
Thomforde Emily
Publication date: 01/01/2011

This paper introduces Chart Inference (CI), an algorithm for deriving a CCG category for an unknown word from a partial parse chart. It is shown to be faster and more precise than a baseline brute-force method, and to achieve wider coverage than a rule-based system. In addition, we show the application of CI to a domain adaptation task for question words, which are largely missing in the Penn Treebank. When used in combination with self-training, CI increases the precision of the baseline StatCCG parser over subject extraction questions by 50%. An error analysis shows that CI contributes to the increase by expanding the number of category types available to the parser, while self-training adjusts the counts.

CiteSeerX

Edinburgh Research Explorer

Porting a lexicalized-grammar parser to the biomedical domain

Author: Clark Stephen
Rimell Laura
Publication venue: Elsevier Inc.
Publication date: 31/10/2009

This paper introduces a state-of-the-art, linguistically motivated statistical parser to the biomedical text mining community, and proposes a method of adapting it to the biomedical domain requiring only limited resources for data annotation. The parser was originally developed using the Penn Treebank and is therefore tuned to newspaper text. Our approach takes advantage of a lexicalized grammar formalism, Combinatory Categorial Grammar (CCG), to train the parser at a lower level of representation than full syntactic derivations. The CCG parser uses three levels of representation: a first level consisting of part-of-speech (POS) tags; a second level consisting of more fine-grained CCG lexical categories; and a third, hierarchical level consisting of CCG derivations. We find that simply retraining the POS tagger on biomedical data leads to a large improvement in parsing performance, and that using annotated data at the intermediate lexical category level of representation improves parsing accuracy further. We describe the procedure involved in evaluating the parser, and obtain accuracies for biomedical data in the same range as those reported for newspaper text, and higher than those previously reported for the biomedical resource on which we evaluate. Our conclusion is that porting newspaper parsers to the biomedical domain, at least for parsers which use lexicalized grammars, may not be as difficult as first thought.

Elsevier - Publisher Connector