Improved CCG Parsing with Semi-supervised Supertagging
Current supervised parsers are limited by the size of their labelled training data, making improving them with unlabelled data an important goal. We show how a state-of-the-art CCG parser can be enhanced by predicting lexical categories using unsupervised vector-space embeddings of words. The use of word embeddings enables our model to better generalize from the labelled data, and allows us to accurately assign lexical categories without depending on a POS-tagger. Our approach leads to substantial improvements in dependency parsing results over the standard supervised CCG parser when evaluated on Wall Street Journal (0.8%), Wikipedia (1.8%) and biomedical (3.4%) text. We compare the performance of two recently proposed approaches for classification using a wide variety of word embeddings. We also give a detailed error analysis demonstrating where using embeddings outperforms traditional feature sets, and showing how including POS features can decrease accuracy.
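As a rough sketch of this idea, and not the paper's actual model, the snippet below scores CCG lexical categories for a word from a fixed window of pre-trained word embeddings with a small feed-forward network; the window size, hidden width, and 425-category inventory are illustrative assumptions.

```python
# Minimal sketch: supertagging from word embeddings alone (no POS features).
# All sizes are assumptions; 425 approximates the common CCGBank category set.
import torch
import torch.nn as nn

class WindowSupertagger(nn.Module):
    def __init__(self, emb_dim=100, window=7, hidden=256, n_categories=425):
        super().__init__()
        # Concatenate the target word's embedding with its neighbours',
        # then score every lexical category with one hidden layer.
        self.net = nn.Sequential(
            nn.Linear(emb_dim * window, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_categories),
        )

    def forward(self, window_embs):          # (batch, window, emb_dim)
        return self.net(window_embs.flatten(start_dim=1))

tagger = WindowSupertagger()
dummy = torch.randn(2, 7, 100)               # two 7-word embedding windows
print(tagger(dummy).shape)                    # torch.Size([2, 425])
```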
Keystroke dynamics as signal for shallow syntactic parsing
Keystroke dynamics have been extensively used in psycholinguistic and writing research to gain insights into cognitive processing. But do keystroke logs contain actual signal that can be used to learn better natural language processing models? We postulate that keystroke dynamics contain information about syntactic structure that can inform shallow syntactic parsing. To test this hypothesis, we explore labels derived from keystroke logs as an auxiliary task in a multi-task bidirectional Long Short-Term Memory (bi-LSTM). Our results are promising on two shallow syntactic parsing tasks, chunking and CCG supertagging. Our model is simple, has the advantage that data can come from distinct sources, and produces models that are significantly better than models trained on the text annotations alone.

Comment: In COLING 2016
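A minimal sketch of the multi-task setup, with one shared bi-LSTM encoder feeding a main syntactic head and an auxiliary keystroke-label head, might look as follows; the vocabulary, layer sizes, and label counts are illustrative assumptions, not the authors' configuration.

```python
# Sketch: shared encoder, two tagging heads; batches for the two tasks may
# come from distinct corpora (syntactic annotation vs. keystroke logs).
import torch
import torch.nn as nn

class MultiTaskBiLSTM(nn.Module):
    def __init__(self, vocab=10000, emb=64, hidden=100, n_main=23, n_aux=3):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True,
                               bidirectional=True)
        self.main_head = nn.Linear(2 * hidden, n_main)  # chunks / supertags
        self.aux_head = nn.Linear(2 * hidden, n_aux)    # keystroke classes

    def forward(self, tokens, task="main"):
        states, _ = self.encoder(self.embed(tokens))
        head = self.main_head if task == "main" else self.aux_head
        return head(states)                  # per-token label scores

model = MultiTaskBiLSTM()
batch = torch.randint(0, 10000, (4, 12))     # 4 sentences, 12 token ids each
print(model(batch, task="aux").shape)        # torch.Size([4, 12, 3])
```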
Generating CCG Categories
Previous CCG supertaggers usually predict categories with multi-class classification. While simple, this approach ignores the internal structure of categories, whose rich semantics may help us better handle relations among categories and bring more robustness to existing supertaggers. In this work, we propose to generate categories rather than classify them: each category is decomposed into a sequence of smaller atomic tags, and the tagger aims to generate the correct sequence. We show that with this finer-grained view of categories, annotations of different categories can be shared and interactions with sentence contexts can be enhanced. The proposed category generator achieves state-of-the-art tagging (95.5% accuracy) and parsing (89.8% labeled F1) performance on the standard CCGBank. Furthermore, its performance on infrequent (even unseen) categories, out-of-domain texts, and a low-resource language shows promise for introducing generation models to general CCG analysis.

Comment: Accepted by AAAI 2021
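The decomposition itself is easy to picture. The sketch below splits a category string into candidate atomic tags for a generator to emit one by one; the tokenisation is an illustrative assumption rather than the paper's exact tag inventory.

```python
# Sketch: decompose a CCG category into a sequence of atomic tags.
import re

def atomic_tags(category: str) -> list[str]:
    # Atoms (S, NP, ...), features like [dcl], slashes, and brackets.
    return re.findall(r"\[[a-z]+\]|[A-Z]+|[\\/()]", category)

print(atomic_tags("(S[dcl]\\NP)/NP"))
# ['(', 'S', '[dcl]', '\\', 'NP', ')', '/', 'NP']
```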
Inducing grammars from linguistic universals and realistic amounts of supervision
The best performing NLP models to date are learned from large volumes of manually-annotated data. For tasks like part-of-speech tagging and grammatical parsing, high performance can be achieved with plentiful supervised data. However, such resources are extremely costly to produce, making them an unlikely option for building NLP tools in under-resourced languages or domains. This dissertation is concerned with reducing the annotation required to learn NLP models, with the goal of opening up the range of domains and languages to which NLP technologies may be applied. In this work, we explore the possibility of learning from a degree of supervision that is at or close to the amount that could reasonably be collected from annotators for a particular domain or language that currently has none. We show that just a small amount of annotation input — even that which can be collected in just a few hours — can provide enormous advantages if we have learning algorithms that can appropriately exploit it.

This work presents new algorithms, models, and approaches designed to learn grammatical information from weak supervision. In particular, we look at ways of intersecting a variety of different forms of supervision in complementary ways, thus lowering the overall annotation burden. Sources of information include tag dictionaries, morphological analyzers, constituent bracketings, and partial tree annotations, as well as unannotated corpora. For example, we present algorithms that are able to combine faster-to-obtain type-level annotation with unannotated text to remove the need for slower-to-obtain token-level annotation.

Much of this dissertation describes work on Combinatory Categorial Grammar (CCG), a grammatical formalism notable for its use of structured, logic-backed categories that describe how each word and constituent fits into the overall syntax of the sentence. This work shows how linguistic universals intrinsic to the CCG formalism itself can be encoded as Bayesian priors to improve learning.
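One way such a universal can become a prior, sketched here under the assumption that the relevant universal is "structurally simpler categories are a priori more probable", is a geometric penalty on category complexity; both the complexity measure and the geometric form are illustrative, not the dissertation's exact formulation.

```python
# Sketch: an (unnormalised) prior that prefers simpler CCG categories.
def complexity(category: str) -> int:
    # One argument per slash, plus the result: "(S\NP)/NP" has complexity 3.
    return 1 + category.count("/") + category.count("\\")

def prior(category: str, p: float = 0.5) -> float:
    # Each extra argument multiplies the prior probability by p.
    return p ** complexity(category)

for cat in ["NP", "S\\NP", "(S\\NP)/NP"]:
    print(cat, prior(cat))        # 0.5, 0.25, 0.125
```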
A* CCG Parsing with a Supertag-factored Model
We introduce a new CCG parsing model which is factored on lexical category assignments. Parsing is then simply a deterministic search for the most probable category sequence that supports a CCG derivation. The parser is extremely simple, with a tiny feature set, no POS tagger, and no statistical model of the derivation or dependencies. Formulating the model in this way allows a highly effective heuristic for A* parsing, which makes parsing extremely fast. Compared to the standard C&C CCG parser, our model is more accurate out-of-domain, is four times faster, has higher coverage, and is greatly simplified. We also show that using our parser improves the performance of a state-of-the-art question answering system.
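The key to fast A* search in this factored setting is an optimistic bound on the words outside the current span: assume each of them receives its single most probable category. The sketch below illustrates that bound; the function name and toy distributions are assumptions, not the parser's code.

```python
# Sketch: outside-score upper bound for supertag-factored A* parsing.
import math

def outside_heuristic(tag_logprobs, i, j):
    """Optimistic score for words outside span [i, j): each takes its best
    category, so the bound never undershoots and A* remains exact."""
    return sum(max(dist.values())
               for k, dist in enumerate(tag_logprobs) if not i <= k < j)

# Toy per-word category distributions for "John saw Mary".
dists = [
    {"NP": math.log(0.9), "N": math.log(0.1)},
    {"(S\\NP)/NP": math.log(0.8), "N": math.log(0.2)},
    {"NP": math.log(0.9), "N": math.log(0.1)},
]
print(outside_heuristic(dists, 1, 2))   # bound for "John" + "Mary"
```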