A syntactified direct translation model with linear-time decoding
Recent syntactic extensions of statistical translation models work with a synchronous context-free or tree-substitution grammar extracted from an automatically parsed parallel corpus. The decoders accompanying these extensions typically exceed quadratic time complexity. This paper extends the Direct Translation Model 2 (DTM2) with syntax while maintaining linear-time decoding. We employ a linear-time parsing algorithm based on an eager, incremental interpretation of Combinatory Categorial Grammar
(CCG). As every input word is processed, the local parsing decisions resolve ambiguity eagerly, by selecting a single
supertag–operator pair for extending the dependency parse incrementally. Alongside translation features extracted from
the derived parse tree, we explore syntactic features extracted from the incremental derivation process. Our empirical experiments show that our model significantly
outperforms the state-of-the-art DTM2 system.
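The eager, one-best decision at each input word can be sketched as follows. This is an illustrative toy under stated assumptions, not the authors' system: `score` is a hypothetical stand-in for the trained classifier, and the supertag and operator inventories are placeholders.

```python
# Sketch of linear-time, eager incremental parsing: at each word a single
# best supertag-operator pair is chosen and the dependency parse is
# extended immediately, so no space of alternative structures is kept.

def score(word, state, supertag, operator):
    """Hypothetical classifier score for one local parsing decision."""
    # Toy stand-in: prefer 'NP' supertags for capitalized words.
    return 1.0 if (supertag == "NP") == word[0].isupper() else 0.0

def parse_incrementally(words, supertags=("NP", "S\\NP"), operators=("shift", "reduce")):
    state = []  # partial dependency parse (here: a list of decisions)
    for word in words:
        # Resolve ambiguity eagerly: keep only the single best pair.
        best = max(((t, o) for t in supertags for o in operators),
                   key=lambda pair: score(word, state, *pair))
        state.append((word, *best))  # extend the parse; O(1) work per word
    return state

print(parse_incrementally(["John", "sleeps"]))
```

Because exactly one pair is kept per word, total work is linear in sentence length, which is what allows such a parser to sit inside a linear-time decoder.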
A syntactic language model based on incremental CCG parsing
Syntactically-enriched language models (parsers) constitute a promising component in applications such as machine translation and speech recognition. To maintain a useful level of accuracy, existing parsers are non-incremental and must span a combinatorially growing space of possible structures as every input word is processed. This prohibits their incorporation into standard linear-time decoders. In this paper, we present an incremental, linear-time dependency parser based on Combinatory Categorial Grammar (CCG) and classification techniques. We devise a deterministic transform of CCGbank canonical derivations into incremental ones, and train our parser on this data. We discover that a cascaded, incremental version provides an appealing balance between efficiency and accuracy.
Grammar induction for mildly context-sensitive languages using variational Bayesian inference
The following technical report presents a formal approach to probabilistic
minimalist grammar induction. We describe a formalization of a minimalist
grammar. Based on this grammar, we define a generative model for minimalist
derivations. We then present a generalized algorithm for the application of
variational Bayesian inference to lexicalized mildly context-sensitive language
grammars, which in this report is applied to the previously defined minimalist
grammar.
Interaction Grammars
Interaction Grammar (IG) is a grammatical formalism based on the notion of
polarity. Polarities express the resource sensitivity of natural languages by
modelling the distinction between saturated and unsaturated syntactic
structures. Syntactic composition is represented as a chemical reaction guided
by the saturation of polarities. It is expressed in a model-theoretic framework
where grammars are constraint systems using the notion of tree description and
parsing appears as a process of building tree description models satisfying
criteria of saturation and minimality.
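The saturation criterion described above can be illustrated with a toy check: every negative (needed) polarized feature must be cancelled by a positive (provided) feature of the same type. This is a deliberate simplification in the spirit of IG, not the formalism itself.

```python
# Toy illustration of polarity saturation: syntactic structures carry
# polarized features, and a composed structure is saturated only when
# every negative (-f, unsaturated/needed) feature is cancelled by a
# positive (+f, saturated/provided) feature of the same type.

from collections import Counter

def saturated(features):
    """A feature multiset is saturated iff +f and -f cancel for every f."""
    balance = Counter()
    for polarity, feat in features:
        balance[feat] += 1 if polarity == "+" else -1
    return all(v == 0 for v in balance.values())

# A transitive verb needs (-) two NPs and provides (+) an S;
# the rest of the "reaction" provides the NPs and consumes the S.
verb = [("-", "np"), ("-", "np"), ("+", "s")]
rest = [("+", "np"), ("+", "np"), ("-", "s")]
print(saturated(verb + rest))  # the combined structure is saturated
print(saturated(verb))         # the verb alone is not
```

The "chemical reaction" metaphor from the abstract corresponds to this cancellation: composition is licensed exactly when it moves the structure toward a fully balanced polarity count.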
Unsupervised Dependency Parsing: Let's Use Supervised Parsers
We present a self-training approach to unsupervised dependency parsing that
reuses existing supervised and unsupervised parsing algorithms. Our approach,
called 'iterated reranking' (IR), starts with dependency trees generated by an
unsupervised parser, and iteratively improves these trees using the richer
probability models used in supervised parsing that are in turn trained on these
trees. Our system achieves accuracy 1.8% higher than the state-of-the-art
parser of Spitkovsky et al. (2013) on the WSJ corpus.
Comment: 11 pages
CCG Parsing and Multiword Expressions
This thesis presents a study about the integration of information about
Multiword Expressions (MWEs) into parsing with Combinatory Categorial Grammar
(CCG). We build on previous work which has shown the benefit of adding
information about MWEs to syntactic parsing by implementing a similar pipeline
with CCG parsing. More specifically, we collapse MWEs to one token in training
and test data in CCGbank, a corpus which contains sentences annotated with CCG
derivations. Our collapsing algorithm, however, can only deal with MWEs when
they form a constituent in the data, which is one of the limitations of our approach.
We study the effect of collapsing training and test data. A parsing effect
can be obtained if the collapsed data help the parser in its decisions, and a
training effect can be obtained if training on the collapsed data improves
results. We also collapse the gold standard and show that our model
significantly outperforms the baseline model on our gold standard, which
indicates that there is a training effect. We show that the baseline model
performs significantly better on our gold standard when the data are collapsed
before parsing than when they are collapsed after parsing, which indicates
that there is a parsing effect. We show that these results can lead to improved
performance on the non-collapsed standard benchmark, although we fail to show
that the improvement is significant. We conclude that despite the limited settings,
there are noticeable improvements from using MWEs in parsing. We discuss ways
in which the incorporation of MWEs into parsing can be improved and hypothesize
that this will lead to more substantial results.
Finally, we show that treating the MWE recognition part of the pipeline as an
experimental variable is worthwhile, as we obtain different results with
different recognizers.
Comment: MSc thesis, The University of Edinburgh, 2014, School of Informatics,
MSc Artificial Intelligence