Search CORE

18 research outputs found

Probabilistic Parsing Strategies

Author: Nederhof Mark-Jan
Satta Giorgio
Publication venue
Publication date: 01/01/2002
Field of study

We present new results on the relation between purely symbolic context-free parsing strategies and their probabilistic counter-parts. Such parsing strategies are seen as constructions of push-down devices from grammars. We show that preservation of probability distribution is possible under two conditions, viz. the correct-prefix property and the property of strong predictiveness. These results generalize existing results in the literature that were obtained by considering parsing strategies in isolation. From our general results we also derive negative results on so-called generalized LR parsing.Comment: 36 pages, 1 figur

arXiv.org e-Print Archive

CiteSeerX

Crossref

Archivio istituzionale della ricerca - Università di Padova

Probabilistic parsing strategies.

Author: Giorgio Satta
Mark-Jan Nederhof
Publication venue
Publication date: 01/01/2004
Field of study

Abstract. We present new results on the relation between purely symbolic context-free parsing strategies and their probabilistic counterparts. Such parsing strategies are seen as constructions of pushdown devices from grammars. We show that preservation of probability distribution is possible under two conditions, viz. the correct-prefix property and the property of strong predictiveness. These results generalize existing results in the literature that were obtained by considering parsing strategies in isolation. From our general results, we also derive negative results on so-called generalized LR parsing

CiteSeerX

Probabilistic parsing

Author: Nederhof Mark Jan
Satta Giorgio
Publication venue: Springer
Publication date: 06/01/2011
Field of study

Postprin

St Andrews Research Repository

Pushdown automata in statistical machine translation

Author: Adrià de Gispert
Aho Alfred V.
Bar-Hillel Y.
Bill Byrne
Blackwood Graeme
Brants Thorsten
Chang Yin-Wen
Chelba Ciprian
Cyril Allauzen
Dyer Chris
Gonzalo Iglesias
Hopkins M.
Huang Liang
Huang Liang
Huang Liang
Koo Terry
Kumar Shankar
Ljolje Andrej
Michael Riley
Mohri Mehryar
Nederhof Mark-Jan
Roark Brian
Roark Brian
Rush Alexander M.
Stolcke Andreas
Stolcke Andreas
Wu Dekai
Zens Richard
Publication venue: Computational Linguistics
Publication date: 01/01/2013
Field of study

This article describes the use of pushdown automata (PDA) in the context of statistical machine translation and alignment under a synchronous context-free grammar. We use PDAs to compactly represent the space of candidate translations generated by the grammar when applied to an input sentence. General-purpose PDA algorithms for replacement, composition, shortest path, and expansion are presented. We describe HiPDT, a hierarchical phrase-based decoder using the PDA representation and these algorithms. We contrast the complexity of this decoder with a decoder based on a finite state automata representation, showing that PDAs provide a more suitable framework to achieve exact decoding for larger synchronous context-free grammars and smaller language models. We assess this experimentally on a large-scale Chinese-to-English alignment and translation task. In translation, we propose a two-pass decoding strategy involving a weaker language model in the first-pass to address the results of PDA complexity analysis. We study in depth the experimental conditions and tradeoffs in which HiPDT can achieve state-of-the-art performance for large-scale SMT. </jats:p

CiteSeerX

Crossref

Apollo (Cambridge)

Algebraic Methods in Language Processing:Proceedings of the twenty-first Twente workshop on language technology

Author
Publication venue: 'University Library/University of Twente'
Publication date: 15/08/2003
Field of study

University of Twente Research Information

The mat sat on the cat : investigating structure in the evaluation of order in machine translation

Author: McCaffery Martin
Publication venue: The University of St Andrews
Publication date: 14/11/2017
Field of study

We present a multifaceted investigation into the relevance of word order in machine translation. We introduce two tools, DTED and DERP, each using dependency structure to detect differences between the structures of machine-produced translations and human-produced references. DTED applies the principle of Tree Edit Distance to calculate edit operations required to convert one structure into another. Four variants of DTED have been produced, differing in the importance they place on words which match between the two sentences. DERP represents a more detailed procedure, making use of the dependency relations between words when evaluating the disparities between paths connecting matching nodes. In order to empirically evaluate DTED and DERP, and as a standalone contribution, we have produced WOJ-DB, a database of human judgments. Containing scores relating to translation adequacy and more specifically to word order quality, this is intended to support investigations into a wide range of translation phenomena. We report an internal evaluation of the information in WOJ-DB, then use it to evaluate variants of DTED and DERP, both to determine their relative merit and their strength relative to third-party baselines. We present our conclusions about the importance of structure to the tools and their relevance to word order specifically, then propose further related avenues of research suggested or enabled by our work

St Andrews Research Repository

Probabilistic Parsing Strategies

Author: Nederhof M. J.
Satta Giorgio
Publication venue
Publication date: 01/01/2003
Field of study

Archivio istituzionale della ricerca - Università di Padova

Probabilistic parsing strategies

Author: Nederhof Mark Jan
Satta G.
Publication venue
Publication date: 01/05/2006
Field of study

We present new results on the relation between purely symbolic context-free parsing strategies and their probabilistic counterparts. Such parsing strategies are seen as constructions of push-down devices from grammars. We show that preservation of probability distribution is possible under two conditions, viz. the correct-prefix property and the property of strong predictiveness. These results generalize existing results in the literature that were obtained by considering parsing strategies in isolation. From our general results, we also derive negative results on so-called generalized LR parsing.</p