Comparing constituency and dependency representations for SMT phrase-extraction

Hearne, Mary; Ozdowska, Sylwia; Tinsley, John

Publication date: 1 January 2008

Abstract

We consider the value of replacing and/or combining string-based methods with syntax-based methods for phrase-based statistical machine translation (PB-SMT), and we also consider the relative merits of using constituency-annotated vs. dependency-annotated training data. We automatically derive two subtree-aligned treebanks, one dependency-based and one constituency-based, from a parallel English–French corpus and extract syntactically motivated word- and phrase-pairs from them. We measure PB-SMT quality automatically. The results show that combining string-based and syntax-based word- and phrase-pairs can improve translation quality irrespective of the type of syntactic annotation. Furthermore, using dependency annotation yields greater translation quality than constituency annotation for PB-SMT.

Available Versions

DCU Online Research Access Service

oai:doras.dcu.ie:15193

Last time updated on 10/07/2013

Irish Universities

Last time updated on 30/12/2017
