4 research outputs found

Robust language pair-independent sub-tree alignment

Authors: Hearne Mary; Tinsley John; Way Andy; Zhechev Ventsislav
Publication venue: European Association for Machine Translation
Publication date: 01/01/2007

Data-driven approaches to machine translation (MT) achieve state-of-the-art results. Many syntax-aware approaches, such as Example-Based MT and Data-Oriented Translation (DOT), make use of tree pairs aligned at the sub-sentential level. Obtaining sub-sentential alignments manually is time-consuming and error-prone, and requires expert knowledge of both source and target languages. We propose a novel, language pair-independent algorithm which automatically induces alignments between phrase-structure trees. We evaluate the alignments themselves against a manually aligned gold standard, and perform an extrinsic evaluation by using the aligned data to train and test a DOT system. Our results show that translation accuracy is comparable to that of the same translation system trained on manually aligned data, and coverage improves.

Irish Universities

DCU Online Research Access Service

Automatic generation of parallel treebanks: an efficient unsupervised system

Author: Zhechev Ventsislav
Publication venue: Dublin City University. School of Computing
Publication date: 01/01/2009

The need for syntactically annotated data for use in natural language processing has increased dramatically in recent years. This is true especially for parallel treebanks, of which very few exist. The ones that exist are mainly hand-crafted and too small for reliable use in data-oriented applications. In this work I introduce a novel open-source platform for the fast and robust automatic generation of parallel treebanks through sub-tree alignment, using a limited amount of external resources. The intrinsic and extrinsic evaluations that I undertook demonstrate that my system is a feasible alternative to the manual annotation of parallel treebanks. Therefore, I expect the presented platform to help boost research in the field of syntax-augmented machine translation and lead to advancements in other fields where parallel treebanks can be employed.

CiteSeerX

Irish Universities

DCU Online Research Access Service