
A detailed analysis of phrase-based and syntax-based machine translation: the search for systematic differences

Author: Jennifer Foster
Johann Roturier
Raphael Rubino
Rasoul Samad Zadeh Kaljahi
Publication date: 01/11/2012

This paper describes a range of automatic and manual comparisons of phrase-based and syntax-based statistical machine translation methods applied to English-German and English-French translation of user-generated content. The syntax-based methods underperform the phrase-based models, and the relaxation of syntactic constraints to broaden translation rule coverage means that these models do not necessarily generate output which is more grammatical than the output produced by the phrase-based models. Although the systems generate different output and can potentially be fruitfully combined, the lack of systematic difference between these models makes the combination task more challenging.

CiteSeerX

Irish Universities

DCU Online Research Access Service

Comparing constituency and dependency representations for SMT phrase-extraction

Author: Hearne Mary
Ozdowska Sylwia
Tinsley John
Publication date: 01/01/2008

We consider the value of replacing and/or combining string-based methods with syntax-based methods for phrase-based statistical machine translation (PB-SMT), and we also consider the relative merits of using constituency-annotated vs. dependency-annotated training data. We automatically derive two subtree-aligned treebanks, dependency-based and constituency-based, from a parallel English–French corpus and extract syntactically motivated word- and phrase-pairs. We automatically measure PB-SMT quality. The results show that combining string-based and syntax-based word- and phrase-pairs can improve translation quality irrespective of the type of syntactic annotation. Furthermore, using dependency annotation yields greater translation quality than constituency annotation for PB-SMT.

Irish Universities

DCU Online Research Access Service

N-gram-based statistical machine translation versus syntax augmented machine translation: comparison and system combination

Author: Khalilov Maxim
Rodríguez Fonollosa José Adrián
Publication date: 30/03/2009

In this paper we compare and contrast two approaches to Machine Translation (MT): the CMU-UKA Syntax Augmented Machine Translation system (SAMT) and UPC-TALP N-gram-based Statistical Machine Translation (SMT). SAMT is a hierarchical syntax-driven translation system underlain by a phrase-based model and a target part parse tree. In N-gram-based SMT, the translation process is based on bilingual units related to word-to-word alignment and statistical modeling of the bilingual context following a maximum-entropy framework. We provide a step-by-step comparison of the systems and report results in terms of automatic evaluation metrics and required computational resources for a smaller Arabic-to-English translation task (1.5M tokens in the training corpus). Human error analysis clarifies advantages and disadvantages of the systems under consideration. Finally, we combine the output of both systems to yield significant improvements in translation quality.

UPCommons. Portal del coneixement obert de la UPC

Pivot-based Hybrid Machine Translation to Support Multilingual Communication

Author: Nasution Arbi Haza
Setiawan Panji Rachmat
Suryani Des
Syafitri Nesi
Publication date: 21/12/2017

Machine Translation (MT) is very useful in supporting multicultural communication. Existing Statistical Machine Translation (SMT), which requires corpora of high quality and quantity, and Rule-Based Machine Translation (RBMT), which requires bilingual dictionaries and morphological, syntactic, and semantic analyzers, are scarce for low-resource languages. Due to the lack of language resources, it is difficult to create MT from high-resource languages to low-resource languages like Indonesian ethnic languages. Nevertheless, Indonesian ethnic languages’ characteristics motivate us to introduce a Pivot-Based Hybrid Machine Translation (PHMT) by combining SMT and RBMT with Indonesian as a pivot, which we further utilize in a multilingual communication support system. We evaluate PHMT translation quality with fluency and adequacy as metrics and then evaluate the usability of the system. Despite the medium average translation quality (3.05 fluency score and 3.06 adequacy score), the 3.71 mean score of the usability evaluation indicates that the system is useful to support multilingual collaboration.

Repository Universitas Islam Riau

Towards human linguistic machine translation evaluation

Author: Farrus Mireia
Ruiz Costa-Jussà Marta
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2013

When evaluating machine translation outputs, linguistics is usually taken into account implicitly. Annotators have to decide whether a sentence is better than another or not, using, for example, adequacy and fluency criteria or, as recently proposed, editing the translation output so that it has the same meaning as a reference translation and is understandable. Therefore, the important linguistic fields of meaning (semantics) and grammar (syntax) are indirectly considered. In this study, we propose to go one step further towards a linguistic human evaluation. The idea is to introduce linguistics implicitly by formulating precise guidelines. These guidelines strictly mark the difference between sub-fields of linguistics such as morphology, syntax, semantics, and orthography. We show that our guidelines have a high inter-annotation agreement and wide error coverage. Additionally, we examine how the linguistic human evaluation data correlate across different types of machine translation systems (rule-based and statistical) and with adequacy and fluency.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

UPF Digital Repository