1,787 research outputs found

Results of the WMT15 Tuning Shared Task

Authors: Bojar Ondřej; Kamran Amir; Stanojević Miloš
Publication date: 01/01/2015

This paper presents the results of the WMT15 Tuning Shared Task. We provided the participants of this task with a complete machine translation system and asked them to tune its internal parameters (feature weights). The tuned systems were used to translate the test set, and the outputs were manually ranked for translation quality. We received 4 submissions in the English-Czech and 6 in the Czech-English translation direction. In addition, we ran 3 baseline setups, tuning the parameters with standard optimizers for BLEU score.

Crossref

Publikationsserver der RWTH Aachen University

Biblio at Institute of Formal and Applied Linguistics

Agreement Constraints for Statistical Machine Translation into German

Authors: Koehn Philipp; Williams Philip
Publication date: 01/07/2011

Edinburgh Research Explorer

Benchmarking SMT performance for Farsi using the TEP++ Corpus

Authors: Liu Qun; Passban Peyman; Way Andy
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/05/2015

Statistical machine translation (SMT) suffers from various problems which are exacerbated where training data is in short supply. In this paper we address the data sparsity problem in the Farsi (Persian) language and introduce a new parallel corpus, TEP++. Compared to previous results, the new dataset is more efficient for Farsi SMT engines and yields better output. In our experiments using TEP++ as bilingual training data and BLEU as a metric, we achieved improvements of +11.17 (60%) and +7.76 (63.92%) in the Farsi–English and English–Farsi directions, respectively. Furthermore, we describe an engine (SF2FF) to translate between formal and informal Farsi, which in terms of syntax and terminology can be seen as different languages. The SF2FF engine also works as an intelligent normalizer for Farsi texts. To demonstrate its use, SF2FF was used to clean the IWSLT–2013 dataset to produce normalized data, which gave improvements in translation quality over FBK’s Farsi engine when used as training data.

Irish Universities

DCU Online Research Access Service

Syntax and Rich Morphology in MT

Author: Bojar Ondřej
Publication date: 01/01/2010

The talk describes in detail the issues specific to English-to-Czech MT: sentence syntax and target-side rich morphology.

Biblio at Institute of Formal and Applied Linguistics

English-to-Czech MT: Large Data and Beyond

Author: Bojar Ondřej
Publication date: 06/12/2018

CU Digital Repository

Proceedings of the 17th Annual Conference of the European Association for Machine Translation

Publication venue: Hrvatsko društvo za jezične tehnologije
Publication date: 01/01/2014

Proceedings of the 17th Annual Conference of the European Association for Machine Translation (EAMT)

Repozitorij Filozofskog fakulteta u Zagrebu at University of Zagreb

Proceedings

Authors: Bick Eckhard; Hagen Kristin; Müürisep Kaili; Trosterud Trond
Publication date: 17/11/2011

Proceedings of the NODALIDA 2011 Workshop Constraint Grammar Applications. Editors: Eckhard Bick, Kristin Hagen, Kaili Müürisep, Trond Trosterud. NEALT Proceedings Series, Vol. 14 (2011), vi+69 pp. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/19231

DSpace at Tartu University Library

Domain adaptation for statistical machine translation of corporate and user-generated content

Author: Banerjee Pratyush
Publication venue: Dublin City University. School of Computing
Publication date: 01/03/2013

The growing popularity of Statistical Machine Translation (SMT) techniques in recent years has led to the development of multiple domain-specific resources and adaptation scenarios. In this thesis we address two important and industrially relevant adaptation scenarios, each suited to different kinds of content. Initially focussing on professionally edited 'enterprise-quality' corporate content, we address a specific scenario of data translation from a mixture of different domains where, for each of them, domain-specific data is available. We utilise an automatic classifier to combine multiple domain-specific models and empirically show that such a configuration results in better translation quality compared to both traditional and state-of-the-art techniques for handling mixed-domain translation. In the second phase of our research we shift our focus to the translation of possibly 'noisy' user-generated content in web forums created around products and services of a multinational company. Using professionally edited translation memory (TM) data for training, we use different normalisation and data selection techniques to adapt SMT models to noisy forum content. In this scenario, we also study the effect of mixture adaptation using a combination of in-domain and out-of-domain data at different component levels of an SMT system. Finally, we focus on the task of optimal supplementary training data selection from out-of-domain corpora, using a novel incremental model merging mechanism to adapt TM-based models to improve forum-content translation quality.

Irish Universities

DCU Online Research Access Service