Phrase-level System Combination for Machine Translation Based on Target-to-Target Decoding
In this paper, we propose a novel lattice-based MT combination methodology that we call Target-to-Target Decoding (TTD). The combination process is carried out as a "translation" from the backbone to the combination result. This perspective suggests the use of existing phrase-based MT techniques in the combination framework. We show how phrase extraction rules and confidence estimation inspired by machine translation improve results. We also propose system-specific LMs for estimating N-gram consensus. Our results show that our approach yields a strong improvement over the best single MT system and competes with other state-of-the-art combination systems.
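The N-gram consensus idea mentioned above can be illustrated with a toy frequency count, a stand-in for the paper's system-specific LMs; the function names and the averaging scheme here are assumptions for illustration only:

```python
# Toy sketch of N-gram consensus across multiple MT system outputs:
# a candidate scores higher the more systems reproduce its n-grams.
# (Illustrative only; the paper estimates consensus with system-specific LMs.)

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def consensus_score(candidate, system_outputs, n=2):
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    # For each candidate n-gram, count how many systems produced it.
    support = sum(
        sum(1 for out in system_outputs if g in ngrams(out, n))
        for g in cand
    )
    return support / (len(cand) * len(system_outputs))

outputs = [["the", "cat", "sat"], ["the", "cat", "slept"]]
```

With these outputs, `consensus_score(["the", "cat"], outputs)` is 1.0 (both systems contain the bigram), while an unsupported candidate scores 0.0.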
Marker-based filtering of bilingual phrase pairs for SMT
State-of-the-art statistical machine translation systems make use of a large translation table obtained after scoring a set of bilingual phrase pairs automatically extracted from a parallel corpus. The number of bilingual phrase pairs extracted from a pair of aligned sentences grows exponentially as the length of the sentences increases; therefore, the number of entries in the phrase table used to carry out the translation may become unmanageable, especially when online, 'on demand' translation is required in real time. We describe the use of closed-class words to filter the set of bilingual phrase pairs extracted from the parallel corpus by taking into account the alignment information and the type of the words involved in the alignments. On four European language pairs, we show that our simple yet novel approach can reduce the phrase table by up to a third yet still provide competitive results compared to the baseline. Furthermore, it provides a good balance between the unfiltered approach and pruning using stop words, where the deterioration in translation quality is unacceptably high.
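A minimal sketch of such a filter, under an assumed criterion (not necessarily the paper's exact rule): keep a phrase pair only if at least one alignment link involves a content (non-closed-class) word. The word lists and data layout are hypothetical:

```python
# Marker-based phrase-pair filtering sketch. The closed-class word lists
# below are toy examples; a real system would use full stop-word/function-word
# inventories for each language.

CLOSED_CLASS_SRC = {"the", "a", "of", "in", "and"}   # toy English list
CLOSED_CLASS_TGT = {"le", "la", "de", "en", "et"}    # toy French list

def keep_pair(src_words, tgt_words, alignment):
    """alignment: (src_idx, tgt_idx) links inside the phrase pair."""
    for s, t in alignment:
        if (src_words[s] not in CLOSED_CLASS_SRC
                or tgt_words[t] not in CLOSED_CLASS_TGT):
            return True  # a content word participates in this link
    return False  # every link joins closed-class words (or there are no links)

phrase_table = [
    (["of", "the"], ["de", "la"], [(0, 0), (1, 1)]),
    (["red", "car"], ["voiture", "rouge"], [(0, 1), (1, 0)]),
]
filtered = [p for p in phrase_table if keep_pair(*p)]
```

Here the purely functional pair ("of the" / "de la") is dropped, while the content-bearing pair survives, shrinking the table without touching lexically meaningful entries.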
System Combination via Quality Estimation for Grammatical Error Correction
Quality estimation models have been developed to assess the corrections made by grammatical error correction (GEC) models when the reference or gold-standard corrections are not available. An ideal quality estimator can be utilized to combine the outputs of multiple GEC systems by choosing the best subset of edits from the union of all edits proposed by the GEC base systems. However, we found that existing GEC quality estimation models are not good enough at differentiating good corrections from bad ones, resulting in a low F0.5 score when used for system combination. In this paper, we propose GRECO, a new state-of-the-art quality estimation model that gives a better estimate of the quality of a corrected sentence, as indicated by its higher correlation with the F0.5 score of a corrected sentence. It results in a combined GEC system with a higher F0.5 score. We also propose three methods for utilizing GEC quality estimation models for system combination with varying generality: a model-agnostic method, a model-agnostic method with voting bias, and a model-dependent method. The combined GEC system outperforms the state of the art on the CoNLL-2014 test set and the BEA-2019 test set, achieving the highest F0.5 scores published to date.
Comment: EMNLP 202
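The model-agnostic combination idea can be sketched as follows: score every non-conflicting subset of the union of candidate edits with a quality estimator and keep the best-scoring one. The edit format (start, end, replacement) and the toy word-counting quality function are assumptions for illustration; GRECO is a learned model, and exhaustive subset search is only feasible for a handful of edits:

```python
from itertools import combinations

def apply_edits(tokens, edits):
    out = list(tokens)
    for start, end, repl in sorted(edits, reverse=True):
        out[start:end] = repl  # splice right-to-left so indices stay valid
    return out

def has_overlap(edits):
    spans = sorted((s, e) for s, e, _ in edits)
    return any(spans[i][1] > spans[i + 1][0] for i in range(len(spans) - 1))

def combine_edits(source_tokens, edit_union, quality_fn):
    """Pick the edit subset whose corrected sentence the estimator prefers."""
    best_subset, best_score = [], quality_fn(source_tokens)
    for r in range(1, len(edit_union) + 1):
        for subset in combinations(edit_union, r):
            if has_overlap(subset):
                continue  # conflicting edits cannot be applied together
            score = quality_fn(apply_edits(source_tokens, subset))
            if score > best_score:
                best_subset, best_score = list(subset), score
    return best_subset
```

For example, with source "he go to school", candidate edits {(1, 2, ["goes"]), (1, 2, ["went"]), (3, 4, ["schools"])}, and a toy estimator that counts tokens from the set {"he", "goes", "to", "school"}, the search keeps only the (1, 2, ["goes"]) edit and discards the conflicting and spurious ones.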
Identifying Semantic Divergences in Parallel Text without Annotations
Recognizing that even correct translations are not always semantically equivalent, we automatically detect meaning divergences in parallel sentence pairs with a deep neural model of bilingual semantic similarity, which can be trained for any parallel corpus without any manual annotation. We show that our semantic model detects divergences more accurately than models based on surface features derived from word alignments, and that these divergences matter for neural machine translation.
Comment: Accepted as a full paper to NAACL 201
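The core mechanic can be illustrated with a toy similarity check: embed both sides in a shared space and flag pairs whose similarity falls below a threshold. The tiny hand-made cross-lingual "embeddings" and the 0.8 cutoff are stand-ins for the paper's trained deep model:

```python
import math

# Toy sketch of divergence detection via bilingual semantic similarity.
# EMB maps English and French words into one shared 2-D space (hypothetical).
EMB = {
    "dog": (1.0, 0.0), "chien": (0.9, 0.1),
    "cat": (0.0, 1.0), "chat": (0.1, 0.9),
}

def sent_vec(words):
    """Average the embeddings of known words (bag-of-words sentence vector)."""
    known = [EMB[w] for w in words if w in EMB]
    if not known:
        return (0.0, 0.0)
    return tuple(sum(d) / len(known) for d in zip(*known))

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv) if nu and nv else 0.0

def is_divergent(src_words, tgt_words, threshold=0.8):
    return cosine(sent_vec(src_words), sent_vec(tgt_words)) < threshold
```

Under these toy vectors, ("dog", "chien") is judged equivalent while ("dog", "chat") is flagged as divergent, despite both being well-formed sentence pairs.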
Findings of the 2015 Workshop on Statistical Machine Translation
This paper presents the results of the WMT15 shared tasks, which included a standard news translation task, a metrics task, a tuning task, a task for run-time estimation of machine translation quality, and an automatic post-editing task. This year, 68 machine translation systems from 24 institutions were submitted to the ten translation directions in the standard translation task. An additional 7 anonymized systems were included, and were then evaluated both automatically and manually. The quality estimation task had three subtasks, with a total of 10 teams, submitting 34 entries. The pilot automatic post-editing task had a total of 4 teams, submitting 7 entries.
Findings of the IWSLT 2022 Evaluation Campaign
The evaluation campaign of the 19th International Conference on Spoken Language Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Speech to speech translation, (iv) Low-resource speech translation, (v) Multilingual speech translation, (vi) Dialect speech translation, (vii) Formality control for speech translation, (viii) Isometric speech translation. A total of 27 teams participated in at least one of the shared tasks. This paper details, for each shared task, the purpose of the task, the data that were released, the evaluation metrics that were applied, the submissions that were received and the results that were achieved.
Findings of the 2014 Workshop on Statistical Machine Translation
This paper presents the results of the WMT14 shared tasks, which included a standard news translation task, a separate medical translation task, a task for run-time estimation of machine translation quality, and a metrics task. This year, 143 machine translation systems from 23 institutions were submitted to the ten translation directions in the standard translation task. An additional 6 anonymized systems were included, and were then evaluated both automatically and manually. The quality estimation task had four subtasks, with a total of 10 teams, submitting 57 entries
- …