69 research outputs found
Marker-based filtering of bilingual phrase pairs for SMT
State-of-the-art statistical machine translation
systems make use of a large translation table obtained after scoring a set of bilingual phrase pairs automatically extracted from a parallel corpus. The number of bilingual phrase pairs extracted from a pair of aligned sentences grows exponentially as the length of the sentences increases; therefore, the number of entries in the phrase table used to carry out the translation may become unmanageable, especially when online, 'on demand' translation is required in real time. We describe
the use of closed-class words to filter the set of bilingual phrase pairs extracted from the parallel corpus by taking into account the alignment information
and the type of the words involved in the alignments. On four European language pairs, we show that our simple yet novel approach can filter the phrase table by up to
a third yet still provide competitive results compared to the baseline. Furthermore, it provides a nice balance between the unfiltered approach and pruning using stop
words, where the deterioration in translation quality is unacceptably high
Findings of the IWSLT 2022 Evaluation Campaign.
The evaluation campaign of the 19th International Conference on Spoken Language Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Speech to speech translation, (iv) Low-resource speech translation, (v) Multilingual speech translation, (vi) Dialect speech translation, (vii) Formality control for speech translation, (viii) Isometric speech translation. A total of 27 teams participated in at least one of the shared tasks. This paper details, for each shared task, the purpose of the task, the data that were released, the evaluation metrics that were applied, the submissions that were received and the results that were achieved
Results of the WMT15 Tuning Shared Task
This paper presents the results of the WMT15 Tuning Shared Task. We provided the
participants of this task with a complete machine translation system and asked them to tune its
internal parameters (feature weights). The tuned systems were used to translate the test set and
the outputs were manually ranked for translation quality. We received 4 submissions in the
English-Czech and 6 in the Czech-English translation direction. In addition, we ran
3 baseline setups, tuning the
parameters with standard optimizers for BLEU score
Developing Deployable Spoken Language Translation Systems given Limited Resources
Approaches are presented that support the deployment of spoken language translation systems. Newly developed methods allow low cost portability to new language pairs. Proposed translation model pruning techniques achieve a high translation performance even in low memory situations. The named entity and specialty vocabulary coverage, particularly on small and mobile devices, is targeted to an individual user by translation model personalization
Phrase-level System Combination for Machine Translation Based on Target-to-Target Decoding
In this paper, we propose a novel lattice-based MT combination methodology that we call Target-to-Target Decoding (TTD). The combination process is carried out as a “translation” from backbone to the combination result. This perspective suggests the use of existing phrase-based MT techniques in the combination framework. We show how phrase extraction rules and confidence estimations inspired from machine translation improve results. We also propose system-specific LMs for estimating N-gram consensus. Our results show that our approach yields a strong improvement over the best single MT system and competes with other state-of-the-art combination systems
Coupling hierarchical word reordering and decoding in phrase-based statistical machine translation
In this paper, we start with the existing idea of taking reordering rules automatically derived from syntactic representations, and applying them in a preprocessing step before translation to make the source sentence structurally more like the target; and we propose a new approach to hierarchically extracting these rules. We evaluate this, combined with a lattice-based decoding, and show improvements over stateof-the-art distortion models.Postprint (published version
- …