142 research outputs found
Results of the WMT15 Tuning Shared Task
This paper presents the results of the WMT15 Tuning Shared Task. We provided the
participants of this task with a complete machine translation system and asked them to tune its
internal parameters (feature weights). The tuned systems were used to translate the test set and
the outputs were manually ranked for translation quality. We received 4 submissions in the
English-Czech and 6 in the Czech-English translation direction. In addition, we ran
3 baseline setups, tuning the parameters with standard optimizers for BLEU score.
Native Language Identification on Text and Speech
This paper presents an ensemble system combining the output of multiple SVM
classifiers for native language identification (NLI). The system was submitted
to the NLI Shared Task 2017 fusion track, which featured student essays and
spoken responses in the form of audio transcriptions and iVectors by non-native
English speakers of eleven native languages. Our system competed in the
challenge under the team name ZCD and was based on an ensemble of SVM
classifiers trained on character n-grams achieving 83.58% accuracy and ranking
3rd in the shared task.
Comment: Proceedings of the Workshop on Innovative Use of NLP for Building
Educational Applications (BEA
Phrase-level System Combination for Machine Translation Based on Target-to-Target Decoding
In this paper, we propose a novel lattice-based MT combination methodology that we call Target-to-Target Decoding (TTD). The combination process is carried out as a "translation" from backbone to the combination result. This perspective suggests the use of existing phrase-based MT techniques in the combination framework. We show how phrase extraction rules and confidence estimations inspired by machine translation improve results. We also propose system-specific LMs for estimating N-gram consensus. Our results show that our approach yields a strong improvement over the best single MT system and competes with other state-of-the-art combination systems.
Combining semantic and syntactic generalization in example-based machine translation
In this paper, we report our experiments in combining two EBMT systems that rely on generalized templates, Marclator and CMU-EBMT, on an English–German translation task. Our goal was to see whether a statistically significant improvement could be achieved over the individual performances of these two systems. We observed that this was not the case. However, our system consistently outperformed a lexical EBMT baseline system.
A Grain of Salt for the WMT Manual Evaluation
The Workshop on Statistical Machine Translation (WMT) has become one of ACL's
flagship workshops, held annually since 2006. In addition to soliciting papers
from the research community, WMT also features a shared translation task for
evaluating MT systems. This shared task is notable for having manual evaluation
as its cornerstone.
The Workshop's overview paper, playing a descriptive and administrative role, reports
the main results of the evaluation without delving deep into analyzing those results.
The aim of this paper is to investigate and explain some interesting idiosyncrasies
in the reported results, which only become apparent when performing a more thorough
analysis of the collected annotations. Our analysis sheds some light on how the
reported results should (and should not) be interpreted, and also gives rise to some helpful
recommendations for the organizers of WMT.
Findings of the 2011 Workshop on Statistical Machine Translation
This paper presents the results of the WMT11 shared tasks, which included a translation task, a system combination task, and a task for machine translation evaluation metrics. We conducted a large-scale manual evaluation of 148 machine translation systems and 41 system combination entries. We used the ranking of these systems to measure how strongly automatic metrics correlate with human judgments of translation quality for 21 evaluation metrics. This year featured a Haitian Creole to English task translating SMS messages sent to an emergency response service in the aftermath of the Haitian earthquake. We also conducted a pilot 'tunable metrics' task to test whether optimizing a fixed system to different metrics would result in perceptibly different translation quality.
Putting Human Assessments of Machine Translation Systems in Order
Human assessment is often considered the gold standard in evaluation of translation systems. But in order for the evaluation to be meaningful, the rankings obtained from human assessment must be consistent and repeatable. Recent analysis by Bojar et al. (2011) raised several concerns about the rankings derived from human assessments of English-Czech translation systems in the 2010 Workshop on Machine Translation. We extend their analysis to all of the ranking tasks from 2010 and 2011, and show through an extension of their reasoning that the ranking is naturally cast as an instance of finding the minimum feedback arc set in a tournament, a well-known NP-complete problem. All instances of this problem in the workshop data are efficiently solvable, but in some cases the rankings it produces are surprisingly different from the ones previously published. This leads to strong caveats and recommendations for both producers and consumers of these rankings.
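To illustrate the minimum feedback arc set formulation mentioned in this abstract, here is a minimal sketch, not the authors' implementation. It assumes pairwise judgments are stored as counts and simply tries every ordering, which is only feasible for a handful of systems. The system names, the counts, and the functions `backward_arcs` and `best_ranking` are hypothetical, and weighting violations by judgment counts is an assumption of this sketch.

```python
from itertools import permutations

def backward_arcs(order, wins):
    """Sum the pairwise judgments that contradict the ranking (best system first)."""
    position = {system: i for i, system in enumerate(order)}
    return sum(count for (winner, loser), count in wins.items()
               if position[winner] > position[loser])

def best_ranking(systems, wins):
    """Exhaustive search over all orderings; exponential, so toy sizes only."""
    return min(permutations(systems), key=lambda order: backward_arcs(order, wins))

# Hypothetical data: wins[(a, b)] = number of times a was judged better than b.
# The majority preferences form a cycle (A > B, B > C, C > A).
wins = {("A", "B"): 3, ("B", "A"): 1,
        ("B", "C"): 3, ("C", "B"): 1,
        ("C", "A"): 2, ("A", "C"): 1}

ranking = best_ranking(["A", "B", "C"], wins)
violations = backward_arcs(ranking, wins)
```

Because the preferences are cyclic, no ordering is free of contradicted judgments. The search returns the ordering that contradicts the fewest.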
Unfolding and Shrinking Neural Machine Translation Ensembles
Ensembling is a well-known technique in neural machine translation (NMT) to
improve system performance. Instead of a single neural net, multiple neural
nets with the same topology are trained separately, and the decoder generates
predictions by averaging over the individual models. Ensembling often improves
the quality of the generated translations drastically. However, it is not
suitable for production systems because it is cumbersome and slow. This work
aims to reduce the runtime to be on par with a single system without
compromising the translation quality. First, we show that the ensemble can be
unfolded into a single large neural network which imitates the output of the
ensemble system. We show that unfolding can already improve the runtime in
practice since more work can be done on the GPU. We proceed by describing a set
of techniques to shrink the unfolded network by reducing the dimensionality of
layers. On Japanese-English we report that the resulting network has the size
and decoding speed of a single NMT network but performs on the level of a
3-ensemble system.
Comment: Accepted at EMNLP 201
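The averaging step described in this abstract can be sketched as follows. This is a minimal illustration, assuming each model produces a vector of scores (logits) over the same vocabulary at a decoding step and that the ensemble averages the resulting probability distributions. The logits and the function names are hypothetical, and the sketch does not cover the unfolding or shrinking techniques themselves.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_distribution(per_model_logits):
    """Average the per-model output distributions for one decoding step."""
    dists = [softmax(logits) for logits in per_model_logits]
    n_models = len(dists)
    vocab_size = len(dists[0])
    return [sum(d[i] for d in dists) / n_models for i in range(vocab_size)]

# Hypothetical logits from three separately trained models over a 3-word vocabulary.
per_model_logits = [[2.0, 1.0, 0.1],
                    [1.5, 1.6, 0.2],
                    [2.2, 0.3, 0.4]]

averaged = ensemble_distribution(per_model_logits)
next_token = max(range(len(averaged)), key=averaged.__getitem__)
```

In this toy input the second model alone would pick token 1, but the averaged distribution favors token 0. Running every model at each step is also what makes a plain ensemble slow, which is the cost the abstract sets out to remove.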